Oracle NoSQL Database uses Oracle Berkeley DB Java Edition as its underlying data storage engine. Berkeley DB Java Edition is a mature product that provides many, but not all, of the features and characteristics necessary for building a distributed key-value store such as Oracle NoSQL Database.
The Berkeley DB family of embeddable database products was developed by Sleepycat Software, Inc., building on the original Berkeley DB library, which dates to the early 1990s. Oracle acquired Sleepycat Software, Inc. in 2006. Since the acquisition, Oracle has continued to invest in the Berkeley DB family of products by adding features and enhancements to meet the needs of a large and growing base of users. These additions include a SQL interface for supporting ad hoc queries (the SQL API is available for Berkeley DB, but not for Berkeley DB Java Edition), major performance and reliability enhancements, and support for enterprise mobility, which is available through the SQL API for Berkeley DB.
Berkeley DB is a highly flexible, embeddable database engine that provides the application designer with a wide variety of choices for configuring and using the data management library. For example, you can run Berkeley DB as a pure in-memory database, change transactional constraints, run it on a wide variety of servers as well as embedded operating systems, and choose the appropriate API from a variety of available APIs. Further, Berkeley DB supports advanced data management features such as B-tree indexing, hash indexing (hash indexing is available only in Berkeley DB, not in Berkeley DB Java Edition), replication, and high availability. Figure 1 illustrates the architecture of the Berkeley DB family of products.
FIGURE 1. Berkeley DB product family
Products in the Berkeley DB Family
The Berkeley DB family of products encompasses three products: Berkeley DB, Berkeley DB Java Edition, and Berkeley DB XML. Berkeley DB is implemented in C and provides transactional key-value access to data. Berkeley DB supports a variety of programmatic and scripting APIs, including a SQL interface. Berkeley DB Java Edition is a pure Java implementation that provides functionality and features similar to those of Berkeley DB (except the SQL API). Berkeley DB XML is designed to manage XML documents; it provides transactional XQuery access to XML documents and uses Berkeley DB as its storage engine.
Berkeley DB was originally focused on providing simple, fast key-value access to large amounts of disk-resident data in a small, embeddable library, but several enhancements and modes of operation (for example, pure in-memory support) have been added to the products over the years. (We use the terms “Oracle Berkeley DB,” “Berkeley DB,” and “Berkeley DB family of products” interchangeably in this discussion.) The Berkeley DB founders recognized the widespread need of applications to efficiently manage large quantities of disk-resident data; after all, programs are a combination of data, data structures to represent information, and algorithms to manipulate that data. Very often, the application also needs capabilities such as concurrency, fast indexed access, transactions, and recovery. These key observations led to the genesis of Berkeley DB.
Berkeley DB packages the data management capabilities that we have come to expect from traditional database systems into an embeddable database library. Because the library is embedded, database capabilities are built into the application itself, as opposed to the application accessing data managed by a separate server. Berkeley DB APIs are intentionally designed from an application programmer’s point of view, rather than a database application developer’s point of view: instead of specifying a data request declaratively in SQL, the Berkeley DB application developer accesses data using intuitive get() and put() API calls. This simple interface eliminates the overhead of query parsing and optimization associated with SQL. In that sense, Berkeley DB applications are similar to the proprietary hierarchical database systems of the 1960s, where the data management engine was tightly coupled with the application. This tight coupling and simplicity of access enable a Berkeley DB application to achieve dramatic performance improvements when accessing vast quantities of data.
Figure 2 illustrates the differences between an application using a SQL client-server system and an embeddable database such as Berkeley DB.
FIGURE 2. Conventional client-server system vs. Berkeley DB application
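As a rough illustration of the embedded access pattern described above, the following Java sketch shows an application calling get() and put() directly on an in-process store, with no SQL parsing, optimization, or client-server round trip. The class EmbeddedKvStore and its keysWithPrefix() helper are hypothetical names invented for this example. This is not the Berkeley DB API, and a sorted in-memory map merely stands in for an ordered (B-tree-style) index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Hypothetical in-process key-value store (illustration only, NOT the Berkeley DB API).
 * The store lives inside the application, so every access is a plain method call.
 */
final class EmbeddedKvStore {
    // Assumption: a sorted, thread-safe map stands in for an ordered index.
    private final ConcurrentSkipListMap<String, String> index = new ConcurrentSkipListMap<>();

    /** Stores or overwrites the value for a key. */
    void put(String key, String value) {
        index.put(key, value);
    }

    /** Returns the value for a key, or null if the key is absent. */
    String get(String key) {
        return index.get(key);
    }

    /** Returns, in key order, all keys that start with the given prefix. */
    List<String> keysWithPrefix(String prefix) {
        // Keys in [prefix, prefix + '\uffff') share the prefix.
        return new ArrayList<>(
            index.subMap(prefix, true, prefix + Character.MAX_VALUE, false).keySet());
    }
}
```

An application would simply create the store and call store.put("user:1", "alice") followed by store.get("user:1"); because keys are kept in order, a prefix scan such as keysWithPrefix("user:") needs no query language either.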
Berkeley DB’s high availability and replication feature allows an application to survive machine failures and improves read scalability. A highly available Berkeley DB application runs on multiple computers configured as a high availability cluster; updates to the database are allowed on only one machine, designated as the master. Applications running on the other nodes (called replicas) can read the data. Berkeley DB propagates changes made on the master node to all the replicas on the other machines in the cluster to keep them current. If the machine running the master fails, Berkeley DB provides an election mechanism that can be used to choose a new master from among the surviving replicas without interruption of normal activity.
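The single-master model described above can be sketched in a few lines of Java. This is a toy, in-memory simulation under stated assumptions, not Berkeley DB's replication implementation: the class ReplicatedGroup and all of its methods are hypothetical, changes are propagated synchronously to every live node, and the election rule (the surviving node with the most recent applied update wins, with ties broken by node order) is an assumption made for illustration, since the text does not specify how a winner is chosen.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical single-master replication group (illustration only). */
final class ReplicatedGroup {
    private static final class Node {
        final String name;
        final Map<String, String> data = new HashMap<>();
        long lastApplied = 0;   // sequence number of the last update applied
        boolean alive = true;
        Node(String name) { this.name = name; }
    }

    private final List<Node> nodes = new ArrayList<>();
    private Node master;
    private long nextSeq = 1;

    /** The first named node starts as master (an arbitrary choice for the sketch). */
    ReplicatedGroup(String... names) {
        for (String n : names) nodes.add(new Node(n));
        master = nodes.get(0);
    }

    String masterName() { return master.name; }

    /** Updates are accepted only while the master is alive, then copied to live replicas. */
    void put(String key, String value) {
        if (!master.alive) throw new IllegalStateException("no live master; call electMaster()");
        long seq = nextSeq++;
        for (Node n : nodes) {
            if (n.alive) { n.data.put(key, value); n.lastApplied = seq; }
        }
    }

    /** Reads may be served by any live node, master or replica. */
    String get(String nodeName, String key) {
        Node n = find(nodeName);
        if (!n.alive) throw new IllegalStateException(nodeName + " is down");
        return n.data.get(key);
    }

    /** Simulates a machine failure. */
    void fail(String nodeName) { find(nodeName).alive = false; }

    /** Assumed rule: the most up-to-date surviving node becomes the new master. */
    String electMaster() {
        Node best = null;
        for (Node n : nodes) {
            if (n.alive && (best == null || n.lastApplied > best.lastApplied)) best = n;
        }
        if (best == null) throw new IllegalStateException("no surviving nodes");
        master = best;
        return best.name;
    }

    private Node find(String name) {
        for (Node n : nodes) if (n.name.equals(name)) return n;
        throw new IllegalArgumentException("unknown node " + name);
    }
}
```

In this sketch, a group created with three nodes keeps serving reads from any surviving node after the master is marked failed, and accepts writes again once electMaster() has promoted a replica. A real system would also have to handle network partitions, replica catch-up, and durability, all of which are omitted here.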
Because of their ease of use and robust database features, Berkeley DB products are extremely popular; there are more than 200 million deployments of Berkeley DB worldwide. A wide variety of production applications, ranging from mobile phone applications to special-purpose appliances such as LDAP and e-mail servers to e-commerce websites, are based on Berkeley DB. It is fair to say that Berkeley DB is one of the most mature, high-performance, and high-function embeddable databases available today.
In recent years, several customers have built their own distributed key-value stores using Berkeley DB as a foundation. For example, Voldemort, the distributed key-value store built at LinkedIn, one of the most popular social websites, uses Berkeley DB Java Edition for managing information for millions of subscribers. It is no surprise, then, that the developers of Oracle NoSQL Database also chose Berkeley DB Java Edition as the foundation for building a distributed key-value store. Besides its high-performance transactional indexed access capabilities, the high availability and replication features of Berkeley DB Java Edition are crucial architectural components of Oracle NoSQL Database.