Over the last few years, Oracle has focused on purpose-built engineered systems in which hardware and software are designed to work together to deliver extreme performance and high availability, while remaining easy to install, configure, and maintain. The Oracle engineered systems that support the various phases of big data processing are the Oracle Big Data Appliance, Oracle Exadata Database Machine, and Oracle Exalytics In-Memory Machine. Figure 1 shows the best-practice architecture for processing big data using Oracle engineered systems. As the figure depicts, each appliance plays a specific role in the overall processing of big data by participating in the acquisition, organization, and analysis phases.
Oracle Big Data Appliance
The Oracle Big Data Appliance is an engineered system built with optimized hardware and a comprehensive set of software, designed to provide a complete, easy-to-deploy solution for acquiring, organizing, and analyzing big data. It delivers an affordable, scalable, and fully optimized big data infrastructure in a box, an alternative to building a custom system from scratch, which can be time-consuming, inefficient, and prone to failures. Together with Oracle Exadata Database Machine and Oracle Exalytics In-Memory Machine, Oracle Big Data Appliance forms a complete set of technologies for leveraging and integrating big data, and helps enterprises quickly and efficiently turn information into insight.
The Oracle Big Data Appliance provides the following benefits:
- Rapid provisioning of large and highly available big data clusters that can linearly scale and process massive amounts of data
- Cost control through deploying a pre-integrated engineered system that is easy to install and manage
- High performance from state-of-the-art hardware engineered together with pre-optimized software to assist with acquiring, organizing, and analyzing big data
The Oracle Big Data Appliance comes in multiple rack configurations: the full rack, two-thirds rack, and one-third rack. The full-rack configuration comprises 18 Sun servers and provides a total raw storage capacity of 648TB. Every server has 2 CPUs with 8 cores each, for a total of 288 cores across the rack, and 64GB of memory that can be expanded to 512GB, giving a total of 1152GB, expandable to over 9TB, across all 18 servers. The two-thirds rack and one-third rack configurations provide roughly two-thirds and one-third of the full-rack hardware, respectively. Racks can be easily cabled together over the high-speed InfiniBand network, allowing the cluster to scale rapidly and grow incrementally to handle extreme data volumes and storage capacity.
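The full-rack totals quoted above follow from simple per-server arithmetic. The short Python sketch below reproduces them from the per-server figures given in the text; the per-server raw storage value assumes, for illustration only, that the 648TB is spread evenly across the 18 servers.

```python
# Full-rack figures quoted in the text.
SERVERS = 18
CPUS_PER_SERVER = 2
CORES_PER_CPU = 8
BASE_MEMORY_GB = 64     # per server, as shipped
MAX_MEMORY_GB = 512     # per server, fully expanded
RAW_STORAGE_TB = 648    # whole rack

def rack_totals(servers=SERVERS):
    """Aggregate per-server specs into whole-rack totals."""
    return {
        "cores": servers * CPUS_PER_SERVER * CORES_PER_CPU,
        "memory_gb": servers * BASE_MEMORY_GB,
        "max_memory_gb": servers * MAX_MEMORY_GB,
        # Assumption: raw storage is distributed evenly across servers.
        "storage_tb_per_server": RAW_STORAGE_TB / servers,
    }

print(rack_totals())
```

The expanded total of 9,216GB is the "over 9TB" of memory mentioned above.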
As shown in Figure 2, the software preinstalled on the Oracle Big Data Appliance includes a combination of open source software and specialized software developed by Oracle to address enterprise big data needs. The Oracle Big Data Appliance integrated software includes:
- Cloudera’s Distribution including Apache Hadoop (CDH)
- Cloudera Manager
- Oracle NoSQL Database Community Edition (CE)
- Oracle Big Data Connectors
- Oracle R Distribution (Oracle’s redistribution of open source R)
FIGURE 2. Oracle Big Data Appliance software overview
Oracle NoSQL Database Community Edition (CE) is preinstalled on the Oracle Big Data Appliance by default and is configured at install time if the customer requests it. You can run Oracle NoSQL Database on all 18 nodes in the cluster, with each node dedicating 3TB or 6TB to it (one or two disks; other custom configurations are also possible). Oracle NoSQL Database is rack aware: when multiple racks are interconnected, its placement algorithms put mirrored copies of the data on different racks, which minimizes data loss and enhances availability. Customers who need enterprise-level features can purchase an Enterprise Edition (EE) license of Oracle NoSQL Database.
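To make rack awareness concrete, here is a minimal Python sketch of rack-aware placement: copies of a piece of data are assigned to nodes by cycling through the racks, so that no two copies share a rack while enough racks are available. The function name `place_copies` and the round-robin strategy are hypothetical illustrations, not Oracle NoSQL Database's actual placement algorithm.

```python
from collections import defaultdict

def place_copies(nodes, copies):
    """Hypothetical rack-aware placement.

    nodes:  list of (node_id, rack_id) pairs
    copies: number of mirrored copies to place
    Returns node ids, spreading copies across racks before reusing a rack.
    """
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested number of copies")

    # Group nodes by the rack they live in.
    by_rack = defaultdict(list)
    for node_id, rack_id in nodes:
        by_rack[rack_id].append(node_id)

    # Visit racks round-robin: first node of every rack, then the second, ...
    queues = [list(by_rack[rack]) for rack in sorted(by_rack)]
    order = []
    while any(queues):
        for queue in queues:
            if queue:
                order.append(queue.pop(0))
    return order[:copies]

cluster = [("n1", "rackA"), ("n2", "rackA"), ("n3", "rackB"),
           ("n4", "rackB"), ("n5", "rackC")]
print(place_copies(cluster, 3))  # ['n1', 'n3', 'n5']: one copy per rack
```

With three copies on three racks, losing any single rack still leaves two copies available.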
Cloudera’s Distribution including Apache Hadoop (CDH) consists of open source Apache Hadoop and a comprehensive set of open source components needed to use Hadoop, with Cloudera’s branding and support. Cloudera Manager is a proprietary, end-to-end management application from Cloudera that provides monitoring and administration capabilities for CDH clusters. It also includes a full range of reporting and diagnostic tools to help optimize cluster performance and utilization.
Oracle Exadata Database Machine
The Oracle Exadata Database Machine is an engineered system built to support all types of database workloads, ranging from data warehouse applications that scan large amounts of data to OLTP applications that support highly concurrent, real-time transactions. It brings together an award-winning combination of smart software that runs in the storage layer, called Exadata Storage Server Software, the intelligent Oracle Database 11g software, and the latest industry hardware components from Oracle to deliver extreme performance in a highly available, reliable, and highly secure environment out of the box.
The Database Machine has large amounts of memory and PCIe-based flash storage, which allow frequently accessed data to be cached and stored on media that are hundreds of times faster than hard disks, boosting the performance of OLTP-style workloads. The smart features of the Exadata Storage Server Software offload processing to run near the disks where the data resides, eliminating much of the unnecessary data movement between the database CPUs and the disks; this can speed up data warehousing workloads ten- to twentyfold.
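The benefit of offloading can be illustrated with a toy model of filtering. In the Python sketch below, each "storage cell" is just a list of rows: without offload, every row is shipped to the database tier before filtering; with offload, the filter runs at the cell and only matching rows are shipped. The functions and data are hypothetical and say nothing about how Exadata Storage Server Software is implemented internally.

```python
def scan_without_offload(cells, predicate):
    """Ship every row to the database tier, then filter there."""
    shipped = [row for cell in cells for row in cell]
    return [row for row in shipped if predicate(row)], len(shipped)

def scan_with_offload(cells, predicate):
    """Filter inside each storage cell and ship only matching rows."""
    shipped = [row for cell in cells for row in cell if predicate(row)]
    return shipped, len(shipped)

# Hypothetical (region, amount) rows spread over two storage cells.
cells = [[("EU", 10), ("US", 20)], [("US", 5), ("APAC", 7), ("EU", 3)]]

def is_eu(row):
    return row[0] == "EU"

rows_a, moved_a = scan_without_offload(cells, is_eu)
rows_b, moved_b = scan_with_offload(cells, is_eu)
assert rows_a == rows_b   # same answer either way
print(moved_a, moved_b)   # 5 2: rows moved without vs. with offload
```

Both scans return the same result; only the amount of data crossing from storage to the database tier differs.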
The Database Machine is also well suited for consolidating multiple databases onto a single grid by using the resource management, clustering, workload management, and pluggable database features of the Oracle Database. In addition, the award-winning Exadata Hybrid Columnar Compression feature can compress data on disk by a factor of 10 to 50, offering cost savings and performance improvements because less data is stored and scanned.
The Oracle Exadata Database Machine can perform the organize and analyze stages of big data processing. Its in-database analytics offers powerful features for knowledge discovery and data mining, which help extract hidden intelligence and organize data in a form suitable for making business decisions. Oracle business intelligence tools, such as Oracle BI EE and Oracle Endeca, rely on data residing in a relational system, and the Exadata Database Machine is the ideal platform for hosting it. Oracle Big Data Appliance, Oracle Exadata, and Oracle Exalytics are connected via InfiniBand, enabling high-speed data transfer for batch or query workloads.
Oracle Exalytics In-Memory Machine
With economies and business dynamics evolving rapidly, it has become more important than ever for organizations to perform real-time visual analysis and to enable new types of analytic applications that support speed-of-thought decision making, so that they can stand out from the rest. Static reports and dashboards have become passé; enterprises now use tools and techniques such as business modeling, planning, forecasting, and predictive analytics, together with rich, interactive visualizations, to derive actionable intelligence and make real-time decisions.
Oracle Exalytics In-Memory Machine is an engineered system built to deliver high-performance business intelligence (BI) and enterprise planning applications. The hardware consists of a single server that is optimally configured for BI workloads, with powerful compute capacity and abundant memory for in-memory analytics. InfiniBand network connectivity provides an extremely fast way to connect Exalytics to other Exalytics machines or to other Oracle engineered systems such as Exadata. For example, this connection allows the BI capabilities of Exalytics to be augmented with the powerful in-database analytics capabilities of Exadata.
The software included in the Oracle Exalytics In-Memory Machine consists of an optimized Oracle BI Foundation Suite (Oracle BI Foundation) and Oracle TimesTen In-Memory Database. Oracle BI Foundation takes advantage of the Exalytics hardware and system configuration to deliver rich and actionable intelligence, with better query responsiveness and higher user scalability than a standalone installation of Oracle BI Foundation. TimesTen In-Memory Database for Exalytics is an optimized in-memory database that offers features available exclusively on Exalytics, such as columnar compression to reduce the footprint of in-memory data.
NoSQL databases provide a simple, lightweight mechanism for storing new and diverse digital data streams that often would not be appropriate to store in a traditional RDBMS. NoSQL databases are optimized to handle fast reads and writes of large datasets by allowing applications to choose relaxed durability and consistency models that favor read and write performance, a key factor for big data applications with real-time needs.
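The trade-off between durability, consistency, and performance can be sketched with a toy replicated key-value store. In this hypothetical Python class, a write returns as soon as the primary copy is updated unless the caller asks to wait for the replicas, and a read goes either to the primary (always current) or to a replica (cheaper in a real distributed system, but possibly stale). The class and its parameters are illustrative assumptions, not the API of Oracle NoSQL Database or any other product.

```python
class ToyReplicatedStore:
    """Hypothetical store: one primary copy plus N replicas.

    Replication is deferred until replicate() runs, mimicking an
    asynchronous system in which replicas may lag behind the primary.
    """

    def __init__(self, num_replicas):
        self.primary = {}
        self.replicas = [{} for _ in range(num_replicas)]
        self.pending = []  # writes not yet applied to the replicas

    def put(self, key, value, wait_for_replicas=False):
        self.primary[key] = value
        self.pending.append((key, value))
        if wait_for_replicas:
            # Stronger durability: slower, but every copy has the write.
            self.replicate()

    def replicate(self):
        """Apply all pending writes to every replica."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()

    def get(self, key, strong=True, replica=0):
        """strong=True reads the primary; otherwise read a (maybe stale) replica."""
        source = self.primary if strong else self.replicas[replica]
        return source.get(key)

store = ToyReplicatedStore(num_replicas=2)
store.put("user:1", "v1")                  # fast write, replicas not updated yet
print(store.get("user:1", strong=False))   # None: the replica is stale
print(store.get("user:1"))                 # v1: the primary is current
```

An application that can tolerate briefly stale reads gets cheaper writes and reads; one that cannot pays for waiting on replicas or reading from the primary.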
Oracle NoSQL Database is a distributed key-value database designed to provide highly reliable, scalable, and available data storage across a configurable set of systems. Oracle NoSQL Database plays a key role in Oracle’s big data portfolio, helping enterprises analyze their big data. The remaining chapters cover Oracle NoSQL Database in much greater depth.
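As a rough illustration of how a distributed key-value store can spread data across a set of systems, the Python sketch below hashes each key to pick one of several shards, so that every `put` and `get` for a given key lands on the same shard. The MD5-based hash, the direct key-to-shard mapping, and the `ShardedKV` class are assumptions made for illustration; they are not a description of how Oracle NoSQL Database distributes keys internally.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a key to a shard index using a stable content hash.

    Python's built-in hash() is salted per process for strings,
    so a digest keeps the mapping identical across runs.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

class ShardedKV:
    """Hypothetical key-value store whose data is split across in-memory shards."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)

store = ShardedKV(num_shards=3)
for i in range(10):
    store.put(f"user/{i}", {"id": i})
print([len(shard) for shard in store.shards])  # how the 10 keys spread over 3 shards
print(store.get("user/7"))
```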