Oracle Endeca is not a single product, but a set of products and product features that are installed and licensed separately. To understand the capabilities of Endeca, it is best to start by comparing it with a business intelligence system. Like a typical business intelligence system, Endeca includes the following major components:
- A data server
- A data integration tool
- An analytics toolset
The Oracle Endeca Information Discovery features that differentiate it from a traditional business intelligence system are:
- No need to predetermine questions to be answered by the information before ingesting the data
- End-user provisioning and exploration of data via self-service interfaces
- Ease of integration and enrichment of semistructured and unstructured data
- Unification of full-featured, advanced keyword search; data-driven guided navigation; and in-memory analytics
Let’s look at the major components, compare them to a business intelligence system, and discuss how they deliver these differentiating features.
Data Server
A typical business intelligence deployment involves a database server designed specifically to store data for rapid retrieval, using indexed and denormalized tables. Likewise, Endeca features a data server designed for rapid retrieval of information acquired through information discovery activities. The design of this data server departs from that of a typical RDBMS, however: data is stored as key-value pairs, and for every attribute, a full inverted search index and a membership index are created so that the association between attributes and records can be determined quickly. The design is optimized for in-memory performance, allowing data retrieval to drive analytics at interactive speeds. The server also has its own query language, Endeca Query Language (EQL), which offers a rich set of SQL-like features for both basic and complex aggregations over search results or the current navigation state.
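To make the indexing idea concrete, here is a minimal Python sketch of records held as attribute-value pairs, with a per-attribute inverted search index and membership index. It is an illustration only, not Endeca's actual implementation: the records, the function names (`build_indexes`, `search`, `refinements`), and the whitespace tokenization are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical records stored as attribute/value pairs (not real Endeca data).
records = {
    1: {"category": "laptop", "brand": "Acme", "description": "lightweight travel laptop"},
    2: {"category": "laptop", "brand": "Globex", "description": "gaming laptop with fast graphics"},
    3: {"category": "tablet", "brand": "Acme", "description": "lightweight tablet for travel"},
}


def build_indexes(records):
    """Build a per-attribute inverted search index and membership index."""
    search_index = defaultdict(lambda: defaultdict(set))  # attribute -> term -> record ids
    membership = defaultdict(lambda: defaultdict(set))    # attribute -> value -> record ids
    for rec_id, attrs in records.items():
        for attr, value in attrs.items():
            membership[attr][value].add(rec_id)
            # Naive tokenization: lowercase and split on whitespace (an assumption).
            for term in value.lower().split():
                search_index[attr][term].add(rec_id)
    return search_index, membership


def search(search_index, attr, term):
    """Keyword search: ids of records whose attribute text contains the term."""
    return set(search_index[attr].get(term.lower(), set()))


def refinements(membership, attr, current_ids):
    """Guided navigation: record counts per value within the current result set."""
    counts = {}
    for value, ids in membership[attr].items():
        n = len(ids & current_ids)
        if n:
            counts[value] = n
    return counts


search_index, membership = build_indexes(records)
hits = search(search_index, "description", "lightweight")
print(sorted(hits))
print(refinements(membership, "category", hits))
```

Because both indexes map directly to sets of record ids, a keyword search and the refinement counts for the resulting navigation state reduce to dictionary lookups and set intersections, which is the kind of work that stays fast when everything is held in memory.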
Data Integration Tool
A typical business intelligence deployment features an extract-transform-load (ETL) tool to acquire information from other databases and data stores, perform transformations on the information, and insert or update records in the reporting database with the transformed data. Likewise, Endeca features an agile ETL tool for IT with all the typical features of a commercial-strength ETL product, along with unique unstructured data discovery features. Let’s examine three significant features that provide capabilities above and beyond nominal ETL functionality: Text Enrichment, Text Enrichment with Sentiment Analysis, and the Integrator Acquisition System.
Text Enrichment is the “silver bullet” that allows Endeca to discover significance in unstructured data. It is an add-on feature that provides the ability to find terms and phrases in text and then rank and organize the findings. Text Enrichment includes text analysis capabilities for extracting topics and themes in the data to determine subject matter, as well as for extracting entities to expose people, places, organizations, quotes, products, and custom entities. Text Enrichment also contains summarization capabilities for automatically creating abstracts and topical summaries.
Text Enrichment with Sentiment Analysis delivers all the Text Enrichment capabilities as well as advanced text analysis for extracting the opinion or feeling related to each extracted concept. Sentiment is extracted as a score indicating how positive or negative a document, a phrase, or an entity is. These scores are used to show varying ranges of positivity and negativity across the data at any point in the search or navigation state.
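The notion of a sentiment score can be illustrated with a deliberately simple Python sketch. This is not how Text Enrichment with Sentiment Analysis computes its scores; the word lists, the `sentiment_score` and `average_sentiment` functions, and the scoring formula are hypothetical stand-ins chosen only to show a score that ranges from negative to positive and can be aggregated over whatever records are currently selected.

```python
# Hypothetical word lists; a real text-analysis engine uses far richer models.
POSITIVE = {"great", "love", "excellent", "fast"}
NEGATIVE = {"poor", "hate", "slow", "broken"}


def sentiment_score(text):
    """Return a score in [-1.0, 1.0]: below zero is negative, above zero is positive."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def average_sentiment(texts):
    """Aggregate sentiment over a set of documents, e.g. the current result set."""
    scores = [sentiment_score(t) for t in texts]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical review texts used only for this illustration.
reviews = [
    "Great screen and fast shipping, love it!",
    "Battery is poor and the hinge arrived broken.",
    "Fast delivery but slow support.",
]

for review in reviews:
    print(sentiment_score(review), review)
print(average_sentiment(reviews))
```

Storing one such score per document, phrase, or entity is what makes it possible to summarize sentiment for any subset of the data reached through search or navigation.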
The Integrator Acquisition System is a set of components that crawl source data stored in a variety of formats, including file systems, JDBC databases, delimited files, web content, and custom data sources.
Analytics Toolset
A typical business intelligence deployment features an analytics toolset. Likewise, Endeca features a web-based analytics suite designed to require little or no training to build or consume new discovery applications. The in-memory performance of the Endeca data server enables analysis to be performed at interactive speeds using an intuitive UI pioneered in online commerce, where ease of use is critical to mass consumer adoption. The Endeca UI combines the best of search, guided navigation, and in-memory analytics to provide the end user with all the tools necessary to discover new insights in structured and unstructured content. The Endeca analytics toolset also allows end users to drag and drop from a library of discovery components to easily create their own applications for personal or small-scale use. It also allows IT to create advanced discovery applications with security and failover that may be published to and consumed by the end-user community. Endeca fully supports both IT-provisioned discovery applications and self-service discovery. In self-service mode, end users may upload their own data sets from personal files on their desktop (such as Excel or JSON) and then “mash up” this data with enterprise data sources, including the data warehouse or the Oracle Common Enterprise Information Model.
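As a rough illustration of what a self-service "mash up" amounts to, the Python sketch below combines a small uploaded file with rows from an enterprise source on a shared key. It does not use any Endeca interface: the CSV content, the `warehouse_rows` list, the `product_id` key, and the `mash_up` function are all hypothetical and exist only for this example.

```python
import csv
import io

# Hypothetical self-service upload, e.g. a spreadsheet exported as CSV.
uploaded_csv = """product_id,campaign
P1,Spring Sale
P2,Clearance
P9,Launch
"""

# Hypothetical rows from an enterprise source such as a data warehouse.
warehouse_rows = [
    {"product_id": "P1", "revenue": 1200.0},
    {"product_id": "P2", "revenue": 450.0},
    {"product_id": "P3", "revenue": 300.0},
]


def mash_up(uploaded_text, enterprise_rows, key):
    """Enrich each uploaded row with matching enterprise attributes.

    Every uploaded row is kept; rows without a match simply carry no
    enterprise attributes (a left join on the shared key).
    """
    enterprise_by_key = {row[key]: row for row in enterprise_rows}
    combined = []
    for row in csv.DictReader(io.StringIO(uploaded_text)):
        match = enterprise_by_key.get(row[key], {})
        combined.append({**match, **row})
    return combined


combined = mash_up(uploaded_csv, warehouse_rows, "product_id")
for record in combined:
    print(record)
```

Keeping unmatched uploaded rows rather than dropping them is a design choice for this sketch: the user's own data remains fully visible, and the missing enterprise attributes show where the two sources do not line up.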
We’ve just compared the major components of Endeca to typical business intelligence systems, so you should now have a general understanding of Endeca. Next let’s look deeper at these components to gain a better understanding of the Endeca architecture.