Endeca Server is at the heart of Oracle Endeca and is the core search-analytical database. In Endeca Server, data is organized using a highly flexible data model known as a faceted data model. With this data model, it is not necessary to define a unified schema before loading and analyzing data; data models are derived from the data that is stored in the database, and every record has its own schema based on its own generated attributes. This is irrespective of the data source or whether the source is structured or unstructured.
Structured data can be directly loaded into a faceted model using standard ETL tools. Each row becomes a record, and each column becomes an attribute. An example of such data is a sales transaction. Each transaction row becomes a record, and every element of the transaction record becomes an attribute.
Semistructured data from enterprise applications, various feeds, and XML sources can also be loaded as attribute and value pairs. This is a common cause of heterogeneous record structure. For the sales transaction record, it is possible to extend it with more information about specific products. The attributes that go with each product could be very different depending on the product. For example, a road bike has different components than a mountain bike. With Endeca, these attributes are stored as attribute and value pairs, and jagged records begin to emerge: records that look dissimilar because they do not share the same data model, yet still have some attributes in common. With relational database technologies, this is difficult to implement, but the key-value pair data structure in Endeca makes it possible to implement and extend.
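The idea of jagged records can be sketched in a few lines of Python. This is only an illustration of the data shape, not Endeca's storage format; every attribute name and value below is hypothetical.

```python
# Hypothetical sample data: names and values are illustrative only.
transaction_row = {"transaction_id": "T100", "product_id": "P1", "amount": 1200.0}

# Product details arrive as attribute/value pairs; each product type
# carries its own set of attributes.
product_details = {
    "P1": [("type", "road bike"), ("handlebar", "drop"), ("tire_width_mm", 25)],
    "P2": [("type", "mountain bike"), ("suspension", "full"), ("tire_width_mm", 57)],
}


def extend_record(record, pairs):
    """Return a copy of the record extended with extra attribute/value pairs."""
    extended = dict(record)
    for name, value in pairs:
        extended[name] = value
    return extended


records = [
    extend_record(transaction_row, product_details["P1"]),
    extend_record(
        {"transaction_id": "T101", "product_id": "P2", "amount": 900.0},
        product_details["P2"],
    ),
]

# The two records are "jagged": they share some attributes but not all.
common = set(records[0]) & set(records[1])
only_first = set(records[0]) - set(records[1])
only_second = set(records[1]) - set(records[0])
```

Here the two records share the transaction attributes plus type and tire_width_mm, while handlebar appears only on the road bike and suspension only on the mountain bike. No shared schema had to be declared up front.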
With Endeca Server, unstructured data can be linked to records by any available key. In addition, unstructured elements can be stored as their own records for side-by-side analysis. Some examples are documents, RSS feeds, Twitter and Facebook data, and data feeds from discussion forums. To continue with the sales transaction data mentioned previously, you could now integrate into the record online customer reviews and the customers’ Facebook or Twitter comments on the product and transaction.
The way to accomplish this through Oracle Endeca Information Discovery is to take the textual fields into the records as new attributes. These attributes can be combined in the same record with the sales transactions, product details, and customer review information. Here is a summary of the data sources in the examples we mentioned:
Endeca allows the mapping of sales transactional records with product details and employee information through a product ID and a sales rep ID. The online reviews can also be mapped to the transaction records through transaction IDs that are captured. In addition, users can create a whitelist text tagger to tag the employee first name or last name mentioned in the review text to an employee by full name. Now unstructured, semistructured, and structured data is aligned and loaded and could be analyzed side by side.
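The mapping and tagging described above can be approximated with a short Python sketch. It is not how Endeca Information Discovery implements its text tagger; the function names, field names, and sample data are assumptions made for illustration.

```python
import re

# Hypothetical source data keyed by the IDs mentioned in the text.
employees = {"E7": {"first_name": "Dana", "last_name": "Lopez"}}
products = {"P1": {"product_name": "Road Bike 500"}}
transactions = [{"transaction_id": "T100", "product_id": "P1", "sales_rep_id": "E7"}]
reviews = [{"transaction_id": "T100",
            "review_text": "Dana was very helpful and the bike is great."}]


def tag_employees(text, employees):
    """Whitelist tagging: return the full name of each employee whose
    first or last name appears in the text as a whole word."""
    tags = []
    for emp in employees.values():
        for name in (emp["first_name"], emp["last_name"]):
            if re.search(r"\b" + re.escape(name) + r"\b", text, flags=re.IGNORECASE):
                tags.append(f'{emp["first_name"]} {emp["last_name"]}')
                break  # one tag per employee is enough
    return tags


def build_records(transactions, products, employees, reviews):
    """Combine structured, semistructured, and unstructured data per transaction."""
    reviews_by_txn = {r["transaction_id"]: r["review_text"] for r in reviews}
    records = []
    for txn in transactions:
        record = dict(txn)
        record.update(products.get(txn["product_id"], {}))
        rep = employees.get(txn["sales_rep_id"])
        if rep:
            record["sales_rep_name"] = f'{rep["first_name"]} {rep["last_name"]}'
        text = reviews_by_txn.get(txn["transaction_id"])
        if text is not None:
            record["review_text"] = text
            record["mentioned_employees"] = tag_employees(text, employees)
        records.append(record)
    return records


records = build_records(transactions, products, employees, reviews)
```

Each resulting record carries the transaction fields, the product details, the sales rep's name, the review text, and the employees tagged in that text, so all three kinds of data can be analyzed together.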
Data domains are the largest unit of data over which Endeca Server allows queries to be expressed. No data domains exist on an Endeca Server instance immediately after installation, so a server administrator must create each data domain. Users can specify meaningful names such as staffing, sales, or marketing to identify data domains. The Data Ingest Web Service (DIWS) facilitates loading data into a data domain.
Records and Attributes
Records are the fundamental unit of data in a data domain on Endeca Server. As data is loaded, or ingested, it is stored in data records. Data records generally correspond to traditional records in a source database, but differ in that they are standardized for consistency and classified with attributes. Attributes are the “facets” of the data; they are the storage for metadata. As an example, consider opinion data ingested from unstructured data regarding consumer opinions on automobiles. The data records would have attributes indicating automobile make, model, year, a sentiment such as “recommend” or “lemon,” and a feature of the automobile such as “handling” or “transmission.”
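To make the role of attributes as facets concrete, the following Python sketch counts attribute values across a handful of opinion records and narrows the record set by one value. It illustrates the concept only and says nothing about how Endeca Server computes refinements internally; the sample makes and models are invented.

```python
from collections import Counter

# Hypothetical opinion records; every value is illustrative.
records = [
    {"make": "Acme", "model": "Roadster", "year": 2012,
     "sentiment": "recommend", "feature": "handling"},
    {"make": "Acme", "model": "Roadster", "year": 2012,
     "sentiment": "lemon", "feature": "transmission"},
    {"make": "Zenith", "model": "Tourer", "year": 2011,
     "sentiment": "recommend", "feature": "transmission"},
]


def facet_counts(records, attribute):
    """Count how many records carry each value of the given attribute."""
    return Counter(r[attribute] for r in records if attribute in r)


def refine(records, attribute, value):
    """Keep only the records whose attribute equals the chosen value."""
    return [r for r in records if r.get(attribute) == value]


sentiment_counts = facet_counts(records, "sentiment")
acme_records = refine(records, "make", "Acme")
acme_features = facet_counts(acme_records, "feature")
```

After refining to a single make, the counts for the remaining attributes are recomputed over the smaller record set, which is the basic pattern behind navigating by facet.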
For each data domain, Endeca Server creates a runtime process called Dgraph that manages data domain operations. The Dgraph process of Endeca Server is the main computational module that provides the features of Endeca Server, such as search, refinement computation, and guided navigation. Dgraph maintains indexes of records that are searchable documents of domain data associated with attributes. The Dgraph is stateless, which facilitates the addition of Dgraph processes for load balancing and redundancy. When more processing capability is required than can be achieved on one Endeca Server instance, an Endeca Server cluster can be deployed.
Endeca Server Clustering
Endeca Server clustering can be used in production deployments to handle heavy workloads and is deployed across multiple servers. Nodes can be added to an Endeca Server cluster as additional processing needs emerge, ensuring the scalability of the Endeca Server cluster. The central feature of the Endeca cluster architecture is a data domain cluster, which is a set of Dgraph processes that handles requests across multiple nodes. One of the Dgraph processes is designated as the leader node, and all other Dgraph processes are referred to as follower nodes. The leader node handles all write and update requests, while the follower nodes allow only read operations. A shared file system is used for disk-based versions of indexes, and only the leader node has write access to the file system. The node hosting the leader node Dgraph process must also host the Cluster Coordinator service, which is responsible for intercluster communications between Dgraph processes and for notifying follower nodes when indexes or data have changed.
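The division of work between leader and follower nodes can be sketched as a simple request router. This is a conceptual model only: the class name, the node names, and the round-robin read policy are assumptions, not a description of Endeca Server's actual routing.

```python
from itertools import cycle


class DataDomainCluster:
    """Toy model of a data domain cluster with one leader and several followers."""

    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = list(followers)
        # Assumption: reads rotate across followers, falling back to the
        # leader when the cluster has no followers.
        self._read_targets = cycle(self.followers or [leader])

    def route(self, request_type):
        """Return the node that should handle the request."""
        if request_type in ("write", "update"):
            return self.leader  # only the leader accepts writes and updates
        if request_type == "read":
            return next(self._read_targets)
        raise ValueError(f"unknown request type: {request_type}")


cluster = DataDomainCluster("node-a", ["node-b", "node-c"])
write_target = cluster.route("write")
read_targets = [cluster.route("read") for _ in range(3)]
```

In this model every write or update lands on the single leader, while reads are spread over the followers, which is why adding follower nodes increases read capacity without changing the write path.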
Endeca Query Language
Endeca Query Language (EQL) is an integrated analytics language built into Oracle Endeca Server. EQL enables power users to define and create new metrics to compose their own discovery applications. Built on the core capabilities of Oracle Endeca Server, EQL adds a rich analytic language that allows users to explore aggregated and pivoted views of large volumes of data. EQL also supports a variety of data types, including numerical, geospatial, and date/time values that enable applications to work with temporal data, performing time-based sorting, filtering, and analysis. IT professionals have full access to the language for the purpose of building special formulas, metrics, and more that can be made available in discovery applications. Some of the most important EQL features include tight integration with search and navigation, rich analytical functionality, and processing efficiency, as well as a familiar development experience.
This is an example of a simple EQL statement:
An EQL statement starts with either DEFINE or RETURN. DEFINE doesn’t return the result set. Rather, it creates the data set as a temporary table. A DEFINE statement is typically followed by a RETURN statement that consumes the result set. Endeca uses the DEFINE statement to create views that can then be used to generate charts and other advanced visualizations.
The statement then needs to have one or many SELECT elements, separated by commas. The SELECT clause is composed of an expression followed by an alias. Expressions are usually one or more attributes, operators, or functions such as summation or average, as you see in the previous example.
The GROUP BY clause specifies the method of aggregation. Other EQL capabilities include joining, sorting, paging, and filtering. Oracle’s documentation for Endeca Server includes product documentation titled “Oracle Endeca Server: EQL Guide.” This is an extensive reference for EQL, and will help developers and users learn more about EQL’s capabilities.
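The semantics described above, a DEFINE step that builds a named intermediate set and a RETURN step that consumes it, with SELECT expressions aggregated by a GROUP BY attribute, can be emulated in Python. This is not EQL and does not reproduce its syntax; the function name, the attribute names, and the sample data are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical sales records.
sales = [
    {"region": "East", "amount": 100.0},
    {"region": "West", "amount": 250.0},
    {"region": "East", "amount": 50.0},
]


def select_group_by(records, group_attr, value_attr):
    """Emulate selecting a sum and an average of value_attr, each with an
    alias ("total", "average"), grouped by group_attr."""
    groups = defaultdict(list)
    for r in records:
        groups[r[group_attr]].append(r[value_attr])
    return [
        {group_attr: key, "total": sum(values), "average": sum(values) / len(values)}
        for key, values in sorted(groups.items())
    ]


# DEFINE-like step: build a named intermediate set; nothing is returned
# to the caller yet.
defined = {"RegionTotals": select_group_by(sales, "region", "amount")}

# RETURN-like step: consume the defined set and filter it.
result = [row for row in defined["RegionTotals"] if row["total"] > 200]
```

The intermediate set holds one row per region with its total and average, and the final result keeps only the regions whose total exceeds the threshold.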
Most of the Endeca Server APIs are exposed as SOAP web services. These services are used by other Endeca components to interact with Endeca Server. The major services are briefly discussed in the sections that follow.
Data Ingest Web Service
Data Ingest Web Service (DIWS) provides an interface to ETL tools to load data into the data domains hosted in Oracle Endeca Server.
Conversation Web Service
This web service provides the primary means of querying data in the data domain hosted in Oracle Endeca Server. This service is used by Endeca Information Discovery Studio to query Oracle Endeca Server.
Endeca Server Version Information
Be aware that Endeca Server version numbers do not coincide with the version numbers of the other core components. The current version of Endeca Server is 7.6, whereas the current versions of Endeca Information Discovery Integrator and Endeca Information Discovery Studio are 3.1.
Entity and Collection Configuration Web Service
This web service, also known as sConfig, allows you to create and update collections and views for collections or for data sets.
Configuration Web Service
This web service is for updating the schema and configuring the records in a data domain.