Oracle Endeca

  • Best Practices Implementing Oracle Endeca Information Discovery

    We started our chapter with a broad discussion on establishing analytical capabilities. We then moved on to general best practices for enabling information discovery. This section lists the practical considerations for implementing Endeca Information Discovery.

  • Endeca IAS and Endeca WAT Comparison

    Newcomers to Oracle Endeca may view Endeca IAS and Endeca WAT as two tools that deliver the same capability and may think that the primary difference between the two is the user interface. After the previous two major sections, however, you know that this is not correct and that the two tools provide different capabilities.

    A good analogy to use in comparing Endeca IAS and Endeca WAT is that of a fishing net versus a fishing pole. A fishing net can easily cover a wide area and catches everything under the net as it is cast. A fishing pole captures fish one at a time, with a particular type of bait.

  • Endeca Information Discovery Integrator

    This blog provides more detail about the major components of Oracle Endeca Information Discovery Integrator. The Endeca Information Discovery Integrator consists of the following five components:

    • Integrator ETL
    • Integration Server
    • Integrator Acquisition System
    • Web Acquisition Toolkit
    • IKM SQL to Endeca Server

    • Endeca Information Discovery Integrator Explained

      In this blog, you will begin your exploration of Endeca Integrator. Let’s start with the product name, or more specifically, Integrator, which is a word whose definition in information technology can vary. Integrator, in the case of Endeca Integrator, refers to the ability to combine or incorporate data into Oracle Endeca Server and is really the output or end result of Endeca Integrator activities and usage. As you may recall, you use the term data ingest when data is added to Endeca Server. The inputs to Endeca Integrator are data sources along with the features used to capture this data. Between the outputs and inputs, Endeca Integrator transforms data. Figure 1 summarizes this simple paradigm.

    • Endeca Integrator Acquisition System

      In this blog we will cover the Endeca Integrator Acquisition System, or Endeca IAS, a tool that has two major functions. Its primary function is to crawl source data stored in a variety of formats, including file systems, delimited files, and web servers. (You can find a complete list of supported file formats in the appendix.) Its other function is to store this data in a data repository known as a record store and make this data available to Integrator ETL through a web service.

      The goal of this blog is to introduce Endeca IAS and its capabilities and provide you with sufficient information to decide what place Endeca IAS has in your organization’s enterprise-based data exploration. Endeca IAS is a command-line tool with no graphical user interface. When selecting this tool, you should be certain that those who will use Endeca IAS have sufficient skills to work at the command line. Database administrators (DBAs) and systems administrators possess these skills, as do developers who typically work in UNIX or Linux. The files used to configure Endeca IAS are XML files and text files located on the server hosting Endeca IAS. These files can be edited on the server with the vi editor, or edited on a workstation and then transferred to the server.

    • Endeca Studio Application Creation for FDIC Failed Banks Data

      Let’s take a look at an example of user-driven data exploration with some data available from data.gov, a U.S. federal web site launched in late May 2009, whose stated mission is to increase public access to high-value, machine-readable data sets. A major feature of data.gov is financial data, and the Federal Deposit Insurance Corporation (FDIC) has an interesting data set available. The FDIC is often appointed as the receiver of failed banks, and there is a list of failed banks going back to October 1, 2000, available as a comma-separated value (CSV) file, banklist.csv. As of version 3.1, Endeca Studio does not support CSV format files. The workaround is simple, though: Open the file in a spreadsheet program such as Microsoft Excel and then use the program’s Save As feature to save the file in the Microsoft Excel .xls format.

    • Integrator ETL installation and usage

      Integrator ETL is the central hub of Oracle Endeca Integrator. The other components listed previously provide data to Integrator ETL. Only Integrator ETL is capable of executing all the steps in Figure 1. Integrator ETL is an ETL tool, and the “load” part of Integrator ETL facilitates data ingest into Endeca Server. Integrator ETL is not limited to loading data from Endeca IAS and Endeca WAT. Integrator ETL can read data from many data sources, including flat files, Excel spreadsheets, XML files, and nearly every type of database.

    • Integrator ETL Server

      Integrator ETL Server is a scheduling tool for Endeca ETL graphs. It also has the capability of running shell scripts and batch files and, as a result, can be used to run Endeca IAS crawls remotely before the graphs that consume the data are run.

    • Managing Oracle Endeca Server

      Now that we have wrapped up our discussion on installing Endeca Server, let’s move on to subjects related to Endeca Server management. This article describes common activities for those involved in maintaining Oracle Endeca Server. Endeca Server, like all enterprise systems, requires a level of maintenance, monitoring, and management in order to keep it running smoothly. Planning the day-to-day management of Endeca Server will ensure that your Endeca deployment is productive for your end users. In this section we will cover the following:

    • Oracle Endeca Information Discovery Core Components

      Oracle Endeca Information Discovery comprises three core components. The sections that follow provide an overview of these components and how they relate to the major Endeca features that have been discussed. Later posts in this blog cover these components in greater detail. Figure 1 shows an overview of the Endeca core components. These core components are:

    • Oracle Endeca Information Discovery Overview

      Oracle Endeca is not a single product, but a number of products and product features that are separately installed and licensed. To understand the capabilities of Endeca, it is best to start by contextualizing it against a business intelligence system. Like a typical business intelligence system, Endeca includes the following major components:

      • A data server
      • A data integration tool
      • An analytics toolset

      The Oracle Endeca Information Discovery features that differentiate it from a traditional business intelligence system are:

    • Oracle Endeca Information Discovery Studio

      Oracle Endeca Information Discovery Studio is a web-based application that allows business analysts to rapidly assemble dashboard applications. These applications enable analysts and other end users to explore a full range of structured and unstructured enterprise data from Endeca Server. Each application consists of one or more pages, with each page containing a set of graphical components. Oracle Endeca Information Discovery Studio components include functions to:

      • Navigate to or search for specific data
      • Display detailed information about data
      • Display graphical representations of the data
      • Manipulate and analyze the data
      • Highlight specific data values

      • Oracle Endeca Licensing

        Oracle licenses each of the Endeca core components separately. There are three plug-ins that are also licensed separately, listed here:

        • Oracle Endeca Web Acquisition Toolkit: This is an add-on module for the Endeca Information Discovery Integrator.
        • Oracle Endeca Text Enrichment: This includes text analysis capabilities for extracting people, places, organizations, quotes, and themes as well as summarization capabilities for automatically creating abstracts and topical summaries.
        • Oracle Endeca Text Enrichment with Sentiment Analysis: This delivers all the Text Enrichment capabilities as well as advanced text analysis for extracting aggregate sentiment related to each extracted concept. Sentiment is extracted with a score indicating the positive and negative nature of a document, a phrase, or an entity. These scores are used to show varying ranges of positivity and negativity in search, guided navigation, and analytics.

        Oracle sales consultants and Oracle license resellers can provide assistance and answer questions regarding Endeca licensing.

      • Oracle Endeca Server overview

        Endeca Server is at the heart of Oracle Endeca and is the core search-analytical database. In Endeca Server, data is organized using a highly flexible data model known as a faceted data model. With this data model, it is not necessary to define a unified schema before loading and analyzing data; data models are derived from the data that is stored in the database, and every record has its own schema based on its own generated attributes. This is irrespective of the data source or whether the source is structured or unstructured.
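
        The idea that each record carries its own schema can be illustrated with a minimal Python sketch. This is not Endeca Server code or its API; the records, attribute names, and helper functions below are hypothetical and only mimic the behavior described above: attributes are discovered from the data rather than declared up front.

```python
from collections import defaultdict

# Hypothetical records: each one carries only the attributes it actually has.
records = [
    {"id": 1, "bank_name": "Example Bank", "city": "Springfield", "state": "IL"},
    {"id": 2, "bank_name": "Sample Trust", "closing_date": "2010-05-14"},
    {"id": 3, "review_text": "Branch closed without notice", "sentiment": "negative"},
]


def derive_attributes(records):
    """Map each attribute name to the set of value type names observed for it."""
    attributes = defaultdict(set)
    for record in records:
        for name, value in record.items():
            attributes[name].add(type(value).__name__)
    return dict(attributes)


def facet_counts(records, attribute):
    """Count the values of one attribute across the records that have it."""
    counts = defaultdict(int)
    for record in records:
        if attribute in record:
            counts[record[attribute]] += 1
    return dict(counts)


def refine(records, **selected):
    """Keep only the records that match every selected attribute value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


schema = derive_attributes(records)
illinois_only = refine(records, state="IL")
```

        No schema is declared anywhere in the sketch: `derive_attributes` builds one from whatever the records contain, and records that lack an attribute are simply left out of its counts and refinements.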

      • Oracle Endeca Studio Application Creation: advanced analysis

        For the first example of FDIC failed banks data, you obtained data from the data.gov repository. The FDIC web site also has data available for download with additional information, so you will use it in the second example. We will not review the process of creating the Endeca Studio application for this example, except to revisit the advanced features on the attribute review page on application creation. The data set available from the FDIC is more comprehensive than the one available from data.gov. It includes dollar amounts associated with the failures and the insurance fund used to pay depositors. The data is for the entire history of the FDIC, going back to 1934. It also indicates whether the intervention was assistance only or a full failure of the bank.

      • Oracle Endeca Studio Systems Architecture

        In this blog you will briefly explore the systems architecture of Oracle Endeca Studio and then learn about installing it. This information is intended to supplement and add clarification to the information contained in Oracle’s documentation. If you are interested in learning only about Endeca Studio and user-driven data exploration, you can skip to the next section.

      • Oracle Endeca Use Case Implementation: Claims, Patients, Operations analysis

        The purpose of describing this maturity model is to provide a framework to guide organizations in establishing analytical capabilities in an incremental manner. It is also a good way to organize the use case within such a context.

        The sample application in this article is intended to show how you can use Oracle Endeca to advance analytical capabilities in different stages of the maturity journey. The application is composed of the following function areas:

      • Oracle Endeca Web Acquisition Toolkit

        The Oracle Endeca Web Acquisition Toolkit, or Endeca WAT, is a new offering with Endeca 3.1 and, like Endeca IAS, is a tool intended to capture unstructured data from web sources and make it available to Integrator ETL.

         

        Endeca WAT Background

        Oracle has partnered with Kapow Software to offer Kapow Katalyst as part of Endeca, branded as the Oracle Endeca Information Discovery Web Acquisition Toolkit. Kapow Katalyst is a widely used software product for acquiring web data. By partnering with Kapow for this offering, Oracle has chosen a venerable and reliable solution.

      • Planning the Endeca Server Installation

        Endeca is a user-centric product, but its infrastructure is sophisticated and of enterprise scale. Endeca Server sits at the center of this complex ecosystem. Refer to Figure 1 for an overview of the Endeca ecosystem. Endeca Integrator ETL pushes data sets into Endeca Server, and Endeca Server makes them available to Endeca Studio.

      • Understanding Oracle Endeca Server

        We have covered installing and managing Endeca Server, and now we will wrap up this blog with an overview of the internals of Endeca Server and how Endeca Server fits into enterprise architecture with other enterprise products.

         

        EndecaServer.properties

        The EndecaServer.properties file contains settings for Endeca Server and plays a role similar to an init or .ini file. Endeca Server administrators use the EndecaServer.properties file to change directory locations for the following:

        • Index files for data domains. This location is common to all data domains on a deployment of Endeca Server.
        • Log file directory, where the Endeca Server logs are written.
        • The “offline directory,” where files from endeca-cmd export-dd are stored.
        • Files associated with the Cluster Coordinator, for clustered operations.
        • Files associated with the data enrichment plug-in.
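
        To check which directory locations a properties file currently sets, a small script can read it as plain key=value text. The Python sketch below is only an illustration: the key names in the sample are hypothetical placeholders, not the real EndecaServer.properties keys, and the parser handles only simple key=value lines, not the full Java properties syntax (no `:` separators or line continuations).

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, value = line.partition("=")
        if separator:  # ignore lines that have no '=' at all
            settings[key.strip()] = value.strip()
    return settings


# Hypothetical keys for illustration only; consult the actual file for real names.
SAMPLE = """\
# directory locations (placeholder keys)
example.index.dir=/data/endeca/index
example.logs.dir=/data/endeca/logs
example.offline.dir=/data/endeca/offline
example.other.setting=true
"""

settings = parse_properties(SAMPLE)

# Keep only the entries that look like directory locations.
directory_settings = {k: v for k, v in settings.items() if k.endswith(".dir")}

for key, path in sorted(directory_settings.items()):
    print(f"{key} -> {path}")
```

        To use it against a real file, read the file's text and pass it to `parse_properties`, then adjust the filter to match the key names that file actually uses.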
