In this blog, you will begin your exploration of Endeca Integrator. Let’s start with the product name, or more specifically with the word Integrator, whose definition varies across information technology. In Endeca Integrator, it refers to the ability to combine or incorporate data into Oracle Endeca Server, which is the output, or end result, of using Endeca Integrator. As you may recall, adding data to Endeca Server is called data ingest. The inputs to Endeca Integrator are data sources, along with the features used to capture their data. Between the inputs and the outputs, Endeca Integrator transforms the data. Figure 1 summarizes this simple paradigm.
Figure 1. Basic paradigm for Endeca Integrator ETL
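To make the paradigm in Figure 1 concrete, here is a minimal Python sketch of the three steps: capture, transform, and deliver. It is only an illustration of the general ETL idea and does not use any Endeca API. The function names, the sample CSV text, and the plain list standing in for a data set are all assumptions made up for this example.

```python
import csv
import io

# Hypothetical stand-in for a data source; not an Endeca API.
SOURCE_CSV = """id,name,price
1,Widget,9.50
2,Gadget,12.00
"""


def capture(text):
    """Capture: read raw rows from a source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names and convert strings to typed values."""
    return [
        {
            "id": int(row["id"]),
            "name": row["name"].strip().upper(),
            "price": float(row["price"]),
        }
        for row in rows
    ]


def deliver(records, data_set):
    """Deliver: append records to an in-memory 'data set' (a plain list)."""
    data_set.extend(records)
    return data_set


# Inputs -> transformation -> output, in the order shown in Figure 1.
data_set = deliver(transform(capture(SOURCE_CSV)), [])
```

In a real Endeca Integrator project the delivery step would load a data set in Endeca Server rather than a Python list, but the order of the steps is the same.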
When data is ingested into Endeca Server, it is loaded into a data set. Endeca Studio users can use these data sets to create applications. There are two ways to create data sets for Endeca applications. With Endeca Studio, when you create an application from an Excel spreadsheet, JSON source, or JDBC connection, a data set is created behind the scenes. With Endeca Integrator, you create data sets at the output step of the paradigm shown in Figure 1. The process of creating applications does not change when data sets are created with Endeca Integrator. The only difference is that you select the data set from a list of existing data sets when you create the application, instead of loading data from an Excel spreadsheet, JSON source, or JDBC connection.
The name Endeca Integrator might suggest a single enterprise software product, but it is actually composed of three major software products, each with its own set of capabilities and its own place in the paradigm shown in Figure 1. This blog covers these individual components in some depth. What follows is a concise list of these software products in the order they will be covered later in my blog. We will cover the two acquisition tools first and then Integrator ETL, to provide insight into how these tools work with Endeca Integrator ETL.
- Endeca Integrator Acquisition System: This is a command-line tool used to acquire data from a wide variety of unstructured sources and make it available to Integrator ETL. These sources are listed in the appendix.
- Endeca Web Acquisition Toolkit: This is a new addition to Endeca Integrator as of Endeca 3.1. It provides different capabilities from the Integrator Acquisition System and has a graphical user interface. The Web Acquisition Toolkit can access individual elements of a web page, and each of these elements can become an attribute for a schema.
- Endeca Integrator ETL: This is the tool that executes the capture, processing, and delivery steps of the paradigm shown in Figure 1. ETL is an abbreviation for “extract, transform, and load.”
We did not include Integrator ETL Server in this list because, although it allows the processes shown in Figure 1 to be run, scheduled, and managed, it does not itself provide any of the capabilities shown in the figure.
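The idea mentioned for the Web Acquisition Toolkit, that individual elements of a web page can each become an attribute, can be illustrated with a small Python sketch. This is not how the toolkit itself works (it is a graphical tool). The sketch only uses Python's standard `html.parser` module, and the `ElementExtractor` class, the sample page, and the choice of `id` values as attribute names are assumptions invented for the example.

```python
from html.parser import HTMLParser


class ElementExtractor(HTMLParser):
    """Collect the text of every element that carries an id attribute."""

    def __init__(self):
        super().__init__()
        self.record = {}         # attribute name -> value
        self._current_id = None  # id of the element being read, if any

    def handle_starttag(self, tag, attrs):
        # Remember the id of the element we just entered (None if absent).
        self._current_id = dict(attrs).get("id")

    def handle_data(self, data):
        # Store non-empty text under the id of the enclosing element.
        if self._current_id and data.strip():
            self.record[self._current_id] = data.strip()

    def handle_endtag(self, tag):
        self._current_id = None


# Hypothetical page content, made up for this example.
PAGE = (
    '<div><h1 id="title">Blue Widget</h1>'
    '<span id="price">9.50</span><p>ignored</p></div>'
)

parser = ElementExtractor()
parser.feed(PAGE)
record = parser.record
```

Here the `title` and `price` elements each become one attribute of the resulting record, while the paragraph without an `id` is skipped.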