We started this chapter with a broad discussion of establishing analytical capabilities. We then moved on to general best practices for enabling information discovery. This section lists the practical considerations for implementing Endeca Information Discovery.
Endeca Server is the foundation of the discovery application. It’s important to follow these best practices:
- Conduct proper sizing: Size the deployment based on data volume (number of records, number of attributes, and size of attributes per record), number of users, and estimated analytical query complexity.
- Design for high availability: Enterprise applications must meet a defined SLA, and discovery and analytical applications commonly become part of critical business processes. Consequently, it’s important to design and deploy these applications for high availability. Plan the production deployment with an Endeca Server cluster, which provides automatic failover to protect the environment and maximize uptime if physical hardware fails.
- Schedule backups of data domains and views: Backup and recoverability also need to be considered during implementation. Use the scripts provided in the blog to export a data domain and all related directories. Integrator ETL can also be used to schedule and run backup jobs.
- Establish security: Data breaches can be costly for an organization, as some companies have learned the hard way. Establish defense in depth with proper server and OS hardening. Enable SSL for the production environment; note that SSL needs to be enabled during installation. Data access control can be achieved through an EQL filtering mechanism as well as data domain permissions.
- Monitor continuously: Define operational procedures to monitor the health of Endeca services. You can design scripts that either ping the Dgraph processes or run a “low-cost” query against the Dgraphs at a defined interval. You can further define query latency thresholds to detect an outage or other health issues. These scripts can be scheduled in Oracle Enterprise Manager for an integrated view of server, storage, and software.
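The monitoring approach described above (probe at an interval, then compare latency against thresholds) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an Endeca-supplied script: the names `check_once` and `http_probe`, the threshold values, and any URL passed to the probe are hypothetical and would need to be replaced with the ping or low-cost query that fits your deployment.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HealthResult:
    status: str                  # "ok", "slow", or "down"
    latency_s: Optional[float]   # None when the probe itself failed
    detail: str = ""


def check_once(
    probe: Callable[[], None],
    warn_after_s: float = 1.0,   # assumed warning threshold, tune per site
    fail_after_s: float = 5.0,   # assumed outage threshold, tune per site
    clock: Callable[[], float] = time.monotonic,
) -> HealthResult:
    """Run one probe and classify the result by latency."""
    start = clock()
    try:
        probe()
    except Exception as exc:     # any failure of the probe counts as an outage
        return HealthResult("down", None, str(exc))
    latency = clock() - start
    if latency >= fail_after_s:
        return HealthResult("down", latency, "latency above outage threshold")
    if latency >= warn_after_s:
        return HealthResult("slow", latency, "latency above warning threshold")
    return HealthResult("ok", latency)


def http_probe(url: str, timeout_s: float = 10.0) -> Callable[[], None]:
    """Build a probe that issues an HTTP GET against a placeholder URL.

    urlopen raises on connection errors and on HTTP error statuses,
    which check_once reports as "down".
    """
    def probe() -> None:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            response.read()
    return probe
```

A scheduler would call something like `check_once(http_probe(url))` at the chosen interval, with `url` pointing at whatever ping or inexpensive query endpoint your environment exposes, and raise an alert whenever the status is not `"ok"`. Keeping the probe separate from the threshold logic makes it easy to swap a simple ping for a low-cost query without touching the classification.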
For more detailed guidance on implementing Endeca Server, please refer to the article.
Endeca Studio is a self-service user interface for business analysts. Here are a few considerations that could make the user experience more pleasant and positive:
- Educate users on performance considerations: Fast response time is one of the key differentiators for Endeca. However, any system can experience performance problems if proper governance is not in place. Educating users about potential performance pitfalls sets the right expectations and improves the user experience. First, make sure they understand that loading a large data set through a provisioning service is a lengthy process that could affect the usability of other Endeca Studio applications, so it might be advisable to load large data sets after hours. Other ways to improve the performance of Endeca Studio applications include the following:
  - Reduce the number of components per page.
  - Avoid inefficient EQL queries.
  - Display the minimum number of columns needed in components such as the result table and available refinements.
- Provide predefined data sources: IT should give business analysts predefined data sources in Endeca Studio that connect to the enterprise data warehouse, appropriate departmental data marts, and the existing OBIEE environment. This eases data access and reduces data copying and proliferation.
- Choose the right visualization: Even though the design of Endeca Studio is highly intuitive, training on the different types of visualization is important. Keep in mind that users might be at different technical levels, and some might benefit from a basic understanding of the proper uses of different visualization components.
For more detailed guidance on implementing Endeca Information Discovery Studio, please refer to this article.
Integrator ETL, WAT, and IAS
The Endeca Information Discovery platform provides a number of best-of-breed integration capabilities, including Integrator ETL, WAT, and IAS, as discussed in this blog. Here are some guidelines for using these tools effectively:
- Automate data refresh: Use Integrator ETL Server to automate scheduled workflows for data you want to acquire on a regular basis.
- Integrator ETL development and design:
  - Use provisioning services to explore the data set.
  - Use global variables for ease of change management.
  - Use proper metadata naming to avoid confusion.
  - Follow a consistent naming convention for ease of design maintenance.
  - Iteratively define and test each component and step in ETL for ease of troubleshooting.
  - Use trash and debug components for validation before running loads against the server, to minimize cleanup and performance impact.
- IAS source management: IAS is a command-line tool, and XML is used extensively for configuration purposes. Establish a development process that uses source control tools to ease maintenance and change management.
- IAS versus WAT: Both tools possess web acquisition capabilities. Please refer to the appendix for a comparison of the two products and a quick lookup guide for the supported data source types for each product.
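As an illustration of the naming guidelines above, a small check can flag metadata names that break a team convention before a graph is promoted. The sketch below is only an example: the pattern (a capitalized source prefix, an underscore, then a CamelCase name such as `Sales_OrderDate`) and the function names are hypothetical, not an Integrator ETL requirement, and the list of names is assumed to be collected by whatever means your team already uses.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical team convention: "<Source>_<CamelCaseName>", e.g. "Sales_OrderDate".
NAME_PATTERN = re.compile(r"[A-Z][A-Za-z0-9]*_[A-Z][A-Za-z0-9]*")


def find_naming_violations(
    names: Iterable[str], pattern: "re.Pattern[str]" = NAME_PATTERN
) -> List[str]:
    """Return the names that do not match the convention, in input order."""
    return [name for name in names if not pattern.fullmatch(name)]


def find_case_collisions(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group names that differ only by letter case, a common source of confusion."""
    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(name.lower(), []).append(name)
    return {key: group for key, group in groups.items() if len(set(group)) > 1}
```

For example, `find_naming_violations(["Sales_OrderDate", "order date"])` returns only the second name. Because the configuration lives in files, a check like this could run alongside source control as part of the development process.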
For more detailed guidance on implementing Endeca Information Discovery Integrator ETL, WAT, and IAS, please refer to this article.