Best Practices for Implementing Data Discovery in General

Traditional business intelligence falls short of meeting the full analytical needs of business users. According to various industry reports from TDWI and IDC, more and more enterprises are looking into establishing data discovery capabilities to reduce operational cost, proactively discover potential threats and issues, and gain competitive advantage. Data discovery removes the obstacle between business users and the data they need to analyze through self-service data provisioning and easy-to-use advanced visualization. But for all its promise of fulfilling long-awaited analytics needs, data discovery requires a carefully considered approach. Here are some best practices.

Architecture and Planning

The first set of best practices involves architecture and planning: how this new class of data discovery capability fits into the overall information architecture, how to support effective data provisioning, and how to design a proper security model.


Integration with the Existing BI and Data Management Ecosystem

A number of easily downloadable, desktop-based tools have emerged in the market, giving end users the interfaces and flexibility they need to analyze data from their own perspective. Some of these offerings can also integrate with mature back-end infrastructures. However, deploying them without an enterprise integration plan creates inefficiencies. The risk of this ungoverned approach is that users keep local copies of data, recreating the familiar “spreadmart” problem.

Integration with the full spectrum of BI and analytics solutions is also a critical success factor. It’s important to understand the strengths and gaps of each tool, educate users about them, and use each tool for what it is intended to do. For example, make OBIEE the recommended tool for standard dashboards and ad hoc reporting for operational staff, use Oracle Endeca Information Discovery for data exploration and discovery by business analysts, and equip data scientists with advanced analytical products such as Oracle Data Mining, Oracle R Enterprise, Oracle Text Analytics, and Oracle Spatial.


Data Provisioning

Data discovery applications allow business users to analyze a new data set without IT involvement. However, the majority of these users also need access to data residing in the enterprise data warehouse and various operational stores as well as departmental data marts. Providing unified information access is critical to enabling users to work with growing volumes of data and content with less IT involvement.

The most balanced approach is for IT to create predefined data sources that leverage the existing data warehouse and data marts, as well as BI metadata, and to let users personalize discovery and visualization by incorporating additional data sets as needed. Personalized and enterprise-level data sets should be evaluated against each other continuously throughout the analytical life cycle.
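As a rough illustration of this balance, the following Python sketch models a catalog that holds IT-defined sources alongside user-added data sets, and flags personal data sets that have attracted enough users to be worth reviewing for enterprise-level treatment. The class names, the `min_users` threshold, and the review rule itself are assumptions made for illustration; they do not describe any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    name: str
    owner: str                 # "IT" for predefined sources, a user id otherwise
    governed: bool             # True when IT created and maintains the source
    users: set = field(default_factory=set)   # everyone who has used the source


class DiscoveryCatalog:
    """Hypothetical catalog mixing predefined and personal data sets."""

    def __init__(self):
        self.sources = {}

    def register_governed(self, name):
        # IT publishes a predefined source (warehouse, data mart, BI metadata).
        self.sources[name] = DataSource(name, owner="IT", governed=True)

    def add_personal(self, name, user):
        # A business user brings in an additional data set without IT involvement.
        self.sources[name] = DataSource(name, owner=user, governed=False, users={user})

    def record_use(self, name, user):
        self.sources[name].users.add(user)

    def promotion_candidates(self, min_users=3):
        # Personal data sets used by many people are candidates for IT review.
        return sorted(
            s.name
            for s in self.sources.values()
            if not s.governed and len(s.users) >= min_users
        )
```

Running `promotion_candidates()` periodically is one simple way to make the "continuous evaluation" concrete: a personal spreadsheet that three or more analysts depend on is probably no longer a purely personal asset.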



Security

Earlier, we discussed four components of security: perimeter security, data protection, access control, and visibility. Traditional security measures tend to focus on locking down the perimeter. While network security and physical isolation still play a critical role, more enterprises are implementing a defense-in-depth approach with a multilayered security structure.

The hybrid data platform and self-service data provisioning introduce a new set of security challenges. Here are some recommendations around security for analytics and data discovery systems:

  • Establish data classification and governance based on data sensitivity in all data store environments and platforms.
  • Store highly sensitive data on an RDBMS environment with mature and sophisticated data protection capabilities, including transparent data encryption, database vault, secure backup, database firewall, data masking, and data redaction.
  • Establish predefined data sources to enable integrated access for qualified users.
  • Implement label security (row-level security) and fine-grained access control for different types of users.
  • Implement separation of duty to protect data access from superusers and administrators.
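To make the classification, row-level filtering, and masking recommendations above more concrete, here is a minimal Python sketch. It is not how any database product implements these features; the sensitivity levels, the `User` fields, the region-based row rule, and the `mask` format are all hypothetical choices made for illustration. In practice these controls would normally be enforced inside the data store rather than in application code.

```python
from dataclasses import dataclass

# Assumed sensitivity scale: higher numbers mean more sensitive data.
SENSITIVITY = {"public": 0, "internal": 1, "restricted": 2}


@dataclass(frozen=True)
class User:
    name: str
    clearance: str   # one of the SENSITIVITY keys
    region: str      # used here as the row-level label


def mask(value):
    """Hide a value, keeping the last four characters of longer values."""
    text = str(value)
    keep = 4 if len(text) > 4 else 0
    return "*" * (len(text) - keep) + text[len(text) - keep:]


def visible_rows(user, rows, column_labels):
    """Apply a row-level filter, then mask columns above the user's clearance."""
    result = []
    for row in rows:
        if row["region"] != user.region:      # row-level security
            continue
        shown = {}
        for column, value in row.items():
            # Unclassified columns default to the most restrictive label.
            label = column_labels.get(column, "restricted")
            if SENSITIVITY[user.clearance] >= SENSITIVITY[label]:
                shown[column] = value
            else:
                shown[column] = mask(value)   # data masking
        result.append(shown)
    return result
```

Defaulting unclassified columns to the most restrictive label is a deliberate design choice in this sketch: a column nobody has classified yet is hidden until someone decides otherwise.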

Many new types of big data sources, such as sensor data and social media feeds, do not contain sensitive data. It might not be practical or beneficial to go through a detailed rationalization process for all data assets.

One emerging approach is to implement a stratified security model composed of a top tier of critically sensitive data and a second tier of everything else. For the most sensitive data assets, it’s recommended that you implement every available security measure, as listed above. For less sensitive data assets, open access is typical for the majority of users. This is not a move away from security and access control; rather, the approach allows open access while implementing sophisticated forensic analysis to detect, alert on, and prevent unusual behavior.
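The two-tier idea can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the tier names, the grant table, and the simple "mean plus k standard deviations" rule for spotting unusual access volume are all hypothetical, and real forensic analysis would be far more sophisticated. The sketch covers detection only, not alerting or prevention.

```python
from statistics import mean, pstdev

CRITICAL = "critical"   # top tier: explicit grants required
GENERAL = "general"     # everything else: open access, but monitored


def may_access(user, asset, tiers, grants):
    """Critical assets need an explicit grant; other assets are open."""
    # Assets with no recorded tier are treated as critical to stay safe.
    if tiers.get(asset, CRITICAL) == CRITICAL:
        return asset in grants.get(user, set())
    return True


def is_unusual(history, today, k=3.0, min_history=5):
    """Flag today's access count if it is far above the user's own baseline.

    history: past daily access counts for one user
    today:   the count observed today
    """
    if len(history) < min_history:
        return False                     # not enough data to judge
    baseline = mean(history)
    spread = max(pstdev(history), 1.0)   # floor avoids a zero-width band
    return today > baseline + k * spread
```

For example, a user who normally reads a dozen records a day from an open data set and suddenly reads several hundred would be flagged for review, even though every individual access was permitted.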


Process and People

Organizational change largely determines whether a data discovery application achieves its potential. A company must be ready to become an analytics-driven organization. It needs to foster intellectual curiosity, the desire to access information, and the habit of data exploration for decision making. Preparing for this organizational change is a process, and it takes time and effort.


Communicate Data Discovery Capabilities

You can’t expect people to know what they don’t know. It’s important to provide education on data discovery and how it differs from and complements traditional business intelligence systems. Demos and training sessions are essential. Initial handholding is also critical to success. You should also conduct use case workshops with different lines of business; prioritize them based on value, impact, cost, risk, and practicality; and identify a set of best-fit use cases for initial prototyping. There is significant benefit to generating enthusiasm in the business community with new insights through these prototypes and quickly turning them into operational solutions.
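One way to make the prioritization step repeatable is a simple weighted score over the five criteria named above. The Python sketch below is an assumption-laden illustration, not a prescribed method: the 1-to-5 rating scale, the weights, and the decision to invert cost and risk (so that cheaper, safer use cases score higher) are all choices a workshop team would need to agree on for itself.

```python
# Hypothetical weights for each criterion; they sum to 1.0.
WEIGHTS = {
    "value": 0.30,
    "impact": 0.25,
    "cost": 0.15,
    "risk": 0.15,
    "practicality": 0.15,
}

# For these criteria a LOW rating is good, so the rating is inverted.
INVERTED = {"cost", "risk"}


def score(ratings):
    """Combine 1-5 ratings into a single weighted score (higher is better)."""
    total = 0.0
    for criterion, weight in WEIGHTS.items():
        rating = ratings[criterion]
        if criterion in INVERTED:
            rating = 6 - rating          # 1 becomes 5, 5 becomes 1
        total += weight * rating
    return total


def prioritize(use_cases):
    """Return use case names ordered from best fit to worst fit."""
    return sorted(use_cases, key=lambda name: score(use_cases[name]), reverse=True)
```

The top few names returned by `prioritize` would then be the candidates for initial prototyping; the scores are a conversation aid for the workshop, not a substitute for judgment.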


Look for Quick Wins

It is advisable to strive for success early and demonstrate value. This is critical when adopting a new generation of business intelligence into your organization. A successful initial deployment leads to additional deployments. Find and document repeatable analysis patterns so that once an application is viable, it is made available to other users in the organization.



Invest in Training

Experienced consultants from the vendor typically conduct most initial prototypes. This is not a substitute for training. A data discovery platform usually involves a sophisticated architecture for data storage, integration, visualization, and data provisioning. Apart from user training, it is also important to train IT staff so that they can manage and support the platform, plan for growth, implement integrations and solutions, and incorporate the platform into existing operational procedures.



Enable Collaboration

What do you see in Figure 1? Some people see a donkey immediately. But others first notice a hummingbird made with ceramic tiles.

FIGURE 1 The art of interpretation

Different people notice different things. The same visualization or data set can be interpreted in different ways. Collaborative exploration is the convergence of data discovery and collaboration tools. It allows users to share outputs and discovery results as well as hypotheses, which can also be published to an enterprise social media portal for broader input, group thinking, and the ability to connect the dots. Collaboration also fosters sharing and reuse of data sets, which brings greater benefit from these assets.


