In this blog, we’ll cover Business Analytics use cases pertinent to IT departments across industries, including log processing, security management and intrusion detection, enterprise data warehouse expansion, and ETL offloading.
IT systems generate enormous volumes of log files: network logs, operating system–level logs, storage server logs, application server logs, and database log files; almost every application and software package generates logs continuously. These log files are typically discarded within a short period because of their size, the cost of storing them, and their low value density.
Only recently have IT professionals realized that there is tremendous information and insight behind these seemingly valueless piles of data. New analytical applications mine system logs to discover patterns of alerts and signals that predict potential system failures before they occur. More and more applications now proactively design architectures that enable long-term storage and analysis of logs, providing insights into user behavior and improving site and application design. Many health insurance exchange implementations have incorporated forensic capabilities to store and mine log files in order to detect and prevent fraud, waste, and abuse, based on the mandate from the Centers for Medicare & Medicaid Services (CMS).
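As a concrete illustration, log mining for failure precursors can start as simply as counting error events over time windows. The sketch below is a minimal, hypothetical Python example; the log format, field layout, and threshold are assumptions for illustration, not a reference to any specific product.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical log format: "2024-03-01 02:14:07 ERROR disk: write timeout"
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$")

def error_spikes(lines, threshold=3):
    """Count ERROR entries per hour and flag hours at or above a
    threshold -- a crude precursor signal for potential system failures."""
    per_hour = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that don't match the assumed format
        ts, severity, _msg = m.groups()
        if severity == "ERROR":
            hour = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d %H:00")
            per_hour[hour] += 1
    return {hour: n for hour, n in per_hour.items() if n >= threshold}

sample = [
    "2024-03-01 02:14:07 ERROR disk: write timeout",
    "2024-03-01 02:20:11 ERROR disk: write timeout",
    "2024-03-01 02:41:53 ERROR raid: degraded array",
    "2024-03-01 09:02:33 INFO backup: completed",
]
print(error_spikes(sample))  # the 02:00 hour crosses the threshold
```

In practice, the same windowed counting would run over long-retained logs in the big data platform, with thresholds learned from history rather than fixed by hand.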
As with fraud detection in financial services, balancing security controls against access is a perennial dilemma. It is challenging to provide broad information access while keeping security measures effective but nonintrusive, and it is equally important to detect a potential security breach proactively without generating false alerts. These challenges arise because most security measures are rule based: the simple fact is we don’t know what we don’t know. Tightening restrictions is sometimes the answer, but as mentioned earlier, false alerts tend to dilute confidence in future alerts.
The new approach to security management uses historical access patterns to identify anomalous behavior. Data exploration, clustering, and association mechanisms can examine combinations of factors such as role, function, time of day, day of week, location (home, office, public), device, application type, and activity performed. These techniques can surface previously unknown associations and help security analysts define and design better security measures.
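To make the idea concrete, the sketch below scores an access event by how rare its combination of factors is in the historical record. It is a deliberately simple frequency model standing in for the clustering and association techniques described above; the event fields and data are hypothetical.

```python
from collections import Counter

# Hypothetical access events: (user, location, device, time bucket)
history = [
    ("alice", "office", "laptop", "business_hours"),
    ("alice", "office", "laptop", "business_hours"),
    ("alice", "home", "laptop", "evening"),
    ("bob", "office", "desktop", "business_hours"),
] * 25  # simulate 100 historical events

def anomaly_score(event, history):
    """Score an event by how rarely its exact factor combination appears
    in history: 1.0 = never seen before, 0.0 = the most common pattern."""
    counts = Counter(history)
    if not counts:
        return 1.0
    return 1.0 - counts[event] / max(counts.values())

# A familiar pattern scores low; a never-seen combination scores high.
print(anomaly_score(("alice", "office", "laptop", "business_hours"), history))
print(anomaly_score(("alice", "public", "phone", "night"), history))
```

A production system would replace exact-match counting with clustering or association-rule mining so that near-matches also score sensibly, but the principle is the same: score against observed behavior rather than hand-written rules.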
This doesn’t mean we should abandon today’s rule-based security measures. These rules come from trial and error and from experience gathered in the field, and they are tremendously valuable. Combined with new data-driven discovery and pattern-detection capabilities, they give us more power to proactively manage security and detect illegal access.
Here’s an example: One large U.S. bank was challenged with increasing cyberthreats and had reached “security appliance” fatigue; with every new threat, a vendor would pop up with a new appliance. The bank had huge amounts of security data, including Windows and IDS logs, but had difficulty leveraging it. It was looking for a data-driven security strategy that included preventing account takeover fueled by malware, and incident response involved a time-consuming process of examining voluminous log files. The new solution stored more than 120 different types of data, including transactions, fraud alerts, server logs, firewall logs, and IDS logs, for more than two years, resulting in more than 120TB of data. The bank is now able to spot anomalies in log files and other machine data that could indicate a hacking or phishing attempt.
The biggest benefit of the big data strategy for forensics has been speed of detection. Combining the signs of a spear phishing attack with statistical methodologies boosts the bank’s ability to identify potential attacks, and the bank can now quickly act on intelligence received from various sources on malware threats and counter them.
A common usage pattern lately is data warehouse augmentation. In a traditional data warehouse process, data sources are loaded into a data warehouse hub through ETL and sometimes moved into high-performance data marts for departmental analytics and reporting. Most companies can’t afford to keep data in their warehouse indefinitely: the older the data becomes, the more aggregated it becomes, and the detailed, granular data archived to tape is usually never used.
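The age-driven loss of granularity can be illustrated with a toy retention routine: rows older than a cutoff are rolled up to monthly totals and the detail discarded, while recent rows are kept in full. The schema and cutoff below are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction rows: (date, store, amount)
rows = [
    (date(2023, 1, 5), "store_1", 100.0),
    (date(2023, 1, 9), "store_1", 40.0),
    (date(2024, 6, 2), "store_1", 75.0),
]

def age_out(rows, cutoff):
    """Keep recent rows at full granularity; roll older rows up to
    monthly totals per store. The row-level detail of the old data is
    lost -- the traditional warehouse retention trade-off."""
    recent = [r for r in rows if r[0] >= cutoff]
    monthly = defaultdict(float)
    for d, store, amount in rows:
        if d < cutoff:
            monthly[(d.year, d.month, store)] += amount
    return recent, dict(monthly)

recent, aggregated = age_out(rows, cutoff=date(2024, 1, 1))
print(recent)      # one 2024 row survives in full detail
print(aggregated)  # 2023 rows collapse to a monthly total
```

The data reservoir pattern discussed next removes the need for this trade-off by keeping the granular rows cheaply in long-term storage.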
Big data changes this fundamentally. More and more companies are building a data reservoir to augment their existing data warehouse architecture. Data in the data reservoir is persistent, and the focus is on data processing, data storage, and reuse of the most granular level of data.
Information discovery can also be performed within a data reservoir. We’ve seen a pattern emerge in which a data reservoir feeds discovery data marts that tap into a wide range of data elements. This simplifies data acquisition and enables discovery on raw data. Once confirmed, the value of new data sets can be incorporated into production and operations.
ETL offloading, also known as a data factory, is a usage pattern that enables an organization to integrate and transform large, diverse data sets in batch mode before moving the data into data warehouses. It is not an analytical application use case in itself but rather an extension of, and a necessary step toward, building such applications. As a result, data in the data factory is transient and exists only for the duration of processing.
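A minimal sketch of the data-factory idea, assuming two hypothetical source feeds with inconsistent field names: records are conformed to a single target schema and quality-filtered in batch before the warehouse load step. Field names and quality rules here are invented for illustration.

```python
# Hypothetical raw feed records from two source systems, with
# inconsistent field names and string-typed values.
raw = [
    {"cust_id": "17", "amt": "19.99", "src": "web"},
    {"customer": "17", "amount": "5.00", "src": "pos"},
    {"cust_id": "", "amt": "3.50", "src": "web"},  # bad record: no customer
]

def transform(records):
    """Batch-conform diverse source records to one target schema and
    drop records that fail basic quality checks, before the load step."""
    out = []
    for r in records:
        cust = r.get("cust_id") or r.get("customer")
        amount = r.get("amt") or r.get("amount")
        if not cust or not amount:
            continue  # reject: missing key fields
        out.append({"customer_id": int(cust), "amount": float(amount),
                    "source": r["src"]})
    return out

print(transform(raw))  # two conformed rows; the bad record is dropped
```

Because the data-factory output is transient, only the conformed rows move on to the warehouse; the raw inputs can be discarded or retained in the reservoir.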