NoSQL technology was first used by the social media sites as they looked for ways to deal with the large sets of data generated by their user communities. For example, in 2010 Twitter saw data arriving at a rate of 12TB/day, which adds up to a dataset of more than 4PB in a year. These numbers have grown significantly as Twitter usage has expanded globally.
While social media sites such as Twitter gave users a way to share their thoughts, ideas, and pictures, there was no easy way to make sense of such a tsunami of information as it arrived from millions of users. HDFS is used to store such data in a distributed and fault-tolerant manner, and MapReduce, with its batch processing capability, is used to analyze it. However, batch processing is not the right technology for real-time analytics on the data. Each tweet is stored with a unique identifier, and Twitter also saves the user ID. Data keyed in this way maps naturally onto the key-value model of NoSQL databases. NoSQL database technologies could be used to run queries such as user searches and lookups of tweets from a specific user, and graph database capabilities could be used to find friends and followers.
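As a rough illustration of the key-value access pattern described above, the following Python sketch stores each tweet under its unique identifier and keeps a secondary index from user ID to tweet IDs. The `TweetStore` class, its method names, and the sample data are hypothetical; an in-memory dictionary stands in for a real NoSQL database, and nothing here reflects how Twitter actually stores its data.

```python
from collections import defaultdict


class TweetStore:
    """Toy in-memory key-value store (a stand-in for a NoSQL database)."""

    def __init__(self):
        self._tweets = {}                   # tweet_id -> tweet record
        self._by_user = defaultdict(list)   # user_id -> list of tweet_ids

    def put(self, tweet_id, user_id, text):
        # Primary write: the record is stored under its unique tweet ID.
        self._tweets[tweet_id] = {"user_id": user_id, "text": text}
        # Secondary index so "tweets from a specific user" is a key lookup.
        self._by_user[user_id].append(tweet_id)

    def get(self, tweet_id):
        # Key-based lookup; returns None when the tweet ID is unknown.
        return self._tweets.get(tweet_id)

    def tweets_by_user(self, user_id):
        # Follow the index, then fetch each record by its primary key.
        return [self._tweets[t] for t in self._by_user.get(user_id, [])]


store = TweetStore()
store.put(1001, "alice", "Hello, world")
store.put(1002, "bob", "Trying out NoSQL")
store.put(1003, "alice", "Second tweet")

print(store.get(1002)["text"])
print([t["text"] for t in store.tweets_by_user("alice")])
```

Both queries resolve through direct key lookups rather than scans, which is the property that makes this kind of data a good fit for a key-value store.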
Present-day enterprises have come to value the insight that social media provides into customer behavior, opinions, and market trends. Combining social media data with CRM data can provide a holistic view of the customer, something that was not possible just a few years ago. Customer data is no longer limited to past interactions; it can now include images, recordings, Likes (as in Facebook likes), web pages visited, preferences, loyalty programs, and an evolving set of artifacts. This requires a system that can handle both structured and unstructured data. As channels of communication and collaboration come and go, the data format keeps changing, requiring that developers and data management systems know how to operate in a schema-less fashion. While each record in a transactional system is critical to the operation of the business, the new customer data is high volume and sparse. This requires a distributed storage and computing environment.
Customer profile data is predominantly read-only lookup data and requires only simple key-based access. NoSQL databases, with their support for unstructured and semi-structured data, key-value storage, and distributed deployments, are ideal candidates. When it comes to operational analysis, you might want to combine the customer profile data with that in your OLTP or DW systems. The tight integration between Oracle NoSQL Database and the Oracle Database makes it possible for you to join data across both of these systems. As a result, enterprises now deploy NoSQL databases alongside RDBMS and MapReduce technologies.
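To make the idea of combining profile data with transactional data more concrete, here is a minimal Python sketch. It is not the Oracle NoSQL Database integration itself: a plain dictionary stands in for the key-value profile store, an in-memory SQLite table stands in for the OLTP system, and the `customer_view` function, the table layout, and all sample values are assumptions made for illustration.

```python
import sqlite3

# Key-value side: sparse, schema-less profiles keyed by customer ID.
# Records are free to carry different fields.
profiles = {
    "c1": {"likes": ["cycling"], "loyalty_tier": "gold"},
    "c2": {"pages_visited": 14},
}

# Relational side: an in-memory SQLite table stands in for the OLTP system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("c1", 120.0), ("c1", 30.0), ("c2", 75.5)],
)


def customer_view(customer_id):
    """Combine a key-based profile lookup with an aggregate over orders."""
    order_count, total_spent = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) "
        "FROM orders WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return {
        "customer_id": customer_id,
        "profile": profiles.get(customer_id, {}),  # empty if no profile exists
        "order_count": order_count,
        "total_spent": total_spent,
    }


print(customer_view("c1"))
```

The join here happens in application code on the shared customer ID; an integrated deployment would push that work down into the database layer instead.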
Another use case that illustrates how the different data management and analysis technologies work together is that of online advertisers. Advertisers are always in search of a new set of eyes, and the fast growth of mobile devices has made that a key focus.
Usage patterns on mobile devices are characterized by short, intermittent access, as compared to those of a desktop interface, and this puts stringent constraints on the time publishers have to decide which ad to display. Typically, this window is on the order of 75 milliseconds, and a medium-sized publisher might have more than 500 million ad impressions in a day. The short time intervals, the large number of events, and the huge amount of associated data that gets generated require a multifaceted data management system. This system needs to be highly responsive, support high throughput, and cope with varying loads and system fault conditions. There is no single technology that can fulfill these requirements.
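A quick back-of-the-envelope calculation shows what these figures imply for the system. The sketch below uses only the two numbers quoted above (500 million impressions a day and a 75-millisecond decision window); it assumes traffic is spread evenly across the day and that every decision uses its full window, so real peak loads would be higher than these averages.

```python
IMPRESSIONS_PER_DAY = 500_000_000
SECONDS_PER_DAY = 24 * 60 * 60
DECISION_WINDOW_S = 0.075  # 75 milliseconds

# Average arrival rate, assuming traffic is spread evenly over the day.
avg_per_second = IMPRESSIONS_PER_DAY / SECONDS_PER_DAY

# Little's law: average requests in flight = arrival rate * time in system.
# Using the full window as the time in system gives an upper bound.
avg_in_flight = avg_per_second * DECISION_WINDOW_S

print(f"average decisions per second: {avg_per_second:,.0f}")
print(f"average decisions in flight:  {avg_in_flight:,.0f}")
```

Even on average, that is several thousand ad decisions per second with a few hundred in progress at any instant, which is why responsiveness and throughput have to be designed for together.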
To be effective, the publisher needs to analyze the user quickly in order to decide which ad to display. A user lookup is carried out on a NoSQL database and the profile is loaded. The profile might include details on demographics, behavioral segments, recency, location, and a user rating, which might have been computed behind the scenes by a scoring engine.
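The lookup-then-decide flow described above can be sketched in a few lines of Python. Everything in this example is hypothetical: the `PROFILES` dictionary stands in for the NoSQL database, the `ADS` inventory and its bid values are invented, and picking the highest bid among ads that match a behavioral segment is just one simple selection rule, not a description of how any real publisher chooses ads.

```python
import time

# Hypothetical profile store keyed by user ID (stand-in for a NoSQL database).
PROFILES = {
    "u42": {"segments": {"sports", "travel"}, "location": "US"},
}

# Hypothetical ad inventory: each ad targets one behavioral segment.
ADS = [
    {"id": "ad-run", "segment": "sports", "bid": 1.20},
    {"id": "ad-fly", "segment": "travel", "bid": 1.50},
    {"id": "ad-car", "segment": "autos", "bid": 2.00},
]

DEFAULT_AD = "ad-generic"
DECISION_WINDOW_S = 0.075  # the 75 ms window mentioned earlier


def choose_ad(user_id):
    """Look up the profile by key and pick an ad within the time window."""
    deadline = time.monotonic() + DECISION_WINDOW_S

    profile = PROFILES.get(user_id)  # simple key-based lookup
    # Fall back to a generic ad if the user is unknown or the lookup was slow.
    if profile is None or time.monotonic() > deadline:
        return DEFAULT_AD

    # Keep only ads whose target segment appears in the user's profile.
    matching = [ad for ad in ADS if ad["segment"] in profile["segments"]]
    if not matching:
        return DEFAULT_AD

    # Simple illustrative rule: the highest bid among matching ads wins.
    return max(matching, key=lambda ad: ad["bid"])["id"]


print(choose_ad("u42"))
print(choose_ad("unknown-user"))
```

The fallback path matters as much as the main one: when the profile is missing or arrives too late, the publisher still has to return some ad before the window closes.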
In addition to displaying the ad, there are campaign budgets to manage, client financial transactions to track, and campaign effectiveness to analyze. NoSQL database technologies, in conjunction with MapReduce and relational databases, are used in such a deployment, as shown in Figure 1.
FIGURE 1. Typical big data application architecture for an advertising use case