Now more than ever, Java developers need to understand how to create data-centric applications. Data is an important commodity and organizations now try to capture, store, and analyze all the information they generate. As a result, many different forms of data exist and an equal number of different methods exist to store it. As a Java developer, you will likely face the challenge of writing an application that enables an organization to effectively use its data stored in either a single source or multiple sources.
Your chances of having to build an application that accesses enterprise data are increasing because Java continues to gain market share as the language of choice for creating server applications and the J2EE platform becomes increasingly popular. In addition, most server applications require access to data stores for information. As an example, an EJB component may need to update inventory levels in a database or send XML messages to other applications. As a result, knowing how to access the different data stores is paramount in enterprise development.
However, client applications also need access to enterprise data stores. For example, a human-resources application that tracks employee vacation time must retrieve information from, and store information in, a database. In addition, you now have mobile clients that need access to enterprise data stores. Writing data-centric applications for these devices is challenging, as they operate with little memory, slow processors, limited power supplies, and intermittent network access.
Fortunately, Java provides a robust set of data-access technologies that enables you to access the most common types of enterprise data. Using these same technologies you can create both server-side components and client-side applications. The technologies consist of APIs for accessing databases, naming and directory services, and XML documents.
This article introduces the most common types of data enterprises use in their operations, from simple text files to complex specialty databases. It also covers the various Java-based technologies that you can use to access the data stores.
As you know, enterprises rely on data to make business decisions, generate revenue, and run daily operations. For example, managers generate sales forecasts based on historical sales data stored in a data warehouse. Companies also build online stores that use live inventory levels and sell directly to their customers. Accounting departments use financial database applications to generate payroll checks and track accounts receivable. These are only a few examples of how enterprises use data.
As you also know, data can take many forms. Figure 1 illustrates some of the more common kinds of data an enterprise uses and how it stores them. It also shows how clients access the information residing in the data stores.
Figure 1. This figure shows an example of the more common kinds of data an enterprise uses and how it stores them.
For example, data most commonly takes the form of files stored in file systems on central servers or individual workstations. There are as many different forms of data files as there are applications. However, some categories include word-processing documents, spreadsheets, multimedia (graphic, sound, and video), and XML documents.
Most companies also use databases to store information and model business processes. Databases enable corporations to store, organize, and retrieve large amounts of data. Some organizations use them for data warehouses containing hundreds of gigabytes of information. Others may use databases to support high-volume transactional applications such as an airline-reservation system. Databases also offer a lot of flexibility in terms of how you interact with them. Almost all have proprietary data-access tools as well as mainstream APIs such as JDBC drivers for you to use.
Other forms of data exist as name-value pairs stored in a naming or directory service. These services store data in a hierarchical database system optimized for lookups. In addition, some organizations may use a directory service as an object repository. Distributed applications use the directory service to locate and download objects. This minimizes the problems associated with distributing updated code because applications always have access to the latest version.
When an organization uses different forms of data, it requires you, as a developer, to use different access methods as well. For example, most file access occurs across a LAN, so the network software and operating system handle the communication details. However, retrieving data from a database or directory service requires additional components. You will likely need special drivers or APIs. In addition, some organizations let clients access their data stores over the Internet. You must consider security issues as well as client-activity levels if you decide to do this.
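For the file case described above, the following is a minimal sketch of reading a text data file with the standard `java.nio.file` API. The class name, file name, and file contents are illustrative assumptions, not part of the original text. The point is only that the same call works whether the path refers to a local disk or to a share mounted by the operating system.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class FileStoreExample {

    // Reads every line of a text file. The path may point at a local disk or a
    // mounted network share; the operating system hides that difference.
    static List<String> readRecords(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for a shared data file (hypothetical content).
        Path file = Files.createTempFile("vacation", ".csv");
        Files.write(file, Arrays.asList("employee,days", "jdoe,12"), StandardCharsets.UTF_8);
        try {
            for (String line : readRecords(file)) {
                System.out.println(line);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Databases and directory services, by contrast, need a driver or provider on top of the JDK, which is why the later examples take a connection URL.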
As a developer, your job is to create applications that enable people, or processes, to interact with any form of data that contains the information they need. Therefore, you should understand the many different forms and how enterprises typically store them. In addition, you need to consider how clients access the information as it affects your application as well.
The following sections describe the most common data stores enterprises use to house their information.
Next to file systems, enterprises use databases to store most of their information. This enables centralized information storage, meaning that both clients and servers have one data source. That is, everyone (onsite staff, field employees, and Web clients) looks at the same data. Centralizing data storage also enables administrators to perform maintenance routines such as data updates and backups more frequently and reliably.
Today’s databases can store more than just simple character or numerical data. The Internet has pushed database vendors to support more varied forms of data. For example, most database systems now enable you to store multimedia data such as sound and video. In addition, support for persisting native programming objects, such as those used by Java, also exists. Vendors developed this support because of the difficulty of combining object-oriented programming models with standard relational database systems.
There are many types of databases, including hierarchical, relational, object, and object-relational. Each has its strengths and weaknesses. However, by far the most popular type of database is the relational database. It is used by almost all enterprises employing database solutions.
The relational database gained popularity by providing the following benefits:
- Data integrity — Relational databases incorporate integrity rules to help protect against data corruption, duplication, and loss. You can use the built-in integrity rules or define your own.
- Common access language — SQL provides a universal access language for relational databases. The language enables you to build database structures, model business processes, and to add, delete, modify, and retrieve data. The core SQL language works with most relational database systems.
Because of their popularity, you should familiarize yourself with relational-database theory, SQL, and access techniques. Chances are that you will need them at some point as a developer.
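As a rough illustration of SQL access from Java, the sketch below runs a parameterized query through JDBC. The `inventory` table, its `item_name` and `quantity` columns, and the JDBC URL are hypothetical; a real application would also need a vendor's JDBC driver on the classpath, which this example does not provide.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class InventoryQuery {

    // Returns the names of items whose quantity is below the threshold.
    // Table and column names are assumptions made for this sketch.
    static List<String> lowStockItems(Connection conn, int threshold) throws SQLException {
        String sql = "SELECT item_name FROM inventory WHERE quantity < ?";
        List<String> names = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, threshold); // bind the parameter instead of concatenating SQL
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString("item_name"));
                }
            }
        }
        return names;
    }

    // Opens a connection for the given URL and runs the query. Returns an empty
    // Optional if no driver accepts the URL or the database reports an error.
    static Optional<List<String>> tryLowStockItems(String jdbcUrl, int threshold) {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            return Optional.of(lowStockItems(conn, threshold));
        } catch (SQLException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Hypothetical URL; pass a real one for your database as the first argument.
        String url = args.length > 0 ? args[0] : "jdbc:example:inventory";
        Optional<List<String>> result = tryLowStockItems(url, 10);
        System.out.println(result.map(Object::toString)
                .orElse("No connection available for " + url));
    }
}
```

Because the SQL text is plain standard SQL, the same method should work against most relational databases once the matching driver is installed; only the URL changes.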
Different Database Types
Many different databases are available to meet an organization’s data-storage needs. For example, some companies may need to persist Java objects. Others may want to model business processes or create databases optimized for retrieving data.
The following list describes the different database types available:
- Relational database — Stores all data in tables, among which you can define relationships in order to model most real-world processes. By default, relational databases have entity (table) and referential (relationship) constraints to protect against data loss or corruption. Relational databases are the most widely used database system.
- Hierarchical database — Stores data in records. Only parent-child relationships can exist between records. This creates a hierarchy wherein each record can participate in only one parent-child relationship, which makes it hard to model complex processes. Hierarchical databases provide fast data retrieval, but slow write operations. Directory services often use hierarchical databases.
- Network database — Similar to hierarchical databases except that they enable you to model more complex relationships. Network databases support many-to-many relationships among records.
- Object database — Supports storage of native programming objects and custom data types. Many object databases support object-oriented programming concepts such as inheritance, polymorphism, and encapsulation of the user-defined data types. Some support SQL while others have proprietary access languages.
- Object-relational database — A cross between an object database and a relational database. Most often, object-relational databases are relational databases that treat objects as new data types.
Naming and Directory Services
Naming and directory services are hierarchical (not relational) databases optimized for read (not write) operations. Therefore, you should not use them where significant insert, update, or delete activities occur.
Naming services store objects using a simple name-value format. A common example is a file system whose objects are files. As a naming service, the file system associates a name, the filename, with a value, the file handle. A user requests a file by its name and the operating system retrieves it by the associated file handle. An RMI Registry provides another example of a naming service. In this case, the name is the object identifier, and the value is the object itself.
A directory service extends the capabilities of a naming service by allowing you to attach attributes to objects. An example of a directory-service application is an employee directory stored in an LDAP-enabled directory service. In this example, an employee is an object and can have attributes in addition to his or her name. For example, you may attach attributes such as department, e-mail address, and phone number to each employee. In addition, you can search a directory service for objects based on attribute values.
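To make the difference between a naming service and a directory service concrete, here is a small in-memory sketch. It is not a real directory server: a `Map` plays the role of the name-to-object binding, and the JDK's `javax.naming.directory.BasicAttributes` class holds the attributes attached to each entry. The class name, attribute IDs, and employee data are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.BasicAttributes;

public class EmployeeDirectorySketch {

    // Naming-service part: each name is bound to one entry.
    private final Map<String, Attributes> entries = new LinkedHashMap<>();

    // Directory-service part: the entry carries attributes beyond its name.
    void bind(String name, String department, String mail, String phone) {
        Attributes attrs = new BasicAttributes(true); // true = ignore case of attribute IDs
        attrs.put("department", department);
        attrs.put("mail", mail);
        attrs.put("telephoneNumber", phone);
        entries.put(name, attrs);
    }

    // Lookup by name, as a plain naming service would offer.
    Attributes lookup(String name) {
        return entries.get(name);
    }

    // Search by attribute value, which only a directory service offers.
    List<String> search(String attrId, String value) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Attributes> e : entries.entrySet()) {
            Attribute attr = e.getValue().get(attrId);
            if (attr != null && attr.contains(value)) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        EmployeeDirectorySketch dir = new EmployeeDirectorySketch();
        dir.bind("jdoe", "Engineering", "jdoe@example.com", "555-0100");
        dir.bind("asmith", "Sales", "asmith@example.com", "555-0101");
        System.out.println(dir.search("department", "Engineering"));
    }
}
```

A real directory service would index the attributes and persist the entries; the linear scan here only shows the shape of the operations.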
The Lightweight Directory Access Protocol (LDAP) is often associated with naming and directory services. Contrary to popular belief, LDAP does not define a data-storage model or schema. Instead, it defines a communication protocol for interacting with directory services. Vendors use LDAP for communications and store data however they wish.
However, unlike with relational databases, with naming and directory services you cannot easily model processes or protect data using integrity constraints. Naming and directory services also lack a common data-access language like SQL and you usually rely on a vendor’s API for access. Fortunately, Java’s JNDI API addresses this lack of a standard access method by providing a common interface to many different naming and directory services.
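The following sketch shows roughly what that common interface looks like when the underlying service speaks LDAP. The server URL and entry name are placeholders you would supply, and `com.sun.jndi.ldap.LdapCtxFactory` is the LDAP provider that ships with the JDK. Switching to another kind of service would, in principle, mean changing only the environment settings, not the lookup code.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class JndiLookupSketch {

    // Builds the JNDI environment that selects the JDK's LDAP provider.
    // The provider URL is supplied by the caller, for example
    // "ldap://localhost:389/dc=example,dc=com" (a hypothetical server).
    static Hashtable<String, Object> ldapEnvironment(String providerUrl) {
        Hashtable<String, Object> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, providerUrl);
        return env;
    }

    // Fetches all attributes of one entry. Requires a reachable directory
    // server; throws NamingException if the connection or lookup fails.
    static Attributes fetchAttributes(String providerUrl, String entryName)
            throws NamingException {
        DirContext ctx = new InitialDirContext(ldapEnvironment(providerUrl));
        try {
            return ctx.getAttributes(entryName);
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: JndiLookupSketch <ldap-url> <entry-name>");
            return;
        }
        try {
            System.out.println(fetchAttributes(args[0], args[1]));
        } catch (NamingException e) {
            System.out.println("Lookup failed: " + e);
        }
    }
}
```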
Nonetheless, naming and directory services provide you with a powerful tool for retrieving data. In addition, they are useful when you do not need the overhead of hardware and DBAs to run a relational database.
The Extensible Markup Language (XML) enables you to create self-documenting data. Enterprises now use XML as the standard for exchanging data and messages with other organizations or among applications. In addition, organizations use it in conjunction with XSLT to develop a single source of online content viewable from a variety of devices. As a result, most enterprise applications use some form of XML-service.
An XML-service is an application, whether built from EJB components or from specific application classes, that consumes or generates XML. These services are quickly becoming a major component of distributed architectures and applications. Some examples of XML-services include:
- Processing configuration files such as EJB deployment descriptors
- Transforming data from one format to another
- Exchanging messages with other applications using JMS
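As one possible illustration of the second item, transforming data from one format to another, the sketch below applies an XSLT stylesheet with the JDK's `javax.xml.transform` API to turn a small XML document into comma-separated text. The `inventory` and `item` element names, the attribute names, and the stylesheet itself are assumptions invented for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSketch {

    // Hypothetical stylesheet: emits one "name,quantity" line per <item>.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/inventory'>"
        + "<xsl:for-each select='item'>"
        + "<xsl:value-of select='@name'/>,<xsl:value-of select='@quantity'/>"
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet over the given XML text and returns the result.
    static String toCsv(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<inventory>"
                   + "<item name='widget' quantity='3'/>"
                   + "<item name='gear' quantity='7'/>"
                   + "</inventory>";
        System.out.print(toCsv(xml));
    }
}
```

Keeping the mapping in a stylesheet rather than in Java code means the output format can change without recompiling the application, which is the same idea behind using XSLT to serve one content source to many devices.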
Java provides significant support for XML. In fact, both technologies appeared in the second half of the 1990s and have grown together. During this time, many developers created numerous free Java tools for working with XML documents. Now the JDK and JRE distributions include many of these same tools, such as the SAX parser.
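A minimal sketch of the bundled SAX parser follows. It counts how often each element name occurs in a document, which is about the simplest useful thing an event-driven parser can do. The class name and the sample document, loosely modeled on a deployment descriptor, are illustrative assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSketch {

    // Parses the XML text and returns element name -> number of occurrences,
    // in order of first appearance.
    static Map<String, Integer> countElements(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        final Map<String, Integer> counts = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // SAX calls this once for every opening tag it encounters.
                counts.merge(qName, 1, Integer::sum);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document, loosely shaped like a deployment descriptor.
        String xml = "<ejb-jar><enterprise-beans>"
                   + "<session/><session/><entity/>"
                   + "</enterprise-beans></ejb-jar>";
        System.out.println(countElements(xml));
    }
}
```

SAX streams through the document without building a tree in memory, so it suits large files; when you need to navigate or modify the document, a DOM parser is usually the better fit.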
XML provides many benefits that have boosted its adoption rate. The following is a partial list of its advantages:
- XML is an open standard: The World Wide Web Consortium (W3C) controls the XML specification, and therefore no one industry or company can control its direction.
- XML is text-based: XML documents are text files. As a result, you can read and edit them using text editors.
- XML is self-describing: An XML document can contain information about itself, meaning that it is self-contained. Other applications can use the document without any extra information.
- XML has free tools and processors: A multitude of Java tools exist to help you create, manipulate, read, and exchange XML documents.
Along with relational-database knowledge, a solid understanding of Java-XML technologies will help you significantly as you work with enterprise data using Java. Mastering both technologies definitely won’t hurt your career either.