The Oracle kernel components in an Oracle RAC environment are the additional background processes that run in each instance of an Oracle RAC database. Because the buffer cache and shared pool become global in Oracle RAC, managing these resources without conflicts and corruption requires special handling. The additional background processes, along with the normal background processes found in a single-instance database, manage the global resources effectively.
Global Cache and Global Enqueue Services
In Oracle RAC, because more than one instance accesses the same resources, the instances require coordination at the resource management level; otherwise, data corruption may occur. Each instance has its own set of buffers but can request and receive data blocks currently held in another instance’s cache. Buffer manipulation in Oracle RAC is therefore quite different from a single-instance environment, where only one instance’s processes ever access a buffer. In Oracle RAC, the buffer cache of one node may contain data that is requested by another node. The Global Cache Services manage this sharing and exchange of data.
Each instance has its own buffer cache in the System Global Area (SGA). Oracle uses Cache Fusion to logically combine the buffer cache of multiple instances in a RAC cluster, so the instances can process the data as if it was stored in a single buffer cache.
Oracle RAC uses two key processes to ensure that each instance in a cluster gets the block it needs from the buffer cache: the Global Cache Services (GCS) and the Global Enqueue Services (GES). The GCS and GES together form and manage the Global Resource Directory (GRD) to maintain the status of all datafiles and all cached blocks. The contents of the GRD are distributed across all the active instances of a cluster. The following section explains the Global Resource Directory and its components in more detail.
Global Resource Directory
All the resources in the cluster group form a central repository of resources called the Global Resource Directory (GRD). Each instance masters some set of resources and together all instances form the GRD. The resources in the cluster group are equally distributed among the nodes based on their weight.
The GRD is part of the shared pool, and an undersized shared pool in an Oracle RAC environment can severely impact performance because the GRD competes with the library cache and the dictionary cache, as well as with the other components of the shared pool.
When an instance joins or leaves the cluster, the components of the GRD must be redistributed among the instances. If an instance departs, its portion of the GRD is redistributed to the surviving nodes; when a new instance joins, portions of the existing instances’ GRDs are redistributed to create the GRD portion of the new instance. The GRD is frozen during redistribution to allow an atomic redistribution among the instances in the cluster. Especially in large RAC installations with large SGAs and large active datasets, this redistribution process will cause a brownout.
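The idea of distributing resource mastership across instances, and of remastering when membership changes, can be illustrated with a small sketch. This is purely conceptual Python, not Oracle code: the hash-based placement and the resource names are hypothetical, and Oracle’s actual remastering algorithm is internal and far more sophisticated.

```python
import zlib

def master_of(resource_name: str, instances: list[int]) -> int:
    """Pick a mastering instance for a resource by hashing its name.

    A toy stand-in for how mastership can be spread evenly across
    cluster members; Oracle's real placement algorithm is internal.
    """
    h = zlib.crc32(resource_name.encode())
    return instances[h % len(instances)]

instances = [1, 2, 3]
blocks = [f"file 4 block {b}" for b in range(12)]
before = {b: master_of(b, instances) for b in blocks}

# Instance 3 leaves the cluster: resources must be remastered among the
# survivors (conceptually, the GRD is frozen while this happens).
survivors = [1, 2]
after = {b: master_of(b, survivors) for b in blocks}
remastered = [b for b in blocks if before[b] != after[b]]
print(len(remastered), "of", len(blocks), "resources changed master")
```

Note that with this naive modulo placement, a membership change can move even resources that were not mastered by the departing instance; consistent-hashing-style schemes reduce that churn, which is one reason real remastering is more elaborate.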
Oracle RAC Background Processes
Oracle RAC databases have two or more instances, each with its own memory structures and background processes. In addition to the usual single-instance background processes, Oracle RAC employs other processes to manage the shared resources. Thus, the Oracle RAC database has the same structure as that of a single-instance Oracle database, plus additional processes and memory structures that are specific to Oracle RAC. These additional processes maintain cache coherency across the nodes.
Maintaining cache coherency is an important part of an Oracle RAC database. Cache coherency is the technique of keeping multiple copies of a buffer consistent among different Oracle instances on different nodes. Global cache management ensures that access to the master copy of a data block in one buffer cache is coordinated with the copies of that block in other buffer caches. This guarantees that the most recent copy of a block in a buffer cache contains all changes made to that block by any instance in the system, regardless of whether those changes have been committed at the transaction level.
The Importance of Coordination
It’s important to understand why inter-instance cache coordination is necessary in an Oracle RAC environment. Consider a two-instance environment without any cache coordination and communication among the instances, as shown in Figure 1:
FIGURE 1. Instances read a block without any coordination.
- Referring to Figure 1, at time t1 instance A reads a block into its buffer cache and modifies row 1 in it. The modified block is still in its buffer cache and has not yet been written to disk.
- Sometime later, at time t2, instance B reads the same block into its buffer cache and modifies another row in that block. Instance B also has not written the block to disk, so the disk still contains the old version of the block.
- Now at time t3, instance A writes the block to disk. At this stage, modifications from instance A are written to disk (see Figure 2).
FIGURE 2. Instance A writes the block to disk without coordination.
- Later at time t4, instance B writes the block to disk. It overwrites the block written by instance A in step 3. As you can easily infer, the changes made to the block by instance A are lost (see Figure 3).
FIGURE 3. Instance B overwrites the changes made by instance A.
This scenario and many other similar situations require that when data is simultaneously accessed by multiple machines, the read and (especially) write activities must be coordinated among these machines; otherwise, data integrity problems will result that may manifest as data corruption.
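The lost-update sequence above can be reproduced with a small simulation. This is an illustrative sketch only: the “disk” is a dictionary, and each instance holds a private cached copy of the block, exactly as in Figures 1 through 3.

```python
# Uncoordinated scenario from Figures 1-3: each instance caches its own
# private copy of the block and writes it back without telling the other.
disk = {"block": {"row1": "old", "row2": "old"}}

# t1: instance A reads the block and modifies row 1 in its cache.
cache_a = dict(disk["block"])
cache_a["row1"] = "changed by A"

# t2: instance B reads the same (still old) block and modifies row 2.
cache_b = dict(disk["block"])
cache_b["row2"] = "changed by B"

# t3: instance A writes its version to disk.
disk["block"] = cache_a

# t4: instance B writes its version, overwriting A's write from t3.
disk["block"] = cache_b

print(disk["block"])  # row1 is back to "old": A's change is lost
```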
Now let’s repeat the preceding operation sequence in the presence of coordination:
- At time t1, when instance A needs a data block with an intent to modify, it reads the block from disk. However, before reading, it must inform the GCS (DLM) of its intention to do so. GCS keeps track of the lock status of the block being modified by instance A by keeping an exclusive lock against the block on behalf of instance A.
- At time t2, instance B wants to modify the same block. Before doing so, it must inform the GCS of its intention to modify the block. When GCS receives the request from instance B, it asks the current lock holder instance A to release the lock. Thus, GCS ensures that instance B gets the latest version of the block and also passes on the write privilege to it (exclusive lock).
- At time t3, instance B gets the latest (current) version of the block that has the changes made by instance A and modifies it.
- At any point in time, only one instance has the current copy of the block. Only that instance can write the block to disk, thereby ensuring that all the changes to the block are preserved and written to disk when needed.
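The coordinated sequence can be sketched the same way with a toy lock manager. Everything here is hypothetical and heavily simplified: a single exclusive lock per block, and the “current” copy shipped from holder to requester, standing in for the real GCS and interconnect.

```python
# Toy model of the coordinated sequence: a minimal "GCS" grants an
# exclusive lock on the block and ships the current copy between caches,
# so the second writer always starts from the latest version.
class ToyGCS:
    def __init__(self, disk):
        self.disk = disk          # shared storage
        self.holder = None        # instance holding the exclusive lock
        self.current = None       # latest (current) copy of the block

    def acquire_exclusive(self, instance):
        if self.holder is None:
            # First requester reads the block from disk.
            self.current = dict(self.disk["block"])
        # Otherwise the current copy is shipped from the previous
        # holder's cache; no disk read is needed.
        self.holder = instance
        return dict(self.current)

    def release_with_changes(self, instance, block):
        assert self.holder == instance, "only the lock holder may write"
        self.current = block

disk = {"block": {"row1": "old", "row2": "old"}}
gcs = ToyGCS(disk)

# t1: instance A requests the block in exclusive mode, changes row 1.
block_a = gcs.acquire_exclusive("A")
block_a["row1"] = "changed by A"
gcs.release_with_changes("A", block_a)

# t2/t3: instance B requests the block; the current version (with A's
# change) is shipped to B, which then changes row 2.
block_b = gcs.acquire_exclusive("B")
block_b["row2"] = "changed by B"
gcs.release_with_changes("B", block_b)

# Only the current copy is ever written to disk, so both changes survive.
disk["block"] = gcs.current
print(disk["block"])
```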
The GCS thus maintains data coherency and coordination by keeping track of the lock status of each block that is read and/or modified by the server nodes in the cluster. GCS guarantees that only one copy of the block in memory can be modified and that all the modifications are written to disk at the appropriate time, thereby maintaining cache coherence among the nodes and guaranteeing the integrity of the data. GCS maintains an in-memory directory of the current locks on blocks and also keeps track of instances that are waiting to acquire locks on blocks. This is known as Parallel Cache Management (PCM) and has been a central feature of Oracle clustered databases since the introduction of Oracle Parallel Server (OPS) in the early 1990s.
PCM uses distributed locks on the resources to coordinate access to resources by different instances of an Oracle RAC environment. The GCS helps to coordinate and communicate the lock requests from Oracle processes between instances in the Oracle RAC environment.
Each instance has a buffer cache in its SGA. To ensure that each Oracle RAC database instance obtains the block that it needs to satisfy a query or transaction, Oracle RAC instances use two services: the GCS and the GES. The GCS and GES maintain records of the lock status of each datafile and each cached block using the GRD. The GRD contents are distributed across all of the active instances as part of the shared pool on each instance. Hence, it is good to increase the SGA size by some factor to accommodate this overhead, which is typically not more than 5 percent of the total SGA size. Tests have found that the larger your block size, the lower the memory overhead for the extra GCS, GES, and GRD components in the SGA. For large SGAs that exceed 20GB, the overhead depends on the block size used and could be around 600MB to 700MB for a database with a 16KB block size.
The cost (or overhead) of cache coherency is defined as the need to check with other instances if a particular access is permitted before granting any access to a specific shared resource. Algorithms optimize the need to coordinate on each and every access, but some overhead is incurred. Cache coherency means that the contents of the caches in different nodes are in a well-defined state with respect to each other. Cache coherency identifies the most up-to-date copy of a resource, also called the “master copy.” In case of node failure, no vital information is lost (such as committed transaction state) and atomicity is maintained. This requires additional logging or copying of data but is not part of the locking system.
A resource is an identifiable entity—that is, it has a name or reference. The entity referred to is usually a memory region, a disk file, or an abstract entity. A resource can be owned or locked in various states, such as exclusive or shared. By definition, any shared resource is lockable. If it is not shared, no access conflict will occur. If it is shared, access conflicts must be resolved, typically with a lock. Although the terms lock and resource refer to entirely separate objects, the terms are sometimes (unfortunately) used interchangeably.
A global resource is visible and used throughout the cluster. A local resource is used by only one instance. It may still have locks to control access by the multiple processes of the instance, but no access to it occurs from outside the instance. Data buffer cache blocks are the most obvious and most heavily used global resource. Other data item resources are also global in the cluster, such as transaction enqueues and database data structures.
The non-data-block resources are handled by Global Enqueue Services (GES), also called Non-Parallel Cache Management (Non-PCM). The Global Resource Manager (GRM), also called the Distributed Lock Manager (DLM), keeps the lock information valid and correct across the cluster.
All caches in the SGA are either global (and must therefore be kept coherent across all instances) or local. The library cache, the row cache (also called the dictionary cache), and the buffer cache are global. The large pool and Java pool buffers are local. For Oracle RAC, the GRD is itself global and is also used to control coherency.
After one instance caches data, in some cases other instances within the same cluster database can acquire a block image from another instance in the same database faster than by reading the block from disk. Therefore, Cache Fusion moves current copies of blocks between instances rather than rereading the blocks from disk under certain conditions. When a consistent block is needed or a changed block is required on another instance, Cache Fusion can transfer the block image between the affected instances. RAC uses the private interconnect for inter-instance communication and block transfers. GCS manages the block transfers between the instances.
The GRD manages the locking or ownership of all resources that are not limited to a single instance in Oracle RAC. The GRD is composed of the GCS, which handles the data blocks, and the GES, which handles the enqueues and other global resources.
Cache Fusion doesn’t require you to tune any parameters; Oracle Database dynamically allocates all Cache Fusion resources. Dynamic mastering of resources, which keeps mastership local to the data blocks, provides a high level of performance.
Each process has a set of roles, which we will study in detail in the following sections. In Oracle RAC, the library cache and shared pool are globally coordinated. All resources are managed with locks, and key background processes manage those locks. GCS and GES use the following processes to manage the resources. These Oracle RAC–specific background processes and the GRD collaborate to enable Cache Fusion:
- LMS Global Cache Services process
- LMON Global Enqueue Services Monitor
- LMD Global Enqueue Services daemon
- LCK0 Instance Enqueue process
- ACMS Atomic Controlfile to Memory Service
- RMSn Oracle RAC Management Processes
- RSMN Remote Slave Monitor
The LMON and LMD processes communicate with their partner processes on the remote nodes. Other processes may have message exchanges with peer processes on the other nodes (for example, PQ). The LMS process, for example, may directly receive lock requests from remote foreground processes.
The following sections explain the key functions of the various Oracle RAC–specific background processes.
LMS: Global Cache Services Process
LMS is a process used in Cache Fusion. LMS maintains records of the status of the datafiles and the cached blocks, and it stores that information in the GRD. The acronym is derived from the Lock Manager Server process. LMS enables consistent copies of blocks to be transferred from a holding instance’s buffer cache to a requesting instance’s buffer cache without a disk write, under certain conditions. It also retrieves requests from the server queue that are queued by LMD and performs the requested lock operations.
LMS also rolls back any uncommitted transactions for any blocks that are being requested for a consistent read by a remote instance. LMS processes also control the flow of messages between instances. Each instance can have up to 10 LMS processes, though the actual number of LMS processes varies according to the amount of messaging traffic between nodes. You can control the number of LMS processes by setting the GCS_SERVER_PROCESSES initialization parameter. If this parameter is not set manually, the number of LMS processes automatically started during instance startup is a function of the CPU_COUNT of that node and is usually adequate for most types of applications. It is only under special circumstances that you may need to tweak this parameter to increase the default number of LMS processes.
LMS processes can also be started dynamically by the system based on demand; this is controlled by the hidden parameter _lm_dynamic_lms, which is set to FALSE by default. In addition, Lock Manager Server requests for GCS resources are placed on a service queue by LMD and handled by the LMS process. LMS also handles global lock deadlock detection and monitors for lock conversion timeouts.
LMON: Global Enqueue Services Monitor
LMON is the Lock Monitor process and is responsible for managing the Global Enqueue Services (GES). It maintains the consistency of GCS memory in case of process death. LMON is also responsible for cluster and lock reconfiguration when an instance joins or leaves the cluster. It also detects instance death and listens for local messages. The LMON process generates a detailed trace file that tracks instance reconfigurations.
The background LMON process monitors the entire cluster to manage global resources. LMON manages instance deaths and the associated recovery for any failed instance. In particular, LMON handles the part of recovery associated with global resources. LMON-provided services are also known as Cluster Group Services (CGS).
LMD: Global Enqueue Services Daemon
LMD is the daemon process that manages enqueue manager service requests for the GCS. The acronym LMD refers literally to the Lock Manager Daemon, the term used for the process in OPS. This agent process manages requests for resources to control access to blocks. The LMD process also handles deadlock detection and remote resource requests, which originate from other instances.
LCK0: Instance Enqueue Process
The Lock (LCK) process manages non–Cache Fusion instance resource requests and cross-instance call operations for shared resources. It also builds a list of invalid lock elements and validates lock elements during recovery. An instance needs only a single LCK process because the primary lock-management functionality is handled by the LMS processes.
RMSn: Oracle RAC Management Processes
The RMSn processes are responsible for performing the Oracle RAC manageability functions, such as creating the necessary RAC resources (for example, when you add a new instance to a cluster).
RSMN: Remote Slave Monitor
The RSMN process monitors and manages the creation of, and communication with, background slave processes on remote instances.
This article introduced the various building blocks of Oracle RAC. Oracle Grid Infrastructure provides the foundation for Oracle RAC scalability and availability. Because storage is shared across all the nodes, it is very important to establish redundancy at the disk level, and ASM helps provide a very scalable and highly available storage subsystem.
We then discussed storage and how the network components play a vital role in building the cluster. We looked at the different options for configuring the interconnects, which are key to supporting scalability and availability. Bonded network interfaces at the operating system level provide the redundancy and scalability required for cluster operations, and redundant network switches should also be part of a high availability architecture.
Finally, we examined resource coordination and the kernel components involved in cluster-wide operations. Global Cache Services and Global Enqueue Services ensure that resources are properly coordinated, and they synchronize database operations. With this background in mind, we’ll begin preparing the hardware to install Oracle RAC in the next article.