When data is not in memory, it is at rest, and must be saved somewhere. Storage technology has evolved rapidly over the past few years, so we no longer think of storage as a mechanical hard drive containing one or more spinning metal disks with a magnetic surface. But, old habits die hard, and colloquially we still refer to a nonvolatile storage subsystem as “the disk,” even if it might take another form. In this book, however, we refer to it as a “drive.”
In the context of SQL Server 2017 , the storage subsystem should have low latency, so that when the database engine accesses the drive to perform reads and writes, those reads and writes should complete as quickly as possible. In the following list, we present some commonly used terms with respect to storage devices.
- Drive. The physical storage device. This might be a mechanical drive, a solid-state drive with the same form-factor as a mechanical drive, or a card that plugs directly into the motherboard.
- Volume. A logical representation of storage, as viewed by the OS. This might be one drive, part of a drive, or a logical section of a storage array. On Microsoft Windows, a volume usually gets its own drive letter or mount point.
- Latency. Measured in milliseconds, latency is how long it takes for data to be read from a drive (seconds per read), and written to a drive (seconds per write).
- IOPS. Input/output operations per second, or IOPS, is the number of reads and writes per second. A storage device might have differing performance depending on whether the IOPS are sequential or random. IOPS are directly related to latency by means of the queue depth.
- Queue depth. The number of outstanding read and write requests in a storage device’s request queue. The deeper the queue depth, the faster the drive.
SQL Server performance is directly related to storage performance. The move toward virtualization and shared storage arrays has placed more emphasis on random data access patterns. Low latency and high random IOPS will thus benefit the average SQL Server workload.
In the next two chapters, we go into more detail about the preferred storage configuration for SQL Server.
Types of storage in MS SQL Server 2017
Nonvolatile storage can be split up into two main areas: mechanical and solid-state.
Mechanical hard drives
Traditional spinning disks have a built-in latency, called seek time, due to their shape and physical nature. The read/write head is mounted on an arm that must scan the surface of the disk as it spins, seeking a particular area to perform the I/O operation. If the data on the spinning disk is fragmented, it can take longer to access because the head must skip around, finding data or free space.
The standard interface for mechanical drives is Serial ATA (SATA) or Serial Attached SCSI (SAS).
As spinning disks increase in capacity, the tracks between data become narrower, which causes performance to decrease, and increases the likelihood of mechanical failure or data corruption. The limits are pushed because of the rotational energy in the disk itself, so there is a physical speed limit to the motor.
In other words, mechanical disks grow bigger but slower and more prone to failure.
Solid-state technology, which makes use of flash memory, eliminates seek time entirely because the path to each cell where the data is stored is almost instantaneous. This is what makes solid-state storage so much faster than mechanical storage.
Solid-state storage devices can take many different forms. The most common in consumer devices is a 2.5-inch enclosure with a SATA interface, which was common with mechanical laptop drives. This accommodates a drop-in replacement of mechanical storage.
In server architecture, however, flash memory can take several forms. For local storage, they make use of the Peripheral Component Interconnect Express (PCIe) interface and plug directly into the motherboard. An example of this is Non-Volatile Memory Express (NVMe).
As the technology evolves, the performance will only improve as capacity grows. Solid state is not perfect though; data can be written to a particular cell only a certain number of times before it fails. You might have experienced this yourself with thumb drives, which tend to fail after heavy usage. Algorithms to balance writes across cells, called wear-leveling, help to extend the lifespan of a solid-state device.
Another problem with flash memory is write-amplification. On a mechanical drive, if a file is overwritten, the previous file is marked for deletion, but is not actually deleted from the disk surface. When the drive needs to write to that area again, it overwrites the location without removing what was there before.
Solid-state drives must erase the location in question before writing the new data, which has a performance impact. The size of the cells might also require a larger area to be erased than the file itself (if it is a small file), which compounds the performance impact. Various techniques exist to mitigate write amplification, but this does reduce the lifespan of flash memory.
The performance problems with mechanical disks, and the lifespan problems with both mechanical and solid-state drives, can be mitigated by combining them into drive arrays, to reduce the risk of failure by balancing the load and increase performance.
Configuring the storage layer
Nonvolatile storage can stand alone, in the form of Direct-Attached Storage, or be combined in many ways to provide redundancy or consolidation, perhaps even offering different levels of performance in order to manage costs better. For example, archive data might not need to be stored on the fastest available drive if it is accessed infrequently.
Direct-Attached Storage (DAS) is plugged directly into the system accessing it. Also called local storage, it can comprise independent mechanical hard drives, solid-state drives, tape drives for backups, CD and DVD-ROM drives, or even enclosures containing storage arrays.
DAS has a lower latency than a Storage-Area Network or Network-Attached Storage (more on these later in the chapter) because there is no network to traverse between the system and the storage. However, it cannot be shared with other systems, unless the local file system is shared across the network using a protocol such as Server Message Block (SMB) 3.0.
For SQL Server, DAS comprising flash storage (solid-state) is preferred for TempDB, which is also supported (and recommended) in a Failover Cluster Instance. You can also use DAS for the buffer pool extension.
Storage arrays and RAID
Combining drives in an enclosure with a controller to access each drive, without any thought to redundancy or performance, is called JBOD (colloquially, “just a bunch of disks”). These drives might be accessed individually or combined into a single volume.
When done correctly, combining drives into an array can increase overall performance and/or lower the risk of data loss should one or more of the drives in the array fail. This is called Redundant Array of Independent Disks (RAID).
RAID offers several levels of configuration, which trade redundancy for performance. More redundancy means less raw capacity for the array, but this can reduce data loss. Faster performance can bring with it data loss.
Striping without parity (RAID 0) uses multiple drives to improve raw read/write performance, but with zero redundancy. If one drive fails, there is significant chance of catastrophic data loss across the entire array. JBOD configurations that span across drives fall under this RAID level.
Mirroring (RAID 1) uses two drives that are written to simultaneously. Although there is a slight write penalty because both drives must save their data at the same time, and one might take longer than the other, the read performance is nearly double that of a single drive because both drives can be read in parallel (with a small overhead caused by the RAID controller selecting the drive and fetching the data). Usable space is 50 percent of raw capacity, and only one drive in the array can be lost and still have all data recoverable.
Striping with parity (RAID 5) requires an odd number of three or more drives, and for every single write, one of the drives is randomly used for parity (a checksum validation). There is a larger write penalty because all drives must save their data and parity must be calculated and persisted. If a single drive is lost from the array, the other drives can rebuild the contents of the lost drive, based on the parity, but it can take some time to rebuild the array. Usable space is calculated as the number of drives minus one. If there are three drives in the array, the usable space is the sum of two of those drives, with the space from the third used for parity (which is evenly distributed over the array). Only one drive in the array can be lost and still have full data recovery.
Combinations of the base RAID configurations are used to provide more redundancy and performance, including RAID 1+0 (also known as RAID 10), RAID 0+1, and RAID 5+0 (also known as RAID 50).
In RAID 1+0, two drives are configured in a mirror (RAID 1) for redundancy, and then each mirror is striped together (RAID 0) for performance reasons.
In RAID 0+1, the drives are striped first (RAID 0), and then mirrored across the entire RAID 0 set (RAID 1). Usable space for RAID 0+1 and 1+0 is 50 percent of the raw capacity.
To ensure full recovery from failure in a RAID 1+0 or 0+1 configuration, an entire side of the mirror can be lost, or only one drive from each side of the mirror can be lost.
In RAID 5+0, a number of drives (three or more) is configured in a RAID 5 set, which is then striped (with no parity) with at least one other RAID 5 set of the same configuration. Usable space is (x – 1) / y, where x is the number of drives in each nested RAID 5 set, and y is the number of RAID 5 sets in this array. If there are nine drives, six of them are usable. Only one drive from each RAID 5 set can be lost with full recovery possible. If more than one drive in any of the RAID 5 sets is lost, the entire 5+0 array is lost.
SQL Server 2017 requires the best performance from a storage layer as possible. When looking at RAID configurations, RAID 1+0 offers the best performance and redundancy.
Centralized storage with a Storage-Area Network
A Storage-Area Network (SAN) is a network of storage arrays that can comprise tens, hundreds, or even thousands of drives (mechanical or solid-state) in a central location, with one or more RAID configurations, providing block-level access to storage. This reduces wasted space, and allows easier management across multiple systems, especially for virtualized environments.
Block-level means that the OS can read or write blocks of any size and any alignment. This offers the OS a lot of flexibility in making use of the storage.
You can carve the total storage capacity of the SAN into logical unit numbers (LUNs), and each LUN can be assigned to a physical or virtual server. You can move these LUNs around and resize them as required, which makes management much easier than attaching physical storage to a server.
The disadvantage of a SAN is that you might be at the mercy of misconfiguration or a slow network. For instance, the RAID might be set to a level that has poor write performance, or the blocks of the storage are not aligned appropriately.
Storage administrators might not understand specialized workloads like SQL Server, and choose a performance model that satisfies the rest of the organization to reduce administration overhead but which penalizes you.
Network-Attached Storage (NAS), is usually a specialized hardware appliance connected to the network, typically containing an array of several drives, providing file-level access to storage.
Unlike the SAN’s block-level support, NAS storage is configured on the appliance itself, and file sharing protocols (such as SMB, Common Internet File System [CIFS] and Network File System [NFS]) are used to share the storage over the network.
NAS appliances are fairly common because they provide access to shared storage at a much lower monetary cost than a SAN. You should keep in mind security considerations regarding file-sharing protocols.
Windows Server 2012 and later support Storage Spaces, which is a way to manage local storage in a more scalable and flexible way than RAID.
Instead of creating a RAID set at the storage layer, Windows Server can create a virtual drive at the OS level. It might use a combination of RAID levels, and you can decide to combine different physical drives to create performance tiers.
For example, a server might contain 16 drives. Eight of them are spinning disks, and eight are solid state. You can use Storage Spaces to create a single volume with all 16 drives, and keep the active files on the solid-state portion, increasing performance dramatically.
SMB 3.0 file share
SQL Server 2017 supports storage located on a network file share that uses the SMB 3.0 protocol or higher because it is now fast and stable enough to support the storage requirements of the Database Engine (performance and resilience). This means that you can build a Failover Cluster Instance (see the section on this later in the chapter) without shared storage such as a SAN.
Network performance is critically important, though, so we recommend a dedicated and isolated network for the SMB file share, using network interface cards that support Remote Direct Memory Access (RDMA). This allows the SMB Direct feature in Windows Server to create a low-latency, high-throughput connection using the SMB protocol.
SMB 3.0 might be a feasible option for smaller networks with limited storage capacity and a NAS, or in the case of a Failover Cluster Instance without shared storage.