Microsoft's Azure SQL Data Warehouse is a massively parallel processing (MPP) data warehousing appliance built for any volume of relational data and provides integration to Hadoop. Azure SQL Data Warehouse can provide up to 100x performance gains over other SQL Server platforms. This is the MPP platform that provides linear scalability for when data volumes grow and the number of users increases.
Azure SQL Data Warehouse is designed to parallelize and distribute the processing across multiple Symmetric Multi-Processing (SMP) compute nodes. Azure SQL Data Warehouse is only available as part of Microsoft's Analytics Platform System (APS) appliance.
Azure SQL Data Warehouse is a shared-nothing architecture, which means each processor has its own operating system, memory and set of disks. Nothing is shared! Data is “horizontally partitioned” across nodes. This means that each node has a subset of the rows from each table in the database. Each node is then responsible for processing only the rows on its own disks.
Above is the information about Microsoft's Azure SQL Data Warehouse, which is Microsoft's MPP system.
A Symmetric Multi-Processing system has multiple processors for extra power, but these processors share a single operating system, memory pool and they share access to the disks. This is a great architecture for speed, similar to a restaurant that is quick and organized, but it lack the ability for unlimited expansion. When there are too many cooks in the kitchen you need an MPP system that scales many SMP systems together as one parallel processing data warehouse.
A Symmetric Multi-Processing (SMP) system is what Microsoft is known for in their SQL Server suite of products. The only product that does not use SMP design is the new Azure SQL Data Warehouse. It uses a Massively Parallel Design (MPP).
What is Parallel Processing?
“After enlightenment, the laundry”
- Zen Proverb
“After parallel processing the laundry, enlightenment!”
- Azure SQL Data Warehouse Zen Proverb
Two guys were having fun on a Saturday night when one said, “ I’ve got to go and do my laundry.” The other said, “What?!” The man explained that if he went to the laundromat the next morning, he would be lucky to get one machine and then would be there all day. But, if he went on Saturday night he could get all the machines and he could do all his wash and dry in two hours. Now that’s parallel processing mixed in with a little dry humor!
The Basics of a Single Computer
“When you are courting a nice girl, an hour seems like a second. When you sit on a red-hot cinder, a second seems like an hour. That’s relativity.”
– Albert Einstein
Data on disk does absolutely nothing. When data is requested, the computer moves the data one block at a time from disk into memory. Once the data is in memory, it is processed by the CPU at lightning speed. All computers work this way. The "Achilles Heel" of every computer is the slow process of moving data from disk to memory. The real theory of relativity is to find out how to get blocks of data from the disk into memory faster!
Data in Memory is Fast as Lightning
“You can observe a lot by watching.”
– Yogi Berra
Once the data block is moved off of the disk and into memory, the processing of that block happens as fast as lightning. It is the movement of the block from disk into memory that slows down every computer. Data being processed in memory is so fast that even Yogi Berra couldn't catch it!
Parallel Processing of Data
"If the facts don't fit the theory, change the facts."
Big Data is all about parallel processing. Parallel processing is all about taking the rows of a table and spreading them among many parallel processing units. Above, we can see a table called Orders. There are 16 rows in the table. Each parallel processor holds four rows. Now they can process the data in parallel and be four times as fast. What Albert Einstein meant to say was, “If the theory doesn't fit the dimension table, change it to a fact."
A Table has Columns and Rows
The table above has 9 rows. Our small system above has three parallel processing units called distributions. Each distribution holds three rows. There are eight distributions per node. A four node system will have 32 distributions. Double your nodes and double your speed and power. The idea of parallel processing is to take the rows of a table and spread them across the distributions so each distribution can process their portion of the data in parallel.
The Azure SQL Data Warehouse has Linear Scalability
"A Journey of a thousand miles begins with a single step."
- Lao Tzu
The Azure SQL Data Warehouse was born to be parallel. With each query, a single step is performed in parallel by each distribution. An Azure SQL Data Warehouse system consists of a series of distributions that will work in parallel to store and process your data. This design allows you to start small and grow infinitely. If your Azure SQL Data Warehouse system provides you with an excellent Return On Investment (ROI), then continue to invest by purchasing more nodes (adds additional Distributions). Most companies start small, but after seeing what an Azure SQL Data Warehouse can do, they continue to grow their ROI from the single step of implementing an Azure SQL Data Warehouse system to millions of dollars in profits. Double your compute nodes and double your speeds. . . . Forever. The Azure SQL Data Warehouse actually provides a journey of a thousand smiles!
The Architecture of the Azure SQL Data Warehouse
“Be the change that you want to see in the world.”
- Mahatma Gandhi
The MPP Engine is the brains behind the entire operation. The user logs into the MPP Engine, and for each SQL query, the MPP Engine will come up with a plan to retrieve the data. It passes that compiled plan to each compute node, and each of 8 Distributions process their portion of the data. If the data is spread evenly, parallel processing works perfectly. This technology is relatively inexpensive. It might not "be the change", but it will help your company "keep the change" because costs are low. Microsoft's Azure SQL Data Warehouse uses both SMP and MPP technology. Each node is an SMP, but then many nodes are lined together to become one big MPP system.
Nexus is now Available on the Microsoft Azure Cloud
Why the Nexus Chameleon should be your query tool of choice:
- 1)Queries every major system
- 2)Provides visualization and automatically writes the SQL
- 3)Can perform cross-system joins with a few clicks of the mouse
- 4)Converts table structures and moves the table and data between systems
- 5)Compares and synchronizes databases
- 6)Can move an entire database of tables or views between systems
- 7)Has the "Garden of Analysis" to re-query answer sets inside your PC
- 8)Provides a dashboard of graphs and charts for answer sets
Download the Nexus for a free trial at site and use Nexus in-house or on the Microsoft Azure cloud.
The MPP Engine is the Optimizer
The brains behind all user queries lie in the MPP Engine. The MPP Engine receives the query, checks the syntax and the security and then comes up with a plan for the nodes to follow.
The Azure SQL Data Warehouse System
Above, is a pictorial of a Azure SQL Data Warehouse system. There is one Control Rack and many Data Racks.
The Azure SQL Data Warehouse System is Scalable
The Azure SQL Data Warehouse will take up at least two full racks of space, and you can add storage and compute capacity one data rack at a time. A data rack will contain between 8 to 10 compute servers. A great asset about the Azure SQL Data Warehouse is that it works on a wide variety of hardware. Vendors such as Bull, Dell, HP, and IBM provide the hardware, and Fibre Channel storage arrays come from vendors like EMC, HP, and IBM.
The control node controls the physical servers and guides them to work together, in parallel. It is the control node that acts as the optimizer and it accepts client query requests, and then creates the plan. It will then call upon one or more compute nodes to execute different parts of the query, often in parallel. The result set is then sent back to the user.
The Control Node
Think of the control node as the optimizer, or a conductor in the Azure SQL Data Warehouse orchestra of servers.
The Data Rack
Above, is a pictorial of an Azure SQL Data Warehouse Data Rack. This is where the data is stored and the parallel processing magic occurs.
The Landing Zone
This Landing Zone node is used to load data directly to the Azure SQL Data Warehouse.
The Backup Node
The Azure SQL Data Warehouse Backup node is used for backing up user databases.
Software as a Service (SaaS) and the Elastic Database
Software-as-a-service (SaaS) applications need versatility and flexibility and need the ability to focus on end-user solutions and not have to worry about managing databases, schemas and sizing requirements. The Microsoft Azure SQL Data Warehouse is designed to provide flexibility for the growth needed at the time. Different workloads that continually change over time have unpredictable database resource consumption. That is why the elastic database model provides users with the ability to pool resources to be leveraged among a single or groups of databases. The idea of using the resources you need, when you need them, and not having to worry about provisioning and predicting the unpredictable is priceless. The most important thing a system can do for its users is to give them the space they need.
Azure Data Lake
The future of computing is data! Not data on any particular system or structure, but simply data that resides anywhere in the enterprise. That is why Microsoft has created the Azure SQL Data Warehouse to store relational data in the cloud and the Azure SQL Data Lake unstructured data. A data lake is comprised of raw data that can be all types of data in its native format. This includes all data types and can consist of traditional structured data, unstructured data and semi-structured data. The Azure Data Lake is a data store for big data analytics. This great idea gives users the ability to have the best of all data worlds, thus mixing on-premise traditional systems with Hadoop HDFS file systems. In the Azure data lake, there are plenty of fish in the sea because this lake can store every type of data with no fixed limits on account size or file size. The Azure Data Lake is a Hadoop File System compatible with HDFS. It is integrated with Azure HDInsight, Revolution-R Enterprise, Hortonworks and Cloudera.
Azure Disaster Recovery
The Microsoft Azure cloud provides data availability with built-in replicas and a competitive 99.99% Service Level Agreement at the database level. Instead of worrying about a disaster, you can count on an estimated 360x lower disaster recovery objectives. Microsoft uses something called active geo-replication, which gives users the ability to create up to 4 readable secondaries in any Microsoft Azure region, and additionally they give users control when and where to failover. Currently, users have up to 35 days of backups available for recovery.
Security and Compliance
The Microsoft Azure cloud provides security and compliance-related tasks through a wide variety of features. Users can implement database level security such as database auditing, views, row-level security, data masking and encryption. And of course Microsoft Azure has independently verified cloud security and compliance through key cloud auditors as part of the scope of key Azure compliance certifications and approvals such as HIPAA BAA, E.U. Model Clauses, ISO/IEC 27001:2005 and FedRAMP. Some companies actually feel safer on the cloud than in their own on-premises data centers.
How to Get an EXPLAIN Plan
You can get an explain by placing the keyword EXPLAIN in front of any SQL. You can also hit the function key 6 (F6). If you are using the Nexus Chameleon you can click on the magnifying glass near the EXECUTE button.