Azure Storage Primer

Introduction

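Microsoft has a number of different types of Azure Storage offerings. The ones covered in this primer are:

  • Azure Files
  • Azure Disk Storage
  • Blob Storage
  • Azure Data Lake Storage
  • Azure Archive Storage
  • Avere vFXT for Azure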

I’m not going to include Queue or Table Storage here.

Figure 1: Azure Storage Architecture

Azure Files

These are fully managed, cross-platform SMB 3.0 file shares that you can mount from Windows, Linux or macOS. You can also cache these cloud file shares on local Windows Servers using Azure File Sync for better performance (due to less network latency).

You can choose between Standard file storage and Premium file storage. Standard file shares are deployed in a General Purpose v2 storage account, which is a shared pool of storage in which you can also deploy other storage resources, such as blobs or queues. Premium file storage is solid-state drive (SSD) based storage, designed to support I/O-intensive workloads that require file share semantics, with higher sequential throughput and lower latency than Standard storage.

Redundancy options

  • Locally redundant storage (LRS)
  • Zone-redundant storage (ZRS)
  • Geo-redundant storage (GRS)
  • Read-access geo-redundant storage (RA-GRS)
  • Geo-zone-redundant storage (GZRS)
  • Read-access geo-zone-redundant storage (RA-GZRS)

Premium file storage is currently only available with LRS. Standard file storage is available with LRS, ZRS, and GRS.

Azure Disk Storage

These are persistent managed disks for Azure VMs with different levels of capacity and performance. Azure VMs have one permanently attached OS disk that contains the boot volume and a temporary disk (which is not a managed disk). Data disks are managed disks that are attached to a VM. The size and type of the virtual machine determine how many data disks you can attach to it. They also determine what kind of storage you can use to host the data disks.

Azure managed disks currently have four disk types. These include Ultra disk, Premium SSD, Standard SSD, and Standard HDD.

                  Ultra disk    Premium SSD   Standard SSD   Standard HDD
Disk type         SSD           SSD           SSD            HDD
Max disk size     64 TB         32 TB         32 TB          32 TB
Max throughput    2,000 MB/s    900 MB/s      750 MB/s       500 MB/s
Max IOPS          160,000       20,000        6,000          2,000
Table 1: Disk Type Comparison

The actual performance of an Azure managed disk is constrained by the provisioned disk size and by the Azure VM size. Tim Radney recently wrote about this in his article: The Importance of Selecting the Proper Azure VM Size. There are also different performance limits based on the disk type. I also wrote about some of these issues in my article: Azure Virtual Machines for SQL Server Usage.
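
Here is a minimal sketch of how those limits stack up, assuming the effective ceiling is simply the lowest of the applicable caps. The disk-type maximums come from Table 1. The provisioned-disk and VM-level caps in the example call are made-up placeholder numbers, not figures for any real disk size or VM size, and the function name is my own.

```python
# Disk-type maximums taken from Table 1: (max IOPS, max throughput in MB/s).
DISK_TYPE_MAX = {
    "Ultra disk": (160_000, 2_000),
    "Premium SSD": (20_000, 900),
    "Standard SSD": (6_000, 750),
    "Standard HDD": (2_000, 500),
}


def effective_limits(disk_type, disk_iops, disk_mbps, vm_iops, vm_mbps):
    """Return the (IOPS, MB/s) ceiling for one disk attached to one VM.

    Assumption: whichever cap is lowest wins, whether it comes from the
    disk type, the provisioned disk size, or the VM size.
    """
    type_iops, type_mbps = DISK_TYPE_MAX[disk_type]
    return (
        min(type_iops, disk_iops, vm_iops),
        min(type_mbps, disk_mbps, vm_mbps),
    )


# Hypothetical example: a Premium SSD provisioned at 5,000 IOPS / 200 MB/s,
# attached to a VM size that caps uncached disk I/O at 3,200 IOPS / 48 MB/s.
print(effective_limits("Premium SSD", 5_000, 200, 3_200, 48))
```

In this made-up case the VM size is the bottleneck, so paying for a faster disk would not help until you also moved to a larger VM size.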

Blob Storage

Blob storage is scalable object storage for unstructured data. This is typically text or binary data, such as images, documents, or streaming video and audio. You can access objects in Blob storage via HTTP/HTTPS using many different client libraries.

There are three types of resources associated with Blob storage. First, you have a storage account, with one or more containers in the account, then you have multiple blobs in a container. A container is similar to a directory in a file system.
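
To make the account, container, and blob hierarchy concrete, here is a small sketch that builds the HTTPS address of a blob from those three names. It assumes the default public-cloud endpoint pattern (account name followed by blob.core.windows.net). The account, container, and blob names are hypothetical, and the helper function is my own.

```python
from urllib.parse import quote


def blob_url(account, container, blob_name):
    """Build the HTTPS address of a blob: account -> container -> blob.

    Assumes the default public-cloud blob endpoint for the account.
    """
    # Blob names can contain '/' to mimic folders, so leave it unescaped
    # and percent-encode everything else that is not URL-safe.
    encoded_name = quote(blob_name, safe="/")
    return f"https://{account}.blob.core.windows.net/{container}/{encoded_name}"


# Hypothetical names, for illustration only.
print(blob_url("mystorageacct", "backups", "2020/full backup.bak"))
```

This only shows how a blob is addressed. Actually reading or writing the blob still requires authorization (for example, a shared access signature or an Azure AD token), which is left out here.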

Additionally, there are three different types of blobs in Azure Storage. Block blobs store text and binary data up to roughly 4.7 TB in size. Append blobs are made up of blocks, like block blobs, but they are optimized for append operations, so they are good for things like logs. Page blobs store random access files up to 8 TB in size. They are meant to store virtual hard drive (VHD) files that serve as disks for Azure VMs.

Access Tiers

Azure Blob Storage has three different access tiers, which are Hot, Cool, and Archive. These access tiers have different costs associated with them.

  • Hot is optimized for storing data that is accessed frequently
  • Cool is optimized for storing data that is infrequently accessed and stored at least 30 days
  • Archive is optimized for storing data that is rarely accessed and stored at least 180 days

Keep in mind that there are early deletion charges for the Archive and Cool access tiers. If you store a blob in the Archive tier and then delete it (or move it to a warmer tier) before 180 days, you will be charged a pro-rated early deletion fee. The same thing will happen with the Cool access tier before 30 days.
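
Here is a rough sketch of how a pro-rated early deletion fee could be estimated. It assumes that you are charged for the days remaining in the minimum retention period at the tier's normal storage rate, and that a month is 30 days for pro-rating purposes. The price in the example is a made-up placeholder, so check the pricing page for real numbers.

```python
# Minimum retention periods from the access tier list above, in days.
MIN_RETENTION_DAYS = {"Cool": 30, "Archive": 180}


def early_deletion_fee(tier, days_stored, size_gb, price_per_gb_month):
    """Estimate the pro-rated early deletion fee for one blob.

    Assumptions: the fee covers the days left in the minimum retention
    period, billed at the tier's storage price, using a 30-day month.
    """
    min_days = MIN_RETENTION_DAYS.get(tier, 0)  # Hot has no minimum
    remaining_days = max(min_days - days_stored, 0)
    daily_price = price_per_gb_month / 30
    return round(remaining_days * daily_price * size_gb, 2)


# Hypothetical example: 1,000 GB deleted from Archive after 45 days,
# at a placeholder price of 0.002 per GB per month.
print(early_deletion_fee("Archive", 45, 1_000, 0.002))
```

With these placeholder numbers, 135 days of the 180-day minimum remain, so you would pay roughly four and a half months of Archive storage for data you no longer have in that tier.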

Azure Blob storage also has two different performance tiers, which are Standard performance and Premium performance. The Premium performance tier is only offered in select regions, and it stores data on SSDs rather than traditional magnetic hard drives, at a higher cost.

Microsoft has a useful performance and scalability checklist for Blob storage here.

Azure Data Lake Storage

Azure Data Lake Storage (ADLS) is a scalable data lake storage solution for big data analytics. Data Lake Storage Gen2 extends Azure Blob Storage capabilities and is optimized for analytics workloads. It went GA on February 9, 2019.

Figure 2: Azure Data Lake Storage Gen 2

Azure Archive Storage

Archive storage is the lowest priced Azure Storage tier. It automatically encrypts data at rest with 256-bit AES, and it lets you have another lower priced storage tier after your hot and cool tiers. This offering is suitable for things like long-term backup retention and off-site data backup. Any data in your archive tier is actually considered offline, and cannot be read or modified (except for its metadata properties).

If you need data from your archive tier, you will have to either rehydrate it to an online tier (hot or cool) or copy it to an online tier. There are currently two rehydration priorities, which are Standard and High (which is in preview status). With Standard priority, the request will be processed in the order it was received, and could take as long as 15 hours. A High priority request will get priority over Standard requests, and it may finish in less than one hour, depending on the size of the blob.

If you don’t want to rehydrate your archive blob in place, you can copy it to an online tier (hot or cool) in the same storage account. You would use a Copy Blob operation to do this, and you can set the optional rehydrate priority to Standard or High to have some control over how long it will take. Depending on the archive blob size and rehydration priority, this could take many hours. Using High priority is going to be more expensive.
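
As a sketch of what that Copy Blob request carries, the helper below assembles the request headers that select the destination tier and the rehydrate priority. The header names are the ones I understand the Blob service REST API to use for this operation. The function itself is hypothetical, it does not send anything, and a real request also needs authorization and version headers that are left out here.

```python
ONLINE_TIERS = {"Hot", "Cool"}
REHYDRATE_PRIORITIES = {"Standard", "High"}


def copy_blob_rehydrate_headers(source_url, target_tier="Hot", priority="Standard"):
    """Build the tier-related headers for copying an archived blob.

    This only assembles a dictionary. Authorization, the x-ms-version
    header, and the actual HTTP PUT to the destination blob are omitted.
    """
    if target_tier not in ONLINE_TIERS:
        raise ValueError(f"target tier must be one of {sorted(ONLINE_TIERS)}")
    if priority not in REHYDRATE_PRIORITIES:
        raise ValueError(f"priority must be one of {sorted(REHYDRATE_PRIORITIES)}")
    return {
        "x-ms-copy-source": source_url,        # the archived source blob
        "x-ms-access-tier": target_tier,       # online tier for the new copy
        "x-ms-rehydrate-priority": priority,   # Standard or High
    }


# Hypothetical source blob address, for illustration only.
headers = copy_blob_rehydrate_headers(
    "https://mystorageacct.blob.core.windows.net/archive/old-backup.bak",
    target_tier="Cool",
    priority="High",
)
print(headers)
```

In practice you would normally let one of the Azure Storage client libraries build and sign this request for you rather than setting the headers by hand.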

Avere vFXT for Azure

This offering is aimed primarily at High Performance Computing (HPC). As you might expect, it is a high performance caching service that is most useful for read-heavy workloads. It uses Standard_E32s_v3 Azure VM instances to build a caching cluster in the cloud that can be used in front of other Azure storage types or in front of on-premises storage.

Standard_E32s_v3 instances are memory optimized VMs that have Premium Storage and Premium Storage Caching. This class of VMs has either Intel Xeon Platinum 8171M “Skylake-SP” processors or Intel Xeon E5-2673 v4 “Broadwell-EP” processors. These VMs have 32 vCPUs and 256 GB of RAM, and you can have up to 24 nodes in a cluster.

Figure 3: Example Usage for Avere vFXT

Final Words

This is just a quick overview of the different Azure Storage offerings and some of their capabilities. You can check the pricing for the different offerings by region here. The pricing for Azure Managed Disks is here.

If you are going to use Azure Managed Disks for SQL Server in an Azure VM (IaaS), I strongly advise that you do some disk benchmark testing. You can do this with free tools like CrystalDiskMark and Microsoft DiskSpd after you have provisioned the disks, but before you even install SQL Server.

Finally, after you have installed SQL Server, you should use my SQL Server Diagnostic Information queries during your pre-production testing to check your disk performance from SQL Server’s perspective. It is always better to find out about disk performance issues before you go live in production!
