First, I want to point out that several resources are available that do a good job of covering Azure Storage topics. I would like to add my contribution in hopes that it accelerates the spread of technical knowledge around this important core service. This blog was authored on 3/15/20, and it’s possible some of these features will change over time, so please keep that in mind.

Azure Storage is a foundational component of Azure. Azure PaaS and SaaS offerings such as Office 365 depend on and use Azure Storage. Azure Blob storage is a key component for big data use cases such as Azure HDInsight, and many Microsoft services work with storage behind the scenes. Because it is a core building block, it’s important to have a good overall understanding of the moving parts.

The Azure Storage offering you select, also referred to as the storage account type, dictates the underlying services that are offered. For example, if I create a storage account of type File storage, only the file service is available; I won’t be able to use Blob, Queue, or Table services. If I instead create a storage account of type General-Purpose V2, all storage services are supported (Blob, File, Queue, Table, Disk, and Data Lake Storage Gen2). For blob storage, I recommend the General-Purpose V2 storage account type because it offers the most options for performance, durability, and availability. Please see the following article for more details around storage account types and what you get with each one. After I create a storage account of type General-Purpose V2 in the Azure Portal, I see the sub-components available directly on the storage account admin page:
This blog series is focused on an intro to Azure Blob storage. From the picture above, Azure blobs are stored in containers: Azure Blob storage is a collection of containers and blobs. Think of blobs as files. Blobs are considered unstructured data, meaning they can store an object of any file type. A ton of use cases exist for the Azure Blob storage service. Customers use blob storage for storing backups and archive data and/or for hosting data for custom applications. To optimize how blobs are stored and accessed, Microsoft offers three types of blobs.
1. Block blobs are optimized for sequential I/O and are the most commonly used blob type. This is the blob type to use when you want to store normal files, and it is optimized for handling uploads and downloads of large files. When you upload a large file to the blob service, it is stored as a set of blocks. The block size is configurable, up to 100 MB per block. This is the blob type customers will typically use.
2. Append blobs are a variation of block blobs. An append blob is designed for append operations: new blocks are added to the end of the blob, and once added, existing blocks cannot be modified. It’s great for storing logs.
3. Page blobs are optimized for sparse reads and random access. Behind the scenes, Azure disks are built on page blobs. While it’s common for other storage providers to tap into page blobs, for most general usage scenarios block blobs are the recommended approach.
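To make the block mechanics above concrete, here is a minimal Python sketch of splitting a payload into blocks of at most 100 MB, the way a client divides a large file before staging and committing it as a block blob. The helper name and the splitting logic are my own illustration, not Azure SDK code:

```python
# Illustrative sketch only -- not Azure SDK code.
# Block blobs are uploaded as a series of blocks; each block can be
# up to 100 MB, and the last block may be smaller than the rest.

BLOCK_LIMIT = 100 * 1024 * 1024  # 100 MB maximum block size


def split_into_blocks(data: bytes, block_size: int = BLOCK_LIMIT) -> list[bytes]:
    """Split a payload into fixed-size blocks; the final block may be smaller."""
    if not 0 < block_size <= BLOCK_LIMIT:
        raise ValueError("block size must be between 1 byte and 100 MB")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]
```

For example, a 10-byte payload with a 4-byte block size yields blocks of 4, 4, and 2 bytes; joining the blocks back together reproduces the original payload, which mirrors how the service reassembles committed blocks into one blob.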
Blob Storage Hierarchy
It’s important to have a good understanding of the blob storage hierarchy. At each level in the hierarchy, configurable settings exist for the blob storage service. These are important decisions that warrant high-level architecture discussions. The basic blob storage hierarchy consists of the following:
Storage Account -> Containers -> Blobs
A single storage account can contain one or more containers, and containers can store one or more blobs. After creating a storage account, the blob service endpoint is how you manage blobs. In the example below, the blob service endpoint at each layer is the following:
Storage Account – https://sally.blob.core.windows.net
Container – https://sally.blob.core.windows.net/pictures
Blob – https://sally.blob.core.windows.net/pictures/img001.jpg
These are not URLs you give to a user and expect them to use in a browser session to list, upload, or download blobs. Rather, this is a REST endpoint for the blob service that exposes several APIs that are ideal for wiring up custom applications to interact with a specific storage account, container, and/or blob. While you can call the REST endpoints directly, Microsoft has simplified the process with SDKs available for various languages, including .NET, Java, Python, JavaScript, and more.
Access Tiers
Blobs can be stored in one of four different access tiers. The access tier chosen depends on factors such as how often data is accessed, costs, and performance. The access tiers are Premium, Hot, Cool, and Archive. These access tiers are only available for storage account types General Purpose v2 and Blob storage.
Each access tier has costs associated based on how much data is stored and how often it is accessed. While the Premium access tier has the highest cost per GB stored, it has the lowest cost for data access. On the opposite end of the spectrum, the Archive tier has the lowest cost per GB stored but the highest cost for data access. One additional item that stands out for Premium is that it’s backed by solid-state drives (SSDs); the other tiers are all backed by HDDs. The Premium tier is a perfect solution for high-transaction workloads or applications that require consistently low latency in the single-digit millisecond range. I will do a deep dive into access tiering and lifecycle management in a future blog post.
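As a rough illustration of how the tiers line up with access frequency, here is a sketch of decision logic. The function and its thresholds are my own simplification, loosely based on the minimum storage durations Microsoft documents for the Cool (30 days) and Archive (180 days) tiers; a real decision should also weigh storage, transaction, and retrieval pricing for your region:

```python
# Illustrative decision sketch only -- my own simplification,
# not an official tier-selection algorithm.

def suggest_access_tier(days_between_reads: float,
                        needs_low_latency: bool = False) -> str:
    """Suggest a blob access tier from expected read frequency."""
    if needs_low_latency:
        return "Premium"   # SSD-backed, consistent single-digit-ms latency
    if days_between_reads < 30:
        return "Hot"       # frequently accessed data
    if days_between_reads < 180:
        return "Cool"      # infrequent access, stored at least 30 days
    return "Archive"       # rarely accessed; retrieval takes hours
```

For example, data read daily lands in Hot, data read a couple of times a year lands in Cool, and data kept purely for compliance lands in Archive.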
Blob Storage Durability and Availability
Durability and availability are two closely related topics when it comes to protecting Azure Storage data. Not only do customers want to prevent data loss (durability), they also want to ensure data is highly available in case of an outage (availability). Both durability and availability requirements usually map to a specific Azure Storage replication option. The idea is to create a certain number of replicas of your data and place them in different areas. The lowest-cost option is locally redundant storage, which provides excellent durability; however, if the data center gets hit with a natural disaster, you could potentially lose that data. As the replicas are spread further apart based on the replication option, costs, durability, and availability all increase. To ensure high availability, you can go with a geo-redundancy option, which places replicas of your data in a paired region. If a region were to go down, you would fail over to the paired region and maintain availability. The redundancy options are the following:
1. Locally Redundant Storage (LRS)
This storage offering creates three replicas of your data within a single data center in a single region. The writes are synchronous, which provides a very high level of data consistency. We place these three copies across different fault domains and upgrade domains to avoid a single point of failure. This means that if something fails on a rack (fault domain), it doesn’t impact the replicas of the data in the other domains.
2. Zone Redundant Storage (ZRS)
Zone redundant storage also creates three copies of your data. The difference is that each copy is stored in its own availability zone. Each availability zone is a few miles away from the others, with independent power and cooling. This provides greater durability and availability against localized datacenter outages; however, it doesn’t protect you in the unlikely scenario of a regional disaster.
3. Geo Redundant Storage (GRS) and Read Access Geo Redundant Storage (RA-GRS)
Geo-redundant storage provides six replicas of your data: three local replicas and three replicas in a different region. Replication to the secondary region is asynchronous, and this option is a great fit for mission-critical data that must maintain availability in the very rare event of a regional outage. If your application also needs read access to the data in the secondary region, you can go with the RA-GRS option.
4. Geo Zone Redundant Storage (GZRS) and Read Access Geo Zone Redundant Storage (RA-GZRS)
This is the same as geo-redundant storage except that the three copies in the primary region are spread across availability zones. So locally, you get the same sort of protection as zone redundant storage, with the bonus of cross-regional availability. This option is currently in preview.
In the event of a regional failure with options 3 or 4 above, you can fail over to the paired region; customer-initiated account failover is currently in preview, and Microsoft has lots of details available here. When configuring geo-redundant replication, Microsoft dictates the region where the replicas are stored: for each region where GRS and GZRS are available, a paired region exists as the replication partner. Check the list here for more details. Finally, the availability of the above redundancy options depends on the storage account type. For example, the General-Purpose V2 storage account type supports all of the redundancy options above. To see which redundancy options are available for a given storage account type, review the summary of redundancy options here.
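The replica layouts described above can be summarized in a small lookup table. The structure and field names are my own; the copy counts come from the option descriptions in this section:

```python
# Illustrative summary of the redundancy options described above.
# (Table structure and field names are my own.)
REDUNDANCY = {
    "LRS":  {"copies": 3, "regions": 1, "zonal": False},
    "ZRS":  {"copies": 3, "regions": 1, "zonal": True},
    "GRS":  {"copies": 6, "regions": 2, "zonal": False},
    "GZRS": {"copies": 6, "regions": 2, "zonal": True},
}


def total_copies(option: str) -> int:
    """Return the total number of data replicas for a redundancy option."""
    return REDUNDANCY[option]["copies"]
```

Reading the table top to bottom mirrors the cost/protection spectrum: each step down adds either zone separation within the region or a second region entirely.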
Tools available to work with Blob Service
As I stated earlier in the blog post, the REST endpoints and SDKs are your entry point for custom solutions. For Azure administrators or cloud engineers, other options are also available for managing an Azure storage account. The most obvious is the Azure Portal. I will assume you are familiar with the Azure Portal, so let’s move on to some other tools.
PowerShell
The new Azure PowerShell Az module can be used to list, upload, and download blobs. To install the Azure PowerShell module, please check this link. Some of the blob-related cmdlets:
New-AzStorageContainer      # create a container
Set-AzStorageBlobContent    # upload a file to the blob service
Get-AzStorageBlob           # list blobs
Get-AzStorageBlobContent    # download a blob
For more information, click here.
Azure Storage Explorer
Azure Storage Explorer is a client application available across various platforms, including Windows, Mac, and Linux. You can manage several aspects of a storage account using this application. Some common blob operations include creating containers, listing blobs, and creating blobs. For more information, click here.
Azure CLI
Azure CLI is the command-line tool for managing Azure. You run Azure CLI in a shell, so for Windows that is either PowerShell or Command Prompt, and for Linux that would be Bash. The thing I like about Azure CLI is that the commands are the same regardless of which shell you use. Some of the blob-related commands:
az storage container create   # create a container
az storage blob upload        # upload a blob
az storage blob list          # list blobs in a container
az storage blob download      # download a blob
For more information, click here.
AzCopy
AzCopy is an excellent command-line tool for copying data to, from, and between storage accounts in a variety of ways. It’s available for Windows, Mac, and Linux. Some of the blob-related commands:
Create a container: azcopy make 'https://<storage-account-name>.blob.core.windows.net/<container-name>'
Upload a file: azcopy copy '<local-file-path>' 'https://<storage-account-name>.blob.core.windows.net/<container-name>/<blob-name>'
Download a file: azcopy copy 'https://<storage-account-name>.blob.core.windows.net/<container-name>/<blob-path>' '<local-file-path>'
For more information, click here.
This is a basic introduction to Azure Blob storage and a good starting place for Azure cloud architects and cloud engineers to gain some understanding of Azure Blob storage basics. I will dive into other areas in future blog posts, including data availability, redundancy, security, performance, and cost analysis.
Resources
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction
https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy
https://docs.microsoft.com/en-us/azure/storage/common/storage-disaster-recovery-guidance
https://docs.microsoft.com/en-us/azure/storage/common/storage-account-overview
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers?tabs=azure-portal
Thank You,
Russ Maxwell, MSFT