We all know that cloud platforms such as Microsoft Azure offer several ways to store data. Even if you choose the wrong service, the platform’s agility still lets you migrate to the right one later. That is the view from a technical perspective. From a business perspective, a wrong decision affects both cost and the long-term transition plan. It is like resizing a virtual machine, which I wrote about here: it is easy to change to the virtual machine size that meets your demand without any data loss (as long as you don’t store your data on the temporary drive).
I had an interesting conversation with my students in an Azure training class, during which we came up with a set of considerations for choosing between Blobs and Files in the Azure Storage service. Based on my experience, I’d like to share some factors that may help if you face the same dilemma. This article should also help those who struggle to explain the differences between the two types of Azure Storage to customers.
Disclaimer: choosing the right overall data storage strategy in Azure is outside the scope of this article.
“Blob” may surprise you the first time you hear it. A blob (binary large object) can be anything: an image, an audio file, or any kind of multimedia object. A blob may or may not have an extension. If you have worked with SharePoint Remote Blob Storage (RBS) to externalize uploaded documents out of SQL Server (meaning the binaries are not in the MDF file), you can imagine what a blob looks like.
The example above shows that a blob does not need an extension. If you open such a file, Windows asks you which program to open it with. Ironically, there is no one-size-fits-all program that can open a blob, because no program knows what is stored in the blob or how it is structured. Such a file is called unstructured data.
“File” is a familiar term because we all work with files on a daily basis. A file can be a Microsoft Word document with a *.docx extension, a photo with *.png, or a SQL Server database backup in *.bak format. Typically it has an extension that tells you which program can read it. If you treat blob and file as interchangeable, both Azure Blobs and Azure Files can store your files. In other words, if all you need is to upload and retrieve files, both Azure Blobs and Azure Files meet your demand.
Now consider a real-world scenario, for example an on-premises file server. Which would you go with, Azure Files or Azure Blobs? A file server is used to share files across departments in your organization. When it comes to file sharing, end users should not have to open a browser and paste a file URI to get a file. Files need to be mapped locally on their computers. This is where Azure Files fits. Moreover, if you need to move your file server to Azure but don’t want to maintain an Azure virtual machine, Azure Files is a good choice because mapping is supported across operating systems such as Windows, macOS and Linux. Azure Blobs has no mapping feature. If you need one, you have to develop an application that syncs blobs to a local drive.
Azure Files is all about file storage and sharing. Imagine a development environment where every developer needs access to an IDE and tools without going to the Internet to download them. With Azure Blobs, you can only store the development tools and then give your team links to them. Each team member has to download the tools individually.
Large File Uploading
Handling large files is one of the things you must plan for in your application. A common approach is to split a large file into multiple smaller parts (chunks) and upload them to storage one by one. Splitting is not the only step in the upload process: all parts need to be merged back into a single file after the upload is finished. With this approach, a network interruption costs you only the chunk in flight rather than the whole upload. From a user experience perspective, it is considered good practice. So if your solution includes large file handling, which one should you choose: Azure Blobs or Azure Files?
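To make the split-and-merge idea concrete, here is a minimal Python sketch. It works on in-memory bytes rather than a real upload, and the function names `split_into_chunks` and `merge_chunks` are my own illustrations, not part of any Azure library.

```python
from typing import Iterator, List


def split_into_chunks(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive slices of `data`, each at most `chunk_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


def merge_chunks(chunks: List[bytes]) -> bytes:
    """Reassemble the chunks, in order, into the original payload."""
    return b"".join(chunks)


payload = bytes(range(256)) * 10             # 2,560 bytes of sample data
chunks = list(split_into_chunks(payload, 1024))
restored = merge_chunks(chunks)
```

In a real uploader each chunk would travel in its own request, so a failed request only requires that one chunk to be sent again.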
First, you need some basic knowledge of both. Azure Blobs has three blob types: block, page and append. I’d like to focus only on block blobs. A block is a single unit in a blob. A blob can contain many blocks, but no more than 50,000 blocks per blob. This basically means you can split a blob into up to 50,000 blocks to upload to Azure Blobs storage. The minimum size of a block is 64 KB and the maximum is 100 MB. If you look at a client library (for example the .NET library), one of the objects is the block blob, exposed through the CloudBlockBlob class. This class offers many things to work with a block blob. For example, the StreamWriteSizeInBytes property lets you set the block size, which helps when dealing with unstable network speed. Each block also carries metadata that lets you control it individually. For example, to make sure all blocks are successfully committed to a given blob, you can use Content-MD5. I’d highly recommend looking into two important operations when working with large files: Put Block and Put Block List.
This article gives you a comprehensive approach to uploading large files to Azure Blobs storage.
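The block limits above can be turned into a small planning helper. The following Python sketch only plans the blocks and does not call Azure. It assumes block IDs are Base64-encoded strings of equal length and that Content-MD5 is the Base64-encoded MD5 digest of the block. The helper names (`make_block_id`, `content_md5`, `plan_blocks`) are hypothetical.

```python
import base64
import hashlib
from typing import List, Tuple

MAX_BLOCKS_PER_BLOB = 50_000              # limit quoted in this article
MAX_BLOCK_SIZE = 100 * 1024 * 1024        # 100 MB limit quoted in this article


def make_block_id(index: int) -> str:
    """Base64-encode a zero-padded counter so every ID has the same length."""
    return base64.b64encode(f"{index:08d}".encode("ascii")).decode("ascii")


def content_md5(block: bytes) -> str:
    """Base64-encoded MD5 digest of one block, for a per-block integrity check."""
    return base64.b64encode(hashlib.md5(block).digest()).decode("ascii")


def plan_blocks(total_size: int, block_size: int) -> List[Tuple[str, int, int]]:
    """Return (block_id, offset, length) for every block of a blob."""
    if total_size < 0:
        raise ValueError("total_size must not be negative")
    if not 0 < block_size <= MAX_BLOCK_SIZE:
        raise ValueError("block_size is outside the allowed range")
    count = -(-total_size // block_size)  # integer ceiling division
    if count > MAX_BLOCKS_PER_BLOB:
        raise ValueError("blob would need more than 50,000 blocks")
    return [
        (make_block_id(i), i * block_size, min(block_size, total_size - i * block_size))
        for i in range(count)
    ]


plan = plan_blocks(total_size=250, block_size=100)
```

Each planned block would be sent with Put Block, and the ordered list of block IDs would then be committed with Put Block List.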
Is there a similar block concept for Azure Files? The answer is no. The single unit is a file, represented by the CloudFile class. There is no “chunk” counterpart we might wish for, such as a CloudChunkFile or CloudBlockFile class, as there is in Azure Blobs. If you need to split a file, you have to use an indirect method (e.g. the FileStream class) or a third-party library such as FineUploader (I used this free library in one of my SharePoint projects a couple of months ago). There are plenty of articles online showing how to split a file.
For large file uploading, Azure Blobs is clearly designed for the job. At this point, Azure Files is at a disadvantage.
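As a rough illustration of the indirect, stream-based approach, here is a Python sketch in which a file object plays the role that FileStream plays in .NET. It reads a stream piece by piece and reports the byte range each piece covers, so that each piece could be written to its own range of the target file. The function name `iter_ranges` is hypothetical, and nothing here talks to Azure Files.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_ranges(stream: BinaryIO, chunk_size: int) -> Iterator[Tuple[int, int, bytes]]:
    """Read `stream` sequentially and yield (start, end_inclusive, data)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = 0
    while True:
        data = stream.read(chunk_size)
        if not data:                      # empty read means end of stream
            break
        end = start + len(data) - 1
        yield start, end, data
        start = end + 1


source = io.BytesIO(b"x" * 10)            # stands in for open(path, "rb")
ranges = [(start, end) for start, end, _ in iter_ranges(source, 4)]
```

The point of the sketch is that with Azure Files the bookkeeping of offsets is yours to do, whereas block blobs give you blocks and a commit step out of the box.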
Client library and REST API
Thanks to today’s API model, REST API is supported in both services, even though Azure Files also uses SMB (Server Message Block). The REST API lets you work with blobs or files over HTTP/HTTPS. If you look at client libraries, Blobs has a competitive advantage because it supports more programming languages and frameworks than Files. Let’s have a look at the comparison table of client libraries:
|Library Supportability||Azure Blobs||Azure Files|
If your application is only a .NET-based web application, this is not a problem. But suppose the requirements go beyond that, say an iOS mobile app that lets end users upload and retrieve files. In that case Azure Blobs must be chosen. Azure Blobs gives you a little more confidence when working with a complex system, and your developers do not need to know .NET to work with it.
.NET skills are not mandatory for working with Microsoft Azure today. I have had a great opportunity to work with a Java team on an Azure project.
Knowing Azure’s quotas and limits is imperative when planning for Azure Storage. Azure Blobs and Azure Files each have their own scalability targets. While Blobs uses a container as the highest level of its hierarchy, Azure Files uses a share. Comparing maximum capacity, Blobs gives you 500 TB for a single blob container, while Files allows only 5 TB for a file share. Azure Blobs is at its best as a ‘file system’ in a Big Data implementation, where containers are organized to store different kinds of blob content (e.g. images, video, audio, unstructured blob data).
Comparing the maximum size of a single blob with that of a single file, Azure Blobs offers 4.75 TB while Azure Files allows 1 TB. This means you can upload a 4.75 TB blob to Azure Blobs by using a block blob (see the Large File Uploading section). Here is an extract of the scalability targets from Microsoft.
|Azure Blobs Resource||Target|
|Max size of single blob container||500 TiB|
|Max number of blocks in a block blob or append blob||50,000 blocks|
|Max size of a block in a block blob||100 MiB|
|Max size of a block blob||50,000 x 100 MiB (approx. 4.75 TiB)|
|Max size of a block in an append blob||4 MiB|
|Max size of an append blob||50,000 x 4 MiB (approx. 195 GiB)|
|Max size of a page blob||8 TiB|
|Max number of stored access policies per blob container||5|
|Target throughput for single blob||Up to 60 MiB per second, or up to 500 requests per second|
...and for Azure Files:
|Azure Files Resource||Target|
|Max size of a file share||5 TiB|
|Max size of a file in a file share||1 TiB|
|Max number of files in a file share||No limit|
|Max IOPS per share||1000 IOPS|
|Max number of stored access policies per file share||5|
|Maximum request rate per storage account||20,000 requests per second for files of any valid size|
|Target throughput for single file share||Up to 60 MiB per second|
|Maximum open handles per file||2000 open handles|
|Maximum number of share snapshots||200 share snapshots|
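The size limits in the two tables follow from simple multiplication, which this short Python sketch reproduces. The exact product for a block blob comes out slightly above the rounded 4.75 TiB figure quoted in the table.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MAX_BLOCKS = 50_000                                   # blocks per block or append blob

max_block_blob_tib = MAX_BLOCKS * 100 * MIB / TIB     # 100 MiB per block
max_append_blob_gib = MAX_BLOCKS * 4 * MIB / GIB      # 4 MiB per block
max_size_files_per_share = (5 * TIB) // (1 * TIB)     # 1 TiB files in a 5 TiB share

print(f"Block blob:  {max_block_blob_tib:.2f} TiB")   # 4.77 TiB
print(f"Append blob: {max_append_blob_gib:.1f} GiB")  # 195.3 GiB
print(f"Largest files per share: {max_size_files_per_share}")  # 5
```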
Even though Azure Files allows a file of up to 1 TB, I don’t think uploading 1 TB in one go, without any handling of network or connection failures, is good practice.
First, both Azure Blobs and Azure Files sit under the Azure Storage account level, so Storage Service Encryption (SSE) is supported for both. When your application writes a new blob or file to Azure Storage, the data is encrypted using 256-bit AES encryption. When encrypted data is read, it is decrypted before being returned to the requester. For calls over the REST API, both Azure Blobs and Azure Files are covered by enabling the Secure transfer required setting. This setting works at the storage account level and enforces HTTPS for every REST call.
Both Azure Blobs and Azure Files use Shared Access Signatures (SAS) for delegated access to blobs and files. As for authorization, neither supports Azure AD yet, which means Azure AD-based access to your blobs or files is not possible. Access is controlled only by the storage access key. With Azure Files, perhaps Azure File Sync is our hope. Azure File Sync can preserve and replicate ACLs, or Active Directory-based permissions, to all server endpoints it synchronizes to. There is no official announcement or road map stating that Azure File Sync works with Azure AD. Azure File Sync is also still in preview.
Although the encryption/decryption process is stated to have no performance impact, consider a case where you only need to encrypt individual blobs. Azure Blobs lets you do so with the BlobEncryptionPolicy class together with Azure Key Vault. Azure Files relies on the encryption built into the SMB 3.0 protocol.
Both Azure Blobs and Azure Files support CORS (Cross-Origin Resource Sharing) rules (before version 2015-02-21, the Files service did not). A CORS rule can be considered a security feature because it lets you whitelist the origin domains, HTTP methods and headers allowed in cross-origin requests. Supported elements include AllowedOrigins, AllowedMethods, AllowedHeaders, ExposedHeaders and MaxAgeInSeconds.
From a networking perspective, service endpoints give you more control over incoming network traffic to both Azure Blobs and Azure Files. Only selected virtual networks or IP ranges are allowed to read data stored in Azure Blobs or Azure Files. Although service endpoints work at the storage account level, I mention them here to complete the picture.
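To show how the rule elements fit together, here is a deliberately simplified Python model of a CORS rule. The field names mirror the elements listed above. The matching logic (exact origin match or `*`, case-insensitive method match) is my own simplification for illustration and is not a description of how Azure Storage evaluates rules internally. The origin `https://app.contoso.com` is a made-up example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CorsRule:
    allowed_origins: List[str]
    allowed_methods: List[str]
    allowed_headers: List[str] = field(default_factory=list)
    exposed_headers: List[str] = field(default_factory=list)
    max_age_in_seconds: int = 0


def is_allowed(rule: CorsRule, origin: str, method: str) -> bool:
    """Simplified check: the origin and the HTTP method must both be whitelisted."""
    origin_ok = "*" in rule.allowed_origins or origin in rule.allowed_origins
    method_ok = method.upper() in {m.upper() for m in rule.allowed_methods}
    return origin_ok and method_ok


rule = CorsRule(
    allowed_origins=["https://app.contoso.com"],   # hypothetical origin
    allowed_methods=["GET", "PUT"],
    max_age_in_seconds=3600,
)
```

With this rule, a GET from `https://app.contoso.com` passes the check, while the same request from any other origin does not.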
[Updated 07/08/2018] If you work in a financial institution and require immutable data to comply with national or corporate security regulations, only Azure Blobs supports you. Data stored in Azure Blobs can be kept in a non-erasable and non-rewritable state thanks to a new feature called Immutable Storage. For more information about this feature, read here.
From the security perspective, there is no silver bullet, and it is not easy to say which one offers better security. However, for a specific case I’d lean toward Azure Blobs with its blob encryption, whereas I’d have to find a way to encrypt my files before writing them to Azure Files. In a nutshell, at the storage account level Azure lets you encrypt end to end to help protect everything inside.
Recovery is part of service availability, for which you normally need a disaster recovery plan. If something bad happens (e.g. data corruption, accidental deletion), you can recover the lost data. As of this writing, the Azure Backup and Recovery service has no built-in automation to back up either Azure Blobs or Azure Files data.
To handle it yourself, you can download your data to your on-premises infrastructure (depending on the DR and IT service policy in your organization) or create a new storage account in another region to hold a copy. You have to write your own synchronization module to copy your data to either on-premises storage or another Azure storage account. Both Azure Blobs and Azure Files support this through the REST API:
- Azure Blobs: use Copy Blob (https://docs.microsoft.com/en-us/rest/api/storageservices/Copy-Blob) to asynchronously copy your blob to the destination storage account.
- Azure Files: use Copy File (https://docs.microsoft.com/en-us/rest/api/storageservices/copy-file) to asynchronously copy your file to the destination storage account.
If the destination storage account is in a different region, you are charged for data transfer.
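For orientation, the following Python sketch assembles the rough shape of such a copy request without sending it. It assumes the copy is a PUT against the destination URL with the source passed in the `x-ms-copy-source` header. Authentication headers are omitted on purpose, the `x-ms-version` value is only an assumed example, and the account, container and blob names are made up.

```python
from typing import Dict, Tuple
from urllib.parse import quote

API_VERSION = "2017-04-17"   # assumed example; use the version your account targets


def build_copy_request(
    service: str, account: str, path: str, source_url: str
) -> Tuple[str, str, Dict[str, str]]:
    """Return (method, url, headers) for a copy into `path` on the destination account."""
    if service not in ("blob", "file"):
        raise ValueError("service must be 'blob' or 'file'")
    # quote() keeps "/" by default, so container/share and folder separators survive
    url = f"https://{account}.{service}.core.windows.net/{quote(path)}"
    headers = {
        "x-ms-version": API_VERSION,
        "x-ms-copy-source": source_url,   # where the data is copied from
    }
    return "PUT", url, headers


method, url, headers = build_copy_request(
    service="blob",
    account="backupaccount",                                        # hypothetical
    path="photos/2018/cat.png",                                     # hypothetical
    source_url="https://mainaccount.blob.core.windows.net/photos/2018/cat.png",
)
```

A synchronization module would add the authorization headers and send one such request per blob or file it needs to copy.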
There are several ways to automate this, depending on the storage library you use. Basically, you could run a background job as a WebJob to perform the copy (or have it call an exposed Web API that hosts the backup method).
Another way is to harness Azure Functions. If you decide to use Azure Functions, there are a couple of things to note:
- Azure Functions supports only a Blob Storage trigger by default. To use it with Azure Files, you have to handle it yourself (e.g. authenticate to Azure Files and perform the copy action).
- Azure Functions is billed by the number of executions. If your data is created frequently, the function is triggered frequently too, which results in high cost. Otherwise, Azure Functions is worth experimenting with.
What about snapshots, which are also considered a backup solution? Snapshots are supported on both Azure Blobs and Azure Files. Azure Blobs places no limit on the number of snapshots, while Azure Files allows a maximum of 200 snapshots per file share. Storage account limits still apply to both services.
Another interesting thing, which Microsoft announced yesterday, is the public preview of the soft-delete feature for Azure Blobs. Normally, when you delete data, it is permanently erased and you cannot get it back directly. You have to find your backup or snapshot stored somewhere and write it back to your storage. With soft delete, however, deleted data is held in an intermediate state that still lets you decide whether to restore it or purge it. When enabling soft delete, you also specify a retention period, which tells Azure when it can purge your deleted data. For more information about the soft-delete feature on Azure Blobs storage, read here.
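The retention logic can be pictured with a few lines of Python. This is only a conceptual sketch, assuming the retention period is expressed in whole days. The function names are hypothetical and do not correspond to any Azure API.

```python
from datetime import datetime, timedelta


def purge_time(deleted_at: datetime, retention_days: int) -> datetime:
    """Earliest moment at which soft-deleted data may be purged."""
    return deleted_at + timedelta(days=retention_days)


def can_restore(deleted_at: datetime, retention_days: int, now: datetime) -> bool:
    """Data is still recoverable while the retention period has not elapsed."""
    return now < purge_time(deleted_at, retention_days)


deleted_at = datetime(2018, 3, 1, 9, 0)
still_there = can_restore(deleted_at, retention_days=7, now=datetime(2018, 3, 5, 9, 0))
gone = not can_restore(deleted_at, retention_days=7, now=datetime(2018, 3, 9, 9, 0))
```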
Pricing is always a consideration when choosing a cloud service. The pricing models of Azure Blobs and Azure Files are similar: Microsoft charges you not only for storage capacity but also for operations on your blobs or files. With Azure Files, the price varies with the redundancy option. The lowest price we know of is $0.06 per GB when configured for locally-redundant storage, while Azure Blobs costs $0.024 per GB with the same redundancy option for the first 1 TB. Even when your capacity exceeds 1,000 TB (~1 PB), Azure Blobs costs $0.0224 per GB, which is still cheaper than Azure Files.
For operations, take Write as an example: Azure Blobs charges $0.0004 per 10,000 calls, whereas Azure Files charges $0.015 for the same volume. These prices are for the Southeast Asia region and general purpose storage. For the new version of general purpose storage, Microsoft has a new pricing model, which is detailed here. With general purpose v2, Azure Blobs storage offers a better price, but operations are a little more expensive if you use the Hot tier, which is optimized for intensive reads and writes.
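To see what these rates mean in practice, here is a small Python sketch using the prices quoted above. The workload (500 GB stored and one million write calls) is an arbitrary example of mine, and I am assuming the per-GB prices are monthly and that only storage and write operations are billed, which ignores other operation types and data transfer.

```python
def estimate_cost(
    stored_gb: float,
    write_calls: int,
    price_per_gb: float,
    price_per_10k_writes: float,
) -> float:
    """Storage charge plus write-operation charge, in US dollars."""
    return stored_gb * price_per_gb + (write_calls / 10_000) * price_per_10k_writes


# Example workload: 500 GB stored, 1,000,000 write calls (locally-redundant storage)
blobs_cost = estimate_cost(500, 1_000_000, price_per_gb=0.024, price_per_10k_writes=0.0004)
files_cost = estimate_cost(500, 1_000_000, price_per_gb=0.06, price_per_10k_writes=0.015)

print(f"Azure Blobs: ${blobs_cost:.2f}")   # $12.04
print(f"Azure Files: ${files_cost:.2f}")   # $31.50
```

For this workload the gap comes mostly from the storage rate, but the write rate matters more as the number of operations grows.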
There may be more considerations based on the individual capabilities of Azure Blobs and Azure Files. For example, you cannot set a quota on an Azure Blobs container, but an Azure Files share has a built-in quota setting that you can configure in the Azure Portal. Conversely, Azure Blobs lets you use a custom domain and integrate with Azure CDN and Azure Search.
If I had to choose ONLY between Azure Blobs and Azure Files, I would much prefer Azure Blobs thanks to its capabilities. Azure Files is designed for a specific purpose, but in my opinion it falls short for modern file sharing and collaboration, a need that OneDrive for Business fully meets (I plan to write another comparison between Azure Files and OneDrive to give you the overall picture).
In the digital transformation era, applications are being modernized and need large-scale patterns not only in the application layer but also in the data layer, which I don’t think Azure Files can fit. Azure Files is still a good fit if your application serves a specific audience (e.g. photo hosting and sharing for photographers).
Last but not least, I’d point you to this link for additional reference. Although it does not go into detail, it is still a good starting point.