Azure Data Lake Storage Gen2 - Optimizing Costs
Is your data footprint growing in Azure Data Lake Storage Gen2 (ADLS)? Is your default access tier set to Hot to save on read costs? This post walks through how to reduce your storage costs over time.
Problem
The problem is the cost of retaining files in ADLS. If your data is accessed infrequently, or your data pipelines rarely use ADLS, consider implementing Lifecycle Management policies. These policies automatically move data to cheaper access tiers as it ages, reducing long-term costs.
Note: Understand how your data is actually used before applying this guide. It will not fit every data solution, so tailor it to your own requirements.
Understanding Storage Tiers
Hot: More expensive to store. Cheaper to read the data.
Cool: Less expensive to store. More expensive to read the data.
Cold: Less expensive to store (than Cool tier). More expensive to read the data (than Cool tier).
Archive: Even less expensive to store (than Cold tier). Data is offline, so files must be rehydrated to an online tier (for example Hot or Cool) before they can be read. (Rehydrating a blob from the archive tier can take several hours to complete.)
More information: Azure Data Lake Storage pricing
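To make the trade-off between tiers concrete, here is a minimal sketch that compares the monthly cost of two tiers and finds the read volume at which they cost the same. The prices are placeholders chosen for easy arithmetic, not actual Azure prices, and the model assumes only a per-GB storage price and a per-GB read price. Real bills also include per-operation and early-deletion charges, so check the pricing page for your region before relying on any figure.

```python
def monthly_cost(gb_stored, gb_read, tier):
    """Monthly cost under a simplified two-price model (storage + reads)."""
    return gb_stored * tier["storage_per_gb"] + gb_read * tier["read_per_gb"]


def break_even_read_gb(gb_stored, warmer, cooler):
    """GB read per month at which both tiers cost the same.

    Below this volume the cooler tier is cheaper; above it the warmer tier is.
    """
    storage_saving = gb_stored * (warmer["storage_per_gb"] - cooler["storage_per_gb"])
    extra_read_price = cooler["read_per_gb"] - warmer["read_per_gb"]
    if extra_read_price <= 0:
        raise ValueError("cooler tier must have a higher read price")
    return storage_saving / extra_read_price


# Placeholder prices per GB per month, NOT actual Azure pricing.
HOT = {"storage_per_gb": 0.02, "read_per_gb": 0.00}
COOLER = {"storage_per_gb": 0.01, "read_per_gb": 0.01}

threshold = break_even_read_gb(1000, HOT, COOLER)
for gb_read in (100, 5000):
    hot_cost = monthly_cost(1000, gb_read, HOT)
    cooler_cost = monthly_cost(1000, gb_read, COOLER)
    cheaper = "cooler tier" if cooler_cost < hot_cost else "Hot"
    print(f"{gb_read} GB read: Hot={hot_cost:.2f}, cooler={cooler_cost:.2f} -> {cheaper}")
```

With these placeholder numbers, data that is rarely read is cheaper in the cooler tier, while heavily read data is cheaper in Hot. That is the pattern the tier descriptions above describe.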
Solution
Note: This specific case is for ADLS that has been set up with Hot as the Default access tier.
1. First, go to your Storage account > Data management > Lifecycle management.
2. Go to + Add a rule > Enter a Rule name > click Next
3. Enter a number of days for More than (days ago) > choose Move to cold storage > press Add. (In this case, any file last modified more than x days ago moves automatically from the Hot tier to the Cold tier.)
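The rule created in step 3 can also be expressed as a lifecycle policy in JSON. The sketch below builds that JSON in Python. The rule name and the 90-day threshold are hypothetical values chosen for illustration, and the filter assumes the rule should apply to all block blobs in the account.

```python
import json


def build_move_to_cold_policy(rule_name, days_since_modified):
    """Build a lifecycle policy with one rule that moves block blobs to Cold."""
    if days_since_modified < 0:
        raise ValueError("days_since_modified must be non-negative")
    return {
        "rules": [
            {
                "enabled": True,
                "name": rule_name,
                "type": "Lifecycle",
                "definition": {
                    # Assumption: apply the rule to every block blob.
                    "filters": {"blobTypes": ["blockBlob"]},
                    "actions": {
                        "baseBlob": {
                            "tierToCold": {
                                "daysAfterModificationGreaterThan": days_since_modified
                            }
                        }
                    },
                },
            }
        ]
    }


# Hypothetical rule name and threshold; replace with your own values.
policy = build_move_to_cold_policy("moveToColdAfter90Days", 90)
print(json.dumps(policy, indent=2))
```

The printed JSON should be usable in the Code view tab of the Lifecycle management page, or it can be saved to a file and applied with `az storage account management-policy create --account-name <account> --resource-group <group> --policy @policy.json`.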
You can repeat the steps above to create rules for other tiers, for example a cascading policy that moves data from Hot to Cool, then Cool to Cold, then Cold to Archive, and finally deletes archived data.
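A cascading policy like this can be sketched as one rule per step. The day thresholds below are hypothetical and should be replaced with values that match how your data ages. The helper name and the generated rule names are also illustrative only.

```python
import json

# Hypothetical thresholds: days since last modification for each step.
CASCADE = [
    ("tierToCool", 30),
    ("tierToCold", 90),
    ("tierToArchive", 180),
    ("delete", 365),
]


def build_cascading_policy(steps):
    """Return a lifecycle policy with one rule per tiering step."""
    days = [threshold for _, threshold in steps]
    # Each step must fire later than the previous one, or the cascade is ambiguous.
    if any(later <= earlier for earlier, later in zip(days, days[1:])):
        raise ValueError("each step needs a larger day threshold than the one before")
    rules = []
    for action, threshold in steps:
        rules.append({
            "enabled": True,
            "name": f"{action}After{threshold}Days",
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"]},
                "actions": {
                    "baseBlob": {
                        action: {"daysAfterModificationGreaterThan": threshold}
                    }
                },
            },
        })
    return {"rules": rules}


cascading_policy = build_cascading_policy(CASCADE)
print(json.dumps(cascading_policy, indent=2))
```

Checking that the thresholds strictly increase catches a mistyped value before the policy is applied, for example an Archive step set to run earlier than the Cold step.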
Summary
If the cost of retaining files in Azure Data Lake Storage Gen2 (ADLS) keeps growing, consider implementing Lifecycle Management policies to adjust access tiers automatically, especially when data is accessed infrequently.
Related Documentation
Blob rehydration from the archive tier