Taming the Cloud
From Chaos to Clarity: How I Transformed a Billion-Dollar Cloud Bill
Overview
Usage-based billing offers companies the flexibility to pay only for the resources they actively use. In contrast, subscription-based cloud models often result in costs for unused resources.
Engineers can easily provision new instances with a simple click, a feature that promotes agility and speed. However, this convenience can pose challenges for accounting and operational efficiency.
The Challenge
The fluctuating nature of cloud costs makes it difficult to predict and optimize resource spending, adhere to budgets, develop reliable financial forecasts, and avoid surge pricing. All of these are critical for an organization’s survival. Many engineering teams are unaware of their cloud costs, and there is often a lack of accountability for spending.
To address these challenges, I sought to develop a plan that would gain the support of engineering teams and achieve the following objectives:
Provide Vice Presidents with a method for allocating costs to their departments.
Accurately predict the costs of large projects (epics/initiatives) to calculate return on investment.
Ensure effective provisioning of cloud resources to prevent surge pricing.
Forecast the cloud bill to within a +/- 5% margin of error.
Trace each dollar of incurred costs back to the responsible engineering team.
The Process
To begin, I sought to understand the current state of cost management within the organization. Prior to my arrival, engineering teams and finance were working independently on piecemeal reporting. Each engineering manager had their own approach to calculations and cost allocation, leading to a fragmented and inconsistent understanding of costs for the finance team.
I conducted interviews with engineering leads to identify their existing cost management practices and explore whether other teams were working on similar initiatives.
Next, I mapped out the current cost allocation, determining the percentage of the overall monthly cloud spend that was accounted for. This analysis revealed a significant gap in cost tracking.
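The coverage figure described above is simple arithmetic: allocated spend divided by total spend. A minimal sketch follows, assuming costs are already grouped by team. The function name, team names, and dollar figures are hypothetical and for illustration only, not data from the project.

```python
def allocation_coverage(total_spend, allocated_by_team):
    """Return the share of total spend attributed to some team (0.0 to 1.0)."""
    if total_spend <= 0:
        raise ValueError("total_spend must be positive")
    return sum(allocated_by_team.values()) / total_spend


# Illustrative figures only (hypothetical teams and amounts).
monthly_spend = 1_000_000.0
allocated = {"platform": 250_000.0, "data": 150_000.0, "search": 100_000.0}

coverage = allocation_coverage(monthly_spend, allocated)
unallocated = monthly_spend - sum(allocated.values())
print(f"coverage: {coverage:.0%}, unallocated: ${unallocated:,.0f}")
```

With these made-up numbers, half of the monthly bill has no owner, which is the kind of gap the mapping exercise is meant to surface.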
To address this gap, I prioritized allocating as much cost as possible. Once sufficient coverage of costs was achieved, I focused on standardizing the cost calculation methods across different engineering teams to ensure consistency and comparability.
In parallel, I collaborated with leadership to identify major initiatives or releases that would have a substantial impact on cloud costs. Working with individual teams, I predicted the magnitude of these cost impacts.
To incorporate these factors, along with tiered discount rates, I developed forecasting models. These models enabled me to create a calculator that could assess the potential cost savings from negotiating different cloud rate reductions. For example, I could determine the projected savings over 12 months if I negotiated a specific percentage decrease in cloud costs with our vendors, which had been impossible prior to this initiative.
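A savings calculator of this kind can be sketched as follows. This is not the model used in the project. It assumes marginal tiering (each slice of monthly spend is discounted at its own tier's rate) and assumes a negotiation improves every tier's rate by the same amount. The tier schedule, function names, and forecast values are all hypothetical.

```python
# Hypothetical tier schedule: (upper bound of monthly spend in the tier, discount rate).
TIERS = [(100_000.0, 0.00), (500_000.0, 0.05), (float("inf"), 0.10)]


def discounted_cost(list_price_spend, tiers=TIERS):
    """Apply marginal tiered discounts: each slice of spend gets its tier's rate."""
    cost, lower = 0.0, 0.0
    for upper, rate in tiers:
        if list_price_spend <= lower:
            break
        slice_spend = min(list_price_spend, upper) - lower
        cost += slice_spend * (1.0 - rate)
        lower = upper
    return cost


def negotiation_savings(monthly_forecast, extra_discount, tiers=TIERS):
    """Total savings over the horizon if every tier's rate rises by extra_discount."""
    better = [(upper, rate + extra_discount) for upper, rate in tiers]
    return sum(
        discounted_cost(m, tiers) - discounted_cost(m, better)
        for m in monthly_forecast
    )


# Illustrative 12-month forecast at list price (assumed flat for simplicity).
forecast = [600_000.0] * 12
savings = negotiation_savings(forecast, extra_discount=0.02)
print(f"projected 12-month savings: ${savings:,.0f}")
```

In this example a 2-point rate improvement on a flat $600,000 monthly forecast is worth about $144,000 over the year. Swapping in a real forecast and contract schedule is what turns a weeks-long analysis into a quick what-if.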
01
Unified Cost Allocation Method
Rather than relying on individual engineering teams to develop their own cost calculation methods, I established standardized approaches that were both reliable and predictable for engineering and finance teams.
02
Driver-Based Forecasting
To accurately forecast future cloud spending, I implemented a process that focused on identifying major initiatives and quantifying their impact on resource utilization and costs.
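One way to picture driver-based forecasting is a baseline run rate plus the estimated impact of each initiative from the month it goes live. The sketch below assumes each initiative adds a fixed monthly delta once launched, which is a simplification. The class, function names, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    name: str
    start_month: int      # 0-based index into the forecast horizon
    monthly_delta: float  # estimated change in monthly spend once live


def driver_based_forecast(baseline, months, initiatives):
    """Baseline spend plus the delta of every initiative already live in each month."""
    return [
        baseline + sum(i.monthly_delta for i in initiatives if i.start_month <= m)
        for m in range(months)
    ]


def within_tolerance(forecast, actual, tolerance=0.05):
    """True if the forecast misses the actual by no more than the tolerance."""
    return abs(forecast - actual) / actual <= tolerance


# Illustrative drivers only (hypothetical initiatives and amounts).
drivers = [
    Initiative("new region launch", start_month=1, monthly_delta=20.0),
    Initiative("storage cleanup", start_month=3, monthly_delta=-5.0),
]
monthly = driver_based_forecast(baseline=100.0, months=4, initiatives=drivers)
print(monthly)
```

Comparing each month's forecast with the actual bill through `within_tolerance` gives a direct check against a 5% margin-of-error target.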
03
Efficient Cloud Usage
Resource utilization can fluctuate depending on the number of active users on a platform. Excessive usage can lead to surge pricing if not planned for. I developed a method for engineering teams to calculate and forecast peak usage periods to avoid incurring additional costs.
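The source does not spell out the method, so the following is only a sketch under stated assumptions: resource demand scales linearly with active users, and each instance serves a fixed number of users. It flags forecast periods where demand would exceed provisioned capacity, which is where unplanned, higher-priced usage would occur. All names and figures are hypothetical.

```python
import math


def periods_over_capacity(forecast_users, users_per_instance, provisioned_instances):
    """Return (period index, instances needed) for periods exceeding provisioned capacity."""
    over = []
    for idx, users in enumerate(forecast_users):
        needed = math.ceil(users / users_per_instance)
        if needed > provisioned_instances:
            over.append((idx, needed))
    return over


# Illustrative forecast of active users per period (hypothetical values).
hourly_users = [800, 1_500, 4_200, 2_900]
flagged = periods_over_capacity(
    hourly_users, users_per_instance=500, provisioned_instances=6
)
print(flagged)
```

Here only the third period is flagged, needing 9 instances against 6 provisioned, so capacity for that window can be arranged ahead of time.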
04
Negotiation Advantage
Experience with vendor contracts reveals the complexity of calculating costs for usage-based procurement. By building foundational models to analyze current costs, I empowered the organization to quickly assess potential cost savings from rate negotiations, reducing the time required from weeks to days.
Results
Significant Cost Management: I successfully managed a billion-dollar cloud bill and improved forecast accuracy to within a 5% margin of error.
Substantial Savings: By developing a framework to identify underutilized cloud resources, I achieved annual savings of $2.9 million through the optimization of S3 resources.
Team-Based Budget Adherence: In response to increased demand for budgeting tools, I implemented a process that empowered engineering managers to take ownership of budgeting operations, resulting in a 95% budget adherence rate.
Strategic Leadership: I provided valuable forecasting and operational recommendations directly to the CEO, demonstrating my ability to contribute at the highest levels of the organization.
Key Learnings
Collaboration is Key: When tackling large-scale challenges, securing buy-in from others is essential. Successful management of such programs often requires a collaborative approach, leveraging the strengths and expertise of various team members. Teamwork and trust are fundamental to achieving positive outcomes.
Flexibility and Adaptability: While developing standardized methodologies, it’s important to recognize that unique circumstances may not always fit neatly into a predetermined framework. Adaptability is crucial in such situations. Rather than forcing solutions that don’t align with specific contexts, it’s essential to manage these exceptions separately.
Proactive Leadership: Even though teams may be willing to assist, taking initiative and going the extra mile can significantly contribute to project success. During my time in this role, I actively managed several teams’ weekly processes, demonstrating my commitment to driving progress. Don’t hesitate to step outside your traditional role to make a difference. Such proactive engagement can foster strong relationships and create a shared sense of purpose among your peers.