Taking Control of Confusing Cloud Costs

Key Takeaways

  • Complex pricing from cloud providers makes selecting services confusing, and the lack of accurate billing visibility often causes budgets to go off track. 
  • The lack of responsibility for managing cloud budgets is a prominent cause of overspending and under-utilising resources. Instead, budget needs to become just as much a part of the project lifecycle as delivery, with teams working together in new ways to understand requirements and prioritise cost management.
  • There are two practical ways that teams can manage and save money on their cloud costs: Implementing a consistent, comprehensive tagging process or utilising a product that provides cost visibility and precise breakdowns. 
  • The process for managing a cloud budget as it stands wastes time and resources - an upward shift in visibility would remove the need for teams to conduct time-consuming research manually and allow developers to be part of the conversation.

In an on-premise environment, you know what you’re spending on infrastructure; it involves an upfront capital cost. A transition to the cloud provides flexibility, access to a wide range of services, and the ability to delegate deployments, but can come at a cost (pun intended). Cloud pricing is unfamiliar and can be difficult to get to grips with, because you’re charged for individual items as you consume them. Without strong cost-management practices in your business, budgets can easily spiral out of control.

When teams start a new project in the cloud, budget management is typically far from top of mind for developers and delivery teams. This is how it usually unfolds: the CFO determines and oversees an overall budget. The product owner sets a go-live date for the project and is laser-focused on keeping everyone on track. Before provisioning resources for this new project, cost isn’t represented in terms of monthly budgets or overall cloud hosting cost expectations - just delivery timelines. Developers need resources and operators are busy trying to iterate and provision for their needs while staying on top of security. The most pressing priority is delivering a secure application by the deadline, not trying to understand confusing, complex cloud services pricing and utilisation.
 
The costs continue to pile up, unnoticed, until the monthly bills start coming in and the CFO wants answers on why cloud costs are so unexpectedly high. By then the damage has been done, and application teams struggle to identify the source of the hemorrhaging budget because none of the resources have been tagged with useful information. This cycle continues with no one taking complete ownership of costs or keeping track of how much cloud services are eating into the allocated project budget.

Cloud initiatives are expected to account for 70% of all tech spending by the end of 2020 so, as companies spend more, it’s increasingly important that organisations are able to come to grips with confusing cloud pricing and take back control of budgets to optimise spending.

The (lack of) clarity around cloud pricing

It’s a tale as old as the cloud: The pricing models from major cloud service providers (AWS, Microsoft Azure and GCP) aren’t as straightforward as on-premise. As a result, organisations lose time, budget and potential cost savings trying to navigate the muddy waters. Respondents to this year’s State of the Cloud Report estimate that organisations overspend on cloud budgets by 23 percent, and also waste 30 percent of their overall cloud spending.

The cloud providers don’t provide a simple 1:1 mapping of the way you currently pay for infrastructure. Their primary focus is to make it simple to migrate IT into their environments, and to offer all of the services necessary to do so.

Let’s take a look at AWS Lambda as an example. Imagine you have a web application using the CloudFront CDN. When a user interacts with the application, it triggers an HTTP request through an API Gateway that invokes a Lambda function, which takes in the data and stores it in DynamoDB.

The requirement here seems quite straightforward. However, you're now consuming four AWS cloud services: CloudFront CDN for caching, API Gateway for routing the HTTP requests, Lambda for executing and handling the request, and DynamoDB for storing data based on the request made by the user. Each of these has its own pricing structure, with some free tiers mixed in.

1. CloudFront offers 50GB of data transfer and 2,000,000 HTTP requests each month free for one year. After that, there is a data transfer charge that differs per region and per data tier, a price per HTTP request per region, a charge of $0.005 per path for invalidation requests, a price per 10,000 requests for field-level encryption, a price per 1,000,000 real-time log requests, and so on.

Confused yet?

2. API Gateway pricing again has a free tier: 1M REST API requests, 1M HTTP API calls, 1M messages and 750k connection minutes. Beyond that, there is a price per million API calls in banded tiers, a price for caching (if you decide to enable it) that depends on the cache memory size you allocate, a price per million message transfers tiered in bands of 1 billion, and a standard cost of $0.25 per million connection minutes.

Are you following?

3. Lambda pricing varies per region, but has a cost per 1M requests plus a duration cost for every GB-second. The duration cost depends on the amount of memory you allocate to the function, with a different rate per memory size per 100ms. There is additional concurrency pricing to improve speed, which is charged according to the amount of concurrency you configure and the period for which you configure it. Beyond that, any data transferred outside your region falls under the standard EC2 data transfer rates, which is a completely separate pricing structure to go and check.

I could continue on to DynamoDB, but I think we’re starting to paint the picture!
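To make the Lambda part of that picture concrete, here is a minimal sketch of the two charges described above: a price per million requests plus a duration price per GB-second. The function name and the rates passed in are illustrative assumptions, not quoted prices; it also ignores free tiers, concurrency pricing and data transfer, so treat it as a rough lower bound rather than a bill.

```python
def lambda_monthly_cost(
    requests: int,
    avg_duration_ms: float,
    memory_mb: int,
    price_per_million_requests: float,
    price_per_gb_second: float,
) -> float:
    """Estimate a month's Lambda charges from request count, duration and memory."""
    # Request charge: billed per million invocations.
    request_cost = requests / 1_000_000 * price_per_million_requests
    # Duration charge: execution time in seconds multiplied by allocated memory in GB.
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    duration_cost = gb_seconds * price_per_gb_second
    return round(request_cost + duration_cost, 2)


# Illustrative rates only (assumptions); check the current price list for your region.
estimate = lambda_monthly_cost(
    requests=5_000_000,
    avg_duration_ms=200,
    memory_mb=512,
    price_per_million_requests=0.20,
    price_per_gb_second=0.0000166667,
)
print(estimate)
```

Even this simplified version needs five inputs for one service, and the same exercise has to be repeated for CloudFront, API Gateway and DynamoDB.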

These complex price points, which require you to understand your end state before you’ve even begun, are what cause teams to make mistakes when projecting approximate pricing. The only logical approach to cloud is to start consuming it, and keep track of how much you are spending as you iterate.

There are too many terms

It’s difficult to compare services across multiple clouds, because each provider uses different terminology. What Azure calls a ‘virtual machine’ is called a ‘virtual machine instance’ on GCP and just an ‘instance’ on AWS. A group of these instances is called an ‘Auto Scaling group’ on AWS, a ‘managed instance group’ on GCP and a ‘virtual machine scale set’ on Azure. It’s hard to keep up with what you’re purchasing, or to tell whether a comparable service even exists on another cloud, because the naming conventions differ.

As outlined above with the simple web application using Lambda, it would be very time consuming for someone to compare what it would cost to host a web application in one cloud versus another. It would take technical knowledge of each cloud provider to work out how you could host it comparably with one set of services against another, before you even got into prices.

You spend what you provision

Cloud pricing uses an on-demand model, which is a far cry from on-prem, where you could deploy things and leave them running 24/7 without affecting the cost (bar energy). In the cloud, everything is based on the amount of time you use it, either on a per hour, per minute, per request, per amount or per second basis. The way that logs are stored, the database you use and even the country that it’s hosted in will ultimately affect the price you pay.

If you forget to turn a virtual machine off or if you provision something you don’t actually need, you’re wasting money. Because of the elasticity of the cloud, it’s easy to subscribe to a number of resources at any given time, so if you then over-provision to try and meet performance requirements, you’ll spend more than you needed to. According to a study from Quocirca, the average organisation only uses 37% of its cloud-based servers. On the other side of the coin, though, the fact that cloud resources are ‘metered’ means that you only need to provision (and pay for) what you need.
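The arithmetic behind a forgotten virtual machine is simple enough to sketch. The example below compares a week of running an instance around the clock with a week of running it only when it is needed. The hourly rate and the 50-hour working week are hypothetical figures chosen for illustration.

```python
HOURS_PER_WEEK = 24 * 7


def weekly_cost(hourly_rate: float, hours_running: float) -> float:
    """Cost of one instance for the hours it is actually switched on."""
    return round(hourly_rate * hours_running, 2)


def weekly_saving(hourly_rate: float, hours_needed: float) -> float:
    """Difference between leaving an instance on 24/7 and running it only when needed."""
    always_on = weekly_cost(hourly_rate, HOURS_PER_WEEK)
    scheduled = weekly_cost(hourly_rate, hours_needed)
    return round(always_on - scheduled, 2)


# Hypothetical: a 0.10-per-hour instance used 10 hours a day, 5 days a week.
saving = weekly_saving(0.10, hours_needed=50)
print(saving)
```

Under these assumptions, roughly 70% of the always-on cost buys nothing, and that is for a single instance.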

There’s no single unit of cloud pricing

I’ve already demonstrated that there is no single tangible unit of pricing. You’re billed for the compute and memory required to host your application, but there will also be line items for storage, databases and other components. And there are several types of cloud service, including infrastructure as a service (IaaS), platform as a service (PaaS), functions as a service (FaaS), storage as a service (STaaS) and security as a service (SECaaS), which are all billed in different ways.

How to get costs under control

According to Flexera’s 2020 State of the Cloud Report, respondents expect to increase cloud spend by almost 50 percent this year, but still struggle to forecast their spending accurately. As a result, cost savings is a top cloud initiative for the fourth year in a row.

Companies can start saving by managing their cloud costs in one of two ways:

  1. Implement an approach that adds costs into the design life-cycle up-front, or
  2. Utilise a product that will streamline cost data and increase visibility.

1. Making costs part of the life-cycle

In order for teams to manage costs on their own internally, budget awareness needs to be built into the team's way of working at the very beginning and remain a priority. Costs should become part of the life-cycle, just as much as delivery.
 
It is important to think upfront about how you want to improve cost transparency. If several teams share your cloud accounts, you can no longer use the cloud account or project ID as a segmentation point. This means you will need a rigorous tagging methodology and a clear business taxonomy around cloud service provisioning, so that people can make sense of which cost is attributable to which team.
 
Teams may want to organise their costs by microservice and environment. Therefore, every cloud service will need to be tagged with every microservice that consumes it, per environment, per team.
 
The team may also have useful cost centre data, so that when the invoice comes in it is clearly attributable to a specific cost centre, making recharging simpler for the business.
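A tagging methodology like this can be checked mechanically. The sketch below assumes a hypothetical inventory format (a list of dictionaries with `id`, `monthly_cost` and `tags`) and a hypothetical set of required tag keys; a real export from your provider will look different, but the two checks carry over: find resources with missing tags, and roll costs up by any tag.

```python
from collections import defaultdict

# Hypothetical tag taxonomy, following the team / microservice / environment / cost centre split.
REQUIRED_TAGS = {"team", "microservice", "environment", "cost-centre"}


def missing_tags(resource: dict) -> set:
    """Return the required tag keys that a resource does not carry."""
    return REQUIRED_TAGS - set(resource.get("tags", {}))


def cost_by_tag(resources: list, tag_key: str) -> dict:
    """Sum monthly cost per tag value; resources lacking the tag land in 'untagged'."""
    totals = defaultdict(float)
    for resource in resources:
        value = resource.get("tags", {}).get(tag_key, "untagged")
        totals[value] += resource["monthly_cost"]
    return dict(totals)


# Made-up inventory for illustration.
resources = [
    {"id": "db-1", "monthly_cost": 120.0,
     "tags": {"team": "payments", "microservice": "ledger",
              "environment": "prod", "cost-centre": "CC-100"}},
    {"id": "vm-7", "monthly_cost": 80.0,
     "tags": {"team": "payments", "environment": "dev"}},
    {"id": "bucket-3", "monthly_cost": 15.0, "tags": {}},
]

untagged = {r["id"]: sorted(missing_tags(r)) for r in resources if missing_tags(r)}
print(untagged)
print(cost_by_tag(resources, "team"))
```

The "untagged" bucket is the useful part: it shows how much of the bill nobody can currently attribute.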
 
When it comes to knowing how much you’re spending, who is the right person to be informed? Setting budgets in the cloud that notify both the budget holder and the development team is critical to encouraging the right engineering behaviours. Breaking your annual budget down into monthly segments will help set budget targets for the team. All cloud providers have mechanisms for this, but without a good, granular tagging structure you will not necessarily understand how to react to a budget alert, because you lack insight into which service, in which environment, is responsible for such a large bill.
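The budget mechanics themselves are straightforward, as this sketch shows. It splits an annual budget into equal monthly segments and reports which alert thresholds the spend to date has crossed. The function names, the equal split and the 50/80/100 percent thresholds are assumptions for illustration; the native budget tools of each cloud provider have their own configuration.

```python
def monthly_budget(annual_budget: float) -> float:
    """Split an annual budget into twelve equal monthly segments."""
    return round(annual_budget / 12, 2)


def crossed_thresholds(spend_to_date: float, budget: float,
                       thresholds_pct=(50, 80, 100)) -> list:
    """Return the alert thresholds (in percent) that the spend has reached."""
    used_pct = spend_to_date / budget * 100
    return [t for t in thresholds_pct if used_pct >= t]


# Hypothetical figures: a 120,000 annual budget with 8,500 spent so far this month.
target = monthly_budget(120_000)
alerts = crossed_thresholds(8_500, target)
print(target, alerts)
```

An alert like this says that 80 percent of the month's budget is gone, but not why; that answer comes from the tagging.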

Budget and Right Sizing

Developing with costs in mind means efficiencies can be designed into the application, and practices such as powering things off when they’re not being used, or right-sizing, can be factored in. Budgeting should help shape team behaviour, and each function has a role to play: DevOps with the cloud and operational skills, developers with the application engineering skills, and the product owner or manager with the vision and budget understanding.
 
So, what techniques can teams use to make cost saving easier? There are already some useful tools on GitHub for each cloud provider that use the provider's functions-as-a-service offering to shut down cloud resources during certain hours. Ironically, you are then paying to run a service to shut down a service to save money on cloud services.
 
You can also use a cloud scheduler service that allows compute instances to be started and stopped on a schedule. Again, however, there will be a cost for that service, and it is limited to standard compute machines rather than designed for the other services that sit around them.
 
If you already run something like Kubernetes, then running a tool in-cluster such as Kube Downscaler lets Dev/DevOps teams scale down their applications, which in turn triggers the auto-scaler to scale down the instances and starts to have an impact on costs. There are also point solutions for individual services within a specific cloud, such as rds-scheduler, which Appvia wrote for powering off RDS in AWS using Kubernetes. Like Kube Downscaler, it takes cron-style time parameters for the period during which you want to shut things down.
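The decision at the heart of all these schedulers is the same: given the current time, should this resource be running? The sketch below is a generic illustration of that check using a weekday working window. It is not the configuration format or parsing logic of Kube Downscaler or rds-scheduler, and the function name and window are assumptions.

```python
from datetime import datetime, time


def should_be_running(now: datetime, start: time, end: time,
                      weekdays=range(0, 5)) -> bool:
    """True when `now` falls inside the working window on a working day.

    Weekdays follow datetime.weekday(): Monday is 0 and Sunday is 6.
    """
    return now.weekday() in weekdays and start <= now.time() < end


# Hypothetical window: keep dev resources up from 08:00 to 19:00, Monday to Friday.
window = (time(8, 0), time(19, 0))

print(should_be_running(datetime(2020, 6, 3, 10, 30), *window))  # a Wednesday morning
print(should_be_running(datetime(2020, 6, 3, 22, 0), *window))   # the same Wednesday, late evening
print(should_be_running(datetime(2020, 6, 6, 10, 30), *window))  # a Saturday morning
```

A scheduler would evaluate a check like this periodically and scale the workload up or down whenever the answer changes.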
 
Right-sizing and auto-scaling your infrastructure is a continual process of assessment that relies on good monitoring of your application and infrastructure at regular intervals across multiple environments. Setting up monitoring alerts for underutilisation is just as useful as budget setting and will help the team adjust the infrastructure according to need.
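An underutilisation alert can start out very simple. The sketch below flags any resource whose average CPU utilisation over a set of samples falls below a threshold. The 20 percent threshold, the minimum sample count and the fleet data are all hypothetical; in practice the samples would come from your monitoring system, and memory or network metrics would deserve the same treatment.

```python
from statistics import mean


def is_underutilised(cpu_samples: list, threshold_pct: float = 20.0,
                     min_samples: int = 3) -> bool:
    """Flag a resource whose average CPU utilisation is below the threshold.

    Resources with too few samples are not flagged, to avoid judging
    something that has only just been provisioned.
    """
    if len(cpu_samples) < min_samples:
        return False
    return mean(cpu_samples) < threshold_pct


# Made-up CPU utilisation samples (percent) per resource.
fleet = {
    "api-prod": [55.0, 62.5, 48.0],
    "batch-dev": [4.0, 6.5, 3.0],
    "new-vm": [1.0],
}

candidates = sorted(name for name, samples in fleet.items() if is_underutilised(samples))
print(candidates)
```

The resulting list is a set of right-sizing candidates for the team to review, not an instruction to shrink them automatically.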

2. Streamline cost management with a product

Implementing the above process in a way that works takes time, effort and some upfront thinking on approach. There’s no universally adopted process for managing cloud budgets, and companies need to either prioritise internal management processes or look towards a product to streamline cost visibility. You can put practices in place yourself to keep teams and budget in check, but it’s incredibly manual and requires continual maintenance of tagging and reviewing. For it to work, teams need to come together to understand cost just as much as the release cycle.
 
You probably lack those behaviours in your team, and don’t want to put mass amounts of resources into planning, tagging, consumption behaviours, right-sizing and budget management. You want your team to be freed up to iterate and deliver faster. So you reach a fork in the road: An optimised budget or an agile team?
 
Developers have traditionally been out of the loop when it comes to the financials of a project, but they’re the ones with the skills and understanding of the build to be able to assess where costs can be saved. Using a product to provide your team with increased visibility removes the need for them to spend time researching costs and tagging services. Instead of the typical aggregated view, you can be provided with spending insights for each project to help optimise budgeting and prevent out-of-control spending.

The future of cloud spending

The process for managing a cloud budget as it stands wastes a great deal of time and resources. It’s riddled with frustrations and inefficiencies that damage the morale and operation of teams. Spending on cloud resources that aren’t needed is not only damaging for teams; wasted budget also has a hugely negative impact on innovation at large. If your teams are focused on making sure the budget is met, and on perfecting the processes, tools and engineering practices to support that, then they are less focused on engineering innovation in the business applications.
 
Whether managing costs fully in-house or utilising a cost-visibility product, the most important element in optimising your cloud spending is communication. DevOps and development teams need to share insight, but it’s difficult when the current process limits visibility and requires extensive research.
 
An upward shift in visibility would remove the need for teams to conduct time-consuming research to compare costs manually, allowing developers to be part of the conversation. And budget-setting keeps teams on track, so there are no unexpected spending surprises.

About the Author

Jonathan Shanks is the CEO and co-Founder of Kubernetes delivery platform Appvia. He is a DevOps expert and entrepreneur with nearly two decades of experience in leading developers and engineers in scaling and delivering new solutions. He leads a highly talented team of engineers and developers to deliver his vision of building a ground-breaking platform of tools that enables large organisations to quickly and securely create innovative new products and services, harnessing the power of Kubernetes to simplify their infrastructure, speed up delivery and reduce costs. Prior to joining Appvia, Jonathan was Head and Technical Lead at the Home Office. His role involved working with a team of engineers to deliver solutions across multiple digital projects. Jonathan spearheaded initiatives that revitalised aspects of the Home Office’s technical infrastructure – saving significant time and money. Jonathan gained his experience from his work as a Linux Architect at the NYSE Euronext and a Senior Linux Engineer at Betfair.
