A single line of code can shape an organization's financial future. Erik Peterson, CTO and founder of CloudZero, presented an engineering perspective on cloud cost optimization on day three of QCon San Francisco. His session was part of the "Architecting for the Cloud" track.
Peterson's talk focused on five real examples of "million-dollar lines of code" and how they challenge conventional views on engineering's pivotal role in cloud cost optimization. He started by emphasizing that cost is a critical metric often overlooked in engineering decisions: every engineering decision is a purchasing decision, and unwarranted skepticism about cloud services can become a self-fulfilling prophecy.
Next, Peterson discussed five practical examples with code samples: optimizing debug lines to curb logging costs, streamlining API usage to reduce expenses, controlling database write volumes, ensuring resource cleanup in infrastructure as code, and the potential benefits of rewriting code related to content delivery networks.
One of the examples concerned a debug statement that led to high costs. Peterson showed the Lambda function in question (obfuscated from the real function):
```python
from aws_lambda_powertools import Logger

logger = Logger()

def something_important(really_big_list_of_big_files):
    # This is a really important function that does a lot of stuff
    results = []
    for file in really_big_list_of_big_files:
        with open(file) as f:
            for line in f:
                result = do_important_something_with_line(line)
                # The culprit: one debug log entry per line of every file
                logger.debug("Processed line", extra={"line": line, "result": result})
                results.append(result)
    logger.info("Finished processing files")
    return results
```
The problem was the logger.debug statement: it ran for every line of every file and generated the logging costs. The solution was to delete that line.
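Deleting the line was the fix in this case. Where some debug visibility is still wanted, a minimal alternative sketch is to use the log sampling built into the Powertools Logger, which enables debug output for only a fraction of invocations; the service name and sampling rate below are illustrative, not from the talk:

```python
from aws_lambda_powertools import Logger

# Emit debug logs on roughly 1% of invocations instead of logging
# every processed line on every call (rate and service name are illustrative)
logger = Logger(service="file-processor", sample_rate=0.01)
```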
Peterson's key takeaways from the examples included:
- Storage is still cheap, but calling APIs costs money (see the arithmetic sketch after this list)
- We have infinite scale (with cloud infrastructure) but not an infinite wallet
- CDNs are very good at eating traffic... and money
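The first takeaway is easy to make concrete with back-of-the-envelope arithmetic. The S3 list prices below are assumptions for illustration; actual rates vary by region and change over time:

```python
# Illustrative, assumed S3 Standard list prices (region-dependent, subject to change)
STORAGE_PER_GB_MONTH = 0.023  # ~$0.023 per GB-month of storage
PUT_PER_1000 = 0.005          # ~$0.005 per 1,000 PUT requests

# Storing 1 TB is cheap; writing it as a billion tiny objects is not
storage_cost = 1_000 * STORAGE_PER_GB_MONTH        # ~$23 per month
put_cost = (1_000_000_000 / 1_000) * PUT_PER_1000  # ~$5,000 for the API calls

print(f"storage: ${storage_cost:,.2f}/month, PUT requests: ${put_cost:,.2f}")
```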
He also provided a quote from Donald Knuth:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
He continued by noting that all the examples he provided only became problems at scale. Peterson stated that software engineers deploying to the public cloud should think about the following questions iteratively and over time, not all at once:
- Can it be done?
- Is this the best way to do it as a team?
- What happens if this thing becomes popular?
- How much money should it cost to run?
Regarding the last question, he stressed that the metric for answering it is not money (cost) itself but tracking your desired Cloud Efficiency Rate (CER).
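As a minimal sketch, assuming CER is defined as (revenue − cloud cost) / revenue, which is how CloudZero describes the metric, the calculation looks as follows; the function name and figures are illustrative:

```python
def cloud_efficiency_rate(revenue: float, cloud_cost: float) -> float:
    """Return CER as a percentage, assuming CER = (revenue - cloud cost) / revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - cloud_cost) / revenue * 100

# Illustrative figures: $1,000,000 of revenue against $200,000 of cloud spend
print(f"CER: {cloud_efficiency_rate(1_000_000, 200_000):.0f}%")  # CER: 80%
```

At that rate, every dollar of revenue leaves 80 cents after cloud costs, which matches the steady-state target below.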
Peterson stated that the CER should become a non-functional requirement for any cloud project, with defined stages to aid prioritization. These stages are:
- Research and Development: A negative CER is acceptable
- Version 1/MVP: Break even or low CER (0% to 25%)
- Product Market Fit (PMF): Acceptable margins are conceivable (25% to 50%)
- Scaling: Demonstrate a quarter-over-quarter path to healthy margins (50% to 80%)
- Steady State: Healthy margins = Healthy Business (CER is 80%)
Peterson ended the session with a final thought from Sir Tony Hoare:
I call it my billion-dollar mistake. It was the invention of the null reference in 1965. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.
Note that there is a QCon London session on this topic: Null References: The Billion Dollar Mistake.