
Uncomplex: Modern Hardware for Better Software


Summary

John O'Hara discusses how recent hardware & software advances can help founders and CTOs succeed.

Bio

John O'Hara is a technical founder and venture partner. He is currently a director at Finbourne, Adaptive Financial Technology, and Taskize, which he co-founded and sold to Euroclear. Prior to his career in fintech, John held senior leadership roles at Bank of America and JPMorgan. John is the inventor of AMQP, which is incorporated into cloud products from Amazon, Microsoft, and Red Hat.

About the conference

Software is changing the world. QCon London empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

O'Hara: What compelled me to stand on a stage and talk to you, when I've got nothing to sell at all? It's just an uneasy feeling about the state of software. It's also a tremendous excitement about the state of hardware. I am more excited about hardware in the upcoming decade than I have been since the late 1990s, since the iPhone. What's coming is going to be amazing. It's up to you guys to use it.

This is a talk about hardware, but the hardware won't feature till the end. Let's talk about software. Let's talk about business. Because I went from being a software architect and developer to someone who sits on the boards of companies and does venture capital type stuff, which changed my perspective completely on how software works. That journey was interesting as well. I want to share the perspective that comes with that.

Where I Learned About Complexity

I've got a view on complexity that makes me have a physical reaction to complexity, of the illness type. It was born from a story from when I was starting out as an engineer in my 20s. I worked for a bank, on a very large system, back in 1997. I was part of this 100-engineer project. That's a rare experience in its own right. We wrote 1 million lines of C++, which was almost impossible in 1997. Just imagine how long that thing took to compile. It was a service-based architecture. It had 10 services, each with its own team, its own database.

Does this stuff sound familiar? The services communicated over a Kafka-like persistent messaging system with protobuf-like RPC marshaling and data representations. It had a web user interface with TLS in 1997. It had a Docker-like management and install system. We built all of this ourselves. We ran the 300 programs that comprised this system on two supercomputers 10 miles apart. This was the first time that synchronous disk replication had been done over fiber optics at 10 miles. The business had specified that the system should survive a nuclear bomb. I'm not joking. They paid accordingly. It processed $1 trillion notional of financial derivatives. It put the company way ahead of the competition. It was an amazing system.

It was an amazing team. Everyone on it can't believe how lucky they were to be on it. They can't believe how lucky they were that the thing actually worked in the end. Ten years later, it was completely rewritten, much more simply, and ran on three servers, much smaller ones. This gave me, at the early age of 27, a tremendous insight into the nature of complexity.

Whenever I see this, and you've all seen this, the CNCF, Cloud Native Computing Foundation, list of projects, I think: someone somewhere thinks this is a good diagram. Each little box is a project. You can browse through and look at them all. They're all suspiciously funded to about $3 million. They're all working on cloud-y type stuff. Basically, they're all trying to help you run stuff distributed. They don't help your business application at all. Why all this stuff? I thought, 27 years after we were building all this stuff by hand, people are still building this stuff and selling it or open sourcing it.

Why haven't we consolidated, why haven't we converged? Why isn't there a greater depth of engineering talent that we can focus on these things, like we can with Linux? Think about Linux: it's on most people's phones. It's won the operating system war. It's even inside Windows now, which is my favorite part. Our depth of understanding of Linux is tremendous. It's gone from being a hobby operating system that couldn't scale past 4 CPUs to something that now runs on supercomputers and will scale well past 200 CPUs. That's amazing. It's free.

Background

I'm John O'Hara. I'm a board member of three fintechs. The first, Adaptive. They specialize in low latency exchange connectivity, hence financial track. The second, FINBOURNE. They're a data company. They specialize in solving the data problem that banks and financial investment houses have. If you work for a financial investment house, you know the data problem I mean. How do you represent an instrument? How do you evolve it over its lifetime? How do you price a whole bunch of them? How do you track the differences? How do you handle the time skew across your organization? How do you handle operating in 12 different time zones, or whatever? I'm a mentor with Accenture's FinTech Innovation Lab.

I'm a venture partner with Fidelity International Strategic Ventures. I left the investment banking scene, started a startup, worked at it really hard for 10 years, and exited to Euroclear. I also invented AMQP. My entire career has spanned this gap. I don't have to stand here and talk. I stand here and talk because I actually want us as an industry to move forward, which is why I did AMQP: I made an ISO standard for messaging, so we can make more messaging.

Evolution of The Cloud

Back in 2009, the cloud contained Velcro. Has anyone seen this picture before? It's a picture of a Google server from 2009, when they were just starting. They were trying to get the cost of the servers really cheap: just stack them on cardboard in open warehouses, and use Velcro to strap a battery in the corner as your standby power supply. That's where it was at. This really got into people's minds. Cloud is fragile. Cloud fails a lot. Cloud faces the internet, people can steal your stuff. Your server might go down at any moment. It's cheap, so we've got to build for cheap, because cheap is worth it. That's not cloud today.

Cloud today is the state of the art in hardware. They control Intel and AMD's roadmaps. They have special processors fabricated just for them, like 4 CPU cores with the hard drive strapped on? No, now you've got over 200 cores in their server processors, 400 threads. You've got fiber optic networking everywhere. You've got very carefully controlled thermals, the highest efficiency levels on the planet for running these things. They don't fail.

The biggest tell for me that these things don't fail is that Microsoft and Amazon have both changed the depreciation schedule for their hardware from 3 years to 5 years. Why is that? Do these things fail? State of the art. We haven't moved with the times. We still build for the old stuff. AWS is 18 years old this year. It was launched just before the iPhone. What's happened in 18 years is that computers became more than 1000 times faster on every dimension you can measure. We're still building applications the same way, with the same fear that got into our minds: your server is going to fail, your data is insecure. No, it's easier to do cryptography in the cloud than anywhere else.

The Role of CEOs and CTOs

I want to put a little story around this. I said we're going to talk about business a little bit as well. There's going to be some stuff in here that some of you may not know, some of you won't have seen. What I'm doing is putting a business context around what it is to make technology, in the sense of a technology startup. It's useful because it gives you a sense of what's driving you. What a startup is, is an interesting thing to start with, because everyone has a definition for it.

My favorite definition is that it's an experiment. It's an experiment to find a new business. Your experiment is extremely likely to fail, otherwise it wouldn't be an experiment. You have to acquire resources to discover this new business model. You have to make sure that works before you run out of money. I love ChatGPT for drawing copyright-free images. This is a CEO running out of runway. That is how it feels. Behind every CEO, there is a CTO. Which one's which? Is the CEO behind or in front? It depends. You can see different things in this picture, depending which way around you look at it. The CTO needs to be an effective technology leader and deliver a working product. The person who can make something new in an empty space turns out to be a very rare person. You'd think there'd be more of them, but there aren't.

The company you're making will live or die by the strength of the product that this person realizes, the team that they bring into place, how they put it together. How well they talk to their customers. How well they talk to the CEO. How well they engage with their business. That's really the harder thing. One of the things I get asked a lot in some of my VC-related work these days is: does this company have the right CTO? It's quite a scary question, the right CTO. I know from my own experience of working with technical people that technology people don't realize just how valuable they are, just how much everybody else is depending on them.

The investors, the CEO, the clients, they're all depending on the vision of this person. The staff you hire, the reason you got them for cheap was because they look up to the founder and the technology leadership, they think they know something that they don't. They're wrong. That's a fairly heady mix, and they're going to look to the CTO for answers. Gregor's talk was interesting, because he said, "I don't give answers, I give parameters to make answers." Actually, people are looking for answers the whole time. When you start at 7 people, 12 people, 50 people, people need the answers fast, and they matter.

What's the CTO's job? The CTO's job is to delight clients. What's the parameter here? It's to delight the clients as quickly as possible, because you're running out of money. To delight them as completely as necessary. You don't have to make them completely happy. You just have to make them want your product enough that they'll pay for it, and that you will make money from it. Because you're running out of runway. This is where it gets interesting, because technology people get disconnected from this.

They're unaware of their economic impact, completely unaware. They think technology is why they were hired. They should be stepping back and looking at the business, looking at the client's problem, looking at the context around it. It's not just the technology. It's not about the shiny new thing. It's about how long you can make the money last. It's about how productively you can ship product to delight the client. It's so easy to be distracted from the core mission.

The interesting thing is what happens at the start of a startup. Every startup wants to be a unicorn: they want to be Airbnb, they want to be Uber. I'll talk about how you build a unicorn in the next few slides. It's absolutely fascinating. It's terrifying. Every technical action this guy takes that doesn't advance the business will be killing the business. It's that stark. You've got no space to run someone else's beta test.

In the early days of a startup, you have no customers. You call them design partners: customers you want to sell to but haven't sold to yet, plus the VCs putting on the pressure. They're going, "Yes, that kind of sounds cool. I might buy that. Yes, I'll tell you what the problem is." You're frantically going: would you buy this? Would you write me a check right now? That's what's happening. In the meantime, you know you're going to be a unicorn, so let's solve the scalability problem right now.

There's a whole bunch of stuff from the CNCF I can use to build into my architecture to be ready for scaling, so whenever this thing takes off, I'm going to be ready. You start to do premature optimization. You start to build your organization as if it were Google or Amazon, as if it were 2009. As if you were worried your system was going to fail at any point in time. As if you were worried you couldn't scale past the customer numbers that you need to get. You're actually hurting your business. I'll come back to exactly why.

The Cost of "Modern Complex Systems"

This is the meme that's out there: modern, complex systems. These words appear over and over again, in business cases, in pitch decks, in websites, in how you do cloud. You're doing modern, complex systems. This is a screenshot from a vendor website that I blanked out to protect the vendor, because it's just right there. It happens all the time. What is the cost of modern, complex systems? These are real numbers. A young company, making £6.5 million of revenue per year. That's decent. That's good.

Their cloud costs them £2 million a year. That comes out of the top number, and then you start paying your staff. This is terrifying. The recent Stability AI news, the fact that they ran out of money and couldn't pay their AWS bill: that bill was $7 million in a month. Are your profits going to your company or your cloud provider? Are you a cloud reseller? Ask that question. That's a very knowing question. If you can get a cloud vendor thinking that the more your product sells, the more compute, storage, and network they sell, they'll help you a lot.

That's actual business advice. This is terrifying. I went into this company and had a look at it. With one day's pass over their architecture, they could save £400,000 a year off their bill. You're engineers: if you deploy all the CNCF stuff, the CEO's credit card is being controlled by the development team. It's utterly terrifying. I've been there.

Yummy cloud. I'm not anti-cloud, but the industry of course encourages consumption. They do it in lots of different ways. The developer advocates, the people employed by them: there's a certain set of messaging. You get to a certain scale, and there's a certain level of penetration of this messaging in the marketplace. Suddenly, it's the emperor's new clothes. If you talk against this messaging, which I'm doing today, it doesn't do you any good. It's not going to get you any brownie points anywhere. You have the scalability scaremongering, where you're going to need to grow really rapidly tomorrow. You're going to want to be able to scale for all these thousands of customers that you hope will arrive tomorrow, because your business plan goes up like a hockey stick.

We saw your business plan: hockey stick. If you've watched "Silicon Valley," the sitcom, you'll know the hockey stick thing. You over-engineer for scalability early on. Then there's reliability scaremongering: you need to have all the availability zones. Why? Run one. The availability numbers in the cloud are really good, 99.5% or better. That's not quite what a bank would require, but in practice they're probably higher than that. If you can come back online in 4 hours, that will satisfy the most demanding contracts in the world. Your users will probably put a 2-minute outage down to their DSL going down or their mobile phone losing connectivity. I have a relative who works in the mobile phone industry. He said, you'd be surprised how often a dropped call is just one of our servers rebooting.
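
To put numbers on that reliability point, here is a minimal sketch of the downtime budget different availability levels actually allow. The SLA figures are illustrative round numbers, not quotes from any provider.

```python
# Downtime budget implied by a given availability level.
# Illustrative SLA values, not any particular provider's numbers.
for sla in (0.995, 0.999, 0.9999):
    downtime_hours = (1 - sla) * 365 * 24
    print(f"{sla:.2%} availability -> {downtime_hours:5.1f} hours of downtime per year")
```

Even 99.5% leaves roughly 44 hours a year of budget, which is why a 4-hour recovery target is satisfiable with far less machinery than the scaremongering suggests.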

I just nearly fell off my chair. Because, again, it's against what you believe. The big data FUD, you mustn't delete any data. Keep it all, one day it will be valuable to you. I'll come back to that later. A stat someone fired at me the other day was that half of the data written never gets read again. You're paying to store it. You're paying to move it. You're paying to back it up. You're paying to query it.

Distributed by default has spawned a cottage industry, because it's so much easier to write middleware than to actually serve clients' needs. Cloud is flexible, not cheap. That's the bottom line here. The number you should be looking for, though, is 10% to 15%: that's the share of your startup money you should be spending on infrastructure. If you're going north of that number, you're in trouble.
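
A quick sanity check, using the worked example from earlier (£6.5 million of revenue, a £2 million cloud bill) against that 10% to 15% guideline; this is just the talk's own arithmetic in script form.

```python
# Infrastructure spend as a share of revenue, from the example above.
revenue = 6_500_000     # £ per year
cloud_bill = 2_000_000  # £ per year

ratio = cloud_bill / revenue
print(f"Infrastructure spend: {ratio:.0%} of revenue")  # ~31%
if ratio > 0.15:
    print("North of the 10-15% guideline: trouble.")
```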

We got distracted a little bit, we're building software for customers and we're busy scaling it for cloud. Software is modern, complex, agile, iterative, and continuous, which means fashionable, irreplaceable, unplanned, repeated forever. Software is only complete when the money stops. That's the truth of it. I tried explaining this to my wife. We recently extended our home. I said, imagine we built the extension here, we decorated it, and then the builder didn't leave.

This is my wife in the background. This is the well-paid builder up front, who's done a great job. We've got to maintain it, release frequently. Another situation: one of my investors, the guy from the accounting department, once asked me, "So when you've built the software, you can fire all the developers, yes?" Back then I was 15 years younger. I said, no, that's not how it works. You need to keep these guys forever. Would you accept this anywhere else? Really, you wouldn't.

The Unicorn Plan

Back to the unicorn plan, because we're building a unicorn. The unicorn plan is basically how to get to a unicorn in 7 to 10 years. This makes me laugh as well, because people will say, I've joined a startup, we'll have exited in 3 years. No: the fact that VC funds have a 10-year horizon should give you a clue. Most people don't know that either. To get to unicorn status, you need to have got to $2 million of revenue per year by the end of year 2.

That means your product has to be valuable enough, hitting enough sweet spots for enough people, that they're parting with $2 million of their hard-earned cash because they expect to get $6 million or $20 million worth of value out of it. You've done that in the first 2 years, and you've sold it to the customers. You're really moving. There's only room for one experiment in your startup, and that is your business. Your job is not to be a beta tester for anybody else, ever. You cannot afford the risk.

You have to inculcate this through your entire organization, through the entire engineering staff: smaller teams, more leverage, more capability, more reliability, right-sized architecture. How do we build a system that we can adapt incredibly quickly? It's the speed of adaptation, Darwin style, that drives you forward. This is the OODA loop, the creative destruction, the U.S. defense strategy way of thinking about the world. You have an idea, you build it, you measure it, you get some data, you learn, you do it again. Repeat as fast as you can. That's your startup. Only room for one experiment.

This is where the numbers come from. Battery Ventures did a little bit of analysis and worked out, for all the unicorns that they could see, what their commonality was. They noticed that they all followed a similar track. Two years in, they were making about $2 million a year. Then, by year 4, they were making $50 million. Then, $100 million by year 7, or higher. That's usually where you get. When you get to $100 million, you've got a comfortable billion-dollar valuation. You're there. You're a unicorn. ChatGPT, thank you.

This is actually called the triple, triple, double, double, double pattern, because you have to triple your revenue in each of the first two years, and then double your revenue every year after that. Then you can be a unicorn in 7 to 10 years. That's a really interesting thing to know, because then you can actually ask yourself: what shape of system would I need to do this? These days, people advocate selling business to business, because business to consumer is way too competitive, is what the current mantra is. B2B is an easier sale: you can find someone with a really deep problem, and you can deliver some real value.
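
For concreteness, here is the triple-triple-double-double-double curve as a small Python sketch, starting from the $2 million figure above. The talk's quoted milestones are round numbers that don't sit exactly on this curve; the pattern is the point.

```python
# Triple, triple, double, double, double (T2D3) from $2M ARR at end of year 2.
arr = 2.0                       # $M of annual recurring revenue, end of year 2
multipliers = [3, 3, 2, 2, 2]   # growth multiple for years 3 through 7
for year, m in enumerate(multipliers, start=3):
    arr *= m
    print(f"Year {year}: ${arr:.0f}M ARR")
# Year 7 lands around $144M: comfortably unicorn territory.
```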

You hopefully find a few thousand customers. You get a decent amount of money for it, because that was a valuable problem. Say you're doing B2B SaaS growth and you're at year-4 revenue. In a perfect world, by year 4, you're charging an average of 6 digits for your product. You're charging your enterprise customers hundreds of thousands of dollars for their subscription. That's what you need to be doing. Let's imagine you've fallen off the curve a bit and you're only charging them $50,000 on average; that's still a lot of money. Your software can replace an FTE and more.

Do the math: you've got 1000 clients, they're paying $50k per annum, that takes me to my $50 million. Imagine each of them has 100 staff using my system. This is Salesforce, it's Confluence, it's Jira, it's an HR product, it's something like that; 100 people are probably using it. Maybe they're using it each day, intensively. Maybe they do 10 interactions per hour. That's pretty heavy. If you've designed your app right, it's not being too chatty. These are 10 meaningful, hefty interactions. You're getting 10 system calls per interaction. It's like building the screen took 10 calls, because you built your app to be not too chatty, because you're sensible. That gives you some mathematics.

First, before we go into the mathematics: usage is never evenly spread. We have to allow for the fact that everyone's going to use the system at the start and the end of the day. They're going to come in on Monday morning, log in and do something, then have coffee and go and chat. Then they're going to go, I've got to do some stuff here, get the work done at the end of the day, and go home. All your calls are actually compressed into about 2 hours a day, aren't they? We get 10 million server calls per day; that's roughly what's happening there. The number of calls you need is about 1500 per second.
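
Here is that back-of-envelope estimate as a script. The inputs are the talk's own assumptions; I've read the "10 interactions" as an effective daily figure, since that's what reproduces the 10 million calls per day quoted.

```python
# Capacity back-of-envelope from the talk's assumptions.
clients = 1_000
users_per_client = 100
interactions_per_user_per_day = 10  # effective daily figure
calls_per_interaction = 10          # one screen ~ 10 server calls
busy_seconds_per_day = 2 * 3600     # usage compresses into ~2 busy hours

calls_per_day = (clients * users_per_client
                 * interactions_per_user_per_day * calls_per_interaction)
peak_rps = calls_per_day / busy_seconds_per_day
print(f"{calls_per_day:,} calls/day, ~{peak_rps:,.0f} calls/s at peak")
# -> 10,000,000 calls/day, ~1,389 calls/s (the talk rounds to ~1,500/s)
```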

Fifteen hundred meaningful server calls per second against your application, doing database-y things, across multiple tenants, whatever it is. Maybe you could do 100 calls per second per server, so maybe 15 servers; maybe, in 2010. Not today. Is 5 million requests per hour a lot? No, it's not. A million requests per second is a lot. A million requests per hour is easy today. pgbench is one of my favorite benchmarks. I'm a fan of Postgres. I'm just going to wear it right on my sleeve. It's an amazing open source, freely licensed product. The benchmark is very real. It does the sort of thing you'd do in an ATM transaction.

It reads some information, it gets some information for you, maybe a mini statement, does an update or two. It does quite a lot of work. You can run that benchmark on a system like this: 96 CPUs, 384 gigabytes of memory, which will cost you $50,000 a year to rent from Amazon. That's a medium-sized machine. It's almost smallish in today's money. For the next decade, this is your M3 of the future. You can get 68,000 transactions per second out of that, which is over 200 million an hour, which is a transaction each day with everyone on the planet who's got an internet connection. Is that enough? You only need the little bit at the bottom there, so you can run on a much smaller machine. What this means is, you don't have to build a crazy architecture. You can build an architecture designed to adapt first.
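
For a feel of what a pgbench (TPC-B-like) transaction actually does, here is a hedged sketch of one in Python with psycopg2. The table and column names are pgbench's real schema (created by `pgbench -i`); the connection string and values are hypothetical.

```python
# One pgbench-style transaction: a few updates, a read, and a history row.
import psycopg2

conn = psycopg2.connect("dbname=bench user=postgres")  # hypothetical DSN
delta, aid, tid, bid = 42, 1, 1, 1
with conn, conn.cursor() as cur:  # 'with conn' commits on success
    cur.execute("UPDATE pgbench_accounts SET abalance = abalance + %s WHERE aid = %s",
                (delta, aid))
    cur.execute("SELECT abalance FROM pgbench_accounts WHERE aid = %s", (aid,))
    balance = cur.fetchone()[0]   # the 'mini statement' read
    cur.execute("UPDATE pgbench_tellers SET tbalance = tbalance + %s WHERE tid = %s",
                (delta, tid))
    cur.execute("UPDATE pgbench_branches SET bbalance = bbalance + %s WHERE bid = %s",
                (delta, bid))
    cur.execute("INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) "
                "VALUES (%s, %s, %s, %s, CURRENT_TIMESTAMP)", (tid, bid, aid, delta))
conn.close()
```

68,000 of these per second on one box is the headline: roughly 45 times the ~1,500 calls per second the year-4 plan above needs.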

Unless you're doing something genuinely out there, unless you're doing AI training, unless you're doing telemetry for everything that moves on the planet, unless you really are Google or Airbnb, unless you're in year 7, 8, 9, or 10. For year 4, this is great. For the first 4, 5, 6 years of your company, you can scale to serve an entire country's worth of users off an average-sized server. It's amazing. This has changed a lot. People just don't realize how fast things are improving. I'm about to go to the hardware part of the presentation, where you'll see how much more it's improving as well.

The Power of Commodity Software and Hardware is Astonishing

Moore's Law is still relevant. People have been predicting the death of this for ages. Jensen Huang of NVIDIA did say it was dead 2 years ago. He's changed his mind now. Moore's Law can run till 2030. The way it keeps working is that they go 3D. You're used to chips being flat. They're stacking them, they're going 3D. That means they get more density. Intel have said they could have a trillion transistors by 2030, which is amazing. The power of commodity hardware: those supercomputers we were running on back in my little story at the start were Starfire E25000s.

They had 72 processors, 172 gigabytes per second of memory bandwidth, and they cost $4 million back in the year 2000. We had two of them. This is my workstation. It's a few years old. It's 64 cores, 200 gigabytes per second, $10 grand, and it plays Call of Duty really well. Three percent CPU, 160 frames per second. These are equivalent. This is amazing. This isn't even server-grade hardware we're talking about here. You cannot go wrong for the first 5 years of your initiative with being massively vanilla.

People who sell you space and storage and bandwidth do not want you to know this. The AI guys have let the cat out of the bag. The AI guys need this performance. It's not that Intel and AMD can't have it: they manufacture their chips in the same factories. They get the same benefits that you're about to see. They're mind-blowing. I stay abreast of hardware, and I was massively shocked. I went to a conference earlier this year and heard CEOs and CFOs talk about their plans for the coming years. You hear the facts coming out of their mouths, and it's just jaw-dropping performance coming our way, and they explain why they're delivering it. And this is what you've got in software.

That system that I wrote in 1997 with my 99 colleagues in C++: C++ is a massively better language now. Solaris was great; Linux is better now. Sybase was OK. At the time, Postgres was rubbish. Postgres is amazing now, though. SQL, the one true survivor in the language wars. If you can express your problem in SQL, you've expressed the answer. It's an incredible thing. Now you've got the whole computer working with you to make it fast. It's doing the thinking for you. You've got amazing middleware. I'll plug AMQP, of course. You've got Kafka.

You've got Aeron. Different types of middleware for different types of things. The one thing you shouldn't be doing is web services. You really shouldn't be doing HTTPS inside your data center. It's for things outside your political control. It's the lowest common denominator for talking to an alien environment. I have strong feelings on that. Web technology has converged. You don't have to support a gazillion browsers. JavaScript is awesome. What that compiler can do is amazingly fast. The ecosystems around it, it's not just the language, because Rust is great, and Zig is great.

The ecosystems around Python and Java and C++, and C#, and SQL are just immense, you can do anything in these platforms. You can integrate with anything. You can't go wrong. It's funny because as technologists, we like new shiny things. We like the challenge, the fun of the complexity. Back to that point from earlier, you don't know how valuable you are to your company. The guys in the sales team are depending on you making a product they can easily sell to bring in the revenue to make the investors happy, so your stock options become worth something in 10 years' time. It really matters. Every decision actually matters. Your job as technology leaders is to get everyone on board to understand that you're solving the one problem of the company, which is the customer problem you decided to solve.

Fast Trading Systems are Compact Systems

The application is incredibly simple. You get this Ruby on Rails type thing. I just picked this because it's a handy graphic, and because Mr. Ruby on Rails, DHH, David Heinemeier Hansson, recently famously de-clouded his company and said how much money he'd saved. The application architectures are tremendously simple. Even Shopify and GitHub are running Ruby on Rails applications with really simple architectures. Why anything else? Well, this would not be a financial track if we didn't have the "anything else." It turns out that the fastest trading systems in the world have always been single-chip systems. FPGAs were the fastest chips you could get.

I remember once, one of the chip manufacturers asked me and a technical architect from one of the other investment banks: we're going to rev the Intel chips, is there anything we can do for you? Do we add opcodes to the microprocessor? I nearly fell off my chair. What can we do for you? The guy from the other bank said, I want the fastest single-threaded performance you can give me. The guy from the chip manufacturer answered: how much power can you get to the cabinet? No warranty. It was fantastic. It's all about single, straight-line velocity, because the market is ordered. The market has a total order; there's a tip to the total market. You have this incredibly fast event flow.

What is a modern chip? A modern chip is actually a network; it's an entire data center on a chip. If you've got middleware that can give you location transparency, to the point where: am I on an InfiniBand network, or am I on the on-chip network, or am I on in-memory, shared-memory IPC? Then you can actually build an entire system the traditional way, with different components that talk to each other via events, and you can have choreography, you can move things around. It will go incredibly quickly. This is Martin Thompson's work, his and his team's. He did Disruptor years ago. He was famously part of the architecture team for LMAX. He then went on to do Aeron messaging, which is used by the Chicago Mercantile Exchange. It's open source.

You can go and have a play with it. Its premise is basically: run a task per CPU, run a single thing per CPU, have it entirely in the L1 cache, and use Java. At which point, other than the FPGA guys, someone goes, what? The FPGA guys just spilled a drink. Why use Java? Because Java can profile on the machine you're on. It goes: you've got AVX-512, I'm going to recompile your code on the hot path to use AVX-512, unroll that loop. Now you're faster than C++. You can build these things. You basically pin each CPU core. You lock it down and spin it at full speed. You use lock-free structures in memory to pass data through ring buffers. It's like wheels on a bus: they pass the work between each other. It goes at millions of requests per second with 10 nanosecond latency; 10 billionths of a second, the time it takes light to travel that far. That's interesting. You can do that with a single box. It doesn't require anything else.
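
To make the ring-buffer idea concrete, here is a toy single-producer/single-consumer ring buffer in Python: preallocated slots, two monotonic sequence counters, no locks on the hot path. It shows the structure Disruptor-style systems use, not their performance; Python won't see nanoseconds, and real implementations pin one busy-spinning thread per core.

```python
# Toy single-producer/single-consumer ring buffer (Disruptor-style structure).
class RingBuffer:
    def __init__(self, size: int):
        assert size & (size - 1) == 0, "size must be a power of two"
        self.slots = [None] * size   # preallocated, never resized
        self.mask = size - 1         # cheap modulo via bitmask
        self.head = 0                # next sequence to write (producer-owned)
        self.tail = 0                # next sequence to read (consumer-owned)

    def offer(self, item) -> bool:
        if self.head - self.tail == len(self.slots):
            return False             # full: a real producer would busy-spin
        self.slots[self.head & self.mask] = item
        self.head += 1
        return True

    def poll(self):
        if self.tail == self.head:
            return None              # empty: a real consumer would busy-spin
        item = self.slots[self.tail & self.mask]
        self.tail += 1
        return item

rb = RingBuffer(8)
rb.offer("order:buy:100@42.1")
print(rb.poll())  # -> order:buy:100@42.1
```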

Why Avoid Distributed Computing?

Back to this L1 cache thing. CPU operations take billionths of a second. If you translate CPU time to human time, an L1 cache reference, the working memory of the CPU, would be about half a second. To us, half a second is a heartbeat, so an L1 cache reference is a heartbeat. A main memory reference is the time it takes to brush your teeth. Pinging a local computer is a working day: I want to get something from that machine's cache, that's a day in the office. Reading something from a disk is taking a vacation.

Reading it from a spinning disk is starting a family. Pinging San Francisco is taking a master's degree. There's a version of this done with distances, like the time it takes for a round trip to Mars. Our brains just can't comprehend these figures. If you can put your entire program into the L1 cache, it will go 1000 times faster than whatever else you're running on that machine. Small equals close together. Close together equals efficient, because you're not moving the electrons as far. Efficient equals fast, because it's all close together. Fast equals cheap, because you need less of it. All of this equals green, because you're using less electricity to push the electrons around. That's got to be good these days.
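
The mapping behind those analogies, made explicit: stretch one nanosecond to one second and see what each operation becomes. The nanosecond figures below are the classic rough "latency numbers", not measurements from any particular machine, and the talk's analogies are a little looser still.

```python
# Humanized latencies: 1 nanosecond stretched to 1 second.
latencies_ns = {
    "L1 cache reference": 0.5,
    "Main memory reference": 100,
    "Round trip within the data center": 500_000,
    "Disk seek (spinning disk)": 10_000_000,
    "Round trip London <-> San Francisco": 150_000_000,
}

def humanize(seconds: float) -> str:
    for unit, span in (("years", 365 * 86400), ("days", 86400),
                       ("hours", 3600), ("minutes", 60)):
        if seconds >= span:
            return f"{seconds / span:.1f} {unit}"
    return f"{seconds:.1f} seconds"

for op, ns in latencies_ns.items():
    print(f"{op}: {humanize(ns)}")  # under the 1 ns == 1 s stretch
```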

There are some other interesting finds floating around, emerging, being discovered. This is my new favorite product: DuckDB, open source. Other DBs are available. This one in particular, its party trick is PostgreSQL compatibility, and the ability to read any flat file into memory, then run a really sophisticated relational query on it, and finish all of that within the blink of an eye, even for files that are 10 gigabytes in size.

It makes data analysis an absolute joy. If you've got a fast enough machine, it's amazing. The guy behind the company commercializing this, Jordan Tigani, used to be an engineer on Google BigQuery, so he might know what he's talking about. He said big data is dead; the machines caught up. There's a benchmark he had that took 3000 machines which could run on one machine today; not just one machine, one CPU. Most firms have under a terabyte of live data, even big ones. That'll fit in the memory of a single machine these days. You can buy that machine outright for $30,000. The numbers have got silly.
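
A minimal taste of the DuckDB trick described above: point SQL straight at a flat file and query it in-process. The file name and columns are hypothetical; this assumes `pip install duckdb`.

```python
# Query a local flat file directly with SQL, in-process, no server.
import duckdb

duckdb.sql("""
    SELECT customer_id, count(*) AS orders, sum(amount) AS total
    FROM 'orders.parquet'       -- a local CSV or Parquet file works too
    GROUP BY customer_id
    ORDER BY total DESC
    LIMIT 10
""").show()
```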

Data Center on a Chip

We really have got to the point where it's a data center on a chip. Spot the common feature. My workstation is the small, baby version of this. This is 128 CPU cores, 82 billion transistors. It's lightweight. That's AMD's entry in this horse race. Intel, they've got many entries. They have announced a 380-core processor for 2025. This is getting crazy. The doubling effect is becoming entirely incomprehensible. This one has 64 gigabytes of RAM on the chip. You can boot a server with this chip in it with no memory plugged into the memory slots. That's weird.

Who's heard of HBM? High Bandwidth Memory. They've worked out that getting things close together is the name of the game. That 64 gigabytes is HBM. It's in these little things here. It's actually attached to the silicon. These chips are 700 layers deep, 700 layers of silicon. The manufacturers have got a bit good at this. You can get everything really close together. That's neat. Then this one. Everyone's got one of these: M1, M3 Macs, 16 cores, 40 GPU cores, 128 gigabytes of RAM on the package. This is not HBM. This stuff here is actually on the same package.

It's like your iPhone just got big. That's RAM that's actually on the package, a little mini PCB, a little silicon PCB. It's super special. It's got a super-fast interface into the main chip, 92 billion transistors. Then the daddy of them all currently: tens of thousands of GPU cores, at least 16,384 on this chip, probably 32,000, because they've taken two and glued them together. What I find interesting is another processor, the Apple M2 Ultra. The M2 Ultra is two M2 Max chips glued together side by side. They're flipped symmetrically and glued together, like you'd fold a piece of cardboard.

NVIDIA did the same thing. They took two dies, lined them up down the middle, and put them together to get this 208 billion transistor number. This has got 192 gigabytes of server memory on the chip, these things here, like stacks of 64 gigabytes of LPDDR5. The memory bandwidth of this thing is 5 terabytes per second. It's like throwing hard disks at you. I used to throw CDs at people, then I threw Blu-ray discs at people. Now I'm throwing actual hard drives at people. It's incomprehensible. Clearly NVIDIA are using this for their AI processor, the GPU variant, which is a very simple thing: Single Instruction, Multiple Data.

This is not for them only, this is going to come down to x86, and Arm, and everybody else. That's what I'm showing here. These are all real. You can buy all these. Some of them are a year and a half old. What's coming down the line towards us is a new class of incredibly accessible hardware. For the cost of renting something today, you can buy one of these things tomorrow. The storage has got faster too. The networking has got faster. Networking now is running at 100 gigabits per second, that's 10 gigabytes per second.

That's throwing SSD cards at people. The fastest network is 800 gigabits per second. Storage has caught up as well. In the old days, back in my story, hard disks would do 150 writes per second; less than that, actually. The key fact for running a traditional relational database is holding the entire database in RAM, and then writing your transactions to disk, just the transactions. You're writing the transaction log. It's all about how fast you can write the transaction log, because once you've got everything in memory, it's super-fast.

How fast can you write a transaction log? That's measured in IOPS. That's why IOPS matter. If you just buy an enterprise SSD, a top-line one, you'll get a capacity of 4 terabytes, 2.5 million read IOPS, and 400,000 write IOPS, versus 150 from 20 years ago. 150, full stop, versus 400,000 for one drive. Persistent storage. Are you excited? Are you going to write software the same way you did before? I really hope not.
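
The mechanism those IOPS numbers are bounding, in miniature: a hedged sketch of an append-only transaction log where each commit is one fsync, which is what caps your commit rate at roughly the drive's write IOPS. The file name is hypothetical.

```python
# Minimal write-ahead-log append: one fsync per commit = one write IOP.
import os

fd = os.open("txn.log", os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

def commit(record: bytes) -> None:
    os.write(fd, record + b"\n")  # append the transaction record
    os.fsync(fd)                  # force to durable storage before acking

commit(b"UPDATE accounts SET balance = balance - 100 WHERE id = 1")
os.close(fd)
# ~150 commits/s on a 2004 spinning disk; ~400,000/s on a modern SSD.
```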

Autonomous Entities Can Manage Internal Complexity

We've got these data centers on chips. Of course, we can't all just be in one big party. The world's a big mess. What you have is these autonomous entities, as the IETF guys like to call them. Political domains. Within a political domain, you can manage chaos. You can control complexity. You can crush chaos. You can crush complexity. You can get your team together. You can win. When you meet the red team, or the blue team, you go across teams, they've got a different view of the world. They've crushed their complexity differently. You got to interface between the two of them.

We have language for this. We negotiate treaties, called APIs. We have special envoys called gateways and adapters, and we muddle through. The trick for any organization that wants to move fast, and cheap, and have delightful, awesome products, and given how capable the hardware is now, is to make your system fit within your political entity. Make your autonomous sphere of control as big as you can before it collapses or becomes fragile. That typically tends to be teams of seven engineers clubbed together in smaller teams.

Don't ever make your company too big. WhatsApp had 50-odd engineers; Instagram too. They were both in the double digits of engineers when they sold for billions of dollars, and they had made their internet-scale things. You can do this, and you have the technology. The technology out there in open source is absolutely tremendous. Now the silicon guys have come and given us magic. They've given us actual magic. You can use all the on-machine tooling. You can take a snapshot of a running process. You can ask top, what's the longest running thing? You've got all the tools, and the operating system will tell you. It will solve your observability problem; just ask the operating system. You don't need the rest of that stuff. That's what gets my goat, to a degree. We have all this power now; we should be using it. We should not be architecting systems like it's 2009 anymore.

Consume Cloud Responsibly

You do need cloud. If you're a startup, no one will believe you. You won't be able to sell to an enterprise unless you're on a known cloud. They've got over their AWS, and Microsoft Azure, and GCP worries. They're now acceptable. If you're on them, you're good. Treat it as a machine. You've got a fixed budget. That should be clear now. You're controlled by money. People depend on you. You run out of cash, it ends. Who has been in a company when it ran out of cash? How does it feel? It's a really abrupt stop. It's a terrifyingly abrupt stop. You all go for coffee and go, what happened there? Treat the cloud as if you're building an actual data center just using cloud bits. Use it for global network reach. Scaling, backup and stuff is easy. Security. It's great to have the ability to absorb load and things like that.

Go serverless if you have a blank check; serverless is not your friend. It's a bit like unlimited vacation at your employer. I think we have an understanding. There are cases: if you have unlimited scalability needs, if you're the government, if you're tracking all your citizens. Short-lived large workloads, if you're doing genome sequencing. If you're a market maker, if you're a stock exchange, you might actually choose to go with the cloud, to put all your data in the cloud, to get fair geographic reach and solve a whole bunch of problems. Because as the trading venue, you care less about the performance of the trades; you care about the cost of providing the service to everybody, to more customers. Then this is the killer: the cloud is the only place you can do AI at scale, inference or training. Then, of course, you need a blank check. That's maybe not our problem here. Like wine, drink responsibly. My tag for this is: uncomplex, engineer for this decade.

Questions and Answers

Participant 1: You argued against cloud: it's too expensive, it's a waste of time a lot of the time. Are you really arguing against distributed systems, and some of the changes in practices that have happened, like Infrastructure as Code and all of this stuff?

O'Hara: You need a lot less of it. We've turned it into a whole industry; as a mutual friend calls it, the Kubernetes industrial complex. Anytime someone says, I just need six more servers to spin up Kubernetes: six servers was my entire production budget for a startup. There's something wrong there. That's why I asked at the start, who's working for a company with a technology budget of more than $100 million? You live in a different world.

Participant 2: We're moving to cloud because of compliance and future features, not to replace existing features.

O'Hara: Pretend it's a data center, make it small. Keep your costs low. Make your runway long. Make your people productive, so you can please your clients as quickly as possible. Some people have very simple technologies that perform miracles.


Recorded at: Aug 22, 2024
