Transcript
Atkinson: I'm Astrid. Before co-founding this company a couple of years ago, I was at Google for about 15 years. I was really fortunate to join Google very early in its history, at a time when it was pioneering the broad approach to distributed computing that ended up defining Google as a company and underlying a lot of its growth from about 2004 onward. I was fortunate to help develop that as part of the early SRE team, and later to generalize and scale it as I went on to lead various software engineering teams within the broader organization. After about 10 or 12 years at Google, I started to think about what I was going to do next. One thing I kept coming back to was that, while I thought the work I was doing at Google was very important and did really help people, the issue I had become increasingly concerned about in my own life was climate change. I had learned a lot about it. I had begun to follow news about energy and climate impacts, and I began to realize that this wasn't an abstract issue or something that other people were going to take care of. Rather, it was going to profoundly affect not just my life, which is one thing, but also the life of my son. My son was born in 2013. In 2050, which is when we should be fully decarbonized if we want to avert the worst of global warming, he's only going to be 37, which is about the age I was when I started thinking about this.
I want a pile of grandchildren. I want to retire here on Earth, where all my stuff is. I want his life to be at least as good as mine, if not better, and I want that for everyone's kids. So I started to think about how I could pivot my career in a way that might positively affect this issue I care so much about. Basically, what could I do to help? I think it's really important to set the stage when we think about climate action. Something that comes up a lot when I talk to people is some combination of "What can I do?" or "Aren't we doomed already? What's the point?" The truth is there's a really big difference between a truly unlivable future and one where the impacts of climate change can be mitigated and adapted to. Every degree counts. Every fraction of a degree counts. I think that's a helpful framing, because the question isn't really whether you individually can stop climate change. Rather, it's whether the work you're doing will positively affect the world you're going to live in, the world your kids are going to live in, the world your friends' kids are going to live in: the one we leave for the generations to come.
Decarbonize the Grid and Electrify Everything
That was the setting in which I started to ask: what work is needed to decarbonize the grid, and what role could I play in it? Broadly speaking, decarbonization involves more than just energy, but energy is a really big part of it. And the strategy for decarbonizing our energy system is, in broad strokes, very simple: we electrify everything, and then we decarbonize the grid. That's appealingly simple. Then there's the really big question: that sounds great, but how do we do it? It was in the context of this work within the energy system that I started to think about my own background, which is in distributed systems and large-scale computing. Which parts of this problem look like a distributed systems problem? That's how I personally got really interested in grid technology.
What Role Can Software Play?
There are some appealing similarities between the electrical grid and the large-scale computer networks and computing systems I was fortunate to work on at Google. Both rely on broadly distributed resources to do work. Both have physical connectivity components as well as logical management components. Both have a really large number of moving parts. Both are, in their way, critical to people's lives, although the grid is arguably quite a bit more important than search results. Nonetheless, some of the work we did at Google to pioneer approaches for getting utility-grade reliability out of distributed computing systems began to look really applicable to the transitions the grid is going through: the switch toward decarbonized energy sources, and the switch toward a grid in which customers have more of a role to play. A lot of the challenges in this transition are really about how we put all the pieces together. The great news is that we probably have pretty much all the pieces we need to make the grid zero carbon. We have renewable energy sources with zero emissions. We have nuclear, if we care to use it. We have increasingly practical and affordable technology for storing energy and using it later. We have a lot of technologies to control how and when we use energy. What we need is something that puts all those pieces together in a way that makes sure we have energy at the time and place we need it.
The Way in Which Today's Grid Is Managed Won't Get Us There
The existing grid is really not managed in a way that makes this practical. It's managed under a model that assumes energy will be procured from large generators far away and distributed to people's houses, where those endpoints don't play an active role. There's not much to do on a day-to-day basis within the distribution grid, which is the wires that connect your house to the bigger wires that eventually come from distant power stations. Traditionally, a utility would capacity plan that grid for 10 years and build it. Then they would effectively sit in a distribution center and, when something broke, dispatch trucks to fix it. That's really far from the dynamic, real-time managed systems we think of when we look at a distributed systems application in a computing sense, or even from the dynamic real-time operation we use today for transmission grids, which are the really big wires that go across long distances and connect to distant power plants.
Because the grid is not actively managed today, even simple management changes are hard. Could we turn solar off if there's too much of it? That's one thing. Instead of just turning it off, how about we move demand around in time to line up with it? It sounds really simple, but the infrastructure to manage the grid as a large-scale distributed system really isn't in place today. If the job of the future grid is to get energy from the time and place where it's produced to the time and place where it's needed, that requires an additional level of management sophistication and logic. This is a place where distributed systems technologies really have a role to play.
Status Quo: Current Grid Management Technology from the '90s
One of the things going on in this space is that the utility industry is legendarily conservative about adopting new technologies. Part of the reason is that, frankly, they have to keep the lights on. When I was carrying a pager for Google and managing operations teams, one of the truisms of an operations role was that we hate change, even in an innovative, change-positive system. We hate it because change is when things break. That's true for utilities as well. They have operators who are concerned with keeping the lights on. So it's really important that technology change within the industry happens in a way that supports those reliability goals, while also introducing sophisticated new technologies that help us better understand data, better manage real-time operations, and so on.
Part of what I'm talking about here is an introduction to places where we see applications for distributed systems technologies in solving some of these technical problems. It's worth noting that one of the biggest challenges is also simply a people question: how you integrate these technologies into the workflows of the people at utilities who are responsible for maintaining the grid, folded in so that it makes sense to them and fits into their jobs in a way that's practical and can be done very quickly. Because we need to make this change quickly.
Key Distributed Systems Technologies
What are those technologies? There are a few that I think of as foundational to managing a distributed system. I'm not thinking here about tooling for managing builds or continuous deployment infrastructure, but about the building blocks: the foundational pieces of those systems. At some point, there are a lot of resources that need to do work, and those resources need to be managed. You need to make sure there are enough of them, that you know where they are, that you know how to get to them and talk to them, and that you can monitor them and tell what's going on with them. Those are the technologies that go into orchestration layers, such as VM orchestration or container orchestration: fleet management and lifecycle management. I was fortunate to do some work in this area at Google. The orchestration team that worked for me in Google Cloud actually went on, somewhat against my wishes, to start Kubernetes. At their most conceptual level, these technologies are all closely related: how do you take a whole bunch of things and make them reliable enough and addressable enough that you can get them to do work together?
Step number two: how do you take those individual components and spread work across them appropriately, in ways that take advantage of their ability to do work, let them behave reliably as a whole, and let you work around individual failures? Then, lastly, can you see what they're doing? Can you look at the behavior of the system as a whole, and also go down to the level of individual components? Can you see how they're interacting? Can you manage the connections between them, which are the network or the physical lines of the grid, and draw reasonable conclusions from an operations and a planning perspective about what's safe, what's not safe, what's happening right now, and what might happen in the future? Ultimately: are the lights on, and how do we keep them that way?
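As an illustration of the orchestration building blocks described above (knowing what resources exist, how to reach them, and whether they are healthy), here is a minimal Python sketch of a fleet registry with heartbeats. It is not Camus's or Kubernetes's implementation; the `Resource` and `FleetRegistry` names, the 300-second heartbeat timeout, and the kW capacity field are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """One managed resource: a server, a container, or by analogy a DER."""
    resource_id: str
    address: str        # hypothetical: how to reach it, e.g. a gateway ID
    capacity_kw: float  # how much work it can do
    last_heartbeat: float = 0.0


@dataclass
class FleetRegistry:
    """Tracks what exists, how to reach it, and whether it has checked in recently."""
    heartbeat_timeout_s: float = 300.0
    resources: dict = field(default_factory=dict)

    def register(self, resource: Resource, now: float) -> None:
        resource.last_heartbeat = now
        self.resources[resource.resource_id] = resource

    def heartbeat(self, resource_id: str, now: float) -> None:
        self.resources[resource_id].last_heartbeat = now

    def healthy(self, now: float) -> list:
        # Healthy means "reported within the timeout"; silence is treated as failure.
        return [r for r in self.resources.values()
                if now - r.last_heartbeat <= self.heartbeat_timeout_s]

    def available_capacity_kw(self, now: float) -> float:
        return sum(r.capacity_kw for r in self.healthy(now))


registry = FleetRegistry(heartbeat_timeout_s=300.0)
registry.register(Resource("battery-1", "gw-17", 5.0), now=0.0)
registry.register(Resource("battery-2", "gw-42", 10.0), now=0.0)
registry.heartbeat("battery-1", now=400.0)
# At t=450 s, battery-2 has been silent for 450 s, so only battery-1 counts.
capacity = registry.available_capacity_kw(now=450.0)
```

Treating silence as failure is the conservative choice: capacity you cannot confirm is not counted as available.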
Camus Products: Grid Management Platform
Jumping forward a little, when we translate these technology sets into the grid landscape, this is ultimately the product set my company has put together. We build and deploy it today with smaller utilities, and going forward with larger ones as we start to work with them. It's about managing resources out on the grid, thinking about how to get them to work together, understanding them as a collective, and managing toward that. Then it's about how you spread work across them, how you get them to work together effectively, and how you make sure that's something that can be reasoned about and fits into the workflows of the operators who are trying to manage this in the field.
Technical Architecture: Real-time Data Platform
Briefly, here are some of the ways we've thought about this, and the technology sets we've brought in to make the problem tractable. When we started working on what became the Camus grid management platform, we really needed to understand the data out there about the grid. What is the nature of the network? What's its capacity? What traffic is happening on it? What resources are there that can do work? Each of these has an analog in the grid landscape to how you might approach it in a distributed system. Grid GIS systems have information about the physical network that energy needs to traverse in order to do work. SCADA systems have information about the state of control infrastructure and about the components of that physical system. Smart metering, or Advanced Metering Infrastructure (AMI on this diagram), has information about what's happening at customer locations, at the endpoints of the grid. And, of course, all the devices out there, from rooftop solar to batteries to EVs, can also tell us what they're doing.
We start by putting all of those sources into a common data platform, thinking about them together, and using that data to draw inferences about what has happened in the past and what might happen in the future. That's the foundational level of really beginning to understand how to manage the grid not just as physical infrastructure but also as a distributed system, and it's where we started. Some pieces of this exist in the field today, and some are services we're still building out. Hopefully this gives you a flavor of how familiar infrastructure might translate to solving problems like monitoring in the context of the energy system.
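To make the idea of a common data platform concrete, the sketch below joins two of the sources mentioned above, GIS connectivity (which meter hangs off which transformer) and AMI interval demand, and compares the rolled-up load against transformer ratings. The data, the dictionary layout, and the 0.95 power factor are hypothetical; a real platform would also pull in SCADA and device telemetry and work with time series rather than a single snapshot.

```python
from collections import defaultdict

# Hypothetical snapshot from GIS: which transformer each meter is connected to.
meter_to_transformer = {"m1": "T1", "m2": "T1", "m3": "T2"}
# Hypothetical asset records: transformer nameplate ratings in kVA.
transformer_rating_kva = {"T1": 25.0, "T2": 50.0}
# Hypothetical AMI reads: latest interval demand per meter in kW.
ami_demand_kw = {"m1": 12.0, "m2": 16.0, "m3": 9.0}


def transformer_loading_pct(meter_to_tx, ratings_kva, demand_kw, power_factor=0.95):
    """Roll meter demand up the GIS topology and express it as % of each rating."""
    load_kw = defaultdict(float)
    for meter, kw in demand_kw.items():
        load_kw[meter_to_tx[meter]] += kw
    # kW -> kVA using an assumed power factor, then percent of nameplate rating.
    return {tx: 100.0 * (load_kw[tx] / power_factor) / rating
            for tx, rating in ratings_kva.items()}


loading = transformer_loading_pct(meter_to_transformer, transformer_rating_kva, ami_demand_kw)
overloaded = [tx for tx, pct in loading.items() if pct > 100.0]  # T1 at roughly 118%
```

Neither source alone answers the operational question: AMI knows the demand but not the topology, and GIS knows the topology but not the demand.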
Practical Example: Monitoring Dashboard for the Grid
In practice, what does this look like? We provide a monitoring dashboard for the grid. Coming from the distributed systems space, the need for this seemed very straightforward and obvious to me. But it's actually something we've had to work through with folks on the utility side: what's the state of the art in monitoring, and what should you expect? It's pretty normal for a utility's current operational view of a distribution system to be 24 hours behind real time. So one of the foundational things we started to do was raise those expectations: having this data available in real time, having it integrated, and being able to go all the way from information about the nature of the network to information about the endpoints and devices out there.
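One simple way a dashboard can make that lag visible is to track, per data source, when data last arrived and flag anything older than an agreed freshness target. This is a hypothetical sketch; the source names, timestamps, and the 15-minute target are illustrative assumptions, not Camus's thresholds.

```python
from datetime import datetime, timedelta, timezone


def stale_sources(last_update, now, max_age):
    """Return, sorted by name, the sources whose newest data is older than max_age."""
    return sorted(name for name, ts in last_update.items() if now - ts > max_age)


now = datetime(2021, 6, 1, 12, 0, tzinfo=timezone.utc)
last_update = {
    "SCADA": now - timedelta(seconds=10),
    "DER telemetry": now - timedelta(minutes=1),
    "AMI": now - timedelta(hours=24),  # the "24 hours behind" case
}
stale = stale_sources(last_update, now, max_age=timedelta(minutes=15))
```

Surfacing staleness per source, rather than showing old numbers as if they were current, is what lets operators trust the parts of the view that are real time.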
How Are DERs Like Containers? Fleet Management and Orchestration
This is useful in and of itself. It also forms a foundation for really thinking about the grid as a distributed system. I mentioned earlier the question of how to manage the resources that do work as there start to be more of them. In the grid of the past, this was quite simple, in the sense that there were only a few hundred to a few thousand generators out there. They were big power plants managed by professionals, with lots of people, and you called them on the phone if you wanted them to turn up or down. Obviously, that's not the world we're moving into, with batteries in people's houses, managed EV charging, and solar panels on people's roofs. You really need computers to solve those coordination, dispatch, provisioning, and capacity planning problems. From a distributed systems perspective, this is the set of work I think of as fleet management or orchestration: the normal work done by an orchestration system.
As we started to think about which technologies really matter in the grid of the future, there's the transition from that closely managed view of the world, where power is produced by a small number of professional participants, toward one where it might be produced, consumed, or moved around by very large numbers of them. That looks similar to the transition in the distributed systems space from pets to cattle. Going from a small number of named servers that do work to really large, collectively managed fleets requires many of the same technologies for managing groups of things in a virtual and distributed environment that we use for managing containers or servers. Ultimately, that's a boring set of things to do: registration, monitoring, capacity management, distribution of work, verification that work is done, and accounting. These are the ways we make sure there are resources available in the system to do work, and then verify that the work was done as expected.
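The last two items in that list, verification and accounting, can be sketched in a few lines. This is a hypothetical example, assuming each dispatch is a constant kW request over a fixed number of hours, a 10% delivery tolerance, and a flat per-kWh payment; none of these parameters come from the talk.

```python
from dataclasses import dataclass


@dataclass
class Dispatch:
    """One instruction sent to a resource and what its telemetry says it delivered."""
    resource_id: str
    requested_kw: float
    delivered_kw: float


def verify(dispatches, tolerance=0.1):
    """Split resources into those that delivered within tolerance and those that fell short."""
    met, short = [], []
    for d in dispatches:
        if d.delivered_kw >= d.requested_kw * (1 - tolerance):
            met.append(d.resource_id)
        else:
            short.append(d.resource_id)
    return met, short


def settle(dispatches, price_per_kwh, hours):
    """Credit each resource for delivered energy, capped at what was requested."""
    return {d.resource_id: round(min(d.delivered_kw, d.requested_kw) * hours * price_per_kwh, 2)
            for d in dispatches}


events = [Dispatch("battery-1", 5.0, 4.8), Dispatch("ev-7", 7.0, 3.0)]
met, short = verify(events)
payments = settle(events, price_per_kwh=0.20, hours=2)
```

Paying for measured delivery rather than for the request ties accounting back to verification: a resource that under-delivers is both flagged and paid less.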
Putting the Distribution in Distributed Systems - Load Balancing and Distributing Work across Systems
Once you've done that, a lot of things that were very difficult before become possible. One of them is adapting the ways we think about distributing work across those resources. In the grid of the past, that was a very simple technical proposition: you needed to build really big wires and really big generation plants, and connect them to everyone's house. That is a lot, and happily that work has been done. In the grid of the future, we need those systems to do new kinds of work. They need to work a little differently, in a way that's more flexible to having more participants. When you think about distributing work over a very large number of things, that problem is effectively load balancing, on the grid just as in a software environment. If you draw that analogy out, the simple job of load balancing is spreading work across a whole bunch of capacity resources, but there are actually a number of more complex components to it. I was fortunate to work with the teams building out load balancing systems at Google, and many of the design and architectural considerations are quite similar between distributing work across servers and distributing work across the energy system.
Load Balancing (for the grid) and Distributing Work across Systems
Ultimately, this is the process of combining a whole bunch of resources with different characteristics, on a network with certain constraints, to do the work that's needed at the time and place where it's needed. In the internet context, you can move packets around and route traffic, so you have a bit more control over where traffic shows up. Still, the analogies to the energy system aren't bad. Your goal is to distribute work across capacity resources. You need to manage and understand the health and availability of those resources, and make sure they're available to do work when they're needed. You need to manage network constraints. You need to be able to incorporate things like caching to reduce overall demand on the network, which has an elegant analogy in storage, although, again, the analogy isn't perfect. And in the energy system, as in a load balancing system, you may sometimes want to shed load to reduce overall demand, either for network capacity reasons or for cost reasons.
Sure, the technology is not exactly the same, but there are some really interesting commonalities in the technical challenges. This is where I think the tech industry, and in particular people working in distributed systems management and operations, really have something to offer folks in the grid space. The problems are similar enough that we can draw the analogy, and some of the technology approaches can simply be reused.
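A toy version of this grid load balancing problem: cover a target amount of flexible power from a fleet of resources, cheapest first, while respecting how much more each feeder can carry, and report whatever cannot be covered as the amount that would have to be shed. This is a deliberately simplified greedy sketch under assumed inputs (resource tuples, per-feeder headroom in kW, a cost per kWh); it ignores power-flow physics, time, and resource health, which a real system would have to model.

```python
def balance(target_kw, resources, feeder_headroom_kw):
    """Greedy allocation of target_kw across resources, cheapest first.

    resources: list of (resource_id, feeder_id, available_kw, cost_per_kwh) tuples.
    feeder_headroom_kw: extra power each feeder can carry (the network constraint).
    Returns (allocation in kW per resource, unmet kW that would have to be shed).
    """
    headroom = dict(feeder_headroom_kw)  # copy so the caller's dict is not mutated
    allocation = {}
    remaining = target_kw
    for resource_id, feeder, available_kw, _cost in sorted(resources, key=lambda r: r[3]):
        if remaining <= 0:
            break
        take = min(available_kw, headroom.get(feeder, 0), remaining)
        if take > 0:
            allocation[resource_id] = take
            headroom[feeder] -= take
            remaining -= take
    return allocation, max(remaining, 0)


fleet = [
    ("bat-A", "F1", 50, 0.05),
    ("bat-B", "F1", 40, 0.06),
    ("ev-C", "F2", 30, 0.08),
]
feeder_limits = {"F1": 60, "F2": 100}
allocation, shed_kw = balance(120, fleet, feeder_limits)
# bat-A uses most of F1's headroom, so bat-B only gets 10 kW despite having 40.
```

The network constraint is what distinguishes this from plain capacity-weighted balancing: bat-B has spare capacity, but its feeder does not.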
Practical Example - Using Batteries to Shave Peak Load
As a really simple example of how this might work in the grid landscape, one basic question is: if we have a grid with a lot of renewable energy sources, and a lot of demand that isn't at the times when the sun is shining or the wind is blowing, can we move load around to better align with the availability of renewables? This comes up a lot. As I mentioned, whether you want to reduce peak load for network capacity reasons or for cost reasons, there are really good financial propositions in it for utilities. They can save money on energy, and they can save money on network costs. Both are appealing from a business model perspective. From a technology perspective, telling a battery to charge at one time and discharge at another is not terribly complicated. But once we start introducing EVs and shifting load around, managing rooftop generation, smart water heaters, or thermostats, and looking at the role of batteries, all of which have their own jobs and different load profiles, and thinking about how to use them in a coordinated way, it starts to look a lot like that load balancing problem. How do you take a whole bunch of things and get them to do work really reliably?
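The single-battery case is simple enough to sketch directly. The function below is a hypothetical, deliberately simplified peak-shaving rule, assuming hourly steps, a lossless battery, and a fixed threshold chosen in advance: discharge whenever load is above the threshold, and recharge whenever there is room below it.

```python
def peak_shave(load_kw, threshold_kw, capacity_kwh, power_kw, soc_kwh=0.0):
    """Hourly peak shaving with one lossless battery; returns the net load per hour.

    Above threshold_kw the battery discharges to pull net load down toward the
    threshold; at or below it, the battery charges using the room left under it.
    """
    net = []
    for load in load_kw:
        if load > threshold_kw:
            discharge = min(load - threshold_kw, power_kw, soc_kwh)
            soc_kwh -= discharge
            net.append(load - discharge)
        else:
            charge = min(threshold_kw - load, power_kw, capacity_kwh - soc_kwh)
            soc_kwh += charge
            net.append(load + charge)
    return net


load = [40, 30, 30, 60, 90, 100, 70, 50]  # hypothetical feeder load, kW per hour
net = peak_shave(load, threshold_kw=70, capacity_kwh=60, power_kw=30)
# net == [70, 60, 30, 60, 70, 70, 70, 70]: the 100 kW peak becomes 70 kW.
```

The battery only helps at 90 and 100 kW because it charged earlier, which is the time dimension that comes up again in the Q&A. Coordinating many devices with different jobs is the part that turns this into a load balancing problem.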
Integrating Local Renewables into Real-time Power Supply
A lot of this is not really a technical problem; much of it is actually a people and workflow problem. When we work with utilities, we work closely with them to understand what it takes to make it practical for them to use these local, decarbonized resources as part of the workflows they carry out throughout their day. How can we change that, and provide new tooling around it, in ways that make it practical to rely on these resources for decarbonization and for reliable, cheap power supply? One of the things we do is make sure the tools we provide fit into their existing workflows. For example, if you want to use local resources, that needs to be integrated into the utility's view of how it procures power. Otherwise, those resources will never be factored in when the utility thinks about how it's going to serve its load and keep the lights on for its customers.
That's just one example of simple ways you can plug in these complex technologies. It really does need to be simple, because everybody doing this work also has a day job, and that day job involves a lot of things, from buying affordable power for their community to making sure trucks are in the right place at the right time to restore power after a storm. Making sure these utilities have the tools they need to do their work a little differently is one of the important keys to helping them transition to a grid that relies more on local, decarbonized resources, and eventually relies on them fully.
From Action to Value: Incentivizing Customers to Participate in Control
There's also a really big question here: say you could manage all these resources and load balance across them. How does that fit into an energy system that involves people, rates, and the money they pay? Why would someone let you access all of those resources? There are a lot of technology lessons we can bring in here, but policy and people considerations also become really important. Our simple answer is that, personally, I think if you, the utility, want people to let you use their batteries or their EVs or anything like that, you need to pay them. Our view is that there's a role for pricing and market signals as an extended operational signal, expressing the ways in which those resources can support the grid. That's the perspective we bring when we work with local utilities as well.
What's Happening Now? Connecting to Existing Balancing Markets - And Creating New Ones
While there are mechanisms today for resources that provide energy or flexibility to participate in large-scale markets, that's really not available for local resources. Enabling it requires plugging into an awareness of how the local grid is managed, provisioned, operated, and planned for. That's the enabling role we see the set of distributed systems technologies bringing to the grid landscape.
Questions and Answers
Breck: In terms of the distributed computing challenges, do you think there are unique considerations for energy that are different from consumer IoT or mobile applications, or things like that? Or is it really about taking what we already know about distributed systems fundamentals and applying them to energy?
Atkinson: I think of two areas that represent the biggest differences; one is technical, and one is not. The technical one is that in the grid space, there's a very important set of considerations around the physical capacity of the network. That's true for computing as well: if you overfill your internet or network connectivity, you're going to end up with trouble. But if you overload a network, you'll drop packets. If you overload a grid, you'll blow up equipment outside people's houses. The safety considerations of getting it right are pretty significant. Beyond that, it's a variant of many of the network management, asset management, and device management problems that are common in the consumer space: managing many distributed devices, managing change across them, getting updates to them, keeping them secure, and so on. That's also true for a lot of IoT domains. It's really the centrality of the grid, and the extent to which it impacts people's lives, that makes it a bit of a different class of problem.
The other difference is the extent to which change in this space is bound up with business models and regulatory considerations. A number of things that would make a lot of sense, like using our existing network more efficiently, aren't necessarily within the incentive structure of the utilities that manage it today. For the most part, they won't make more profit if they use the network more efficiently, so it's not a great fit from a business model perspective. I think that's a really underrated consideration in grid transformation. It makes a lot of things more difficult when the business model doesn't incentivize the kinds of changes we need to see.
Breck: I've even seen incentives that are supposed to encourage renewables actually getting in the way of building good distributed systems. It's frustrating when that happens.
You're not just a technology leader. You've also been an organizational leader. This distributed system you're talking about spans many organizations: regulatory bodies, utilities, vendors, OEMs, cyber-physical security, regulation. Do you have any thoughts on what will make it successful organizationally?
Atkinson: I think success in this space requires recognizing that it's not just a software problem. My personal perspective on building organizations has always been that you want a team where many different kinds of people come together and bring their superpowers. That is incredibly true of the grid space, because it has so many dimensions that aren't just software. You need people who understand the regulatory landscape, who understand how utilities think because they've worked there. You need folks who can engage with rate structures and business model incentives. You need all those people working together to think about how it could be different. I think the best technical teams are always the ones that have those additional dimensions: folks with a lot of experience in the non-technical dimensions of the problem. For this one, I think it's really a requirement.
The other thing is that this goes beyond how we hire within my company, for example. I think one of the things that has really slowed down technical innovation in the grid space is that there's not a lot of collaboration between technical players. Utilities don't really compete in the normal sense, because they're geographic monopolies, although there are many specific variations where that's not true. Vendors do compete. Vendors in the grid software space are so concerned about keeping their one algorithm secret that they often won't even tell you what they do, let alone how they do it, or share technology for building platforms or systems that can interoperate. I really think that's one of the reasons the state of the art in grid technology is so far behind what we see in the consumer space, where there's by now a very solid tradition of sharing development, building standards together, open source, and collaboration at conferences like this one. You don't see people in the grid space get out and talk about exactly how they do things very much. The truth is they're all doing the same things. This would move better and faster if people worked together better.
Breck: What do you think the place is for standards? We see those in distributed systems in terms of gRPC, or Kubernetes as a platform everybody uses. How do standards evolve in this space to accelerate change?
Atkinson: There are lots of standards in this space. I think any standards are better than no standards, because at least you have something to implement against, even if it ends up being dozens of things. It's still a consistent set of standards you could potentially engage with. One thing that's really different in the energy space compared to the rest of consumer tech, though, is that those standards typically aren't openly accessible. I think the IEC 61850 spec is $10,000. Reference implementations generally aren't available either. I don't think that helps. There are also a lot of areas those standards don't effectively cover, where at least a common implementation would be very helpful. I think bringing in some of that cultural practice around more open standards, ones that are free and have a reference implementation you could pick up, use, and build on, would probably help a lot.
Breck: Somebody is asking about the tension between local autonomous operation and centralized control of hundreds of thousands, or potentially millions, of energy assets. Do we need new ideas to do that? In my own work, one of the hardest problems is that you can't think of these systems as individual systems; you need to think about them in aggregate. How do you see that distributed system, spanning the cloud and the edge, evolving? Do we need new ideas there?
Atkinson: I think there's a lot of really good practice that can be adapted into this space. What does a distributed systems reliability model really look like? It's a model where the individual nodes in a system can behave independently and safely pretty much all of the time, and do a job reliably with only intermittent connectivity. That's true for distributed serving nodes, and it's true for distributed energy resources as well. One of the key lessons from the distributed systems space is that if you want system-level behavior, you need system-level coordination. I think the distinction between coordination and control is really important. In a distributed systems model, if you're just thinking about servers, your control model typically isn't one lead or master node telling everything exactly what to do all of the time. It's typically that lead node providing lightweight connectivity and periodic communication to the distributed nodes, which can update their behavior but continue to operate independently.
You've got to have that central coordination, because it's not emergent. What is emergent is a whole bunch of really unpleasant system-level effects. If you don't coordinate the behavior of nodes, and they're all following their own signals, they will fall into feedback loops. They'll see a signal and think, "Price just went above $7 a megawatt, we're all turning on now." You'll start to see herding effects and feedback loops emerge much faster than the network capacity might allow, in terms of how much the network or other resources are being used. We're already starting to see this in some places on the grid, where even five EVs charging at the same time is enough to melt a residential transformer. That just isn't aligned with the transformer's duty cycle. If those charging sessions were spread out over a couple of hours, they probably wouldn't. So I think system-level coordination is really critical to avoid the natural groupings of behavior you get in a distributed system that isn't coordinated.
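The coordination-versus-control distinction can be sketched from the node's side: the node always runs its own logic, and a coordinator's periodic message only adjusts its behavior while that message is fresh. If connectivity drops, the node keeps operating on a conservative local default instead of waiting for commands. The `Guidance` message shape, the expiry rule, and the kW values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Guidance:
    """Lightweight periodic update from a coordinator (hypothetical message shape)."""
    max_charge_kw: float
    issued_at: float
    valid_for_s: float


class Node:
    """A DER that decides locally, using coordinator guidance only while it is fresh."""

    def __init__(self, safe_default_kw: float):
        self.safe_default_kw = safe_default_kw
        self.guidance: Optional[Guidance] = None

    def receive(self, guidance: Guidance) -> None:
        self.guidance = guidance

    def charge_limit_kw(self, now: float) -> float:
        g = self.guidance
        if g is not None and now - g.issued_at <= g.valid_for_s:
            return g.max_charge_kw
        # No guidance yet, or it expired (e.g. lost connectivity): use a safe local rule.
        return self.safe_default_kw


node = Node(safe_default_kw=3.0)
before = node.charge_limit_kw(now=0.0)      # nothing received yet
node.receive(Guidance(max_charge_kw=7.0, issued_at=0.0, valid_for_s=900.0))
fresh = node.charge_limit_kw(now=600.0)     # guidance still valid
expired = node.charge_limit_kw(now=1200.0)  # guidance has expired
```

The expiry is the key design choice: guidance is a lease rather than a standing order, so a node that loses contact degrades to safe behavior instead of acting on stale instructions.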
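To illustrate the herding effect, and the backoff and jitter ideas raised in the Q&A, this hypothetical sketch counts how many of five EVs charge during the same hour under three policies: everyone starting on the same price signal, random jitter, and a coordinator assigning slots. The 2-hour session length, 6-hour window, and hour-22 signal are assumptions; hours are counted on a continuous timeline, so 24 means midnight.

```python
import random
from collections import Counter


def peak_concurrency(start_hours, duration_h):
    """Largest number of EVs charging during any single hour."""
    occupied = Counter()
    for start in start_hours:
        for hour in range(start, start + duration_h):
            occupied[hour] += 1
    return max(occupied.values())


def jittered_starts(n, signal_hour, window_h, rng):
    """Uncoordinated mitigation: each EV waits a random 0..window_h-1 hours after the signal."""
    return [signal_hour + rng.randrange(window_h) for _ in range(n)]


def coordinated_starts(n, signal_hour, window_h, duration_h):
    """Coordinated alternative: assign EVs round-robin to back-to-back slots in the window."""
    slots = max(window_h // duration_h, 1)
    return [signal_hour + (i % slots) * duration_h for i in range(n)]


herd = [22] * 5  # all five react to the same price signal at hour 22
spread = jittered_starts(5, 22, window_h=6, rng=random.Random(42))
planned = coordinated_starts(5, 22, window_h=6, duration_h=2)  # [22, 24, 26, 22, 24]

herd_peak = peak_concurrency(herd, duration_h=2)        # 5
jitter_peak = peak_concurrency(spread, duration_h=2)    # depends on the random draw
planned_peak = peak_concurrency(planned, duration_h=2)  # 2
```

Jitter lowers the peak only on average, and an unlucky draw can still stack sessions on one transformer; the coordinated assignment bounds it, which matches the argument for system-level coordination.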
Breck: There's room here for backoff and retry, jitter, or some of these distributed systems principles.
I'm curious about your opinion on building an IoT platform yourself to solve these problems versus adopting something like Google Cloud IoT. Is the cloud vendors' IoT platform already in place to solve these problems, or do we really need to build specialized software?
Atkinson: I think a lot of the existing systems are applicable. I also don't think the existing application-level infrastructure does exactly what you need. We use Google Cloud, mostly for convenience. A bunch of folks on the team came from Google and helped build it. It's not necessarily better than any other cloud, but it's familiar to us. There are a lot of base capabilities within it that are very helpful. We use Docker. We use Kubernetes. We use Google Cloud's DNS system: standard cloud stuff. We use off-the-shelf time series and SQL databases, specifically Influx and Postgres right now. All of those are helpful. The cloud-level infrastructure that handles IoT coordination could be applicable, but a lot of the problem is actually just getting the data in and making good decisions about it.
The load balancing problems are not exactly the same, because most existing load balancing is instantaneous, not balancing over time. There's a really big time component to balancing on the grid. It's not just "battery, do something now." It's "battery, you need to have charged eight hours ago so that you can do something now," and then there's a cooldown period afterward as well. How do you make the best use of resources when some of them are like that, and others are more like "now's a good time to charge a car or heat a water heater"? It's a different enough balancing problem that I don't think existing load balancing is directly applicable, although many of the tools are similar and can be helpful.
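The time dimension described here can be sketched as a small scheduling problem: a deferrable load such as an EV needs a fixed amount of energy before a deadline, and the scheduler places it in the cheapest hours that fit. This is a hypothetical greedy sketch, assuming hourly prices are known in advance, 1-hour steps, and a constant maximum charging rate; it does not model the cooldown periods or the network limits discussed earlier.

```python
def schedule_deferrable(energy_kwh, max_kw, deadline_hour, hourly_price):
    """Fill the cheapest hours before deadline_hour (1-hour steps) until energy is met.

    Returns {hour: kW} in hour order. Raises ValueError if the energy cannot be
    delivered before the deadline at max_kw.
    """
    plan = {}
    remaining = energy_kwh
    for hour in sorted(range(deadline_hour), key=lambda h: hourly_price[h]):
        if remaining <= 0:
            break
        kw = min(max_kw, remaining)
        plan[hour] = kw
        remaining -= kw
    if remaining > 0:
        raise ValueError("not enough hours before the deadline")
    return dict(sorted(plan.items()))


prices = [0.30, 0.25, 0.10, 0.08, 0.12, 0.20, 0.35, 0.40]  # hypothetical $/kWh, hours 0-7
plan = schedule_deferrable(20, max_kw=7, deadline_hour=6, hourly_price=prices)
# plan == {2: 7, 3: 7, 4: 6}: 20 kWh placed in the three cheapest hours before hour 6.
```

Unlike instantaneous load balancing, the decision depends on the whole horizon: moving the deadline earlier or changing a single future price can change which hours are used now.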