Transcript
About a month ago, Europol, the European Union's law enforcement agency, posted this on Twitter. There was a photograph; they apparently were pursuing a sex offender and had some evidence that he was in a particular region. And they put out an appeal: "Could you please help us find this place?" About two weeks later, this gentleman came out and said, "I think I found it." Through a search that we'll explain in a moment, he was able to identify the precise location where that photograph was taken, and pass it on to Europol. He was asked how he did it. "Well, it was quite straightforward," he said. He pattern matched the architecture, which gave him some sense of the region. He pattern matched the outline of the hills in the background and did a very extensive search of the area, until he finally found it by looking at Google Maps data and extrapolating what the scene might look like from the ground, having seen it from the air. Turns out, this is a very computable problem, and a very relevant kind of problem.
A Systems Engineering Problem with AI Components
It requires the identification of architectural styles, which is an AI kind of issue; matching those styles to a particular place, which is possible if you have the data to do so; looking at local topography and seeing where you might be; matching that topography to places; and then looking at building features and going from a top-down view to a horizontal view and seeing where you might be. So it's a very computationally possible system. Turns out, this is not an AI problem. This is a systems engineering problem with AI components. And this is the nature of a lot of the systems we see today. It involves the AI aspects of pattern matching, some raw processing of geometrically transposing 2D and 3D features, a wide search across vast amounts of information, and in the end, constraint resolution with degrees of probability attached to the answers. A human can do it, and it's possible for us to program a system to do so. In fact, if you look at a lot of contemporary AI, much of it is really about pattern matching of signals on the edge.
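To make that concrete, here is a minimal, purely illustrative sketch of how such a geolocation system might be decomposed: a couple of AI-flavored components (an architectural-style classifier, a skyline matcher) wired into an ordinary pipeline that ends in constraint resolution over probabilities. All of the function names, regions, and scores below are hypothetical stand-ins, not anything from the actual Europol case.

```python
# Purely illustrative decomposition of the photo-geolocation problem:
# two AI-ish components (style classification, skyline matching) inside an
# ordinary pipeline that ends in constraint resolution over probabilities.

from dataclasses import dataclass

@dataclass
class Candidate:
    region: str
    score: float  # running, probability-like confidence

def classify_architecture(photo):
    """Stand-in for a trained image classifier returning likely styles."""
    return ["alpine chalet"]

def regions_for_styles(styles):
    """Plain data lookup: map architectural styles to candidate regions."""
    return [Candidate("Bavarian Alps", 0.40), Candidate("Austrian Tyrol", 0.35)]

def match_skyline(photo, candidate):
    """Geometric matching of the hill outline against terrain data (not AI per se)."""
    return 0.8 if candidate.region == "Austrian Tyrol" else 0.3

def geolocate(photo, top_k=3):
    styles = classify_architecture(photo)       # AI component
    candidates = regions_for_styles(styles)     # plain data lookup
    for c in candidates:
        c.score *= match_skyline(photo, c)      # constraint resolution: combine evidence
    return sorted(candidates, key=lambda c: -c.score)[:top_k]

print(geolocate(photo=None))
```

The point of the sketch is simply that the AI sits in a couple of boxes; everything gluing the boxes together is conventional systems engineering.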
Pattern Matching
Let's tease that apart. It's really about pattern matching: teaching a system through ground truth, lots of evidence of the things we'd like to look for, and having it pattern match against that. It happens to be mostly about signals, mostly images and video, and audio, and it tends to be on the edge as opposed to the center of a system. A lot of this deals with what we call inductive reasoning. But it's not about decision making; there's a little bit of AI in that space, but not a great degree. And it's especially not about abductive reasoning, looking at specific data and trying to build a theory from it, which is what that gentleman did. So AI is not a dessert topping and a floor wax, for those of you who follow Saturday Night Live; it's really useful in some specific places.
Contemporary AI
Contemporary AI is, in a sense, not all that modern. Many of the existing algorithms have been around for a few decades. What's different today is two things. There are huge bodies of tagged data; in fact, it was ImageNet that really kicked off the current round of AI advances with its large amount of tagged image data. And we also have an abundance of computational power. These two things together have brought us to where we are today. Now, a lot of people use the AI label in ways that were never intended, and people do this with Blockchain too, but we're not going to go there. Blockchain, AI, any questions? I'm done. So, for me, the litmus test is that an artificial intelligence system is one that reasons and one that learns, and those both have to come together. Reasons in the sense that it must do some sort of human-level induction, deduction, or abduction. And learning is key as well. It may be a system that's taught; more ideally, we'd like a system that learns over time, and we aren't quite there. If you have only one of those aspects, it's probably not AI. If it has neither, it's less than AI.
And the problem I find is that people are putting AI on everything, much like Blockchain, to make it look exciting. Turns out that the concepts of AI have been around for a really long time. This is perhaps the first artificial neuron. It's a thing called SNARC, built back in 1951, of all things. It's a single neuron made out of vacuum tubes. Now, we've come a long way since that time frame. But the point is, the idea of doing computation at the neural level has been around for literally decades. What we've been missing, of course, is the computational power and the data to support it. Now, to set this in context, if we take AI at large, which has had a long, long history of algorithms and developments and setbacks and steps forward, machine learning is only one aspect of AI, and deep learning is an even narrower piece of that. So while all the focus has been upon deep learning these days, please keep in mind that there is a large body of exciting stuff that goes far beyond that.
In fact, this is impossible to read if you're not in the first row. But it points out that even within the area of machine learning, there are many aspects below that. There's supervised and unsupervised learning. There's learning that's probabilistic and learning that's non-probabilistic. So many different paths that people are exploring these days.
A Bit of AI History
Let's go back in time a little bit, because I'm already feeling a little bit of a chill, I'm wearing my sweater these days, and you'll know why in a moment. There have been many periods of springs and winters in AI. The first winter was perhaps in the '50s. Here we were at the height of the Cold War, and there was great excitement about doing machine translation, where we could take Russian and translate it into something else. As the story goes, they fed in statements such as, "The spirit is willing but the flesh is weak," translated into Russian and back, and it came back as something like, "The vodka is strong but the meat is rotten," so not quite right.
And I believe what we certainly learned in that time frame is that language is a lot harder than we first realized. So while there were some initial simple things that were done, it was very quickly realized that this was a dead end. And keep in mind, here we're dealing with symbolic approaches to AI. So a lot of the funding dried up, and people walked away from the concept. But then, a little bit later, we saw the beginnings of the Golden Age of AI. This is where the ideas of Newell and others came to be with the Logic Theorist, and Terry Winograd did his work on blocks worlds, the idea of having these blocks that I could talk about in English and have a system manipulate. There was great progress along the way. But Marvin Minsky said around that time frame, "Hey, we're going to have human-level intelligence in three years." Of course, people don't say that anymore these days, do they, Elon? I mean, sorry. But again, we saw a winter come upon us, because with these symbolic approaches, again, we found limits to what they could do, in computational power and in expressiveness.
We saw another rise in the era of Ed Feigenbaum and crew with the idea of rule-based systems. And in fact, some interesting work came out of this. MYCIN, for example, was used to make certain kinds of diagnoses in medicine. The story also goes that Campbell Soup used these kinds of techniques. Apparently, there was a gentleman who was, I guess, the chief mixologist for Campbell. He was going to retire, but he knew all the secrets for how to get Campbell's soups right. So they went in and interviewed the guy and basically came up with a set of rules that codified how he did the mixing of the soups. So there's value to it. But again, the problem they found was that this just doesn't scale. After a few hundred rules, it became impossible to build systems that were meaningful at all. In fact, a whole industry had built up around this. Symbolics and other companies had actually started to build hardware that supported these kinds of systems, and they collapsed in the end. So there was this dramatic, dramatic collapse. This was perhaps the biggest "winter is coming." DARPA had funded quite a bit around this time, and all the funding dried up.
AI Today – Deep Learning
So the question now is, are we in fall, the fall of AI? Well, I don't think it's going to be quite the same, because many of the advances we've seen in AI today are moving their way into the spaces of our systems, emphasis on systems, in ways we had never anticipated before. Gary Marcus, whom you should follow on Twitter, is a great contrarian and I think a great pragmatist who points out that, yes, deep learning has given us lots of great things. That's wonderful and good. But remember that it's primarily at the level of signals. It has indeed made some contributions across AI. But, he says, remember that deep learning is not a universal solvent. It's useful as a component in many other kinds of things. So I think what's different now is that we're in a place where people are recognizing that the artificial intelligence approaches we have, especially deep learning systems, are useful as parts of larger systems.
And that's where you come into play. I would imagine most of you are just good old hardcore developers, and you see the AI stuff out there and wonder, "What does this mean for me?" I think the answer is simply, "Wow, I've got some really cool new toys that I can put into my systems to make them even better." In fact, were I you, I might be in a position to say, "I can actually help make those AI components even better." So it becomes a systems problem in many ways.
Tools and Processes in The AI Space
Now, many of you are familiar with your own kind of toolset. As a developer, you have your own favorite environment in which you work, your favorite frameworks and the like. A parallel world of tools and processes is emerging in the AI space that is very, very different from what you might be used to. There's a lot of buzz, of course, around TensorFlow. And that makes sense, because a lot of the great advances that we've seen in the immediate areas of deep learning have come from it. And that's a good and wonderful thing.
But it's not the only game in town. The other thing we see happening is the commercialization of a lot of these AI components. Whereas you see these dramatic things taking place, many of those companies are then taking those developments and turning them into services. We see that through IBM and Amazon and Google and others. And that's great, because it now means you've got this landscape, this infrastructure of components out there for you to start putting into your systems. So the lesson here is, there's a whole different tool space. And we're going to see developments in this particular domain for the coming years until it stabilizes. Every one of the companies, and I won't mention names here, tends to be growing its own particular kind of IDE. The marketplace will decide over time which one of those is the winner. There is no obvious winner right now, simply because there hasn't been enough use of these kinds of tools to decide which is really the best. So we're going to see a time of great innovation, and that's good for all of us.
The Life Cycle Perspective
From the life cycle perspective, building AI systems is also radically different from building traditional software-intensive systems. The primary reason is that a whole lot of it deals with data. As you start training a system, you're going to quickly discover that the exciting, fun stuff might be in defining a new neural network, constructing it, and developing the weights for it. But the real time you're going to spend is on data curation. And there is more of that than you may ever realize. This is where ethical issues come into play. Is my data actually biased from the start? That has nothing to do with deep learning, but is my selection of the data meaningful and socially appropriate? Is it going to lead me to unintended consequences? So one of the natures of this particular life cycle is that you've got a whole different set of skills and people you need to bring into your system than you would have had with a traditional software-intensive system; namely, your data scientists are going to be there from day one, and active throughout the life cycle of your system.
So identifying the data, identifying the use cases, and identifying ground truth need to begin long before you define the solution itself. This happens to be the life cycle that we described at IBM. But it applies to pretty much every kind of AI component I have seen thus far.
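Since so much of that early effort is data curation, even a crude audit of the ground truth can surface bias before any training starts. Here is a small, hypothetical sketch of such a check; the records, labels, and the 80% threshold are invented for illustration and are not part of the life cycle described above.

```python
from collections import Counter

# Hypothetical ground-truth records: (image_id, label, source_region)
ground_truth = [
    ("img001", "pedestrian", "urban"),
    ("img002", "pedestrian", "urban"),
    ("img003", "cyclist", "urban"),
    ("img004", "pedestrian", "rural"),
]

labels = Counter(label for _, label, _ in ground_truth)
regions = Counter(region for _, _, region in ground_truth)

print("Label distribution:", labels)
print("Region distribution:", regions)

# A crude red flag: any class or source making up more than ~80% of the data.
for name, counts in [("label", labels), ("region", regions)]:
    total = sum(counts.values())
    for key, n in counts.items():
        if n / total > 0.8:
            print(f"Warning: {name} '{key}' dominates the data ({n}/{total})")
```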
The Hardware Infrastructure
The other thing that's happening is that we're seeing the emergence of a whole different set of hardware infrastructures to support what we're doing. And this is also a change for us as developers; especially if you're in the DevOps world, you know about this interaction between the hardware and the software that you try to manage. What's happening now is that a whole new set of hardware components is becoming part of our systems. Google, of course, has its own TPUs to support, essentially, TensorFlow. They're not the only game in town, though. We see the advent of neuromorphic computing. There are a number of companies doing things in this space. We have a project called TrueNorth, and there are others very similar to it, which, in a brain-inspired way, look at how the brain does neurons and ask whether we can build silicon that supports that directly.
In fact, we, and I speak of the industry at large, are on a path to be able to build artificial neurons at about the density of the human brain. Human brain … let's go backwards. The average AI system probably has a few billion transistors and probably runs on several hundred watts of power, whereas the human brain has about 100 billion neurons and runs on about 20 watts of power. So we know that there are different form factors, and each of those form factors has advantages and disadvantages. The computer system, of course, can do things faster than the mind can ever do. Our brain runs at around 20 hertz; some run slower if you happen to be closer to Washington. I have absolutely no idea what to say next. So, from the infrastructure perspective, we know how to build these things, but we simply don't know how to architect the software around them. One other difference, while I'm at it, is that most of the artificial neurons we see today tend to be relatively binary in nature, with weights that generally give us a probability output from zero to one.
Whereas the neurons in our brains are spiking neurons, meaning that there are actually time signals associated with them. And we've only begun to tap what that means. So, at any rate, there's a whole new round of hardware coming out for us. And if you look at the GPU world, I mean, my goodness, Nvidia has done some amazing things in this sense, shrinking GPUs down to an amazingly small platform and increasing their performance as well.
Distributed Deep Learning
Now, one of the things, or two of the things, we see happening in this space are these. There's an interesting pendulum going back and forth about where inferencing and learning take place. We did an exercise once where we were training a robotic arm, and it required a number of hours of cloud time. But ultimately, we trained the neural network and could squeeze it down to a Raspberry Pi, which could do the inferencing in real time. So there's a great disconnect between the computational resources needed for training and learning and those needed for inferencing. And so the pendulum we see swinging back and forth is, maybe not all of this has to be done in the cloud; perhaps we can push some of this stuff to the edge, and that yields a number of architectural decisions one must make.
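As a rough illustration of that cloud-to-edge squeeze, here is one common route, sketched with TensorFlow and its Lite converter: train a model where the compute is cheap, then quantize and export it for inference on a small device. The toy model, the skipped training call, and the output filename are placeholders; the robotic-arm system described above would obviously be more involved.

```python
import tensorflow as tf

# Train (expensively, perhaps in the cloud). A toy model stands in for the
# hours of training described in the talk.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(x_train, y_train, epochs=...)   # the costly, cloud-side step

# Then shrink the trained network for inference on an edge device
# such as a Raspberry Pi.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # weight quantization
tflite_model = converter.convert()

with open("arm_controller.tflite", "wb") as f:   # hypothetical artifact name
    f.write(tflite_model)
```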
Approximate Computing
Another reality is that a lot of what happens in the AI space does not require the degrees of precision that we see in other domains. In particular, it means we can use approximate computing: the kinds of computation we need on the edge are a little bit different from what you might find in other kinds of systems. This impacts our architecture as well.
So to level set this again, remember that, yes, AI is fundamentally changing the kinds of systems we can build. They will enable the automation of the Europol scenario I mentioned earlier. But remember, also, don't forget everything else you learned, because deep learning is ultimately a small piece of AI, which is only a piece of computing in general. So be strong. The things you know and love today will still be relevant in the coming years, you just have a larger palette upon which to build.
The Architectures of Systems
Now, I tend to be a systems engineer; this is where most of my work comes into play. And so I'm always falling back on what I know in this particular space. There's a delightful book by John Gall called Systemantics that I urge you to take a look at if you haven't, in which he offers some amazing wisdom about what systems are all about. "Everything is a system," he points out, "Everything is part of a larger system." This is so true of the software-intensive systems we have today. We tend to build upon components that we ourselves have not created, but others have. And we ourselves are building things that others may in turn build upon.
So we're in this wonderful primordial soup of software, in which there are pieces we'll never know about that enhance what we're doing, and vice versa. Another important lesson Gall points out is that you can't make complex systems work by building them from scratch; they grow from smaller systems that worked in the first place. Now, today there are tremendous philosophical battles over Agile and DevOps and Lean, you name it. But in the end, they seem to come back to the same kind of principles: growing systems through the continuous release of executable architectures. Now, the AI world hasn't quite learned from the software engineering world, and I think we have some things to learn from them as well. We're beginning to see communication between the two. But believe me, those two life cycles are a bit different, and we're going to see in the coming years how they merge together.
So, being an architect, I thought it would be interesting to look at the architecture of some AI systems and see what we can infer about them. We're going to look at Watson, we're going to look at AlphaGo, and we're going to look at one other, a system called Self. Watson's Jeopardy performance was pretty cool. If you think about what it did in its time frame, it beat the best humans who had played Jeopardy. But if you dive into its architecture, to use Mary Shaw's term, it is effectively just a pipe-and-filter architecture that has AI components. At the front of it, you get a particular statement for which you're trying to find the question. We break apart that statement and get a number of potential search results, then we make some hypotheses, which expand out into hundreds if not thousands of possible things. And then we start to build evidence against them. So this is, in effect, forward chaining, and then the rest of it is a bit of backward chaining, in which we look for evidence that supports those hypotheses. So we start from one, grow to hundreds, to thousands, grow to potentially hundreds of thousands, and then reduce it through probabilities to the three top choices. And where is the AI in this? It happens to be in the components inside it.
So as we look at the natural language, we actually throw a whole bunch of algorithms at it simultaneously, and they come back and say, "Hey, here's my response. Here's my response." And we basically vote among them. But it's the pipeline architecture that pulls all the AI components together. By the way, the pipeline is open source; it's a thing called UIMA. So if you want to build your own Watson, go for it. Many of the pieces are already out there for you. The trick, of course, is to do this within just a few milliseconds. And that's where we threw lots of hardware at it, lots and lots of servers and cores. You wouldn't do this at home normally, but we did it to accelerate the processing. So at its core, it's just a pipeline-based architecture.
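To make the pipe-and-filter shape tangible, here is a deliberately toy sketch of that flow: one clue fans out into hypotheses, evidence scorers vote, and probabilities reduce the field to a top three. None of this is the actual Watson or UIMA code; the corpus, scorer, and clue are invented for illustration.

```python
# Toy pipe-and-filter pipeline in the spirit of the Watson description above.

CORPUS = {
    # hypothetical answer -> terms that count as supporting evidence
    "gottfried leibniz": {"calculus", "co-inventor", "newton", "germany"},
    "isaac newton": {"calculus", "gravity", "optics"},
    "charles babbage": {"difference", "engine", "computing"},
}

def parse_clue(clue):                        # filter 1: question analysis
    return set(clue.lower().replace(",", "").split())

def generate_hypotheses(terms):              # filter 2: fan out to candidates
    return list(CORPUS)

def score_evidence(hypothesis, terms):       # filter 3: evidence scoring ("voting")
    support = CORPUS[hypothesis]
    return len(terms & support) / len(support)

def answer(clue, top_k=3):                   # filter 4: reduce by probability
    terms = parse_clue(clue)
    scored = [(h, score_evidence(h, terms)) for h in generate_hypotheses(terms)]
    return sorted(scored, key=lambda hs: -hs[1])[:top_k]

print(answer("co-inventor of calculus, along with Newton"))
```

The pipeline itself is dumb plumbing; all of the "intelligence" lives inside the filters, which is exactly the architectural point being made here.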
AlphaGo is interesting. Now, AlphaGo at its core is a convolutional neural network, great. But if you look at the things around the outside of it, you'll come to discover that AlphaGo is holonomic. Let me define what I mean by that. If I build a system and I look at a game, say Mario Kart or something, and I look at an instantaneous frame, I can be trained to know what to do next based upon what I see right there, and the previous history does not impact what I'm doing. That's pretty much what AlphaGo does, with a little bit of simplification. It looks at the immediate board setup and reacts to it, and it doesn't care what happened before that time frame. Many autonomous cars are like that as well. But as it turns out, there are things you can't do if you don't have that history. This class of architectures is therefore largely reactive: you give it some state, and it infers based upon that state.
If I'm driving a car, for example, I as a human might be driving along and see some kids carrying balls. Immediately my attention is drawn to the fact that there are some young children in the area with balls; they might jump into the street. Most autonomous vehicles don't look at that past history, but only at what they see immediately ahead of them. So there are limits to this kind of architecture.
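A contrived sketch of that difference might look like the following: a purely reactive policy maps the current frame to an action, while a history-aware one carries the "kids with a ball" cue forward and stays cautious. The policies, frame fields, and actions here are hypothetical.

```python
# Reactive (memoryless) versus history-aware decision making, as a toy.

def reactive_policy(frame):
    # Only the instantaneous observation matters.
    return "brake" if frame.get("obstacle_ahead") else "cruise"

class HistoryAwarePolicy:
    def __init__(self):
        self.saw_children_with_ball = False

    def act(self, frame):
        # Remember a cue from the past and let it change behavior now.
        if frame.get("children_with_ball_on_sidewalk"):
            self.saw_children_with_ball = True
        if self.saw_children_with_ball:
            return "slow_down"
        return reactive_policy(frame)

policy = HistoryAwarePolicy()
print(policy.act({"children_with_ball_on_sidewalk": True}))  # slow_down
print(policy.act({}))                                         # still slow_down
```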
We've been exploring hybrid architectures. This is difficult to see; it's a bit of a UML diagram. But it shows a system we called Self, which combines these approaches. A lot of AI is old algorithms; the idea of gradient descent has been around for decades. We went back into the literature and our experience and said, "You know, agent-based systems, à la Minsky's Society of Mind, and blackboard systems, à la the Hearsay experiments from CMU some years ago, seem to work really well." So what we've been exploring is how one can use massive agent-based systems that communicate opportunistically through blackboards, where the components are themselves AI. We've been able to build social avatars, social robots, and social spaces using this approach.
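For readers who haven't met a blackboard architecture, here is a minimal, hypothetical sketch of the pattern: agents watch a shared store and opportunistically post contributions that other agents pick up. In a system like Self, each agent might wrap a full AI component; the agents and keys below are invented for illustration.

```python
# Minimal blackboard pattern: agents read shared state and post contributions.

class Blackboard:
    def __init__(self):
        self.entries = {}

    def post(self, key, value):
        self.entries[key] = value

    def read(self, key, default=None):
        return self.entries.get(key, default)

class SpeechAgent:
    def step(self, bb):
        if bb.read("audio") == "hello there":
            bb.post("intent", "greeting")   # in Self, a speech model would sit here

class DialogAgent:
    def step(self, bb):
        if bb.read("intent") == "greeting" and not bb.read("reply"):
            bb.post("reply", "Hello! How can I help?")

bb = Blackboard()
bb.post("audio", "hello there")
for agent in (SpeechAgent(), DialogAgent()):   # a scheduler would loop over agents
    agent.step(bb)
print(bb.read("reply"))
```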
The Ethical Issues and Their Implications
There was a time when, if you talked about AGI, you'd be laughed at. This is not that time anymore. So we're in a very interesting space in the AI world, in which people are actively talking, without being laughed at, about the notion of having AIs that are very much like us. This is both exciting and disturbing. It's exciting in the sense that we begin to see the opportunity to build systems that reason and look more like us and act more like us. That's great, and you as system developers have the opportunity to put in some of those kinds of components. But it also means that the ethical issues, and the implications thereof, are even greater than they were in the past. In fact, let's talk about those forces.
From a systems engineering perspective, all software development, all development of software-intensive systems, is ultimately a resolution of forces. And those forces vary with your particular domain. I'm delighted to see all the talks dealing with performance and the like. But, you know, that's one kind of performance. I also deal with systems that are hard real-time, where if you literally miss a cycle, a plane is going to fall out of the sky, very different from web kinds of systems. So your mileage may vary. But consider, for your particular project, what are the things that weigh upon you? What are the forces that shape what you're building?
Some of those are purely business: cost, schedule, and mission. If you're with Rockstar Games, then schedule becomes very important, right? You've heard me talk about them if you follow me on Twitter; talk about a really fascinating development process. We won't go there. If you look at it from the development culture itself, the particular tools that I use, the kinds of people that I have, all of these shape it as well. And then the various ilities, performance, reliability, usability, all these things weigh upon us. But increasingly, for the kinds of systems that you and I are working on, we also see more social ilities facing us: legal issues and ethical issues. Should I, in fact, be building this kind of thing? As I often say, every line of code represents an ethical and moral decision. Now, you may say, "Well, gosh, it really doesn't." But in the end, it does, because you choose to be there to write that code. And you are participating in building a system that may have implications far beyond what you write.
Constraints and Limitations
We also know that in the kinds of systems we build, there are things that prevent us from taking our vision and turning it into reality. Some of those limits deal with the laws of physics. I can't pass information faster than the speed of light. There are pragmatic limits to how densely I can pack memory. These weigh upon us, although for many web-centric systems, they are not especially important constraints. As we move up the chain, we know that there are certain algorithms that make it possible or impossible for me to move forward. I may need an algorithmic breakthrough to do something. The Viterbi algorithm, for example, is what makes possible the fairly noise-free communication we see on our cell phones; we wouldn't have cell phones as we know them without that algorithmic breakthrough.
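Since the Viterbi algorithm gets mentioned here, it may help to recall what it actually does: a dynamic-programming search for the most likely hidden-state sequence behind a series of noisy observations. Below is a compact, generic sketch over a made-up two-state model; the probabilities are invented, and this is not the form used inside any real modem.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a sequence of observations (standard DP)."""
    # V[t][s] = (best probability of reaching state s at step t, the path that does so)
    V = [{s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}]
    for obs in observations[1:]:
        V.append({})
        for s in states:
            prob, path = max(
                (V[-2][prev][0] * trans_p[prev][s] * emit_p[s][obs], V[-2][prev][1] + [s])
                for prev in states
            )
            V[-1][s] = (prob, path)
    return max(V[-1].values())

# Toy hidden Markov model: decoding a noisy two-symbol channel (numbers invented).
states = ("zero", "one")
start_p = {"zero": 0.5, "one": 0.5}
trans_p = {"zero": {"zero": 0.7, "one": 0.3}, "one": {"zero": 0.3, "one": 0.7}}
emit_p = {"zero": {"lo": 0.9, "hi": 0.1}, "one": {"lo": 0.2, "hi": 0.8}}

print(viterbi(["lo", "hi", "hi", "lo"], states, start_p, trans_p, emit_p))
```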
Another thing we see happening, especially in the AI space, is the struggle to find the right architectures. For many classes of software-intensive systems, we pretty much know what the basic architecture should be, and in fact, many of our frameworks codify those architectures for us, so that's a good thing. This is not quite so in the AI space. We are beginning to understand some of the architectures that work for certain classes of problems, but we're still probably a few years away from really grokking that. And then, continuing on, we move into the social issues. How do I best organize my teams? The architecture of your software is one thing; the architecture of your team is another. And, lastly, we reach the human issues, the social and ethical issues.
The Common Life Cycle
Earlier, we saw the life cycle for AI systems. Well, in the end, building any kind of software-intensive system has this common kind of life cycle. You have a period of discovery, in which you're understanding the problem space and the possibilities. You have a period of invention, and then you have a period of implementation. And these tend to overlap with one another. Whenever I'm parachuted into a system, I generally ask a couple of questions first, to get a sense of the health of the system. The first question I'll ask is, "Tell me about your release process. Do you have a regular rhythm of releases, and what is that rhythm?" If they say, "Well, no, we don't. It's kind of erratic," we fix that first, because everything else is meaningless until you get to that point. Then the next question I'll ask is, "Do you have a sense of architecture? There may not be a designated architect, but do you at least have an architectural vision that you drive toward?" If they say, "Well, no, not really, it's in these people's heads," then we go fix that. If you attend to those two things, a continuous process and a sense of architectural vision, then, believe me, you've solved 80% of your problems along the way, and everything else, in many ways, becomes details.
Conclusions
So in the end, I always fall back to these. Even in the presence of AI components, build your systems with crisp abstractions, have a clear separation of concerns, have a balanced distribution of responsibilities, and try to strive toward simplicity. Ultimately, you want to grow your system through the incremental and iterative release of executable architectures. Everything else is kind of details.
There's work to be done. This is an exciting time, because there is so much potential in what we can do with these kinds of AI breakthroughs. One of the things we see, that's not quite solved yet, is how to bring together symbolic, connectionist, and quantum models of computation. I haven't mentioned quantum much here, but it's certainly on the horizon for us. How do I build software-intensive systems that weave those pieces together? There are some, especially in the DeepMind community, who say, "Gee, neural networks are going to be the center of our systems." And I, with all due respect, beg to differ. Even if you look at the human brain, there's this wonderful observation called Moravec's paradox (I didn't pronounce it right, but close enough), which says that if you look at the distribution of neurons in the human brain, far more of them are used for signal processing, the visual cortex, the auditory cortex, than are used for the decision-making parts.
And we're seeing that kind of split within the hybrid systems of today: AI on the edge, symbolic systems for decision processing and all the other gorp around it. So we're going to see an emergence over the coming years of how we bring those together. There's also the pendulum that swings back and forth on architecture: too much architecture, not enough architecture. Again, this depends upon where you fit within your organization's culture. The edge/cloud pendulum swings back and forth as well. Today, we're at the cloud; we see it moving back to the edge itself. Allen Newell, whom I mentioned earlier, observed during one of the AI winters that "Computer technology offers the possibility of incorporating intelligent behavior in all the nooks and crannies of the world. With it, we can build an enchanted land." And you're the ones who are going to build that. You are the ones who will help us create that next generation. So I like to say, software is the invisible writing that whispers the stories of possibility to our hardware. You are the storytellers. Thanks very much.