
DeepRacer and DeepLens, Machine Learning for Fun! (and Profit?)


Summary

Jeremy Edberg shares his work with DeepRacer and DeepLens, talking about some of the basics of ML used in these projects and showing a DeepRacer in action.

Bio

Jeremy Edberg is an angel investor and advisor for various incubators and startups, and co-founder of CloudNative. He was the founding Reliability Engineer for Netflix, and before that he ran ops for Reddit as its first engineering hire. Jeremy also tech-edited the highly acclaimed AWS for Dummies.

About the conference

QCon.ai is a practical AI and machine learning conference bringing together software teams working on all aspects of AI and machine learning.

Transcript

Edberg: My name is Jeremy, and I'm the founder of a company called MinOps, which has absolutely nothing to do with AI or anything like that; this is basically just a hobby of mine. I did cognitive psychology in college, which involved a bunch of machine learning and AI, I've used AI throughout some of my jobs, and I just like to play with these toys. Today I'm going to talk to you about the DeepLens and the DeepRacer, mostly the DeepRacer.

We're going to be talking mostly about the AI concept of perception, all of these are based on visuals, they all have these cameras, everything they do is image recognition. We'll skip the philosophical debate about whether you need a human body to process images. These are all the services that Amazon offers for AI, we're going to be talking about all the learning tools. They call them learning tools, but they're really more like learning toys.

AWS DeepLens

If you go to the machine learning page, here at the top, we're going to start with the DeepLens. Here are the tech specs on the DeepLens: it's basically a small computer with a camera attached to it. It's a fully powered Ubuntu computer, and being an engineer, I had to rip mine apart to see what was inside. I have it right here, so you can look at it later if you want. Basically, it's just this little micro-motherboard. If you rip it apart, you can take the heat sink off, and you can see the four RAM chips and the USB ports and all of that. It's pretty simple, just a basic computer with an Intel Atom processor. The nice thing about the Atom processor is that it's powerful enough to do inference at about 10 frames per second, which is good enough for most basic use cases.

They have a simple webpage to set it up: you go in, tell it which Wi-Fi to use, and give it some certificates. Make sure you save those certificates, because it's the only time they ever give them to you. Or you can go right into Ubuntu if you get a keyboard and a micro mini tiny HDMI connector that I've never seen before and had to buy a special cable for; then you can boot it right up into Linux. They give you a nice deployment system for it. It runs Greengrass and Lambda, and they give you an interface in Amazon where you can pull out the projects.

The first thing I did was try to build a security camera with my DeepLens. If you were here last year, you might have heard this already, but it'll be over soon. I started with face detection, because I figured a security camera plus face detection is a good combination. This is what it does running the face detection model: that's me at my house, and you can see that it has not detected my face as I am burgling my house. In fact, you have to get really, really close to it for it to even find a face. I had to walk basically right up to the window before it finally found a face.

I thought, ok, let's try something else, and I switched it to object detection instead. Luckily, they make this super easy in the interface; you just say, "Deploy this other project." This is what the deployment looks like: it tells you, "Here's where I'm going to do it. Here's what your Lambda function is." It's running Greengrass, which is Lambda on the device, the exact same Lambda functions that you run in Amazon. You deploy them out, and this is what it looks like when you're deploying. It takes a while to deploy, I don't know why; it's uploading it and then doing something before it actually starts running. It tells you what the project is, it'll tell you that there's an update available whether you're up to date or not, and then it gives you the status of the machine and some details about it.

If you're familiar with Amazon, you can use the ARNs to do some logging and find the object in whatever system you're using to manage your Amazon, then it outputs the inference stream to a log. Here it is with the object detection running, and you can see that it does a much better job. It says, "Oh, there's a person," and so it's doing a pretty good job of finding me. I did walk right up to it, but it can find me.

During the day I set it up in the front window, and it's running inference all the time. It's constantly finding the car, and it identifies the house across the street as a train for some reason. I had to go into the code and tell it, "Ignore the car that's in this space and ignore the train that's in this space because it's always there," because that's my car parked in the driveway. It'll find cars across the street occasionally. Here comes the babysitter with my daughter, and you'll see it identifying them as it goes. This is running image inference; it's not actually running at 10 fps, it seems to run closer to 4 or 5 fps, but again, more than sufficient for a security camera. You'll see here, it's a little hard to read the labels, but it says "train" for the house, "person" for my daughter, and "horse" for the babysitter. It's still learning: there's the person, and there's the horse, and the train. I told my babysitter, she's cool with it.

This is the log that comes out of it. You see the log, and it's saying "person"; that first number is the confidence between zero and one, so it has a fairly high confidence that it's identified a person. On that last screen, it actually shows the percentages next to the labels. This is the actual inference code that's been deployed, this is the Lambda. The first part is getting the frame; we have to resize it because I'm using Amazon's model, which is built on a specific size of picture. You have to resize the picture that comes off the camera, which is much bigger than what they built the model on. You can build your own models on much bigger pictures.
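As a rough sketch of what that loop does, modeled on Amazon's published DeepLens sample projects: grab the latest frame, resize it to the model's input size, and run inference. The model path, the 300x300 input size, and the awscam calls below are assumptions taken from those samples, not the exact code on the slide.

```python
import cv2
import awscam  # DeepLens device SDK, available on the device itself

MODEL_PATH = '/opt/awscam/artifacts/object-detection.xml'  # illustrative path
INPUT_SIZE = (300, 300)                                    # size the model was trained on

def infinite_infer_run():
    model = awscam.Model(MODEL_PATH, {'GPU': 1})           # load the optimized model
    while True:
        ret, frame = awscam.getLastFrame()                 # full-resolution camera frame
        if not ret:
            continue
        # The camera frame is much bigger than the model input, so resize it first.
        resized = cv2.resize(frame, INPUT_SIZE)
        raw = model.doInference(resized)
        detections = model.parseResult('ssd', raw)['ssd']
        for obj in detections:
            if obj['prob'] > 0.5:
                # obj also carries xmin/ymin/xmax/ymax for the bounding box
                print(obj['label'], obj['prob'])
```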

This is the part where it basically just labels the image in real time and publishes it out to the stream. I added this little part where it uploads everything that it captures to S3, with a label that says, "This is what I found, and the confidence, and here's the actual image." It cuts out the little piece that's in the bounding box and uploads it to S3. This is the code that actually writes it to S3; all it's doing is putting it into a bucket with a timestamp and a name. When you have an image like this, essentially every one of those bounding boxes is being uploaded.
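That upload step amounts to something like the sketch below. The bucket name and key format are hypothetical, and the bounding box is assumed to come from the detection results above.

```python
import datetime
import boto3
import cv2

s3 = boto3.client('s3')
BUCKET = 'deeplens-captures-example'   # hypothetical bucket name

def upload_detection(frame, box, label, confidence):
    """Crop the bounding box out of the frame and put it in S3 with a timestamped key."""
    xmin, ymin, xmax, ymax = box
    crop = frame[ymin:ymax, xmin:xmax]
    ok, jpeg = cv2.imencode('.jpg', crop)
    if not ok:
        return
    timestamp = datetime.datetime.utcnow().strftime('%Y%m%dT%H%M%SZ')
    key = '{}/{}-{:.2f}.jpg'.format(label, timestamp, confidence)
    s3.put_object(Bucket=BUCKET, Key=key, Body=jpeg.tobytes(),
                  ContentType='image/jpeg')
```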

This is the power of combining the cloud and the physical hardware: after it does the inference locally on the machine, it uploads to the cloud. If you wanted to, you could actually run it detached; you can put a little SD card into the box, store the images locally, and then have it upload when you get back to the network. One of the things I've thought about doing is putting it in my car and having it identify types of cars, just to see if it could do it.

We use Rekognition because Rekognition is much more powerful. Rekognition is actually the service that Amazon is getting a lot of flak for right now, about "Should we be selling this to the government?" Honestly, I'm not too worried because it's just not that great, but it will get better. It can do stuff like this: it can identify outdoors, it can do a lot of stuff with faces, whether the eyes are open, identify gender, things like that. It can identify famous people; if a famous person walks up to my door, it can tell me about that. But the thing that I'm using most is similarity. I can train it on the faces of the people that I know, myself, my wife, the postman, the UPS guy, so it can tell me when it sees those people. It's super easy to set up, it's all based on Lambda. You just set up a trigger that says, "Whenever an object lands in the S3 bucket, send it to Rekognition for processing," and then Rekognition pushes out its results to a log, which I can then pick up on my phone and tell it if it got it right or wrong.
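The trigger he describes is an S3-event Lambda that hands each uploaded crop to Rekognition. Here is a hedged sketch, assuming a face collection named known-faces that has already been populated with index_faces.

```python
import boto3

rekognition = boto3.client('rekognition')
COLLECTION_ID = 'known-faces'   # assumed collection of indexed faces (me, my wife, the postman...)

def lambda_handler(event, context):
    # Fires whenever the DeepLens drops an object crop into the S3 bucket.
    for record in event['Records']:
        image = {'S3Object': {'Bucket': record['s3']['bucket']['name'],
                              'Name': record['s3']['object']['key']}}

        # General labels: person, car, dog, and so on.
        labels = rekognition.detect_labels(Image=image, MaxLabels=5)
        print([label['Name'] for label in labels['Labels']])

        # Similarity search against the faces I've already indexed.
        try:
            matches = rekognition.search_faces_by_image(
                CollectionId=COLLECTION_ID, Image=image, FaceMatchThreshold=80)
            for match in matches['FaceMatches']:
                print('Looks like', match['Face']['ExternalImageId'],
                      'with similarity', match['Similarity'])
        except rekognition.exceptions.InvalidParameterException:
            pass   # Rekognition found no face in this crop
```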

DeepLens For Toll Taking

This is the code that just pushes it out: it detects the labels, it indexes the faces, and then it says, "Here are the faces that I found in this image." I can tie that into an iPhone app that says, "Here's the face, I think it was the postman. Is that correct? Yes or no." Real simple, a little bit better than Nest Cam, which just says "Somebody showed up", and a fun little hobby project.

This is a video I found of something else you could do with the DeepLens, a more practical thing. This is an example of using it to charge different tolls for different sized vehicles. Here they've placed a taxi, a small car, and the light has gone green to say this is a taxi. Then he replaces it with the bus, and it lights up blue instead: "This is a different type of vehicle, and we're going to do a different toll." But you saw how long it took, so this is obviously not going to work in a real-time situation; it'll work if people are slowing down or having to stop or something like that. People have been using DeepLenses in their neighborhood for identifying cars that aren't stopping at stop signs, things that your neighbors are really going to love you for.

DeepRacer

Now let's talk about the DeepRacer. I have one here, and I will hopefully make it work later; it's very finicky, so we'll see, but I'll show you some cool videos from when I did get it working. The DeepRacer was supposed to release on Monday, but it's been pushed back to July, I don't know why. Essentially, the idea is to teach machine learning: Amazon created this racer, and they created an entire racing league, so you can race in the league, qualify, and win a free trip to Las Vegas at the end of the year to participate in the finals. Some people managed to get one at re:Invent last year, but most people are still waiting.

Under the hood, what's interesting is that it's basically the same as the DeepLens. It has the same CPU board, except with an extra USB port on the compute module. It's basically the same thing, the same Atom processor, running Ubuntu. This one is running ROS on top of Ubuntu, so a little bit better in that respect. What's interesting is that the deployment methods are completely different, and I'll get into that in a little bit.

It's basically the same platform as the DeepLens; they took the DeepLens and put it on wheels. This is the same camera; in fact, I had to change the camera when this one went flying off. It's actually pretty robust, it's got a good suspension, and you can see it bounces pretty nicely. It has two separate batteries, one for the compute and one for the car itself, and the car is pretty powerful. If you put it up to full throttle, it can go 15 miles an hour.

Of course, the very first thing I did when I got the car was put it into manual mode, put it in the hallway, and see what I could do with it. They give you a little interface, and you'll notice that I'm not very good at making it go in a straight line. It was going super fast even at 70% throttle; I had to throttle it down to 50%, and you'll notice that the controls are backwards. It turns out that was because it was miscalibrated: it was set to negative velocity for forward and positive for backwards. I just had to switch that around and then the controls worked, but I guess if you're a pilot or something, you can reverse it and have it go in reverse.

I got a little bit better at driving, but not the best. I decided to take it out to the playroom because my son really loves cars. I thought, "Oh, he's going to be totally excited." He saw the car earlier and was super excited about it. You can see here the delay in the frames coming in, you can see how it's behind. There's my son, he's super excited, he sees the car, he wants to touch the car until it moves, and then he runs away, so I started chasing him. Unfortunately, he's much more agile than I am with the car, so I'm going to have to teach it to autonomously chase him and see how that goes.

The real purpose of this car is to use it autonomously. They're going to have an interface to do that; it's not available yet, but you can still train it using SageMaker and RoboMaker. These are two Amazon services: SageMaker is a machine learning platform, and RoboMaker is specifically about robots and simulation; it runs Gazebo, which you might've heard about a couple of sessions ago. It's built on those two things.

Deep Learning

A quick diversion into deep learning, because the purpose of this is to learn about deep learning. Deep learning is actually a fairly old concept; it's just one small part of machine learning, and it's good for lots of things like image search, facial recognition, and NLP. Today we're using it for image recognition.

UPS used deep learning back in the '90s to train on identifying handwritten envelopes. Today they have something like a 99.99% success rate on identifying envelopes that come through the scanners; their neural net is that good. Google created a network, fed in 10 million YouTube videos, and gave it 20,000 labeled objects to see what it could identify. The first thing it found was a cat.

What's interesting is that if you look at the Nvidia car computer, the number of connections in its neural net is about 27 million. That cat finder had about 30 million, because they had 1,600 cores running, and your human brain has about 100 trillion. We're still quite a ways away from the human brain, but we can still make lots of pretty good inferences.

This device, in particular, is designed to teach you about reinforcement learning; all the defaults on it are reinforcement learning tasks with gradient descent. If you imagine your problem space as a gradient, like this one, where your solution is the blue spots, then what you're doing with reinforcement learning is attempting to walk the gradient and find the solution. There are local minima, places where you can get stuck, and all of this will be relevant in a moment. Here's an example of a neural network, and you can see it converges on a solution pretty quickly with just a few nodes. You don't need a lot of nodes to converge on a solution; this one converges really quickly with a bunch of nodes. Even if you constrain the number of nodes, it can converge on a solution pretty quickly, but you can also get into a situation where you're oscillating, stuck in a minimum. You can see here that this one will never come to a solution, because it got stuck in a spot it can't get out of, so it's going to constantly oscillate.
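To make the local-minimum and oscillation idea concrete, here is a tiny, self-contained gradient-descent sketch on a made-up one-dimensional loss. It only illustrates the concept; it is not anything from the DeepRacer notebook.

```python
# Toy gradient descent on a "double well" loss with a global minimum near x = -1.47
# and a local minimum near x = +1.35. Purely illustrative.
f = lambda x: x**4 - 4 * x**2 + x          # the loss landscape
df = lambda x: 4 * x**3 - 8 * x + 1        # its analytic gradient

def descend(x, learning_rate, steps=500):
    for _ in range(steps):
        x -= learning_rate * df(x)
    return x

print(descend(+2.0, 0.01))   # starts on the right, gets stuck in the local minimum (~ +1.35)
print(descend(-2.0, 0.01))   # starts on the left, finds the global minimum (~ -1.47)
# A much larger learning rate would overshoot the minimum on every step and
# oscillate instead of settling, which is the failure mode described above.
```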

DeepRacer and Deep Learning

How do we train our DeepRacer? Luckily, they provide us with a Jupyter Notebook with all the Python, and there's a ton of Python in here; most of it, you don't even need to know what it does. The nice thing is that once their interface is available, it's going to take care of all of this stuff for you. It starts with a bunch of setup: S3 buckets, variables; you've got to create the permissions and the IAM role, and it does that for you. You have to configure all the networking in Amazon to do the training, and they take care of all of that for you, which is nice. You even have to manually edit one of the roles to allow it to talk to the RoboMaker service and SageMaker. They do walk you through it, but it is somewhat complicated. Then you have to set up the training environment for RoboMaker, and you have to set up the connection between them.
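The flavor of that setup code is roughly the sketch below: create a bucket for the training artifacts and grab an execution role that can talk to both services. The names and structure here are illustrative, not the notebook's actual cells.

```python
import boto3
import sagemaker

session = sagemaker.Session()
region = session.boto_region_name
account = boto3.client('sts').get_caller_identity()['Account']

# Bucket that will hold checkpoints, logs, and reward data (name is illustrative).
bucket = 'deepracer-training-{}-{}'.format(account, region)
s3 = boto3.client('s3')
if region == 'us-east-1':
    s3.create_bucket(Bucket=bucket)
else:
    s3.create_bucket(Bucket=bucket,
                     CreateBucketConfiguration={'LocationConstraint': region})

# The execution role also needs to be allowed to call RoboMaker,
# which is the manual policy edit mentioned above.
role = sagemaker.get_execution_role()
print('Artifacts will land in s3://{}'.format(bucket), 'using role', role)
```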

All of this code is basically just setup code; this is all stuff that sets up the environment, sets up the simulations, and gets them ready to run. The actual reward function is in a different piece of code, and the hyperparameters are in a different piece of code, which I'll show you in just a minute. Finally, we get to the point where we can actually launch the simulation on RoboMaker. These are the hyperparameters, and this is the default set of hyperparameters. The three in the yellow boxes are the three that I messed with when training my DeepRacer.

The first thing I did was build a model using all their defaults. I just took the defaults that they gave with the default settings and trained it for a couple of hours. It was about $10 worth of compute time; I let it run for about 4 hours, and it got through about 450 epochs. Remember, we're on this gradient descent. What I did with the second model is modify these parameters. The learning rate basically says, "When we descend, how much are we going to weight that? How good is it? Are we going to consider that? Are we going to keep descending, or are we going to go back up?" You can see that I changed it; I made it a smaller number, which adds randomness, so it makes us go back up more often. The randomness is actually a good thing when you're doing reinforcement learning, because you want to try to find the best solution and you don't want to get stuck in a local minimum.

I also increased the batch size. The batch size basically says how many images we're going to use in each run of the training: the more images you use, the better the training is going to be, and the longer it takes. The last one is the discount, which is how far into the future we're going to look. I reduced it by an order of magnitude; it's not a direct equivalent, but basically, if you think of one of them as looking 1,000 steps ahead, the next one was 100. We're looking less far into the future, which can actually be a good thing with reinforcement learning, because it gives us more randomness and makes us look at more possible solutions.
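Put together, the three hyperparameters he touched look roughly like this. The exact values are illustrative, not the notebook's literal defaults.

```python
# Illustrative hyperparameter dictionaries, not the notebook's literal defaults.
default_hyperparameters = {
    "batch_size": 64,           # images used per training update
    "learning_rate": 0.0003,    # step size when walking down the gradient
    "discount_factor": 0.999,   # roughly a 1,000-step reward horizon
}

modified_hyperparameters = {
    "batch_size": 128,          # doubled: better updates, longer training
    "learning_rate": 0.0001,    # smaller: more randomness, helps escape local minima
    "discount_factor": 0.99,    # roughly a 100-step horizon: look less far ahead
}
```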

This is what the official track looks like; if you go to race at one of their events, this is the track. The important thing to note is that it has thick lines on the sides and high-contrast coloration between the track and the outside of the track, with a white line dividing them and a dashed yellow line down the middle. By default, what we're training our DeepRacer on is a reward function based on the dashed yellow line, the thick white lines, and the contrast between the road and the outside.

This is the default reward function. It says: how close are we to the center dashed line? The further away we are, the lower the reward; that's it. There are a couple of other reward functions you can try. This one says, "Stay closer to the center." It puts a lot of weight on being in the center; the reward is a pyramid, where being in the center is highly rewarded and being toward the outside is not.
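A follow-the-centerline reward function along those lines looks something like the sketch below. The params dictionary is what the simulator passes in, and the 10%/25%/50% bands are a commonly used banding, not necessarily the slide's exact numbers.

```python
def reward_function(params):
    """Reward the car for staying close to the dashed center line."""
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Reward drops off in bands as the car drifts from the center.
    if distance_from_center <= 0.1 * track_width:
        reward = 1.0
    elif distance_from_center <= 0.25 * track_width:
        reward = 0.5
    elif distance_from_center <= 0.5 * track_width:
        reward = 0.1
    else:
        reward = 1e-3   # effectively off the track
    return float(reward)
```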

This is the same thing, but it adds another feature: how hard are we essentially turning the steering wheel? We highly rate the center, and the less we turn the steering wheel, the better, so we get a higher reward for turning less. Then this one is the same thing again, except we've added the gyroscope; the car has a gyroscope in it, so it knows how hard it is pitching and yawing. In this particular case, we penalize it for yawing, for tilting on its axis: the less tilt we have, the more reward we get. If you change your reward function, then during your training it's going to try to maximize that reward. The more things you take into account, the better the reward will be, in theory.
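The steering-penalty variant adds one more term: keep the steering angle small on top of staying centered. A hedged sketch, with the 15-degree threshold chosen purely for illustration:

```python
def reward_function(params):
    """Stay near the center, but also penalize turning the wheel hard."""
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    steering = abs(params['steering_angle'])   # magnitude in degrees, left or right

    reward = 1.0 if distance_from_center <= 0.25 * track_width else 0.1

    # The harder we steer, the more we cut the reward.
    if steering > 15.0:
        reward *= 0.8
    return float(reward)
```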

This is what it looks like in the Gazebo when it does a training run, you can see the cute little car there. Then this is the beginning of the simulation, this is right at the beginning when it hasn't really learned anything. You can see it's driving along, and it's sort of staying towards the middle of the track, but as soon as it gets to the first curve, it has no idea what to do and it runs itself right off the track.

This is the same thing from the point of view of the camera, you can see it drives a little drunk, kind of moves back and forth, it's going to go down the line, and it's going to try to make the turn, and it's going to fail as soon as it gets to that first turn. It's going to run itself right off the road and say, "Ok, I failed."

This video is from after doing many epochs of training; you can see it's driving much straighter now. It's still wobbling a little bit, but it's mostly staying down the center. Then when it gets to the curve, this is the first hairpin turn on the track, a full U-turn, it figures out it needs to swing out to the side and then come back around. You can see it's figured out how to negotiate the U-turn through the different reward function. Amusingly though, it then gets to this much gentler, easier turn, has no idea what to do, and goes right off the road. It still has some more iterations and learning to do.

This is what the interface will look like when they eventually release it; you can see that they've simplified it a lot. To get those videos, I had to launch RoboMaker, go to the console, launch Gazebo, connect to it, wait for it to load up, and load up the camera view to put it in there. I removed all of that from the presentation, but it was a bunch of steps to get that view. Hopefully, their interface will be much easier: it'll just put the view right there for you, and you can watch it train as it goes, getting better and better over time. It will put the training reward score right next to it, so you can just tell it to stop training when it's succeeded.

In my particular case, I had to extract the training rewards from CloudWatch, the logging system; I pulled them down and had to graph them. Luckily, Python can do that pretty easily, and it's in the Jupyter Notebook; you just rerun that one function over and over, and you get the graphs. This is the default model with no changes, and you can see the reward function slowly learning over time, slowly making its way up towards 400, which is the minimum in this particular case for what we're looking for. After almost 400 episodes, we managed to get close to 400 on a semi-consistent basis.
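Pulling the rewards out of CloudWatch and graphing them looks roughly like this. The log group name and the message format being matched are assumptions, since they depend on how the simulation job writes its episode logs.

```python
import re
import boto3
import matplotlib.pyplot as plt

logs = boto3.client('logs')
LOG_GROUP = '/aws/robomaker/SimulationJobs'   # assumed log group; yours may differ

rewards = []
paginator = logs.get_paginator('filter_log_events')
for page in paginator.paginate(logGroupName=LOG_GROUP,
                               filterPattern='"Total reward"'):
    for event in page['events']:
        # Assumed message format, e.g. "... Total reward = 412.7 ..."
        match = re.search(r'Total reward\s*=\s*([0-9.]+)', event['message'])
        if match:
            rewards.append(float(match.group(1)))

plt.plot(rewards)
plt.xlabel('training episode')
plt.ylabel('total reward')
plt.show()
```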

Remember, I basically added randomness to my model; this is the training rewards from when I added the randomness. You'll notice that it's obviously much more random, with a lot more fluctuation between each training run, but you'll also notice that it's hitting training rewards of 1,400. I didn't change the reward function, I only changed the hyperparameters, and just by changing the hyperparameters, it's achieving much higher rewards. It's jumping back and forth more, so it's having a harder time converging on a good solution, but if you look at the average line, you'll still see that it is slowly making its way up, and it exceeded 400 in just 200 iterations of the training. More randomness, bigger batch size: this took about the same amount of time, the same 4-ish hours, and it only did 200 episodes because I doubled the batch size, but it did seem to get a much better reward.

Now that we've built the model, we have to download it from the cloud and get it onto the car. The recommended method is a USB stick: you put the stick into your computer, put the file into a directory, put the stick into the DeepRacer, turn on the DeepRacer, and it loads the model. Unfortunately, the only stick I had was one with some pictures from a vacation on it, so that's what I had to use. This is what it looks like running. You'll notice that the camera is not updating, because it only updates about 10% of the time; I haven't figured out why. You'll see here, right now what it's doing is loading the model. I've selected the first model, I'm waiting for it to load, and it's successfully loaded. I've scrolled down, I'm going to hit the start button, and then you'll see the car start moving.

At the bottom, what you see are the output probabilities. What that says is, "Turn left, half turn left, forward, fast, slow, turn right." You can see what it's doing is turning left, turning right, and if you look at the wheels, you'll see them jiggling back and forth real fast because it can't decide if it wants to turn left or turn right, and it's exploring. I didn't give it a track with high contrast or anything like that, because I basically didn't have the resources to do it, nor the space in my house here in California. This is the cul-de-sac out by my house, and I figured the black line down the middle might suffice as a line for it to follow. It did not. You can see it just sort of wandered around in circles.
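Those probabilities sit over a small discrete action space, and the car essentially takes the highest-scoring action on every frame. A minimal sketch, with an assumed action list:

```python
import numpy as np

# Assumed discrete action space; the real one depends on how the model was configured.
ACTIONS = ['hard left', 'half left', 'straight fast', 'straight slow',
           'half right', 'hard right']

def choose_action(probabilities):
    """Pick whichever action the network scored highest for this camera frame."""
    return ACTIONS[int(np.argmax(np.asarray(probabilities)))]

# When left and right come out nearly tied, the chosen action flips from frame
# to frame, which is why the wheels jiggle back and forth.
print(choose_action([0.31, 0.02, 0.05, 0.02, 0.30, 0.30]))
```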

This is model two; I've already loaded the model in this case. As you can see, model two is actually doing a better job. It does turn hard, but one of the cool things about this model is that it sees that curb and turns away from it. This is driving fully autonomously; it's exploring around, basically just trying to find those high-contrast lines. You'll see that it'll drive along and then get stuck on a driveway in a second, because I have it set to 50% throttle, and that's it. It gets stuck, and then I stop it.

Here are some other examples of model one driving; this is the one with all the defaults, just cruising around. It does actually do a pretty good job of missing the cone, because it was mostly going to miss anyway, but as you can see, it has no concept of the curb and just smashes right into it. I sent my assistant out to retrieve the vehicle for me, and she did so and put it back in the middle of the street. Then you'll see that it starts to wander again; it's basically seeking out that contrast, but you'll notice that it starts to essentially go in a circle. That's what model one seems to do when it gets confused: it just turns left. It'll get stuck turning left; in fact, this is how it ends up, spinning donuts after a while.

This is one of the few times that I actually got the camera working, so you can see through the camera. This is model two, and you can see it's doing a pretty good job of avoiding the obstacles: it avoided those cones, and it's driving along, and here comes the curb. In this particular case, it didn't know what to do with the curb, so it smashed right into it, but the wheels are still spinning, and you can see at the bottom that it did a hard full left there. That was it trying to get out, and of course, it flipped itself over.

Just Toys?

Are these things just toys, or are they actual serious machines? Can we actually use them for anything useful? Well, the DeepLens actually has a lot of use cases: there's the security camera one that I showed and the toll collection. Some businesses are actually using them to recognize customers when a customer comes in. There's a whole company built on this premise: they put the cameras at the register, and as soon as a known customer comes in, it pops up and says, "Here's their name," which it got from their last credit card transaction. It says, "This is what they usually order," so they start preparing the sandwich before that customer ever gets to the front, so that when they get to the front of the line and order their usual order, they have a great experience because the food comes right out. There are whole companies that are basically helping small restaurants and retailers build these kinds of things. If you've ever seen "Minority Report," it's very similar to that: the advertising part, not the pre-murder stuff.

Then there's law enforcement, which is what Amazon is having some controversy about now: using cameras and image recognition to find criminals. There are a lot of ethical dilemmas around that, but it's something that can be done. Pretty much anything where visual processing helps: there are people who are putting them in the back of house of a restaurant, and it checks every plate to make sure that all the garnishes are there and flags it if not. Anything where you can pass something in front of a camera and have it make a determination.

For the DeepRacer, the only use case I've found is winning a free trip to Las Vegas. Some of the problems with the DeepRacer are that, first of all, it has two batteries: a battery pack for the compute and another one inside for the motor. The only way to charge the motor one is to take the car apart, take the battery out, and charge it, and it drains while it's idle. I've had it idle here the whole time, so I'm hoping that it will work when I go to try it. The other day, I left it plugged in, I didn't realize you had to unplug it, and it was dead a few hours later; you have to open it up to unplug it. Also, to load the model on it, you've got to put a USB stick in there. You can hack it, since it's got Ubuntu, to SSH models onto it, but it's not the friendliest for a commercial business environment. If they could solve those problems, it could be really cool for bringing deliveries to people, or you could probably use a bigger one to run a warehouse, which Amazon is doing. This particular thing is more of a toy; it's good for racing, and it's good for freaking out your small children.

Of course, no presentation about cars would be complete without crash videos. Here's the one video where it was going so fast it just flipped itself over. Also, unfortunately, I didn't catch a video of it, but it's so powerful that if you put it full throttle forward and then pull it full throttle back, it will actually do a full backflip because it's so strong.

This one is my favorite crash video, where it got a little too close to that cone and flipped over. You'll notice the camera fell off, the camera went flying off, and then it basically didn't know what to do, so it just defaulted to spinning in circles. It was making a slightly bigger spiral, and it actually ran into my wife. The video ends about one second after that, where she screams, "Ouch," and jumps up in the air, but she was the one taking the video.

Questions and Answers

Participant 1: I notice you built the models on RoboMaker, or was it SageMaker?

Edberg: SageMaker builds the model, and then it's just tested on RoboMaker. It pushes it out to RoboMaker for simulation, it's a loop between the two. It runs a simulation, and then that data is fed back to SageMaker for the next iteration.

Participant 2: When you buy the DeepRacer, do they give you complimentary time?

Edberg: My DeepRacer I got at re:Invent. I don't know; I hope they give some sort of credits, that would be nice, because it can get pricey. Training this thing, the two models cost me about $25: about $10 for the first model, $10 for the second model, and about $5 in other random compute related to it. I know with the DeepLens, at every tutorial they would give you $100 of SageMaker credit, so I suspect it'll come with credits, but I don't know, because they're not out yet. Unfortunately, they were supposed to come out Monday and then they were delayed until July.

Participant 3: I'm curious if there's anything special about the camera used for the DeepLens itself. It seems like with the compute module you could attach any camera to it, and it would use the same machine learning.

Edberg: Yes, you can, in fact, attach any camera you want. The one that comes with it is a pretty nice four-megapixel camera. One of the things I want to do is put two cameras on it and see if I can get it to do anything special with stereo vision; it could take any camera. This one has three USB ports, the DeepLens has two, and you can put whatever you want in the USB ports.

 


 

Recorded at:

Jul 10, 2019
