Sure. So, my name is Gabriel, I am based in the San Francisco Bay Area, originally from Toronto. For the past year and a half or so I have been working at dotCloud, doing a variety of development, everything from our build system through to front-end work. As you said, I have been doing a fair amount of JavaScript, both on the front end and on the back end, as well as Python, which is what I worked on most up until now.
2. Would you like to tell us a little bit about what stack.io is and what problem it tries to solve?
Sure. So, stack.io basically came from the realization that the frameworks that existed for doing request-response, REST-type communication just weren't good enough for the types of applications we wanted to be building. The main architects behind it were Yusuf Simonson and François-Xavier Bourlet. Yusuf wrote a master's thesis on genres of communication, and that is where a lot of the ideas behind stack.io came from. So that was very academic research, done for his master's thesis, and he came out and combined that with the knowledge we had at dotCloud: we had built a large distributed system and we needed a system to tie it all together. So stack.io came out of this marriage of the academic with the practical concerns that we had. Like I said, we wanted to tie together a series of back-end services and let them communicate with the front end, realizing that REST as a model is great for page-type viewing of documents, for documents that don't change very frequently. It's a really great model, and that is what it was originally designed for. But as we move further and further towards full-fledged, rich web applications, that paradigm starts to fall apart.
Sure. Like I said, in part this stemmed from Yusuf's research, but it's mixed with our practical experience. At dotCloud we have a large distributed system, we do application deployment, we're wrangling a whole set of physically distributed systems, so obviously the first requirement was that it work well in a service-based model. We try to minimize the size and complexity of each of those components, because that's the only way we found, and we're obviously not the only ones, Amazon has talked a lot about building things in a service-oriented way, because when you start getting to this sort of scale you need to keep the complexity of each individual component down or else the overall complexity of the system just explodes. So basically stack.io is a collection of services that can optionally also be clients and call each other. There's no broker, there is point-to-point direct communication, but there is a central configuration node that allows you to do service discovery, something like the sketch below.
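To make that model concrete, here is a minimal sketch of its shape, not stack.io's actual API; the `register` and `discover` helpers and the endpoint are invented for illustration:

```python
# A toy picture of the model: services bind their own sockets and
# announce themselves to a central configuration node; clients look
# peers up there and then talk to them directly, with no broker.
REGISTRY = {}  # name -> endpoint, standing in for the central node

def register(name, endpoint):
    REGISTRY[name] = endpoint

def discover(name):
    return REGISTRY[name]

# A service registers the endpoint it is bound on...
register("echo", "tcp://10.0.0.5:5555")

# ...and any client (possibly itself a service) connects point-to-point.
print(discover("echo"))  # tcp://10.0.0.5:5555
```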
4. That’s something like a registry?
Essentially, it does registration of namespaces and then monitors for updates. Any time one of the individual components goes down, it is removed from the registry. The registry is cached locally on each node, but there is a constant heartbeat back to the central node, so if any node falls out of the system it gets removed from the registry and the update propagates. That central node also provides a bridge from the front end. Everything we've talked about so far is all back-end oriented, and we use these technologies quite heavily for all our back-end communications, but we also wanted to be able to access a lot of those services, obviously not all of them, from the front end. When you start to open your internal firewall to the outside world, you obviously need something providing a layer of access control, and you also need a somewhat different transport; you can't just be using a raw TCP socket.
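The heartbeat half of that might look something like this sketch; the timeout value is made up, and a real registry would of course be a separate process reachable over the network:

```python
import time

HEARTBEAT_TIMEOUT = 3.0  # seconds of silence before eviction (made up)

# name -> (endpoint, time of last heartbeat)
registry = {}

def on_heartbeat(name, endpoint):
    # Called whenever a service checks in with the central node.
    registry[name] = (endpoint, time.time())

def evict_stale():
    # Run periodically: drop services that stopped heartbeating, so the
    # locally cached copies on each node pick up the removal.
    now = time.time()
    for name, (endpoint, last_seen) in list(registry.items()):
        if now - last_seen > HEARTBEAT_TIMEOUT:
            del registry[name]
```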
On the back end, all the communication is based on ZeroMQ, which is a fairly thin layer over top of TCP sockets that provides a somewhat better abstraction. But like I said, that obviously can't go to the front end; ZeroMQ can't even be exposed on the public internet, because it's not designed to be, it's not inherently secure. So that central node also acts as a web socket bridge. The central node is written in Node.js, and we use Socket.IO to provide fallbacks for clients and intermediaries that don't support web sockets, of which there are unfortunately still many. I guess the other thing worth mentioning is that while the central node is written in Node.js, the system as a whole is designed to be language agnostic. At the moment we have implementations for Python and Node.js, so Python and JavaScript, as well as the client-side JavaScript client, but there have certainly been discussions in the outside community about building clients for other languages as well.
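As an illustration of how thin that layer is, here is a minimal request/reply pair with pyzmq; the port is arbitrary, and in practice the two sides would live in different processes:

```python
import zmq

ctx = zmq.Context()

# "Server" side: a REP socket bound on a TCP endpoint.
rep = ctx.socket(zmq.REP)
rep.bind("tcp://127.0.0.1:5555")

# "Client" side: a REQ socket connected directly to it, no broker.
req = ctx.socket(zmq.REQ)
req.connect("tcp://127.0.0.1:5555")

req.send(b"ping")   # sends are queued asynchronously
print(rep.recv())   # b'ping'
rep.send(b"pong")
print(req.recv())   # b'pong'
```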
It's interesting. Certainly JavaScript is much newer on the server. In some ways they are fairly similar languages: they're both dynamic, interpreted scripting languages. The big difference between the two is that, today at least, Python has been designed primarily with the interface in mind; it's designed to be a beautiful language in itself. The actual implementation is not bad, it's pretty good, but it's not amazing, it's not blazingly fast, and it's built around a fairly easy-to-learn, easy-to-read but fairly traditional model of having everything synchronous. JavaScript, and specifically Node.js, really the reason those have gained traction recently is that although the language itself, JavaScript, is one no one would claim is super beautiful, it has a lot of rough edges, the implementation, Node.js itself, V8, the eventing system underneath it, those are incredible pieces of technology. They have completely turned the page on what we can expect in terms of performance from interpreted languages. Really, other than maybe the JVM, which obviously had a ton of work put into getting there, there hasn't been another language runtime that's had nearly the man-hours and intelligent PhDs put into it that V8 has. And since V8, we've seen that happen in other browsers and other engines.
I guess the other reason JavaScript has seen a lot of adoption on the server, and I mentioned the eventing system briefly, is that Node.js really didn't come with anything. It was really thin: just the language and a built-in library of very low-level things, essentially raw sockets out of the box. What that means, though, is that there's no legacy baggage either. So it's been very well loved by the hacker community, the makers and tinkerers, because of this combination of being very performant and being lightweight, in the sense of being able to get tiny services up and running really quickly. The other thing that I think has really driven adoption of Node.js is Socket.IO itself. Socket.IO wasn't the first to do this sort of real-time communication, we had Comet frameworks before it, but for whatever reason, maybe some combination of good marketing, perhaps better performance, maybe feeling a little less hacky, perhaps the beginnings of real availability of web sockets, it sort of won as the system people use to build real-time systems. What's interesting, though, is that I think Python has learned a lot from these things. In terms of overall adoption Python is probably still ahead, sort of the silent majority, but in a lot of these newer areas it's catching up.
In the Python world there has been huge adoption of evented systems. We've seen an explosion of people using gevent, which in my opinion is a really nice way of doing it, using coroutines as opposed to the callback-based system that Node.js uses, and I think in some ways that lowers the barrier to entry for these higher-performance systems. With stack.io we use gevent on the Python side, which means your code looks a lot more like traditional synchronous code, but in the background those calls are asynchronous.
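A minimal gevent sketch of that idea (the URL is just an example): `fetch` reads like blocking code, but after monkey-patching, the socket operations yield to the event loop so the three requests run concurrently:

```python
from gevent import monkey
monkey.patch_all()  # replace blocking stdlib I/O with cooperative versions

import gevent
from urllib.request import urlopen

def fetch(url):
    # Looks synchronous; under gevent each blocking read actually
    # yields, so other greenlets can run in the meantime.
    return urlopen(url).read()

urls = ["http://example.com/"] * 3
jobs = [gevent.spawn(fetch, u) for u in urls]
gevent.joinall(jobs)
print([len(job.value) for job in jobs])
```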
6. As we talked about Node.js, you mainly focused on developing with it. How has your experience been operating and monitoring it? You have a large installation, and sometimes things work really nicely when you are testing them, then you deploy and you cannot monitor them, or you have code that runs in an infinite loop or spins your CPU and you don't know why the CPU is that high. Do you actually have problems like that?
It can definitely be challenging to run. The infrastructure is still very much evolving, and things like the ones you mentioned do still happen. With Python that is very rare; I don't know if you have ever found a memory leak in the Python implementation itself. That still happens with Node: you still sometimes have memory leaks in the core, in Node itself, sometimes in core libraries, sometimes in your own code. It brings up this question. You would never assume that C itself was leaking memory, obviously it's going to be you, and with Python it's usually you rather than something wrong with the garbage collector.
With Node, you sometimes find you have a memory leak. This happened to us recently, which is why I'm bringing it up: we found we had a memory leak in our proxy system, which is built on top of Node because we wanted to give people access to web sockets when they deploy web applications onto dotCloud. It's basically handling all the traffic coming into our platform, a rather important piece of code. We noticed the memory slowly trending up over a few days; the process would get killed off and restarted, but that certainly caused some problems, especially with these long-held connections. I think that immaturity is certainly a problem, but a year or a year and a half ago it was a really big problem, and I think it was really blocking people from adopting. Over this past year we've seen Node start to settle and mature in terms of the core library and the speed at which it's moving, which gives people confidence that new breakage isn't constantly being introduced into the core. In terms of monitoring, there is certainly interesting work being done, though I don't know how many people are actually making use of it. For instance DTrace: if we could all use DTrace, and it's not very well supported on Linux right now, it would give us more insight into what our Node processes are doing than we have into virtually any other language. But again, it's very much under development; very promising, maybe not 100% there yet.
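That recycling pattern, watching a process's resident memory and restarting it past a threshold, reduces to roughly the following sketch. This is not what dotCloud actually runs; the command, limit, and polling interval are all invented, and it assumes the third-party psutil package:

```python
import subprocess
import time

import psutil  # third-party: pip install psutil

CMD = ["node", "server.js"]      # hypothetical leaky service
RSS_LIMIT = 512 * 1024 * 1024    # recycle past 512 MB (made up)

proc = subprocess.Popen(CMD)
while True:
    time.sleep(10)
    if proc.poll() is not None:
        # The process died on its own; just restart it.
        proc = subprocess.Popen(CMD)
        continue
    if psutil.Process(proc.pid).memory_info().rss > RSS_LIMIT:
        # Memory has trended past the limit: kill and restart it
        # before the kernel's OOM killer does it at a worse moment.
        proc.kill()
        proc.wait()
        proc = subprocess.Popen(CMD)
```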
Basically what that means is that, and maybe it's not a good thing in itself but it has good side effects, using Node forces you to architect your systems in a certain way: building very small, very simple, well-decoupled and redundant components, because you assume they are sort of like mosquitoes, born quick and potentially dying quick. So in a lot of ways, for overall system architecture, it's great. Where it gets problematic with these quick-to-die services is that most of the early examples of Node, and probably most of the examples you find today when you first start out, store a ton of state in process, in memory. It's really cool that you can do that, it certainly gets you going quickly and shows you why evented communication is cool, but it also creates problems when these processes die off: oh well, I had all those users connected to my game, and it was only running in a single process. I think that's why the Node community seems to use Redis a lot as a way to persist that state, and I think that is something we are going to see more of going forward. Daniel Shaw has, I'm not sure it's ready to go, an implementation of Socket.IO that uses Redis to store its state instead of having that state stored in process. So many people use Socket.IO, which traditionally stored all its state in memory, that I think that is going to be changing going forward.
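In code, that idea reduces to something like this sketch with the redis-py client; the key naming is invented, and it assumes a local Redis on the default port:

```python
import json

import redis  # third-party: pip install redis

r = redis.Redis()

def save_session(session_id, state):
    # Keep per-connection state in Redis rather than in-process memory,
    # so a restarted or different worker can pick the session back up.
    r.set("session:" + session_id, json.dumps(state))

def load_session(session_id):
    raw = r.get("session:" + session_id)
    return json.loads(raw) if raw else None

save_session("42", {"user": "gabriel", "room": "lobby"})
print(load_session("42"))  # {'user': 'gabriel', 'room': 'lobby'}
```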
The other issue with running Node in production is that it forces this architecture of small, decoupled components that build up a larger system. That can be really annoying at first for deployment and monitoring, because you have so many more tiny little processes; all these things that would have been built into a library before, in Python or Java or whatever, you now build as separate processes. Again, it forces you to have good practices: being rigorous about automated process monitoring and automated deployment systems. And this speaks to what we are doing at dotCloud. We automate deployment, we built all of that in a service-oriented way from the ground up, along with the communication and coordination of these different services, and like I said, having all of that automated becomes more important once you have so many processes lying around.
Well, probably the first pitfall to be aware of is that web sockets very frequently just don't work. I was at RealtimeConf recently, there was a lot of talk about this, and there was basically an entire talk slot dedicated to making the argument that web sockets are not ready for the real world. I would generally agree with that: web sockets on their own are probably not ready to be deployed in general applications. If you are in a well-controlled environment and know the devices your users are going to be using, if you are on your corporate intranet, maybe, maybe you can get away with it there. But in the general case, no, I don't think web sockets on their own are enough. That being said, there are fortunately great abstraction layers built on top. We have Socket.IO, now sort of split into Socket.IO and Engine.IO, and there is also SockJS; they use web sockets where they can, but fall back to other transports where they can't. So I think it's going to be quite a while before we see raw web sockets used.
The other place you run into problems is that it's not only the client devices: there's a whole bunch of infrastructure between our servers and our clients, a lot of which doesn't really understand web sockets or even real-time communication; long-held connections are a problem. For instance, I think I mentioned earlier that we originally used Nginx as our front-end proxy. Great tool, very widely used, very performant, but it doesn't happen to support web sockets; it only supports HTTP/1.0 to the back end, which creates problems. What that meant was we ended up having to put a lot of engineering effort into building our own replacement for that proxy system, called Hipache. It's all open source on GitHub, and I'd encourage people to use it if they are looking for a web-socket-compliant proxy. But it's not the only piece; there are other layers. Certainly within corporate networks there are often filters on incoming and outgoing traffic, and ISP proxies or whatever can create issues. There was also a lot of talk at RealtimeConf about the problems you see with mobile devices specifically. Android is the big problem: it just doesn't support web sockets at all in the stock Android browser, though fortunately they seem to be moving towards using Chrome. Android is now the most widely deployed mobile platform, most of those devices are still running older 2.x versions, and the Android 2.3 browser just sort of shrugs when you throw web sockets at it. So that's certainly a problem today and it's going to be a problem for a while.
Dio: Thank you very much Gabriel.
Yes.