Yes, I’m head of Software Engineering at Wix, the backend engineering group. At Wix we built a web publishing platform that currently manages 45 million websites. We have customers all over the world, and we make it very easy for non-technical users to build beautiful websites.
Wix has been around for about 7 years. Since 2010 we started with Flash Website Builder and then we switched to HTML5 Website Builder because Steve Jobs did a very good job of killing Flash websites with the whole mobile devices.
Sadek: Which maybe was a fortunate move for Wix too.
It was. It was a very risky move, but fortunately for us we made a very good product and we really succeeded in taking on the market.
Our growth was actually linear; it all depends on how much money we spend on customer acquisition. When you are a small company you don’t have a lot of money, so you build your user base slowly. Once you get investment, you spend more and more on acquisition and you get more users. So it’s not exponential growth but linear growth that keeps increasing.
Sadek: Right, linear, and that gives you an opportunity to adapt your architecture from time to time and to maybe get it more scalable or accept more users.
Yes, it actually made us much more comfortable in scaling our architecture. However, since our users are building websites, we can’t really anticipate how successful a site will be or which companies will establish a presence on our platform. For instance, Asus just built a support center on our platform, which is a big website, so we need to take care of scaling not just for the mom-and-pop shops that get a few, maybe tens or hundreds of hits per day, but also for large websites like Asus or EA [Electronic Arts], a gaming company that just built its gaming mini sites on our platform too.
4. [...] So it can be all kinds of sizes; how did you deal with that fact?
Sadek's full question: So there are different use cases, and it’s very hard to design an architecture that’s adapted to all of them. For instance, you have some websites that have lots of content but maybe not so many users, some that have little content but a lot of users, and it might even be both together, right? So it can be all kinds of sizes; how did you deal with that fact?
We treat everyone equally. We basically eat our own dog food, so Wix is built upon Wix, and Wix as a company brings millions of users to our own website to build websites. By supporting ourselves we support everyone the same way, so regardless of whether it’s a big website with a lot of traffic or a small website with very little traffic, our infrastructure supports everything equally.
Sadek: So like is it, OK for instance if a customer has lots of content, is it any problem for you, is it like storage-wise, is it scalable storage-wise? And the pricing system, how does it work? Two different questions but kind of related.
Actually, like I said, we treat everyone the same. Our pricing is the same regardless of how many pages you have and how many page views you get. We have 5 different packages, and in terms of scaling, we scale the same way for everyone; we don’t do something unique for big websites. Everyone sits on the same platform with the same infrastructure and we just serve everyone equally.
Well, Wix is a freemium service, so you can build your website for free, but when you want to connect your own domain, that is one pricing level. The packages differ less in bandwidth and more in functionality, so you can have, for example, an e-commerce package, which costs a little bit more than a package with no e-commerce.
Sadek: Right, so it’s feature based, basically.
And we got a VIP level of support.
So now, with this linear growth, what is the first thing where you said: “Wow, we should be dealing with this problem or otherwise we'll be in trouble”? What is the first thing in your architecture, since you’ve joined of course, where you had to say: “We should take care of that before we get so many users that they can break the system”?
Well, it depends on how young or mature the company or the product is. When we started we had a monolithic server, which is hard to grow and scale because you have dependencies between features and one server handles everything. So the first thing we did was break the system apart and build a service-oriented architecture, which allows us to scale by adding more servers. The second problem is the number of files: we do complete hosting, so we host all of the users’ content, and currently we have around 800 terabytes of static files. So we need to build a lot of big storage and serve all those files and images. And the third issue as we grow is data: beyond the static content, there are a lot of databases behind the scenes to support all the data and the structure of the site. So that is another big issue for us, to scale the data and model it in a way that we can scale.
6. A question that comes to mind here, what kind of data stores do you use for scaling your data?
We tried a lot of them. We started with MySQL, then the NoSQL hype came and we tried MongoDB, and now we are trying Cassandra, but to tell you the truth what works best is simply MySQL. Most of our databases are MySQL and we default to it because it works great and it’s a great key-value store. The way we model our data is to use MySQL as a key-value store; querying by primary key is very fast and very efficient in MySQL, so most of our data is actually in MySQL.
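As a rough illustration of the key-value usage described in the answer above, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the table name, column names and document shape are hypothetical, not Wix's actual schema.

```python
import json
import sqlite3

# sqlite3 stands in for MySQL here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site_documents (site_id TEXT PRIMARY KEY, body TEXT NOT NULL)"
)


def put(site_id, document):
    """Store the whole document as one JSON value under its primary key."""
    conn.execute(
        "INSERT OR REPLACE INTO site_documents (site_id, body) VALUES (?, ?)",
        (site_id, json.dumps(document)),
    )
    conn.commit()


def get(site_id):
    """Fetch by primary key only: no joins and no secondary lookups."""
    row = conn.execute(
        "SELECT body FROM site_documents WHERE site_id = ?", (site_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


put("site-1", {"title": "My bakery", "pages": ["home", "menu"]})
document = get("site-1")
```

The point of the design is that every read is a single primary-key lookup, which a relational database serves from its primary index.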
The sharding is basically per domain: since we are doing a service-oriented architecture, every service has its own database and handles its own data.
It’s per service, let’s say you got e-commerce, e-commerce has its own database which is completely decoupled from let’s say the website structure, or registration has its own data, so because we have different services not everyone uses everything, so it’s different scaling problems for different features in the system.
So for instance for e-commerce we built a multi-tenant database for e-commerce stores. In terms of the number of stores, we don’t have too many; let’s say you have a million stores built with the system. However, each store has tens or hundreds of products. A million stores is not too much data, but when you have 100 products per store you get scaling issues, and that is only on the product side of e-commerce. So we are trying to break the system up in a way that lets us scale only one aspect of the domain and touch other aspects only when we actually need to. So we don’t say: “Here is the store, here is the product, shard everything”; within e-commerce we can handle the scaling of just the products, so we might pick for instance….
No, but for instance we can choose to leave the stores in MySQL, because let’s say we have a million stores, but then when we have a hundred million products, we might choose to use Cassandra for instance.
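To make the idea of scaling one aspect of a domain independently more concrete, here is a minimal sketch. The class names are hypothetical, and `InMemoryBackend` is only a placeholder for a real database client; the backend labels "mysql" and "cassandra" mirror the example in the answer and are not a statement of what Wix deployed.

```python
class InMemoryBackend:
    """Hypothetical stand-in for a real database client."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)


class StoreRepository:
    """Stores are few (about a million), so a single backend is enough."""

    def __init__(self, backend):
        self.backend = backend

    def save(self, store_id, store):
        self.backend.put(store_id, store)

    def find(self, store_id):
        return self.backend.get(store_id)


class ProductRepository:
    """Products grow much faster, so their backend can be swapped alone."""

    def __init__(self, backend):
        self.backend = backend

    def save(self, store_id, product_id, product):
        self.backend.put((store_id, product_id), product)

    def find(self, store_id, product_id):
        return self.backend.get((store_id, product_id))


stores = StoreRepository(InMemoryBackend("mysql"))
products = ProductRepository(InMemoryBackend("cassandra"))
stores.save("s1", {"name": "Bakery"})
products.save("s1", "p1", {"title": "Bread"})

# The arithmetic from the answer: a million stores at 100 products each.
total_products = 1_000_000 * 100
```

Because each repository only talks to its own backend, moving products to a different data store would not require touching the store code.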
When Wix was first built there weren’t really existing solutions for this problem, so we had to build our own eventually consistent data storage. Right now we have about 32 nodes in each data center, from which the files are served. The storage is sharded and it’s also datacenter aware: if someone uploads a file to one datacenter and someone requests that file from a second datacenter, we know how to automatically fall back from one datacenter to the other and retrieve the file while the file is being replicated across datacenters.
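The fallback behaviour described above can be sketched roughly as follows. This is a toy, in-memory model under stated assumptions: the class names, the datacenter names and the replication queue are all hypothetical, and the real system's sharding across nodes is not modelled.

```python
class DataCenter:
    """Hypothetical datacenter holding files in a plain dictionary."""

    def __init__(self, name):
        self.name = name
        self.files = {}


class FileGrid:
    def __init__(self, datacenters):
        self.datacenters = {dc.name: dc for dc in datacenters}
        self.pending = []  # replication work that has not been applied yet

    def upload(self, dc_name, path, content):
        """Write locally and queue asynchronous copies to the other sites."""
        self.datacenters[dc_name].files[path] = content
        for other in self.datacenters:
            if other != dc_name:
                self.pending.append((other, path, content))

    def read(self, dc_name, path):
        """Serve locally if possible, otherwise fall back to another site."""
        local = self.datacenters[dc_name]
        if path in local.files:
            return local.files[path], dc_name
        for name, dc in self.datacenters.items():
            if name != dc_name and path in dc.files:
                return dc.files[path], name
        raise FileNotFoundError(path)

    def replicate(self):
        """Apply queued copies; after this, all sites have converged."""
        while self.pending:
            target, path, content = self.pending.pop(0)
            self.datacenters[target].files[path] = content


grid = FileGrid([DataCenter("dc-a"), DataCenter("dc-b")])
grid.upload("dc-a", "/img/logo.png", b"PNG")
before = grid.read("dc-b", "/img/logo.png")  # served via fallback to dc-a
grid.replicate()
after = grid.read("dc-b", "/img/logo.png")   # now served locally from dc-b
```

The read never fails during the replication window; it is only slower because it crosses datacenters, which is the trade-off eventual consistency accepts.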
12. Nice, what S3 could do for you today?
Yes today, when we did that, S3 didn’t exist.
Sadek: With the difference that you didn’t open it up as a platform.
Yes, well we couldn’t really open it up because it used proprietary software so we couldn't do that.
Yes, one of the interesting things that we did is we divided our system based on SLA. If you look at what people are doing on our platform, you have the users who are building and editing websites, and there we need to consider the scale of the data: a lot of data, data validations and a lot of computational work to manage the structure of the site and the editing flow. On the other hand, you have the users who visit those sites, and there you have a different kind of scale. On the editor side the scale is about data, and we save revisions so it’s a lot of data; on the viewing side the scaling is about performance. If you have 45 million users who build websites, on the viewing side you have the visitors of those 45 million, so it’s like a 10x or 100x scalability issue, and there we have to think more about performance. So what we do is copy data from one segment of the system and denormalize it into the viewing part, so that the viewing part holds highly denormalized data for a very high-performance system.
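A minimal sketch of that copy-and-denormalize step might look like this. The three normalized collections and the `publish` function are hypothetical; they only illustrate how editor-side records that reference each other by id can be flattened into one self-contained document per site for the viewing side.

```python
# Normalized, editor-side data (hypothetical shape): records refer to each
# other by id, which is convenient for editing and validation.
sites = {"site-1": {"title": "My bakery", "page_ids": ["p1", "p2"]}}
pages = {
    "p1": {"name": "home", "component_ids": ["c1"]},
    "p2": {"name": "menu", "component_ids": ["c2", "c3"]},
}
components = {
    "c1": {"type": "text", "value": "Welcome"},
    "c2": {"type": "text", "value": "Bread"},
    "c3": {"type": "image", "value": "/img/bread.png"},
}


def publish(site_id):
    """Resolve every reference and return one self-contained document."""
    site = sites[site_id]
    return {
        "title": site["title"],
        "pages": [
            {
                "name": pages[page_id]["name"],
                "components": [
                    dict(components[component_id])
                    for component_id in pages[page_id]["component_ids"]
                ],
            }
            for page_id in site["page_ids"]
        ],
    }


# Viewer-side store: one read returns everything needed to show the site.
viewer_store = {"site-1": publish("site-1")}
```

The cost is duplicated data and a publish step on every edit; the benefit is that serving a visitor needs a single lookup with no joins.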
Sadek: Right, because basically you kind of break the structure for visitors so that it's much more performant, if I understand right.
Yes, we copy data, we duplicate data and we denormalize it. One thing that we did in order to keep our operational cost sane is that our architecture offloads a lot of the work to the browser. Our websites are very rich and beautiful, with all of those jumping things and moving parts, and since every laptop now is like a mini server, a very powerful machine, we offload a lot of the rendering and the business logic of rendering a website to the browser. So we built a really big JavaScript framework that renders the site in the browser, which helps us scale in terms of servers: we don’t need a lot of servers, because the servers really don’t do much; they just offload the work to the client.
Yes, basically when you make a request to the server, the server just sends the browser a bootstrap HTML page with references to the data that makes up the site structure and the pages. JavaScript in the browser then fetches that data from the static content, which is on the CDN and cached. It’s a JSON file that represents the site structure and all the pages; the JavaScript parses the JSON and renders the site in real time in the browser.
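The server side of that flow can be sketched as a function that returns a nearly empty page. Everything named here is an assumption for illustration: the `bootstrap_html` function, the `window.siteConfig` variable, the `siteStructureUrl` key and the example URLs are hypothetical, not Wix's actual markup.

```python
import html
import json


def bootstrap_html(site_json_url, renderer_url):
    """Return a minimal page: no content, only pointers for the client."""
    # Escape "</" so the JSON cannot close the surrounding script tag early.
    config = json.dumps({"siteStructureUrl": site_json_url}).replace("</", "<\\/")
    return (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        f'<script src="{html.escape(renderer_url, quote=True)}"></script>\n'
        "</head><body>\n"
        f"<script>window.siteConfig = {config};</script>\n"
        "</body></html>\n"
    )


page = bootstrap_html(
    "https://static.example.com/sites/site-1.json",
    "https://static.example.com/renderer.js",
)
```

Since the response contains no site content, the same tiny template can serve every site, and the heavy JSON and JavaScript files can be cached on a CDN.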
There is no problem with SEO. For SEO we use something called “Ajax Crawling”, which is something that Google does for Blogger, for instance. We have special metadata on the page that speaks to the Google bot or any other bot that supports this technology, and most search engines do. Basically it tells the Google bot: “This site is an Ajax site, and you should make another call with a special parameter”, and when Google does that we render the site on the server side. Rendering those sites on the server is much easier because we don’t need to render everything that the client does; we don’t need to render the animations. We only need the content and a simplified site, so rendering on the server side is much easier than the JavaScript code.
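For readers unfamiliar with the mechanism, here is a minimal sketch of how such a handler could branch. In the AJAX crawling scheme Google published at the time (since deprecated), the page carries a `<meta name="fragment" content="!">` tag and the crawler repeats the request with an `_escaped_fragment_` query parameter. The answer above does not name the tag or the parameter, so their use here is an assumption, and the `handle` and `render_snapshot` functions and the site shape are hypothetical.

```python
import html
from urllib.parse import parse_qs, urlsplit

# Tag from Google's (now deprecated) AJAX crawling scheme.
FRAGMENT_META = '<meta name="fragment" content="!">'


def render_snapshot(site):
    """Simplified server-side render: content only, no animations."""
    parts = [f"<h1>{html.escape(site['title'])}</h1>"]
    parts += [f"<p>{html.escape(text)}</p>" for text in site["texts"]]
    return "<html><body>" + "".join(parts) + "</body></html>"


def handle(url, site):
    """Serve a static snapshot to crawlers and the bootstrap page to users."""
    # keep_blank_values matters: the crawler sends the parameter empty.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if "_escaped_fragment_" in query:
        return render_snapshot(site)
    return (
        f"<html><head>{FRAGMENT_META}</head>"
        '<body><script src="/renderer.js"></script></body></html>'
    )


site = {"title": "My bakery", "texts": ["Fresh bread daily"]}
for_crawler = handle("http://example.com/?_escaped_fragment_=", site)
for_browser = handle("http://example.com/", site)
```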
16. What is the size of the company, how many people work in the company?
Currently we have 600 people working at Wix.
Sadek: So 600 people, how do you make your architecture able to handle so many, I don’t know how many developers are among these 600, we need to have some kind of system that can be developed with so many developers, it’s a big challenge.
Yes, we have around 200 people in R&D, 100-150 developers; that's a lot. The way we do that is we structure our company so that we give parts of the system to different groups, and they have complete autonomy over how their part of the system works. For instance, the e-commerce team does everything that has to do with e-commerce; they define the roadmap and they have their own dedicated resources. We try to give each group all the resources it needs so that there will be as few dependencies as possible between groups. So they have their own product people, they have their own QA, they have their own servers and clients.
17. Completely different servers?
Yes, however they all use the same framework and they all use the same methodology and deployment processes.
Sadek: Right, so that is some kind of shared patterns and methodology among these different teams.
Yes, we invest a lot in culture, we do continuous delivery and developer centric culture and when we grow the company we try to assimilate new groups into the culture before we actually start giving them the independence and autonomy.
Most of our services are Java and Scala based at the backend; however, we don’t constrain ourselves to just Java and Scala, or to the JVM in general. We choose the right technology to solve the right problem. For instance, most of our applications are JVM based, but the static grid, the grid that we built to serve static content, has to be highly performant, so we built that in C. Our e-commerce platform is actually built on PHP because it’s built upon Magento, which is open source. But we have other things in Ruby and Erlang, and it really depends on whatever the need is.
19. You basically use Java and Scala on the JVM; what kind of web frameworks do you use?
Currently we use Spring MVC but it’s something that we are kind of rethinking and maybe we are starting to play around with Play Framework and start building new services with Play.
We like the fact that it makes your life a lot easier, you know when you do things there and you don’t need to configure a lot of things you just put it where it's supposed to be and everything works, it doesn’t really constrain you and it doesn’t have all that Spring magic that you have, it’s really easy to just put and start to play with it, you don’t need to do a lot of configuration.
We started playing with Scala around 2 years ago, actually more than 2 years ago. Our first experience was not so good: the tooling was not there and it was really hard to make really good commercial software with a team using Scala, so we dropped it for about a year. After a year we came back to it and evaluated it again; the tools had become better and more mature, and it was easier to use. We started with a pilot program; one of my developers started it and he really liked it, and we started to move more and more mini projects to Scala. With new developers there is a really steep learning curve, especially for people who know Java, to get into functional programming and especially Scala. We decided that we were going to go all the way, so we ran a training course for all of our developers for several months and now everybody develops in Scala. It’s hard, but once you take that step productivity really improves.
23. But you need to have the will to go beyond that step, right?
You need to have the will and you need to have the capacity to go because not everybody can do that. I have developers that couldn’t go and couldn’t do that step.
Sadek: And you don’t think maybe more training or more attention could get them to be comfortable? Are they uncomfortable with it, or are there some concepts that they couldn’t really grasp? What exactly is the problem for these developers?
It’s a different way of thinking, and another problem with Scala is that there are a lot of different ways to do the same thing, and unless you really know what you are doing it’s very easy to do it the wrong way.
24. Or not the optimal way at least. So do you program Scala yourself?
I myself have played with it but unfortunately I didn’t have the time to actually do an actual real big project in Scala.
Sadek: Are you planning to do that?
I hope so, I very much would like to.
What we are trying to do currently: we are going through rapid growth at Wix, where we are adding more and more new services, so we are trying to build an architecture that will allow rapid building of new services and allow new teams to quickly come in, understand the framework that we work with and start building new services. In terms of scaling and architecture there isn’t really one thing that we do, because we do a lot of things; every service has its own challenges. I think we already kind of know how to scale a service and how to build a service that is either performant or careful about data validation and storage, so we have kind of cracked those patterns and it works well for us. Now it’s about building new services and integrating them into the whole system.
Sadek: Nice, well thank you very much for taking the time to do this interview!