Introduction
Too often in our work as architects and designers we focus on the task at hand, seldom reflecting on the past. We should really know better: how else do you improve? This article summarizes six learnings from 55 months as an architecture team lead at Skype. Some of them are technical, while others focus on the softer aspects of an architect’s work. But first, some context.
Skype context
Skype is a piece of software that allows people to place both audio and video calls to each other, call ordinary phones and send SMS messages. The company was founded in 2003 and has seen an incredible growth curve since then. We currently have more than 520 million registered users and about 650 employees. Those users generate an average of 210 000 parallel calls, about one third of which contain video. These numbers amount to roughly 8% of international calling minutes globally.
Needless to say, this amount of traffic presents unique scalability challenges. For Skype, the main weapon of choice for dealing with those challenges has always been our peer-to-peer technology. The p2p network (the core of which is implemented in C) is supported by a range of server-based services that are mainly written in C++ and use PostgreSQL databases, with a healthy dose of Python thrown in. Our web services are built using PHP.
The technical side of things
Rules of thumb do not apply
As you work your way through your career as a software engineer, patterns emerge. Certain rules of thumb make themselves apparent and a tendency develops to apply these rules wherever you go. After all, they have worked well in the past. Right?
Turns out the fact that you have a great hammer does not turn all objects around you into nails. In the rapidly changing modern world of technology the rules of thumb do not always apply. Let’s look at how Skype’s databases are structured for an example.
The conventional wisdom says that one should never implement business logic in the database. What are the reasons this belief is so widespread? Most of us have experienced the Mother Database that tends to become this massive beast outgrowing hardware, not performing and being very difficult to maintain.
One of the reasons for the emergence of this Cthulhu impostor is that major database platforms usually lack two crucial pieces of technology out of the box: the ability to split databases horizontally (i.e. partition the data of one entity) and the ability to split databases vertically (i.e. have different entities reside in different databases). Of course, both of these abilities can be built, but usually it’s somebody other than the database team who deals with these matters. Hence, these abilities are something the DBAs need to live with rather than something that helps to solve their problems. Also, the technology to do splits or queuing usually lives outside the database, making it necessary for developers to deal with protocol translations, several interfaces, data integrity issues and so forth.
In Skype’s case, however, the people maintaining our databases happen to be heavy contributors to PostgreSQL. From very early on, they refused to consider databases as monolithic jar-shaped pieces in the remote corner of the system architecture. Instead, they took an active stance, picking up technology and actively solving the scalability, performance and maintainability issues they faced. As you might have guessed, this was not enough, as even the best database architecture can be nullified by inconsiderate coding. Luckily enough, Skype DBAs took control of the development going on in databases very early on and did not relinquish that control before having developed a good set of non-functional requirements, coding practices and peer review processes designed to make sure the code built suits the DB layout and the other way around.
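To make the two kinds of split concrete, here is a minimal Python sketch. It is an illustration only and does not describe how any particular database product or Skype's own tooling works: the database names, the `ENTITY_DATABASES` map and the `route` function are all hypothetical.

```python
import zlib

# Hypothetical layout, for illustration only.
# Vertical split: each entity lives in its own database (or set of databases).
ENTITY_DATABASES = {
    # "users" is additionally split horizontally into four partitions.
    "users": ["userdb_0", "userdb_1", "userdb_2", "userdb_3"],
    # "orders" fits into a single database, so it has one partition.
    "orders": ["shopdb"],
}


def route(entity: str, key: str) -> str:
    """Return the name of the database that should hold `key` for `entity`."""
    # Vertical split: the entity decides which group of databases to use.
    partitions = ENTITY_DATABASES[entity]
    # Horizontal split: a stable hash of the key picks one partition.
    # zlib.crc32 is used because, unlike the built-in hash(), it returns
    # the same value in every process.
    index = zlib.crc32(key.encode("utf-8")) % len(partitions)
    return partitions[index]
```

The same key always lands in the same database, so callers can stay unaware of the layout as long as something in front of the databases performs this lookup for them.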
Figure 1 explains how they have used these tools to set up the Skype database architecture.
It consists of four layers:
- Access layer that provides access to the databases, handling either database partitioning (PL/Proxy) or connection pooling (PgBouncer). It also keeps the database layout transparent to developers
- OLTP Layer where the OLTP databases live
- Queue layer that is responsible for transporting and replicating data between databases within the layers
- Internal servers layer that contains databases for logging, statistics, monitoring, batch processing and ETL purposes
All in all, we are in a position where database scalability is not a problem for developers and where we can keep essential business logic close to the data, where it can work most efficiently. In other words, the rule of thumb of keeping business logic out of the database does not apply to us. Certainly other limitations, like difficulties with releasing, debugging and unit testing, are still there, but we surely aren’t afraid of the Mother Database raising its ugly head.
Figure 1: Database layers
The same goes for architectural patterns. While extremely important in establishing a common vocabulary among engineers and providing tried and tested recipes for common problems, they should be treated with caution. Skype’s p2p network is a good example of this. When presented with the task “design internet telephony”, people went and designed SIP, as it was the most obvious way of doing this. But Skype did not bring about the change in the communication industry it did by implementing SIP-based services. Instead, the people at Skype’s cradle did not let themselves be limited by how things are usually done but rather figured out the best possible solution they could build.
In summary, a slightly different organizational and skills setup might necessitate application of a completely different set of architectural patterns. Always welcome a challenge to your conventional thinking as those differences can be very subtle.
You neglect functional architecture at your own peril
Very few of us get a chance to work on a green-field project, to be the First Designer. Most of our work is about changing existing systems, making change management a crucial element of an architect’s work. Now, most of the change management we do focuses on technical architecture, and most of our design effort goes into making sure the design still makes sense after the changes have been implemented.
Unfortunately this is not the entire story.
The thing is that all technical changes are rooted in functional changes; we very seldom get to play around with a system just for the sake of refactoring. There is usually some external driver, a need for the system to behave differently in some manner. It might be that there is a new product on the market, that the legislation has changed or that the operations guys need it to scale better. Whatever the reason, a technical change is usually accompanied by a functional change.
So our systems and processes are supposed to make that technical change reasonably easy and we hopefully manage in a way that does not leave a bowl of spaghetti for the next guy to figure out. But what about the functional change? Who takes care of the functionality of the system and makes sure that it does not turn into a bowl of spaghetti?
Let me illustrate with an example.
At regular intervals over the past four years, an urge has come over me to fix our web store architecture as I witness the pain going into every little change. Selling a total of four paid products over the internet cannot be that complicated, right? Most of the time the overall architecture remained exactly as it was; sometimes a couple of small pain points were identified and consequently fixed.
This is why.
Figure 2 is a diagram depicting all functional components of the Skype web store. There are around 200 of them. The diagram is not supposed to be legible; it just seeks to illustrate the sheer amount of functionality in the application and its complexity. It is the result of countless changes, additions, modifications, legal issues and tweaks. All of them, of course, justified and bringing value.
Just as a decent system architecture gets turned into a convoluted pile of spaghetti by mindless technical changes, the functionality of your application will be turned into a similar mess by mindless changes in functionality. This does not necessarily mean that, as a software architect, I had the means to prevent all this from happening. But it surely means that if no one is taking active care of the functional architecture of a system, the result is a broken functional architecture. Which can only translate into a broken technical architecture.
Figure 2: Web store functional architecture
In summary, you should always keep an eye on the functionality of a system you maintain and do a round of gardening around it at least as often as you do for the technical architecture.
Simple things work
In a nutshell, anything that takes more than three sentences to explain to your peers does not work in reality. This is why REST tends to work and scale while SOAP does not, and this is why people favored Hibernate over J2EE beans.
A good example of how simplifying the requirements a little bit can sometimes yield a good result is PgQ [1]. For all sorts of messaging systems, message reliability is a major performance issue. Marking messages as consumed for different consumers, archiving them so they don’t clog up the unconsumed message store: it’s all a headache. It turns out that when you promise to deliver each message at least once instead of exactly once, a large chunk of those headaches disappears. It also turns out that this is actually an OK thing to do in most cases, as the consuming application is free to implement its own verification mechanisms as necessary.
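The trade-off can be sketched in a few lines of Python. This is a toy in-memory model and not PgQ's API: `AtLeastOnceQueue`, `next_batch`, `ack` and `DedupConsumer` are hypothetical names invented for illustration. The queue only promises to redeliver anything that was not acknowledged, and the consumer protects itself against duplicates by remembering the event ids it has already handled.

```python
class AtLeastOnceQueue:
    """Toy queue: events stay pending until the consumer acknowledges them."""

    def __init__(self):
        self._pending = []   # (event_id, payload) pairs not yet acknowledged
        self._next_id = 1

    def publish(self, payload):
        self._pending.append((self._next_id, payload))
        self._next_id += 1

    def next_batch(self):
        # Hand out everything still pending. If the consumer crashed before
        # acknowledging, the same events are simply delivered again.
        return list(self._pending)

    def ack(self, batch):
        done = {event_id for event_id, _ in batch}
        self._pending = [e for e in self._pending if e[0] not in done]


class DedupConsumer:
    """Consumer-side verification: skip event ids that were already handled."""

    def __init__(self):
        self.seen = set()
        self.handled = []

    def process(self, batch):
        for event_id, payload in batch:
            if event_id in self.seen:
                continue     # duplicate caused by a redelivery
            self.seen.add(event_id)
            self.handled.append(payload)


queue = AtLeastOnceQueue()
consumer = DedupConsumer()
queue.publish("call started")
queue.publish("call ended")

first = queue.next_batch()
consumer.process(first)      # processed, but imagine the ack got lost
second = queue.next_batch()  # so the same two events arrive again
consumer.process(second)
queue.ack(second)
```

The queue never has to track exactly-once state per consumer; the small cost of remembering ids moves to the consumers that actually care about duplicates.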
One more benefit of simple solutions is that they make you think and thinking is always good. Designing WSDLs with a GUI sure is fun but how much of your thinking concentrates on what objects of what type should go into what other objects and how much into what you actually seek to accomplish? Exactly.
In summary, always work towards making your systems and applications simpler: challenge all requirements, dogmas and standards, and mercilessly remove the excess fat that’s slowing you down.
The not-so-technical perspective
Buzzwords are dangerous
Every now and then somebody comes up with a great way of building software and invents a catchy name for it, and before you know it, it turns up on PowerPoint decks everywhere. Unfortunately most of these ideas are quite complicated and very few of them are practical. Things like J2EE, CORBA and SOA are not designed to solve the problems software engineers face every day. Oh, they sometimes manage to do that, but it’s accidental.
We have had this problem repeatedly at Skype and have been reasonably successful in dealing with it. Another organization we heard of had a rather different experience, though. At some point we started seeing a lot of applicants from a big software house that had just recently seen its entire engineering management replaced.
One of the expats told the story.
Apparently the top management had certain issues with the amount of time their main product took to customize and had decided, based on what some consultants told them, that going all cloud- and SOA-based would help them. So they started talking to the engineering leads, who answered with blank stares, a flurry of Dilbert strips and long tirades about how this is all a huge lake of snake oil. After a while the inevitable happened: the management got fed up with being pictured as morons (the consultant had been an expensive one) and, as nobody seemed to be solving their initial problem, took the next logical step: get rid of the clearly incompetent bunch of naysayers who offer insults instead of constructive cooperation. The company is probably never going to recover.
It was the architect’s fault, really.
The challenge of the story lies in the duality of the architect’s responsibility. On one hand, we need to be critical about these ideas and only admit things into our systems that actually make sense, that move the game forward. On the other, we can’t ignore those often meaningless terms, as there is often a real problem hidden behind them. Finding the original problem is, however, complicated, as our customers in upper management usually lack the vocabulary to express their needs in a way we understand. Thus, when a concept pops up that seems to solve a problem they have been mulling over for a while, they pick it up in the hope of help and, being the people in power, unleash it onto the organization. Responding to these situations from the technical perspective (declaring the whole thing bogus) does not solve the original problem the exec had, and it isn’t very constructive either. When a leader sees the organization having a problem, believes they have found a solution and finds that you refuse to implement or even talk about it, you will be overruled. Instead of giving the buzzword meaning yourself, you’ll have a host of consultants define the meaning for you, which never ends well.
In summary, your customer is seldom out to fool you, and neither should you be out to fool them. Work with them, find the real problem and solve it because, trust me, your CEO has much better things to do than throw meaningless catchphrases at you just for the thrill of it.
Architecture needs to fit your organization
Most of us go to work every morning wanting to do the best possible job we can, to create a thing of beauty. In an architect’s case this translates into creating a marvelous system that scales indefinitely and is indefinitely extensible and modular.
Turns out, this is not what we are paid to do.
Every system exists in a context. That context includes not only existing technical systems but also the skills, attitudes and culture of the people dealing with it. Even more importantly, all systems exist within a business context. A startup is different from a large incumbent telco, and a bank is different from a government agency. With such a variety of business and organizational settings, it becomes clear that there is no such thing as a good or beautiful architecture. The architecture either fits your organization and helps to achieve its goals or it does not. This, more often than not, means that you need to suppress your natural desire to build a beautiful system, because usually what you consider beautiful and what your organization needs are two different things.
This realization, by the way, puts the concept of technical debt [2] into a slightly different light as now you don’t take on debt for building stuff that’s technically inferior but for building stuff that doesn’t do an excellent job at helping the organization.
In the Skype context, this has always been a very important issue. The vast majority of the services our users consume are provided in our peer-to-peer network. That peer-to-peer network is a pretty nifty thing but not necessarily what you would call “neat” or “simple”. For someone coming from a conventional web application background it can look positively daunting. Building, maintaining, debugging, launching, testing and explaining this thing is rather difficult, especially because we are the only ones operating a p2p network of this magnitude. Thus, there is constant pressure to fall back to a server-based architecture, to do things like everyone else.
From a technical perspective, this pressure can be understood, and for a number of reasons it would make sense to make the switch. When one looks at what this change would do to our business model, the decision becomes much more difficult, though. For example, the volume of the video calls our users make is of the same order of magnitude as YouTube video traffic. Due to the peer-to-peer architecture, none of it ever touches a single piece of hardware paid for by Skype. Changing that would most certainly mean an end to free video calls on Skype, which in turn would effectively mean an end to our non-subsidizing premium business model. So, regardless of what I think about p2p and whether I like working with it, it is going to prevail.
In summary, all of your architectural decisions need to be made in the context of your organization and not in the context of your own preferences.
Communication is important
We saw previously how the architecture you build needs to be in accordance with how your business functions. Since having the right system architecture is fundamental to how a business functions, it is quite logical to conclude that the people responsible for the welfare of the business are (or should be) quite interested in what their system architecture looks like. But how would a pink-tied City dweller learn about the intricacies of systems that even your developers find complicated, and how would a software engineer find out how the business functions?
The answer is surprisingly simple. It’s communication. Both sides need to reach out, cross the cultural divide and start talking to each other. The job of an architect is to translate business strategies into technology. The very term implies communication.
This is by no means easy: gaining respect among the management is very difficult. But unless that mutual respect and communication is there, the engineers need to live with arbitrary technical decisions and the business must deal with systems that constrain its development. As long as there is no communication, there is no understanding and thus no cooperation.
Figure 3: Architecture organization
The same goes for the other significant customer of architecture, the developers. Architecture is only as good as the developers implementing it; it cannot provide value to the business unless it is implemented in actual code servicing the customers. Again, trust and mutual respect are the key words here.
Figure 3 shows the general organization of Skype architects, not necessarily team or reporting boundaries. It’s quite a simple affiliated model consisting of a central team of architects whose main job is to maintain relationships and lay down the general architectural direction. They are complemented by architects both in business units (called solution architects, quite similar to analysts in their role) and in development teams (called technical architects). The former are responsible for helping business units formulate their ideas in a technically digestible format and for providing feedback about what makes technical sense. The latter are responsible for development oversight and for detailing the high-level design provided by the architect.
This setup provides enough structure and coordination between different stakeholders while still allowing a sufficient degree of latitude. Of course, you need to find a model that fits your organization, but whatever the solution, it needs to facilitate communication with both critical customers of your architects.
In summary, talk to people!
Conclusion
As you have seen, the years have taught me a lot. If some of these lessons feel familiar or trivial, you must have already learned them yourself, hopefully in a less painful way than I did. All in all, there are two main realizations an architect needs to reach to be successful in their job in this day and age:
- Whatever has worked for you in the past, works for the big hitters like Facebook and Skype or is the current talk at your local CIO community should merely serve as a useful starting point for figuring out what would help your organization reach its goals. Nothing more, nothing less.
- Technical skills are a hygiene factor for architects. You need to have them to be accepted for the job. But emotional intelligence and the ability to understand organizations are the skills that define how good you really are.
References
[1] “Skytools page at pgFoundry.”
[2] M. Fowler, “Technical debt,” August 2004.