Wesley Reisz talks to Sachin Kulkarni, Director of Engineering at Facebook, about the engineering challenges of building Facebook Live, and how it compares to the video upload platform at Facebook.
Key Takeaways
- The Facebook Infrastructure teams provide the platform that powers the broad family of apps including the Facebook app itself, Messenger and Instagram. It is largely a C++ shop.
- The video infra team at Facebook builds the video infrastructure across the whole company. Projects include a distributed video encoding platform for low-latency encoding, as well as video upload and ingest.
- Encoding for Facebook Live is done on both the client and the server. The trade-off between encoding on the client side and the server side is mostly around the quality of the video vs. latency and reliability.
- Facebook gets around a 10x speed-up by encoding data in parallel when compared to serial processing.
- They also have an AI-based encoding system which results in 20% smaller files than raw H.264.
Facebook Infrastructure
- 1:48 - Facebook Infrastructure powers the broad family of apps including the Facebook app, Messenger and Instagram. The group is responsible for storage, caching, pub/sub, monitoring, streaming and so on.
- 2:30 - The video infra team builds the video infrastructure across the whole company. Projects include a distributed video encoding platform for low-latency encoding, as well as video upload and ingest. Ingesting is about moving the bytes from the client apps to the Facebook data centre, while encoding is about server-side processing to reduce size while keeping the quality high.
- 2:58 - Another angle is video clustering, where similar videos can be clustered together for better search ranking.
Facebook Live encoding
- 3:35 - Facebook Live does encoding on both the client and the server.
- 4:03 - The trade-off between encoding on the client-side and the server-side is mostly around video quality vs. latency and reliability. Since encoding is typically lossy, if the network is good the Facebook app will try to keep the quality as high as possible and apply little or no compression on the device. Conversely, if the network is poor, more encoding is done on the phone to keep the amount of data to be transferred smaller.
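As a rough illustration of that trade-off, the sketch below picks an encoder target bitrate from a measured uplink speed: plenty of bandwidth means light compression to preserve quality, a weak connection means compressing harder so the stream keeps up. The function name, thresholds, bitrate ladder and 70% headroom factor are assumptions for illustration, not Facebook's actual client logic.

```python
# Hypothetical sketch of the client-side trade-off described above: the names,
# thresholds and bitrates are illustrative assumptions, not Facebook's client code.

def choose_target_bitrate_kbps(measured_uplink_kbps: float) -> int:
    """Pick an encoder target bitrate from a measured uplink speed.

    With plenty of bandwidth we compress lightly to preserve quality;
    with a weak connection we compress harder so the stream keeps up.
    """
    # Leave headroom so audio, retransmits and bandwidth dips don't stall the stream.
    usable_kbps = measured_uplink_kbps * 0.7

    # Illustrative bitrate ladder (kbps), highest quality first.
    ladder = [4000, 2500, 1200, 700, 400]
    for bitrate in ladder:
        if usable_kbps >= bitrate:
            return bitrate
    return ladder[-1]  # Worst case: compress as hard as we reasonably can.


print(choose_target_bitrate_kbps(6000))  # good network -> 4000 kbps, near-original quality
print(choose_target_bitrate_kbps(900))   # weak network -> 700 kbps, heavier compression
```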
Video at Facebook Scale
- 4:55 - There are 1.28 billion daily users of Facebook. A good example of where this scale causes problems is with comparatively rare situations such as race conditions, because normally rare cases get hit far more frequently with that volume of users. Consequently, avoiding race conditions needs to be thought about at design time.
- 6:15 - Facebook launches everything on an experimental basis, starting with internal users. Roll-out to the wider public is then gradual - 0.1% of users, then 1%, 5% and so on - to expose race conditions and other issues early (a sketch of this kind of percentage-based gating follows this list).
- 6:56 - For back-end systems releases are typically weekly, so a change goes from just a handful of users to 1.28 billion users within a week.
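As a rough illustration of percentage-based roll-out, the sketch below buckets users by a stable hash of their ID and admits the lowest buckets first, so the enrolled population only grows as the percentage is raised. The feature name, bucket count and hashing scheme are assumptions for illustration, not Facebook's actual gating system.

```python
# Minimal sketch of percentage-based roll-out gating, assuming a stable hash of the
# user ID; an illustration of the idea, not Facebook's actual gating system.
import hashlib

def in_rollout(user_id: int, feature: str, rollout_percent: float) -> bool:
    """Deterministically place each user in a bucket 0-9999 and admit the lowest
    buckets first, so the same users stay enrolled as the percentage grows."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000        # 10,000 buckets = 0.01% granularity
    return bucket < rollout_percent * 100    # e.g. 0.1% -> buckets 0..9

# Gradual exposure: 0.1%, then 1%, then 5% of users ("live_comments_v2" is hypothetical).
for pct in (0.1, 1.0, 5.0):
    enrolled = sum(in_rollout(uid, "live_comments_v2", pct) for uid in range(100_000))
    print(f"{pct}% roll-out -> {enrolled} of 100,000 test users")
```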
Facebook Live
- 8:28 - Facebook Live is a live streaming platform open to all users. The latency depends a lot on where the broadcaster and viewer are, and the network conditions in both of those places. The aim though is to keep the latency in single digit seconds.
[NOTE: Originally, Wes used the incorrect video latency units for streaming with Facebook Live. Latency should be measured in single digit seconds. The original recording will be edited to indicate the correction.]
- 9:44 - The product started at a hackathon in 2015. A small team built a working prototype infrastructure in just a few days. The first thing they streamed was a clock to measure the latency of the system!
- 11:30 - It took around 4 months to get from the prototype to launching Live to public profiles in August of 2015. By December 2015 the platform was scaled to all users on iOS, Android and browsers.
- 12:04 - It was possible to build the product that quickly because a lot of infrastructure already existed - the Everstore BLOB storage system was already solid, and they could also rely on open-source software like NGINX for the encoding and processing.
- 13:14 - Facebook Infrastructure is largely a C++ shop. There is some Java and Python, and the business logic is all done in PHP. The iOS apps are written in Objective-C and the Android apps in Java.
Facebook Live Architecture
- 13:54 - It all starts with the broadcast client - this could be an Android or iOS app, the Mentions app, or a stream sent via the Live API. In the client app there are libraries which do packaging, encoding and so on.
- 14:13 - The stream is sent via RTMPS (Real-Time Messaging Protocol over a secure connection) to a geographically local PoP (point of presence). The connection is then forwarded over an internal Facebook network to a Facebook data centre.
- 14:34 - At the data centre the stream hits an encoding host which authenticates the stream, encodes it into multiple formats at different bitrates, and packages it into RTMP or DASH (Dynamic Adaptive Streaming over HTTP). The stream is then cached in a CDN before it hits the player.
- 15:27 - Users can be broadcasting from anywhere in the world so the geographically local PoP reduces round-trip time.
- 15:44 - A key thing for load balancing is hashing based on the stream ID. When you make a request to go live you get a stream ID and a URI. Facebook hashes the stream ID and maps the stream to a data centre based on that hash (see the sketch after this list).
- 16:13 - The client libraries run a speed test on iOS and Android to figure out the video bitrate to use for encoding. They then encode the uncompressed bitstreams from the phone using the H.264 and AAC codecs for video and audio, before wrapping the compressed frames in an RTMP-compatible format and sending the packets to the server.
- 17:03 - Network bandwidth is not static and can change during the broadcast. Facebook Live uses adaptive bitrate streaming to cope with this.
- 19:57 - To play the live stream out they use MPEG-DASH, an adaptive bitrate streaming format that enables streaming over HTTP. It comprises a manifest file, essentially an index which points to the media files, and the individual media files themselves, for example one for each second of the live stream (see the playback sketch after this list).
- 20:43 - When you see a live stream in your feed and click on it, the player requests the manifest. If it isn't already on your local PoP, the request goes to the data centre to get the manifest, and the player then fetches the media files. As they are sent back they are cached on the PoP if they aren't there already.
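To make the stream-ID hashing idea concrete, here is a minimal sketch of deterministically mapping a stream to a data centre; the data-centre labels and the plain modulo scheme are assumptions for illustration, not Facebook's actual routing logic.

```python
# Illustrative sketch of hashing a stream ID to pick a data centre; the data-centre
# names and the plain modulo scheme are assumptions, not Facebook's actual routing.
import hashlib

DATA_CENTRES = ["prn", "frc", "lla", "atn"]   # hypothetical data-centre labels

def data_centre_for(stream_id: str) -> str:
    """Map a stream ID to a data centre so that every PoP forwards the same
    stream to the same place, wherever the broadcaster and viewers connect from."""
    digest = hashlib.md5(stream_id.encode()).hexdigest()
    return DATA_CENTRES[int(digest, 16) % len(DATA_CENTRES)]

# The broadcaster's PoP and every viewer's PoP compute the same answer,
# so requests for a given stream all converge on one data centre.
print(data_centre_for("1234567890"))
```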
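The playback side can be sketched in a similar spirit. Below, the DASH manifest is modelled as a plain dictionary and a toy player picks a rendition for each segment from its measured throughput; the URLs, bitrates, segment naming and 20% headroom factor are all illustrative assumptions, not Facebook's player implementation (a real player parses the MPD XML and measures throughput from actual segment downloads).

```python
# Minimal sketch of how a DASH player might use the manifest: the manifest is a
# plain dict and the throughput samples are made up, purely for illustration.

# Hypothetical manifest: an index of available renditions plus a segment URL template.
manifest = {
    "representations": [                      # one entry per encoded bitrate
        {"id": "1080p", "bandwidth": 4_000_000},
        {"id": "720p",  "bandwidth": 2_500_000},
        {"id": "360p",  "bandwidth": 700_000},
    ],
    "segment_template": "https://cdn.example/live/{rep}/segment-{n}.m4s",
    "segment_duration_s": 1,                  # e.g. one media file per second of stream
}

def pick_representation(measured_bps: float) -> dict:
    """Adaptive bitrate: pick the best rendition the current throughput can sustain."""
    affordable = [r for r in manifest["representations"]
                  if r["bandwidth"] <= measured_bps * 0.8]    # keep 20% headroom
    if not affordable:                                        # too slow for any rendition
        return min(manifest["representations"], key=lambda r: r["bandwidth"])
    return max(affordable, key=lambda r: r["bandwidth"])

# Simulated playback: throughput changes mid-stream, so the chosen rendition changes too.
for n, throughput_bps in enumerate([6_000_000, 5_500_000, 1_500_000, 900_000]):
    rep = pick_representation(throughput_bps)
    url = manifest["segment_template"].format(rep=rep["id"], n=n)
    print(f"segment {n}: throughput {throughput_bps/1e6:.1f} Mbps -> fetch {url}")
```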
Video Upload
- 24:10 - One of the key challenges with Live is that it has to happen in real time. For video upload you can batch the workload, so latency is less critical.
- 25:08 - The key requirements for building a stable and scalable video encoding platform are that it needs to be fast, flexible, able to cope with spikes, and very efficient.
- 25:28 - At a high level the client library breaks the video up into smaller chunks corresponding to GOPs (Groups of Pictures), each roughly equivalent to a scene, and sends them to the server. On the server side a pre-processor receives the chunks, writes them to a cache, and starts encoding them in parallel as they arrive. Facebook gets around a 10x speed-up by encoding in parallel compared to doing this serially (sketched below).
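A minimal sketch of the fan-out idea, under stated assumptions: the chunk count, thread pool and 200 ms simulated "encode" are stand-ins, so the speed-up printed here only illustrates why encoding chunks in parallel beats encoding them serially; it is not Facebook's measured 10x figure.

```python
# Sketch of parallel chunk encoding: `encode_gop` just simulates work on a list of
# fake GOPs; the real pipeline splits the upload into actual Groups of Pictures
# and fans them out across encoding hosts.
import time
from concurrent.futures import ThreadPoolExecutor

def encode_gop(gop_id: int) -> str:
    """Stand-in for encoding one Group of Pictures (roughly one scene)."""
    time.sleep(0.2)                      # pretend each chunk takes 200 ms to encode
    return f"gop-{gop_id}.encoded"

gops = list(range(16))                   # a 16-chunk upload

start = time.time()
serial = [encode_gop(g) for g in gops]                      # one chunk after another
serial_s = time.time() - start

start = time.time()
with ThreadPoolExecutor(max_workers=16) as pool:            # all chunks at once
    parallel = list(pool.map(encode_gop, gops))
parallel_s = time.time() - start

print(f"serial: {serial_s:.1f}s, parallel: {parallel_s:.1f}s, "
      f"speed-up ~{serial_s / parallel_s:.0f}x")
```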
AI Encoding
- 26:25 - Facebook also tries to be efficient with bitrates, since this affects users' data plans. Creating smaller files without using an exorbitant amount of CPU is a hard problem because modern encoders have so many combinations of encoding settings.
- 27:17 - The key insight is that not only is each video different, but also that even within a video the right encoding settings can differ from scene to scene.
- 27:53 - Facebook uses AI to optimise the encoding settings. A training set is used to train a neural net model which can then come up with the right settings for each scene. The distributed encoding system naturally lends itself to this kind of approach (a toy sketch of the idea follows this list).
- 28:27 - AI encoding resulted in 20% smaller files than H.264.
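As a toy sketch of per-scene settings prediction: the scene features, weights and CRF rule below are stand-ins for the trained neural-net model described in the podcast, shown only to illustrate where per-scene prediction slots into the chunked encoding pipeline.

```python
# Toy sketch of per-scene encoding settings: the features and the rule below are
# stand-ins for a trained model, not Facebook's actual system.

def predict_crf(motion: float, spatial_detail: float) -> int:
    """Pretend 'model': busy, high-motion scenes tolerate more compression than
    static, detailed ones. A trained neural net would replace this rule."""
    base_crf = 23
    adjustment = round(4 * motion - 3 * spatial_detail)   # illustrative weights
    return max(18, min(30, base_crf + adjustment))

# One feature vector per GOP/scene produced by the chunking step.
scenes = [
    {"name": "talking head", "motion": 0.1, "spatial_detail": 0.8},
    {"name": "sports play",  "motion": 0.9, "spatial_detail": 0.4},
]
for scene in scenes:
    crf = predict_crf(scene["motion"], scene["spatial_detail"])
    print(f'{scene["name"]}: encode this chunk with CRF {crf}')
```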
New Live Features
- 28:55 - Facebook Live now allows people watching a stream to be invited to join it directly and ask questions during the stream. This requires very low latency - on the order of hundreds of milliseconds.
- 29:49 - This is done using a different protocol - WebRTC - which is typically used for video calling.