Personalized Notifications at Twitter

Gary Lam, staff engineer at Twitter, spoke about personalized notifications at QCon London 2017. He gave a high-level overview of Twitter's personalization and recommendation algorithms, and explained how they work at scale despite the large volumes of data and the bi-modal nature of Twitter.

Personalized fanout is the concept of only sending a notification to a user if it is about their interests. The example given by Lam is a Tweet by Elon Musk about electric cars. Rather than all of his followers receiving a notification, only the ones who like electric cars would.

Lam explained that the personalized fanout algorithm functions by keeping track of two things:

  1. Recent engagements with entities: These are likes, replies, and other user interactions that a person has had with a particular entity, such as a hashtag or account. Lam stresses the importance of this data being up to date, as users will only be interested in what they have been Tweeting about recently.
  2. Top followings: Although a user may follow hundreds of other users, only certain ones will make it into their top followings - these are the ones that a person would be the most interested in hearing about.

When the algorithm is applied, the entities are first extracted from the Tweet. Then, for each follower, one check determines whether any of those entities appear among their recent engagements, and another determines whether the Tweet comes from one of their top followings. If both conditions are true, the user receives a notification, as they are likely to be interested in the Tweet.
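
The two checks described above can be sketched in a few lines of Python. This is only an illustration, not Twitter's implementation: the `Follower` class, the regex-based `extract_entities`, and the function names are all assumptions made for the example, and real entity extraction is certainly more sophisticated than matching hashtags and mentions.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Follower:
    """Hypothetical per-follower state: the two things the algorithm tracks."""
    user_id: str
    recent_engagements: set[str] = field(default_factory=set)  # entities engaged with recently
    top_followings: set[str] = field(default_factory=set)      # accounts the user cares most about


def extract_entities(text: str) -> set[str]:
    """Toy entity extraction (an assumption): lowercased hashtags and @mentions."""
    return {match.lower() for match in re.findall(r"[#@]\w+", text)}


def personalized_fanout(author: str, text: str, followers: list[Follower]) -> list[str]:
    """Return the IDs of followers who should be notified about this Tweet."""
    entities = extract_entities(text)
    notified = []
    for follower in followers:
        engaged_recently = bool(entities & follower.recent_engagements)
        from_top_following = author in follower.top_followings
        # Both conditions must hold before a notification is sent.
        if engaged_recently and from_top_following:
            notified.append(follower.user_id)
    return notified
```

Note that the loop runs once per follower, which is why the cost of a single Tweet depends so heavily on how many followers its author has.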

Lam explains that the main problem with personalized fanout is asymmetry. If a user has millions of followers, then whenever they Tweet the algorithm must be applied to every single one of them. On the other hand, other users may only have a couple of followers. 

To work around this, Lam explains how they make use of data co-location. Users are sharded, and each user's recent engagements and top followings are kept together on that user's shard. This means that whenever the algorithm is run, there are no network hops, which greatly reduces latency.
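
A minimal sketch of the co-location idea, assuming a simple hash-based shard assignment: the shard count, the CRC32 hash, and every name below are illustrative choices, not details from the talk. The point is only that both datasets for a user live in the same place, so the check needs no lookup on another machine.

```python
import zlib

NUM_SHARDS = 4  # assumed value, purely for illustration


def shard_for(user_id: str) -> int:
    """Deterministically map a user to a shard (hashing scheme is an assumption)."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_SHARDS


class Shard:
    """Holds BOTH datasets for the users assigned to it."""

    def __init__(self) -> None:
        self.recent_engagements: dict[str, set[str]] = {}
        self.top_followings: dict[str, set[str]] = {}

    def should_notify(self, user_id: str, author: str, entities: set[str]) -> bool:
        # Everything needed for the decision is local to this shard.
        engaged = bool(entities & self.recent_engagements.get(user_id, set()))
        return engaged and author in self.top_followings.get(user_id, set())


shards = [Shard() for _ in range(NUM_SHARDS)]


def record_engagement(user_id: str, entity: str) -> None:
    shard = shards[shard_for(user_id)]
    shard.recent_engagements.setdefault(user_id, set()).add(entity)


def set_top_followings(user_id: str, accounts: list[str]) -> None:
    shards[shard_for(user_id)].top_followings[user_id] = set(accounts)
```

In a real deployment each `Shard` would be a separate process or host; here they are plain objects in one list so the example stays runnable.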

Lam points out that recent engagements don’t need to stick around for very long, as by their nature they are short lived. This has led to them being kept in memory.
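
One way to picture a short-lived, in-memory store is a queue of timestamped engagements that drops anything older than a time-to-live. The sketch below assumes a one-day default TTL and an injectable clock; neither detail comes from the talk, and the class name is hypothetical.

```python
import time
from collections import deque


class RecentEngagements:
    """In-memory engagements for one user that expire after a TTL (assumed design)."""

    def __init__(self, ttl_seconds: float = 24 * 3600, clock=time.time) -> None:
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the expiry logic can be tested
        self._events = deque()      # (timestamp, entity) pairs, oldest first

    def add(self, entity: str) -> None:
        self._events.append((self.clock(), entity))

    def _expire(self) -> None:
        cutoff = self.clock() - self.ttl
        # Events are appended in time order, so expired ones are always at the front.
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def entities(self) -> set:
        """Return the entities the user has engaged with inside the TTL window."""
        self._expire()
        return {entity for _, entity in self._events}
```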

In the event of a shard going down, data rebuilding has been heavily optimized to happen as quickly as possible, in order to make sure users still receive their notifications. This is done by replaying all the Tweets over the last day from a queue, but then batching the messages and removing redundant data before feeding them to the shard. This is known as a "slim firehose".
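
The replay step can be illustrated roughly as follows. The talk does not spell out what counts as redundant data, so this sketch assumes that repeated engagements by the same user with the same entity collapse to the most recent one; the event shape `(user_id, entity, timestamp)`, the batch size, and the function name are likewise assumptions.

```python
def slim_firehose(events, now, window_seconds=86400, batch_size=1000):
    """Yield de-duplicated batches of replayed events from the last `window_seconds`.

    `events` is an iterable of (user_id, entity, timestamp) tuples.
    """
    latest = {}
    for user_id, entity, timestamp in events:
        if now - timestamp > window_seconds:
            continue  # older than the replay window, not needed for the rebuild
        key = (user_id, entity)
        # Keep only the newest engagement per (user, entity) pair.
        if key not in latest or timestamp > latest[key]:
            latest[key] = timestamp

    slim = [(user_id, entity, ts) for (user_id, entity), ts in latest.items()]
    # Feed the shard in batches rather than one message at a time.
    for start in range(0, len(slim), batch_size):
        yield slim[start:start + batch_size]
```

A rebuilding shard would consume these batches instead of the raw stream, doing far fewer writes for the same end state.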

Top followings are calculated with an offline machine learning algorithm, which works by looking at the historical interactions between users. Because they are calculated in advance, the data can be copied onto disk on the shard at boot time, and then lazily loaded when required.
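
Lazy loading of precomputed data might look something like the sketch below. It assumes the offline job writes a JSON file mapping each user ID to a list of accounts, and that the whole file is read on first lookup; the file format, the granularity of loading, and the `TopFollowingsStore` name are all hypothetical.

```python
import json
from pathlib import Path


class TopFollowingsStore:
    """Reads precomputed top followings from local disk on first use (assumed design)."""

    def __init__(self, path) -> None:
        self.path = Path(path)
        self._data = None  # nothing is read at construction (boot) time

    @property
    def loaded(self) -> bool:
        return self._data is not None

    def _load(self) -> dict:
        if self._data is None:
            with self.path.open() as handle:
                raw = json.load(handle)
            # Sets make the "is this author a top following?" check O(1).
            self._data = {user: set(accounts) for user, accounts in raw.items()}
        return self._data

    def get(self, user_id: str) -> set:
        """Return the user's top followings, loading the file if needed."""
        return self._load().get(user_id, set())
```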

Lam also spoke about recommendations. These are slightly different to personalized fanout, in the sense that a user does not have to be a follower to receive a notification - they only have to be potentially interested in the content.

In this case, rather than feeding in events, the system loops over each user. Lam explains that this makes it easier to utilize resources, as the number of users, and thus the load, can easily be predicted. During the process, several steps take place:

  1. Fatigue: If a user is not interested in notifications, or does not engage with them, then they will not be sent any.
  2. Candidate sources: User IDs are exchanged for notifications that may be relevant to them. Two technologies pointed out to help with this are GraphJet, Twitter's real-time graph processing library, and Scalding, Twitter's Scala library for offline MapReduce jobs.
  3. Ranking: Making use of machine learning to pick the best notifications for the user.
  4. Push: Pushing the notification to the user's device.
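
The four steps above can be sketched as a loop over users. Every callable passed in here is a stand-in: `is_fatigued`, the candidate sources, `score`, and `push` are hypothetical hooks for the fatigue check, systems such as GraphJet or Scalding output, a trained ranking model, and the delivery service respectively.

```python
def recommend(users, is_fatigued, candidate_sources, score, push, top_n=1):
    """Run the fatigue -> candidates -> ranking -> push steps for each user."""
    for user_id in users:
        # 1. Fatigue: skip users who do not engage with notifications.
        if is_fatigued(user_id):
            continue
        # 2. Candidate sources: exchange the user ID for possible notifications.
        candidates = [c for source in candidate_sources for c in source(user_id)]
        if not candidates:
            continue
        # 3. Ranking: order candidates by a (here, caller-supplied) score.
        ranked = sorted(candidates, key=lambda c: score(user_id, c), reverse=True)
        # 4. Push: deliver the best candidates.
        for notification in ranked[:top_n]:
            push(user_id, notification)


# Example with toy stand-ins for each step.
sent = []
recommend(
    users=["alice", "bob"],
    is_fatigued=lambda user: user == "bob",
    candidate_sources=[lambda user: ["tweet-1", "tweet-2"], lambda user: ["tweet-3"]],
    score=lambda user, c: {"tweet-1": 0.2, "tweet-2": 0.9, "tweet-3": 0.5}[c],
    push=lambda user, notification: sent.append((user, notification)),
)
# sent is now [("alice", "tweet-2")]
```

Because the outer loop is over a known set of users rather than an unpredictable stream of events, the amount of work per run is roughly fixed, which matches the resource-planning benefit Lam describes.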

The full talk is available online. It was preceded by a talk from Saurabh Pathak on delivering notifications in real-time, which is also summarised in an article.
