Pull/Push Model in a News Feed Service

January 1, 2020 4-minute read

Recently, I have found it interesting to learn how a system or service is designed. After studying a few topics, whenever I use an app or visit a website I start thinking about what technologies are behind it. In this blog post I will cover a typical way news feeds are generated in a social network application. Keep Instagram, Facebook, YouTube, Snapchat, or even WeChat in mind as we think about the model I will cover in this post: the Pull/Push Model.

In my previous blog post, I covered git usage. We know the commands git pull and git push: git pull fetches code from the remote, whereas git push sends code to the remote. The Pull and Push models follow the same logic.

Pull Model

The pull model means fetching something from others. In a news feed service (called Moments in some applications), we fetch the posts of everyone the user follows, then sort them by datetime from most recent to oldest. You may know the problem of merging k sorted lists into one sorted list; generating a user’s news feed follows the same algorithm. The cost of generating one user’s news feed is N DB reads (disk accesses), where N is the number of people the user follows.
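To make the merge step concrete, here is a minimal sketch of the pull model in Python. It assumes each followed user's timeline has already been read from the DB as a list sorted newest first; the Post type and the pull_news_feed function are hypothetical names for illustration, not part of any real service.

```python
import heapq
from datetime import datetime
from itertools import islice
from typing import NamedTuple


class Post(NamedTuple):
    post_id: int
    user_id: int
    post_date: datetime


def pull_news_feed(timelines, limit=20):
    """Merge k timelines (each sorted newest first) into one feed.

    In a real service each timeline would cost one DB read, so a user
    following N people triggers N reads before this merge runs.
    """
    merged = heapq.merge(*timelines, key=lambda p: p.post_date, reverse=True)
    return list(islice(merged, limit))


# Toy data standing in for two DB reads.
alice = [Post(3, 1, datetime(2020, 1, 3)), Post(1, 1, datetime(2020, 1, 1))]
bob = [Post(2, 2, datetime(2020, 1, 2))]
feed = pull_news_feed([alice, bob])
print([p.post_id for p in feed])
```

heapq.merge is lazy, so taking only the first `limit` posts avoids merging the full timelines.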

Push Model

The push model means passively waiting for information. For example, on Instagram under the push model, every time I publish a post, the backend writes n records for it into the DB, one for each of my n followers, in a separate table that we will call the News Feed Table. This table becomes giant, since every user’s posts are recorded this way. Consider a table like this:

post_user_id	post_id	follower_id	post_date

When a user requests the news feed, the backend queries the DB, filtering by the user’s id in the follower_id column and ordering by the post_date column. The post_id points to the post content, which is probably stored in some NoSQL DB. The procedure of adding rows to the News Feed Table is called fanout. The read cost for the push model is 1 DB read. The heavy writes happen when people post content: each post needs N writes, where N is the number of followers the user has.
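The fanout write and the single-query read can be sketched with an in-memory SQLite database. This is only an illustration under assumptions: the table name news_feed, the index, and the helper functions are hypothetical, and dates are stored as ISO-8601 strings so that text ordering matches time ordering.

```python
import sqlite3


def create_news_feed_table(conn):
    conn.execute(
        "CREATE TABLE news_feed ("
        "post_user_id INTEGER, post_id INTEGER, "
        "follower_id INTEGER, post_date TEXT)"
    )
    # Index chosen so a feed read is one lookup by follower, ordered by date.
    conn.execute("CREATE INDEX idx_feed ON news_feed (follower_id, post_date)")


def fanout(conn, post_user_id, post_id, post_date, follower_ids):
    """Write one row per follower: N writes for N followers."""
    rows = [(post_user_id, post_id, fid, post_date) for fid in follower_ids]
    conn.executemany("INSERT INTO news_feed VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def read_news_feed(conn, follower_id, limit=20):
    """Read a feed with a single query, newest post first."""
    cur = conn.execute(
        "SELECT post_id FROM news_feed WHERE follower_id = ? "
        "ORDER BY post_date DESC LIMIT ?",
        (follower_id, limit),
    )
    return [row[0] for row in cur.fetchall()]


conn = sqlite3.connect(":memory:")
create_news_feed_table(conn)
writes = fanout(conn, 1, 100, "2020-01-01T10:00:00", [2, 3])  # user 1 posts
fanout(conn, 4, 101, "2020-01-02T09:00:00", [2])              # user 4 posts
print(read_news_feed(conn, 2))
```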

Comparison and Drawbacks

Having discussed the complexity of these two models in the previous sections, we can now derive their drawbacks.

The pull model relies heavily on DB reads, and the merge happens at the moment a user requests the news feed, so the user may wait a long time after each request. This model also puts read pressure on the database when large numbers of users request their news feeds at the same time.

For the push model, fanout takes a lot of write time. This causes issues when a user has a large number of followers, like a superstar. The writes take so long that followers do not all receive the post at the same time. A ranking algorithm can make sure important followers get the post faster, but for a highly influential person the result is still a bad user experience.

Potential Optimization Methods

For the pull model, we can consider adding a cache in front of DB access.

Cache every user’s timeline. We can cache all posts or only the latest 100 posts, depending on the scenario. The guiding principle is to maximize cache hits.
Cache the news feed. For users without a cached feed, we merge the timelines of the people they follow. For users with a cached feed, we merge the cached content with posts published after a specific timestamp.
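The second caching idea, merging a cached feed with posts newer than a timestamp, might look like the sketch below. Everything here is an assumption for illustration: posts are (timestamp, post_id) tuples, the cache is a plain dict, and fetch_timelines is a hypothetical stand-in for the DB reads (in this toy version every user follows everyone in TIMELINES).

```python
import heapq
from itertools import islice

# Toy per-user timelines, newest first: (timestamp, post_id).
TIMELINES = {"bob": [(5, "b2"), (1, "b1")], "carol": [(3, "c1")]}


def fetch_timelines(user_id, since):
    """Hypothetical DB read: posts newer than `since` for each followed user."""
    return [
        [p for p in posts if since is None or p[0] > since]
        for posts in TIMELINES.values()
    ]


def refresh_feed(feed_cache, user_id, fetch, limit=100):
    entry = feed_cache.get(user_id)
    # Cache miss: merge full timelines. Cache hit: merge only newer posts.
    old_posts, since = (entry["posts"], entry["last_ts"]) if entry else ([], None)
    new_lists = fetch(user_id, since)
    merged = heapq.merge(*new_lists, old_posts, key=lambda p: p[0], reverse=True)
    posts = list(islice(merged, limit))
    # Remember the newest timestamp seen as the cursor for the next refresh.
    last_ts = posts[0][0] if posts else since
    feed_cache[user_id] = {"posts": posts, "last_ts": last_ts}
    return posts


cache = {}
feed_first = refresh_feed(cache, "alice", fetch_timelines)   # cache miss
TIMELINES["carol"].insert(0, (7, "c2"))                      # carol posts again
feed_second = refresh_feed(cache, "alice", fetch_timelines)  # cache hit
print(feed_second)
```

Using the newest timestamp as the cursor is a simplification: two posts with the same timestamp could be missed, so a real system would likely need a stricter cursor.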

For the push model, we need to flag inactive users and skip fanout records for them to save writes. We also need to flag ‘star’ users, whose number of followers is greater than the number of people they follow, and apply the pull model to them instead of the push model.
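These two rules can be combined into a small decision function. The sketch below is hypothetical: the User fields and the plan_delivery name are made up for illustration, and the star check simply follows the rule stated above (more followers than followings).

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: int
    num_followers: int
    num_followings: int
    active: bool = True


def is_star(user):
    # Rule from the post: followers outnumber followings.
    return user.num_followers > user.num_followings


def plan_delivery(author, followers):
    """Return the model to use and the follower ids that get fanout rows."""
    if is_star(author):
        # Stars skip fanout; followers pull their posts at read time.
        return "pull", []
    # Regular users fan out, but only to active followers.
    return "push", [f.user_id for f in followers if f.active]


followers = [User(2, 10, 20), User(3, 5, 50, active=False)]
regular = User(1, num_followers=2, num_followings=30)
star = User(9, num_followers=1_000_000, num_followings=10)
print(plan_delivery(regular, followers))
print(plan_delivery(star, followers))
```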

Takeaways

To know what a good solution for a system is, I have to know the pros and cons of different models and tools, and then make trade-offs according to the business scenario. There seems to be no shortcut to improving this ability other than reading more documentation and learning the latest trends in models and tools. In a word, for an engineer there is no harm in reading and thinking, only accumulated experience.