Recently I have found it interesting to learn how a system or a service is designed. After learning a few topics, whenever I use an app or visit a website I start thinking about what technologies are behind it. In this blog I will cover a typical way that news feeds are generated in a social network application. Keep Instagram, Facebook, YouTube, Snapchat, or even WeChat in mind as we think about the model covered in this blog: the Pull/Push Model.
In my previous blog, I covered git usage. We know the commands git pull and git push: git pull fetches code from the remote end, whereas git push sends code to the remote end. The Pull and Push models follow the same logic.
Pull Model
Pull model means fetching something from others. In the news feed service (called Moments in some applications), we fetch the posts of all the people the user is following and then sort them by datetime, from most recent to oldest. You may know the problem of merging k sorted lists into one sorted list. Since each person’s timeline is already ordered by time, generating one user’s news feed follows the same algorithm. The cost of generating one user’s news feed is N DB reads (disk accesses), where N is the number of people the user is following.
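To make the merge step concrete, here is a minimal Python sketch of the pull model. It assumes each followed user's timeline has already been read from the DB and is sorted newest first. The `Post` type and the `build_feed` function are hypothetical names for illustration, not part of any real service.

```python
import heapq
from datetime import datetime
from itertools import islice
from typing import NamedTuple


class Post(NamedTuple):
    author: str
    created_at: datetime
    text: str


def build_feed(timelines, limit=20):
    """Merge per-author timelines (each sorted newest first) into one feed.

    heapq.merge performs the k-way merge lazily, so only the first
    `limit` posts are ever pulled out of the merged stream.
    """
    merged = heapq.merge(*timelines, key=lambda p: p.created_at, reverse=True)
    return list(islice(merged, limit))


# One timeline per followed user; in a real system each one costs a DB read.
alice = [
    Post("alice", datetime(2024, 1, 3), "a2"),
    Post("alice", datetime(2024, 1, 1), "a1"),
]
bob = [Post("bob", datetime(2024, 1, 2), "b1")]

feed = build_feed([alice, bob])
print([p.text for p in feed])  # ['a2', 'b1', 'a1']
```

The merge itself is cheap. The N reads needed to load the timelines are what dominate the cost.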
Push Model
Push model means passively waiting for information. For example, in Instagram under the push model, every time I post, the backend of the application writes n records of my post into the DB, one for each of my n followers, in a new table that we call the News Feed Table. The table will be giant, since every user’s posts are recorded this way. Consider a table like this:
post_user_id | post_id | follower_id | post_date |
---|---|---|---|
When a user requests the news feed, the backend of the application queries the DB, filters by the user’s id in the follower_id column, and orders by the post_date column. The post_id points to the post contents, which are probably saved in some NoSQL DB. The procedure of adding data into the News Feed Table is called Fanout. The read complexity for the push model is 1 DB read. The expensive writes happen when people post content: each post takes N writes, where N is the number of followers the user has.
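The fanout write and the single-query read can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table name `news_feed`, the index, the ISO-8601 text dates, and the helper names `fanout` and `read_feed` are my own choices, and a real service would use a different store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE news_feed ("
    " post_user_id INTEGER, post_id INTEGER,"
    " follower_id INTEGER, post_date TEXT)"
)
# Assumed index so that reading one follower's feed is a single range scan.
conn.execute("CREATE INDEX idx_feed ON news_feed (follower_id, post_date)")


def fanout(conn, post_user_id, post_id, post_date, follower_ids):
    """Write one row per follower: N writes for N followers."""
    rows = [(post_user_id, post_id, f, post_date) for f in follower_ids]
    conn.executemany("INSERT INTO news_feed VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def read_feed(conn, follower_id, limit=20):
    """One query: filter by follower_id and order by post_date, newest first."""
    cur = conn.execute(
        "SELECT post_id FROM news_feed WHERE follower_id = ?"
        " ORDER BY post_date DESC LIMIT ?",
        (follower_id, limit),
    )
    return [row[0] for row in cur]


# User 1 has followers 2 and 3; user 4 has follower 2.
fanout(conn, 1, 100, "2024-01-01T10:00:00", [2, 3])
fanout(conn, 4, 101, "2024-01-02T09:00:00", [2])
print(read_feed(conn, 2))  # [101, 100]
print(read_feed(conn, 3))  # [100]
```

ISO-8601 strings sort in the same order as the times they represent, which is why ordering by the text column works here.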
Comparison and Drawbacks
Having discussed the complexity of the two models in the previous sections, we can now derive their drawbacks.
The pull model relies heavily on DB reads, and the merge happens at the moment a user requests the news feed, so users may wait a long time after each request. This model also puts read pressure on the database when large numbers of users request their news feeds at the same time.
For the push model, fanout takes a lot of writes. This causes issues when a user has a large number of followers, like a superstar. The writes take a long time, so the followers do not all see the post in their news feed at the same time. A ranking algorithm can be used to make sure important followers get the news feed faster, but it still causes a bad user experience when the poster is a person with high influence.
Potential Optimization Methods
For the pull model, we can consider adding a cache in front of DB access.
- Cache every user’s timeline. We can cache all posts or only the latest 100 posts, depending on the scenario. The guiding principle is to maximize cache hits.
- Cache the news feed. For users who don’t have a cached feed, we merge the timelines of the people they follow. For users who already have one, we merge the cached content with new posts created after a specific timestamp.
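The news feed cache idea can be sketched as follows. This is a rough illustration under assumptions: `feed_cache` is a plain dict standing in for a real cache server, posts are `(timestamp, post_id)` tuples, and `fetch_posts_since` is a hypothetical callback that returns the followed users' posts newer than a given timestamp, newest first.

```python
import heapq
from itertools import islice

# Hypothetical in-memory cache: user_id -> (cached_at, feed sorted newest first)
feed_cache = {}


def refresh_feed(user_id, fetch_posts_since, now, limit=100):
    """Return the user's feed, reading only posts newer than the cached copy."""
    # A cache miss falls back to timestamp 0, which means a full merge.
    cached_at, cached = feed_cache.get(user_id, (0, []))
    fresh = fetch_posts_since(user_id, cached_at)
    # Both inputs are sorted newest first, so a two-way merge keeps the order.
    merged = list(islice(heapq.merge(fresh, cached, reverse=True), limit))
    feed_cache[user_id] = (now, merged)
    return merged
```

On a cache hit, only the posts created after `cached_at` have to come from the DB. That is where the read savings come from.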
For the push model, we need to label inactive users and skip the fanout records for them to save writes. We also need to label a user as a ‘Star’ when their number of followers is greater than their number of followings, and then apply the pull model to them instead of the push model.
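The two labels can be combined into a single decision about who receives fanout rows. The sketch below is only an illustration: the `User` fields and the function names are hypothetical, and the ‘Star’ test is the simple followers-versus-followings rule described above.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: int
    follower_count: int
    following_count: int
    active: bool = True


def is_star(user):
    """'Star' rule from the text: more followers than followings."""
    return user.follower_count > user.following_count


def fanout_targets(author, followers):
    """Return the follower ids that should get a fanout row for a new post."""
    if is_star(author):
        # No push for stars: followers pull the star's posts at read time.
        return []
    # Skip inactive followers to save writes.
    return [f.user_id for f in followers if f.active]


star = User(1, follower_count=1_000_000, following_count=10)
regular = User(2, follower_count=3, following_count=50)
followers = [User(3, 0, 5), User(4, 0, 5, active=False)]

print(fanout_targets(star, followers))     # []
print(fanout_targets(regular, followers))  # [3]
```

At read time, a follower's feed would then be the pushed rows merged with the recent posts pulled from the stars they follow.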
Takeaways
To know what a good solution for a system is, I have to know the pros and cons of different models and tools, and then make trade-offs according to the business scenario. There seems to be no shortcut to improving this ability other than reading more documentation and keeping up with the latest trends in models and tools. In a word, it does an engineer no harm to keep reading and thinking; it only adds up to accumulated experience.