In this blog, I would like to summarize the strategies I learned recently for keeping the cache and the database consistent. Most production applications have a cache layer that stores data users have requested from the database, so that future requests can be served efficiently. It is therefore important to keep the cache and the database consistent so the application behaves correctly, as we expect.
Let’s start with the data read procedure mentioned above. When we try to read some data, we first check the cache. If the cache hits, we return the value directly from the cache. If the cache misses, we read the data from the database, write the value into the cache, and then return it. Normally, values are written into the cache as key-value pairs: the key can be a serialized hash string of the request, and the value is the database result (a small sketch of this read path follows the list below). Now we can look at the following strategies for keeping writes to the cache and the database consistent, one by one:
a. Update the cache and then update the database
b. Update the database and then update the cache
c. Delete the cache and then update the database
d. Update the database and then delete the cache
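Before comparing the strategies, here is a minimal sketch of the read path described above. This is a toy Python example: the dicts standing in for the cache and the database, and the shape of the request, are illustrative assumptions rather than anything from a specific system.

```python
import hashlib
import json

cache = {}                                  # stand-in for a real cache such as Redis
database = {"user:42": {"name": "Alice"}}   # stand-in for the primary database

def cache_key(request):
    # Serialize the request into a stable hash string, as described above.
    return hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()

def read(request):
    key = cache_key(request)
    value = cache.get(key)
    if value is not None:                   # cache hit: return directly from the cache
        return value
    value = database.get(request["id"])     # cache miss: read from the database
    if value is not None:
        cache[key] = value                  # write the value into the cache for future reads
    return value

print(read({"id": "user:42"}))
```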
We normally don’t use strategy a, because if the database update fails after the cache has already been updated, we obviously end up with a dirty cache. So we can rule out strategy a. Strategy b is not recommended either, because it has a similar flaw. For example, suppose process A updates the database but does not update the cache in time (due to a network issue or something else). Process B then updates the database again with a newer value and updates the cache with that newer value as well. After that, the delayed process A updates the cache with its older value, which means process B’s cache update is lost. So this strategy results in a dirty cache as well.
Having seen why strategies a and b are infeasible, let’s look at the remaining two. Strategy c succeeds most of the time, but it can still break consistency. For example, under a concurrent scenario: process A deletes the cache, and before process A updates the database, process B performs a read. Process B finds the cache missing, reads the data from the database, and writes the old value back into the cache. Then process A finishes updating the database. Now we clearly have the new value in the database but the old value in the cache. To resolve this conflict, we can flush the cache twice: after the first deletion, we wait a short while and then delete the cache a second time. It is worth noting that the waiting time has to be longer than a read operation.
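Below is a minimal sketch of strategy c with the delayed second deletion, again using toy in-memory dicts and a background timer. The concrete delay value is an arbitrary illustration; in practice it must exceed the duration of a read.

```python
import threading

cache = {}
database = {}

def write_delete_then_update(key, value, delay_seconds=0.5):
    """Strategy c: delete the cache, update the database, then delete the cache again after a delay."""
    cache.pop(key, None)                 # first deletion, before touching the database
    database[key] = value                # update the primary store
    # Delayed second deletion: evicts any stale value a concurrent reader may have
    # written back between the first deletion and the database update. The delay
    # must be longer than a typical read so the stale write-back has already happened.
    threading.Timer(delay_seconds, lambda: cache.pop(key, None)).start()
```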
Strategy d can still have inconsistency cases. For example, process A is reading a value from the database. Before process A writes that value into the cache, process B performs a write that updates the database and successfully deletes the cache. After that, process A writes the old value into the cache, which results in an inconsistency between the cache and the database. However, this case has a very low probability, because a write operation normally takes longer than a read operation. To resolve this conflict, we can apply the same flush-the-cache-twice solution. Note that the cache flush itself may fail; if that happens, we can put the key into a queue and keep retrying the deletion until it succeeds.
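Here is a minimal sketch of strategy d with the retry queue. The in-memory stores, the `delete_from_cache` helper, and the retry worker are hypothetical illustrations of the idea, not a production implementation.

```python
import queue
import threading
import time

cache = {}
database = {}
retry_queue = queue.Queue()    # keys whose cache deletion failed and must be retried

def delete_from_cache(key):
    # Stand-in for a cache deletion that could fail in reality (e.g. network error);
    # returns True on success.
    cache.pop(key, None)
    return True

def write_update_then_delete(key, value):
    """Strategy d: update the database first, then delete the cache entry."""
    database[key] = value
    if not delete_from_cache(key):
        retry_queue.put(key)   # deletion failed: enqueue the key to retry later

def retry_worker():
    # Keeps retrying failed deletions until each one succeeds.
    while True:
        key = retry_queue.get()
        while not delete_from_cache(key):
            time.sleep(0.1)    # back off briefly before retrying
        retry_queue.task_done()

threading.Thread(target=retry_worker, daemon=True).start()
```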
With the above comparison of strategies in mind, we can handle cache and database consistency with more confidence in future application design and implementation.