Lessons Learned in Scaling Django Admin

In this post, I will summarize the lessons I recently learned while scaling the Django admin site.

The motivation for scaling the Django admin site is that admin views load slowly once the backing table holds millions of records. For very large tables, a request may even hit the nginx proxy timeout and fail with a 502 Bad Gateway error. Raising the timeout indefinitely is not good practice in a real deployment: failures take longer to surface, and users are left staring at a page that keeps loading. The better solution is to make the Django admin site itself scale.

Silk

The first tool to introduce is Silk, a profiler for Django applications. Once it is correctly configured as a middleware in Django, we can open the Silk interface at localhost:8000/silk (we normally profile locally during development). This page shows a profile of every request made to localhost:8000. It includes key information such as the total time of a request, the SQL statements the request issued, and the execution time and number of joins of each statement. With this tool we can easily spot the bottleneck of each request and find the inefficient parts of the admin site.
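As a rough sketch of the configuration step, the settings fragment below enables django-silk. The surrounding app and middleware entries are the Django defaults and are only an assumption about what the project already contains; the two Silk entries are the relevant lines.

```python
# settings.py (fragment): enable django-silk for local profiling.
# The non-Silk entries are Django's defaults and stand in for whatever
# the real project already lists.
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "silk",  # Silk's models, views and templates
]

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
    "silk.middleware.SilkyMiddleware",  # records each request and its SQL
]
```

The project's urls.py also needs a route for the interface, for example `path("silk/", include("silk.urls", namespace="silk"))`, and `python manage.py migrate` has to be run once so that Silk can create its tables.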

Paginator

All the data you see on a website comes from the storage system. With that in mind, it is easy to see that the paginator of each admin view needs the total number of records, which means a count query over the whole table. For a table with millions of records, this count query takes a while. So for admin views over large tables, we can try to avoid the count step. In Django, the count step lives in the paginator object. We can override the paginator attribute of the admin object with the following block of code.

from django.contrib import admin
from django.core.paginator import Paginator

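class BigTablePaginator(Paginator):
    """Paginator that skips the expensive COUNT(*) on very large tables."""

    @property
    def count(self):
        # Return a fixed upper bound instead of querying the database.
        return 9999999


class BigTableAdmin(admin.ModelAdmin):
    paginator = BigTablePaginator
    # Also skip the unfiltered total that the admin shows on filtered pages.
    show_full_result_count = False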

The admin of every large table can inherit from this BigTableAdmin. In this way, we get rid of the time spent on the count query.
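To make the cost concrete, here is a minimal sketch of the two queries behind one paginated page, written against the standard-library sqlite3 module rather than the Django ORM. The `event` table, the page size and the `fetch_page` helper are hypothetical and exist only for illustration.

```python
import math
import sqlite3

# Hypothetical table with 250 rows, standing in for a much larger one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO event (payload) VALUES (?)",
    [(f"row {i}",) for i in range(1, 251)],
)

PER_PAGE = 100


def fetch_page(page_number):
    """Fetch one page of rows; this query does not need the total count."""
    offset = (page_number - 1) * PER_PAGE
    return conn.execute(
        "SELECT id, payload FROM event ORDER BY id LIMIT ? OFFSET ?",
        (PER_PAGE, offset),
    ).fetchall()


# Query 1: the count, which scans the whole table and is the slow part.
(total,) = conn.execute("SELECT COUNT(*) FROM event").fetchone()
num_pages = math.ceil(total / PER_PAGE)

# Query 2: the rows of the requested page.
last_page = fetch_page(num_pages)
```

In this sketch, `fetch_page` works without the total, which is why replacing the count with a fixed number still lets pages load. The trade-off is that page numbers past the real data come back empty: `fetch_page(4)` returns an empty list here.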

N + 1 Query

The N + 1 query problem is easy to overlook. When a table has a foreign key to another table and we don't select the related object in advance (from the ORM's perspective), displaying a list of N records issues one query for the list plus one extra query per record to fetch its related object. In other words, the related rows are looked up one by one instead of in a single join. This is called the N + 1 query problem. To avoid it, we can select the related object in advance by overriding the get_queryset method of the admin object as follows:

    def get_queryset(self, request):
        queryset = super().get_queryset(request)
        return queryset.select_related("foreign_object")

This replaces the N per-record lookups with a single join query, which noticeably speeds up page loading.
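The difference in query counts can be sketched without Django, using the standard-library sqlite3 module. The `author` and `book` tables and the `run` helper are hypothetical; the first loop mimics what the ORM does when related objects are loaded lazily, and the second query mimics what select_related produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES author(id)
    );
""")
conn.executemany(
    "INSERT INTO author VALUES (?, ?)",
    [(i, f"author {i}") for i in range(1, 4)],
)
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [(i, f"book {i}", i % 3 + 1) for i in range(1, 11)],
)

query_log = []


def run(sql, params=()):
    """Execute a query and record it, so we can count round trips."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()


# Lazy loading: 1 query for the list, then 1 query per row for its author.
lazy_rows = []
for title, author_id in run("SELECT title, author_id FROM book ORDER BY id"):
    [(name,)] = run("SELECT name FROM author WHERE id = ?", (author_id,))
    lazy_rows.append((title, name))
lazy_query_count = len(query_log)

# Eager loading: a single query with a join returns the same rows.
query_log.clear()
joined_rows = run(
    "SELECT book.title, author.name FROM book "
    "JOIN author ON author.id = book.author_id ORDER BY book.id"
)
joined_query_count = len(query_log)

print(lazy_query_count, joined_query_count)  # prints "11 1"
```

With 10 books, the lazy version runs 11 queries and the joined version runs 1, and both return the same rows. In the admin, the ModelAdmin attribute list_select_related is another way to request the join for the list view.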

Use raw_id_fields or display id instead of objects

Even after we solve the N + 1 query problem, the single combined query may still join too many tables. To avoid the join altogether, we can display the foreign key id directly in the admin view, or put the field in the raw_id_fields attribute of the admin, which renders a plain id input instead of a dropdown populated with every related record. Either way, we avoid the cost of loading the related objects.

In summary, these are the measures we can take to scale the Django admin site. They should be enough for most cases. If the admin site still has problems because it cannot keep up with the data size, we might need to consider partitioning the existing tables or other distributed approaches.
