Django admin with large tables

05.07.2011 22:33

It is a pretty well-known fact that an unconstrained SELECT COUNT(*) in MySQL Innodb tables is slow. It is also known that Django admin page likes to do counts on tables in its change list view so that it can draw a pretty pagination widget at the bottom. These two facts make for a miserable admin experience when you have more than a handful of rows in your database. One visitor to the admin site will at best have to wait for a while before the page loads, several will most likely bring the database server to its knees.

Most work-arounds for this problem I've seen, like for instance the patch attached to ticket #8408, focus on removing the need for the row count from the pagination logic and require patching Django.

Yesterday I came up with this solution that only requires a somewhat dirty monkey patch to the QuerySet object and seems to run fine on stock Django (version 1.2.5 in this case):

class ApproxCountQuerySet(QuerySet): 
	"""Counting all rows is very expensive on large Innodb tables. This 
	is a replacement for QuerySet that returns an approximation if count()
	is called with no additional constraints. In all other cases it should
	behave exactly as QuerySet. 
	""" 
 
	def count(self): 
		# Code from django/db/models/query.py 
		 
		if self._result_cache is not None and not self._iter: 
			return len(self._result_cache) 
 
		query = self.query 
		if (not query.where) and \ 
				query.high_mark is None and \ 
				query.low_mark == 0 and \ 
				(not query.select) and \ 
				(not query.group_by) and \ 
				(not query.having) and \ 
				(not query.distinct): 
			# If query has no constraints, we would be simply doing
			# "SELECT COUNT(*) FROM foo". Monkey patch so the we 
			# get an approximation instead. 
			cursor = connections[self.db].cursor() 
			cursor.execute("SHOW TABLE STATUS LIKE %s", 
					(self.model._meta.db_table,)) 
			return cursor.fetchall()[0][4] 
		else: 
			return self.query.get_count(using=self.db) 

As you can see this substitutes a SELECT COUNT(*) with an approximate count from the row count column of SHOW TABLE STATUS output. While this query returns instantaneously on Innodb tables it only gives an approximate count (actually, executing the query twice on an otherwise idle server gives you two similar, but different counts).

Of course, even ignoring for the moment that this is highly MySQL specific, blindly using this can be dangerous if your code expects the count to actually be accurate. However Django admin page seems to be pretty happy with it. You can plug it into your admin.py like this:

class FooAdmin(admin.ModelAdmin): 
	def queryset(self, request): 
		qs = super(FooAdmin, self).queryset(request) 
		return qs._clone(klass=ApproxCountQuerySet) 

Of course, curious people clicking on page number 100000 can cause just as much trouble for the server as those count queries we just removed. Removing those links from the page templates is somewhat easier and left as an exercise for the reader.

Posted by Tomaž | Categories: Code

Comments

This isn't a "monkey patch" using the commonly accepted definition of the term.

Also, would be better style to return super(ApproxCountQuerySet, self).count() than to perform the self.query.get_count call yourself.

Posted by lamby

After reading the Wikipedia article on the topic I agree with you and I'll use the term more carefully in the future. However while this code doesn't do actual run-time code modification it does abuse the duck typing system and suffers from all the same pitfalls as a monkey patch.

You're already depending on intricate internal details of the count() method, so is calling the overridden method in one branch really an improvement? In that case I would rather go the whole way and just merge "if self._result_cache" and the other if-statement together and just do either an approximate count or call the parent count() method. But I think that would just give an illusion of future-proofing.

Posted by Tomaž

Tomaž, I added some simple checking for MySQL here:

https://gist.github.com/1068255

Not sure if that helps you or not, but for my needs it allows me to keep developing on SQLite without running into issues on my dev server.

Thanks Umbrae. It's always nice to check such assumptions in the code. I'll be adding your check to my code just in case somebody in the future attempts a move away from MySQL.

Posted by Tomaž

Add a new comment


(No HTML tags allowed. Separate paragraphs with a blank line.)