• ru
  • Language: en
  • Documentation version: 2.0

16. How to efficiently select a random object from a model?

Your category models is like this.

class Category(models.Model):
    name = models.CharField(max_length=100)

    class Meta:
        verbose_name_plural = "Categories"

    def __str__(self):
        return self.name

You want to get a random Category. We will look at few alternate ways to do this.

The most straightforward way, you can order_by random and fetch the first record. It would look something like this.

def get_random():
    return Category.objects.order_by("?").first()

Note: order_by('?') queries may be expensive and slow, depending on the database backend you’re using. To test other methods, we need to insert one million records in Category table. Go to your db like with python manage.py dbshell and run this.

INSERT INTO entities_category
            (name)
(SELECT Md5(Random() :: text) AS descr
 FROM   generate_series(1, 1000000));

You don’t need to understand the full details of the sql above, it creates one million numbers and md5-s them to generate the name, then inserts it in the DB.

Now, instead of sorting the whole table, you can get the max id, generate a random number in range [1, max_id], and filter that. You are assuming that there have been no deletions.

In [1]: from django.db.models import Max

In [2]: from entities.models import Category

In [3]: import random

In [4]: def get_random2():
   ...:     max_id = Category.objects.all().aggregate(max_id=Max("id"))['max_id']
   ...:     pk = random.randint(1, max_id)
   ...:     return Category.objects.get(pk=pk)
   ...:

In [5]: get_random2()
Out[5]: <Category: e2c3a10d3e9c46788833c4ece2a418e2>

In [6]: get_random2()
Out[6]: <Category: f164ad0c5bc8300b469d1c428a514cc1>

If your models has deletions, you can slightly modify the functions, to loop until you get a valid Category.

In [8]: def get_random3():
   ...:     max_id = Category.objects.all().aggregate(max_id=Max("id"))['max_id']
   ...:     while True:
   ...:         pk = random.randint(1, max_id)
   ...:         category = Category.objects.filter(pk=pk).first()
   ...:         if category:
   ...:             return category
   ...:

In [9]: get_random3()
Out[9]: <Category: 334aa9926bd65dc0f9dd4fc86ce42e75>

In [10]: get_random3()
Out[10]: <Category: 4092762909c2c034e90c3d2eb5a73447>

Unless your model has a lot of deletions, the while True: loop return quickly. Lets use timeit to see the differences.

In [14]: timeit.timeit(get_random3, number=100)
Out[14]: 0.20055226399563253

In [15]: timeit.timeit(get_random, number=100)
Out[15]: 56.92513192095794

get_random3 is about 283 time faster than get_random. get_random is the most generic way, but the technique in get_random3 will work unless you change changed the default way Django generates the id - autoincrementing integers, or there have been too many deletions.