Database-efficient API pagination

When designing APIs, you will probably need to handle a way to paginate the results in a collection.

Depending on the database you are using, the first thing that could come to your mind could be to use your database limit and offset to paginate the results. This may be tempting, but sometimes it could be better to rely on something else.

Imagine you have a list of messages in a chat application, and for the first call, you show the first 15 messages. Now you want to get the next page of results, say other 15 messages past the ones you already have, so you do LIMIT 15 OFFSET 15 in your database. Cool right?

This is a fast solution and uses the native way of your database to navigate through results, but it does not scale very well. Your database will have to scan every row and ditch the first N values according to your offset.

We can try to query a table with ~40M records. If we run the query on MySQL using DESCRIBE when can see how many records the database has read before returning the result:

EXPLAIN SELECT * FROM message LIMIT 15 OFFSET 1500000;

The output:

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEmessageNULLALLNULLNULLNULLNULL38898985100NULL

We can see that the database estimates that it has to filter more than 38M rows to perform our query.

We can easily improve this approach by relying on keyset pagination or seek pagination.

The two approaches are very similar, as they both rely on querying using an indexed condition.

Using the keyset pagination, we tell the API how we want to filter the results. Consider a scenario where we want all the messages created after a certain date. Remember that this field has to be indexed, otherwise, we won’t achieve any performance gain. We can query by date specifying the field, the condition, and the value. We could do something like this:

The resulting query will be:

SELECT * FROM message WHERE created_at < 1608549815 LIMIT 15;

This is query is very efficient, as we can see from the output of EXPLAIN:

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEmessageNULLrangek_message_created_atk_message_created_at4NULL19449492100Using index condition

The database estimates that it needs to filter a little more than ~19M rows. That’s basically half the result from the previous query!

This is achieved by using the index created on created_at column.

We can now move on to achieving the same result using the PRIMARY KEY of the table, using seek pagination.

You start the same way, you ask for the first fifteen messages. Then, you can perform the next pagination call by using the ID of the last message you currently have stored.

This will result in a query like this:

SELECT * FROM message WHERE id > 4690057 LIMIT 15;

You could extend the query by defining an interval.

This will make use of the primary key, which is (of course) indexed.


Thank you for your time! Do you have any questions? Ask me on Twitter 🦕