Active Record Query Methods and Peformance
Active Record is great, it makes it easy for us to interact with the database without writing much SQL. But sometimes too easy. For example, an innocent statement like this, User.all.each
, can grind our application to a halt, if we had 10M records in our users
table. Because it’ll try to load all of them into memory. But then again, we likely don’t have 10M users. Context matter when comparing performance of different options.
Let’s see this with a few examples - comparing sort_by
and order
, comparing map
and pluck
, and comparing length
/count
/size
and exists?
/present?
/any?
.
Sorting: order
and sort_by
Let’s look two options for sorting: sort_by
and order
. The sort_by
method is from Ruby’s Enumerable
module, which is a mix-in available on things like Arrays and Hashes. It is independent of Rails.
When we use sort_by
, the sorting is done in memory. In the example below, all of the associated books
records being sorted will need to be loaded into memory.
sorted_books = user.books.sort_by { |book| book.title }
The order
method is from Active Record. When we use order
with an Active Record model, it runs a SQL query with an ‘ORDER BY’ clause.
sorted_books = user.books.order(title: :asc)
and the resulting SQL:
SELECT "books".* FROM "books" WHERE "books"."user_id" = 1 ORDER BY "books"."title" ASC
You might conclude that sort_by
seems expensive and we should always use order
. Why would we load all the records in memory to do the sorting when we can make the database do the work? In general, yes making the database do the work is preferred. However, if we have a dataset that is already loaded in memory (for other reasons), then sort_by
is actually faster. Context matters when it comes to performance.
Also, I recently learned that there is a .loaded?
method on Active Record associations, to check if the association is already loaded, i.e. user.books.loaded?
.
There is a parallel to this for filtering and select
and where
methods. The select
method comes from Ruby and does the filtering in memory. While the where
method comes from Active Record and runs a SQL query against the database. So same performance considerations.
Selecting Fields: pluck
and map
Let’s say we need all the of titles from the Book
model. We can get them using map
:
books = current_user.books
books.map(&:title)
This will load all of the records in books
into memory and then iterate over that collection to get only the title
. But we will have loaded all fields of the Book
model into memory.
We can use the pluck
method instead to get the title
without loading all the fields of an Active Record model into memory just to discard all but one field.
books.pluck(:title)
This is resulting SQL from the pluck
method. Note it doesn’t do select *
:
SELECT "books"."title" FROM "books"
So map
requires loading all the fields for the Active Record models into memory and pluck
makes a DB query. Which one is better? Actually, pluck
does not always make a DB query. If the records are already loaded into memory, it’ll use those. And if not, it will query for only the desired field.
Counting Records
Next, let’s compare the methods length
, count
, and size
.
The method length
comes from Ruby and is available on enumerables. The count
method is from Active Record, and size
method is available on Active Record relations.
>> books_array = Book.all.to_a
>> books_array.length
=> 11
>> books_array.loaded?
=> true
>> Book.count
Book Count (13.6ms) SELECT COUNT(*) FROM "books"
=> 11
>> books = current_user.books
>> books.size
=> 5
The length
method will load all records into memory. This is different from count
and size
, which generate SQL queries to count the records directly in the database without loading all records into memory. And size
is the best-of-both-worlds, in the sense that if the records are already loaded into memory it will use those and not make a SQL query.
And as a bonus, let’s add the methods present?
/exists?
/any?
for checking whether records are present, as they parallel the counting methods length
/count
/size
, respectively.
>> books.loaded?
=> true
>> books.present?
=> true
>> books.exists?
Book Exists? (7.6ms) SELECT 1 AS one FROM "books" LIMIT $1 [["LIMIT", 1]]
=> true
>> books.any?
=> true
Hope this illuminates some things about Active Record methods and performance. Because when you look at calls like books.count
and books.length
, it isn’t immediately obvious which one makes a DB query or not? Or which one is a Ruby method and which one comes from Rails and Active Record.
Also when it comes to performance context matters. When deciding between multiple methods that accomplish the same thing, we have to ask: Will it run a database query? Will it load results into memory? Will it use results already loaded in memory? And what is the scale of your data?
Subscribe for more Ruby, Rails, and Hotwire
Short posts, clear explanations