TL;DR; Leverage model caching on commonly used queries, associations and scopes to give your database a break. There are plenty of cheap options in Rails to efficiently use and expire your cache. The more you do it, the smarter the load becomes on your DB.
In most applications, scaling the runtime (= your application servers) is far easier than scaling your database. Using a serverless approach like GCP Cloud Run you can horizontally scale your app to thousands of instances.
The same cannot be said with databases, especially relational ones. Most of the time they represent an expensive bottleneck and are harder to scale. Which is why it's good practice to take the habit to alleviate the load on it when cheap alternatives exist.
Let's take a concrete example: background jobs. Should you use ActiveJob, Sidekiq, Resque or Cloudtasker for GCP, it's very common to have jobs defined like this:
It's alright, it's just a find method. But let's imagine thousands of these jobs running constantly. Your DB will certainly cope with it, but the question is: do you really want your DB to spend expensive CPU milliseconds on this kind of basic queries?
The primary goal of a relational database is to be right, not to be fast. If you're looking to be fast, you should look at other options. Redis caching is one of the most popular one.
Let's use that pesky find method to see what cheap alternative we have...then dig into other caching opportunities.
Model.find: a quick win
Assuming you have spotted a few models which are read-intensive, the following module will provide a find_cached method which leverages Redis first. The module also expires the cache whenever your model gets updated or destroyed.
You can use this module in your ActiveRecord models like this:
Then update your find calls with:
That's all you need. You've just saved your database thousands of useless calls potentially.
"Wait! I usually need to access parent associations through this model, so I would still be making database calls!" Not if you eager load associations in the cached version of your record.
The module above allows you to customize the cached version of your record via find_for_cached. Example:
There is a caveat though: The cached company association will not be expired upon company update. It is alright if you only need to access persistent attributes on the company association but if you need to access regularly updated attributes, then you need to manually expire the project cache upon company update.
Cache expiration of associated models can be achieved through an after_commit callback, such as:
Your project find_cached version will now be properly be expired on parent model updates.
Now let's keep in mind it's a tradeoff. The more you link records together for cache expiration and the more these related records are updated, the less you'll benefit from your cache.
If all you need on your Project cached versions is to access persistent company references that will never change (e.g. an external customer ID), then you might actually be better off not expiring the Project cached keys upon company update. But if you go down that path, ensure other developers are made aware of this caveat because relying on stale record attributes will lead to bugs difficult to troubleshoot.
Caching is an opportunistic habit, not a silver bullet
The previous section is simplistic and looks at the most basic form of caching: the find method. This is not going to save your application from DB overload. But it opens the path to more complicated caching approaches.
As an example, let's look at the Company <-> Project relationship. If some_company.projects is a call you frequently make and assuming the number of projects returned is expected to be reasonable, you can provide a cached version of this association in the following manner.
The same approach can be used for scopes, large queries involving joins etc..
In the end, the hardest part is thinking about which resources are involved in your cache and placing the right expiration calls on your associated models.
Now as the title says, it's an opportunistic habit. There is no point in caching every single database call in Redis as it will clutter your application code more than anything.
Your first habit should be to look at your database monitoring system. NewRelic, DataDog, GCP Query Insights...will give you hints on which queries are expensive and frequently run.
Target these first. Once you've addressed the most expensive queries you can evaluate where to further optimize database calls.
Keypup is on a mission to help developers and tech leads work better with each others on development projects. Our platform automatically centralizes, prioritizes and assigns people and actions on issues and pull requests to optimize your development flow.
Don't get lost because you have to juggle with twenty pull requests across five development projects. We'll clean and organize that for you to ensure a smooth landing.
Code snippets hosted with ❤ by GitHub