Fixing OOM (Out-of-Memory) when Generating Sitemap
ruby on rails
TL;DR
When generating sitemaps, our application ran into an Out-of-Memory (OOM) issue because it loaded a very large number of records at once. We resolved it by loading the records in batches instead.
What is a Sitemap?
When users search for terms like “XXX interview questions”, I’m proud that NodeFlair often ranks in the top three results.
This is despite competing against companies that:
- Have been around longer
- Have higher Domain Authority (making it easier for them to rank, all else being equal)
- Have dedicated SEO teams
One of the many things we did to make this possible was creating a sitemap and submitting it to Google. Think of sitemaps as directories that help search engines like Google crawl your site more efficiently.
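For a concrete picture, a sitemap is essentially an XML file listing the URLs you want crawled. Below is a minimal sketch in plain Ruby, assuming the standard sitemaps.org format. The helpers `xml_escape` and `build_sitemap` and the example.com URLs are made up for illustration and are not our production code.

```ruby
# Characters that must be escaped inside XML text nodes.
XML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
  '"' => "&quot;", "'" => "&apos;"
}.freeze

# Hypothetical helper: escapes a string for safe use inside <loc>.
def xml_escape(text)
  text.gsub(/[&<>"']/, XML_ESCAPES)
end

# Hypothetical helper: builds a minimal sitemap from absolute URLs.
def build_sitemap(urls)
  entries = urls.map { |url| "  <url><loc>#{xml_escape(url)}</loc></url>" }

  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    *entries,
    "</urlset>"
  ].join("\n")
end

# Example URLs are placeholders, not real pages.
SAMPLE_SITEMAP = build_sitemap([
  "https://example.com/interviews/1",
  "https://example.com/interviews/2?ref=a&b=c"
])
```

In a real Rails app you would more likely let a sitemap gem produce this file. The sketch only shows what ends up being submitted to the search engine.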
The Issue
To generate the sitemap, we need to list all the URLs to be included.
Here’s a snippet of the code we used to generate the sitemap (modified for simplicity and confidentiality). When it runs, the application loads all the records into memory at once.
# BEFORE
# .all.each instantiates every record before iterating
CompanyInterviewQuestion
  .all.each do |interview_question|
    add interview_path(interview_question)
  end
When we first started, generating sitemaps was quick and easy. After all, there weren’t many pages.
However, over the years, with more data and pages on our site, the number of pages has grown to over 2 million. This means we are loading a large number of records into memory all at once, which causes the OOM issue.
Solution - Load the Data in Batches!
Of course, we could have simply used a machine with more RAM, but that would be like buying a bigger house because yours is full of stuff, instead of decluttering.
Luckily for us Ruby on Rails folks, we can use find_each to load records in batches instead of all at once. Only about one batch of records needs to be in memory at any time, which matters most when dealing with large datasets like ours.
# AFTER
CompanyInterviewQuestion
  .find_each(batch_size: 100) do |interview_question|
    add interview_path(interview_question)
  end
Additionally, batching queries can distribute the load on the database more evenly over time, rather than placing a heavy load all at once.
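To show roughly what batching does under the hood, here is a simplified sketch in plain Ruby. It assumes the usual approach of walking the table in primary-key order and asking for the next batch after the last id seen. `FakeTable`, `fetch_after`, and `each_in_batches` are hypothetical stand-ins, not ActiveRecord APIs, and the real implementation handles more cases.

```ruby
# In-memory stand-in for a database table (hypothetical, not ActiveRecord).
Record = Struct.new(:id)

class FakeTable
  attr_reader :queries

  def initialize(ids)
    @rows = ids.sort.map { |id| Record.new(id) }
    @queries = 0
  end

  # Mimics: SELECT * FROM table WHERE id > after_id ORDER BY id LIMIT limit
  def fetch_after(after_id, limit)
    @queries += 1
    @rows.select { |row| row.id > after_id }.first(limit)
  end
end

# Yields every record while holding at most batch_size records per fetch.
def each_in_batches(table, batch_size:)
  last_id = 0
  loop do
    batch = table.fetch_after(last_id, batch_size)
    break if batch.empty?

    batch.each { |record| yield record }

    # A short batch means we have reached the end of the table.
    break if batch.size < batch_size

    last_id = batch.last.id
  end
end

table = FakeTable.new(1..250)
seen = []
each_in_batches(table, batch_size: 100) { |record| seen << record.id }
```

With 250 rows and a batch size of 100, the loop issues three small queries instead of one large one, and each batch can be discarded before the next is fetched.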
With this change, we significantly reduce the RAM needed for generating the sitemap - another day of doing more with less!
A little about what I do at NodeFlair…
The world today runs on code written by developers that solve the world’s problems and impact lives.
Now, imagine a world where developers get to code at a place where they find purpose in their work. This meaning could translate into drive that pushes boundaries to solve more of the world’s problems.
That’s why at NodeFlair, we make it our mission to improve the world by empowering developers to code() at where they love.