A Rate-limited Sidekiq Job - Part 1
Problem: A background task needs to hit an external service, but not too frequently
Solution: Use the ratelimit gem
What do we do when the limit has been met?
Recently, I’ve been working on the above. I thought I'd write about the initial solution I tried, which is fine, but doesn't quite scale if you are frequently hitting the limit. The next post will address an alternate take.
TL;DR: raise an exception if the rate limit has been met, and let Sidekiq queue the job for later.
Sidekiq has a great feature, in that failed jobs will be re-queued.
Some details on the rate limits involved:
We want to hit the external service (Strava API) a maximum of 600 times in a fifteen minute period (900 seconds).
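To get a feel for what that limit means in practice, here is a small back-of-the-envelope calculation. The constant names are made up for this illustration and are not part of the job we build below.

```ruby
# Figures from the limit described above.
MAX_REQUESTS   = 600       # requests allowed per window
WINDOW_SECONDS = 15 * 60   # fifteen minutes, i.e. 900 seconds

# Average spacing between requests if they were spread evenly over the window.
SECONDS_PER_REQUEST = WINDOW_SECONDS.to_f / MAX_REQUESTS

# Upper bound on requests per day if every window were used fully.
WINDOWS_PER_DAY      = (24 * 60 * 60) / WINDOW_SECONDS
MAX_REQUESTS_PER_DAY = WINDOWS_PER_DAY * MAX_REQUESTS

puts "one request every #{SECONDS_PER_REQUEST} seconds on average"
puts "#{WINDOWS_PER_DAY} windows per day, at most #{MAX_REQUESTS_PER_DAY} requests per day"
```

So the limit works out to one request every 1.5 seconds on average, or at most 57,600 requests across the 96 windows in a day.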
First, we build the basic Sidekiq job, retrying every fifteen minutes if it fails (to let the rate limit replenish):
    class StravaSyncUserJob
      include Sidekiq::Worker

      sidekiq_retry_in do |count|
        15.minutes
      end

      def perform(user_id)
      end
    end
Now if our StravaSyncUserJob#perform implementation raises, the task will be shelved for a later attempt. Let's configure the rate limiting with the Ratelimit gem.
In our Gemfile:
    gem 'ratelimit'
Then install the gem with:
    bundle install
Now we'll set up some values:
    RL_SUBJECT = "users" # Just a way to separate different ratelimit counts, can be any string in our case
    # 600 hits per 15 minutes
    RL_THRESHOLD = 600
    RL_INTERVAL = 15.minutes
And implement our Sidekiq perform method:
    def perform(user_id)
      ratelimited do
        fetch_strava_data(user_id)
      end
    end
Creating the ratelimit object is easy: we give it a unique key and, if we already have one for other purposes, an instance of the Redis client:
    def ratelimit
      @ratelimit ||= Ratelimit.new(
        "strava_sync",
        # We are already using Redis elsewhere in the app.
        # If you aren't, leave out this parameter.
        redis: $redis
      )
    end
Next, we'll implement the ratelimited method, which accepts a block and only calls it if the service has not exceeded the limit.
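    def ratelimited(&block)
      raise "Ratelimit met" if ratelimit.exceeded?(RL_SUBJECT, interval: RL_INTERVAL, threshold: RL_THRESHOLD)

      # Record this call so it counts towards the limit
      # (assumes fetch_strava_data does not already call ratelimit.add itself)
      ratelimit.add(RL_SUBJECT)
      block.call
    end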
Above, you can see we implicitly raise a RuntimeError with a message. This exception will trigger Sidekiq to re-queue the job and try again in 15 minutes.
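To see the raise-when-exceeded pattern in isolation, here is a minimal sketch that runs without Redis or Sidekiq. InMemoryRatelimit is a hypothetical stand-in I made up for this illustration (it is not the gem's implementation), and RateLimitMet is a dedicated error class used here in place of the plain RuntimeError. The sketch records each hit with add before running the block, and uses a fake clock so the window can be advanced without sleeping.

```ruby
# Hypothetical in-memory stand-in for a rate limiter, for illustration only.
# It keeps one timestamp per recorded hit and counts those inside the window.
class InMemoryRatelimit
  def initialize(clock: -> { Time.now.to_f })
    @clock = clock
    @hits = Hash.new { |hash, subject| hash[subject] = [] }
  end

  # Record one hit for the subject.
  def add(subject)
    @hits[subject] << @clock.call
  end

  # True when the hits inside the last `interval` seconds have reached `threshold`.
  def exceeded?(subject, interval:, threshold:)
    cutoff = @clock.call - interval
    @hits[subject].reject! { |timestamp| timestamp <= cutoff }
    @hits[subject].size >= threshold
  end
end

# Dedicated error class (an assumption made for this sketch).
class RateLimitMet < StandardError; end

# Raise when over the limit, otherwise record the hit and run the block.
def ratelimited(limiter, subject, interval:, threshold:)
  raise RateLimitMet, "Ratelimit met" if limiter.exceeded?(subject, interval: interval, threshold: threshold)

  limiter.add(subject)
  yield
end

# Demo: a limit of 3 calls per 900 seconds, driven by a fake clock.
now = 0.0
limiter = InMemoryRatelimit.new(clock: -> { now })
outcomes = []
4.times do |i|
  begin
    ratelimited(limiter, "users", interval: 900, threshold: 3) { outcomes << "call #{i} ran" }
  rescue RateLimitMet
    outcomes << "call #{i} deferred"
  end
end
now += 901 # move past the window so the old hits no longer count
ratelimited(limiter, "users", interval: 900, threshold: 3) { outcomes << "call after window ran" }
puts outcomes
```

In a real job the rescue would not be there: the exception is left to propagate so that Sidekiq sees the failure and schedules the retry. A dedicated error class is a design choice that makes rate-limit retries easy to tell apart from genuine failures in logs and error trackers.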
Earlier, I hinted this was not the solution we ended up with (I'll write that up in the next post). We have somewhere around 6000 users for whom we want a twice-daily sync of Strava trips. The jobs are also short (many requests do little work, as not all users will have recorded a ride since our last check), so we quickly hit the ceiling of 600 requests per 15 minutes and trigger lots of retries. With the above approach, perhaps a majority of the jobs end up re-queued by Sidekiq. This is fine in the sense that they will get serviced eventually, and Sidekiq is very efficient at its work. But having thousands of jobs re-queued as the norm (not the exception) feels wrong.
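A rough model shows how quickly the retries pile up. This is a deliberately simplified sketch, not a measurement: it assumes all jobs for one sync are enqueued at once, each fifteen-minute window admits exactly 600 of them, and every rejected job is retried in the next window. The simulate_backlog helper is hypothetical.

```ruby
# Simplified model: each window admits at most `threshold` jobs, and every
# job left over raises and is re-queued for the next window.
def simulate_backlog(jobs:, threshold:)
  windows = 0
  retries = 0
  remaining = jobs
  while remaining > 0
    windows += 1
    served = [remaining, threshold].min
    remaining -= served
    retries += remaining # everything still waiting fails and is retried
  end
  { windows: windows, retries: retries }
end

result = simulate_backlog(jobs: 6000, threshold: 600)
puts "windows needed: #{result[:windows]}"
puts "total retries: #{result[:retries]}"
puts "minutes until the last batch starts: #{(result[:windows] - 1) * 15}"
```

Under these assumptions a single sync of 6000 users needs 10 windows and produces 27,000 retries to complete 6000 jobs, with the last batch starting 135 minutes after the first.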