Post-save hooks: do it right.

Han
5 min read · Apr 6, 2022

Post-save hooks are a common way to connect actions to a “save” event. This pattern exists in many business domains. Some examples include:

Send Email after a new User is created.

Recalculate a user’s total_post_count when the user has created a new post.

In a simple web application, developers may simply wrap the action right after the save call, or use some dispatch mechanism to trigger actions. Sometimes people even write triggers into the database to ensure accuracy.

# Ruby on Rails
class User < ApplicationRecord
  after_save :send_email

  def send_email
    # call the email service to notify the user that their profile was updated
  end
end

-- PostgreSQL
CREATE TRIGGER after_new_post AFTER INSERT ON posts
FOR EACH ROW EXECUTE PROCEDURE recalc_post_count();

While this is a decent solution for many startup projects, it is not very sustainable when the server gets busy. Imagine (just an unrealistic example to make my point) a user uses an “import posts from another source” function and imports thousands of posts. Do you want to trigger recalc_post_count every time a post is created? Obviously the sensible thing to do is to wait until they are all done and recalculate once.

In this post I will talk about how to use such “triggers” correctly.

Async trigger actions

It is quite common to add the trigger directly to the main-thread code (like the Ruby on Rails example above). But this has a significant flaw: the save and send_email actions are not in the same business domain, so various issues can arise. For example, what if the email service is down? In a good object-oriented design, it is also not ideal to write send_email inside the User class. When there are actions to be triggered, we should use async triggers and delegate the work to a background worker. A full-featured web application should always have access to a background worker, such as Sidekiq (Ruby), Celery (Python), or Bull (JS).
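As a minimal sketch of this delegation in plain Ruby: the save hook only enqueues a job description, and a separate worker process would pick it up later. The in-memory JOB_QUEUE array stands in for a real broker (e.g. Redis behind Sidekiq), and SendWelcomeEmailJob is an illustrative name, not a real library API.

```ruby
# In-memory stand-in for a real job broker (Redis, RabbitMQ, ...).
JOB_QUEUE = []

class SendWelcomeEmailJob
  # Mimics the enqueue step: serialize the job name and args.
  # The actual email sending happens later, in a worker process.
  def self.perform_async(user_id)
    JOB_QUEUE << { job: name, args: [user_id] }
  end
end

class User
  attr_reader :id

  def initialize(id)
    @id = id
  end

  def save
    # ... persist the record here ...
    after_save_hooks
  end

  private

  # The hook does not send the email itself; it only enqueues the job,
  # so a slow or down email service cannot block or fail the save.
  def after_save_hooks
    SendWelcomeEmailJob.perform_async(id)
  end
end
```

The key point is that save returns as soon as the job is enqueued; the email service's availability is the worker's problem, not the request thread's.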

Because of the async nature of background jobs, a job may be delayed (depending on how you manage the queue), but usually that’s OK: the triggered actions are rarely time sensitive.

Debounce the triggers to avoid frequent invocation

If we agree that these background jobs are not time sensitive, then we may also agree that triggering a job very frequently is not necessary. If you work on frontend code, you are probably familiar with debounce/throttle.

Debounce and throttle are slightly different, but both are used very frequently in frontend web pages to solve one simple problem: a user triggering an action too often, for example by clicking a button repeatedly.

The same idea can be applied to these triggered actions: if updates happen too frequently, then instead of triggering one job per update, the trigger can be debounced or throttled to enqueue far fewer jobs.
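One simple way to debounce on the enqueue side is a pending flag: only enqueue a recalculation if one is not already scheduled for that user (the job itself would clear the flag when it runs). This is a toy sketch with in-memory structures; PENDING, ENQUEUED, and debounce_recalc are illustrative names, not a library API.

```ruby
require "set"

PENDING = Set.new   # user ids with a recalc job already scheduled
ENQUEUED = []       # stand-in for the real job queue

# Enqueue at most one recalc job per user until the pending flag
# is cleared (in a real app, the job clears it when it executes).
def debounce_recalc(user_id)
  return if PENDING.include?(user_id)  # a job is already waiting
  PENDING.add(user_id)
  ENQUEUED << [:recalc_post_count, user_id]
end

# Simulate the bulk-import scenario: 100 posts created for one user,
# but only a single recalc job gets enqueued.
100.times { debounce_recalc(7) }
```

A time-based variant (enqueue at most once per N seconds) works the same way with a timestamp instead of a flag; most worker frameworks also offer scheduled/delayed jobs that make this easy.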

Jobs should be designed to be idempotent and transactional

Background workers run in a different process than the main web application; they communicate with each other through a broker or message service such as Redis or RabbitMQ.

This extra network layer increases the risk of unexpected failures for background jobs. This is expected: asking an async background worker to complete some recalculation is not as reliable as inlining the code in the main thread. But with careful design of the jobs, the pros outweigh the cons significantly.

One of the key design considerations is that a job should be idempotent and transactional. Idempotent means the job can be triggered multiple times and the result will still be correct.

An example of a non-idempotent job looks like this:

class Post < ApplicationRecord
  after_create :increment_count_job
end

# background job
def increment_count_job(user_id)
  user = User.find(user_id)
  user.post_count = user.post_count + 1
  user.save
end

The above job must be triggered exactly once per post creation, and cannot be debounced or rerun. A good version looks like this:

class Post < ApplicationRecord
  after_create :recalc_count_job
end

# background job
def recalc_count_job(user_id)
  user = User.find(user_id)
  user.post_count = user.posts.size
  user.save
end

The idempotent property allows a failed job to be retried repeatedly until it succeeds. It is much easier to have a retryable job than to write complex try/catch logic to deal with failure. However, even an idempotent job may have side effects that do not retry well. For example:

def send_email_to_new_user(user_id)
  user = User.find(user_id)
  send_welcome_email(user_id)
  user.welcome_email_sent = true
  user.save
end

If user.save fails, the email has already been sent, and you should not retry the job, because that would send duplicate emails. This is what transactional means. A job that fails midway and is retried should not produce duplicate side effects. Ideally the job is rollbackable: when it fails halfway, a try/catch mechanism automatically rolls back the current changes, like a database transaction. But usually that is not necessary; a retry of the job can simply overwrite the changes. You do a little extra work (a full retry instead of resuming from the failed step), but that is a small price to pay.
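One common fix is a persisted guard flag checked at the top of the job, so a retry that reaches the email step again becomes a no-op. A runnable toy (Struct and array standing in for the real model and mailer; the names are illustrative):

```ruby
EMAILS_SENT = []  # stand-in for the real email service
UserRecord = Struct.new(:id, :welcome_email_sent)

def send_welcome_email(user)
  EMAILS_SENT << user.id
end

def send_email_to_new_user(user)
  return if user.welcome_email_sent  # guard: makes retries no-ops
  send_welcome_email(user)
  user.welcome_email_sent = true     # persisted flag in a real app
end

u = UserRecord.new(1, false)
3.times { send_email_to_new_user(u) }  # two "retries" send nothing extra
```

Note the guard is only as safe as the flag write: if the process dies after sending but before persisting the flag, a duplicate is still possible. For strict exactly-once delivery you would need an idempotency key on the email provider's side; for a welcome email, the guard is usually good enough.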

With these two points in mind, background jobs become much easier to manage.

Use multiple queues to control the jobs

Now you have moved all your trigger actions into background jobs, your website is getting popular, and your application is getting more complex. You find yourself running into problems with jobs being delayed for too long.

Then you should consider having multiple queues to manage the jobs.

In most background worker frameworks, jobs are pushed to a queue; by default it is named something generic like “default”. You can start multiple worker processes, which continuously poll the queue and execute jobs in parallel. The framework should also let you specify which queue(s) each worker picks from.

Here is an example setup I like to use:

Queue 1: fast
Queue 2: data-sync
Queue 3: slow
Workers 1-10: pick jobs from Queue 1 only
Workers 11-15: pick jobs from Queue 2 and Queue 1 (prefer Queue 2)
Workers 16-20: pick jobs from Queue 3 and Queue 1 (prefer Queue 3)

Most of the quick jobs, like “send an email” or “update some other field after a post is saved”, go to Queue 1. Jobs like “sync financial data into the local database” go to Queue 2, because they take longer and may fail when the third party’s API is unstable. Queue 3 is for jobs that take much longer; mostly these are cron jobs that fetch data from the local database and recalculate statistics.
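The “prefer one queue, fall back to the shared fast queue” behavior from the setup above can be sketched as an ordered poll, with arrays standing in for a real broker (next_job and QUEUES are illustrative names; real frameworks like Sidekiq express this with queue lists or weights in their configuration):

```ruby
QUEUES = { "fast" => [], "data-sync" => [], "slow" => [] }

# Pop the next job for a worker: try its queues in preference order.
def next_job(preferred)
  preferred.each do |name|
    job = QUEUES[name].shift
    return job if job
  end
  nil  # nothing to do; a real worker would sleep and poll again
end

QUEUES["fast"] << :send_email
QUEUES["data-sync"] << :sync_financials

# A worker from the 11-15 group prefers data-sync, falls back to fast:
first  = next_job(["data-sync", "fast"])  # picks the data-sync job
second = next_job(["data-sync", "fast"])  # data-sync empty, falls back
```

The fallback means the slow-queue workers are never idle while the fast queue is backed up, while long-running jobs can never starve the quick ones.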

Summary

In this post, I talked about a recommended way to handle trigger actions associated with a data change. I recommend using background workers for these trigger actions, because background jobs can be debounced/throttled, and designing those jobs to be, in simple words, “retry-able”. And if the application gets more complicated, consider using multiple queues to manage the jobs.


Han

Google SWE | Newly Dad | Computational Biology PhD | Home Automation Enthusiast