This article is Part 1 in a 6-Part Series.
- Part 1 - This Article
- Part 2 - Don't do it now! Part 2. Background tasks, job queuing and scheduling with Hangfire
- Part 3 - Don't do it now! Part 3. Hangfire details - jobs
- Part 4 - Don't do it now! Part 4. Hangfire details - dashboard, retries and job cancellation
- Part 5 - Don't do it now! Part 5. Hangfire details - job continuation with ContinueWith
- Part 6 - Don't do it now! Part 6. Hangfire details - recurring jobs and cron expressions
This is a post about the importance of not doing stuff right here and right now. About doing stuff later, delaying them and in a perfect world making someone else do them. But I will still stick to IT, don’t expect any life tips or guidance.
A mindset I see constantly in system design is doing the right thing right now, and even more stuff after that. Like sending an email, doing a REST call, performing some CPU intense calculations, calling an external resource. We are doing those things where they should be from the logical point of view. So, if we are given a task like registering an user that sound more or less like that:
"Verify the required fields, check if a user with such email doesn't exist if everything is OK save it to the database, and send an confirmation email."
We will:
- verify required fields
- check for a duplicate
- save it to the database
- send the confirmation email
And for me this is insane and even more - wrong. Wrong as not implementing what the requirement told. For me it should be:
- verify required fields
- check for a duplicate
- save it to the database
- schedule sending the confirmation email
Did you spot the difference? It is in the last step. My point is that we should schedule sending the email and not send it. Why schedule not do it now? There are many reasons, but to name a few:
- in a web application it makes the request faster (scheduling is faster then sending an email)
- processing of jobs can be done on different server.
- it is easy to scale job processing servers
- it is more fail resilient (easy to retry)
- it allows to batch operations
- it is easier to monitor
But for me there is also a bigger point to make here. I believe it is more tha n a technical detail, or a performance enhancer. I think that only the second version is implementing the requirement properly. This is because failure to send the email now should not block the registration. It’s a technical problem, not a business one and it should be dealt with without the user knowing. Imagine getting an error from registering saying “SMTP port not responding”. Most of us will say, or I hope so at least, “This is obvious”. But most of us spend our days implementing more complicated processes, where it is not that obvious, and we don’t dare to ask:
"Does it need to be done now? It is an integral part of the process? Do you need it in real time?"
Why? Maybe we feel it implies that the system isn’t fast enough or we can’t code it to work fast enough. Those may be, but my bet is on that we assume it is an integral part and we don’t ask:
- should the whole process fail if this part does?
- can this action fail for any business reason?
- how well should this scale? Long monolithic processes don’t scale.
If you hear “Yes” to any of this questions then do it later.
Keep in mind that latency between scheduling a job and picking it up for execution shouldn’t be greater than 5 second for most cases, because then you are still in “real time” frame of most use cases.