5 Years of Forgotten Jobs: What We Learned

At Buffer, we recently embarked on a small project to clean up how parts of our systems communicate behind the scenes.

Some quick context: we use Amazon SQS (Simple Queue Service), which acts like a waiting room for tasks. One part of our system drops off a message, and another picks it up later. Think of it like leaving a note for a coworker: “Hey, when you get a chance, process this data.” The system that sends the note doesn’t have to wait around for a response.
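The note-for-a-coworker pattern can be sketched in a few lines. This is only an illustration, not our production code: it uses Python's in-memory `queue.Queue` as a stand-in for SQS, and the `drop_off` and `pick_up` names and the message shape are made up for the example.

```python
import json
import queue

# In-memory stand-in for an SQS queue (assumption: real code would talk to SQS).
tasks = queue.Queue()


def drop_off(q, payload):
    """Producer: leave a note and return immediately, without waiting for a reply."""
    q.put(json.dumps(payload))


def pick_up(q):
    """Consumer: take the next note if one is waiting, otherwise return None."""
    try:
        return json.loads(q.get_nowait())
    except queue.Empty:
        return None


# The sender leaves a note; the receiver reads it whenever it gets around to it.
drop_off(tasks, {"action": "process_data", "id": 42})
message = pick_up(tasks)
```

The sender never blocks on the receiver, which is also why a receiver can keep polling an empty queue for years without anyone noticing.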

Our project was to perform routine maintenance: update the tools we use to test SQS locally and clean up their configuration.

While we were mapping out what queues we actually use, we found something we didn’t expect: seven different background processes (or cron jobs, which are scheduled tasks that run automatically) and workers that had been running silently for up to five years. All of them doing absolutely nothing useful.

Here’s why that matters, how we found them, and what we did about it.

Why this matters more than you’d think

Yes, running unnecessary infrastructure costs money. I did a quick calculation and for one of those workers, we would have paid ~$360-600 over 5 years. This is a modest amount in the grand scheme of our finances, but definitely pure waste for a process that does nothing.
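For a sense of scale, that range works out to roughly $6-10 per month for a single worker. The snippet below only divides the totals quoted above by 60 months; it does not model actual AWS pricing, and the function name is invented for the example.

```python
MONTHS = 5 * 12  # five years


def implied_monthly_cost(total_usd, months=MONTHS):
    """Back out an average monthly cost from a multi-year total."""
    return total_usd / months


low = implied_monthly_cost(360)   # lower bound of the quoted range
high = implied_monthly_cost(600)  # upper bound of the quoted range
print(f"${low:.0f}-${high:.0f} per month")
```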

However, after going through this cleanup, I’d argue the financial cost is actually the smallest part of the problem.

Every time a new engineer joins the team and explores our systems, they encounter these mysterious processes. “What does this worker do?” becomes a question that eats up onboarding time and creates uncertainty. We’ve all been there – staring at a piece of code, afraid to touch it because maybe it’s doing something important.

Even “forgotten” infrastructure occasionally needs attention. Security updates, deprecation notices, and other maintenance tasks can be a challenge when the processes are no longer used.

How we found them

Finding these workers was actually quite straightforward. We described the workers and their SQS queues in an AWS CloudFormation template, so that they could be created automatically, and then monitored both with Amazon CloudWatch.

CloudWatch showed us that some of the workers weren’t responding. We then used CloudFormation to delete those workers and their queues.
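One way to spot a queue that nothing uses is to look at its traffic metrics in CloudWatch. The sketch below is an assumption about how such a check could look, not the exact tooling we ran: `is_idle` and `fetch_received_counts` are hypothetical helper names and the 14-day lookback is arbitrary. It relies on boto3's `get_metric_statistics` call and the `NumberOfMessagesReceived` metric in the `AWS/SQS` namespace.

```python
from datetime import datetime, timedelta, timezone


def is_idle(datapoints):
    """Treat a queue as idle if no messages were received in the window.

    An empty list (CloudWatch returned no datapoints) also counts as idle.
    """
    return sum(dp.get("Sum", 0) for dp in datapoints) == 0


def fetch_received_counts(queue_name, days=14):
    """Fetch daily NumberOfMessagesReceived sums for one queue.

    Needs AWS credentials; boto3 is imported lazily so that is_idle
    can be used and tested without it.
    """
    import boto3

    now = datetime.now(timezone.utc)
    response = boto3.client("cloudwatch").get_metric_statistics(
        Namespace="AWS/SQS",
        MetricName="NumberOfMessagesReceived",
        Dimensions=[{"Name": "QueueName", "Value": queue_name}],
        StartTime=now - timedelta(days=days),
        EndTime=now,
        Period=86400,  # one datapoint per day
        Statistics=["Sum"],
    )
    return response["Datapoints"]
```

A call such as `is_idle(fetch_received_counts("some-queue"))` would then flag a candidate for a closer look. An idle queue is a hint, not proof, so it is worth confirming with the owning team before deleting anything.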

What we did about it

Once we had identified the workers and their queues, we decided to take a more proactive approach. We created a new AWS CloudFormation template that would automatically create the workers and their queues, but also include a small script that would periodically check the status of the workers and their queues.

If the workers were not responding, the script would automatically delete them and their queues. We also created a new AWS Lambda function that would periodically check the status of the workers and their queues and update the CloudFormation template accordingly.
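The periodic check could look something like the following. This is a sketch under assumptions rather than the script itself: the `WorkerStatus` shape, the 30-day threshold, and the `delete` callback are all hypothetical, and it defaults to a dry run because deleting infrastructure automatically is risky.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List, Optional


@dataclass
class WorkerStatus:
    name: str
    last_active: Optional[datetime]  # None means no activity was ever recorded


def find_stale(
    workers: Iterable[WorkerStatus],
    now: datetime,
    max_idle: timedelta = timedelta(days=30),  # arbitrary threshold
) -> List[str]:
    """Return names of workers with no recorded activity within max_idle."""
    return [
        w.name
        for w in workers
        if w.last_active is None or now - w.last_active > max_idle
    ]


def cleanup(
    workers: Iterable[WorkerStatus],
    now: datetime,
    delete: Callable[[str], None],
    dry_run: bool = True,
) -> List[str]:
    """Report stale workers; only call delete() when dry_run is False."""
    stale = find_stale(workers, now)
    if not dry_run:
        for name in stale:
            delete(name)
    return stale
```

Running it with `dry_run=True` on a schedule and reviewing the list by hand is a gentler first step than letting it delete anything on its own.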

We also created a new Amazon CloudWatch metric that tracks the number of workers and queues. We are alerted whenever that number increases or decreases, which helps us spot potential issues early.
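A custom metric like this can be published with boto3's `put_metric_data`. The snippet is a minimal sketch: the `Infra/Inventory` namespace, the metric names, and both function names are assumptions made for illustration, not the names we actually use.

```python
from datetime import datetime, timezone

NAMESPACE = "Infra/Inventory"  # hypothetical namespace


def build_inventory_metrics(worker_count, queue_count, timestamp=None):
    """Build the MetricData payload for the worker and queue counts."""
    timestamp = timestamp or datetime.now(timezone.utc)
    return [
        {
            "MetricName": "WorkerCount",
            "Value": worker_count,
            "Unit": "Count",
            "Timestamp": timestamp,
        },
        {
            "MetricName": "QueueCount",
            "Value": queue_count,
            "Unit": "Count",
            "Timestamp": timestamp,
        },
    ]


def publish(metric_data):
    """Send the payload to CloudWatch (needs AWS credentials and boto3)."""
    import boto3  # imported lazily so the builder works without boto3

    boto3.client("cloudwatch").put_metric_data(
        Namespace=NAMESPACE, MetricData=metric_data
    )
```

Keeping the payload builder separate from the network call makes it easy to test, and a CloudWatch alarm on either metric can then provide the alerting.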

The result

After deleting the workers and their queues, we had noticeably fewer of both to keep track of. We also noticed a reduction in onboarding time for new engineers, and less of the uncertainty and confusion that came with encountering these mysterious processes.

We also stopped paying for infrastructure that did nothing, a modest saving, but one with no downside.

The conclusion

In conclusion, finding and deleting these unnecessary workers and their queues was a great success. It helped us to reduce our unnecessary infrastructure costs, improve the onboarding process for new engineers, and reduce the uncertainty and confusion that comes with encountering mysterious processes.

We hope that our experience can help other teams to identify and delete unnecessary infrastructure, and to improve the overall efficiency and effectiveness of their systems.

FAQ

Here are some common questions that we received during the cleanup process:
