When you think about cost in the DevOps world, the first thing that comes to mind is your AWS bill, or maybe that Datadog subscription. However, cost is not always as straightforward as receiving an invoice, and in DevOps, the hidden cost comes from doing things wrong.
I believe that being aware of hidden costs is a “mental model” you can use when making decisions. Without taking it into account, you are making bets with only half of the information.
Hidden cost takes on many forms in DevOps:
- Developers idling while they wait for builds to finish.
- Over-architected Jira workflows that take forever to go through - Jireaucracy (not my term).
- Automating things that don’t need to scale - we are all lazy engineers who would rather spend a full day scripting something than do 5 minutes of manual work.
- Trying to save cost where there is no real benefit.
- Higher Churn Rate.
- Technical debt.
Let’s try to take a close look at some of these examples.
Not accounting for the human element
In these complicated days, companies are frantically working to reduce their cloud costs. However, there’s a trap that you need to be mindful of when going through such a process.
Let’s look at this oversimplified equation.
$100 (hourly cost of a developer) x 10 (developers) x 2 (idle hours a day) x 21 (work days in a month) = $42,000
If, on average, a developer at a company waits on builds or deployments for two hours per day and there are 10 developers on the team, that’s $2,000 per day going down the drain. Multiply by 21 workdays per month, and you get $42,000 of wasted developer time.
Now, suppose that you can cut the idle time in half by adding two new build workers, where each one costs $3,000/month. Is it worth it? The answer seems obvious when you present it this way, but it is harder to see when you are cutting costs across the board.
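To put that question into numbers, here is a minimal sketch that uses only the figures from the example above. The constant and function names are mine, and the model is as oversimplified as the equation it comes from.

```python
# Figures taken from the example above; illustrative, not measured data.
HOURLY_COST = 100             # dollars per developer hour
DEVELOPERS = 10
IDLE_HOURS_PER_DAY = 2
WORK_DAYS_PER_MONTH = 21
WORKER_COST_PER_MONTH = 3_000
NEW_WORKERS = 2


def monthly_idle_cost(idle_hours_per_day: float) -> float:
    """Dollar cost of idle developer time per month."""
    return HOURLY_COST * DEVELOPERS * idle_hours_per_day * WORK_DAYS_PER_MONTH


before = monthly_idle_cost(IDLE_HOURS_PER_DAY)
after = monthly_idle_cost(IDLE_HOURS_PER_DAY / 2)  # idle time cut in half
spend = NEW_WORKERS * WORKER_COST_PER_MONTH        # extra monthly cloud bill
net_saving = (before - after) - spend

print(f"Idle cost before: ${before:,.0f}")
print(f"Idle cost after:  ${after:,.0f}")
print(f"New worker spend: ${spend:,.0f}")
print(f"Net saving:       ${net_saving:,.0f}")
```

With these numbers, the two workers add $6,000 to the cloud bill and recover $21,000 of developer time, a net saving of $15,000 a month.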
The problem is that employees are a “fixed cost”, a different budget, and you don’t think of them in the context of wasted resources.
Now think about that developer idling for two hours a day. An idle engineer is a bored engineer, which leads me to the next point - chasing away competent engineers, or making it more challenging to attract new ones.
You spend significant resources to keep your top engineers, but then they will be the first to jump ship when work is just not fun.
Team leaders spend much of their time in the recruiting process when the company is growing. Time that could be used for onboarding new employees, or mentoring the team, or doing actual development work.
If it takes weeks for engineers to see their work in production or they spend their time chasing the DevOps team to get a simple thing running, they will not stay for long. And worse, you will start getting a bad reputation, resulting in fewer hiring options.
Add burnout to the mix, and now your top developers are producing less, which trickles down to the more junior engineers. You are now stuck with a disgruntled team and no new hires on the horizon.
Trying to automate and scale at all cost
Here is another example that I see all the time: a belief that everything has to be automated and scalable.
“Write this script to automate task XYZ, and don’t forget to document everything and make sure it’s IaC and write a Confluence page and…” But do you really need to automate it? How much time does this task take, and how many times are you planning to perform it?
If it’s an annoying task that takes an hour to perform, and you expect to repeat it three times in the future, does it justify a full week of an engineer to automate?
What about a deployment that is pretty much a one-off? Do you really need to automate it fully? Is it worth your time? Isn’t there something else that is more meaningful for you to do?
I mean, yes, we have standards and workflows and beliefs, but we don’t have to follow them blindly when it doesn’t make sense.
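The one-hour task from the example above can be checked with a quick break-even calculation. This is a rough sketch: it assumes a “full week” means 40 working hours, and it ignores the time needed to maintain the script afterwards. The names are mine.

```python
def break_even_runs(manual_hours: float, automation_hours: float) -> float:
    """Number of manual runs at which automating starts to pay off."""
    return automation_hours / manual_hours


MANUAL_HOURS = 1        # the task takes an hour by hand
EXPECTED_RUNS = 3       # you expect to repeat it three times
AUTOMATION_HOURS = 40   # assumption: a full week of one engineer

runs_needed = break_even_runs(MANUAL_HOURS, AUTOMATION_HOURS)
manual_total = MANUAL_HOURS * EXPECTED_RUNS
worth_automating = manual_total > AUTOMATION_HOURS

print(f"Manual effort:   {manual_total} hours")
print(f"Automation cost: {AUTOMATION_HOURS} hours")
print(f"Break-even at:   {runs_needed:.0f} runs")
print(f"Worth it:        {worth_automating}")
```

Three hours of manual work against forty hours of scripting means the automation only pays for itself after 40 runs, so in this case doing it by hand wins.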
Technical Debt
Managers who brush off the importance of technical debt are a bit like people who never back up their system. They never saw how technical debt could cripple a company.
If you don’t account for the effects of technical debt, it’s hard to see the cost, but by looking closely, you will soon see how much it affects your velocity, churn, and pretty much every aspect of the value you deliver:
- It slows down your developers.
- Each feature takes twice as much time to complete.
- The system keeps crashing.
- It burns out your team members.
- It makes some business decisions impossible.
Think about credit card debt. At some point, the interest will make you go bankrupt. Yes, fixing some of the issues takes time and resources, but can you afford to keep the snowball rolling?
Opportunity cost
In a previous example, I referred to the cost “as is,” using the developer’s direct hourly cost. But unless you are a service business, a developer hour is worth much more than the cost.
This is why software companies are worth so much: the developer’s time is a force multiplier. Developers add value to the company that is much higher than their yearly cost. The value they add takes the form of IP and products that you can sell over and over without scaling inventory and distribution.
If we go back to the previous example and you instead have the developers spend those hours on a new product or initiative, you are not just losing $42k per month; you are giving up much higher future revenue.
And of course, opportunity cost can take on other forms that may not be as intuitive:
Milestone Investing
Suppose that you are an early startup that has raised capital with set milestones (Milestone Investing). By hitting the next milestone, you will increase your valuation and get a hefty cash infusion. Now imagine your runway running out, and you have yet to meet the milestone.
Wouldn’t it be great if you had more time to reach your goals?
Suppose that your developers worked much more efficiently: instead of idling for two hours, each developer gained an hour a day of “working in the zone” because you automated and simplified their workflow.
How much more could you accomplish with an extra hour a day per developer?
Could you meet that arbitrary board goal with an impossible deadline they set for you?
Is it going to help you meet the milestone the investors set for receiving the next cash infusion?
That’s where we, the DevOps engineers (or whatever you call this function at your company), can help move the needle. We can facilitate growth and improve the productivity of our engineering teams. It’s not just about throwing new technology at the problem and implementing the latest fad.
So how can we produce Value as DevOps engineers?
DevOps has always been tied to business value, but it’s rare to see this idea fully implemented in corporate culture. Even when it is, it’s usually wrapped in a vague user story with a soggy, generic statement that doesn’t make any sense.
“This feature will increase user happiness and deliver more value by making the top navigation bar a bit more blueish.”
One way to create value is by addressing the cost - by reversing it, we can generate value. All you need is to look at it from a different perspective.
I mentioned Churn before as a metric that the business is actively working to reduce, but for engineers, it’s just “downtime.” Engineers are seldom exposed to the idea that lowering Churn increases customer lifetime value and affects customer acquisition cost.
So put your DevOps cap on for a minute. Can you think of ways to reduce the churn rate?
- Reduce downtime.
- Reduce latency and load time.
- Optimize report generation time.
- Increase email deliverability.
- Enable feature toggles to allow your product team to test their new features on users.
The list goes on, and it’s just for reducing Churn. And yes, these are obvious priorities for operations, but now you have a clear business value tied to it.
Almost every hidden cost I mentioned earlier could turn into a value proposition for a DevOps-related project.
Make sure you have all the data
All of this “Cost” and “Value” talk is almost meaningless or non-actionable if you don’t have metrics in place.
- You need to have a cloud provider cost breakdown to know how much you’re spending on a project or a server.
- Build time statistics.
- Build and deployment success rate over time.
- How long it takes to onboard a new developer to the team.
You need the metrics for three reasons:
- Visibility - You need to be aware of the cost in the first place.
- Benchmark - When making a decision, you have to have a baseline to compare against.
- Impact - You need to see the effect of your actions when working to reduce the cost.
You can’t make informed decisions without having all the data; otherwise, it’s just guessing.
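As a starting point, build time statistics and the build success rate can be computed from a plain export of build records. The sketch below is only an illustration: the `Build` record, the `summarize` function, and the sample values are made up, and in practice the records would come from whatever your CI system exposes.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class Build:
    """One CI build record (hypothetical shape, not a real CI API)."""
    duration_minutes: float
    succeeded: bool


def summarize(builds: list[Build]) -> dict[str, float]:
    """Basic build time and success rate statistics."""
    durations = [b.duration_minutes for b in builds]
    return {
        "count": len(builds),
        "mean_minutes": mean(durations),
        "median_minutes": median(durations),
        "success_rate": sum(b.succeeded for b in builds) / len(builds),
    }


# Made-up sample values, for illustration only.
sample = [
    Build(12.0, True),
    Build(15.0, True),
    Build(30.0, False),
    Build(11.0, True),
]

stats = summarize(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Tracking the same summary week over week gives you the benchmark and the impact measurement from the list above.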
How to start thinking about hidden cost?
Ask yourself the following questions:
- Do we really need to scale this?
- How much time will it take to reduce a cost? What can your developers do instead?
- How can you improve the development workflow? How much time can you save?
- Do we have a culture that allows open discussion? Do engineers have a voice when it comes to new projects and features?
- How can we share knowledge across the organization, so engineers are aware of the product, sales, and marketing struggles?