Given his experience, I'm surprised that the author is surprised that companies don't know how much time they spend on hardening. Nobody gets paid to do that unless necessary for compliance; companies prefer to build features, and don't track this stuff. Don't even think about asking them to quantify the benefit of hardening.
https://www.wiley.com/en-us/How+to+Measure+Anything+in+Cyber...
I'm huge into measurement, and quantifying this has stumped me. It's one of the few areas I'm willing to surrender and say "Let's just pick a % of time to put on it."
It's bad to say "Let's give it to folks who are underutilized or have capacity" because those are rarely the people who can do it well.
All I can come up with is the hardening % should be in proportion to how catastrophic a failure is, while keeping some faith that well done hardening ultimately pays for itself.
Philip Crosby wrote about this in manufacturing as "Quality is Free" https://archive.org/details/qualityisfree00cros
re: "Nobody gets paid to do that"
There should be at least some large-company corporate incentive to measure "Bugs vs features"; the former is OpEx and the latter is CapEx, right?
(I've been at places where Finance and IT aligned to put 3 mandatory radio-button questions in JIRA which Finance used to then approximate development expenditure as CapEx vs OpEx. You were also invited as a manager to override the resulting percentages for your team once every period)
It is pretty unknowable.
How do you figure? You could seat a 2nd programmer next to the first and have him watch and measure with a stopwatch. Expensive, but doable.
This metric is typically tracked internally and probably wouldn't be made public, because it could indicate how "buggy" a product is. An easy way to measure it is to take the time an incident spends going from open -> mitigated -> resolved and multiply it by the number of engineers involved, as a rough measure of impact.
The tricky part would be measuring time spent on hardening and making the business decision on how to weigh product features against reliability (which I know is a false tradeoff, given the time eventually spent fixing bugs, but it still applies at a hand-wavy level).
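A minimal sketch of that incident-cost arithmetic, assuming you can export open/resolved timestamps and responder counts from your ticketing system (the Incident type and every field name here are made up for illustration, not anyone's real schema):

    package main

    import (
        "fmt"
        "time"
    )

    // Incident is a hypothetical record exported from a ticketing system.
    type Incident struct {
        Opened    time.Time
        Mitigated time.Time
        Resolved  time.Time
        Engineers int // engineers actively working the incident
    }

    // engineerHours approximates the cost of an incident as wall-clock time
    // from open to resolved, multiplied by the engineers involved.
    func engineerHours(inc Incident) float64 {
        return inc.Resolved.Sub(inc.Opened).Hours() * float64(inc.Engineers)
    }

    func main() {
        opened := time.Date(2024, 3, 1, 9, 0, 0, 0, time.UTC)
        inc := Incident{
            Opened:    opened,
            Mitigated: opened.Add(2 * time.Hour),
            Resolved:  opened.Add(6 * time.Hour),
            Engineers: 3,
        }
        fmt.Printf("incident cost: %.1f engineer-hours\n", engineerHours(inc))
    }

Summing that over a quarter at least gives you a floor for "time spent on bugs", even if it says nothing about time spent preventing them.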
Also, depending on the system, time spent on hardening is many times happening concurrently with some other tasks.
Maybe you trigger a load test, or run a soaking test or whatever, while that runs you do something else, pause and check results, metrics, logs, whatever.
If something is funky, you may fix something and try again, get back to your other task and so on.
It's messy, and keeping track of that would add significant cognitive load for little gain.
Code that directly affects revenue (e.g. licensed entitlement enforcement) is hardened, quietly and iteratively, in response to observed failures and attackers.
I see problematic hardening at two different levels:
1) Putting NULL pointer checks (that result in early returns of error codes) in every damn function. Adds a sizable amount of complexity for little gain.
2) Wrapping every damn function that can fail with a “try 10 times and hope one works” retry loop. It quickly becomes problematic and unscalable. An instantaneous operation becomes a “wait 5 minutes to get an error” just because the failure isn’t transient (so why retry?).
It also quickly becomes absurd (gee, the TCP connect failed, so let's retry the entire HTTP request and connect 10 more times on each attempt… gee, the HTTP request failed, so let's redo the larger operation too!). The sketch below shows how fast that compounds.
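A minimal sketch of that compounding, assuming just two layers that each blindly retry 10 times (the function names and retry counts are mine, purely for illustration): a permanent failure at the connect level turns into 100 attempts before the caller even sees an error.

    package main

    import (
        "errors"
        "fmt"
    )

    var attempts int

    // connect always fails with a non-transient error (e.g. no such host),
    // so no amount of retrying will ever make it succeed.
    func connect() error {
        attempts++
        return errors.New("connect: no such host")
    }

    // retry blindly re-runs fn up to n times, whether or not the error is transient.
    func retry(n int, fn func() error) error {
        var err error
        for i := 0; i < n; i++ {
            if err = fn(); err == nil {
                return nil
            }
        }
        return err
    }

    func main() {
        // The HTTP layer retries the request 10 times; each request retries connect 10 times.
        httpRequest := func() error { return retry(10, connect) }
        _ = retry(10, httpRequest)
        fmt.Println("total connect attempts:", attempts) // 100, for an error that was never going away
    }

Retrying at a single layer, and only for errors you have reason to believe are transient, avoids both the wasted attempts and the multiplied latency.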
The time spent on hardening software is always zero, or very close to it, unless the company makes hardening a selling point of the product.
In the world of VC-powered growth, the race for a bigger and bigger chunk of the market seems to be the only thing that matters. You don't optimize your software; you throw money at the problem and get more VMs from your cloud provider. You don't work on fault tolerance; you add a retry on the FE. You don't carefully plan and implement security; you create a bug bounty.
It sucks and I hate it.
Then you'll get hacked or have an outage, and unless you're a monopoly it will cost you. But will the people who made poor decisions be held accountable?
You can do a decent hardening job without too much effort if you follow some basic guidelines. You just have to be conscientious enough.
I was once told to stop wasting time submitting PRs adding null checks on data submitted via a public API. You know, the kind of checks that prevented said API from crashing if a part of payload was missing. I was told to stop again with my concerns dismissed when I pointed similar things out during code review. I left that company not long after, but it's still around with over a quarter of a billion in funding.
I would love to say that this was an exception during almost 20 years of my professional career, but it wasn't. It was certainly the worst, but also much closer to average experience than it should have been.
c2h5oh: that does sound sucky. Perhaps it mostly describes web development though? Other software fields take this stuff more seriously.
Unless you equate web development with SaaS, then no. It's the same in education, finance, and SaaS targeting Fortune 500 companies.
Source: most of the companies I worked or consulted for in the past 20 years.
Depends upon the software.
I find valgrind on Linux and ktrace(1) on OpenBSD easy to use. I don't spend much time on it, and testing my projects on Linux, OpenBSD, and NetBSD tends to find most issues without a lot of work.
This is not a "companies don't spend enough time on static and dynamic analysis of their software" problem; it's a "less than a third of the companies I worked or consulted for in the past 20 years mandated input validation of any kind" problem.
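For what it's worth, the kind of check that gets dismissed is usually only a few lines. A hedged sketch (the endpoint, payload shape, and field names are hypothetical): reject a payload with missing fields up front instead of hitting a nil-pointer crash deeper in the handler.

    package main

    import (
        "encoding/json"
        "fmt"
        "net/http"
    )

    // createUserRequest is a hypothetical public-API payload; pointer fields
    // let us tell "missing" apart from "present but empty".
    type createUserRequest struct {
        Email   *string `json:"email"`
        Profile *struct {
            DisplayName *string `json:"display_name"`
        } `json:"profile"`
    }

    func createUser(w http.ResponseWriter, r *http.Request) {
        var req createUserRequest
        if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
            http.Error(w, "malformed JSON", http.StatusBadRequest)
            return
        }
        // Validate required fields up front; a missing "profile" would otherwise
        // panic the first time the handler dereferences it.
        if req.Email == nil || req.Profile == nil || req.Profile.DisplayName == nil {
            http.Error(w, "missing required field", http.StatusBadRequest)
            return
        }
        fmt.Fprintf(w, "created user %s\n", *req.Email)
    }

    func main() {
        http.HandleFunc("/users", createUser)
        _ = http.ListenAndServe(":8080", nil)
    }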