Sema Blog

Top Three Leading Indicators of Codebase Quality

Posted by Matt Van Itallie on Jun 17, 2019 6:29:56 PM

Sema’s goal is simple. We want to provide senior management, architects and developers with the definitive set of tools they need to find, fix, prevent and manage technical debt.

That goal may be straightforward, but the journey takes time, effort and attention. Finding and fixing technical debt means constant assessment of the health and quality of a codebase.

This allows stakeholders to quickly identify risks and “hotspots” -- and address them before they frustrate customers, generate enormous cleanup costs, or aggravate developers who just want to get back to creating new code.

In essence, the questions about codebase quality boil down to these three:
  • How good is your code?
  • Is your code getting better or worse?
  • What support do your teams require in order to deliver the right level of code quality?

Codebase assessment tools that answer parts of these questions have been around for decades. But our conversations with thousands of members of the software community -- investors, researchers, policy makers, software leaders, industry analysts and, most importantly, developers and architects themselves -- have made it clear there is enormous unmet demand in the software investment space.

One of the challenges is where to begin. There are infinite ways to measure software quality, so a particular need exists for leading indicators – a short list of metrics that can accurately predict what to expect from a comprehensive review.

Perhaps the most famous example of a leading indicator is the canary in the coal mine: actual canaries, used to detect potentially deadly carbon monoxide build-ups before the miners themselves were harmed. One dead canary was a small price to pay for the information required to save hundreds of lives.

But we think iconic rockers Van Halen also have something to teach us all. The band would include a modest request in every live performance contract, tucked somewhere within dense text about important safety procedures and technical issues.

The rider? No brown M&Ms to be allowed backstage.


The band were happy for this perceived diva behavior to pass into rock music lore as another crazy contract clause imposed by pampered rock stars. Meanwhile, the band knew that if they found brown M&Ms in a candy bowl backstage, this was an indicator that much larger safety and technical issues could have been overlooked or ignored.

With that goal in mind – production of a small list of actionable, predictive measures of codebase quality – Sema has analyzed thousands of codebases to see what measures can predict higher-quality code.

Our research and real-world implementations have led us to identify three leading indicators of codebase quality.

These metrics each reflect one element of each of the three pillars of Sema’s Software Quality Framework.

  1. The code itself (“the what”): Look for line-level technical debt less than $1.00 per line of code.
  2. The process used to create the code (“the how”): Look for files-per-commit variance under 50% over the past six months.
  3. Development teams’ contributions to quality (“the who”): Discover teams or developers with the highest contributions to code flexibility and clarity.

Here we take a closer look at these crucial factors.


1) Code

Target: Line-level technical debt less than $1.00 per line of code.

Most organizations do not have the luxury to pursue code quality for its own sake. Instead, code quality is one more factor to optimize, along with creating features, delivering on time, and managing cost.

Tech leaders and non-technical executives have not had the ability to compare notes regarding potential trade-offs in scope, timing and quality. As a result, non-technical executives feel “in the dark” while technical leaders struggle to get buy-in for the crucial investments in code maintenance, even for those efforts with a quantifiably higher return on investment.

Those of us who are passionate about code quality believe it is up to us to explain it in a way that our colleagues can understand, rather than leaving them to work in the dark.

With that in mind, Sema’s research and experience have identified a leading indicator of code quality that meets the needs of non-technical audiences: Line-level technical debt less than $1.00 per line of code.

This calculation takes into account the thousands of line-level warnings about code; focuses only on the substantive -- rather than stylistic -- warnings that are tuned to the organization’s specific context; accounts for time and cost of resolution; and contextualizes the cost and time relative to the overall size of the codebase.
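
To make the arithmetic concrete, here is a minimal Python sketch of a per-line debt figure. It is an illustration under stated assumptions, not Sema's actual model: the `Finding` record, the per-finding fix-time estimates, and the $100 hourly rate are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One line-level warning (hypothetical structure)."""
    rule: str
    substantive: bool    # False for purely stylistic warnings
    fix_minutes: float   # assumed estimate of time to resolve


def debt_per_line(findings, lines_of_code, hourly_rate):
    """Estimated cost of resolving substantive findings, per line of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    # Stylistic warnings are ignored; only substantive ones count as debt.
    minutes = sum(f.fix_minutes for f in findings if f.substantive)
    return (minutes / 60.0) * hourly_rate / lines_of_code


findings = [
    Finding("possible-null-dereference", True, 30),
    Finding("unclosed-resource", True, 60),
    Finding("duplicated-block", True, 90),
    Finding("line-too-long", False, 5),  # stylistic, excluded
]
cost = debt_per_line(findings, lines_of_code=1000, hourly_rate=100.0)
print(f"${cost:.2f} per line")
```

In this made-up example, the three substantive findings add up to three hours of work, or $300 at the assumed rate. Spread across 1,000 lines, that comes to $0.30 per line, which is under the $1.00 target.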

Organizations whose codebases meet or exceed this standard are almost certainly paying close attention to these metrics and -- more importantly -- are highly likely to provide developers with tools to prevent and address issues in real time as they emerge.

2) Process

Target: Files-per-commit variance under 50% seen over the previous six months.

Code can be extremely high quality when it is first written. But without quality processes in place to add to or fix existing code, entropy in the system means quality will decline.

Sema’s research has identified a leading indicator of high-quality codebase process for agile teams: Files-per-commit variance under 50% seen over the previous six months.

This calculation requires eliminating one-off administrative changes, understanding commit history within and across teams, and assessing process variance.
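
As a rough illustration of that calculation, the Python sketch below measures how much the number of files touched per commit varies over a recent window. It rests on several assumptions that do not come from Sema: "variance under 50%" is read here as a coefficient of variation (standard deviation divided by mean) below 0.5, one-off administrative changes are approximated by a hypothetical `bulk_threshold` cutoff, and six months is taken as 182 days.

```python
from datetime import date, timedelta
from statistics import mean, pstdev


def files_per_commit_variation(commits, today, window_days=182, bulk_threshold=100):
    """Coefficient of variation of files changed per commit in the window.

    commits: iterable of (commit_date, files_changed) pairs.
    bulk_threshold: assumed cutoff for one-off bulk or administrative commits.
    """
    cutoff = today - timedelta(days=window_days)
    counts = [
        n for day, n in commits
        if day >= cutoff and 0 < n <= bulk_threshold
    ]
    if len(counts) < 2:
        raise ValueError("need at least two commits in the window")
    return pstdev(counts) / mean(counts)


commits = [
    (date(2018, 1, 5), 40),    # outside the window, ignored
    (date(2019, 1, 14), 2),
    (date(2019, 2, 3), 3),
    (date(2019, 3, 1), 500),   # bulk administrative change, ignored
    (date(2019, 3, 20), 4),
    (date(2019, 4, 8), 3),
    (date(2019, 5, 2), 2),
    (date(2019, 6, 10), 4),
]
variation = files_per_commit_variation(commits, today=date(2019, 6, 17))
print(f"{variation:.0%}")
```

With this sample history the six remaining commits average three files each and the variation is roughly 27%, which would fall under the 50% target as interpreted here.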

Organizations that meet or exceed this standard have the ability to track, coach and support their teams. Even during times of intense coding activity, they are able to keep processes consistent and disciplined. This bodes well for the code that results.

3) Team

Target: Identify individuals with the highest contributions to code clarity and flexibility.

We mustn’t forget that software development is a team sport. That is obvious in situations where development teams are engaging in pair programming or mob programming. But it is equally true for individual programming, which is heavily determined by the team culture, tools and support structures.

At Sema, we typically focus on the work of teams rather than individual developers. So our leading indicator on team contributions to code quality is: Teams with the highest contributions to code flexibility and clarity should be known to the organization.

The insight behind this leading indicator is straightforward. It is extremely difficult to aim for code quality if one does not know whether the teams are heading in that direction.

The underlying science is complex. Measuring team code contributions on architectural measures requires branch-to-master and team reconciliation, and objective repeatable measures of architectural quality that capture the trade-offs created by sensible design decisions in the real world.
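
The real measurement is far more involved than anything shown here, but the final aggregation step can be sketched in a few lines of Python. Everything in this sketch is hypothetical: the clarity and flexibility deltas are assumed to come from some upstream architectural analysis of merged changes, and the `team_of` mapping stands in for the team reconciliation mentioned above.

```python
from collections import defaultdict


def rank_teams(merged_changes, team_of):
    """Rank teams by their summed clarity and flexibility contributions.

    merged_changes: iterable of (author, clarity_delta, flexibility_delta),
        where the deltas are assumed outputs of an upstream analysis.
    team_of: mapping from author to team name (hypothetical roster).
    """
    totals = defaultdict(float)
    for author, clarity_delta, flexibility_delta in merged_changes:
        # Authors missing from the roster are grouped rather than dropped.
        team = team_of.get(author, "unassigned")
        totals[team] += clarity_delta + flexibility_delta
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


team_of = {"ana": "payments", "bo": "payments", "cy": "search"}
merged_changes = [
    ("ana", 2.0, 1.0),
    ("bo", -0.5, 0.5),
    ("cy", 1.0, 0.5),
    ("dee", 0.25, 0.0),  # not in the roster
]
ranking = rank_teams(merged_changes, team_of)
print(ranking)
```

Summing the two deltas with equal weight is a simplification chosen for brevity. A real scoring scheme would need to justify how clarity and flexibility are weighed against each other.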

This metric has been found to be a leading indicator because it reveals whether teams have access to this information on a day-to-day basis. All of us in the software community are all too familiar with development teams given the task of understanding and then untangling spaghetti code written by others, with no tools to help: just wisdom, experience, and brute-force reading of hundreds of thousands of lines of code.

It is now possible to do better than that. If you know your top contributors, that almost certainly means the development teams have the right tooling in place to set architectural goals.


We know there are high quality codebases that succeed without some or even all of these metrics. Extraordinary development teams can achieve extraordinary results even when they don’t have access to the best tools and insights. We’ve been privileged to work with developers and development teams who -- using their expertise and judgment -- can deliver high quality code optimized for the situation at hand.

That said, codebases that do meet or exceed all three of these indicators are almost guaranteed to be high quality. Transparency and active management of team contributions, the process by which code is committed, and the code quality itself, substantially increase the likelihood that code will be the best that it can be.

  • We’d love to talk more about how a quality codebase could be as easy as one, two, three. Sema is offering a no-cost evaluation of your code against these three leading indicators.
