Improving DORA Metrics – Lead Time for Changes
For the past nine years, DevOps Research and Assessment (DORA) has been providing the industry standard for measuring software delivery performance. Their research, published yearly in the Accelerate State of DevOps Report, explores best practices and capabilities to measurably improve an organization’s performance and its employees’ well-being.
The DORA Core model centers around four key metrics of software delivery performance that DORA’s research links to better organizational performance and employee well-being:
- Lead time for changes: How long does it take to go from having the code committed to successfully running in production?
- Deployment frequency: How often is code deployed to production or released to end users?
- Change fail rate: What percentage of changes released to production or released to users result in degraded service (e.g., lead to service impairment or service outage) and subsequently require remediation (e.g., require a hotfix, rollback, fix forward, patch)?
- Time to restore service: How long does it generally take to restore service after a service incident or defect impacts users (e.g., unplanned outage, service impairment)?
This article focuses on “Lead time for changes” and how to improve this metric. The faster a development team can make changes and release them to production, the sooner they can deliver value to customers, run experiments, and receive feedback. In business terms, lead time for changes measures an organization’s ability to be competitive in response to changing market conditions.
Cycle Time and Lead Time for Changes
Before you can start improving lead time for changes, you need to know how to measure it. To do this effectively, you’ll need to know about a closely related metric: cycle time.
While the two are very similar, the main difference is the level of detail at which they look at a process.
Where the lead time for changes measures the time between when the change starts (the first commit) and when the change is deployed to production, cycle time zooms in on the separate steps a change goes through between start and finish.
The cycle time allows you to measure smaller intervals of the process to see with more precision where you can get the most benefit out of improvements. Breaking down potential improvements this way is useful for knowing what practical changes to make since smaller, more focused work is easier to plan and execute than large, sweeping changes.
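To make the definition concrete, here is a minimal Python sketch that computes lead time for changes as the time between a change’s first commit and its production deployment. The sample timestamps are made up, and summarizing with the median is an assumption made for this sketch (a team could just as well track the mean or a percentile).

```python
from datetime import datetime, timedelta
from statistics import median


def lead_time(first_commit_at: datetime, deployed_at: datetime) -> timedelta:
    """Time from the first commit of a change to its production deployment."""
    if deployed_at < first_commit_at:
        raise ValueError("deployment cannot precede the first commit")
    return deployed_at - first_commit_at


def median_lead_time(changes: list[tuple[datetime, datetime]]) -> timedelta:
    """Median lead time over (first_commit_at, deployed_at) pairs."""
    seconds = [lead_time(commit, deploy).total_seconds() for commit, deploy in changes]
    return timedelta(seconds=median(seconds))


# Hypothetical sample data: (first commit, production deployment)
changes = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 8, 9, 0)),   # 7 days
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 9, 9, 0)),   # 5 days
    (datetime(2024, 3, 5, 9, 0), datetime(2024, 3, 14, 9, 0)),  # 9 days
]

print(median_lead_time(changes))  # 7 days, 0:00:00
```

In practice, the timestamps would come from your version control and deployment tooling rather than a hard-coded list.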
Consider a typical example of a software development team responsible for improving and maintaining the company’s website and backend processes. As the lead, you know the team’s lead time for changes is seven days. Your chief information officer (CIO) has tasked you to improve this metric, but without more details, it’s difficult to figure out where to start.
Using cycle time, you break down the process between the first commit and the deployment to production. You determine that a change goes through the following steps:
- Coding: three days
- Waiting for review: one day
- Reviewing: two days
- Merging and deploying: one day
Total: seven days
This more granular view gives you insights into each step in the process and where there’s room for improvement.
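The breakdown above can be sketched in a few lines of Python. The stage names and durations come straight from the example; the `summarize` helper is a hypothetical name used only for illustration.

```python
# Stage durations in days, taken from the example above.
stages = {
    "Coding": 3,
    "Waiting for review": 1,
    "Reviewing": 2,
    "Merging and deploying": 1,
}


def summarize(stages: dict[str, float]) -> tuple[float, str, dict[str, float]]:
    """Return the total lead time, the longest stage, and each stage's share in percent."""
    total = sum(stages.values())
    if total <= 0:
        raise ValueError("total duration must be positive")
    shares = {name: round(100 * days / total, 1) for name, days in stages.items()}
    longest = max(stages, key=stages.get)
    return total, longest, shares


total, longest, shares = summarize(stages)
print(f"Lead time for changes: {total} days")
for name, pct in shares.items():
    print(f"  {name}: {stages[name]} days ({pct}%)")
print(f"Longest stage: {longest}")
```

Here, coding accounts for roughly 43 percent of the seven days. Note that waiting for review and reviewing together take just as long as coding, which is why the sections below treat each step separately.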
How to Improve Lead Time for Changes
Let’s consider how you can measure each step in the process and what some of the tools and best practices are to improve them.
Coding
The very first step for any change is coding or development work.
There are several ways to measure coding time, which differ mostly based on when the measurement is started. When working with a feature branch strategy like Gitflow, work is considered to have started when a feature branch is created and ended when a pull request (PR) is created to merge that branch into the main branch. When using trunk-based development, the moment the first commit is made is the starting point.
Many factors influence coding time, so there are many ways to improve the time spent in this step. Here are a few options:
- Improving requirements quality: High-quality requirements take much of the guesswork out of coding. When developers know what to build, they can focus completely on how to do it in the most efficient way possible—instead of having to constantly switch between coding and figuring out what to do.
- Pair programming: Pair programming is a practice where two developers work on the same code simultaneously. While this seems counterintuitive (after all, they could have been working on twice the number of features), pair programming increases productivity by improving code quality, reducing errors, and enhancing learning and knowledge sharing.
- Using AI-assisted coding tools: AI-assisted coding tools are on the rise, with the most notable example being GitHub Copilot. Offloading repetitive or boilerplate code to an AI assistant can noticeably reduce coding time.
Waiting for Review
Pull requests are a standard way to ensure that code is always reviewed by someone other than the original author (the four-eyes principle). A review helps ensure that code adheres to standards, doesn’t contain errors, and solves the problem in the most efficient way. However, for all these benefits, PRs can also become a bottleneck in the software development process.
For example, the moment a PR is created for a code change, it sits there waiting for a reviewer to pick it up. Oftentimes, specific code requires specific team members, such as an architect or security specialist, to review it. If they are busy with something else, the PR will remain open until they can get to it.
There are a few things you can do to shorten the time between when a PR is created and picked up by a reviewer:
- Add more reviewers: It’s seemingly the simplest solution, provided that more reviewers with the correct skill set are available. There is a gotcha here, though, which leads to the second possible approach.
- Differentiate between PRs: Not all pull requests need the same level of review. Simple changes might require only a single reviewer, while more complex changes might require multiple reviewers. Keep in mind that differentiating can result in a lot of work, but tools like FlexReview can automate most, if not all, of this solution. (See the reviewer suggestions and smart code approval features, for example.)
- Use service level objectives (SLOs): Just like support organizations use service level agreements (SLAs) and SLOs to process customer support requests in a timely manner, you can use SLO management for the PR process, too. With an SLO, the code author and the reviewer agree that a PR will be picked up within a certain amount of time. FlexReview also takes time zone differences, PR sizing, and more into account.
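As a rough illustration of the SLO idea, the sketch below flags PRs that have waited longer than an agreed pickup time. It is not how FlexReview works internally: the `PullRequest` class, the four-hour SLO, and the sample data are all hypothetical, and the sketch deliberately ignores time zones, working hours, and PR size.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PullRequest:
    number: int
    opened_at: datetime
    first_review_at: Optional[datetime] = None  # None while still waiting


def wait_time(pr: PullRequest, now: datetime) -> timedelta:
    """Time a PR waited (or has waited so far) for a reviewer to pick it up."""
    return (pr.first_review_at or now) - pr.opened_at


def slo_breaches(prs: list[PullRequest], slo: timedelta, now: datetime) -> list[PullRequest]:
    """Return the PRs whose pickup wait exceeds the agreed SLO."""
    return [pr for pr in prs if wait_time(pr, now) > slo]


# Hypothetical sample data
now = datetime(2024, 3, 6, 12, 0)
prs = [
    PullRequest(101, datetime(2024, 3, 6, 9, 0), datetime(2024, 3, 6, 10, 0)),  # picked up after 1 hour
    PullRequest(102, datetime(2024, 3, 5, 9, 0)),   # still waiting after 27 hours
    PullRequest(103, datetime(2024, 3, 6, 11, 0)),  # still waiting after 1 hour
]

for pr in slo_breaches(prs, slo=timedelta(hours=4), now=now):
    print(f"PR #{pr.number} has waited {wait_time(pr, now)} (SLO: 4 hours)")
```

A check like this could run on a schedule and notify the team, so breaches surface before they add days to the lead time.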
Reviewing Code
Code review time is the time between when the review starts and when it’s completed, with completion marked by the code being merged into the main branch. A code review can consist of several steps: reviewing the proposed implementation, static code analysis, reviewing adherence to coding standards, security scanning, and more.
The more thorough the code review, the more time this step potentially takes. Fortunately, there are some ways you can speed up the process:
- Automated analysis: Many of the processes during a code review can (and should) be automated. Many analysis tools, like SonarQube, exist to automate static code analysis and security scanning, which removes much of the manual work.
- Smaller PRs: If you’ve changed fewer lines of code, a reviewer can complete their review faster. It’s also much easier to spot mistakes when you’re reviewing only a small change, which results in higher-quality code and less rework. Rework is another factor that increases lead time for changes.
- Assigning the right reviewer to the PR: As mentioned, PRs differ in complexity and subject matter. For example, some might require the attention of a security engineer, while others could be reviewed by a generalist or even a junior developer. FlexReview lets you assign reviewers based on the changes made and designate alternate reviewers who are automatically assigned if someone’s unavailable.
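To illustrate path-based reviewer assignment with alternates, here is a minimal sketch. The rules, reviewer names, and `pick_reviewers` function are hypothetical and do not represent FlexReview’s configuration or API; they only show the general idea of matching changed files to reviewers and falling back when someone is unavailable.

```python
from fnmatch import fnmatchcase

# Hypothetical routing rules: glob pattern -> reviewers in order of preference.
RULES = [
    ("auth/*", ["security-engineer", "senior-backend"]),
    ("infra/*", ["ops-engineer", "senior-backend"]),
    ("*", ["generalist", "junior-dev"]),  # catch-all for everything else
]


def pick_reviewers(changed_files, unavailable=frozenset()):
    """Pick one available reviewer per changed file, falling back to alternates."""
    chosen = []
    for path in changed_files:
        for pattern, candidates in RULES:
            if fnmatchcase(path, pattern):
                # Take the first candidate who is not marked unavailable.
                reviewer = next((c for c in candidates if c not in unavailable), None)
                if reviewer and reviewer not in chosen:
                    chosen.append(reviewer)
                break  # the first matching rule wins for this file
    return chosen


print(pick_reviewers(["auth/login.py", "docs/readme.md"]))
print(pick_reviewers(["auth/login.py"], unavailable={"security-engineer"}))
```

Ordering the rules from most to least specific keeps sensitive paths with specialists while routine changes go to whoever is free.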
Merging and Deploying
Once a PR is completed, the code is merged into the main codebase and deployed to production. This step includes creating a production build, running automated tests, scanning for vulnerabilities, and performing the actual deployment to production, which might include approvals and changes to infrastructure. Here are some ways to speed up this step:
- Batch sizing: Small batch sizes have benefits across the software development process. In merge and deploy, a smaller change leads to less build time, less testing time, and fewer lines of code to scan for vulnerabilities, which speeds up the entire process.
- Build optimization: Most build engines include some sort of caching or incremental builds, which means only the changed code is rebuilt. You can also use parallelism, where you build multiple components simultaneously.
- Feature flags: This software development technique allows specific features or functionality to be turned on and off to hide them from users. Feature flags allow code to be deployed to production without worrying about user impact, which means testing and validation can take place in parallel with further development. Feature flags also let you deploy partial features—for example, only the backend for a new UI—which reduces dependencies between teams and potential bottlenecks.
- Infrastructure as code (IaC): Deploying infrastructure changes requires you to consider downtime, dependencies between components, security, and many more factors. In turn, it requires specialized knowledge, which means an Ops engineer needs to be available. IaC removes many of these issues by automating away the complexities of infrastructure and making it part of the development process. While you still need the expertise of an Ops engineer, they can be included much earlier in the process to guide you in developing the IaC templates. At deployment time, these templates can be automatically applied, which removes the need for an Ops engineer to be available at that moment.
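As a small illustration of the feature flag technique described above, the sketch below reads a flag from an environment variable and uses it to choose between an existing and a new code path. The `FEATURE_` prefix, the `flag_enabled` helper, and the checkout functions are assumptions made for this example; many teams use a dedicated flag service instead of environment variables.

```python
import os


def flag_enabled(name: str, default: bool = False) -> bool:
    """Read a feature flag from an environment variable such as FEATURE_NEW_CHECKOUT."""
    raw = os.environ.get(f"FEATURE_{name.upper()}")
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "on", "yes"}


def legacy_checkout(cart):
    return {"flow": "legacy", "total": sum(cart)}


def new_checkout(cart):
    return {"flow": "new", "total": sum(cart)}


def checkout(cart):
    # The new code path is deployed to production but stays hidden until the flag is on.
    if flag_enabled("new_checkout"):
        return new_checkout(cart)
    return legacy_checkout(cart)


print(checkout([10, 20]))
```

Because the flag defaults to off, merging and deploying the new path doesn’t change what users see until someone explicitly enables it.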
Conclusion
DORA’s lead time for changes measures a development team’s ability to quickly implement and deploy changes. It’s an indicator of an organization’s ability to respond to market changes, which reflects its competitiveness in the market.
This article explained the difference between lead time for changes and cycle time and why you need to measure both correctly to make effective improvements to lead time for changes. It also suggested options for improving these metrics at each step of the software development process.
FlexReview helps you complete code reviews faster, more efficiently, and with less organizational stress. Want to know more about how it can help improve your lead time for changes? Schedule a chat with one of our developers to find out more.