How to Measure Productivity of Coding Assistants

AI-powered software development has transformed the way software is conceived, developed, tested, and managed. AI-assisted tools enable developers to explore innovative ideas and receive intelligent suggestions for new, improved, or refactored code.

ChatGPT launched in November 2022, and Microsoft introduced its AI Copilot in November 2023. Other tools, such as Tabnine, Amazon CodeWhisperer, OpenAI Codex, and Oracle Code Assist, are also available, and the list continues to grow.

This article focuses on using GitHub Copilot to understand its impact on engineering productivity, though the same principles can apply to all coding assistants.

The evaluation process

New tools often spark debates about their pros and cons. Copilot, for example, has garnered significant attention. While generating code with AI tools is straightforward, evaluating it across various parameters is complex. No standard criteria or metrics exist for measuring productivity, because it varies by person, team, and project.

New AI tools frequently promise reduced workloads and improved productivity. Some believe AI tools will make developers obsolete, while others claim developers without AI tools are significantly less productive. To move beyond speculation, Scalefocus conducted an experiment over seven sprints (four months) with three agile teams, tracking key metrics:

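Metric	Work item	Size	Without Copilot	With Copilot
LOC per story point (100 LOC/story point)	Bugs	-	9.72	29.37
LOC per story point (100 LOC/story point)	Tasks	-	79.89	36.56
Hours of development (hours)	Bugs	Small	3.21	2.97
Hours of development (hours)	Bugs	Medium	2.60	2.90
Hours of development (hours)	Tasks	Small	5.33	8.0
Hours of development (hours)	Tasks	Medium	5.60	4.44
Time working on unit tests (hours)	Bugs	Small	0.43	1.20
Time working on unit tests (hours)	Bugs	Medium	1.50	1.00
Time working on unit tests (hours)	Tasks	Small	1.25	1.33
Time working on unit tests (hours)	Tasks	Medium	3.00	2.67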

source: Scalefocus
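To make the hours-of-development rows easier to compare, the relative change between the two columns can be computed directly. The sketch below uses only the figures from the table above; the function and variable names are illustrative, not part of the Scalefocus experiment.

```python
# Hours of development from the table above: (without Copilot, with Copilot).
HOURS_OF_DEVELOPMENT = {
    ("Bugs", "Small"): (3.21, 2.97),
    ("Bugs", "Medium"): (2.60, 2.90),
    ("Tasks", "Small"): (5.33, 8.0),
    ("Tasks", "Medium"): (5.60, 4.44),
}


def percent_change(without: float, with_copilot: float) -> float:
    """Relative change from the baseline, in percent (negative means fewer hours)."""
    return (with_copilot - without) / without * 100.0


def summarize(rows: dict) -> dict:
    """Map each (work item, size) pair to its percent change, rounded to 0.1."""
    return {key: round(percent_change(*values), 1) for key, values in rows.items()}


if __name__ == "__main__":
    for (item, size), change in summarize(HOURS_OF_DEVELOPMENT).items():
        print(f"{item:<5} {size:<6} {change:+.1f}%")
```

With these inputs, small bugs and medium tasks took less time with Copilot, while medium bugs and small tasks took more, so the direction of the change depends on the kind of work.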

Metric	Without Copilot	With Copilot
Average LOC per hour worked	12-13	9
Average vulnerabilities	0	Increased by 0.25%
Average duplicate blocks	1.85	0.38

source: Scalefocus

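These figures point to a measurable but uneven impact on team productivity. Medium-sized tasks took less time with Copilot (4.44 hours versus 5.60) and the average number of duplicate blocks fell from 1.85 to 0.38, while small tasks took longer (8.0 hours versus 5.33). Copilot speeds up development mainly by generating repetitive code blocks and suggesting best practices, and it can shorten both development and code review time by handling syntax and naming conventions.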

Quantitative Metrics

No dedicated tools exist for measuring developer productivity with Copilot. However, other methods can help, and the right choice varies by project and team. Common techniques include:

Lines of Code (LOC) Metrics: LOC alone isn’t ideal for measuring productivity because it ignores code complexity and quality, but it can give a rough indication of efficiency and readability. Copilot performs better on task development, whereas manual work does better on bug fixing. Because Copilot may generate unnecessary lines of code, this metric should be used cautiously.

source: GitHub

Commit Size and Frequency Metrics: Tracking the volume and frequency of code changes helps understand Copilot’s impact. Surveys show a 10-11% increase in pull requests after Copilot integration, highlighting its influence on development pace and collaboration.

Quantitative assessments are crucial for setting performance benchmarks, tracking progress, and identifying strengths and weaknesses.
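As a rough illustration of the commit and pull request frequency metric, the sketch below compares the average number of pull requests merged per week before and after a rollout date. It assumes you have already exported merge dates from your version control platform; the dates, window boundaries, and function names are hypothetical.

```python
from datetime import date, timedelta


def weekly_rate(merge_dates: list[date], start: date, end: date) -> float:
    """Average number of pull requests merged per week in [start, end)."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end must be after start")
    count = sum(1 for d in merge_dates if start <= d < end)
    return count / (days / 7)


def throughput_change(merge_dates: list[date], before_start: date,
                      rollout: date, after_end: date) -> float:
    """Percent change in the weekly merge rate after the rollout date."""
    before = weekly_rate(merge_dates, before_start, rollout)
    after = weekly_rate(merge_dates, rollout, after_end)
    if before == 0:
        raise ValueError("no pull requests in the baseline window")
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Hypothetical data: 20 merges in the four weeks before the rollout,
    # 22 merges in the four weeks after it.
    rollout = date(2024, 1, 29)
    merged = [date(2024, 1, 1) + timedelta(days=i) for i in range(20)]
    merged += [rollout + timedelta(days=i) for i in range(22)]
    change = throughput_change(merged, date(2024, 1, 1), rollout, date(2024, 2, 26))
    print(f"Weekly PR rate changed by {change:+.1f}%")
```

Equal-length windows keep the comparison simple, but holidays, releases, and team changes can distort a before/after view, so a result like this is a starting point for discussion rather than proof of causation.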

Qualitative Assessment

Surveys: Surveys provide insights into developers’ experiences with and without Copilot, covering utility, impact, productivity enhancement, and ease of use.

source: Adevinta Copilot Survey
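Survey answers become easier to compare once they are aggregated per question. The sketch below assumes a 1-5 rating scale and uses the four topics mentioned above as question labels; the scale, the sample responses, and the function name are assumptions for illustration and are not taken from the Adevinta survey.

```python
from collections import defaultdict
from statistics import mean


def average_scores(responses):
    """Average the 1-5 ratings for each question.

    `responses` is an iterable of (question, score) pairs.
    """
    by_question = defaultdict(list)
    for question, score in responses:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range for {question!r}: {score}")
        by_question[question].append(score)
    return {question: round(mean(scores), 2) for question, scores in by_question.items()}


if __name__ == "__main__":
    # Hypothetical responses from three developers.
    sample = [
        ("utility", 4), ("utility", 5), ("utility", 3),
        ("impact", 4), ("impact", 4), ("impact", 2),
        ("productivity enhancement", 5), ("productivity enhancement", 3),
        ("ease of use", 4), ("ease of use", 5),
    ]
    for question, score in average_scores(sample).items():
        print(f"{question}: {score}")
```

Running the same survey before and after adopting Copilot, with identical questions, makes the two sets of averages comparable.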

Code Quality Assessment: Evaluating Copilot-generated code for adherence to standards, readability, maintainability, and reusability ensures alignment with industry best practices.

source: Faros

Evaluating Copilot-generated code against basic programming principles such as KISS (Keep It Simple, Stupid) and DRY (Don’t Repeat Yourself) can help determine whether the code is simple, reusable, and of better quality than the IDE’s default suggestions.
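A DRY check can be approximated by looking for repeated blocks of lines. The sketch below is a deliberately naive detector that flags any run of consecutive non-blank lines appearing more than once; the window size and function name are assumptions, and dedicated static analysis tools use more robust techniques.

```python
from collections import defaultdict


def duplicate_blocks(source: str, window: int = 3) -> dict[tuple[str, ...], list[int]]:
    """Find blocks of `window` consecutive non-blank lines that occur more than once.

    Returns a mapping from each repeated block to the 1-based line numbers
    where it starts in `source`.
    """
    # Strip indentation and drop blank lines, keeping the original line numbers.
    lines = [
        (number, text.strip())
        for number, text in enumerate(source.splitlines(), start=1)
        if text.strip()
    ]
    seen = defaultdict(list)
    for start in range(len(lines) - window + 1):
        chunk = lines[start:start + window]
        block = tuple(text for _, text in chunk)
        seen[block].append(chunk[0][0])
    return {block: starts for block, starts in seen.items() if len(starts) > 1}


if __name__ == "__main__":
    sample = "\n".join([
        "total = 0",
        "for x in items:",
        "    total += x",
        "print(total)",
        "total = 0",
        "for x in items:",
        "    total += x",
    ])
    for block, starts in duplicate_blocks(sample).items():
        print(f"Block repeated at lines {starts}: {block}")
```

Because the windows overlap, a long duplicated region is reported as several blocks, so the raw count overstates duplication. Comparing counts for the same codebase with and without Copilot is more meaningful than reading any single number in isolation.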

The SPACE Framework

Measuring developer productivity is complex and subjective. The SPACE framework, developed by researchers at GitHub, Microsoft Research, and the University of Victoria, outlines five dimensions:

Satisfaction: Collecting this data can be very challenging. Proxy metrics include staff turnover and absenteeism rates.
Performance: Focuses on quality outcomes rather than just output quantity.
Activity: Tracks tasks like test cases, pull requests, and documentation.
Communication and Collaboration: Measures response and collaboration rates.
Efficiency and Flow: Includes metrics like time to market, defect density, and cost of delay.

This data can be collected with and without Copilot to measure its impact on productivity.
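One way to organize such a comparison is to record a single metric per SPACE dimension for each measurement period and then compute the differences. In the sketch below, the chosen metrics, field names, and values are hypothetical examples, not a prescribed set; each team should pick metrics that fit its own context.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SpaceSnapshot:
    """One illustrative metric per SPACE dimension for a measurement period."""
    turnover_rate_pct: float         # Satisfaction (proxy metric)
    escaped_defects: int             # Performance (hypothetical quality outcome)
    pull_requests: int               # Activity
    review_response_rate_pct: float  # Communication and collaboration
    defect_density: float            # Efficiency and flow (defects per 1,000 LOC)


def compare(baseline: SpaceSnapshot, with_assistant: SpaceSnapshot) -> dict:
    """Return the change in each metric (with assistant minus baseline)."""
    before = asdict(baseline)
    after = asdict(with_assistant)
    return {name: round(after[name] - before[name], 2) for name in before}


if __name__ == "__main__":
    # Hypothetical values for two periods of equal length.
    baseline = SpaceSnapshot(8.0, 12, 40, 70.0, 1.5)
    trial = SpaceSnapshot(8.0, 10, 44, 75.0, 1.25)
    for metric, delta in compare(baseline, trial).items():
        print(f"{metric}: {delta:+}")
```

Whether a positive or negative delta is an improvement depends on the metric, so the result should be read one dimension at a time rather than collapsed into a single score.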

Source: Devops.com

Conclusion

AI coding assistants can boost productivity, enhance code quality, and help teams maintain a competitive edge. However, careful planning is essential for their integration. Leaders should assess their teams’ needs, choose appropriate AI tools, and provide proper training for effective use.

While AI tools automate development tasks, they don’t guarantee perfect code. Data shows a positive impact on productivity, but not for all tasks. By implementing tracking mechanisms, setting clear goals, and regularly reviewing data, teams can maximize the productivity gains from Copilot.

How to measure the productivity impact of using coding assistants

Anonymous
