How to measure the productivity impact of using coding assistants
AI-powered software development has transformed the way software is conceived, developed, tested, and managed. AI-assisted tools enable developers to explore innovative ideas and receive intelligent suggestions for new, improved, or refactored code.
Following the launch of ChatGPT in November 2022, Microsoft introduced its AI Copilot in November 2023. Other tools, such as Tabnine, Amazon CodeWhisperer (now part of Amazon Q Developer), OpenAI Codex, and Oracle Code Assist, are also available, and the list continues to grow.
This article focuses on using GitHub Copilot to understand its impact on engineering productivity, though the same principles can apply to all coding assistants.
The evaluation process
New tools often spark debates about their pros and cons, and Copilot has attracted particular attention. While generating code with AI tools is straightforward, evaluating that code across various parameters is complex. There are no standard criteria or metrics for measuring productivity, since it varies by person, team, and project.
New AI tools frequently promise reduced workloads and improved productivity. Some believe AI tools will make developers obsolete, while others claim developers without AI tools are significantly less productive. To move beyond speculation, Scalefocus conducted an experiment over seven sprints (four months) with three agile teams, tracking key metrics:
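| Metric | Work item | Size | Without Copilot | With Copilot |
|---|---|---|---|---|
| LOC per story point (100 LOC/story point) | Bugs | | 9.72 | 29.37 |
| LOC per story point (100 LOC/story point) | Tasks | | 79.89 | 36.56 |
| Development time (hours) | Bugs | Small | 3.21 | 2.97 |
| Development time (hours) | Bugs | Medium | 2.60 | 2.90 |
| Development time (hours) | Tasks | Small | 5.33 | 8.0 |
| Development time (hours) | Tasks | Medium | 5.60 | 4.44 |
| Time working on unit tests (hours) | Bugs | Small | 0.43 | 1.20 |
| Time working on unit tests (hours) | Bugs | Medium | 1.50 | 1.00 |
| Time working on unit tests (hours) | Tasks | Small | 1.25 | 1.33 |
| Time working on unit tests (hours) | Tasks | Medium | 3.00 | 2.67 |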
| Metric | Without Copilot | With Copilot |
|---|---|---|
| Average LOC per hour worked | 12-13 | 9 |
| Average vulnerabilities | 0 | Increased by 0.25% |
| Average duplicate blocks | 1.85 | 0.38 |
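This data shows that Copilot's impact on team productivity is mixed. With Copilot, average duplicate blocks fell from 1.85 to 0.38 and development time for medium tasks fell from 5.60 to 4.44 hours, but average LOC per hour worked dropped from 12-13 to 9, small tasks took longer (8.0 vs. 5.33 hours), and vulnerabilities rose by 0.25%. Where it helps, Copilot speeds up development by generating repetitive code blocks and suggesting best practices, and by handling syntax and naming conventions it can reduce both development and code review time.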
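The percentage changes behind these comparisons are easy to compute. The Python sketch below applies the usual relative-change formula, (with - without) / without × 100, to a few rows from the tables above. Two assumptions: the "12-13" LOC-per-hour range is represented by its midpoint, 12.5, and the vulnerabilities row is left out because a relative change from a baseline of 0 is undefined.

```python
# Values copied from the tables above as (without Copilot, with Copilot).
# ASSUMPTION: the "12-13" LOC-per-hour range is represented by its midpoint, 12.5.
RESULTS = {
    "Development time, medium tasks (hours)": (5.60, 4.44),
    "Development time, small tasks (hours)": (5.33, 8.0),
    "LOC per hour worked": (12.5, 9.0),
    "Duplicate blocks": (1.85, 0.38),
}


def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    if before == 0:
        # e.g. the vulnerabilities row: no meaningful percentage from a zero baseline
        raise ValueError("baseline is zero; relative change is undefined")
    return (after - before) / before * 100.0


for metric, (without, with_copilot) in RESULTS.items():
    print(f"{metric}: {relative_change(without, with_copilot):+.1f}%")
```

A negative sign is not automatically good or bad: fewer duplicate blocks and fewer development hours are improvements, while fewer LOC per hour worked is harder to interpret on its own.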
Quantitative Metrics
No dedicated tool exists for measuring developer productivity with Copilot, but several methods can help, depending on the project and team. Common techniques include:
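- Lines of Code (LOC) Metrics: LOC alone is a poor measure of productivity because it ignores code complexity and quality, but it can give a rough signal of efficiency and readability. In the experiment above, Copilot-assisted work needed fewer LOC per story point for tasks (36.56 vs. 79.89) but more for bug fixes (29.37 vs. 9.72), suggesting that Copilot performs better on task development while manual effort does better on bug fixing. Because Copilot may also generate unnecessary lines of code, use this metric cautiously.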
source: GitHub
- Commit Size and Frequency Metrics: Tracking the volume and frequency of code changes helps reveal Copilot's impact. Surveys show a 10-11% increase in pull requests after Copilot integration, highlighting its influence on development pace and collaboration.
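Both metrics can be derived from data most teams already collect. The Python sketch below is a minimal illustration under stated assumptions: `WorkItem`, `loc_per_story_point`, and `prs_per_week` are hypothetical names, changed-line counts are assumed to come from something like `git log --numstat`, and PR merge dates are assumed to come from your code-hosting platform. Using windows of equal length before and after a Copilot rollout date keeps the weekly rates comparable.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class WorkItem:
    """A completed ticket (hypothetical structure, not any tracker's schema)."""
    kind: str          # e.g. "bug" or "task"
    story_points: float
    loc_changed: int   # lines added + lines deleted in the linked commits


def loc_per_story_point(items: list[WorkItem], kind: str) -> float:
    """Total changed LOC divided by total story points for one kind of work item."""
    selected = [item for item in items if item.kind == kind]
    points = sum(item.story_points for item in selected)
    if points == 0:
        raise ValueError(f"no story points recorded for {kind!r}")
    return sum(item.loc_changed for item in selected) / points


def prs_per_week(merge_dates: list[date], start: date, end: date) -> float:
    """Average number of PRs merged per week in the half-open window [start, end)."""
    weeks = (end - start).days / 7
    if weeks <= 0:
        raise ValueError("end must be after start")
    merged = sum(1 for d in merge_dates if start <= d < end)
    return merged / weeks


# Illustrative placeholder data, not results from the experiment above.
rollout = date(2024, 3, 1)  # hypothetical Copilot rollout date
merges = [date(2024, 2, 5), date(2024, 2, 20), date(2024, 3, 4), date(2024, 3, 12), date(2024, 3, 20)]
print("before:", prs_per_week(merges, date(2024, 2, 2), rollout))  # 4-week window
print("after: ", prs_per_week(merges, rollout, date(2024, 3, 29)))  # 4-week window
```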
Quantitative assessments are crucial for setting performance benchmarks, tracking progress, and identifying strengths and weaknesses.
Qualitative Assessment
Surveys: Surveys provide insights into developers’ experiences with and without Copilot, covering utility, impact, productivity enhancement, and ease of use.
source: Adevinta Copilot Survey
Code Quality Assessment: Evaluating Copilot-generated code for adherence to standards, readability, maintainability, and reusability helps ensure that it aligns with industry best practices.
source: Faros
Evaluating Copilot-generated code against basic programming principles such as KISS (Keep It Simple, Stupid) and DRY (Don't Repeat Yourself) helps determine whether it is simple, reusable, and of better quality than the IDE's default suggestions.
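One rough, automatable proxy for DRY is to count repeated runs of lines, in the spirit of the "duplicate blocks" metric tracked in the experiment above (the tool that experiment used is not stated). The Python sketch below is a naive illustration, not the algorithm of any particular static-analysis tool: it strips whitespace, ignores blank lines, and reports every window of `min_lines` consecutive lines (a hypothetical parameter, default 4) that appears more than once. Because windows overlap, a long duplicated region is reported several times, so treat the count as a relative signal rather than an absolute one.

```python
from collections import defaultdict


def find_duplicate_blocks(
    files: dict[str, str], min_lines: int = 4
) -> list[list[tuple[str, int]]]:
    """Group identical runs of `min_lines` consecutive non-blank lines across files.

    Lines are compared after stripping surrounding whitespace, so indentation
    differences are ignored. Each returned group lists (filename, 1-based start
    line) for every place the same window occurs.
    """
    windows: dict[tuple[str, ...], list[tuple[str, int]]] = defaultdict(list)
    for name, text in files.items():
        # Keep each non-blank line together with its original line number.
        lines = [
            (number, line.strip())
            for number, line in enumerate(text.splitlines(), start=1)
            if line.strip()
        ]
        # Slide a window of `min_lines` lines over the file.
        for i in range(len(lines) - min_lines + 1):
            window = tuple(content for _, content in lines[i : i + min_lines])
            windows[window].append((name, lines[i][0]))
    return [locations for locations in windows.values() if len(locations) > 1]


# Example: the same four statements appear in two files, one of them indented.
sources = {
    "a.py": "x = 1\ny = 2\nz = x + y\nprint(z)\n",
    "b.py": "def f():\n    x = 1\n    y = 2\n    z = x + y\n    print(z)\n",
}
for locations in find_duplicate_blocks(sources):
    print(locations)  # [('a.py', 1), ('b.py', 2)]
```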
The SPACE Framework
Measuring developer productivity is complex and subjective. The SPACE framework, developed by researchers from GitHub, Microsoft Research, and the University of Victoria, outlines five dimensions:
- Satisfaction and well-being: This data can be very challenging to collect; proxy metrics include staff turnover and absenteeism rates.
- Performance: Focuses on quality outcomes rather than just output quantity.
- Activity: Counts actions and outputs such as test cases, pull requests, and documentation.
- Communication and Collaboration: Measures how people and teams communicate and work together, for example through response rates and collaboration rates.
- Efficiency and Flow: Includes metrics like time to market, defect density, and cost of delay.
Collecting this data both with and without Copilot makes it possible to measure Copilot's impact on productivity.
Source: Devops.com
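In practice, this means recording the same metrics, tagged by SPACE dimension, for a baseline cohort and a Copilot cohort, then comparing them metric by metric. The Python sketch below is one minimal way to structure that; `Measurement`, `compare_cohorts`, and the cohort labels are hypothetical, and the sample values are placeholders rather than real data. It reports raw means instead of better/worse verdicts because the desirable direction depends on the metric: more merged pull requests may be good, while higher defect density is not.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Measurement:
    """One reading of one metric for one team and sprint (hypothetical schema)."""
    dimension: str  # SPACE dimension, e.g. "Activity"
    metric: str     # e.g. "pull requests merged"
    cohort: str     # e.g. "without Copilot" or "with Copilot"
    value: float


def compare_cohorts(
    data: list[Measurement],
    baseline: str = "without Copilot",
    treatment: str = "with Copilot",
) -> dict[tuple[str, str], tuple[float, float]]:
    """Map each (dimension, metric) to (baseline mean, treatment mean).

    Metrics recorded for only one cohort are skipped because they cannot be compared.
    """
    groups: dict[tuple[str, str, str], list[float]] = defaultdict(list)
    for m in data:
        groups[(m.dimension, m.metric, m.cohort)].append(m.value)

    result = {}
    for dimension, metric in sorted({(d, k) for d, k, _ in groups}):
        before = groups.get((dimension, metric, baseline))
        after = groups.get((dimension, metric, treatment))
        if before and after:
            result[(dimension, metric)] = (mean(before), mean(after))
    return result


# Placeholder readings for illustration only.
readings = [
    Measurement("Activity", "pull requests merged", "without Copilot", 10),
    Measurement("Activity", "pull requests merged", "without Copilot", 12),
    Measurement("Activity", "pull requests merged", "with Copilot", 13),
    Measurement("Satisfaction and well-being", "survey score (1-5)", "with Copilot", 4.0),
]
for key, (before, after) in compare_cohorts(readings).items():
    print(key, before, "->", after)
```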
Conclusion
AI coding assistants can boost productivity, improve code quality, and help organizations maintain a competitive edge. However, integrating them requires careful planning: leaders should assess their teams' needs, choose appropriate AI tools, and provide proper training for effective use.
While AI tools automate development tasks, they don’t guarantee perfect code. Data shows a positive impact on productivity, but not for all tasks. By implementing tracking mechanisms, setting clear goals, and regularly reviewing data, teams can maximize the productivity gains from Copilot.