Home GADGETS Does GitHub Copilot improve code quality? Here’s what the data says

GADGETS

Does GitHub Copilot improve code quality? Here’s what the data says

November 19, 2024

AI has fundamentally changed software development in the two years since GitHub Copilot was released to the public. In that time, GitHub Copilot has helped developers code up to 55% faster. Prior research also showed that 85% of developers felt more confident in their code and 88% felt more in the flow using GitHub Copilot.

But the question remains: is the quality of code written using GitHub Copilot objectively better or worse?

To answer this, we conducted a randomized controlled trial to understand how functional, readable, reliable, maintainable, conciseand likely to be approved code authored using GitHub Copilot is.

In the study, we recruited 202 developers with at least five years of experience. Half were randomly assigned GitHub Copilot access and the other half were instructed not to use any AI tools. The participants were all asked to complete a coding task writing API endpoints for a web server. We then evaluated the code with unit tests and with an expert review conducted by developers.

Our findings overall show that the quality of code authored with GitHub Copilot has increased functionality, improved readability, better quality, and higher approval rates.

Key findings from this study:

Increased functionality: developers with GitHub Copilot access had a 56% greater likelihood of passing all 10 unit tests in the study, indicating that GitHub Copilot helps developers write more functional code by a wide margin.
Improved readability: in blind reviews, code written with GitHub Copilot had significantly fewer code readability errors, allowing developers to write 13.6% more lines of code, on average, without encountering readability problems.
Overall better quality code: readability improved by 3.62%, reliability by 2.94%, maintainability by 2.47%, and conciseness by 4.16%. All numbers were statistically significant. These quality improvements were consistent with those found in the 2024 DORA Report.
Higher approval rates: developers were 5% more likely to approve code written with GitHub Copilot, meaning that such code is ready to be merged sooner, speeding up the time to fix bugs or deploy new features.

Here’s a deep dive into what we found:

Code written using GitHub Copilot was more functional

If code doesn’t work, it’s impossible to say it’s high quality. So, we looked at functionality and measured it by analyzing how many unit tests the code passed. We found that code authored using GitHub Copilot passed significantly more tests (p=0.002). In fact, the developers with GitHub Copilot access had a 56% greater likelihood to pass all 10 unit tests in the study (p=0.04). This means that using GitHub Copilot helps developers write code that is more functional by a wide margin.

Developers found code written using GitHub Copilot easier to read

The 25 developers who authored code that passed all 10 unit tests from the first phase of the study were randomly assigned to do a blind review of the anonymized submissions, both those written with and without GitHub Copilot. What the reviewers found was that code authored using GitHub Copilot had fewer code readability errors.

Our analysis of the developers’ line-by-line code review showed that code written using GitHub Copilot had significantly fewer code errors: developers using GitHub Copilot wrote 18.2 lines of code per code error, but only 16.0 without. That equals 13.6% more lines of code with GitHub Copilot on average without a code error (p=0.002). This can translate into real-time savings, as each of these code errors require action from a developer. For example, without using GitHub Copilot, teams may face up to 13% more comments or suggestions to address, which accumulates over time.

	Mean # of code errors	Mean lines of code	Avg lines of code per code error	% difference
Using GitHub Copilot	4.63	84.3	18.2	13.6%
Not using GitHub Copilot	5.35	85.7	16.0	-11.9%

Furthermore, the differences developers found weren’t limited to the errors per line of code. They also rated the code authored using GitHub Copilot as more readable, reliable, maintainable, and concise by 1-3% (p=0.003, p=0.01, p=0.041, p=0.002, respectively). While these differences were small, they were statistically significant and do contribute to a better codebase.

Dependent Variable	Mean difference	P-value
Readable	3.62%	0.003
Reliable	2.94%	0.01
Maintainable	2.47%	0.041
Concise	4.16%	0.002

Code authored using GitHub Copilot was more likely to be approved

Finally, we found that developers were 5% more likely to approve code authored using GitHub Copilot, too (p=0.014). In real-world settings, this means that developers using GitHub Copilot write code that’s ready to be merged sooner, which will speed up the time to fix bugs or deploy new features.

The bottom line

So, what do these findings say about how GitHub Copilot improves code quality? While the number of commits and lines of code changed was significantly higher for the GitHub Copilot group, the average commit size was slightly smaller. This suggests that GitHub Copilot enabled developers to iterate on the code to improve its quality. Our hypothesis is that because developers spent less time making their code functional, they were able to focus more on refining its quality. This aligns with our previous findings that developers felt more confident using GitHub Copilot. It also demonstrates that with the greater confidence GitHub Copilot gave them, they were likely empowered to iterate without the fear of causing errors in the code.

As the first controlled study to examine GitHub Copilot’s impact on code quality, it shows that GitHub Copilot aids in writing high quality code. We hypothesize that other studies might not have found an improvement in code quality with GitHub Copilot, not because of the tool itself, but because developers may have lacked the opportunity or incentive to focus on quality. This data builds on our previous research indicating that GitHub Copilot is a powerful product that helps developers code quicker and increase job satisfaction, as well as empowers teams to move fast and supercharge their creativity and innovation.

Here, in the GitHub Customer Research team, we’re constantly conducting new research on the efficacy of our products as we work to be the home of 1 billion developers—so, stay tuned for more insights and developments in the near future.

Methodology

In the first phase of the study, we recruited 243 developers with at least five years of Python experience. They were randomly assigned to either use GitHub Copilot or not. Each group completed a coding exercise for a web server of fictional restaurant reviews, with 10 unit tests to assess functionality. We received valid submissions from 202 developers: 104 with GitHub Copilot and 98 without.

In the second phase, developers were randomly assigned submissions to review using a provided rubric. They were blind to whether the code was authored with GitHub Copilot. Each submission was reviewed by at least 10 different participants, resulting in 1,293 reviews. The developers used the rubric to provide a line-by-line review that focused on identifying code errors. They also provided an overall evaluation of the submission for readability, reliability, maintainability, and conciseness, and if the submission should be approved. For additional questions about the methodology, please reach out to press@github.com

How do we define a code error?

In this study, we defined code errors as any code that reduces the ability for the code to be easily understood. This did not include functional errors that would prevent the code from operating as intended, but instead errors that represent poor coding practices. These code errors were derived from academic literature on code readability^{and code complexity^{. The code errors were used in the rubric provided during the code review. They included: inconsistent naming, unclear identifiers, excessive line length, excessive whitespace, missing documentation, repeated code, excessive branching or loop depth, insufficient separation of functionality, and variable complexity.}}

Acknowledgments: we’d like to thank Lizzie Redford, Ph.D. and Sida Peng, Ph.D. for their support on the study design and statistical analysis of this research.

Ready to build high-quality code quickly?
Get started with GitHub Copilot >

Notes

Written by

Source link

Does GitHub Copilot improve code quality? Here’s what the data says

Code written using GitHub Copilot was more functional

Developers found code written using GitHub Copilot easier to read

Code authored using GitHub Copilot was more likely to be approved

The bottom line

Methodology

How do we define a code error?

Notes

Written by

EDITOR PICKS

Telangana Dy CM travels by TGSRTC bus, gets direct feedback from passengers on flagship...

All commercial buildings, hospitals to be brought under the purview of Fire Services Act

'Pawan Kalyan Cinema Hero, Nenu Political Hero'

Production Of Tesla Cars For India Commenced

Get This Sleek Marshall Emberton II Bluetooth Speaker at Its Cheapest Price Yet of...

Big March storm fans wildfires and kills 3 as forecasters fear weekend tornado outbreak

Venkat Prabhu speaks on The GOAT’s plot similarity with Rajadurai | Latest Telugu cinema...

Salman Khan’s ‘Sikandar’ reviews are mixed to negative

Maruti EVX, Tata Curvv – 2 New Electric SUVs Launching In 2024

Car Retail Sales April 2025

Kia Cars at 2025 Bharat Mobility Show (Auto Expo)

India poised to lead global creative economy through AI-powered innovation

Jagan’s speech at Raptapdu is a bundle of lies, says Pattabhi

NASA’s Webb Space Telescope shows off its most detailed image of Uranus yet, revealing...

EVEN MORE NEWS

Churn, baby, churn. AT&T’s CEO and CFO reveal what’s ailing the...

IND vs ENG | ‘The No.3 conundrum is back’: Former cricketer...

Divyanshi Bhowmick, Sathiyan-Akash shine as India bags 11 medals at WTT...

POPULAR CATEGORY