June 2021 presented an opportunity for me to live out my dream of actually being a “good” developer. June was the month GitHub Copilot was introduced. Yes! I thought, no more embarrassing code reviews, no more endless scrolling through Stack Overflow. I Will Be A Genius!
While that didn’t end up being completely true, I still held out hope that it would one day become true with some improvements to the AI engine behind Copilot. Day 1 of BlackHat 2022 has officially shut the door on that expectation.
My highlight of the day (and not just because it was presented by two fellow New Zealanders, Hammond Pearce and Benjamin Tan) was a session on whether GitHub Copilot introduces vulnerable code suggestions.
We are at a security conference, so obviously you expect the answer to be “yes, it does”. But the results of the experiment were much more fascinating than expected.
Pearce and Tan also teamed up with fellow security researchers Brendan Dolan-Gavitt and Baleegh Ahmad to write a paper on how Copilot introduces insecure code. I will not paraphrase the entire paper but instead focus on an element I found particularly interesting, or perhaps disappointing.

Write crappy code, get a crappy Copilot

Hammond Pearce (left) and Benjamin Tan (right) on stage at BlackHat 25 - source: BlackHat 25
The research project
The presentation put Copilot forward as the potential end of the Stack Overflow age, where we get the answers we seek directly from our IDE using an AI code-suggestion tool, without scrolling through hours of Stack Overflow posts. This introduces a big problem, though: automation bias, the idea that we trust automated code far more than we should.
When we look at Stack Overflow, we see a stream of comments and suggestions, and we understand they were written by humans, flawed fellow humans. But an AI tool is supposedly far more intelligent, right? Actually, it is also based on code written by fellow humans, so probably not. And it isn’t that intelligent. Copilot has a huge database of code that has already been written, and it uses that database to make its best guess at what it thinks you want. But that isn’t a database of good code, or secure code; it is just a huge amount of code, and Copilot will serve up whatever it thinks you want. In this case, artificial intelligence is closer to artificial guessing (even if that guess is still based on some logic).
“Humans generally have this bias towards accepting without thinking anything that comes from algorithm or automation” - Benjamin Tan
The research was simple in its design: take a GitHub product, Copilot, give it some seed code, and let it run wild by taking multiple code suggestions. Then take another GitHub product, CodeQL, and test how vulnerable the resulting code is. The researchers didn’t just take the top suggestion; they took many options and created many different versions of the same application to see how many were vulnerable. From 89 scenarios they created a total of 1,689 programs written mostly by Copilot. They used different languages to see if there was a difference (there was), and even used different methods to try to trick Copilot.
GitHub Copilot suggestions have vulnerabilities - study
“Out of 89 scenarios we created 1689 programs – 39.33% of the top suggestions were vulnerable and 40.73% of total suggestions were vulnerable” - Hammond Pearce
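To make the setup concrete, here is a minimal sketch of what that kind of generate-then-scan pipeline could look like, assuming the standard CodeQL CLI is installed. The directory layout and the query pack are illustrative choices of mine, not details taken from the paper.

```python
import subprocess
from pathlib import Path

# Illustrative layout: one directory per Copilot-completed program.
GENERATED = Path("generated_programs")

def scan_with_codeql(program_dir: Path) -> Path:
    """Build a CodeQL database for one generated program and run the security queries."""
    db = program_dir / "codeql-db"
    results = program_dir / "results.sarif"

    # Turn the generated source code into a CodeQL database.
    subprocess.run(
        ["codeql", "database", "create", str(db),
         "--language=python", f"--source-root={program_dir}"],
        check=True,
    )
    # Run the standard Python security queries and write SARIF results.
    subprocess.run(
        ["codeql", "database", "analyze", str(db),
         "codeql/python-queries",
         "--format=sarif-latest", f"--output={results}"],
        check=True,
    )
    return results

if __name__ == "__main__":
    for program in sorted(GENERATED.iterdir()):
        if program.is_dir():
            print(f"{program.name}: results written to {scan_with_codeql(program)}")
```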
You might think that Copilot takes your seed code, figures out the intent behind it, and gives you the best next step toward the ultimate solution. But as we said, artificial intelligence is more like artificial guessing. Copilot isn’t concerned with the intent of your code; it is concerned with the next step based on the data it has. When you feed Copilot seed code written the way a less experienced developer would write it, it gives you back a result a less experienced developer might come up with themselves. The results got pretty weird, too. By changing the author field, the researchers tricked Copilot into thinking the code was being written by a well-known and experienced developer, in this case Andrey Petrov, and guess what? It produced less vulnerable code than when the author was someone less well known, Hammond Pearce… Whatttttt……
Even choosing between spaces and tabs (not that I want to unpack that can of worms here) changed whether the output code was likely to be vulnerable.
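To picture what “crappy seed code, crappy suggestion” looks like in practice, here is a hand-written illustration of my own (not an example taken from the paper). A prompt that already glues user input into a SQL string tends to attract a string-concatenated, injectable completion; the parameterized version is what you would hope a more careful developer, or assistant, would produce.

```python
import sqlite3

def get_user_insecure(db: sqlite3.Connection, username: str):
    # The kind of completion a sloppy prompt tends to attract:
    # user input is pasted straight into the SQL string (CWE-89, SQL injection).
    query = f"SELECT * FROM users WHERE username = '{username}'"
    return db.execute(query).fetchall()

def get_user_secure(db: sqlite3.Connection, username: str):
    # The completion you would hope for: a parameterized query,
    # so the database driver handles quoting and escaping.
    return db.execute(
        "SELECT * FROM users WHERE username = ?", (username,)
    ).fetchall()
```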
The essence of this presentation was that if you are a shitty (novice) developer (like me), then you are going to get a shitty (novice) Copilot. It isn’t going to make you the experienced super dev you dreamed of becoming when it was first announced; it will mostly help you write shitty code faster. But this doesn’t mean Copilot is inherently bad, or that we shouldn’t use it.
“Large language models like GitHub Copilot will probably transform the way we write software” - Benjamin Tan
The Evolution of Copilot Vulnerabilities: From Research to Real-World Exploits
While the original BlackHat 2022 research highlighted fundamental issues with GitHub Copilot's code quality, the security landscape has evolved dramatically. Recent discoveries have revealed more sophisticated attack vectors that go beyond simple vulnerable code generation. The "Rules File Backdoor" vulnerability, disclosed by security researchers in 2025, demonstrates how attackers can weaponize AI configuration files to inject malicious instructions into seemingly innocent project settings.
This evolution represents a shift from passive vulnerabilities (where Copilot generates insecure code) to active exploitation vectors (where attackers manipulate Copilot's behavior). The CVE-2025-53773 vulnerability showed how prompt injection techniques could enable remote code execution by manipulating Copilot's configuration files. These attacks leverage invisible Unicode characters and sophisticated evasion techniques that bypass traditional code review processes, making them particularly dangerous for enterprise environments where multiple developers collaborate on shared codebases.
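Because these payloads depend on characters that most editors simply do not render, a sensible first line of defense is making them visible. Below is a rough sketch, entirely my own and not any vendor’s detection logic, of how you could sweep rules and configuration files for zero-width and Unicode “tag” characters before trusting them.

```python
import sys
import unicodedata
from pathlib import Path

# Zero-width characters plus the Unicode "tag" block (U+E0000-U+E007F) are
# common ways to hide instructions in otherwise innocent-looking files.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def hidden_chars(text: str):
    """Yield (offset, character, Unicode name) for characters that render as nothing."""
    for i, ch in enumerate(text):
        if ch in ZERO_WIDTH or 0xE0000 <= ord(ch) <= 0xE007F:
            yield i, ch, unicodedata.name(ch, "UNKNOWN")

def scan(paths) -> int:
    findings = 0
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for offset, ch, name in hidden_chars(text):
            findings += 1
            print(f"{path}: offset {offset}: U+{ord(ch):04X} {name}")
    return findings

if __name__ == "__main__":
    # Example: python scan_hidden.py .github/copilot-instructions.md
    sys.exit(1 if scan(sys.argv[1:]) else 0)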
What does GitHub say about security vulnerabilities?
GitHub does acknowledge that Copilot can introduce security vulnerabilities and that it should be complemented with other security tools, specifically other GitHub tools of course.

Enterprise Detection and Monitoring Strategies for AI-Generated Code
Organizations adopting GitHub Copilot face unique challenges in detecting and monitoring security issues across AI-generated code. Unlike traditional static analysis that focuses on known vulnerability patterns, AI-generated code requires behavioral analysis to identify anomalous suggestions and potential security risks. Enterprise security teams need comprehensive visibility into how AI coding assistants interact with their development workflows.
Effective monitoring strategies should include real-time analysis of Copilot suggestions before code acceptance, automated scanning of AI-generated commits, and integration with existing security systems. GitGuardian's secrets detection platform provides this capability by scanning both human-written and AI-generated code for hardcoded credentials, API keys, and other sensitive data. The platform's pre-commit hooks and IDE integrations ensure that secrets are caught before they can be incorporated into Copilot's context or training data, preventing the amplification of secrets sprawl through AI assistance.
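As a rough illustration of what catching secrets “before the commit” means, here is a deliberately simplified pre-commit-style check written by hand for this post. A real engine like GitGuardian’s uses hundreds of specific detectors, contextual analysis, and validity checks, so treat the two patterns below as placeholders.

```python
import re
import subprocess
import sys

# Two illustrative patterns only; production secrets detection relies on
# far more detectors plus contextual and validity checks.
PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "GitHub personal access token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
}

def staged_files():
    """List files staged for the current commit (added, copied, or modified)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def main() -> int:
    findings = 0
    for path in staged_files():
        try:
            text = open(path, encoding="utf-8", errors="ignore").read()
        except OSError:
            continue
        for label, pattern in PATTERNS.items():
            if pattern.search(text):
                findings += 1
                print(f"Possible {label} in {path}")
    # A non-zero exit code makes Git abort the commit when run as a pre-commit hook.
    return 1 if findings else 0

if __name__ == "__main__":
    sys.exit(main())
```

Dropped into .git/hooks/pre-commit (or wired up through a hook manager), a check like this runs on every commit, which is exactly the point: the secret never reaches the repository, and never becomes context an AI assistant can echo back.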
Compliance and Governance Frameworks for AI Coding Assistants
The rapid adoption of GitHub Copilot in enterprise environments has created new compliance challenges that traditional security frameworks haven't fully addressed. Organizations must now consider how AI-generated code impacts their security posture, audit trails, and regulatory compliance requirements. This is particularly critical for industries subject to strict data protection regulations like GDPR, HIPAA, or SOC 2 compliance.
Establishing governance frameworks for AI coding assistants requires clear policies around data handling, code review processes, and accountability measures. Organizations need to define who is responsible for validating AI-generated code, how to maintain audit trails for compliance purposes, and what controls prevent sensitive data from being exposed through AI interactions. GitGuardian's incident management capabilities help organizations instantly neutralize exposed secrets and maintain compliance by providing detailed logs of secrets detection events, automated remediation workflows, and comprehensive reporting that supports audit requirements. These governance measures become essential as AI coding tools integrate deeper into development workflows and handle increasingly sensitive codebases.
Should you use Copilot - Conclusion and recommendations
I do want to stress that there were plenty of findings in this research, and the full paper can be found here. But this part I found fascinating and also surprisingly logical. Copilot and other tools are not trying to unpack the logic of what you are aiming to achieve in your code; they are looking for similar code so they can give you similar suggestions. The danger comes in two forms:
1) You automatically trust automation, and therefore Copilot, more than you should.
2) Unlike Stack Overflow, Copilot gives you isolated suggestions. Stack Overflow has upvotes, downvotes, and comments, and when you decide to take code written by someone else, it's a process. Copilot is just a reflex.
Fortunately, I was able to ask Tan a couple of questions after the presentation. One of them was: what would his three recommendations be for using Copilot?
Be smart and be careful when using this tool
1. Make sure you have good processes in place for general security validation - don't treat Copilot differently from human developers
2. Think a little about who in your team should be able to use this; be mindful that novice developers might blindly trust it
3. Really treat it as a co-pilot, something that will help you along; don't let it direct your code
I will leave you with one line from Pearce that sums up whether or not Copilot should be used.
“Copilot should remain a CO-Pilot” - Hammond Pearce
FAQ
How do GitHub Copilot security vulnerabilities typically arise in enterprise environments?
GitHub Copilot security vulnerabilities often stem from its reliance on vast, uncurated codebases for training. When Copilot receives poorly structured or insecure seed code, it tends to generate similarly vulnerable code. Additionally, novel attack vectors like prompt injection and manipulation of configuration files have emerged, increasing the risk of introducing exploitable flaws into shared codebases.
What monitoring strategies are effective for detecting AI-generated code risks?
Effective monitoring includes real-time analysis of Copilot suggestions, automated scanning of AI-generated commits, and integration with SIEM systems. Platforms like GitGuardian provide pre-commit hooks and IDE integrations to detect secrets and vulnerabilities before code is merged, ensuring both human and AI-generated code are scrutinized for security risks.
How should organizations govern the use of AI coding assistants like Copilot to maintain compliance?
Organizations should establish governance frameworks that define code review processes, data handling policies, and accountability for AI-generated code. Maintaining detailed audit trails, restricting Copilot usage to experienced developers, and leveraging automated secrets detection are key steps to ensure compliance with regulations such as GDPR, HIPAA, or SOC 2.
Can Copilot-generated code introduce hard-to-detect vulnerabilities beyond insecure code patterns?
Yes, recent research highlights that attackers can exploit Copilot through configuration file manipulation and prompt injection, leading to vulnerabilities like remote code execution. These attacks may use invisible Unicode characters or sophisticated evasion techniques, making them difficult to detect with traditional static analysis tools.
What are best practices for mitigating GitHub Copilot security vulnerabilities in large development teams?
Best practices include enforcing robust security validation processes, treating Copilot suggestions with the same scrutiny as human code, limiting Copilot access for novice developers, and integrating automated secrets detection tools. Continuous education on AI risks and maintaining strong code review protocols are essential for minimizing exposure.
How can secrets detection platforms like GitGuardian help manage Copilot-related risks?
Secrets detection platforms automatically scan both human and AI-generated code for hardcoded credentials, API keys, and sensitive data. By integrating with developer workflows, these tools prevent secrets from being committed or included in Copilot's context, reducing the risk of secrets sprawl and supporting incident response and compliance efforts.