👉 TL;DR: The ChatGPT security risk extends to your organization even if you don’t officially use it. Employees may leak sensitive code or secrets through personal AI accounts, while AI-generated code can introduce vulnerabilities or expose credentials. Weak access controls, model poisoning, and insecure third-party integrations amplify risk. Proactive governance, developer education, and robust secrets detection are essential to mitigate these emerging threats.

Your organization may not use ChatGPT, and may even have banned it outright. That doesn't mean you have no exposure to the security risks that come with it. This post looks at why ChatGPT should be part of your threat landscape.

ChatGPT came into 2023 with ferocious force, but AI and large language models are not all that new; before I was writing articles about security and ChatGPT, I was writing them about GitHub Copilot. There is no doubt that large language model tools like ChatGPT are here to stay. But as more and more software engineers use ChatGPT, what can we expect? More specifically, what are the security risks associated with it? In this article, we discuss why ChatGPT needs to be considered part of your organization's attack surface even if you don't use it, and break down some of the biggest security concerns right now.

Why ChatGPT needs to be part of your threat landscape

Before delving deeper into ChatGPT, let's take a moment to discuss the security concerns surrounding GitHub, as there are similarities between the two platforms. GitHub is used by nearly 100 million developers; it hosts their side projects, lets them contribute to open-source projects, helps them educate themselves, and of course, showcases their portfolios. However, GitHub is also a place where sensitive information can easily be leaked. The GitGuardian 2023 State of Secrets Sprawl report revealed that over 10 million secrets, such as API keys and credentials, were exposed in public repositories in 2022 alone.

Many of these secrets actually belonged to organizations but were leaked through personal or unconnected accounts. For instance, Toyota doesn't use GitHub itself, yet a contractor accidentally leaked database credentials associated with a Toyota mobile application in a public GitHub repository.

This raises concerns about ChatGPT and other LLM tools because, like GitHub, even if your organization isn't using ChatGPT, your employees almost certainly are. Within the developer community, there's a palpable fear of falling behind if you don't use such tools to boost productivity. And again like GitHub, we have limited control over what employees share with ChatGPT, so there is a high chance that sensitive information ends up stored on the platform, potentially resulting in a leak.

In a recent report, data security service Cyberhaven detected and blocked requests to input data into ChatGPT from 4.2% of the 1.6 million workers at its client companies because of the risk of leaking confidential information, client data, source code, or regulated information to the LLM.

Source: Dark Reading

One of the best metrics we have on which tools developers are using is, ironically, the number of secrets leaked on public GitHub. According to the GitGuardian 2023 State of Secrets Sprawl report, leaked OpenAI API keys saw a massive increase towards the end of 2022, along with mentions of ChatGPT on GitHub, showing a clear trend of developers actually using these tools.

Graph from State of Secrets Sprawl report for 2023

The risk of leaking data

Wherever there is source code, there are secrets, and ChatGPT is often used as a sort of code assistant or co-writer. While there have already been instances where ChatGPT experienced data breaches, such as accidentally exposing query history to unrelated users, the more concerning issue is sensitive information being stored in ways that are completely inappropriate and insecure for its level of sensitivity.

Storing and sharing sensitive data like secrets should always be done with a heightened level of security, including strong encryption, tight access control, and granular logs showing where, when, and by whom the data was accessed. ChatGPT, however, is not designed to handle sensitive information: it lacks encryption, strict access control, and access logs. This is similar to git repositories, where sensitive files often end up despite the lack of sufficient security controls. The result is sensitive information sitting in an unencrypted database that is likely to be a hot target for attackers.
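For contrast, here is a minimal sketch of the kind of at-rest protection a secret normally gets, using the Fernet primitive from the Python cryptography package. The key handling and placeholder values are illustrative only; the point is that none of this applies to a prompt sitting in a plain chat history.

```python
# Minimal sketch of at-rest protection for a secret, the kind of control
# a chat history does not have. Requires the "cryptography" package.
from cryptography.fernet import Fernet

# In practice the key itself lives in a KMS or secrets manager,
# never alongside the data it protects (illustrative only).
encryption_key = Fernet.generate_key()
vault = Fernet(encryption_key)

api_key = b"AKIAXXXXXXXXXXXXXXXX"       # placeholder secret
ciphertext = vault.encrypt(api_key)      # what actually gets written to storage

# Decryption is gated by access to the key, and every access can be logged.
assert vault.decrypt(ciphertext) == api_key
```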

In particular, personal ChatGPT accounts, which employees may use to avoid detection at work, have weaker security and keep a complete history of every query and piece of code entered into the tool. This could be a treasure trove of sensitive information for attackers, posing a significant risk to organizations regardless of whether they allow or use ChatGPT in their day-to-day operations.

Enterprise ChatGPT Security Policies and Governance

Organizations face significant challenges in managing ChatGPT cyber security risks across their workforce, particularly when employees use personal accounts or unauthorized AI tools. The lack of centralized governance creates blind spots in security monitoring and incident response capabilities. Enterprise security teams must establish comprehensive AI usage policies that define acceptable use cases, data classification guidelines, and mandatory security controls for any AI interaction.

Effective ChatGPT governance requires implementing data loss prevention (DLP) solutions that can detect and block sensitive information before it reaches AI platforms. Organizations should deploy network monitoring tools to identify unauthorized AI tool usage and establish clear consequences for policy violations. Security teams must also create incident response procedures specifically for AI-related data exposure events, including immediate containment protocols and forensic investigation capabilities.
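To make the DLP idea concrete, here is a minimal sketch of an egress check that scans a prompt for obvious secret patterns before it is ever sent to an AI API. The pattern list and function names are illustrative assumptions, not a production rule set; real DLP tooling is far more thorough.

```python
# Illustrative sketch of a DLP-style egress check: scan a prompt for
# obvious secret patterns before it leaves the network for an AI platform.
import re

SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[0-9A-Za-z]{36}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def check_prompt(prompt: str) -> list[str]:
    """Return the names of any secret patterns found in the prompt."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(prompt)]

prompt = "Why does boto3 reject AKIAIOSFODNN7EXAMPLE with this error?"
findings = check_prompt(prompt)
if findings:
    # Block or redact before the request reaches the AI platform.
    print(f"Prompt blocked, possible secrets detected: {', '.join(findings)}")
```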

The challenge extends beyond technical controls to organizational culture and training. Security awareness programs must educate employees about the specific risks of sharing proprietary code, customer data, or confidential business information with AI systems. Regular security assessments should evaluate the effectiveness of AI governance policies and identify gaps in coverage across different business units and development teams.

Tricking ChatGPT

So far we have talked about leaking sensitive data to ChatGPT; data leaking out of ChatGPT is also a big concern. ChatGPT will refuse to hand over sensitive information if you ask for it directly, responding with a generic cookie-cutter answer. But ChatGPT is remarkably easy to trick. In the example below, we asked it to give us some AWS credentials and it refused, but once we fluffed out the query to make it seem less malicious, we got the answer we wanted.

Asking ChatGPT for a credential 
Asking ChatGPT for example credentials

About half of the tokens it provided as examples contained the word EXAMPLEKEY, but about half didn't. This raises the question of where these keys are coming from. All of them fit the AWS format, including character set, length, and entropy (excluding the example text). Does ChatGPT understand how these keys are constructed, or is it altering keys it found in its dataset? More likely the latter. ChatGPT uses the Common Crawl dataset, a publicly available web corpus that contains over a trillion words of text from a diverse range of sources on the internet. That dataset includes source code from public repositories on GitHub, and as we know, GitHub contains a huge amount of sensitive information. We also know that when GitHub first released Copilot, it was possible to get it to suggest real API keys and credentials. Since ChatGPT changes its answers significantly depending on how you ask something, it could be that asking the right questions gets it to spit out sensitive information from the Common Crawl dataset.
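As a rough illustration of the checks behind that observation, the sketch below tests whether a string matches the AWS access key ID format and measures its character-level entropy. The regex and the example value (AWS's own documentation key) are assumptions for illustration, not a detection product.

```python
# Rough sketch: does a string match the AWS access key ID format, and does
# it carry the kind of entropy a randomly generated key would?
import math
import re
from collections import Counter

AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, from observed character frequencies."""
    counts = Counter(s)
    return -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())

candidate = "AKIAIOSFODNN7EXAMPLE"  # AWS's own documentation example key
print(bool(AWS_ACCESS_KEY_ID.search(candidate)))  # True: format matches
print(round(shannon_entropy(candidate), 2))       # ~3.9 bits per character here
```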

Model Poisoning and Supply Chain Attacks

ChatGPT data security risks extend beyond direct user interactions to include sophisticated supply chain attacks targeting the AI model itself. Threat actors can potentially influence ChatGPT's training data through coordinated campaigns that inject malicious content into publicly available datasets. This model poisoning technique can cause the AI to generate biased, incorrect, or deliberately harmful responses that appear legitimate to unsuspecting users.

The software development community faces particular vulnerability through code repositories and documentation that may be compromised with malicious examples. When developers rely on ChatGPT-generated code suggestions, they risk incorporating security vulnerabilities or backdoors that were intentionally embedded in the model's training data. These supply chain attacks are especially dangerous because they can affect thousands of applications simultaneously through widely-adopted code patterns.

Organizations must implement code review processes that specifically scrutinize AI-generated content for potential security flaws. Security teams should establish baseline security scanning for all code, regardless of its source, and maintain updated threat intelligence about known model poisoning campaigns. Regular security assessments of AI-assisted development workflows help identify potential exposure points where malicious code suggestions could infiltrate production systems.

ChatGPT isn’t a very good software engineer

The other security concern with ChatGPT is one I discussed back when Copilot was supposedly stealing all the jobs, before ChatGPT arrived. Digging into some research, I discovered the concept of AI bias: that is, we trust AI much more than we should. It's like a friend who is very confident in their answers, so you assume they are right until you finally discover they are an idiot who likes to talk a lot (in walks ChatGPT). ChatGPT often gives you example code that is completely insecure and, unlike forums like Stack Overflow, there is no community to warn you against it. For example, when you ask it to write code to connect to AWS, it hardcodes credentials instead of handling them securely, for instance with environment variables.

ChatGPT hardcoding secrets

The problem is that many developers will trust the solution their confident AI friend gives them, not understanding that it is insecure or why it is insecure.
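For comparison, here is a sketch of the safer pattern the article alludes to: letting boto3 resolve credentials from the environment or an IAM role rather than from the source file. The bucket name and environment variable are placeholders, not real resources.

```python
# Contrast with the hardcoded example above: let boto3 (the official AWS SDK
# for Python) resolve credentials instead of embedding them in code.
import os
import boto3

# Anti-pattern ChatGPT tends to produce: credentials pasted into source.
# s3 = boto3.client("s3", aws_access_key_id="AKIA...", aws_secret_access_key="...")

# Safer: no credentials in code. boto3's default chain checks environment
# variables (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY), shared config files,
# and IAM roles, so secrets stay out of the repository and out of prompts.
s3 = boto3.client("s3")
for obj in s3.list_objects_v2(Bucket=os.environ["EXAMPLE_BUCKET"]).get("Contents", []):
    print(obj["Key"])
```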

Third-Party Integration Security Vulnerabilities

The proliferation of ChatGPT integrations and plugins creates additional attack vectors that organizations must address as part of their security risk assessment. Third-party applications that connect to ChatGPT often lack the same security controls as the core platform, creating potential data leakage paths through API connections, browser extensions, and mobile applications. These integrations frequently request broad permissions that can expose sensitive organizational data beyond what users intend to share.

Security teams must evaluate each ChatGPT integration for its data handling practices, encryption standards, and access control mechanisms. Many third-party tools store conversation histories in less secure environments or transmit data through unencrypted channels, amplifying the security risks of ChatGPT usage. Organizations should maintain an inventory of approved AI integrations and implement technical controls to prevent unauthorized tool installations.

The challenge becomes more complex when considering shadow IT scenarios where employees install ChatGPT browser extensions or mobile apps without IT approval. These unauthorized integrations often bypass corporate security controls and can exfiltrate sensitive data through personal accounts or unsecured cloud services. Regular security audits should include discovery of unauthorized AI tools and assessment of their potential impact on organizational data security.

What can we do?

AI will get better, but it is only as good as the data it is trained on. Models are trained on large datasets that are not always of good quality, which means they may not be able to distinguish between good and bad source code, and it is why they offer poor coding practices as examples.

Educate developers - it's important to educate developers about the limitations of AI tools like ChatGPT. Instead of banning them, we should show our developers why the suggestions can be insecure and confront them with their AI biases. By doing so, we can help them understand the limitations of the technology.

Identify and secure secrets - to prevent sensitive information from being leaked via ChatGPT, it's also important to identify secrets and reduce secrets sprawl. This means scanning repositories and networks for secrets, centralizing them in a secrets manager, and enforcing strict access control and rotation policies (see the sketch after this list). By doing so, we reduce the likelihood that a secret ends up in ChatGPT's history.

Embrace it because it is here to stay - Finally, we should not resist the AI revolution, but instead, embrace it with caution. While leaked credentials are a legitimate concern, AI can also be a valuable tool if we understand its purpose and limitations. By taking steps to educate ourselves and secure our data, we can leverage the benefits of AI without compromising security.
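As a sketch of the "identify and secure secrets" step above, the snippet below pulls a credential from AWS Secrets Manager at runtime so it never lives in code, configuration files, or a ChatGPT prompt. The secret name is a placeholder, and any managed secrets store (Vault, GCP Secret Manager, Azure Key Vault) would serve the same role.

```python
# Sketch of centralizing secrets in a managed store (AWS Secrets Manager via
# boto3 shown here as one example backend; the secret name is a placeholder).
import boto3

def get_database_password(secret_id: str = "prod/app/db-password") -> str:
    """Fetch a secret at runtime so it never lives in code, config, or a prompt."""
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return response["SecretString"]

# Rotation then becomes a policy on the manager rather than a code change,
# and every access is logged centrally (e.g. via CloudTrail).
```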

FAQ

Why is ChatGPT a security risk for organizations even if it is not officially used?

ChatGPT poses a security risk because employees may use personal accounts or unauthorized tools, potentially sharing sensitive code, credentials, or proprietary data. These interactions can lead to data leakage, as ChatGPT lacks enterprise-grade security controls and auditability, expanding your organization's attack surface regardless of official adoption.

How can secrets and credentials be exposed through ChatGPT usage?

Developers often use ChatGPT as a code assistant, sometimes pasting sensitive information like API keys or credentials into prompts. ChatGPT stores these interactions, and if compromised, this data could be accessed by unauthorized parties. Additionally, AI-generated code may inadvertently include hardcoded secrets, further increasing exposure.

What governance strategies should enterprises implement to mitigate ChatGPT security risks?

Organizations should establish clear AI usage policies, enforce data classification standards, and deploy DLP solutions that block sensitive data from reaching AI platforms. Regular security awareness training, incident response planning for AI-related events, and monitoring for unauthorized AI tool usage are essential for effective governance.

How do third-party ChatGPT integrations increase security risks?

Third-party integrations and plugins often lack robust security controls, potentially exposing sensitive data via insecure APIs, browser extensions, or mobile apps. These integrations may store conversation histories insecurely or transmit data unencrypted, creating additional vectors for data leakage and compliance violations.

What is model poisoning and how does it relate to ChatGPT security risk?

Model poisoning is a supply chain attack where adversaries inject malicious content into the AI's training data. For ChatGPT, this can result in the model generating insecure code or biased outputs. Developers relying on AI-generated suggestions risk introducing vulnerabilities or backdoors into production systems.

How can organizations detect and prevent secrets sprawl related to ChatGPT?

Organizations should implement automated secrets detection across code repositories and communication channels, centralize secrets management, and enforce strict access controls and rotation policies. Regular scanning and remediation reduce the likelihood of secrets being leaked via ChatGPT or other AI tools.

What are the best practices for secure use of AI tools like ChatGPT in software development?

Best practices include educating developers on AI limitations and risks, enforcing secure coding standards, reviewing all AI-generated code for vulnerabilities, and restricting AI tool usage to approved, monitored environments. Continuous assessment of AI-assisted workflows helps maintain a strong security posture.