Pierre Lalanne

Engineering Manager at GitGuardian
👉
TL;DR: Detecting generic credentials is essential for comprehensive secrets security. These credentials account for nearly half of all exposed secrets and often lack clear context or automated validation. GitGuardian’s advanced detection engine combines broad pattern recognition with precision filtering to minimize false positives and alert fatigue, ensuring high recall and robust protection against both known and unknown credential risks in complex enterprise environments.

As you probably know, we are, at our core, a company specializing in detecting secrets (if you don’t know what a secret is, please take a moment here and come back).
Very early on, we had to address the question: what would be a good way to categorize secrets?

Take a look at this:

AWS_ACCESS_KEY_ID = AKIAX52MPYOTPRUCRC22
AWS_SECRET_ACCESS_KEY = hjshnk5ex5u34565d4654HJKGjhz545d89sjkjak

and this:

connect_to_db(host="136.12.43.86", port=8130, username="root", password="m42ploz2wd")

You can spot the difference: the first one is tied to a well-identified service, AWS, while for the second, things are a bit blurrier. We immediately understand that it has to do with a database connection but, without further context, it doesn’t tell us what doors it opens.

That's the most basic distinction we can make when scanning source code: some secrets are specific, since they are somehow self-revealing, while others are said to be generic, because we cannot be so sure what they give access to.

In this article, we explore why detecting generic credentials is an absolute must-have for a secrets detection engine. We also explain how we addressed this topic at GitGuardian, and share some insights from our findings.

Specific detection has advantages but is not sufficient

Detecting specific credentials has at least two big advantages over detecting generic ones.

First of all, we are often able to test the validity of specific credentials, which can give us 100% confidence in the secret’s validity. This ensures a very good precision overall.

Second, they are often associated with very well known patterns, sometimes even prefixed ones, or at least a very specific context. This means a very good recall is easy to obtain.
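To make the idea of a prefixed pattern concrete, here is a minimal sketch of what a specific detector can rely on. It uses the publicly known shape of AWS access key IDs (the `AKIA` prefix followed by 16 uppercase letters or digits). The function name is ours and purely illustrative; this is not GitGuardian's actual detector.

```python
import re

# AWS access key IDs for long-term credentials start with "AKIA" followed by
# 16 uppercase letters or digits (20 characters in total).
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_aws_access_key_ids(text: str) -> list:
    """Return every substring that looks like an AWS access key ID."""
    return AWS_ACCESS_KEY_ID.findall(text)


sample = "AWS_ACCESS_KEY_ID = AKIAX52MPYOTPRUCRC22"
print(find_aws_access_key_ids(sample))
```

A pattern this narrow rarely matches anything else, which is why recall and precision are comparatively easy to obtain for specific credentials.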

But don’t be fooled: being “the easy part of the game” doesn’t mean that they are any less valuable. In the end, the user can be provided with very detailed information about the exposed secret, the risks incurred, and the correct way to revoke the credentials. That’s what we strive to do in our public documentation for the 300+ detectors we cover.

Although we have drastically improved our average time to develop a new specific detector, shrinking it from 2 days to 2 hours, scaling this list is not easy. The reason is straightforward: the number of API providers is growing at a very fast rate.


But what about the other category? As said earlier, some of the credentials we find are simply not linkable to any particular service: think about contextless passwords, combinations of usernames and passwords for an internal service, or just an API key with a very generic name. We estimate that almost half of the secrets we find belong to this category.

💡
You may be wondering what happens if some credentials are detected by both generic detectors and specific detectors. In that particular case, GitGuardian always gives priority to the specific detector for the reasons we listed above. Note that this overlap is not an issue. It is rather a sign that our generic detection performs well and can act as a failsafe if something goes wrong with the specific detector concerned.
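The priority rule can be sketched in a few lines. The `Finding` type, the detector names and the deduplication by exact value are assumptions made for illustration only; the article does not describe how the engine implements this internally.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    detector: str   # hypothetical detector name
    value: str      # the matched secret value
    specific: bool  # True when produced by a provider-specific detector


def prioritize_specific(findings: list) -> list:
    """Drop generic findings whose value was also reported by a specific detector."""
    specific_values = {f.value for f in findings if f.specific}
    return [f for f in findings if f.specific or f.value not in specific_values]


findings = [
    Finding("generic_high_entropy", "AKIAX52MPYOTPRUCRC22", specific=False),
    Finding("aws_access_key", "AKIAX52MPYOTPRUCRC22", specific=True),
    Finding("generic_password", "m42ploz2wd", specific=False),
]
for finding in prioritize_specific(findings):
    print(finding.detector, finding.value)
```

Only the generic finding that duplicates the AWS key is dropped; the database password, which no specific detector covers, is kept.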

You get where this is going: if a secrets detection engine wants to achieve the best possible precision AND recall, it needs a tailored and powerful detection for generic credentials.

How it’s done at GitGuardian

Why generic detection is not so easy to do…

As the name suggests, when looking for generic credentials, the contextual information we are looking for is… generic. Narrowing down candidates is therefore a bit more complicated. For instance, targeting all the password keywords is obviously not as effective a filter as targeting files containing aws and client_secret.

Generic credentials detection difficulty results from 3 factors. First, they are widely different, being made from very broad patterns: charset and length can be almost anything. Second, how they are supposed to be used is also unknown. Third, even when the credential is clearly identified, we have no way to check its validity.

By the way, a quick reminder on the importance of having both a good precision and a good recall. Take for example this valid, generic, secret:

# Define variables
apikey = as.NbtuEaorueoFu435n&stau

We could certainly catch this one by filtering for all the random-looking strings in our engine (namely, high entropy strings). But we would also certainly catch a lot of random strings that are not secrets (think UUIDs, hashes...), ruining our precision rate. So entropy alone is not a sufficient criterion if we want to limit noise and save engineers from alert fatigue.
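A quick way to see the problem is to compute the Shannon entropy of a few strings. The sketch below is a standard per-character entropy calculation, not GitGuardian's scoring function, and the three sample strings are taken from this article.

```python
import math
from collections import Counter


def shannon_entropy(value: str) -> float:
    """Shannon entropy of the string, in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


candidates = {
    "api key": "as.NbtuEaorueoFu435n&stau",
    "uuid": "afe005ae-e4fa-4ec5-919a-93c32fd8268f",
    "placeholder": "changeme",
}
for label, value in candidates.items():
    print(f"{label}: {shannon_entropy(value):.2f} bits/char")
```

The API key and the UUID land close to each other, well above the placeholder, so a simple entropy threshold that catches the key would flag the UUID too.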

On the other hand, if the engine only targets very specific assignments like apikey = abc, we would miss a lot of generic credentials that are valid secrets, resulting in low recall. Worse, we never know for sure what the proportion of missed secrets is (i.e. the rate of false negatives). For the user, it means a low level of confidence in the tool.
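As a reminder of how the two metrics are computed, here is a small sketch with the usual definitions. The counts are made up to illustrate the two extremes described above; they are not measurements.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of reported secrets that are real secrets."""
    reported = true_positives + false_positives
    return true_positives / reported if reported else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of real secrets that were actually reported."""
    real = true_positives + false_negatives
    return true_positives / real if real else 0.0


# Made-up scenario: 100 real secrets are hidden in a codebase.
# Entropy-only engine: catches 90 of them but also raises 910 false alarms.
print("entropy only:", precision(90, 910), recall(90, 10))
# Narrow "apikey = ..." engine: no false alarms, but only 20 secrets caught.
print("narrow match:", precision(20, 0), recall(20, 80))
```

In this invented scenario the first engine has high recall and very poor precision, while the second has perfect precision and poor recall. Neither is usable on its own.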

Generic detection is a real challenge that requires techniques of its own. At GitGuardian, our approach is twofold: first, the idea is to maximize recall and avoid blind spots by looking for very broad assignments in source code. Second, we want to have powerful tools to sort the results and discard false positives efficiently, so as to guarantee high precision and avoid alert fatigue.

GitGuardian’s arsenal and tools

As mentioned earlier, the first important part of our approach is to detect a wide variety of assignments in source code. To do so, we came up with a wide variety of possible assignments inspired by many languages. Here are some tricky examples that we can detect:

'password': [mAEapzCoNVpwrCz6ErRvOZm0B7g]
pass -> "mAEapzCoNVpwrCz6ErRvOZm0B7g"
{"passwd": "mAEapzCoNVpwrCz6ErRvOZm0B7g"}
<config name="password"><value>mAEapzCoNVpwrCz6ErRvOZm0B7g</value>
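To give an intuition of what "broad assignment" matching can look like, here is a deliberately simplified sketch: a single regular expression that looks for a sensitive-looking name, some "glue" (quotes, brackets, operators or an XML `<value>` tag), then a candidate value. The keyword list, the glue characters and the minimum length of 8 are all assumptions of ours; the real engine is certainly more elaborate.

```python
import re

# Hypothetical, deliberately small keyword list.
SENSITIVE_NAME = r"\w*(?:password|passwd|pass|secret|token|api_?key)\w*"

# "Glue" between the name and the value: whitespace, quotes, brackets,
# assignment operators (=, :, ->, =>) or an XML <value> tag.
GLUE = r"(?:<value>|[\s\"'=:>\-\[({])+"

# Candidate value: at least 8 characters commonly found in secrets.
VALUE = r"[A-Za-z0-9+/_.&-]{8,}"

BROAD_ASSIGNMENT = re.compile(
    r"(?P<name>" + SENSITIVE_NAME + r")" + GLUE + r"(?P<value>" + VALUE + r")",
    re.IGNORECASE,
)


def find_candidates(text: str) -> list:
    """Return (name, value) pairs for every broad assignment found in text."""
    return [(m.group("name"), m.group("value")) for m in BROAD_ASSIGNMENT.finditer(text)]


samples = [
    "'password': [mAEapzCoNVpwrCz6ErRvOZm0B7g]",
    'pass -> "mAEapzCoNVpwrCz6ErRvOZm0B7g"',
    '{"passwd": "mAEapzCoNVpwrCz6ErRvOZm0B7g"}',
    '<config name="password"><value>mAEapzCoNVpwrCz6ErRvOZm0B7g</value>',
]
for sample in samples:
    print(find_candidates(sample))
```

A pattern this permissive is expected to produce many false positives, which is exactly why the filtering steps matter.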

Having this capability significantly improves our recall. Then an important part of our work is to discard false positives as early as possible in the process. At GitGuardian, we designed a wide arsenal of post validation steps to decide whether a secret should be processed any further or not. Here are details about some of these so-called post-validators.

ContextWindowPostValidator:
This post validator bans irrelevant matches based on contextual information. For instance, we consider that a match that contains pubkey in its close context can safely be discarded.

set_pubkey(key="mAEapzCoNVpwrCz6ErRvOZm0B7g")
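A context-window check of this kind could be sketched as follows. Only the `pubkey` keyword comes from the article; the function name, the 40-character window and the second banned word are assumptions for illustration.

```python
# "pubkey" is from the article; "public_key" and the window size are assumptions.
BANNED_CONTEXT_WORDS = ("pubkey", "public_key")
WINDOW = 40  # characters inspected on each side of the match


def passes_context_window(document: str, start: int, end: int,
                          banned_words=BANNED_CONTEXT_WORDS,
                          window: int = WINDOW) -> bool:
    """Return False when a banned word appears close to document[start:end]."""
    before = document[max(0, start - window):start]
    after = document[end:end + window]
    context = (before + " " + after).lower()
    return not any(word in context for word in banned_words)


line = 'set_pubkey(key="mAEapzCoNVpwrCz6ErRvOZm0B7g")'
value = "mAEapzCoNVpwrCz6ErRvOZm0B7g"
start = line.index(value)
print(passes_context_window(line, start, start + len(value)))
```

Here the candidate is rejected because `pubkey` sits a few characters before the value.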

CommonValuesBanlist:
This PostValidator leverages dynamic banlists that are computed and adjusted according to the live monitoring of GitHub. More specifically, we are looking for example keys, or patterns that are so common that we consider them as invalid secrets. Here are some simple examples:

placeholder
example
passphrase
changeme

And many other common values for passwords, usernames or high entropy values.
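One plausible way to build such a list, shown here as a sketch only, is to ban values that show up in many unrelated repositories. The `(repository, value)` input shape and the threshold of 3 repositories are hypothetical; the article does not detail how the live banlists are actually computed.

```python
from collections import defaultdict


def build_banlist(observations, min_repos: int = 3) -> set:
    """Ban values seen in at least `min_repos` distinct repositories.

    `observations` is an iterable of (repository, value) pairs.
    """
    repos_by_value = defaultdict(set)
    for repo, value in observations:
        repos_by_value[value.lower()].add(repo)
    return {v for v, repos in repos_by_value.items() if len(repos) >= min_repos}


observations = [
    ("repo-a", "changeme"),
    ("repo-b", "changeme"),
    ("repo-c", "ChangeMe"),
    ("repo-a", "mAEapzCoNVpwrCz6ErRvOZm0B7g"),
]
banlist = build_banlist(observations)
print(banlist)
print("changeme" in banlist, "mAEapzCoNVpwrCz6ErRvOZm0B7g".lower() in banlist)
```

The intuition is that a real secret should be rare, whereas a value repeated across many unrelated projects is almost certainly an example or a default.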

AssignmentBanlistPostValidator:
That’s a very powerful and unique feature of GitGuardian’s secrets detection engine. For each language, we are able to identify the variable to which a secret was assigned, if it exists. We can then ban some patterns in the assignment variable. For instance all assigned variables containing “uuid” suggest that the value matched is not a secret but an identifier.

ORDER_TOKEN_UUID = 'afe005ae-e4fa-4ec5-919a-93c32fd8268f'
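Below is a minimal sketch of the idea, restricted to simple `name = value` lines. The real feature identifies the assigned variable per language, which this toy regular expression does not attempt; the extra banned fragments besides `uuid` are assumptions.

```python
import re
from typing import Optional

# "uuid" comes from the article; the other fragments are hypothetical.
BANNED_NAME_PARTS = ("uuid", "guid", "checksum")

# Toy parser: only handles lines of the form `name = value`.
SIMPLE_ASSIGNMENT = re.compile(r"^\s*(?P<name>[A-Za-z_]\w*)\s*=(?!=)")


def assigned_variable(line: str) -> Optional[str]:
    """Return the variable on the left-hand side of a simple assignment, if any."""
    match = SIMPLE_ASSIGNMENT.match(line)
    return match.group("name") if match else None


def passes_assignment_banlist(line: str) -> bool:
    """Return False when the assigned variable name contains a banned fragment."""
    name = assigned_variable(line)
    if name is None:
        return True  # no variable identified, nothing to ban
    return not any(part in name.lower() for part in BANNED_NAME_PARTS)


print(passes_assignment_banlist("ORDER_TOKEN_UUID = 'afe005ae-e4fa-4ec5-919a-93c32fd8268f'"))
print(passes_assignment_banlist("ORDER_TOKEN = 'afe005ae-e4fa-4ec5-919a-93c32fd8268f'"))
```

The same value is rejected in the first line and kept in the second: the decision rests on the variable name, not on the value.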

Key Figures and Insights

At the end of the day, GitGuardian has developed more than ten generic detectors, scoring between 85% and 95% for precision according to our benchmarks. These are the most common:

Type                              % of Generic Secrets
Generic Password                  ~20%
Generic Database Assignment       ~20%
Generic High Entropy Secret       >50%
Generic Username/Password         3%
Generic Company Email/Password    1%

Overall, generic detectors account for 45.4% of all the secrets we detect. This means that any secrets detection solution that does not implement generic detection algorithms misses nearly half of the secrets out there.

Another interesting metric: close to 25% of secrets found by specific detectors would also have been found by generic detectors. This indicates that generic detectors are a very good fallback in case a specific detector behaves badly or simply does not exist yet.

Finally, our efforts to improve generic detection brought up interesting side-effects that we were able to exploit:

  • Detecting pattern drift:
    When we detect that a specific detector yields fewer credentials over time and, in the meantime, we witness the appearance of generic credentials with the name of the concerned provider in their context, we can conclude that a change occurred in the pattern for this provider. This has proven very useful to stay constantly up to date with vendors’ changes.
  • Detecting new candidates for specific detectors:
    If all of a sudden a lot of generic credentials mention a given word in their context, we can conclude that this corresponds to a new API provider gaining notoriety. We even have an internal tool to infer the pattern for the concerned credentials and be up to date with developers’ practices as fast as possible.
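The pattern drift signal described in the first bullet can be sketched as a comparison between two time periods. The `PeriodCounts` structure and both thresholds are hypothetical; they only illustrate the reasoning, not the internal tooling.

```python
from dataclasses import dataclass


@dataclass
class PeriodCounts:
    specific_hits: int            # secrets found by the provider's specific detector
    generic_hits_mentioning: int  # generic secrets whose context names the provider


def looks_like_pattern_drift(previous: PeriodCounts, current: PeriodCounts,
                             drop_ratio: float = 0.5,
                             rise_ratio: float = 2.0) -> bool:
    """Flag a provider when specific hits fall while generic hits naming it rise."""
    if previous.specific_hits == 0:
        return False  # nothing to compare against
    specific_dropped = current.specific_hits <= previous.specific_hits * drop_ratio
    baseline = max(1, previous.generic_hits_mentioning)
    generic_rose = current.generic_hits_mentioning >= baseline * rise_ratio
    return specific_dropped and generic_rose


print(looks_like_pattern_drift(PeriodCounts(1000, 50), PeriodCounts(300, 400)))
print(looks_like_pattern_drift(PeriodCounts(1000, 50), PeriodCounts(950, 60)))
```

The first comparison is flagged because both conditions hold at once; the second is not, since the counts barely moved. Requiring both signals avoids flagging a provider whose usage is simply declining.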

Managing Generic Credentials in Enterprise Environments

Enterprise environments face unique challenges when dealing with generic credentials across their infrastructure. Unlike specific service credentials that can be automatically rotated through APIs, generic credentials often require manual intervention and careful tracking to maintain security posture.

Organizations typically encounter generic credentials in configuration files, internal service connections, and legacy system integrations where modern authentication protocols aren't supported. These credentials for internal services, database connections, and custom applications represent a significant portion of an organization's credential landscape—often accounting for nearly half of all discovered secrets in enterprise codebases.

Best practices for managing generic credentials include implementing regular rotation schedules, using credential vaults for centralized storage, and establishing clear naming conventions to improve identification. Since generic credentials cannot be automatically validated like their service-specific counterparts, organizations must rely on contextual analysis and monitoring to detect potential compromises. This includes tracking usage patterns, implementing access logging, and maintaining detailed inventories of where generic credentials are deployed across the infrastructure.

The Security Implications of Generic Credential Detection

The inability to validate generic credentials presents unique security challenges that organizations must address through comprehensive detection strategies. Unlike API keys that can be tested against their respective services, generic credentials require sophisticated analysis to determine their legitimacy and potential impact if compromised.

Generic credentials often appear in contexts that make them difficult to distinguish from legitimate data structures, such as configuration hashes, UUIDs, or encoded identifiers. This ambiguity creates a higher risk of false positives in automated detection systems, potentially leading to alert fatigue among security teams. However, the alternative—missing valid generic credentials—poses an even greater risk to organizational security.

The security impact of exposed generic credentials can be substantial, as they often provide access to internal systems, databases, or services that may not have additional layers of protection. Without the ability to immediately revoke or rotate these credentials through automated APIs, organizations face extended exposure windows when generic credentials are discovered in public repositories or compromised systems. This makes proactive detection and rapid response capabilities essential for maintaining security posture when dealing with generic credentials in enterprise environments.

Conclusion

Implementing solid generic detection capabilities significantly improves recall while keeping a very good precision. It is therefore a huge competitive advantage compared to other tools. What’s more, generic detection offers some serenity for our customers: we may not have a specific detector targeting this very special kind of secret but, in most cases, our generic detectors have our customers’ backs, and we keep getting better at detecting generic credentials.

FAQ

What are generic credentials and how do they differ from service-specific credentials?

Generic credentials are authentication secrets that lack clear association with a specific service or provider. Unlike service-specific credentials (e.g., AWS keys), generic credentials often appear as passwords, API keys, or tokens in code without identifiable context, making them harder to validate and manage. This distinction impacts detection, remediation, and overall security posture.

Why is detecting generic credentials critical for enterprise security teams?

Detecting generic credentials is essential because they constitute nearly half of all secrets found in enterprise codebases. Their ambiguous nature makes them difficult to identify and remediate, increasing the risk of unauthorized access to internal systems, databases, or custom services if exposed.

How does GitGuardian improve precision and recall when detecting generic credentials?

GitGuardian employs broad assignment detection across multiple languages to maximize recall, then applies advanced post-validation techniques—such as contextual analysis, dynamic banlists, and assignment variable filtering—to minimize false positives and maintain high precision, reducing alert fatigue for security teams.

What are the main challenges in managing generic credentials within large organizations?

Managing generic credentials is challenging due to their lack of automated validation, inconsistent naming, and frequent manual handling. Organizations must implement regular rotation, centralized vault storage, and robust monitoring to track usage and detect potential compromise, as automated revocation is rarely possible.

How do generic credentials differ from Windows credentials in enterprise environments?

Windows credentials are validated via domain protocols and integrated authentication, while generic credentials serve as a catch-all for secrets not tied to Windows authentication. Generic credentials are typically used by applications directly and lack built-in verification, making them unsuitable for scenarios requiring Windows security context.

What security risks are associated with undetected generic credentials?

Undetected generic credentials can provide persistent access to critical internal systems or databases, often without layered protections or automated revocation. Their ambiguous format increases the risk of false negatives, potentially leaving organizations exposed to breaches or compliance violations if not proactively detected and managed.