In a recent webinar, software engineer Allan Reyes tackled one of security's most persistent challenges: preventing sensitive data from appearing in application logs. As Allan pointed out, this problem is both annoying and difficult, a rare combination that makes it particularly frustrating for security teams.

The Problem

There are some security problems that are just annoying, and some that are difficult. Keeping secrets out of logs is both. It's not the most critical security issue you'll face, but it might be among the most frustrating.

What makes it worse is the full spectrum of potential impact, from API keys you can quickly rotate to personally identifiable information (PII) whose exposure might require customer notification. As Allan noted, even tech giants aren't immune: Twitter, Google, and Facebook have all had public incidents involving passwords or other sensitive data in logs.

The most frustrating aspect? Logging can bypass all your other security controls. You could implement military-grade encryption, zero-trust architecture, and robust access controls, but if sensitive data ends up in your logs, much of that work is undermined.

Common Causes

During his research, Allan identified six common causes of secrets appearing in logs:

🤦 Direct logging. The most obvious cause: developers explicitly logging sensitive data. This might be accidental (debug code that made it to production) or come from developers who simply don't know better.

🚰 Kitchen sinks. Large objects containing sensitive data nested deep within them. When these objects are logged in their entirety, secrets come along for the ride. Error objects with request configurations are common culprits.
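To make that failure mode concrete, here is a minimal Python sketch; the `UpstreamError` class and the `sk_live_` token shape are hypothetical, invented for illustration:

```python
# Hypothetical upstream-API error that drags its request config along.
class UpstreamError(Exception):
    def __init__(self, message: str, request_config: dict):
        super().__init__(message)
        self.request_config = request_config

    def __repr__(self) -> str:
        # The "kitchen sink": repr() serializes everything, secrets included.
        return f"UpstreamError({self.args[0]!r}, config={self.request_config!r})"

err = UpstreamError(
    "502 from billing API",
    {"url": "https://billing.example/charge",
     "headers": {"Authorization": "Bearer sk_live_abc123"}},
)

print(f"request failed: {err!r}")        # leaks the bearer token
print(f"request failed: {err.args[0]} "  # logs only vetted fields
      f"(url={err.request_config['url']})")
```

The second `print` is safe because each logged field was chosen individually; nothing unknown rides along.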

⚙️ Configuration changes. Modifications to log levels or middleware that expose data never intended for logging. A simple change from WARN to DEBUG can suddenly reveal sensitive information throughout your application.

🥧 Embedded secrets. Secrets baked into formats like URLs or RPCs that get logged automatically by infrastructure. Magic login links are a perfect example—they're not logged explicitly, but appear in HTTP logs.

📡 Telemetry. Error monitoring and analytics platforms capture runtime context, often including local variables containing secrets. These platforms essentially become secondary logging pipelines.

🕺 User input. Users putting sensitive data in unexpected places. Even if you protect password fields, users may paste passwords into usernames or other fields that aren't as protected.

The Lead Bullets Approach

Since there's no silver bullet for this problem, Allan recommends a "lead bullets" approach, where multiple solutions work together:

📐 Data architecture. Treat this as a data flow problem. Centralize your logging pipeline and control what data enters it. This simplifies the challenge by reducing the number of places you need to implement protections.

🍞 Data transformations. Apply techniques like minimization, redaction, tokenization, or hashing to transform sensitive data before logging. These range from completely removing the data to replacing it with safer representations.
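As a rough illustration (the function names are mine, not from the talk), redaction and hashing in Python might look like:

```python
import hashlib

def redact_email(email: str) -> str:
    """Redaction: keep enough shape for debugging, drop the identity."""
    local, _, domain = email.partition("@")
    return f"{local[0]}***@{domain}" if local else "***"

def hash_token(token: str) -> str:
    """Hashing: a stable fingerprint lets you correlate log lines
    without ever storing the secret itself."""
    return "sha256:" + hashlib.sha256(token.encode()).hexdigest()[:12]

print(redact_email("alice@example.com"))  # a***@example.com
print(hash_token("sk_live_abc123"))       # sha256:... (safe to log)
```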

🪨 Domain primitives. Create specialized types that wrap sensitive strings. These can override serialization methods to prevent accidental logging. This is one of the most powerful approaches as it makes your code secure by design.
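In Python, a minimal domain primitive might look like the sketch below (the `Secret` name is illustrative; libraries such as Pydantic's `SecretStr` offer production-ready versions):

```python
class Secret:
    """Wraps a sensitive string so accidental logging, printing,
    or string formatting never reveals the value."""

    def __init__(self, value: str):
        self._value = value

    def reveal(self) -> str:
        """The only deliberate way to get the raw value."""
        return self._value

    def __repr__(self) -> str:
        return "Secret(****)"

    __str__ = __repr__

api_key = Secret("sk_live_abc123")
print(f"loaded key: {api_key}")  # loaded key: Secret(****)
print(repr([api_key]))           # masked even inside containers
```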

🎁 Read-once objects. A powerful extension of domain primitives where secrets can only be read once, then lock themselves. This ensures sensitive data is used only for its intended purpose and cannot be accidentally logged later.
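Extending the same idea, a hypothetical read-once variant could lock itself after the first access:

```python
class ReadOnceSecret:
    """A secret that can be consumed exactly once; every later
    access raises, and repr() always stays masked."""

    def __init__(self, value: str):
        self._value = value
        self._consumed = False

    def consume(self) -> str:
        if self._consumed:
            raise RuntimeError("secret already consumed")
        self._consumed = True
        value, self._value = self._value, None  # drop the reference too
        return value

    def __repr__(self) -> str:
        return "ReadOnceSecret(****)"

password = ReadOnceSecret("hunter2")
check = password.consume()   # first read works
# password.consume() would now raise RuntimeError
```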

🔎 Taint checking. Static analysis that traces how sensitive data flows through your code. This automation can identify potential leaks during development, well before they reach production.

✏️ Log formatters. Middleware in your logging pipeline that detects and redacts patterns matching sensitive data. These act as your last line of defense at the application tier.
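With Python's standard `logging` module, one way to sketch this is a `logging.Filter` that rewrites the formatted message before any handler emits it (the regex patterns here are illustrative and would need tuning to your own token formats):

```python
import logging
import re

# Illustrative patterns; a real deployment tunes these to its own secrets.
PATTERNS = [
    re.compile(r"sk_live_[A-Za-z0-9]+"),      # hypothetical API-key shape
    re.compile(r"(?i)(password|token)=\S+"),  # key=value style leaks
]

class RedactingFilter(logging.Filter):
    """Last line of defense: redact the message before emission."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        for pattern in PATTERNS:
            message = pattern.sub("[REDACTED]", message)
        record.msg, record.args = message, ()
        return True

log = logging.getLogger("app")
log.addFilter(RedactingFilter())
log.warning("login failed, password=hunter2")  # emits: login failed, [REDACTED]
```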

🧪 Unit tests. Hook into your existing testing infrastructure to identify unsafe logging patterns. Configure tests to fail loudly when sensitive data patterns appear in captured output.
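As a sketch using the standard `unittest` module, a test might capture log output with `assertLogs` and fail the moment anything secret-shaped appears (the `handle_request` function and patterns are hypothetical):

```python
import logging
import re
import unittest

# Hypothetical secret shapes to guard against in captured log output.
SECRET_PATTERN = re.compile(r"sk_live_\w+|password=\S+", re.IGNORECASE)

def handle_request(log: logging.Logger) -> None:
    # Stand-in for real application code that logs while doing work.
    log.info("handled request for user_id=42")

class NoSecretsInLogs(unittest.TestCase):
    def test_logs_contain_no_secrets(self):
        with self.assertLogs("app", level="DEBUG") as captured:
            handle_request(logging.getLogger("app"))
        for line in captured.output:
            # Fail loudly if a secret-shaped string shows up.
            self.assertIsNone(SECRET_PATTERN.search(line),
                              f"sensitive data in log line: {line!r}")
```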

🕵️ Sensitive data scanners. Tools that detect secrets in logs after they've been written. These can help catch the more elusive causes we discussed earlier.

🤖 Log pre-processors. Stream processing that filters logs before they reach persistent storage. This leverages existing infrastructure to prevent secrets from being stored.

🦸 People. Ultimately, your team members are the most adaptable part of this system. Educate them, empower them to report issues, and equip them with tools that make doing the right thing easy.

A Strategic Approach

To implement these solutions effectively, Allan suggests a four-part strategy:

  1. Lay the foundation. Establish expectations, culture, and support. Define what constitutes sensitive data and ensure your logs are structured and centralized.
  2. Understand the data flow. Map exactly how data moves through your system, identifying unexpected side channels like front-end analytics, HTTP access logs, or error monitoring tools.
  3. Protect at chokepoints. Focus your efforts on the critical paths where most logs flow. Implement domain primitives, taint checking, and unit tests at the source, then add log formatters at the logging library.
  4. Apply defense-in-depth. Layer preventative and detective controls throughout your system. Every control should have a backup, ensuring something always has your back when one layer fails.

Expert Answers to Your Questions

Q: How do you handle API requests/responses containing PII that you don't control?

A: This is a classic "kitchen sink" problem. When dealing with opaque data from third parties:

  • Transform it into a well-defined schema so you can selectively filter fields
  • Use log formatters to drop or redact specific patterns
  • Drop entire response objects if they're too risky
  • Apply domain primitives to ensure secure handling from receipt

Remember, if you're logging objects you don't fully understand, it's just a matter of time before something sensitive appears.
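One way to sketch the "well-defined schema" idea in Python is an allow-listed dataclass; the `ChargeResult` type and its field names are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class ChargeResult:
    """Well-defined schema: only fields vetted as safe to log."""
    charge_id: str
    status: str
    amount_cents: int

def to_loggable(raw: dict) -> ChargeResult:
    # Allow-list the fields we understand; anything else the provider
    # adds (card numbers, emails, debug blobs) never reaches the logs.
    return ChargeResult(
        charge_id=raw.get("id", "unknown"),
        status=raw.get("status", "unknown"),
        amount_cents=int(raw.get("amount", 0)),
    )

raw_response = {
    "id": "ch_123", "status": "succeeded", "amount": 1999,
    "source": {"number": "4242424242424242"},  # must never be logged
}
print(to_loggable(raw_response))
```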

Q: What are your thoughts on sensitive data redaction at the destination layer (e.g., Datadog)?

A: Redaction at the destination is just one part of a defense-in-depth strategy. Its effectiveness depends on your specific infrastructure:

  • If logs pass through multiple systems before reaching the destination, those intermediate systems might still contain sensitive data
  • You need to understand all the places logs might reside, not just their final destination
  • Redaction at the source is always preferable when possible
  • Destination redaction should be your last line of defense, not your only one

A proper data flow diagram is essential here—map everywhere your logs go and protect each point appropriately.

Q: For someone building their first full-stack app, what are the absolute essentials for proper logging?

A: Start with these fundamentals:

  • Use a structured logging library that outputs JSON instead of free-form text
  • Chart the lifecycle of your logs from emission to storage
  • Use a standard library rather than trying to innovate in logging
  • Be explicit about what you log rather than dumping entire objects
  • Implement basic redaction for common patterns like passwords and tokens

Focus on making your logs useful while being conscious of what shouldn't be there. The paved path is usually the safest one for beginners.
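As a bare-bones illustration of structured JSON logging using only the standard library (in practice, a library like structlog or python-json-logger is the paved path):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line using only the stdlib."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("myapp")
log.addHandler(handler)
log.setLevel(logging.INFO)

# Be explicit about fields instead of dumping whole objects.
log.info("user login succeeded, user_id=%s", 42)
```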

Q: What about tools like Sentry for error monitoring? Should everything go through one pipeline?

A: Error monitoring tools are essentially another logging pipeline and need the same protections. While centralization helps reduce risk, these specialized tools provide valuable insights that standard logging often doesn't.

If you use them:

  • Understand they're another potential leakage point
  • Configure them to redact sensitive data
  • Be careful about the local context they capture
  • Include them in your data flow analysis
  • Consider them part of your overall logging strategy

The goal isn't necessarily to eliminate these tools but to ensure they're properly secured within your overall approach.

Final Thoughts

Keeping secrets out of logs requires a multi-layered approach tailored to your specific environment. By understanding the common causes, implementing several complementary solutions, and taking a strategic approach to deployment, you can significantly reduce the risk of sensitive data appearing in your logs.

Remember that this isn't a problem you solve once and forget about. As Allan noted, "Like most things in security, the job often isn't ever done"—but with the right tools and strategy, you can make significant progress in keeping your secrets where they belong.