Working in data science at GitGuardian
Pierre is a developer who has been part of the Secrets Detection Team since July 2020. The Secrets Team is responsible for both maintaining the GitGuardian detection engine (by improving its precision and recall) and widening its detection capabilities by looking for new types of credentials, or new types of private assets. (Learn more about how GitGuardian detects secrets)
What interested you about working in cybersecurity?
I graduated from ISAE-Supaéro in France with an aeronautics engineering diploma and a specialization in data science. Although this is not a common combination of studies, it gave me a broad base of knowledge. I was highly interested in the airline industry, research, and data science. This specialization allowed me to work at the intersection of these different worlds by joining the Operations Research team at Air France.
After 3 years of leveraging data to solve some of the airline’s operational challenges, I wanted to dive into a more “tech-oriented” company having a very concrete impact. Specifically, I was drawn to cybersecurity through an interest in hacker activity that has been a common feature in headlines recently. I was always interested in how the underground world of hackers worked. How did they perform these hacks? And how can companies defend themselves against such actors?
This is why, when I was looking for opportunities and came across GitGuardian, it looked like a perfect match: the company had a deep tech DNA in a field I was passionate about.
What excited you about working at GitGuardian?
When we learn about data science at school, we often hear about complex, cutting-edge models. The reality of working in a company is a bit different. Very few data scientists get to design complex models that actually reach production. In practice, there is often not enough data, or the data is not structured well enough, to build the learning models one had in mind.
One huge advantage of working for GitGuardian is the astonishing amount of data that we have available to us. It is very rare for a company, particularly one still growing, to have access to so much data. Having scanned every public commit since 2017, GitGuardian has billions of open source code samples at its disposal, and this data is also well-structured thanks to the underlying git protocol. This is a rare opportunity to analyze structured data at massive scale.
What is also really interesting about working for GitGuardian is coming to terms with the scale of sensitive information being leaked on GitHub. It is not just the quantity but the diversity as well: we see leaks from senior developers, leaks involving critical infrastructure… I quickly learned that even companies with advanced cybersecurity practices suffer from exposed secrets, simply because there are humans in the chain. Once you understand the problem, it really pulls you into being part of the solution.
From a personal perspective too, GitGuardian has a strong technical DNA within a demanding environment with a big focus on engineering. It is the perfect company for me to build up my skills and share with highly qualified colleagues. I was also amazed by the thorough but fast-paced onboarding: after a few weeks in the team, my first detector was rolled out to production.
What most excites you about the future at GitGuardian?
It takes a while to grasp the amount of data GitGuardian has, but once you do, and you talk to the engineers and our founders about the future, you begin to get very excited about the possibilities on the horizon.
I can see how with this data we can move beyond detecting secrets to get into detecting personal information or copyrighted technical assets. The possibilities are vast and being a part of that journey is tremendously exciting!
Pierre, when he does not wear the Guardians' cape:
I love music, I've been playing guitar for ten years, but I decided to take advantage of the first lockdown to (try to) learn the clarinet!