Hi Arnault! Can you provide an overview of what you do at GitGuardian?

As the lead ML engineer at GitGuardian, my mission is to grow the team. We have five members at the moment and are planning to bring two interns on board in the coming months. The core focus of our work is data science: using ML to tackle complex problems that regular rule-based systems can't solve on their own. We're currently focused on minimizing false positives without losing any true positives. The goal is to make the experience smoother for our customers, who monitor secrets every day. We're also looking to enhance the quality of our detections with relevant information and to automate the code fixing and remediation process.
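To make that trade-off concrete, here is a minimal sketch (not GitGuardian's actual pipeline) of how one might tune a detector's decision threshold to maximize precision while keeping recall at 100% on a small labeled validation set. The scores, labels, and threshold sweep are purely illustrative placeholders.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical validation set: detector confidence scores and true labels
# (1 = real secret, 0 = false positive). Values are purely illustrative.
scores = np.array([0.95, 0.90, 0.82, 0.75, 0.60, 0.40, 0.30, 0.10])
labels = np.array([1, 1, 1, 0, 1, 0, 0, 0])

best_threshold, best_precision = 0.0, 0.0
for t in np.unique(scores):
    preds = (scores >= t).astype(int)
    # Hard constraint: never drop a true positive (recall stays at 1.0).
    if recall_score(labels, preds) == 1.0:
        p = precision_score(labels, preds)
        if p > best_precision:
            best_threshold, best_precision = float(t), p

print(f"threshold={best_threshold:.2f}, precision={best_precision:.2f}")
```

On this toy data, raising the threshold filters out several false positives before the first true positive is lost, which is the kind of precision gain the team is after, at a much larger scale.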

Can you share your educational background and how it led you here?

My academic journey started at a preparatory school which led me to Centrale Paris. I took a gap year to learn coding at startups, something I wasn't originally interested in but eventually gravitated toward. Machine learning intrigued me, particularly its capability to leverage all the unstructured information available on the web, such as text and images.

I interned at Feedly, working on news aggregation with the aim that each user would only see a news story once. Following that, I was at Sicara, a consulting firm developing ML and data science projects for various companies. There I gained experience initiating and implementing a wide range of machine learning projects, which I believe has prepared me well for leading our new team.

How does your current job compare with the previous jobs you’ve had?

What I love about GitGuardian is its tech-centric environment and the vast amounts of data and metadata we get to handle. The complexity and volume of the secrets we manage are challenges I relish. We have a ton of historical data to leverage and aim to become increasingly efficient in our secret analysis. It's also a lot of fun to write code that analyzes code.

What would you say to someone asking, “Why not simply use OpenAI?”

I would say it is unrealistic for many reasons. For instance, we process so many secrets per minute that using OpenAI would be both too slow and too expensive. From the data security angle, it wouldn't even be an option anyway. Additionally, our need for high-precision systems, combined with the variety of edge cases we encounter in secret detection, goes far beyond the scope of typical prompt engineering. The ML must be in the product, and combined with our data and years of work, that gives us a very strong edge.

Where do you see yourself and machine learning at GitGuardian in the next 5 years?

Projecting five years into the future is difficult, given the rapid pace of evolution in our sector. We are working on multiple challenging problems that necessitate constant system updates and improvements: languages, APIs, and modules change frequently. Our aim is to automate as many processes as possible, such as secret detection updates and issue remediation, even as the nature of the problems becomes more complex. Ideally, we would have a self-evolving system capable of detecting and responding to new challenges as they arise, such as dependency and code analyses.

What advice would you give to someone looking to work in a role like yours?

My advice for aspiring ML engineers: try to build your own project from scratch. Pick an idea that looks both fun and relatively easy, and make it work. Most school learning is based on a pre-existing dataset, a predefined task, and existing baselines to compare against. In the real world, you will have to define what success means: choose your metrics, build your dataset, split the project into subtasks you will have to orchestrate, build an API, deploy it... Solving practical engineering problems requires making choices and formulating solutions. It's not just about improving a single model; it's about how you can transform an idea into something that makes a difference in people's lives. That's the journey from student to machine learning engineer, and that's what we seek at GitGuardian.
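For one concrete flavor of that last step, here is a minimal sketch of wrapping a toy detector in an API for a personal project. The FastAPI framework, the /predict route, and the keyword heuristic standing in for a trained model are illustrative choices, not a prescribed stack.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Snippet(BaseModel):
    text: str

# Stand-in "model": a keyword heuristic playing the role of a trained classifier.
SUSPICIOUS_TOKENS = ("api_key", "secret", "password")

def looks_like_secret(text: str) -> bool:
    return any(token in text.lower() for token in SUSPICIOUS_TOKENS)

@app.post("/predict")
def predict(snippet: Snippet) -> dict:
    # In a real project, a trained model would score the snippet here.
    return {"is_secret": looks_like_secret(snippet.text)}
```

Assuming the file is saved as main.py, you could run it with `uvicorn main:app --reload` and POST a JSON body like `{"text": "..."}` to /predict, which covers the "build an API, deploy it" part of the journey in miniature.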

Thanks for your time, Arnault! 

Thanks!