Montreal, Canada, is the birthplace of the search engine. Long before the world was talking about semantic search and vector embeddings, a student at McGill built Archie to find files across FTP servers. The creator's goal wasn't to change how everyone would use the internet; it was to solve a specific problem: figuring out which servers held a file with a given name. Almost all leaps in technology are achieved the same way: a small group of practitioners focuses on solving a specific issue, and those innovations reverberate across the whole internet. That history made Montreal an ideal setting for a group of today's innovators to get together and discuss common challenges at /dev/mtl 2025.
Around 150 developers got together at École de technologie supérieure (ÉTS) for a truly cross-community event, put on by a coalition of 14 local tech communities, including Java and Python user groups, the local CNCF and AWS meetups, and Women in AI. In true Québécois fashion, the 21 speakers shared their knowledge in both French and English across three tracks.
Here are just a few highlights from this year's /dev/mtl.
Unchecked Complexity Makes Testing Unpredictable
In his session "Feature Flags And End-to-End Testing," Gleb Bahmutov, Senior Director of Engineering at Mercari, walked through a problem a lot of teams quietly live with, especially as they update legacy code. Feature flags are great for incremental releases, experiments, and kill switches, but they can turn end-to-end testing into a maze.
The math is exponential: each new boolean flag doubles the number of possible combinations. For example, three flags with two states each give 2 × 2 × 2 = 8 states to test, and adding a fourth two-state flag makes that 16. That raises testing questions like "Did we ever test these flags in these states?" Keeping up manually is a logistical nightmare.
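A rough sketch of that combinatorial growth (the flag names are hypothetical): enumerating every on/off combination of a handful of boolean flags already produces a test matrix nobody wants to maintain by hand.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlagMatrix {
    // Enumerate every on/off combination of n boolean flags: 2^n rows.
    static List<Map<String, Boolean>> combinations(List<String> flags) {
        List<Map<String, Boolean>> result = new ArrayList<>();
        int total = 1 << flags.size(); // 2^n
        for (int mask = 0; mask < total; mask++) {
            Map<String, Boolean> combo = new LinkedHashMap<>();
            for (int i = 0; i < flags.size(); i++) {
                combo.put(flags.get(i), (mask & (1 << i)) != 0);
            }
            result.add(combo);
        }
        return result;
    }

    public static void main(String[] args) {
        // Three flags -> 8 combinations; add a fourth and it jumps to 16.
        List<String> flags = List.of("new-checkout", "dark-mode", "beta-pricing");
        System.out.println(combinations(flags).size()); // prints 8
    }
}
```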
Gleb explained that every test is supposed to be deterministic, yet percentage rollouts and misaligned environments mean the same test can fail every few days for no obvious reason.
He compared three testing strategies. Total control gives tests the full feature flag payload through an API and fixtures, but now you are debugging caching and invalidation on top of the app. Selective control stubs only the flag under test, but page reloads, navigation, and backend behavior still make things unpredictable. The most reliable option was per-user control: keep flags as production-like as possible, then target individual user IDs in tools like LaunchDarkly.
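As a rough illustration of the per-user idea (the FeatureFlagClient interface, the in-memory stand-in, and the flag name are all hypothetical, not the real LaunchDarkly API), a test can target one synthetic user and clean up afterwards, leaving flags production-like for everyone else:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical flag client; a real test would target users via the vendor's SDK or REST API.
interface FeatureFlagClient {
    void targetUser(String flagKey, String userId, boolean enabled);
    void removeTarget(String flagKey, String userId);
}

// Tiny in-memory stand-in so the sketch runs on its own.
class InMemoryFlagClient implements FeatureFlagClient {
    private final Map<String, Boolean> targets = new HashMap<>();
    public void targetUser(String flagKey, String userId, boolean enabled) {
        targets.put(flagKey + ":" + userId, enabled);
        System.out.println("Targeting " + userId + " for " + flagKey + " = " + enabled);
    }
    public void removeTarget(String flagKey, String userId) {
        targets.remove(flagKey + ":" + userId);
        System.out.println("Removed targeting of " + userId + " for " + flagKey);
    }
}

public class PerUserFlagExample {
    public static void main(String[] args) {
        FeatureFlagClient flags = new InMemoryFlagClient();
        // One synthetic user per run keeps the flag configuration untouched for real users.
        String testUserId = "e2e-" + UUID.randomUUID();
        flags.targetUser("new-checkout", testUserId, true);
        try {
            // ... drive the end-to-end scenario while signed in as testUserId ...
        } finally {
            flags.removeTarget("new-checkout", testUserId); // always clean up the targeting rule
        }
    }
}
```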
The larger lesson was lifecycle discipline. Treat flags as temporary. Make new features explicit opt-in, migrate tests as defaults change, archive flags when done, and aggressively retire anything old. Also, do not build your own feature flag system.

Invisible Complexity Impacts Performance
Reza Madabadi, Software Developer at 360.Agency, started his talk, "Why Your Database Hates You: The N+1 Query Problem," by asking, "Why is everything so slow?" It is a common question every developer faces when joining a company with years of history to decode. Tools like MySQL EXPLAIN can help diagnose some of the issues, but Reza said they do not tell the full story: EXPLAIN can show that a query is taking a long time, but it does not show why.
Reza said what changed things for him was tracing, which revealed how a single, seemingly harmless request was turning into thousands of database calls. Each individual lookup was cheap, but together they added up. That is the N+1 problem: not a language or framework bug, but a middleware issue rooted in object-relational mappers (ORMs).
He explained that in a typical Java and Hibernate monolith, data access objects feed big data transfer objects (DTOs), and lazy loading tries to protect you from loading the whole database at once. Instead of one query with joins, the ORM runs one query to get a list, then N queries to hydrate each association. Reza walked through join fetches and Hibernate batch size tweaks as partial fixes. They help, but they are still hacks that can create long prepared statements and memory pressure.
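As a minimal sketch of that shape (the Author and Book entities are hypothetical, and the persistence wiring is omitted), the first method below triggers the N+1 pattern, while a JPQL join fetch collapses it into a single query:

```java
import jakarta.persistence.*;
import java.util.List;

// Hypothetical entities: one Author has many lazily loaded Books.
@Entity
class Author {
    @Id Long id;
    String name;
    @OneToMany(mappedBy = "author", fetch = FetchType.LAZY)
    List<Book> books;
}

@Entity
class Book {
    @Id Long id;
    String title;
    @ManyToOne Author author;
}

class AuthorQueries {
    // N+1 shape: 1 query for the authors, then 1 extra query per author
    // the first time its books collection is touched.
    static List<Author> nPlusOne(EntityManager em) {
        List<Author> authors = em.createQuery("select a from Author a", Author.class)
                .getResultList();
        authors.forEach(a -> a.books.size()); // each access fires another SELECT
        return authors;
    }

    // Partial fix: JOIN FETCH pulls authors and their books in one query.
    // Hibernate's @BatchSize on the collection is the other partial fix he mentioned:
    // it hydrates the collections for several authors per extra query instead of one each.
    static List<Author> joinFetch(EntityManager em) {
        return em.createQuery(
                "select distinct a from Author a join fetch a.books", Author.class)
                .getResultList();
    }
}
```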
The more durable answer was to design for DTOs directly. Reza said they now use entity-based CRUD where it makes sense, but write targeted select DTO queries where they know exactly what is needed. Pair that with local tracing tools like Digma to catch N+1 patterns early, before they turn into mysterious slow nights in production.
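Continuing the same hypothetical model, a targeted DTO query might look like the sketch below: a JPQL constructor expression fills a small read model in one query, so nothing lazy is left to fan out later.

```java
import jakarta.persistence.EntityManager;
import java.util.List;

// Hypothetical read model holding exactly what one screen needs, nothing more.
record BookRow(String title, String authorName) {}

class BookRowQuery {
    // One targeted query that builds the DTO directly; the join on b.author happens
    // in SQL, so no entities are hydrated and no lazy loading is triggered.
    // (Outside the default package, the constructor expression needs the
    // fully qualified class name, e.g. new com.example.BookRow(...).)
    static List<BookRow> fetchRows(EntityManager em) {
        return em.createQuery(
                "select new BookRow(b.title, b.author.name) from Book b",
                BookRow.class)
                .getResultList();
    }
}
```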

Improvements Take A Dedicated Approach
In his talk "My Journey With Software Testing," Lucian Condrea, a freelance full-stack developer and contributor at Tribe Social, told a story many self-taught testers will recognize. He started with no real testing skills, no strategy, and a growing pile of manual checks that were slow, tedious, and mentally draining. Leadership only saw that “QA is too slow,” without understanding the challenges the team was facing, including the invisible cognitive load. Progress felt random, and there was no clear path to those dependable systems and breezy workdays he wanted.
Things shifted when he became intentional about learning. Lucian built a “testing wishlist” and carved out daily 30-minute practice sessions. He leaned into small, atomic wins instead of vague “I should be testing more” guilt. He credited blogs from folks like Kent C. Dodds for finally clarifying why tests matter, and Nicolas Carlo’s “Legacy Code: First Aid Kit” for showing how to create boundaries in the code so it was even possible to test.
From there, he adopted a pragmatic view: "Tests should serve you, not the other way around." He focuses on readable, co-located tests, with integration tests giving the best return. That means minimal mocking and only as much end-to-end coverage as you truly need.
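A small, hypothetical illustration of that style: an integration-flavored JUnit test that wires a tiny service with a real in-memory store and asserts on observable behavior, leaving no mocks to maintain.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.HashMap;
import java.util.Map;
import org.junit.jupiter.api.Test;

// Hypothetical service under test: a cart backed by a simple key-value store.
class CartService {
    private final Map<String, Integer> store; // itemId -> quantity
    CartService(Map<String, Integer> store) { this.store = store; }
    void add(String itemId, int qty) { store.merge(itemId, qty, Integer::sum); }
    int totalItems() { return store.values().stream().mapToInt(Integer::intValue).sum(); }
}

class CartServiceTest {
    @Test
    void addingItemsUpdatesTheTotal() {
        // Wire the service with a real (in-memory) store, then assert on behavior
        // rather than on internal calls, so the test survives refactoring.
        CartService cart = new CartService(new HashMap<>());
        cart.add("book", 2);
        cart.add("book", 1);
        assertEquals(3, cart.totalItems());
    }
}
```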
He left us with the advice to reflect on strategy, not dogma. He said to build deliberate habits so your tests give you confidence instead of resentment.

Lessons From Developers For Everyone
Legacy Is Our Shared Starting Point
One quiet constant behind every talk was the reality that legacy is not a corner case, and no one is starting from greenfield. Legacy systems are where we do our most meaningful work. We are not working in an ideal vacuum, but are layering decisions on top of years of code, data, and human habits.
Instead of fantasizing about a rewrite "once things calm down," the real work is learning how to move forward inside constraints you did not choose. When you accept that, dealing with legacy stops being a shameful side quest and becomes the main design problem: figuring out how to change things without breaking the promises your system made years ago.
Feedback Loops Beat Raw Velocity
Another theme that permeated the whole event was that while speed matters, measuring and investigating what is happening might be just as important. Whether the topic was performance, testing, releases, or AI, the teams that seemed calmer were the ones with feedback loops they trusted. That might mean observability that shows how a request actually flows, or tests that fail in ways that teach you something instead of interrupting you at random. It might be metrics on how often your users get “no results” in search, or how many flags are still active past their intended lifetime.
If you cannot observe it, you cannot reason about it, and if you cannot reason about it, you are only guessing that you are moving faster, in the dark.
Guardrails Over Heroics
The speakers' stories kept pointing to the fact that the strongest teams design guardrails so that normal behavior is safe by default. That looks like defensible defaults, explicit lifecycles, and constraints that keep complexity from running away in the first place.
It means treating experiments, flags, tests, and configurations as living systems with an end-of-life plan, not as one-off hacks. When you do that, you do not need a rockstar to remember every edge case. You need a group of normal people who respect the guardrails and adjust them as reality changes. This is as true for security and secrets governance as it is for any other area of production systems.
Tools Change, Habits Compound
Underneath all the specific technologies, the real leverage showed up in habits, not tools. Tools will keep changing. We will likely never stop learning about new frameworks, new agents, and new tracing stacks.
What carried across topics was the value of small, repeatable practices. Speakers commonly talked about carving out time to improve tests, routinely inspecting how your system actually behaves, and retiring complexity instead of hoarding it. We should strive for the simplest solution that fits the current scale.
These habits compound in a way that individual tools never do. The future of our systems depends less on the next big thing and more on how disciplined we are with the things we already have.
Innovation Comes From Persistence in Addressing Real Issues
Your author was able to share a session on secrets security at this developer-focused event. Rather than being put off by the scale of the secrets sprawl problem, developers who attended leaned in and asked about possible solutions. It was highly encouraging to see folks who do not regularly interact with the security team immediately recognize the dangers of plaintext credentials and seem eager to embrace available solutions to work more safely and efficiently.
No matter what area of enterprise technology you are dealing with, the same themes that ran through this developer-focused conference apply. Accept the legacy you have, add observability, put guardrails in place, and build habits that make the safe path the easy one. If we keep doing that across testing, performance, search, and security, the next “Archie moment” will not come from a single breakthrough, but from thousands of small, deliberate improvements shipped by teams like the ones who showed up at /dev/mtl.