I really want us to figure out a way to make collecting information for, interpreting, and resolving type-checker bugs much easier.

It has turned out that the majority of miscompilations and compiler bugs from the past six to twelve months that were not trivially diagnosable are due to the type checker. Because our compiler is type-preserving, a bug in the checker may not manifest itself until much later (commonly in alias analysis, for example).

Checking intermediate representations (e.g. mono-check) helps, but in the end, our checker is very complicated - likely the most algorithmically complicated part of the software, with the possible exception of mono. It is also complicated relative to many other PL checkers (OCaml and Haskell excepted, of course).

History

In the days of old, there were no debugging tools at all. It was mayhem. There were a huge number of bugs, and most were trivial to diagnose and resolve.

Fixing trivial bugs meant people could write larger programs. Writing larger programs led to more involved bugs. ROC_PRINT_UNIFICATIONS=1 and ROC_TRACE_COMPACTION=1 came to help us here. They provide a log-line based debugging utility that tells you which type-checking operations are happening, in what order, which succeed, and which fail. We fixed many more bugs with these, as they gave us visibility into the “obvious” issues, which led to quick diagnoses.
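To make the idea of log-line tracing concrete, here is a minimal sketch of an environment-gated unification trace. It is a toy, not the checker's actual implementation: the `Ty` type, the `Tracer` struct, and the `TOY_PRINT_UNIFICATIONS` variable are all hypothetical names invented for illustration, and the real flags' output format is not reproduced here.

```rust
use std::env;

/// A toy type representation, standing in for the checker's real types.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Str,
    Func(Box<Ty>, Box<Ty>),
}

/// Hypothetical tracer: records one indented log line per unification event.
struct Tracer {
    enabled: bool,
    depth: usize,
    lines: Vec<String>,
}

impl Tracer {
    /// Tracing is on only when the (hypothetical) variable
    /// TOY_PRINT_UNIFICATIONS is set to "1".
    fn from_env() -> Self {
        let enabled = env::var("TOY_PRINT_UNIFICATIONS").map_or(false, |v| v == "1");
        Tracer { enabled, depth: 0, lines: Vec::new() }
    }

    /// Emit a line to stderr (and keep a copy) if tracing is enabled.
    fn log(&mut self, msg: String) {
        if self.enabled {
            let line = format!("{}{}", "  ".repeat(self.depth), msg);
            eprintln!("{}", line);
            self.lines.push(line);
        }
    }
}

/// Structural unification over the toy types, logging each step on entry
/// and its outcome on exit, so the log shows order, nesting, and failures.
fn unify(a: &Ty, b: &Ty, t: &mut Tracer) -> bool {
    t.log(format!("unify {:?} ~ {:?}", a, b));
    t.depth += 1;
    let ok = match (a, b) {
        (Ty::Func(a1, r1), Ty::Func(a2, r2)) => unify(a1, a2, t) && unify(r1, r2, t),
        _ => a == b,
    };
    t.depth -= 1;
    t.log(format!("{} {:?} ~ {:?}", if ok { "ok" } else { "FAIL" }, a, b));
    ok
}
```

A caller would build the tracer once with `Tracer::from_env()` and thread it through every `unify` call. A flat log like this reads well for small types but becomes hard to follow once the nesting grows, which is the limitation described next.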

Today

We are now at a point where these debug flags do not materially aid in the diagnosis of new, much more involved bugs. Like before, fixing more involved bugs meant people could write even larger programs, and now we have bugs that are (seemingly) reproducible only with huge (20+ node) types.

I can name many examples, including some that took days to resolve in June 2022, when I was debugging the abilities implementation. But as a motivating example, we can consider https://github.com/roc-lang/roc/issues/5464#issuecomment-1631583439 - a “High Priority” bug with a small source program, whose diagnosis (which I believe, but have not yet confirmed, is the sole cause) took more than 4 hours to reach.

I claim that I am the current subject-matter expert (SME) on the type implementation - I think this is fair, but please challenge if you disagree. If it takes us half a working day to diagnose a bug whose surface syntax is relatively small, we are not positioned well for developing and stabilizing Roc in the future - especially in the face of improvements to the language design.

What can we do about this?

Have the SMEs get better at diagnosing bugs as-is. I have no doubt I am not as efficient as I could be, but I do not think this is actionable.
Gain more SMEs. It is true that all bugs are shallow given enough eyeballs. On the other hand, I struggle to see a path forward for gaining more SMEs (at least enough to make bugs shallow) in the near term. It is already the case that we have a relatively small pool of core contributors, none of whom work solely on Roc full-time. It is not clear to me that we could actively engage multiple members of the community to become SMEs of this domain in the near term, without additional work.
Build better diagnosis tools. It’s incredible how valuable valgrind, lldb/gdb, and debug mallocs are for debugging our code generators. It’s unfortunate that we do not have similar interactive debugging tools for the type checker. If we had such tools (developed to facilitate exactly our needs), I estimate that our diagnosis time would drop significantly, and it would be easier to transfer knowledge about the checker - which might increase the number of SMEs.

Is this worth doing right now?

It’s important to weigh the cost of building these tools against the cost of simply spending that time diagnosing more bugs. If there are only a small number of material (by which I mean blocking and frequently-appearing) bugs, developing tools to make it easier to diagnose them may not make sense. If the tail is long and such bugs appear frequently, it does. On the other hand, if we get the debugging tools wrong, or specialize them too much, then they may not be useful for debugging issues in the future, and we will have wasted our time. And so on.

Having felt that we would benefit from better tools of this form since May 2022, I am inclined to say that it is unlikely we get the value of the tool “wrong”, and that the tail will continue to be long for a while before it shortens up. Every couple of weeks I convince myself that most issues are resolved, only to diagnose a new issue that proves otherwise!

The other thing is relative priorities. Our biggest priorities right now are stabilizing glue, supporting the usage of Roc at Vendr, and unblocking the path forward for composable effect-handler Task/Map2/etc. Glue is close to stabilization, and the latter two efforts can only benefit from our being able to diagnose checker bugs faster. None of these is in an immediate rush either, which means we have some room to go build good tools, to the extent that it’s not a waste of time.

Finally, I’ll mention that these tools may pay off as inspiration/plugins for the eventual editor, or even as other tools available for developers in the Roc system.