Monorepos will ruin your life -- but they're worth it!

08/26/2021

I stumbled across the concept of monorepos about four years ago when I was knee-deep in Git submodules and thousands of interconnected but disparate GitHub repositories. It seemed like the answer to all of my problems. You probably already know where this is going.

A monorepo is a way of managing code where multiple self-contained packages are developed, versioned and deployed in a single repository instead of managing each package in its own repository. I have written on the topic of monorepos in the past. I still hold that monorepos are amazing at solving a certain set of problems, but as with most things in life, computer science is a series of tradeoffs.

Why Monorepos Are Problematic

Sometimes addressing big issues in programming can feel a little like adjusting those retro blinds from the 90s. Every tweak you make can have vast repercussions. Having built out four or five monorepos at this point with Lerna, I feel that I've hit most of the rough edges.

Hoisting horror show

One of the big benefits of using a monorepo is that you can hoist many or all of your dependencies so that instead of living in each package's node_modules folder, they are installed in the node_modules directory in the root of the repository. This can speed up install time, reduce the disk space taken up by duplicated packages and guarantee that packages are sharing the same version of their dependencies.

However, the downside of this is that dependencies that are hoisted with open version ranges such as ~1.0.0 or ^1.0.0 may silently be bumped up or down to satisfy a more specific version, like 1.0.1, that another package asks for. In a perfect world, everyone would be correctly using semantic versioning and breaking changes would always result in a major version bump, but that is sadly not the case.
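To make that concrete, here is a minimal TypeScript sketch of why an open range leaves room for a silent bump. It is a simplified stand-in for real semver resolution: it only understands exact versions and the ^ and ~ prefixes, ignores prerelease tags, and the names parseVersion, satisfiesRange and sharedVersions are made up for this illustration rather than taken from Lerna, Yarn or the semver package.

```typescript
type Version = [number, number, number];

// Parse "1.2.3" into [1, 2, 3]. Prerelease and build tags are not handled.
function parseVersion(text: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (!match) throw new Error(`Unsupported version: ${text}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// Does `version` satisfy a range such as "^1.0.0", "~1.0.0" or "1.0.1"?
function satisfiesRange(version: string, range: string): boolean {
  const first = range.charAt(0);
  const op = first === "^" || first === "~" ? first : "";
  const v = parseVersion(version);
  const base = parseVersion(range.slice(op.length));
  if (op === "") return compareVersions(v, base) === 0; // exact pin
  if (compareVersions(v, base) < 0) return false; // below the lower bound
  if (op === "~") return v[0] === base[0] && v[1] === base[1]; // patch may move
  // Caret: everything up to the leftmost non-zero component stays fixed.
  if (base[0] > 0) return v[0] === base[0];
  if (base[1] > 0) return v[0] === 0 && v[1] === base[1];
  return v[0] === 0 && v[1] === 0 && v[2] === base[2];
}

// Versions that every package's range accepts, i.e. candidates for one hoisted copy.
function sharedVersions(available: string[], ranges: string[]): string[] {
  return available.filter((v) => ranges.every((r) => satisfiesRange(v, r)));
}
```

With available versions 1.0.0, 1.0.1 and 1.2.0 and the ranges ^1.0.0, ~1.0.0 and 1.0.1, sharedVersions returns only 1.0.1: the packages that asked for an open range get whatever the strictest package pinned.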

On this note, if you are using Lerna with Yarn Workspaces, be advised that the Lerna --no-hoist flag and config options are ignored in favor of the nohoist in the workspaces object of the package.json. For example:

{
  "workspaces": {
    "nohoist": ["react-native", "react-native/**"]
  }
}

This is a note to myself as much as to anyone else, because I have personally lost a decent amount of time trying to figure out why modules were hoisting when I thought I was explicitly telling them not to.

Congratulations, your new job is devops!

As soon as you adopt a monorepo architecture, you will introduce a layer of complexity to your deployment pipeline. With an architecture of one package per repo, the build process is fairly straightforward -- you install your dependencies, build and test the code and if there's not an issue, you release it.

With a monorepo, order becomes very important. Dependencies must be built before the packages that consume them. If you need to extract variables from one released package to pass into another package, you have to release in order as well. What if a package fails to release? Do you roll back everything?
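As a rough illustration of the ordering problem, the sketch below sorts packages so that every dependency comes before its consumers, using Kahn's algorithm for a topological sort. The DependencyGraph shape, the buildOrder name and the package names in the usage note are all assumptions for this example; tools like Lerna have their own way of working this out.

```typescript
// Map of package name -> names of the local packages it depends on.
type DependencyGraph = Record<string, string[]>;

// Return package names in an order where every dependency appears before
// the packages that consume it. Throws on unknown packages or cycles.
function buildOrder(graph: DependencyGraph): string[] {
  const names = Object.keys(graph).sort();
  const pending: Record<string, number> = {}; // dependencies not yet built
  const consumers: Record<string, string[]> = {}; // who is waiting on this package
  for (const name of names) {
    pending[name] = graph[name].length;
    consumers[name] = [];
  }
  for (const name of names) {
    for (const dep of graph[name]) {
      if (!Object.prototype.hasOwnProperty.call(graph, dep)) {
        throw new Error(`Unknown package: ${dep}`);
      }
      consumers[dep].push(name);
    }
  }
  // Start with packages that depend on nothing.
  const ready = names.filter((name) => pending[name] === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const next = ready.shift() as string;
    order.push(next);
    for (const consumer of consumers[next]) {
      pending[consumer] -= 1;
      if (pending[consumer] === 0) ready.push(consumer);
    }
  }
  // Anything left over is stuck behind a circular dependency.
  if (order.length !== names.length) throw new Error("Dependency cycle detected");
  return order;
}
```

For a hypothetical graph where api depends on shared and web depends on both, buildOrder({ api: ["shared"], web: ["shared", "api"], shared: [] }) yields shared, then api, then web. The same order can be reused for releases when one package needs values from another.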

If the most difficult element of programming is added complexity, your new role managing the CI pipeline may not be entirely fulfilling.

Build a house to get a doorknob

If not managed correctly, monorepos can cause the build process to take just short of an eternity. This is acceptable if you are changing code in every package you're building and releasing, but if you just want to make a tiny bugfix in one package go live, it's hard to explain to your CEO why they won't see it for an hour when it used to take a matter of minutes.

To be fair, tools like Lerna do have functionality to check which packages have actually changed since the latest release, but when it comes to using that information to filter what is built, how artifacts get managed and whether everything gets released every time -- that is all on you.

Why It's Worth It

At this point, you're probably wondering if it's even worth it. In fact, you may be halfway through moving all of your packages back to their own repositories. But please, hear me out on this. Without sorrow, we will never know joy. Here are just a handful of reasons why I still recommend monorepos for applications with more than a couple of associated packages.

A warm environment

Managing environment variables is much easier in a monorepo. I'll typically have a /env folder in the root of the monorepo where I put all of my .env files for each deployment stage. Those files can then be used by all of the packages with dotenv or whatever the equivalent is in your language or framework of choice. There is simply a lot to be said for having a single source of truth not just for code but for configuration.

One feature, one pull request

If you work across the entire stack (yes, I'm one of those people who believe that fullstack developers are not a myth), your features will rarely be in just one package. Any meaningful change to the frontend will almost certainly be paired with a change to the API at the very least. I remember in the not so distant past writing pull request comments that would say something like "do not merge until you merge the API feature related to it". The broader the feature, the more prevalent this problem can become. I once made a breaking change in an API layer that ten other repositories depended on!

With a monorepo, you can truly make one PR for one feature. Besides the obvious perk of having fewer headaches managing merges, it is also much easier to review a pull request where all of the frontend and backend changes are in one place.

Version management

Ensuring that versions get bumped for updated packages could almost be a full-time job if you are breaking your packages down into small reusable pieces. This is a solved problem in a single codebase, especially if you are using Lerna: when you run lerna version, it finds all of the changed packages and bumps them, and it also updates the references to those packages in your other packages so that they stay up to date.
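The idea can be sketched in a few lines of TypeScript. This is not Lerna's implementation, just an illustration of the bookkeeping it saves you: the Pkg shape, the bumpPatch and versionPackages names, the patch-only bump and the caret-range rewrite are all assumptions, and the list of changed packages is taken as given.

```typescript
interface Pkg {
  name: string;
  version: string; // "major.minor.patch"
  dependencies: Record<string, string>; // local package name -> version range
}

// Increment the patch component of "major.minor.patch".
function bumpPatch(version: string): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) throw new Error(`Unsupported version: ${version}`);
  return `${match[1]}.${match[2]}.${Number(match[3]) + 1}`;
}

// Patch-bump every package named in `changed`, then point every reference
// to a bumped package at its new version. Returns new objects; inputs are untouched.
function versionPackages(pkgs: Pkg[], changed: string[]): Pkg[] {
  const bumped: Record<string, string> = {};
  for (const pkg of pkgs) {
    if (changed.indexOf(pkg.name) !== -1) bumped[pkg.name] = bumpPatch(pkg.version);
  }
  return pkgs.map((pkg) => {
    const dependencies: Record<string, string> = {};
    for (const dep of Object.keys(pkg.dependencies)) {
      const newVersion = bumped[dep];
      dependencies[dep] =
        typeof newVersion === "string" ? `^${newVersion}` : pkg.dependencies[dep];
    }
    const ownVersion = bumped[pkg.name];
    return {
      name: pkg.name,
      version: typeof ownVersion === "string" ? ownVersion : pkg.version,
      dependencies,
    };
  });
}
```

Doing this by hand across separate repositories means a release and a follow-up dependency update for each link in the chain, which is exactly the chore that disappears in a single codebase.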

How To Manage the Pain

In life, some pain simply requires a little ibuprofen. In code, hacks, workarounds and best practices tend to be just what the doctor ordered.

Build, test and release independently

If you split every package up into a simple lifecycle, it will be easier to move it around and manage build dependencies as the monorepo inevitably increases in size and complexity. For every package, you essentially want these four jobs: bootstrap, build, test and release.

The bootstrap job is responsible for installing any dependencies and caching them in a Docker image for the build and test jobs to take over. I run build and test in parallel since they are non-destructive and if either goes wrong, I want to kill the entire pipeline. Once the tests run and the build completes and uploads its artifact, the release job can use the artifact to deploy the code to wherever it is hosted. I use the term "release" instead of "deploy" here because it can cover releasing native applications to your app store of choice as well as throwing code up on the cloud.

Right now, I'm primarily using CircleCI. I fan the builds out and then have a pre-build workflow that triggers once they are all done and kicks off all of the releases. Since the jobs are broken up like this, I can stash a build artifact for each package after the build step and then pull it down and release it during the release phase.

Avoid the web of dependencies

The more nested your dependency structure is, the longer you're going to have to wait to build your application. Wherever possible, I would advise flattening dependencies to one or two layers at most. You want your builds and deploys to be able to fan out and run in parallel so that you can release faster.
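One way to keep an eye on this is to measure how deep each package sits in the dependency tree. The sketch below is only an illustration under assumed names (PackageGraph, dependencyDepths, tooDeep and the packages in the usage note are all hypothetical): a package with no local dependencies has depth 0, and every other package is one layer deeper than its deepest dependency.

```typescript
// Map of package name -> names of the local packages it depends on.
type PackageGraph = Record<string, string[]>;

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Depth 0 means no local dependencies; otherwise 1 + the deepest dependency.
// Packages at the same depth do not depend on each other and can build in parallel.
function dependencyDepths(graph: PackageGraph): Record<string, number> {
  const depths: Record<string, number> = {};
  const chain: string[] = []; // packages currently being resolved, for cycle detection
  function depthOf(name: string): number {
    if (hasOwn(depths, name)) return depths[name];
    if (!hasOwn(graph, name)) throw new Error(`Unknown package: ${name}`);
    if (chain.indexOf(name) !== -1) throw new Error(`Dependency cycle at ${name}`);
    chain.push(name);
    let depth = 0;
    for (const dep of graph[name]) depth = Math.max(depth, depthOf(dep) + 1);
    chain.pop();
    depths[name] = depth;
    return depth;
  }
  for (const name of Object.keys(graph)) depthOf(name);
  return depths;
}

// Names of packages nested deeper than `maxDepth` layers, sorted alphabetically.
function tooDeep(graph: PackageGraph, maxDepth: number): string[] {
  const depths = dependencyDepths(graph);
  return Object.keys(depths).filter((name) => depths[name] > maxDepth).sort();
}
```

Given shared with no dependencies, api depending on shared, web depending on api and shared, and admin depending on web, the depths come out as 0, 1, 2 and 3, so tooDeep(graph, 2) flags admin as the package worth flattening.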

Only build things that changed

If you want your build time to really decrease, I would suggest running a script that checks whether each package has changed relative to the default branch of the repository and only runs the builds for the packages that have.

It gets a little more complicated if you are spinning up new testing environments on demand: if you aren't building an artifact because the package is unchanged, you'll need to pull the artifact that was built against the default branch as a fallback so that you have something to release. It is much less complicated if you only release to a couple of pre-defined stages, because then you can guarantee that an artifact already exists there and only needs to be replaced when a new one is built.
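The filtering itself can be sketched as a pure function. This assumes you already have the list of changed file paths, for example from something like git diff --name-only against your default branch, and that each package lives in its own directory. The PackageInfo shape, the function names and the paths in the usage note are hypothetical, and treating dependents of a changed package as needing a rebuild is a choice made for this sketch.

```typescript
interface PackageInfo {
  name: string;
  dir: string; // path relative to the repository root, e.g. "packages/api"
  dependsOn: string[]; // names of the local packages it consumes
}

// Packages whose directory contains at least one changed file.
function directlyChanged(pkgs: PackageInfo[], changedFiles: string[]): string[] {
  return pkgs
    .filter((pkg) => changedFiles.some((file) => file.indexOf(pkg.dir + "/") === 0))
    .map((pkg) => pkg.name);
}

// Changed packages plus everything that depends on them, directly or indirectly.
function packagesToBuild(pkgs: PackageInfo[], changedFiles: string[]): string[] {
  const selected = directlyChanged(pkgs, changedFiles);
  // Keep sweeping until a full pass adds no new dependents.
  let grew = true;
  while (grew) {
    grew = false;
    for (const pkg of pkgs) {
      if (selected.indexOf(pkg.name) !== -1) continue;
      if (pkg.dependsOn.some((dep) => selected.indexOf(dep) !== -1)) {
        selected.push(pkg.name);
        grew = true;
      }
    }
  }
  return selected.sort();
}
```

If only packages/api/src/index.ts and the root README.md changed, and web depends on api, packagesToBuild returns api and web and skips everything else. Any package left out of that list is one whose existing artifact needs to be pulled instead of rebuilt.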

Closing Thoughts

Monorepos are really powerful. But then again, so was the Galactic Empire in Star Wars. It is up to you to rein in the power and use it amicably. Good luck.

Tyson Cadenhead leads the dev team at Vestaboard

He has a passion for Functional Programming, GraphQL, Serverless architecture and React. When he's not writing code or working with his team, Tyson enjoys drawing, playing guitar, growing vegetables and spending time with his wife, two boys and his dog and cat.