What is a monorepo and why use one?
Managing a sprawling codebase across multiple repositories can be a logistical nightmare. Developers often find themselves juggling various versions, wrestling with incompatible dependencies, and navigating a maze of pull requests and merges.
This chaos not only hampers productivity but also increases the risk of errors and inconsistencies. Are you tired of this disarray and looking for a streamlined way to manage your projects?
The answer lies in adopting a monorepo (aka a monolithic repository). One of the most compelling benefits of a monorepo is its ability to simplify version control.
In this comprehensive guide, you’ll gain insights into what a monorepo is and how it differs from traditional multirepo strategies. You’ll also learn about the advantages of using a monorepo, particularly for larger teams dealing with complex projects.
What is a monorepo?
A monorepo is a software development strategy where the code for multiple projects is stored in a single version control system (VCS) repository.
This differs from the more traditional approach, where each project or module has its own separate repository (aka a multirepo).
The projects within a monorepo can be interconnected libraries, services, applications, or even documentation.
The central idea of a monorepo is to consolidate the codebase, ensuring more streamlined version control, code reuse, and improved collaboration. For larger teams, this means better code visibility, simplified dependency management, and the possibility of atomic changes across multiple projects.
Monorepo vs multirepo
In a traditional multirepo setup, each project or component has its own repository, often leading to versioning conflicts and making it difficult to keep track of changes across projects. With a monorepo, all your code lives in one place, making it easier to manage versions and maintain a coherent history.
This centralized approach ensures that everyone on the team is working with the same codebase, reducing the likelihood of versioning issues and making rollbacks more straightforward.
Why you should use monorepos
One of the main advantages of using a monorepo is unified versioning. In a traditional multirepo setup, each project has its own version history, making it challenging to understand how changes in one project affect others. With a monorepo, all projects share a single version history, making it easier to understand their interdependencies.
For example, if Project A depends on a feature in Project B, both can be updated simultaneously in a single commit, making it easier to track changes and dependencies.
The following are a few more advantages of using a monorepo:
Reusable code across projects
While it’s true that package managers can help sync dependencies across multiple repositories, having all code in a single repository makes it even easier to share and reuse code. There’s no need to publish internal packages just to share common utilities or components.
This is particularly beneficial for large teams where multiple projects often have overlapping requirements. Code reusability in a monorepo ensures that developers can easily leverage existing code, reducing duplication and accelerating development cycles.
Easier refactoring ensures consistency
In a monorepo, refactoring becomes a less daunting task. Changes can be made once and propagated across all dependent projects in a single commit.
This ensures that improvements or fixes are consistently applied, reducing the risk of one project lagging behind in terms of code quality or features.
Enhanced collaboration through visibility
Monorepos offer improved visibility, allowing teams to better communicate and collaborate. In a large team, this is especially beneficial. Developers can see the entire codebase, understand the context of their work better, and make cross-project changes effortlessly.
This holistic view eliminates the need for special permissions to access different repositories, making it easier for team members to assist each other and encourage code reuse.
Streamlined dependency management
Managing dependencies in a large team can be cumbersome with multiple repositories. A monorepo ensures that there’s a single version of each dependency, reducing conflicts and making updates more predictable.
This centralized approach to dependency management helps reduce “it works on my machine” problems, as every team member works with the same set of standardized tools and configurations.
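As a rough illustration of what "a single version of each dependency" can look like in practice, the sketch below scans per-project dependency pins and reports any dependency pinned to more than one version. The project names, package pins, and the `find_version_conflicts` helper are all hypothetical and exist only for this example; a real monorepo would read these pins from its own manifest or lock files.

```python
from collections import defaultdict


def find_version_conflicts(manifests):
    """Return dependencies that are pinned to more than one version.

    `manifests` maps a project name to its {dependency: version} pins.
    """
    versions_seen = defaultdict(set)
    for deps in manifests.values():
        for dep, version in deps.items():
            versions_seen[dep].add(version)
    # Keep only dependencies with two or more distinct pinned versions.
    return {dep: sorted(v) for dep, v in versions_seen.items() if len(v) > 1}


# Hypothetical projects and pins, for illustration only.
manifests = {
    "web-app": {"requests": "2.31.0", "numpy": "1.26.4"},
    "billing-service": {"requests": "2.31.0"},
    "reports": {"requests": "2.28.2", "numpy": "1.26.4"},
}

conflicts = find_version_conflicts(manifests)
print(conflicts)  # {'requests': ['2.28.2', '2.31.0']}
```

A check like this could run in CI so that a pull request introducing a second version of a dependency is flagged before it merges.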
Atomic changes for better version control
In large teams, coordinating releases and updates can be a complex task. Monorepos enable atomic changes, allowing related modifications across multiple projects to be committed at once. This ensures that features or fixes affecting multiple projects are released cohesively, making version control more straightforward and reliable.
Optimized CI/CD pipelines
Another benefit of monorepos is that continuous integration and continuous deployment (CI/CD) pipelines are more streamlined. There’s no need to sync multiple repositories or ensure cross-repo compatibility.
The unified nature of a monorepo allows build and test tools to be standardized, ensuring that everyone is testing and deploying based on the same criteria.
This is particularly advantageous for large teams, where maintaining consistency in CI/CD practices is crucial for efficient and reliable software delivery.
By understanding these benefits in the context of large teams, it becomes clear why monorepos are becoming increasingly popular. They offer a unified, streamlined, and efficient approach to software development that is especially advantageous in complex, multiproject environments.
Monorepo challenges and tools that can help
Monorepos have surged in popularity, especially among tech giants, due to their many advantages. But they’re not a one-size-fits-all solution and come with their own set of challenges. Explore some of these challenges and learn about tools that can help.
Scaling issues
As the codebase within a monorepo grows, so does the build time. Every time a change is made, the CI system might try to rebuild and retest the entire codebase, making the process slow and cumbersome.
To help with these scaling issues, build tools like Bazel, Pants, and Buck2 are specifically designed to optimize the build process through a technique known as incremental builds. Incremental builds minimize the strain on system resources, allowing for more efficient use of hardware, whether you’re working on a local machine or in a cloud-based development environment.
Unlike a naive build that recompiles the entire codebase every time a change is made, these tools identify which parts of the codebase are affected by recent changes and rebuild only those parts.
These tools are built to seamlessly integrate into your existing development workflow. Once configured, they can automatically detect changes in the codebase and trigger the appropriate incremental builds. This automation is particularly beneficial in a CI/CD environment, where rapid and frequent builds are the norm.
While these tools offer powerful capabilities, they do come with an initial learning curve. Each tool has its own set of configurations, syntax, and best practices that you need to familiarize yourself with. However, the investment in learning is often justified by the significant gains in build speed and efficiency.
Another advantage of using these specialized build tools is their flexibility. They allow for a high degree of customization, enabling you to tailor the build process to meet the specific needs of your project or team. This is especially useful in large teams or complex projects where generic build configurations may not be sufficient.
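To make the idea of incremental builds more concrete, here is a minimal sketch of how a tool might work out which targets are affected by a change. It is not how Bazel, Pants, or Buck2 are implemented; it only illustrates the general technique of walking a dependency graph in reverse. The target names and the `affected_targets` function are hypothetical.

```python
from collections import defaultdict, deque


def affected_targets(deps, changed):
    """Return every target that must be rebuilt when `changed` targets change.

    `deps` maps each target to the list of targets it depends on.
    """
    # Invert the graph: for each target, record which targets depend on it.
    dependents = defaultdict(set)
    for target, requirements in deps.items():
        for requirement in requirements:
            dependents[requirement].add(target)

    # Breadth-first walk from the changed targets through their dependents.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical monorepo layout, for illustration only.
deps = {
    "libs/auth": [],
    "libs/ui": [],
    "services/api": ["libs/auth"],
    "apps/web": ["libs/ui", "services/api"],
    "apps/docs": ["libs/ui"],
}

print(sorted(affected_targets(deps, ["libs/auth"])))
# ['apps/web', 'libs/auth', 'services/api']
```

In this example, a change to `libs/auth` leaves `libs/ui` and `apps/docs` untouched, so a CI system using this kind of analysis could skip building and testing them.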
High complexity
For newcomers or even seasoned team members, navigating a huge codebase can be daunting. Understanding the interdependencies, finding the right modules, or even simply knowing where to start can be overwhelming.
Code navigation tools such as Sourcegraph and integrated features within platforms like GitHub serve as invaluable aids for developers navigating extensive codebases. These tools go beyond basic text search to offer a range of advanced functionalities designed to make code exploration more efficient and insightful.
One of the primary features of these tools is advanced code search, which allows developers to perform complex queries to find specific code snippets, functions, or even documentation within a large codebase. This is particularly useful when you’re trying to understand how a particular piece of code interacts with other components or when you’re debugging.
Another powerful feature is cross-referencing, which enables developers to easily find where a particular function or variable is used across different files or projects. This is incredibly helpful for understanding the impact of potential changes or for tracking down the root cause of a bug. It eliminates the need to manually search through multiple files, saving both time and effort.
These tools also offer intelligent code mapping, which provides a visual representation of how different parts of the code are interconnected. This can be especially useful for new team members who are trying to get a grasp of a complex project or for any developer who wants to understand the architecture and dependencies within the codebase.
Potential for conflicts
With many developers working simultaneously on the same repository, the chances of conflicting changes or merge conflicts increase. This can hamper the development speed and lead to errors if not resolved correctly.
Version control systems like Git offer robust mechanisms to handle merge conflicts. Features like pull requests in platforms like GitHub or Bitbucket allow for code review, helping spot and resolve conflicts before they’re merged into the main branch.
Additionally, CI tools like Jenkins, Travis CI, or CircleCI can automatically run tests on branches before they’re merged. This ensures that any breaking changes or conflicts get flagged early.
As you can see, while monorepos have their disadvantages, there’s a range of tools designed to mitigate these challenges.
Building a monorepo culture
The decision to use a monorepo goes beyond just tools and technical considerations; it requires a cultural shift in how developers work and collaborate. This culture is foundational to effectively managing and scaling a monorepo environment, ensuring that the benefits outweigh the challenges.
Take a look at a few different aspects of building a monorepo culture.
Shared responsibility
In a monorepo setting, boundaries between projects or components become blurred. Instead of viewing projects as isolated entities, team members should see the entire repository as their domain. That’s why it’s important to encourage collaboration across teams. Cross-team code reviews, pair programming, and team rotations can break silos and foster a holistic view of the codebase.
Additionally, you should regularly organize internal workshops, tech talks, or code walkthroughs. This can help team members familiarize themselves with different parts of the codebase and understand its intricacies.
For instance, Google fosters an environment in which developers have the freedom to access and contribute to any section of the codebase. This approach to code ownership has led to standardized coding practices, enhanced collaboration among team members, and a simplified process of reusing code.
Early merging to catch integration issues
Consistently merging code changes is a proactive approach to software development that helps catch integration issues at an early stage. By integrating changes frequently, you can identify conflicts or bugs sooner rather than later, making them easier to resolve.
This practice minimizes the risk of encountering larger, more complicated issues in the future, which could require significant time and effort to fix. For example, if two developers are working on features that affect the same piece of code, early merging will reveal any incompatibilities between their changes, allowing for quicker adjustments.
To manage these merges in a more organized fashion, implementing branching strategies like feature branching or trunk-based development is highly recommended.
In feature branching, each new feature or bug fix is developed in its own branch. This allows developers to work on different features simultaneously without affecting the main codebase. Once the feature is complete and tested, it can be merged back into the main branch.
Feature branching is particularly useful for teams that have multiple developers working on different aspects of a project, as it allows for parallel development without the risk of one feature negatively impacting another.
In comparison, trunk-based development encourages developers to merge their changes directly into the trunk or main codebase as quickly as possible, often multiple times a day. This approach is beneficial for catching integration issues early and ensures that the codebase remains in a consistently deployable state. It’s especially effective for large teams where rapid integration is crucial for maintaining a smooth development workflow.
Take Facebook’s example, where the codebase is designed to empower engineers to “move fast and break things,” signifying a culture that values swift innovation along with ongoing refinement and iteration.
Thorough documentation
A monorepo’s vastness makes it challenging to navigate and understand. Comprehensive documentation acts as a map, guiding developers through the code.
Make sure you establish clear standards for documenting code. This might include things like comments, READMEs, and architecture diagrams.
Additionally, use tools like Doxygen, Javadoc, or Sphinx to automatically generate documentation from source code comments.
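As an example of what documentation in source comments can look like, the sketch below shows a Python function whose docstring uses the field-list style that Sphinx can render through its autodoc extension. The function itself (`apply_discount`) is hypothetical and chosen only to give the docstring something to describe.

```python
def apply_discount(price_cents: int, percent: float) -> int:
    """Return the price after applying a percentage discount.

    :param price_cents: Original price in cents.
    :param percent: Discount between 0 and 100.
    :returns: Discounted price, rounded to a whole number of cents.
    :raises ValueError: If ``percent`` is outside the 0 to 100 range.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price_cents * (1 - percent / 100))


# The docstring is available at runtime, which is what documentation
# generators read when they build reference pages.
print(apply_discount.__doc__.splitlines()[0])
```

Keeping the description next to the code makes it more likely that a change to the function and a change to its documentation land in the same commit.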
Continual refinement for a healthy codebase
As your codebase grows and evolves, it’s essential to periodically revisit and fine-tune existing code. This practice ensures that your code stays clean, efficient, and in line with current best practices. For instance, an algorithm that was efficient a year ago may now have a more optimized version, or a library you’re using might have received updates that you can take advantage of.
To systematically address this, consider dedicating specific sprints or time periods exclusively to code refactoring and reducing technical debt. For example, you could allocate the last week of every development cycle to revisit sections of the code that have been flagged for optimization or refactoring. This focused effort ensures that your codebase doesn’t accumulate quick fixes or workarounds that can make it harder to maintain and scale over time.
In addition, encourage a culture of detailed code reviews that go beyond just assessing functionality. These reviews should also scrutinize the quality of the code, examining factors like readability, efficiency, and adherence to coding standards. Peer feedback during these reviews can be invaluable for identifying areas that may require refactoring. For example, a team member might notice that a particular function is overly complex and suggest breaking it down into smaller, more manageable functions, thereby improving both readability and maintainability.
By continually refining your code, dedicating time to tackle technical debt, and fostering a culture of thorough code reviews, you can maintain a high-quality, efficient codebase that is easier to work with and less prone to issues in the long run.
Monorepo culture at tech giants
Monorepo culture has been adopted by many tech giants and renowned companies due to the myriad advantages it offers. Take a quick look at how Google, Facebook, and Microsoft have adopted a monorepo culture:
Google
Google is often credited with popularizing the monorepo approach. Its massive codebase, hosted in an in-house version control system known as Piper, contains billions of lines of code and thousands of projects.
At Google, a culture of shared ownership encourages developers to access and contribute to any part of the codebase. This collaborative approach has led to consistent coding standards, enhanced collaboration, and easier code reuse.
In conjunction with this, Google created Bazel, a build tool designed to work with large codebases like theirs. Bazel supports incremental builds, ensuring only affected components are rebuilt, significantly speeding up the build process.
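One common way to decide whether a component needs rebuilding is to fingerprint its inputs and reuse a previous result when the fingerprint has not changed. The sketch below illustrates that general idea in plain Python. It is a simplified illustration, not Bazel's implementation, and `fingerprint`, `BuildCache`, and `fake_compile` are hypothetical names.

```python
import hashlib


def fingerprint(sources):
    """Hash a target's inputs; `sources` maps file names to file contents."""
    digest = hashlib.sha256()
    for name in sorted(sources):
        digest.update(name.encode())
        digest.update(b"\0")
        digest.update(sources[name].encode())
        digest.update(b"\0")
    return digest.hexdigest()


class BuildCache:
    """Reuses a build result when the inputs' fingerprint is unchanged."""

    def __init__(self):
        self._results = {}  # fingerprint -> build output
        self.builds_run = 0

    def build(self, sources, build_fn):
        key = fingerprint(sources)
        if key not in self._results:
            self.builds_run += 1
            self._results[key] = build_fn(sources)
        return self._results[key]


def fake_compile(sources):
    # Stand-in for a real compiler: just joins the file names.
    return "+".join(sorted(sources))


cache = BuildCache()
lib = {"auth.py": "def login(): ..."}
cache.build(lib, fake_compile)  # first build: runs fake_compile
cache.build(lib, fake_compile)  # unchanged inputs: cached result reused
lib["auth.py"] = "def login(user): ..."
cache.build(lib, fake_compile)  # changed input: rebuilt
print(cache.builds_run)  # 2
```

Because the cache key depends only on the inputs, the same result can in principle be shared between developers and CI machines that build identical sources.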
Facebook
Facebook also employs a monorepo for its vast collection of projects, including the main Facebook app, Instagram, and WhatsApp.
Facebook’s codebase encourages engineers to “move fast and break things,” meaning they actively engage in rapid innovation while also continuously refining and iterating.
In conjunction, Facebook uses Buck, a build system tailored for their monorepo. It ensures efficient and reproducible builds, which is vital given the scale and pace of their development.
Microsoft
Microsoft famously transitioned the Windows codebase to a Git monorepo, creating what it described as the largest Git repo on the planet. With the move, Microsoft aimed to increase developer productivity, improve code sharing, and streamline the engineering system.
To manage the massive repository, Microsoft developed the Virtual File System for Git (VFS for Git). It allows the Git client to operate at a scale previously thought impossible by virtualizing the filesystem beneath the repo and making it appear as though all the files are present when, in reality, they are not.
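The idea of making files appear present while fetching them only when needed can be illustrated with a toy example. The sketch below is an analogy in plain Python, not how VFS for Git is implemented: the `LazyWorkingTree` class and the file paths are hypothetical, and a dictionary stands in for the remote server.

```python
class LazyWorkingTree:
    """Lists every file up front but fetches contents only on first read."""

    def __init__(self, remote_files):
        self._remote = remote_files  # stand-in for the remote server
        self._local = {}             # contents downloaded so far

    def list_files(self):
        # Every path is visible, whether or not its contents are local yet.
        return sorted(self._remote)

    def read(self, path):
        if path not in self._local:
            # Simulated download on first access.
            self._local[path] = self._remote[path]
        return self._local[path]

    def hydrated_count(self):
        return len(self._local)


# Hypothetical repository contents, for illustration only.
remote = {
    "kernel/sched.c": "/* scheduler */",
    "shell/explorer.c": "/* shell */",
    "docs/README.md": "# Docs",
}

tree = LazyWorkingTree(remote)
print(len(tree.list_files()))  # 3
print(tree.hydrated_count())   # 0
tree.read("docs/README.md")
print(tree.hydrated_count())   # 1
```

The developer sees the full tree, but only the files they actually open are transferred, which is what keeps a very large repository usable on an ordinary machine.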
These companies not only showcase the technical adaptability of monorepos but also emphasize the cultural shift essential for such a model’s success.
The benefit of monorepos
Deciding between monorepos and multirepos isn’t solely a technical decision. It also reflects how a team collaborates, how it distributes accountability, and how it approaches software creation as a whole. When complemented with the right tools and a strong culture emphasizing shared ownership and ongoing refinement, monorepos can create a streamlined and unified framework for software initiatives, particularly for larger teams.
Beyond technical merits, monorepos foster an enhanced collaborative environment. They dissolve barriers between developers, promoting shared responsibility, comprehensive code reviews, and a unified development environment.
Together, these features make monorepos a compelling choice for teams seeking both technical efficiency and collaborative synergy.
Aviator: Automate your cumbersome processes
Aviator automates tedious developer workflows by managing Git pull requests (PRs) and continuous integration (CI) test runs. It helps your team avoid broken builds, streamline cumbersome merge processes, manage cross-PR dependencies, and handle flaky tests while maintaining security compliance.
There are four key components to Aviator:
- MergeQueue – an automated queue that manages the merging workflow for your GitHub repository to help protect important branches from broken builds. The Aviator bot uses GitHub Labels to identify Pull Requests (PRs) that are ready to be merged, validates CI checks, processes semantic conflicts, and merges the PRs automatically.
- ChangeSets – workflows to synchronize validating and merging multiple PRs within the same repository or multiple repositories. Useful when your team often sees groups of related PRs that need to be merged together, or otherwise treated as a single broader unit of change.
- TestDeck – a tool to automatically detect, take action on, and process results from flaky tests in your CI infrastructure.
- Stacked PRs CLI – a command line tool that helps developers manage cross-PR dependencies. This tool also automates syncing and merging of stacked PRs. Useful when your team wants to promote a culture of smaller, incremental PRs instead of large changes, or when your workflows involve keeping multiple, dependent PRs in sync.