How to work with Git submodules
Git submodules allow you to include one repository inside another. This is useful when you want fine-grained control over your dependencies, or in situations where a dependency manager is not suitable. Submodules are powerful tools, and it’s worth understanding them properly before using them.
In this article, we’ll cover:
- What Git submodules are
- Common workflows with submodules
- What they are useful for
- When you shouldn’t use them
At the end of the article, you’ll also find links to further resources.
Git submodules
Imagine you are working on a text editor. You’ve implemented the basic features of viewing and editing files, and now you want to add syntax highlighting. There’s a cool library on GitHub that does exactly what you want, but it hasn’t been published to the dependency manager you use. How can you use it?
This is a situation where Git submodules might come in handy. Submodules are a feature of Git that lets you include one repository inside another. This means that you can include the syntax highlighting library in your text editor’s repo while keeping a link to the original repository so that you can receive upstream changes.
The above diagram shows what your repository structure might look like when you use submodules. The src and test directories contain your own files, but lib contains the syntax highlighter library as a submodule.
Submodules are entire Git repositories that are pinned to a specific commit. Your local copy of a repository containing a submodule will contain all of the files from the submodule, which means that you can treat it as if it were your own code. Submodules let you view, edit, and reference all of the files in the contained repository.
text-editor
├── lib
│   └── syntax-highlighter
│       ├── README.md
│       ├── docs
│       │   └── very-good-docs.md
│       └── ...
├── src
│   ├── editor.py
│   └── ...
└── test
    └── ...
Above is the file structure of our text editor repository after adding the syntax highlighting library as a submodule. All of the files from the submodule are on our filesystem and ready for us to edit them.
Workflows
Now that you’ve seen what submodules can do, the following section will take you through how to use them.
Adding a submodule to your repository
Following on from the example earlier of a syntax highlighting library, imagine that the library you want to add is at the following URL:
https://www.github.com/username/syntax-highlighter
You can add this library as a submodule to your repository by using the command:
git submodule add https://www.github.com/username/syntax-highlighter lib/syntax-highlighter
This will add two new files to your repository, .gitmodules and lib/syntax-highlighter. You can see these files using git status:
~/src/text-editor$ git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: .gitmodules
new file: lib/syntax-highlighter
.gitmodules is a simple text file that lists the submodules in your repository. You should commit this file so that other people working on your repository can also use the submodule.
lib/syntax-highlighter is a bit more complicated. Git sees this path as a file, but your filesystem sees the path as a directory. You can output what Git sees by running git diff --cached lib/syntax-highlighter:
~/src/text-editor$ git diff --cached lib/syntax-highlighter
diff --git a/lib/syntax-highlighter b/lib/syntax-highlighter
new file mode 160000
index 0000000..ac8e080
--- /dev/null
+++ b/lib/syntax-highlighter
@@ -0,0 +1 @@
+Subproject commit ac8e080ae2ba4c582eb5842139ab7e5082b4cff0
As shown in the diff above, Git sees the submodule as a file containing the commit ID currently tracked by the submodule. By default, this will be the latest commit to the default branch, which is usually main on newer Git repositories and master on older ones.
However, if you look at the submodule on the filesystem using something like ls, you’ll see that it’s a directory:
~/src/text-editor$ ls lib/syntax-highlighter
README.md docs src test
What’s more, this directory is actually a Git repository in its own right! You can run things like git status and even edit the code in it.
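To see both views side by side, here is a minimal sketch that reproduces the steps above in a throwaway sandbox. The temporary directory, the local stand-in for the library, and the variable names are all assumptions made for illustration; with a real https URL you would not need the protocol.file.allow override.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox: every path and repository here is invented for the demo.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A local stand-in for the syntax-highlighter library.
git init -q "$work/syntax-highlighter"
echo "# syntax-highlighter" > "$work/syntax-highlighter/README.md"
git -C "$work/syntax-highlighter" add README.md
git -C "$work/syntax-highlighter" commit -q -m "Initial commit"

# The parent project that will contain the submodule.
git init -q "$work/text-editor"
cd "$work/text-editor"

# protocol.file.allow=always is only needed because the "remote" is a local
# path; recent Git versions refuse local-path submodules otherwise.
git -c protocol.file.allow=always submodule add \
    "$work/syntax-highlighter" lib/syntax-highlighter

# What Git recorded: a URL in .gitmodules and a mode-160000 index entry
# whose object ID is the commit the submodule is pinned to.
recorded_url=$(git config -f .gitmodules --get submodule.lib/syntax-highlighter.url)
entry_mode=$(git ls-files --stage lib/syntax-highlighter | cut -d' ' -f1)
pinned_commit=$(git ls-files --stage lib/syntax-highlighter | cut -d' ' -f2)

echo "url:    $recorded_url"
echo "mode:   $entry_mode"
echo "commit: $pinned_commit"

# What the filesystem sees: an ordinary directory with the library's files.
ls lib/syntax-highlighter
```

The mode 160000 is the same value that appears in the diff output earlier in this section; it is how Git marks an entry as a pointer to a commit rather than a regular file.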
Cloning a repository that contains a submodule
Git stores each submodule as an entry in .gitmodules and a file in the repo that describes what commit the submodule points to. As a result, when you clone a repo, you need to do a little extra work to download the code for the submodule into your local copy.
Let’s say you’ve cloned the text-editor repo from earlier:
git clone https://github.com/username/text-editor
If you were then to examine lib/syntax-highlighter, you’d find that it’s just an empty directory.
~/src/text-editor$ less lib/syntax-highlighter/
lib/syntax-highlighter/ is a directory
To populate lib/syntax-highlighter with the submodule’s code, you need to run git submodule update --init --recursive.
~/src/text-editor$ git submodule update --init --recursive
Submodule 'lib/syntax-highlighter' (https://github.com/username/syntax-highlighter.git) registered for path 'lib/syntax-highlighter'
Cloning into '/home/username/src/text-editor/lib/syntax-highlighter'...
Submodule path 'lib/syntax-highlighter': checked out '55086f1cb2ee8294d3354805be941171c287557d'
This is a convenient shorthand for git submodule init followed by git submodule update. init copies each submodule’s URL from .gitmodules into your local Git configuration, and update clones the submodule and checks out the commit that the parent repository has pinned. With --recursive, the command also initializes and updates any submodules nested inside your submodules.
An alternative workflow is to use git clone --recurse-submodules. This is an even shorter shorthand that is equivalent to running git clone followed by git submodule update --init --recursive.
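The two cloning workflows can be compared with a small sandbox script. This is a sketch under stated assumptions: the repositories are local stand-ins created in a temporary directory, the clone directory names are made up, and the protocol.file.allow override is only there because the submodule URL is a local path.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox: every path and repository here is invented for the demo.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in upstream library.
git init -q "$work/syntax-highlighter"
echo "# syntax-highlighter" > "$work/syntax-highlighter/README.md"
git -C "$work/syntax-highlighter" add README.md
git -C "$work/syntax-highlighter" commit -q -m "Initial commit"

# Parent repository with the submodule added and committed.
git init -q "$work/text-editor"
cd "$work/text-editor"
git -c protocol.file.allow=always submodule add \
    "$work/syntax-highlighter" lib/syntax-highlighter
git commit -q -m "Add syntax-highlighter submodule"

# Workflow 1: plain clone, then initialize and update the submodule.
git clone -q "$work/text-editor" "$work/plain-clone"
cd "$work/plain-clone"
files_before=$(ls -A lib/syntax-highlighter | wc -l | tr -d ' ')
# "git submodule status" prefixes uninitialized submodules with "-".
status_before=$(git submodule status | cut -c1)

git -c protocol.file.allow=always submodule update --init --recursive
files_after=$(ls -A lib/syntax-highlighter | wc -l | tr -d ' ')

echo "entries before update: $files_before, after update: $files_after"

# Workflow 2: clone and populate submodules in one step.
git -c protocol.file.allow=always clone -q --recurse-submodules \
    "$work/text-editor" "$work/recursive-clone"
ls "$work/recursive-clone/lib/syntax-highlighter"
```

The status prefix is a handy check when something looks wrong: a leading "-" means the submodule has not been initialized yet.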
Editing a submodule’s code
Submodules are complete Git repositories in their own right. This means that you can use them exactly as you would any other Git repository. To illustrate this point, let’s walk through making a change to a submodule in our repository.
Imagine that you want to add a line to the syntax-highlighter library to let it support Python. We can make that change in our favorite text editor (possibly the one we’re building!) and then see the change with git diff:
~/src/text-editor/lib/syntax-highlighter$ git diff
diff --git a/src/supported-languages.txt b/src/supported-languages.txt
index ad2c90d..f4311cb 100644
--- a/src/supported-languages.txt
+++ b/src/supported-languages.txt
@@ -2,3 +2,4 @@ javascript
markdown
java
c++
+python
Note the path in the terminal prompt: ~/src/text-editor/lib/syntax-highlighter. We are making this change inside the submodule, not inside a separate clone of the original syntax-highlighter repository.
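After making the change, we can do our usual git add and git commit, and voilà! We have edited our submodule. One thing to watch for: if the submodule was checked out by git submodule update, it starts in a detached HEAD state, so switch to a branch before committing (the examples in this article use a branch called add-python). You can see the change in the text-editor repository by running git diff lib/syntax-highlighter: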
~/src/text-editor$ git diff lib/syntax-highlighter/
diff --git a/lib/syntax-highlighter b/lib/syntax-highlighter
index 8b6e157..55086f1 160000
--- a/lib/syntax-highlighter
+++ b/lib/syntax-highlighter
@@ -1 +1 @@
-Subproject commit 8b6e157f0fb785c619b99373bb474e03b1b72f54
+Subproject commit 55086f1cb2ee8294d3354805be941171c287557d
Note that this diff just updates the commit ID that the submodule refers to. The actual changes to the submodule are not recorded in the parent repository. This leads to a really important point: to share changes to a submodule, you need push access to the repository that the submodule’s URL points to. Otherwise, the changes exist in your local copy of the submodule but nowhere else, and anyone who clones the parent repository will be unable to check out the commit it pins.
If you don’t own the submodule’s repository, and therefore don’t have push access, that’s OK! You just need to fork the original repository and then use your fork as the submodule’s URL.
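Switching an existing submodule over to a fork might look like the sketch below. It is an illustration only: a local bare clone named fork.git plays the role of your fork, all paths live in a temporary directory, and the file supported-languages.txt is created just for the demo. The approach shown edits .gitmodules with git config and then runs git submodule sync to propagate the new URL.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox: every path and repository here is invented for the demo.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in upstream library that we cannot push to in real life.
git init -q "$work/syntax-highlighter"
echo "# syntax-highlighter" > "$work/syntax-highlighter/README.md"
git -C "$work/syntax-highlighter" add README.md
git -C "$work/syntax-highlighter" commit -q -m "Initial commit"

# A bare clone stands in for your fork on the hosting service.
git clone -q --bare "$work/syntax-highlighter" "$work/fork.git"

# Parent repository, initially pointing at upstream.
git init -q "$work/text-editor"
cd "$work/text-editor"
git -c protocol.file.allow=always submodule add \
    "$work/syntax-highlighter" lib/syntax-highlighter
git commit -q -m "Add syntax-highlighter submodule"

# Point the submodule at the fork: change the URL in .gitmodules, then let
# "sync" copy it into .git/config and the submodule's own remote.
git config -f .gitmodules submodule.lib/syntax-highlighter.url "$work/fork.git"
git submodule sync -- lib/syntax-highlighter
git add .gitmodules
git commit -q -m "Use our fork of syntax-highlighter"

origin_url=$(git -C lib/syntax-highlighter config --get remote.origin.url)
echo "submodule now pushes to: $origin_url"

# Commits made in the submodule can now be pushed to the fork.
cd lib/syntax-highlighter
git checkout -q -b add-python
echo python >> supported-languages.txt
git add supported-languages.txt
git commit -q -m "Add Python support"
git push -q origin add-python
```

Git 2.25 and later also provide git submodule set-url, which combines the .gitmodules edit and the sync into one command.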
Pulling upstream changes into a submodule
Submodules maintain a link to the upstream repository that they originate from. You can use this link to pull upstream changes.
Imagine that after you added Python support to the syntax highlighting library, you hear that the maintainers have added TypeScript support. This sounds like a useful feature to include in your text editor and so you want to pull their changes. The first step is to cd into the submodule and fetch the changes:
~/src/text-editor$ cd lib/syntax-highlighter/
~/src/text-editor/lib/syntax-highlighter$ git fetch
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 4 (delta 0), reused 4 (delta 0), pack-reused 0
Unpacking objects: 100% (4/4), 419 bytes | 419.00 KiB/s, done.
From github.com/username/syntax-highlighter
+ 49301eb...54f7bbb main -> origin/main
It’s important to cd to the submodule directory first because, otherwise, you will fetch changes for your parent repository. The git fetch output shows that main has been updated on the remote repository.
The changes that we want to pull in are on the main branch, so we’ll need to merge them into our own branch. We can use git merge for this:
~/src/text-editor/lib/syntax-highlighter$ git merge origin/main
Auto-merging src/supported-languages.txt
CONFLICT (content): Merge conflict in src/supported-languages.txt
Automatic merge failed; fix conflicts and then commit the result.
Oh no! There’s a merge conflict with our branch. Thankfully in this case it’s quite small:
javascript
markdown
java
c++
<<<<<<< HEAD
python
=======
typescript
>>>>>>> origin/main
In this case, we want to keep both changes and so we can just delete the merge conflict markers. We can now add, commit and push this change to our remote.
~/src/text-editor/lib/syntax-highlighter$ git add src/
~/src/text-editor/lib/syntax-highlighter$ git commit
[add-python 98d5210] Merge remote-tracking branch 'origin/main' into add-python
~/src/text-editor/lib/syntax-highlighter$ git push
Enumerating objects: 10, done.
Counting objects: 100% (10/10), done.
Delta compression using up to 12 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 500 bytes | 500.00 KiB/s, done.
Total 4 (delta 0), reused 0 (delta 0)
To github.com/username/syntax-highlighter.git
55086f1..98d5210 add-python -> add-python
The ability to edit the code of submodules is both their most powerful feature and their most dangerous. Maintaining your own branch that diverges from the main branch of a library is incredibly useful, but be prepared to face merge conflicts.
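Whenever the submodule gains a new commit, whether from your own edit or from a merge like the one above, the parent repository still points at the old commit until you stage and commit the submodule path. The sketch below shows that last step in a throwaway sandbox; the repositories, file names, and commit messages are assumptions made for the demo.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox: every path and repository here is invented for the demo.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in upstream library with a list of supported languages.
git init -q "$work/syntax-highlighter"
printf 'javascript\nmarkdown\n' > "$work/syntax-highlighter/supported-languages.txt"
git -C "$work/syntax-highlighter" add supported-languages.txt
git -C "$work/syntax-highlighter" commit -q -m "Initial commit"

# Parent repository with the submodule committed.
git init -q "$work/text-editor"
cd "$work/text-editor"
git -c protocol.file.allow=always submodule add \
    "$work/syntax-highlighter" lib/syntax-highlighter
git commit -q -m "Add syntax-highlighter submodule"
old_pin=$(git rev-parse HEAD:lib/syntax-highlighter)

# Make a new commit inside the submodule, on a branch of our own.
cd lib/syntax-highlighter
git checkout -q -b add-python
echo python >> supported-languages.txt
git commit -q -a -m "Add Python support"
new_commit=$(git rev-parse HEAD)

# Back in the parent: the submodule path shows up as modified until we
# stage it and commit the new pointer.
cd "$work/text-editor"
git status --short
git add lib/syntax-highlighter
git commit -q -m "Update syntax-highlighter to include Python support"
new_pin=$(git rev-parse HEAD:lib/syntax-highlighter)

echo "pinned commit moved from $old_pin to $new_pin"
```

In a real project you would push the submodule’s branch before pushing the parent commit, so that the commit the parent pins is available to everyone else.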
What are submodules useful for?
Below are a few examples of when you might want to use submodules over a dependency manager or other solution.
Libraries not in your dependency manager
Not every library is available through a dependency manager, and no dependency manager has every library. If your dependency manager doesn’t support a certain library, then submodules can help you include it in your project.
In this case, you should weigh up the work of maintaining a submodule against the work of adding that library to your dependency management system. Remember that submodules need to be manually updated.
Editable libraries that track upstream
Dependency managers, for the most part, aren’t designed for you to modify the dependencies that they manage. If you want to make changes to a library that you depend on, then submodules might be a good solution.
Submodules keep a link to the upstream code. This means that you can still pull in the latest security and bug-fix updates from the library you depend on. If you were to just copy and paste the code into your repository, getting updates from upstream would become a lot harder.
An alternative in this situation is to try to get your changes merged into the upstream repository. However, this isn’t always practical. There might be license issues, your changes might not be accepted, and even if they are, you will likely have to wait a while before they get merged.
Internal libraries
Intellectual property and copyright concerns mean that it’s not always OK to publish libraries developed within your organization. Internal package mirrors are one solution to this problem, as they allow you to publish packages within your organization. Submodules can be a lot simpler to manage, however, and you should weigh up the cost of keeping a submodule up-to-date against the cost of maintaining a package mirror.
When not to use submodules
Submodules are powerful, but they come with some caveats. For starters, submodules don’t have automatic update mechanisms like dependency managers do.
If you add a submodule to your project, then you become responsible for keeping it up to date, whereas if you install a dependency with a dependency manager, the dependency manager can automatically keep the package on the latest version.
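When the submodule has no local changes of its own, the manual update can be a single command: git submodule update --remote fetches the submodule’s tracked branch and checks out its latest commit, after which you commit the new pointer in the parent. The sketch below demonstrates this in a sandbox. The local repositories, the explicit main branch, and the file names are assumptions for illustration, not part of the article’s example.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox: every path and repository here is invented for the demo.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in upstream library; force the branch name so the demo does not
# depend on the local init.defaultBranch setting.
git init -q "$work/syntax-highlighter"
git -C "$work/syntax-highlighter" symbolic-ref HEAD refs/heads/main
echo javascript > "$work/syntax-highlighter/supported-languages.txt"
git -C "$work/syntax-highlighter" add supported-languages.txt
git -C "$work/syntax-highlighter" commit -q -m "Initial commit"

# Parent repository; "-b main" records the branch to follow in .gitmodules.
git init -q "$work/text-editor"
cd "$work/text-editor"
git -c protocol.file.allow=always submodule add -b main \
    "$work/syntax-highlighter" lib/syntax-highlighter
git commit -q -m "Add syntax-highlighter submodule"
old_pin=$(git rev-parse HEAD:lib/syntax-highlighter)

# Upstream moves on without us.
echo typescript >> "$work/syntax-highlighter/supported-languages.txt"
git -C "$work/syntax-highlighter" commit -q -a -m "Add TypeScript support"
upstream_tip=$(git -C "$work/syntax-highlighter" rev-parse HEAD)

# The manual update: fetch the tracked branch and check out its tip,
# then record the new pointer in the parent repository.
git -c protocol.file.allow=always submodule update --remote lib/syntax-highlighter
git add lib/syntax-highlighter
git commit -q -m "Bump syntax-highlighter to latest upstream"
new_pin=$(git rev-parse HEAD:lib/syntax-highlighter)

echo "pinned commit moved from $old_pin to $new_pin"
```

Nothing runs this for you, which is the point of the caveat above: someone on the team has to decide when to bump the pointer and to test the result.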
Git doesn’t download the contents of a submodule by default. This is not obvious to developers who haven’t worked with submodules before and can become a trip hazard for your project. If you use submodules in your project, then it’s worth thoroughly documenting development workflows in contributing.md or similar.
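One setting that such documentation could mention is submodule.recurse, which makes commands such as git pull and git checkout behave as if --recurse-submodules had been passed. It does not apply to git clone, so the initial clone still needs --recurse-submodules or an explicit git submodule update --init. The snippet below is a minimal sketch that sets the option in a scratch repository; the repository path is invented for the demo.

```shell
#!/bin/sh
set -eu

# Scratch repository standing in for a contributor's clone.
work=$(mktemp -d)
git init -q "$work/text-editor"
cd "$work/text-editor"

# Per-repository setting: commands that support --recurse-submodules
# (for example pull and checkout, but not clone) will enable it by default.
git config submodule.recurse true

recurse_setting=$(git config --get submodule.recurse)
echo "submodule.recurse = $recurse_setting"
```

Whether to recommend this per repository or with --global is a team decision; the setting is local to each contributor’s machine either way, so it still needs to be written down somewhere.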
Submodules bring increased complexity to your development workflow, so it’s only worth using them if you need to. If a dependency manager will satisfy your use case, then consider using it over submodules.
Resources
- https://git-scm.com/book/en/v2/Git-Tools-Submodules
- https://git-scm.com/docs/git-submodule
- https://github.blog/2016-02-01-working-with-submodules/