As software engineers, we often work on a project that depends on another project we are also working on at the same time. The scenario may look like the following:
We have two projects, and each of them has its own Git repository:
- A common library, say commonlib, used by many projects. The library is self-contained and has its own test suite and documentation.
- A project called myproj, which depends on commonlib.
While we are working on myproj, we may also need to update commonlib at the same time. If both commonlib and myproj happen to be Python projects, we can use setuptools’ development mode and Git submodules to make the work easier. This article demonstrates how to use development mode and submodules to deal with this situation. Hopefully, people who need to manage this kind of setup will find it helpful.
commonlib and myproj are used as the example in the rest of the article, and the example assumes the code runs in a virtual environment under the following conditions:
- Ubuntu 18.04
- Python 3.7
- Git 2.17
The Challenges
First of all, for Python project development, we usually set up a virtual environment and install all the dependencies into it. Then, we start working on our project, i.e., myproj in this case. However, myproj needs commonlib, which we are also working on at the same time. If we install commonlib in the normal way, e.g., with pip install, the installed copy is separate from our Git working copy. The changes we make and track with Git are then not what myproj actually uses until we reinstall. Development mode solves this issue.
Second, commonlib is used by many projects, including myproj. On the one hand, during development, myproj may need to stick with a specific version or branch of commonlib. On the other hand, other projects may need a different version of commonlib. To make sure that we use the correct branch or version of commonlib when we work on myproj, we can set up the dependency as a Git submodule.
What is Development Mode?
Development mode allows a project to be both installed and editable.
Normally, we install a Python package from PyPI.
$ pip install <package_name>
Or, we install it from a local package.
$ pip install <path_to_local_archive>
Either way, the package is installed into our (virtual) environment. That means it is copied to a folder such as /virtual_environment/lib/python3.7/site-packages/. To install commonlib into our virtual environment, we can do:
$ git clone https://github.com/shunsvineyard/commonlib.git
$ pip install commonlib/
After the installation, commonlib shows up as an installed package in the site-packages folder. We can use the ls command to check it. For example, the result may look like the following:
(demo_env) shunsvineyard@remote-ubuntu:~$ ls -l demo_env/lib/python3.7/site-packages/
total 40
drwxrwxr-x 2 shunsvineyard shunsvineyard 4096 Dec 23 05:00 __pycache__
drwxrwxr-x 3 shunsvineyard shunsvineyard 4096 Dec 23 05:01 commonlib
drwxrwxr-x 2 shunsvineyard shunsvineyard 4096 Dec 23 05:01 commonlib-0.0.1.egg-info
-rw-rw-r-- 1 shunsvineyard shunsvineyard 126 Dec 23 05:00 easy_install.py
drwxrwxr-x 11 shunsvineyard shunsvineyard 4096 Dec 23 05:00 pip
drwxrwxr-x 2 shunsvineyard shunsvineyard 4096 Dec 23 05:00 pip-9.0.1.dist-info
drwxrwxr-x 5 shunsvineyard shunsvineyard 4096 Dec 23 05:00 pkg_resources
drwxrwxr-x 2 shunsvineyard shunsvineyard 4096 Dec 23 05:00 pkg_resources-0.0.0.dist-info
drwxrwxr-x 6 shunsvineyard shunsvineyard 4096 Dec 23 05:00 setuptools
drwxrwxr-x 2 shunsvineyard shunsvineyard 4096 Dec 23 05:00 setuptools-39.0.1.dist-info
Development mode creates a link in the virtual environment that points to the package's source directory. With development mode, a Python package is installed in a way that lets us keep editing the code after installation. Therefore, any change we make to the code takes effect immediately in the virtual environment.
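One way to see the difference is to ask Python where it would import a package from. The following is a minimal sketch, not part of setuptools. package_origin and installed_in_site_packages are hypothetical helper names, and the sketch assumes the package is importable under a top-level name. After a normal install, commonlib should resolve to a file under site-packages. After a development-mode install, it should resolve to the Git working copy instead.

```python
import importlib.util
import os
import sysconfig
from typing import Optional


def package_origin(name: str) -> Optional[str]:
    """Return the file a top-level module/package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def installed_in_site_packages(name: str) -> bool:
    """True if the module resolves to a file under this environment's site-packages."""
    origin = package_origin(name)
    if not origin:
        return False  # not importable, or a namespace package without a single file
    purelib = os.path.realpath(sysconfig.get_paths()["purelib"])
    return os.path.realpath(origin).startswith(purelib + os.sep)


if __name__ == "__main__":
    # Hypothetical usage inside the demo virtual environment.
    print(package_origin("commonlib"))
    print(installed_in_site_packages("commonlib"))
```

Running it once after pip install commonlib/ and once after pip install -e commonlib/ should show the origin moving from site-packages to the checkout.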
To install a Python package in development mode, use the command:
$ pip install -e <path to the package>
Taking commonlib as an example, the result may look like the following:
(demo_env) shunsvineyard@remote-ubuntu:~$ pip install -e commonlib/
Obtaining file:///home/shunsvineyard/commonlib
Installing collected packages: commonlib
Running setup.py develop for commonlib
Successfully installed commonlib
(demo_env) shunsvineyard@remote-ubuntu:~$ ls -l demo_env/lib/python3.7/site-packages/
total 40
drwxrwxr-x 2 shunsvineyard shunsvineyard 4096 Dec 23 05:08 __pycache__
-rw-rw-r-- 1 shunsvineyard shunsvineyard 31 Dec 23 05:09 commonlib.egg-link
-rw-rw-r-- 1 shunsvineyard shunsvineyard 30 Dec 23 05:09 easy-install.pth
-rw-rw-r-- 1 shunsvineyard shunsvineyard 126 Dec 23 05:08 easy_install.py
drwxrwxr-x 11 shunsvineyard shunsvineyard 4096 Dec 23 05:08 pip
drwxrwxr-x 2 shunsvineyard shunsvineyard 4096 Dec 23 05:08 pip-9.0.1.dist-info
drwxrwxr-x 5 shunsvineyard shunsvineyard 4096 Dec 23 05:08 pkg_resources
drwxrwxr-x 2 shunsvineyard shunsvineyard 4096 Dec 23 05:08 pkg_resources-0.0.0.dist-info
drwxrwxr-x 6 shunsvineyard shunsvineyard 4096 Dec 23 05:08 setuptools
drwxrwxr-x 2 shunsvineyard shunsvineyard 4096 Dec 23 05:08 setuptools-39.0.1.dist-info
If we open the file commonlib.egg-link, we can see where it links to. For example:
(demo_env) shunsvineyard@remote-ubuntu:~$ cat demo_env/lib/python3.7/site-packages/commonlib.egg-link
/home/shunsvineyard/commonlib
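The egg-link file only records where the project lives. With the setuptools version shown here, the easy-install.pth file next to it is what actually puts that directory on sys.path, because the site module reads .pth files at interpreter startup. The following is a simplified sketch of both files. read_egg_link and pth_entries are hypothetical helpers. The real site module also skips directories that do not exist and executes lines that start with import instead of adding them.

```python
import os
from typing import List


def read_egg_link(text: str) -> str:
    """Return the project directory recorded on the first line of an .egg-link file."""
    return text.splitlines()[0].strip()


def pth_entries(text: str, sitedir: str) -> List[str]:
    """Simplified model of how site treats a .pth file: one path per line."""
    entries = []
    for line in text.splitlines():
        line = line.rstrip()
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comments are ignored
        if line.startswith(("import ", "import\t")):
            continue  # site executes these lines rather than adding them to sys.path
        # Relative entries are resolved against the site-packages directory.
        entries.append(os.path.join(sitedir, line))
    return entries


if __name__ == "__main__":
    # Content taken from the cat output above.
    print(read_egg_link("/home/shunsvineyard/commonlib\n"))
```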
Note that development mode is only available for a local project or a VCS URL. If we try to install a package from PyPI in development mode, we get the following error message. Using numpy as an example:
$ pip install -e numpy
numpy should either be a path to a local project or
a VCS url beginning with svn+, git+, hg+, or bzr+
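The rule in that message can be approximated in a few lines. The following is a hypothetical helper, not pip's actual code. It treats a target as installable in development mode when the target is an existing local directory or a URL that starts with one of the VCS prefixes listed in the error. pip performs more checks than this, such as verifying that the directory is actually an installable project.

```python
import os

# VCS prefixes taken from pip's error message above.
VCS_PREFIXES = ("svn+", "git+", "hg+", "bzr+")


def is_editable_target(target: str) -> bool:
    """Approximate pip's rule: a local project directory or a VCS URL."""
    if target.lower().startswith(VCS_PREFIXES):
        return True
    return os.path.isdir(target)


if __name__ == "__main__":
    for candidate in ["commonlib/", "numpy", "git+https://github.com/shunsvineyard/commonlib.git"]:
        print(candidate, is_editable_target(candidate))
```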
What is Git Submodule?
A Git submodule is a Git repository inside another Git repository. The outer repository holds a reference to a specific commit of the inner one. For example, myproj depends on commonlib. If commonlib is a Git submodule of myproj, the picture below illustrates their relationship.
Git submodules let us keep a Git repository as a subdirectory of another Git repository. When we clone myproj together with its submodules (for example, with git clone --recurse-submodules), Git downloads the commonlib version recorded in myproj's submodule reference from the commonlib repository. This way, we can clone another repository (i.e., commonlib) into our project (i.e., myproj) and keep the commits separate.
The following sections use commonlib and myproj as an example to demonstrate the setup and workflow of development mode and submodules. They also assume we do everything from scratch, including setting up the Git repositories.
Set Up the Projects
Assume commonlib provides a single, very simple feature: greeting. The project layout and code look like the following:
commonlib/
├── LICENSE
├── README.rst
├── commonlib
│ ├── __init__.py
│ └── greeting.py
└── setup.py
greeting.py
def greeting(name: str):
    print(f"Howdy, {name}")
setup.py
import pathlib
import setuptools
HERE = pathlib.Path(__file__).parent
README = (HERE / "README.rst").read_text()
setuptools.setup(
    name="commonlib",
    version="0.0.1",
    description="A simple Python package",
    long_description=README,
    long_description_content_type="text/x-rst",
    author="Author Name",
    author_email="author@email.com",
    license="MIT",
    classifiers=[
        "License :: OSI Approved :: MIT License",
        "Programming Language :: Python",
    ],
    packages=setuptools.find_packages(),
    python_requires=">=3.7",
)
(A complete example of commonlib can be found at https://github.com/shunsvineyard/commonlib)
Now, we are ready to set up the Git repositories for both commonlib and myproj. Before we do that, we need to set up a Git server. This example uses localhost (i.e., 127.0.0.1) as the Git server.
$ sudo useradd git
$ sudo passwd git
$ su git
$ cd ~
$ git init --bare commonlib
$ git init --bare myproj
Set Up the Git Repository for commonlib
After we have a Git server, we can add the existing commonlib to it. First, switch back to the local user.
user:~$ cd commonlib/
user:~/commonlib$ git init
user:~/commonlib$ git add --all
user:~/commonlib$ git commit -a -m "Initialize commonlib repository"
user:~/commonlib$ git remote add origin git@127.0.0.1:commonlib
user:~/commonlib$ git push -u origin master
Set Up the Git Repository for myproj
For myproj, we can do the same as for commonlib. The project layout and code look like the following:
myproj/
├── LICENSE
├── README.rst
├── app.py
└── setup.py
app.py
from commonlib import greeting
def run():
    greeting.greeting("Git Submodule")
if __name__ == "__main__":
    run()
setup.py
import pathlib
import setuptools
HERE = pathlib.Path(__file__).parent
README = (HERE / "README.rst").read_text()
setuptools.setup(
    name="myproj",
    version="0.0.1",
    description="A simple Python project",
    long_description=README,
    long_description_content_type="text/x-rst",
    url="https://github.com/shunsvineyard/myproj",
    author="Author Name",
    author_email="author@email.com",
    license="MIT",
    classifiers=[
        "License :: OSI Approved :: MIT License",
        "Programming Language :: Python",
    ],
    packages=setuptools.find_packages(),
    python_requires=">=3.7",
)
Then, add the existing code to the Git server.
user:~$ cd myproj/
user:~/myproj$ git init
user:~/myproj$ git add --all
user:~/myproj$ git commit -a -m "Initialize myproj repository"
user:~/myproj$ git remote add origin git@127.0.0.1:myproj
user:~/myproj$ git push -u origin master
Set Up the Git Submodule
Although Git submodules provide features for all kinds of situations, the two most common use cases are:
- adding a repository as a submodule, and
- updating a submodule.
Add a Repository as a Submodule
To add an existing repository as a submodule of another repository, use the following commands:
user:~$ cd myproj/
user:~/myproj$ git submodule add git@127.0.0.1:commonlib
user:~/myproj$ git submodule init
user:~/myproj$ git commit -a -m "Add commonlib as submodule"
user:~/myproj$ git push
After we add a submodule, a submodule reference, i.e., a .gitmodules file, will be created. It may look like the following:
shunsvineyard@remote-ubuntu:~/workspace/myproj$ ls -al
total 40
drwxrwxr-x 4 shunsvineyard shunsvineyard 4096 Dec 20 07:20 .
drwxrwxr-x 10 shunsvineyard shunsvineyard 4096 Dec 20 06:47 ..
drwxrwxr-x 9 shunsvineyard shunsvineyard 4096 Dec 20 07:22 .git
-rw-rw-r-- 1 shunsvineyard shunsvineyard 1233 Dec 20 06:44 .gitignore
-rw-rw-r-- 1 shunsvineyard shunsvineyard 73 Dec 20 07:20 .gitmodules
-rw-rw-r-- 1 shunsvineyard shunsvineyard 1067 Dec 20 06:44 LICENSE
-rw-rw-r-- 1 shunsvineyard shunsvineyard 278 Dec 20 06:58 README.rst
-rw-rw-r-- 1 shunsvineyard shunsvineyard 123 Dec 20 06:57 app.py
drwxrwxr-x 3 shunsvineyard shunsvineyard 4096 Dec 20 07:20 commonlib
-rw-rw-r-- 1 shunsvineyard shunsvineyard 724 Dec 20 06:57 setup.py
If we open .gitmodules, we can see that it records the information about the submodules.
$ cat .gitmodules
[submodule "commonlib"]
path = commonlib
url = git@127.0.0.1:commonlib
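.gitmodules uses Git's config-file syntax, so a script can read it easily, for example to list which submodules a project expects. The following is a minimal, hypothetical parser that only understands simple [submodule "name"] sections like the one above. For anything more complex, git config --file .gitmodules --list is the more robust option.

```python
import re
from typing import Dict, Optional

# Matches section headers such as: [submodule "commonlib"]
SECTION_RE = re.compile(r'^\[submodule\s+"(?P<name>[^"]+)"\]$')
# Matches key/value lines such as: path = commonlib
KEY_RE = re.compile(r"^(?P<key>[A-Za-z][\w-]*)\s*=\s*(?P<value>.*)$")


def parse_gitmodules(text: str) -> Dict[str, Dict[str, str]]:
    """Return {submodule name: {key: value}} for a simple .gitmodules file."""
    modules = {}  # type: Dict[str, Dict[str, str]]
    current = None  # type: Optional[Dict[str, str]]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        section = SECTION_RE.match(line)
        if section:
            current = modules.setdefault(section.group("name"), {})
            continue
        pair = KEY_RE.match(line)
        if pair and current is not None:
            current[pair.group("key")] = pair.group("value").strip()
    return modules


if __name__ == "__main__":
    example = (
        '[submodule "commonlib"]\n'
        "\tpath = commonlib\n"
        "\turl = git@127.0.0.1:commonlib\n"
    )
    print(parse_gitmodules(example))
    # {'commonlib': {'path': 'commonlib', 'url': 'git@127.0.0.1:commonlib'}}
```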
Note: the url of a submodule in .gitmodules can be a relative path. For example, because commonlib and myproj are in the same folder on the Git server, the url can be simplified to ../commonlib.
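Git resolves a relative url against the URL of the superproject's remote. The following is a simplified, hypothetical model of that resolution that only handles leading ./ and ../ segments. Git's real implementation covers more cases, such as a superproject with no remote configured. In the example, the superproject's remote is assumed to be git@127.0.0.1:myproj, as in the setup above.

```python
from typing import Tuple


def _split_parent(url: str) -> Tuple[str, str]:
    """Drop the last path component, returning (parent, separator that was cut)."""
    # scp-style URLs such as git@127.0.0.1:myproj use ':' before the first path part.
    cut = max(url.rfind("/"), url.rfind(":"))
    if cut < 0:
        raise ValueError("cannot go above the top of " + repr(url))
    return url[:cut], url[cut]


def resolve_submodule_url(superproject_url: str, submodule_url: str) -> str:
    """Simplified resolution of a relative submodule url (./ and ../ only)."""
    if not submodule_url.startswith(("./", "../")):
        return submodule_url  # absolute URLs are used as-is
    base, sep, rel = superproject_url.rstrip("/"), "/", submodule_url
    while rel.startswith(("./", "../")):
        if rel.startswith("./"):
            rel = rel[2:]  # "./" keeps the current base
        else:
            rel = rel[3:]  # "../" moves up one component
            base, sep = _split_parent(base)
    return base + sep + rel


if __name__ == "__main__":
    print(resolve_submodule_url("git@127.0.0.1:myproj", "../commonlib"))
    # git@127.0.0.1:commonlib
```

The same relative entry also works if both repositories move to GitHub under the same account, which is the main reason to prefer it.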
If we use GitHub to host our repositories, the submodule may look like below:
(The example, myproj, can be found at https://github.com/shunsvineyard/myproj)
Update a Submodule
Usually, there are two cases in which we may want to update a submodule:
- Update a submodule because of some code changes.
- Update a submodule to a newer or specific version.
Case 1: Update a submodule because of code changes
A submodule is just a Git repository inside another Git repository. When we make some code changes on a submodule, we do the same thing as we usually do on a regular Git repository.
For example, we add a new function called greeting2 to commonlib.
greeting.py
def greeting2(name: str):
    print(f"How are you, {name}?")
We do the same thing for the submodule as we do for a regular repository: commit the change and push the change.
user:~$ cd myproj/commonlib
user:~/myproj/commonlib$ git status
On branch master
Your branch is up to date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: greeting.py
no changes added to commit (use "git add" and/or "git commit -a")
user:~/myproj/commonlib$ git commit -a -m "Added a new greeting function."
user:~/myproj/commonlib$ git push
After we commit and push the submodule change, we can see that the submodule reference in the main project, myproj, has also changed. We commit and push that change in the same way. After that, myproj points to the newer commonlib.
user:~/myproj/commonlib$ cd ../
user:~/myproj$ git status
On branch master
Your branch is up to date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: commonlib (new commits)
no changes added to commit (use "git add" and/or "git commit -a")
user:~/myproj$ git commit -a -m "Update submodule, commonlib"
user:~/myproj$ git push
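Before committing in the main project, it helps to check which submodules have moved away from the commit the main project records. git submodule status prints one line per submodule and prefixes it with a flag:
- + when the checked-out commit differs from the one recorded in the superproject's index (the situation git status reports as "new commits"),
- - when the submodule is not initialized,
- U when it has merge conflicts.

The following is a hypothetical helper that parses that output. It assumes submodule paths contain no spaces.

```python
from typing import List, NamedTuple


class SubmoduleState(NamedTuple):
    flag: str    # " ", "+", "-" or "U", as printed by git submodule status
    commit: str  # SHA-1 shown by git submodule status
    path: str    # submodule path relative to the main project


def parse_submodule_status(output: str) -> List[SubmoduleState]:
    """Split each status line into its flag, commit and path."""
    states = []
    for line in output.splitlines():
        if not line.strip():
            continue
        flag, fields = line[0], line[1:].split()
        states.append(SubmoduleState(flag, fields[0], fields[1]))
    return states


def unrecorded_submodules(output: str) -> List[str]:
    """Paths whose checked-out commit is not yet recorded in the main project."""
    return [s.path for s in parse_submodule_status(output) if s.flag == "+"]
```

In the main project directory, pass it the text produced by subprocess.run(["git", "submodule", "status"], capture_output=True, text=True, check=True).stdout. Any path it returns still needs a commit in the main project.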
Case 2: Update a submodule to a newer or specific version
When someone else modifies commonlib or adds new features, we may want to update the commonlib submodule to the newer version.
For example, someone adds a new function called greeting3 to commonlib.
greeting.py
def greeting3():
    print("How's it going?")
The commit hash is 7735cf8460acd03f92e7c0529486c86ec83b2c0e, as shown below.
user2:~$ git clone git@127.0.0.1:commonlib
user2:~$ cd commonlib
user2:~/commonlib$ vim commonlib/greeting.py
user2:~/commonlib$ git commit -a -m "Added greeting3 function."
user2:~/commonlib$ git push
user2:~/commonlib$ git log
commit 7735cf8460acd03f92e7c0529486c86ec83b2c0e (HEAD -> master, origin/master, origin/HEAD)
Author: user2 <user2@email.com>
Date: Sun Dec 22 00:27:09 2019 +0000
Added greeting3 function.
To update a submodule to a newer or specific version, we change the commit hash that the submodule points to.
The Git submodule official document says, “Submodule repositories stay in a detached HEAD state pointing to a specific commit. Changing that commit simply involves checking out a different tag or commit then adding the change to the parent repository.”
The following example updates the submodule to commit 7735cf8460acd03f92e7c0529486c86ec83b2c0e.
user:~/myproj$ cd commonlib
user:~/myproj/commonlib$ git pull
user:~/myproj/commonlib$ git checkout 7735cf8460acd03f92e7c0529486c86ec83b2c0e
Note: checking out '7735cf8460acd03f92e7c0529486c86ec83b2c0e'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
HEAD is now at 7735cf8 Added greeting3 function.
user:~/myproj/commonlib$ cd ..
user:~/myproj$ git status
On branch master
Your branch is up to date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: commonlib (new commits)
no changes added to commit (use "git add" and/or "git commit -a")
user:~/myproj$ git commit -a -m "Update submodule, commonlib, to the newer one."
user:~/myproj$ git push
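The Case 2 steps can be scripted. The following is a sketch, not a standard tool, and update_submodule_to and run_git are hypothetical names. It uses git fetch instead of git pull. Checking out a specific commit only needs that commit to be present locally, and git pull fails when the submodule is in detached HEAD state. The runner parameter lets the command sequence be inspected without touching a real repository.

```python
import os
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run_git(args: Sequence[str]) -> None:
    """Run one git command, raising CalledProcessError if it fails."""
    subprocess.run(["git", *args], check=True)


def update_submodule_to(main_project: str, submodule: str, commit: str,
                        runner: Runner = run_git) -> None:
    """Point `submodule` (a path inside `main_project`) at `commit` and record it."""
    submodule_dir = os.path.join(main_project, submodule)
    runner(["-C", submodule_dir, "fetch"])             # download the new commits
    runner(["-C", submodule_dir, "checkout", commit])  # detached HEAD at that commit
    runner(["-C", main_project, "add", submodule])     # stage the new submodule reference
    runner(["-C", main_project, "commit", "-m",
            "Update submodule, {}, to {}".format(submodule, commit[:7])])
```

For example, update_submodule_to("myproj", "commonlib", "7735cf8460acd03f92e7c0529486c86ec83b2c0e") followed by git push in myproj reproduces the manual steps above.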
Set Up Development Mode with Git Submodules
Development mode is a feature provided by setuptools, so it needs nothing beyond the usual setup.py we write to package a Python project. However, when one Python project contains another Python project as a submodule and we want to install the submodule in development mode, we can add the submodule to the main project’s requirements.txt file. For example, the requirements.txt of myproj can be the following.
-e ./commonlib
Therefore, when we install the dependencies of myproj, commonlib is installed in development mode automatically.
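To check which dependencies a requirements file will install in development mode, a script can look for the -e/--editable lines. The following is a hypothetical helper, not part of pip. It follows pip's rule that # starts a comment only at the beginning of a line or after whitespace, so URL fragments such as #egg= survive. It does not handle line continuations or nested -r includes.

```python
import re
from typing import List

# pip treats "#" as a comment only at line start or after whitespace.
COMMENT_RE = re.compile(r"(^|\s+)#.*$")
# Matches "-e ./commonlib", "-e./commonlib", "--editable ./x" and "--editable=./x".
EDITABLE_RE = re.compile(r"^(?:-e|--editable)(?:\s+|=)?(?P<target>\S+)$")


def editable_targets(text: str) -> List[str]:
    """Return the targets of editable (development mode) requirement lines."""
    targets = []
    for raw in text.splitlines():
        line = COMMENT_RE.sub("", raw).strip()
        match = EDITABLE_RE.match(line)
        if match:
            targets.append(match.group("target"))
    return targets
```

Running editable_targets on the contents of myproj's requirements.txt should return ["./commonlib"].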
Workflow
We often need to work on a main project and its dependency at the same time when a big project contains several smaller projects. In this case, we usually work with others as a team. The recommended workflow for this situation has two stages: the setup stage and the working stage.
Setup Stage
This stage prepares the code and working environment.
- Create a virtual environment
- Use --recurse-submodules when cloning, so that all the submodules are downloaded along with the source code.
$ git clone --recurse-submodules <URL_to_the_repository>
- Check out the branch. Usually, when we work on a feature or fix a bug, we create a branch for the work. We should avoid working on the master (or develop) branch directly. More info about this can be found at https://guides.github.com/introduction/flow/
$ git checkout <branch_name>
- Install the dependencies onto the virtual environment.
$ pip install -r requirements.txt
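The setup stage can be collected into one script so that every teammate prepares the environment the same way. The following is a hypothetical sketch: setup_commands, run_all and their parameters are illustrative names. It assumes Ubuntu, where a venv's interpreter lives in <venv>/bin/python. It adds one step that is not in the list: git submodule update --init --recursive after checking out the branch. Switching branches does not, by default, move submodules to the commit recorded on the new branch. pip runs from the project root, where requirements.txt lives, so the relative -e ./commonlib entry refers to the submodule.

```python
import os
import subprocess
import sys
from typing import List, Optional, Tuple

Command = Tuple[List[str], Optional[str]]  # (argv, working directory or None)


def setup_commands(repo_url: str, dest: str, branch: str, venv_dir: str) -> List[Command]:
    """Return the setup-stage commands in the order they should run."""
    venv_dir = os.path.abspath(venv_dir)  # absolute, because later steps change cwd
    venv_python = os.path.join(venv_dir, "bin", "python")  # Linux/macOS layout
    return [
        ([sys.executable, "-m", "venv", venv_dir], None),
        (["git", "clone", "--recurse-submodules", repo_url, dest], None),
        (["git", "checkout", branch], dest),
        (["git", "submodule", "update", "--init", "--recursive"], dest),
        ([venv_python, "-m", "pip", "install", "-r", "requirements.txt"], dest),
    ]


def run_all(commands: List[Command]) -> None:
    """Run each command, stopping at the first failure."""
    for argv, cwd in commands:
        subprocess.run(argv, cwd=cwd, check=True)
```

For example, run_all(setup_commands("git@127.0.0.1:myproj", "myproj", "my-feature", "demo_env")), where my-feature is a placeholder branch name.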
Working Stage
This stage covers the time we spend working on our issue. Besides changing the main project's code, there are two cases in which we need to modify submodules.
Case 1: If we need to make some code change of a submodule:
- Create a branch for this change and open a Pull Request (PR) for the submodule code change.
- After the PR gets approved and the branch is merged, update the submodule to the commit that the PR just merged.
Case 2: Someone updates the repository that is our submodule, and we want to update the submodule to the newer commit:
- Use git pull (or git fetch) in the submodule folder to get the change.
- Update the submodule to the commit hash we want by checking out that commit in the submodule.
- cd to the main project and commit the change of the submodule.
Conclusion
It is easy to make mistakes when we work on multiple related projects at the same time. In this situation, development mode and submodules provide an easy way to manage our projects. Using them may not be straightforward at the beginning. Once we get familiar with them, however, the combination of development mode and submodules not only prevents mistakes but also improves our productivity.