
Using Git Submodule and Develop Mode to Manage Python Projects

24 Dec 2019 | CPOL | 9 min read
How to use Git submodule and develop mode for managing Python projects

As software engineers, we often work on a project that depends on another project we are also working on at the same time. The scenario may look like the following:

We have two projects, and each of them has its Git repository:

  • A common library, say commonlib, used by many projects. The library is self-contained and has its own test suite and documentation.
  • A project called myproj which has a dependency on commonlib.

While we are working on myproj, we may also need to update commonlib at the same time. If both commonlib and myproj are Python projects, we can use setuptools' development mode (hereafter, development mode) and Git submodules (hereafter, submodule) to make the work easier. This article demonstrates how to combine development mode and submodule to handle this situation.

commonlib and myproj are used as the example in the rest of the article, and the example assumes the code runs in a virtual environment under the following conditions:

  • Ubuntu 18.04
  • Python 3.7
  • Git 2.17

The Challenges

First, for Python project development, we usually set up a virtual environment and install all the dependencies into it. Then we start working on our project, i.e., myproj in this case. However, myproj needs commonlib, which we are also working on at the same time. If we install commonlib the normal way, e.g., with pip install, the code is copied into the virtual environment, so the installed copy is no longer a Git working copy and we cannot use Git to track our changes to commonlib. This is the problem that development mode solves.

Second, commonlib is used by many projects, including myproj. On the one hand, during development, myproj may need to stick with a specific version or branch of commonlib. On the other hand, other projects may need a different version of commonlib. In order to make sure that we use the correct branch or version of commonlib when we work on myproj, we can set the dependency as a Git submodule.

What is Development Mode?

Development mode allows a project to be both installed and editable.

Normally, we install a Python package from PyPI.

Bash
$ pip install <package_name>   

Or, we install it from a local package.

Bash
$ pip install <path_to_local_archive>   

Either way, the package is installed into our (virtual) environment, which means it is copied to a location such as /virtual_environment/lib/python3.7/site-packages/. If we want to install commonlib into our virtual environment, we can do:

Bash
$ git clone https://github.com/shunsvineyard/commonlib.git   
$ pip install commonlib/   

After the installation, commonlib shows up as an installed package in the site-packages folder. We can use the ls command to check it. For example, the result may look like the following:

Bash
(demo_env) shunsvineyard@remote-ubuntu:~$ ls -l demo_env/lib/python3.7/site-packages/
total 40   
drwxrwxr-x  2 shunsvineyard   shunsvineyard 4096 Dec 23 05:00 __pycache__   
drwxrwxr-x  3 shunsvineyard   shunsvineyard 4096 Dec 23 05:01 commonlib   
drwxrwxr-x  2 shunsvineyard   shunsvineyard 4096 Dec 23 05:01 commonlib-0.0.1.egg-info   
-rw-rw-r--  1 shunsvineyard   shunsvineyard  126 Dec 23 05:00 easy_install.py   
drwxrwxr-x 11 shunsvineyard   shunsvineyard 4096 Dec 23 05:00 pip   
drwxrwxr-x  2 shunsvineyard   shunsvineyard 4096 Dec 23 05:00 pip-9.0.1.dist-info   
drwxrwxr-x  5 shunsvineyard   shunsvineyard 4096 Dec 23 05:00 pkg_resources   
drwxrwxr-x  2 shunsvineyard   shunsvineyard 4096 Dec 23 05:00 pkg_resources-0.0.0.dist-info   
drwxrwxr-x  6 shunsvineyard   shunsvineyard 4096 Dec 23 05:00 setuptools   
drwxrwxr-x  2 shunsvineyard   shunsvineyard 4096 Dec 23 05:00 setuptools-39.0.1.dist-info   

Development mode instead creates a link from the virtual environment to the package's source directory. The package is installed in a way that lets us keep editing the code after the installation, so any change we make to the source takes effect in the virtual environment without reinstalling.

To install a Python package in development mode, use the command:

Bash
$ pip install -e <path to the package>   

Taking commonlib as an example, the result may look like the following:

Bash
(demo_env) shunsvineyard@remote-ubuntu:~$ pip install -e commonlib/
Obtaining file:///home/shunsvineyard/commonlib
Installing collected packages: commonlib
  Running setup.py develop for commonlib
Successfully installed commonlib
(demo_env) shunsvineyard@remote-ubuntu:~$ ls -l demo_env/lib/python3.7/site-packages/
total 40
drwxrwxr-x  2 shunsvineyard shunsvineyard 4096 Dec 23 05:08 __pycache__
-rw-rw-r--  1 shunsvineyard shunsvineyard   31 Dec 23 05:09 commonlib.egg-link
-rw-rw-r--  1 shunsvineyard shunsvineyard   30 Dec 23 05:09 easy-install.pth
-rw-rw-r--  1 shunsvineyard shunsvineyard  126 Dec 23 05:08 easy_install.py
drwxrwxr-x 11 shunsvineyard shunsvineyard 4096 Dec 23 05:08 pip
drwxrwxr-x  2 shunsvineyard shunsvineyard 4096 Dec 23 05:08 pip-9.0.1.dist-info
drwxrwxr-x  5 shunsvineyard shunsvineyard 4096 Dec 23 05:08 pkg_resources
drwxrwxr-x  2 shunsvineyard shunsvineyard 4096 Dec 23 05:08 pkg_resources-0.0.0.dist-info
drwxrwxr-x  6 shunsvineyard shunsvineyard 4096 Dec 23 05:08 setuptools
drwxrwxr-x  2 shunsvineyard shunsvineyard 4096 Dec 23 05:08 setuptools-39.0.1.dist-info

If we open the file, commonlib.egg-link, we will see where it links to. For example:

Bash
(demo_env) shunsvineyard@remote-ubuntu:~$ cat demo_env/lib/python3.7/site-packages/commonlib.egg-link
/home/shunsvineyard/commonlib
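
The practical effect of that link can be sketched with the standard library alone. The snippet below is only an illustration under stated assumptions: it does not call pip or setuptools, and the package name demo_editable_pkg and the function demo_editable_effect are hypothetical. It puts a throwaway source directory on sys.path, which is roughly what the easy-install.pth entry achieves, and then shows that an edit to the source is visible without reinstalling anything.

```python
import importlib
import pathlib
import sys
import tempfile


def demo_editable_effect():
    """Show that edits to a source tree on sys.path are picked up in place."""
    with tempfile.TemporaryDirectory() as source_root:
        # Hypothetical throwaway package standing in for commonlib.
        package_dir = pathlib.Path(source_root) / "demo_editable_pkg"
        package_dir.mkdir()
        init_file = package_dir / "__init__.py"
        init_file.write_text("MESSAGE = 'Howdy'\n")

        # Roughly what the easy-install.pth entry achieves: the source
        # directory itself becomes importable.
        sys.path.insert(0, source_root)
        try:
            module = importlib.import_module("demo_editable_pkg")
            before = module.MESSAGE

            # Edit the source in place, as we would in a Git working copy.
            init_file.write_text("MESSAGE = 'How are you?'\n")
            importlib.invalidate_caches()
            # reload() is only needed because we stay in the same process;
            # a freshly started interpreter would see the edit directly.
            module = importlib.reload(module)
            after = module.MESSAGE
        finally:
            sys.path.remove(source_root)
            sys.modules.pop("demo_editable_pkg", None)
    return before, after


if __name__ == "__main__":
    print(demo_editable_effect())
```

With a regular pip install, the second value would not change until the package was installed again, because Python would import the copy in site-packages rather than the edited source.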

Note that development mode is only available for a local project or a VCS URL. If we try to install a package from PyPI in development mode, pip shows an error message. Using numpy as an example:

Bash
$ pip install -e numpy
numpy should either be a path to a local project or a VCS url beginning with svn+, git+, hg+, or bzr+

What is Git Submodule?

A Git submodule is a Git repository inside another Git repository. In other words, one Git repository holds a reference to another Git repository. For example, myproj has a dependency on commonlib. If commonlib is a Git submodule of myproj, the picture below illustrates their relationship.

[Figure: myproj repository referencing commonlib as a Git submodule]

Git submodule allows us to keep a Git repository as a subdirectory of another Git repository. When we clone myproj together with its submodules (for example, with git clone --recurse-submodules), the specific commit of commonlib recorded in myproj's submodule reference is downloaded from the commonlib repository. This way, we can clone another repository (i.e., commonlib) into our project (i.e., myproj) and keep the commits separate.

The following sections use commonlib and myproj as an example to demonstrate the setup and workflow of development mode and submodule. They also assume we do everything from scratch, including setting up the Git repositories.

Setup the Projects

Assume commonlib provides a single, very simple feature: greeting. The project layout and code look like the following:

Bash
commonlib/
├── LICENSE
├── README.rst
├── commonlib
│   ├── __init__.py
│   └── greeting.py
└── setup.py 

greeting.py

Python
def greeting(name: str):
    """Print a simple greeting with the name."""
    print(f"Howdy, {name}") 

setup.py

Python
import pathlib
import setuptools

# The directory containing this file
HERE = pathlib.Path(__file__).parent

# The text of the README file
README = (HERE / "README.rst").read_text()

# This call to setup() does all the work
setuptools.setup(
    name="commonlib",
    version="0.0.1",
    description="A simple Python package",
    long_description=README,
    long_description_content_type="text/x-rst",
    author="Author Name",
    author_email="author@email.com",
    license="MIT",
    classifiers=[
        "License :: OSI Approved :: MIT License",
        "Programming Language :: Python"
    ],
    packages=setuptools.find_packages(),
    python_requires=">=3.7"
)

(A complete example of commonlib can be found at https://github.com/shunsvineyard/commonlib)

Now, we are ready to set up the Git repositories for both commonlib and myproj. Before we do that, we need to set up a Git server. This example uses localhost (i.e., 127.0.0.1) as the Git server.

Bash
$ sudo useradd -m git   # -m creates the home directory used below
$ sudo passwd git
$ su git
$ cd ~
$ git init --bare commonlib
$ git init --bare myproj 

Setup Git Repository for commonlib

Once the Git server is ready, we can push the existing commonlib code to it. Switch back to the local user first.

Bash
user:~$ cd commonlib/
user:~/commonlib$ git init
user:~/commonlib$ git add --all
user:~/commonlib$ git commit -a -m "Initialize commonlib repository"
user:~/commonlib$ git remote add origin git@127.0.0.1:commonlib
user:~/commonlib$ git push -u origin master

Setup Git Repository for myproj

For myproj, we follow the same steps as for commonlib. The project layout and code look like the following:

Bash
myproj/
├── LICENSE
├── README.rst
├── app.py
└── setup.py 

app.py

Python
from commonlib import greeting

def run():
    greeting.greeting("Git Submodule")

if __name__ == "__main__":
    run() 

setup.py

Python
import pathlib
import setuptools

# The directory containing this file
HERE = pathlib.Path(__file__).parent

# The text of the README file
README = (HERE / "README.rst").read_text()

# This call to setup() does all the work
setuptools.setup(
    name="myproj",
    version="0.0.1",
    description="A simple Python project",
    long_description=README,
    long_description_content_type="text/x-rst",
    url="https://github.com/shunsvineyard/myproj",
    author="Author Name",
    author_email="author@email.com",
    license="MIT",
    classifiers=[
        "License :: OSI Approved :: MIT License",
        "Programming Language :: Python"
    ],
    packages=setuptools.find_packages(),
    python_requires=">=3.7"
)

Then, add the existing code to the Git server.

Bash
user:~$ cd myproj/
user:~/myproj$ git init
user:~/myproj$ git add --all
user:~/myproj$ git commit -a -m "Initialize myproj repository"
user:~/myproj$ git remote add origin git@127.0.0.1:myproj
user:~/myproj$ git push -u origin master 

Setup Git Submodule

Although Git submodule provides many features for all kinds of situations, the two most common use cases are:

  1. adding a repository as a submodule, and
  2. updating a submodule.

Add a Repository as a Submodule

Adding an existing repository as a submodule of another repository can be simply done by the following commands:

Bash
user:~$ cd myproj/
user:~/myproj$ git submodule add git@127.0.0.1:commonlib
user:~/myproj$ git submodule init
user:~/myproj$ git commit -a -m "Add commonlib as submodule"
user:~/myproj$ git push 

After we add a submodule, a submodule reference, i.e., a .gitmodules file, will be created. It may look like the following:

Bash
shunsvineyard@remote-ubuntu:~/workspace/myproj$ ls -al
total 40
drwxrwxr-x  4 shunsvineyard shunsvineyard 4096 Dec 20 07:20 .
drwxrwxr-x 10 shunsvineyard shunsvineyard 4096 Dec 20 06:47 ..
drwxrwxr-x  9 shunsvineyard shunsvineyard 4096 Dec 20 07:22 .git
-rw-rw-r--  1 shunsvineyard shunsvineyard 1233 Dec 20 06:44 .gitignore
-rw-rw-r--  1 shunsvineyard shunsvineyard   73 Dec 20 07:20 .gitmodules
-rw-rw-r--  1 shunsvineyard shunsvineyard 1067 Dec 20 06:44 LICENSE
-rw-rw-r--  1 shunsvineyard shunsvineyard  278 Dec 20 06:58 README.rst
-rw-rw-r--  1 shunsvineyard shunsvineyard  123 Dec 20 06:57 app.py
drwxrwxr-x  3 shunsvineyard shunsvineyard 4096 Dec 20 07:20 commonlib
-rw-rw-r--  1 shunsvineyard shunsvineyard  724 Dec 20 06:57 setup.py 

If we open the file, .gitmodules, we can see that it records the information of submodules.

Bash
$ cat .gitmodules
[submodule "commonlib"]
        path = commonlib
        url = git@127.0.0.1:commonlib 

Note: the url of the submodule in .gitmodules can be a relative path. For example, if both commonlib and myproj are located in the same folder on the Git server, the url can be simplified to ../commonlib.
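
Because .gitmodules uses an INI-like layout, it can also be inspected from Python. The following is a minimal sketch, assuming a simple file like the one above; the function name parse_gitmodules and the SAMPLE_GITMODULES constant are hypothetical, and the standard configparser module is used here as a convenient stand-in rather than a complete parser for Git's configuration format.

```python
import configparser

# Sample content mirroring the .gitmodules file shown above.
SAMPLE_GITMODULES = """\
[submodule "commonlib"]
        path = commonlib
        url = git@127.0.0.1:commonlib
"""


def parse_gitmodules(text):
    """Return {submodule name: {"path": ..., "url": ...}} from .gitmodules text."""
    # interpolation=None keeps any '%' in a URL from being treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)

    submodules = {}
    for section in parser.sections():
        # Section names look like: submodule "commonlib"
        if not section.startswith("submodule "):
            continue
        name = section[len("submodule "):].strip().strip('"')
        submodules[name] = {
            "path": parser[section]["path"],
            "url": parser[section]["url"],
        }
    return submodules


if __name__ == "__main__":
    for name, info in parse_gitmodules(SAMPLE_GITMODULES).items():
        print(name, info["path"], info["url"])
```

Note that .gitmodules only stores the path and URL. The commit that the submodule is pinned to is recorded in the parent repository's own history, not in this file.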

If we use GitHub to host our repositories, the submodule may look like below:

[Figure: screenshot of the myproj repository on GitHub showing the commonlib submodule]

(The example, myproj, can be found at https://github.com/shunsvineyard/myproj)

Update a Submodule

Usually, there are two cases in which we may want to update a submodule:

  1. Update a submodule because of some code changes.
  2. Update a submodule to a newer or specific version.

Case 1: Update a submodule because of code changes

A submodule is just a Git repository inside another Git repository. When we make some code changes on a submodule, we do the same thing as we usually do on a regular Git repository.

For example, we add a new function called greeting2 into commonlib.

greeting.py
Python
def greeting2(name: str):
    """Print a simple greeting with the name."""
    print(f"How are you, {name}?") 

We do the same thing for the submodule as we do for a regular repository: commit the change and push the change.

Bash
user:~$ cd myproj/commonlib
user:~/myproj/commonlib$ git status
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   greeting.py

no changes added to commit (use "git add" and/or "git commit -a")

user:~/myproj/commonlib$ git commit -a -m "Added a new greeting function."
user:~/myproj/commonlib$ git push 

After we commit and push the change in the submodule, the submodule reference in the main project, i.e., myproj, shows up as modified. We commit and push that change in the same way, and myproj then points to the newer commonlib.

Bash
user:~/myproj/commonlib$ cd ../
user:~/myproj$ git status
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   commonlib (new commits)

no changes added to commit (use "git add" and/or "git commit -a")

user:~/myproj$ git commit -a -m "Update submodule, commonlib"
user:~/myproj$ git push 

Case 2: Update a submodule to a newer or specific version

When someone else modifies commonlib or adds new features, we may want to update the commonlib submodule to the newer version.

For example, someone adds a new function called greeting3 into commonlib.

greeting.py
Python
def greeting3():
    """Print a simple greeting."""
    print("How's it going?")

And the commit hash is 7735cf8460acd03f92e7c0529486c86ec83b2c0e as shown below.

Bash
user2:~$ git clone git@127.0.0.1:commonlib
user2:~$ cd commonlib
user2:~/commonlib$ vim commonlib/greeting.py # add the greeting3 function shown above
user2:~/commonlib$ git commit -a -m "Added greeting3 function."
user2:~/commonlib$ git push
user2:~/commonlib$ git log
commit 7735cf8460acd03f92e7c0529486c86ec83b2c0e (HEAD -> master, origin/master, origin/HEAD)
Author: user2 <user2@email.com>
Date:   Sun Dec 22 00:27:09 2019 +0000

    Added greeting3 function. 

To update a submodule to a newer or specific version, we change the commit hash that the submodule points to.

The Git submodule official document says, “Submodule repositories stay in a detached HEAD state pointing to a specific commit. Changing that commit simply involves checking out a different tag or commit then adding the change to the parent repository.”

The following is an example to update the submodule to commit hash 7735cf8460acd03f92e7c0529486c86ec83b2c0e.

Bash
user:~/myproj$ cd commonlib
user:~/myproj/commonlib$ git pull
user:~/myproj/commonlib$ git checkout 7735cf8460acd03f92e7c0529486c86ec83b2c0e
Note: checking out '7735cf8460acd03f92e7c0529486c86ec83b2c0e'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 7735cf8 Added greeting3 function.
user:~/myproj/commonlib$ cd ..
user:~/myproj$ git status
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   commonlib (new commits)

no changes added to commit (use "git add" and/or "git commit -a")
user:~/myproj$ git commit -a -m "Update submodule, commonlib, to the newer one."
user:~/myproj$ git push 
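
To check which commit a submodule is pinned to, git submodule status prints one line per submodule: a status character, the commit hash, the path, and usually a description in parentheses. The sketch below parses that text in Python. The names parse_submodule_status and SubmoduleStatus are hypothetical, the sample line is made up for illustration using the commit hash from this example, and paths containing spaces are not handled.

```python
from typing import List, NamedTuple

# Meaning of the first character of each "git submodule status" line.
STATE_BY_PREFIX = {
    " ": "in sync with the recorded commit",
    "-": "not initialized",
    "+": "checked-out commit differs from the recorded commit",
    "U": "merge conflict",
}


class SubmoduleStatus(NamedTuple):
    state: str
    commit: str
    path: str


def parse_submodule_status(output: str) -> List[SubmoduleStatus]:
    """Parse the text printed by `git submodule status`."""
    results = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # A missing prefix (e.g. stripped whitespace) is treated as "in sync".
        prefix = line[0] if line[0] in "-+U" else " "
        fields = line.lstrip(" -+U").split()
        commit, path = fields[0], fields[1]
        results.append(SubmoduleStatus(STATE_BY_PREFIX[prefix], commit, path))
    return results


if __name__ == "__main__":
    # Illustrative line only; the "(heads/master)" part is an assumption.
    sample = "+7735cf8460acd03f92e7c0529486c86ec83b2c0e commonlib (heads/master)\n"
    for status in parse_submodule_status(sample):
        print(status.path, status.commit[:7], status.state)
```

A leading + corresponds to the "modified: commonlib (new commits)" state shown by git status above: the submodule has a different commit checked out than the one myproj currently records.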

Setup Development Mode with Git Submodule

Development mode is a feature provided by setuptools, so it requires nothing beyond the usual setup.py used for packaging a Python project. However, when one Python project contains another Python project as a submodule and we want to install the submodule in development mode, we need to add the submodule to the main project's requirements.txt file. For example, the requirements.txt of myproj can be the following.

Bash
# Install commonlib as development mode
-e ./commonlib # Path to the submodule 

Therefore, when we install the dependencies of myproj, commonlib is installed in development mode automatically.
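
One way to confirm that the editable install took effect is to ask Python where it would import commonlib from. The following is a small sketch using only the standard library; the helper names package_location and is_under are hypothetical, and the main block assumes it is run from the myproj directory inside the virtual environment.

```python
import importlib.util
import pathlib
from typing import Optional


def package_location(name: str) -> Optional[pathlib.Path]:
    """Return the directory a top-level package would be imported from, if any."""
    spec = importlib.util.find_spec(name)
    # Built-in modules and namespace packages have no single file location.
    if spec is None or not spec.has_location or spec.origin is None:
        return None
    return pathlib.Path(spec.origin).resolve().parent


def is_under(path: pathlib.Path, root: pathlib.Path) -> bool:
    """Return True if path is root itself or lives somewhere below it."""
    try:
        path.resolve().relative_to(root.resolve())
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    location = package_location("commonlib")
    if location is None:
        print("commonlib is not installed in this environment")
    else:
        # Assumes the current directory is myproj, so the submodule is ./commonlib.
        submodule_root = pathlib.Path.cwd() / "commonlib"
        print(location)
        print("imported from the submodule:", is_under(location, submodule_root))
```

If the location printed is inside site-packages instead of the submodule folder, commonlib was probably installed as a regular copy, and edits in the submodule will not be picked up.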

Workflow

Needing to work on both the main project and one of its dependencies at the same time is typical of a big project that contains several smaller projects. In this case, we usually work with others as a team. The recommended workflow for this situation breaks down into two stages: the setup stage and the working stage.

Setup Stage

This stage prepares the code and working environment.

  1. Create a virtual environment
  2. Clone the repository with --recurse-submodules, which also downloads all the submodules.
    Bash
    $ git clone --recurse-submodules <URL_to_the_repository>
  3. Check out the branch. Usually, when we work on a feature or fix a bug, we create a branch for the work. We should avoid working on the master (or develop) branch directly. More info about this can be found at https://guides.github.com/introduction/flow/
    Bash
    $ git checkout <branch_name>
  4. Install the dependencies onto the virtual environment.
    Bash
    $ pip install -r requirements.txt 

Working Stage

This stage is when we actually work on our issue. Besides changing the code of the main project, there are two cases in which we need to modify a submodule.

Case 1: We need to change the code of a submodule:
  1. Create a branch for the change and create a Pull-Request (PR) for the submodule code change.
  2. After the PR gets approved and the branch is merged, update the submodule to the commit that the PR just merged.
Case 2: Someone updates a repository which is our submodule, and we want to update the submodule to the newer commit:
  1. Use git pull on the submodule folder to get the change.
  2. Update the commit hash of the submodule to the one we want.
  3. cd to the main project and commit the change of the submodule.
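
The steps of Case 2 can be collected into a small helper. This is a sketch under stated assumptions rather than a finished tool: plan_submodule_update and run_plan are hypothetical names, the script is assumed to run from the main project's root, and git fetch is used in place of git pull because a submodule cloned with --recurse-submodules is normally in a detached HEAD state, where there is no branch to pull into.

```python
import subprocess
from typing import List


def plan_submodule_update(submodule_path: str, commit: str) -> List[List[str]]:
    """Build the git commands that pin a submodule to the given commit."""
    return [
        # Download new commits without touching the checked-out files.
        ["git", "-C", submodule_path, "fetch"],
        # Move the submodule to the commit we want.
        ["git", "-C", submodule_path, "checkout", commit],
        # Record the new commit in the main project.
        ["git", "add", submodule_path],
        ["git", "commit", "-m",
         f"Update submodule {submodule_path} to {commit[:7]}"],
    ]


def run_plan(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for argv in commands:
        print("$", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    plan = plan_submodule_update(
        "commonlib", "7735cf8460acd03f92e7c0529486c86ec83b2c0e"
    )
    run_plan(plan, dry_run=True)  # switch to dry_run=False to really run git
```

Keeping the plan separate from its execution makes it easy to review the commands first; the final git push is left out on purpose so that the change can be inspected before it is shared.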

Conclusion

It is easy to make mistakes when we are working on multiple related projects at the same time. When we have to work in this situation, development mode and submodule provide an easy way to manage our projects. Using development mode and submodule may not be straightforward in the beginning. But once we get familiar with them, the combination not only helps us avoid mistakes, but also improves our productivity.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)