How to manage git repositories with Python

Neither Python nor Git needs an introduction: the former is one of the most widely used general-purpose programming languages; the latter is probably the most widely used version control system in the world, created by Linus Torvalds himself. Normally, we interact with git repositories using the git binary; when we need to work with them from Python, we can use the GitPython library instead.

In this tutorial we will see how to manage repositories and implement a basic git workflow using the GitPython library.

In this tutorial you will learn:

  • How to install the GitPython library
  • How to manage git repositories with the GitPython library
  • How to add a remote to a repository
  • How to clone a git repository
  • How to create and push commits
  • How to work with branches
  • How to manage submodules

Software requirements and conventions used

Category | Requirements, Conventions or Software Version Used
System | Distribution-independent
Software | Python and the GitPython library
Other | None
Conventions | # – requires given linux-commands to be executed with root privileges, either directly as the root user or by use of the sudo command; $ – requires given linux-commands to be executed as a regular non-privileged user

Installing the GitPython library

The GitPython library can be installed either with our distribution's package manager or with pip, the Python package manager. The first method is distribution-specific; the second can be used on every distribution where pip is installed.

To install the software natively on recent versions of Fedora, we can run the following command:

$ sudo dnf install python3-GitPython



On Debian and Debian-based distributions, the package is called “python3-git” and can be installed via apt:

$ sudo apt install python3-git

GitPython is also available in the Arch Linux “Community” repository. We can install the package via pacman:

$ sudo pacman -Sy python-gitpython

The universal method to install GitPython is by using pip. We do it by launching the following command:

$ pip install GitPython --user

Notice that since we used the --user option in the command above, the package will be installed only for the user we launched the command as. For this reason, we don’t need to use privilege escalation.

Now that we have installed the GitPython library, let’s see how to use it.

Creating a local git repository

Let’s see how we can perform our first steps with GitPython. The first thing we may want to learn is how to create a local repository. When working with the git binary, the command we use to initialize a local repository is git init. When using the GitPython library, we use the following code instead:

from git.repo import Repo
repository = Repo.init('/path/of/repository')



In the code snippet above, the first thing we did was to import the Repo class from the git.repo module (it can also be imported directly from the git package). This class is used to represent a git repository. We then called its init method. This is a “class method”, which means that we can call it without creating an instance of the class beforehand; it takes the path where the repository should be initialized as its first argument and returns an instance of the Repo class.

What if we want to create a bare repository? All we have to do is to set the “bare” argument of the init method to True. Our code becomes:

repository = Repo.init('/path/of/repository', bare=True)

Adding a remote to our repository

Once we have created our repository, we may want to add a remote counterpart to it. Suppose, for example, that we create a repository on GitHub to host our project; to add it as a remote called “origin”, we need to use the create_remote method on the repository object:

# Add https://github.com/foo/test.git as a remote to our repository
repository.create_remote('origin', 'https://github.com/foo/test.git')

We passed the name that should be used for the remote as the first argument of the method, and the URL of the remote repository as the second one. The create_remote method returns an instance of the Remote class, which is used to represent a remote.

Adding files to the repository index and creating our first commit

Now, suppose we created an “index.html” file inside our repository containing the following code:

<h2>This is an index file</h2>

Although the file exists in the repository, it is not tracked yet. To get a list of the files which are not tracked in our repository, we can reference the untracked_files property (this is actually a method which uses the @property decorator):

repository.untracked_files

In this case the list returned is:

['index.html']



How do we check if our repository contains changes? We can use the is_dirty method. This method returns True if the repository is considered dirty, False otherwise. By default, a repository is considered dirty if there are changes to tracked files, either staged in the index or present in the working tree; untracked files are not taken into account. If only untracked files exist, the repository is not considered “dirty”, unless we set the untracked_files argument to True:

repository.is_dirty(untracked_files=True) # This returns True in this case

To add the index.html file to the index of our repository we need to use the following code:

repository.index.add(['index.html'])

In the code above, index (again, a method which uses the @property decorator) returns an instance of the IndexFile class, which is used to represent the repository index. We call the add method of this object to add the file to the index. The method accepts a list as its first argument, therefore we can add multiple files at once.

Once we have added the needed files to our index, we want to create a commit. To perform this action we call the commit method of the index object, and pass the commit message as argument:

commit = repository.index.commit("This is our first commit")

The commit method returns an instance of the Commit class, which is used to represent a commit in the library. Above we used the commit variable to reference this object.

Pushing and pulling changes to and from the remote

We created our first commit with GitPython; now we want to push it to the remote we added earlier in this tutorial. Performing such actions is really easy. First of all, all the remotes associated with our repository can be accessed via the remotes property of the Repo class:

repository.remotes

As we know, each remote is represented by a Remote object. In our example we want to push our commit to the remote we called “origin”, so all we have to do is to call the push method on it:

repository.remotes.origin.push('master:master')

What we did above was to call the push method and pass a mapping between the local branch and the remote one (a refspec) as its first argument: we basically said to push the content of our local master branch to the remote master branch. Since we specified an HTTPS URL when we created the “origin” remote, once the code is executed we are prompted to provide our credentials:

Username for 'https://github.com': foo
Password for 'https://foo@github.com': 



Notice that if we use an HTTPS URL for the remote repository, GitHub will not accept our account password for the push; a personal access token has to be provided in its place. To avoid having to provide credentials, we can set up SSH keys and use an SSH URL. To change the URL of the “origin” remote, we need to use the set_url method:

repository.remotes.origin.set_url('git@github.com:/foo/test.git')

If we have SSH keys set on the remote (GitHub in this case), we will not be prompted to provide a password or username (unless our private key is password-protected), so the process becomes completely automatic.

The push method returns a list of PushInfo objects, one for each pushed reference, which describe the outcome of the push.

To avoid having to specify the mapping between the local and upstream branch each time we push, we can perform the push directly via the git binary using the Git class, which can be referenced via the git property of the repository object. What we have to do is to pass the --set-upstream option, so we write:

repository.git.push('--set-upstream', 'origin', 'master')

The next time we perform a push, we can simply use:

repository.remotes.origin.push()

To pull commits from a repository, in similar fashion, we use the pull method instead (again, in this case, the refspec is not needed since we previously used --set-upstream):

repository.remotes.origin.pull()

Working with branches

In a git repository, branches can be used to develop new features or fix bugs without touching the master, which is itself the main branch where code should always remain stable.

Creating a branch

When using GitPython, to create a new branch in our repository (suppose we want to call it “newfeature”), we would run the following code:

new_branch = repository.create_head('newfeature')



With the code above the new branch will be generated from the current HEAD of the repository. In case we want a branch to be created from a specific commit, instead, we need to pass its hashsum as the second argument to the method. For example:

repository.create_head('newfeature', "f714abe02ebf4dab3030bdf788dcc0f5edacccbc")

Switching to a branch

Switching to a new branch involves changing the HEAD of our repository so that it points to it, and synchronizing the index and working tree. To switch to the “newfeature” branch we just created (referenced by the new_branch variable), we use the following code:

# Get a reference to the current active branch to easily switch back to it later
original_branch = repository.active_branch
repository.head.reference = new_branch
repository.head.reset(index=True, working_tree=True)

Deleting a branch

To delete a branch we use the delete_head method on an instance of the Repo class. In our case, to delete the ‘newfeature’ branch, we would run:

repository.delete_head('newfeature')

Working with submodules

Submodules are used to incorporate code from other git repositories.

Adding a submodule

Suppose we want to add a submodule to incorporate code that is found in the ‘https://github.com/foo/usefulcode’ repository, in the usefulcode_dir directory in the root of our own project (the directory is automatically created if it doesn’t exist). Here is the code we would write:

repository.create_submodule('usefulcode', 'usefulcode_dir', 'https://github.com/foo/usefulcode')

In the example above, the first argument passed to the create_submodule method is the name to be used for the submodule, the second is the submodule path relative to the root of our project, and the last one is the URL of the external repository we want to use as a submodule.

Listing submodules

To get the complete list of all submodules associated with our repository we can use repository.submodules; alternatively, we can iterate over the instances yielded by the iter_submodules method:

for submodule in repository.iter_submodules():
    print(submodule.url)



One important thing to notice is that repository.submodules returns the list of the submodules associated to our repository directly, while iter_submodules will let us iterate over submodules recursively (the repository we added as a submodule could have submodules associated to it, too).

Removing a submodule

To remove a submodule from our repository we have to call the remove method from the Submodule object used to represent it. We can retrieve the submodule we want to delete, by its name, passing it as argument to the submodule method (“usefulcode” in this case):

submodule = repository.submodule("usefulcode")
submodule.remove(module=True, force=True)

The code above:

  • Removes the submodule entry from the .gitmodules file
  • Removes the submodule entry from the .git/config file
  • Deletes the checked-out copy of the submodule (due to module=True)
  • Forces the removal of the module even if it contains modifications (due to force=True; this may or may not be something you want)

Cloning a repository

So far we have seen how to manage a local repository with the GitPython library; now, let’s see how to clone a repository. To clone a repository we use the clone_from method of the Repo class. The method takes the URL of the repository to be cloned as its first argument, and the local filesystem path where it should be cloned as its second:

repository = Repo.clone_from('https://github.com/user/test.git', 'test')

Conclusions

In this tutorial we learned how to start working with git repositories using Python and the GitPython library. We saw how to clone or initialize a repository, how to add remotes, how to create commits and how to push and pull to and from the remote. We also saw how to check if a repository has changes, how to work with branches and how to manage submodules. Here we just scratched the surface of the GitPython API: to learn more about it, please take a look at the official documentation.


