Neither Python nor Git needs an introduction: the former is one of the most widely used general-purpose programming languages; the latter is probably the most widely used version control system in the world, created by Linus Torvalds himself. Normally, we interact with git repositories using the git binary; when we need to work with them from Python, we can use the GitPython library instead.
In this tutorial we will see how to manage repositories and implement a basic git workflow using the GitPython library.
In this tutorial you will learn:
- How to install the GitPython library
- How to manage git repositories with the GitPython library
- How to add a remote to a repository
- How to clone a git repository
- How to create and push commits
- How to work with branches
- How to manage submodules
Software requirements and conventions used
|Category|Requirements, Conventions or Software Version Used|
|Software|Python and the GitPython library|
|Conventions|# – requires given linux-commands to be executed with root privileges, either directly as the root user or by use of the sudo command; $ – requires given linux-commands to be executed as a regular non-privileged user|
Installing the GitPython library
The GitPython library can be installed either by using our favorite distribution package manager or by using pip, the Python package manager. The first method is distribution-specific, while the latter can be used on every distribution where pip is installed.
To install the software natively on recent versions of Fedora, we can run the following command:
$ sudo dnf install python3-GitPython
On Debian and Debian-based distributions, the package is called “python3-git” and can be installed via apt:
$ sudo apt install python3-git
GitPython is also available in the Archlinux “Community” repository. We can install the package via pacman:
$ sudo pacman -Sy python-gitpython
The universal method to install GitPython is by using pip. We do it by launching the following command:
$ pip install GitPython --user
Notice that since we used the --user option in the command above, the package will be installed only for the user we launched the command as. For this reason, we don’t need to use privilege escalation.
Now that we installed the GitPython library, let’s see how to use it.
Creating a local git repository
Let’s see how we can perform our first steps with GitPython. The first thing we may want to learn is how to create a local repository. When working with the git binary, the command we use to initialize a local repository is git init. When using the GitPython library, we need to use the following code instead:
from git.repo import Repo
repository = Repo.init('/path/of/repository')
In the code snippet above, the first thing we did was to import the Repo class from the git module. This class is used to represent a git repository. We then called the init method associated with it. This method is a “class method”, which means that we can call it without creating an instance of the class beforehand; it takes the path where the repository should be initialized as its first argument and returns an instance of the Repo class.
What if we want to create a bare repository? All we have to do is to set the “bare” argument of the init method to True. Our code becomes:
repository = Repo.init('/path/of/repository', bare=True)
Adding a remote to our repository
Once we have created our repository, we may want to add a remote counterpart to it. Suppose, for example, we create a repository on GitHub to host our project; to add it as a remote called “origin”, we need to use the create_remote method on the repository object:
# Add https://github.com/foo/test.git as a remote to our repository
repository.create_remote('origin', 'https://github.com/foo/test.git')
We passed the name that should be used for the remote as the first argument of the method, and the URL of the remote repository as the second one. The create_remote method returns an instance of the Remote class, which is used to represent a remote.
Adding files to the repository index and creating our first commit
Now, suppose we created an “index.html” file inside our repository containing the following code:
<h2>This is an index file</h2>
Although the file exists in the repository, it is not tracked yet. To get a list of the files which are not tracked in our repository we can reference the untracked_files property (this is actually a method which uses the @property decorator). In this case the returned list contains only the index.html file.
How can we check whether our repository contains changes? We can use the is_dirty method. This method returns True if the repository is considered dirty, False otherwise. By default a repository is considered dirty if there are changes to its index or to tracked files in its working tree; the existence of untracked files does not influence this. If untracked files exist, the repository is not considered “dirty” unless we set the untracked_files argument to True:
repository.is_dirty(untracked_files=True) # This returns True in this case
To add the index.html file to the index of our repository we need to use the following code:
repository.index.add(['index.html'])
In the code above, index (this again is a @property method) returns an instance of the IndexFile class, which is used to represent the repository index. We call the add method of this object to add the file to the index. The method accepts a list as its first argument, therefore we can add multiple files at once.
Once we have added the needed files to our index, we want to create a commit. To perform such an action we call the commit method of the index object, and pass the commit message as argument:
commit = repository.index.commit("This is our first commit")
The commit method returns an instance of the Commit class, which is used to represent a commit in the library. Above we used the commit variable to reference this object.
Pushing and pulling changes to and from the remote
We created our first commit with GitPython; now we want to push the commit to the remote we added earlier in this tutorial. Performing such actions is really easy. First of all, all the remotes associated with our repository can be accessed via the remotes property of the Repo class:
repository.remotes
As we know, each remote is represented by a Remote object. In our example we want to push our commit to the remote we called “origin”, so all we have to do is to call the push method on it:
repository.remotes.origin.push('master:master')
What we did above is call the push method and pass a mapping between the local branch and the remote one as the first argument: we basically said to push the content of our master branch to the remote master branch. Since we specified an HTTPS URL when we created the “origin” remote, once the code is executed we are prompted to provide our credentials:
Username for 'https://github.com': foo
Password for 'https://foo@github.com':
Notice that if we use an HTTPS URL for the remote repository and we have two-factor authentication set on GitHub, we will not be able to push to it using our account password. To avoid having to provide credentials, we can set up SSH keys and use an SSH URL. To change the URL of the “origin” remote, we need to use the set_url method:
repository.remotes.origin.set_url('git@github.com:foo/test.git')
If we have ssh keys set on the remote (github in this case), we will not be prompted to provide password or username (unless our private key is password-protected), so the process will become completely automatic.
The push method returns a list of PushInfo objects, each of which represents the result of pushing a reference.
To avoid having to specify the map between the local and upstream branch when we push a commit, we can perform the push directly via the git binary using the Git class. The class can be referenced via the git property of the repository object. What we have to do is to pass the --set-upstream option, so we write:
repository.git.push('--set-upstream', 'origin', 'master')
The next time we perform a push, we could simply use:
repository.remotes.origin.push()
To pull commits from a repository, in similar fashion, we use the pull method instead (again, in this case, the refspec is not needed since we previously used --set-upstream):
repository.remotes.origin.pull()
Working with branches
In a git repository, branches can be used to develop new features or fix bugs without touching the master, which is itself the main branch where code should always remain stable.
Creating a branch
When using GitPython, to create a new branch in our repository (suppose we want to call it “newfeature”) we would run the following code:
new_branch = repository.create_head('newfeature')
With the code above the new branch will be generated from the current HEAD of the repository. If we want a branch to be created from a specific commit instead, we need to pass its hash as the second argument to the method.
Switching to a branch
Switching to a new branch involves changing the HEAD of our repository so that it points to it, and synchronizing the index and working tree. To switch to the ‘new_branch’ we just created, we use the following code:
# Get a reference to the current active branch to easily switch back to it later
original_branch = repository.active_branch
repository.head.reference = new_branch
repository.head.reset(index=True, working_tree=True)
Deleting a branch
To delete a branch we use the delete_head method on an instance of the Repo class. In our case, to delete the ‘newfeature’ branch, we would run:
repository.delete_head('newfeature')
Working with submodules
Submodules are used to incorporate code from other git repositories.
Adding a submodule
Suppose we want to add a submodule to incorporate code that is found in the ‘https://github.com/foo/usefulcode’ repository, in the usefulcode_dir directory in the root of our own project (the directory is automatically created if it doesn’t exist). Here is the code we would write:
repository.create_submodule('usefulcode', 'usefulcode_dir', 'https://github.com/foo/usefulcode')
In the example above, the first argument passed to the create_submodule method is the name to be used for the submodule, the second is the submodule path relative to the root of our project, and the last one is the URL of the external repository we want to use as a submodule.
To get the complete list of all submodules associated with our repository we can use repository.submodules; alternatively we can iterate over the instances yielded by the iter_submodules method:
for submodule in repository.iter_submodules():
    print(submodule.url)
One important thing to notice is that repository.submodules returns the list of the submodules associated with our repository directly, while iter_submodules will let us iterate over submodules recursively (the repository we added as a submodule could have submodules associated with it, too).
Removing a submodule
To remove a submodule from our repository we have to call the remove method of the Submodule object used to represent it. We can retrieve the submodule we want to delete by its name, passing it as an argument to the submodule method (“usefulcode” in this case):
submodule = repository.submodule("usefulcode")
submodule.remove(module=True, force=True)
The code above:
- Removes the submodule entry from the .gitmodules file
- Removes the submodule entry from the .git/config file
- Forces the removal of the module even if it contains modifications (due to force=True; this may or may not be something you want)
Cloning a repository
Until now we saw how to manage a local repository with the GitPython library; now let’s see how to clone a repository. To clone a repository we have to use the clone_from method of the Repo class. The method takes the URL of the repository to be cloned as its first argument, and the local filesystem path where it should be cloned as its second:
repository = Repo.clone_from('https://github.com/user/test.git', 'test')
In this tutorial we learned how to start working with git repositories using Python and the GitPython library. We saw how to clone or initialize a repository, how to add remotes, how to create commits and how to push and pull to and from the remote. We also saw how to check if a repository has changes and how to manage its submodules. Here we just scratched the surface of the GitPython API: to know more about it, please take a look at the official documentation.