Python is a general purpose programming language that needs no presentations. It was originally written by Guido Van Rossum, and saw its first release in the year 1991. At the moment of writing, the latest stable version of the language is 3.10
. In this tutorial we see how to use it together with the openpyxl library to manipulate Excel spreadsheets.
In this tutorial you will learn:
- How to create a workbook in memory
- How to retrieve,create,copy,move and remove sheets from a workbook
- How to create a workbook from a file
- How to access a range of cells
- How to iterate over rows and columns

Software requirements and conventions used
Category | Requirements, Conventions or Software Version Used |
---|---|
System | Distribution independent |
Software | Python and the openpyxl library |
Other | None |
Conventions | # – requires given linux-commands to be executed with root privileges either directly as a root user or by use of sudo command$ – requires given linux-commands to be executed as a regular non-privileged user |
Installing the openpyxl library
To obtain the openpyxl library on our operating system, we can use two methods: the fist consists into installing the package available in the repository of our favorite distribution using its native package manager, the second, universal method, is to make use of pip, the python package manager. Let’s explore both of them.
The openpyxl package is available in the default repositories of some of the most used Linux distributions, such as Debian (and its derivatives), Fedora, and Archlinux. To install the package in the mentioned distributions, we can run, the following commands, respectively:
# Install openpyxl on Debian and derivatives $ sudo apt install python3-openpyxl # Install openpyxl on Fedora $ sudo dnf install python3-openpyxl # Install openpyxl on Archlinux $ sudo pacman -S python-openpyxl
The commands above are distribution-specific. If we want to use a cross-distribution method to install openpyxl (or any other python library), we can use pip
, the python package manager (pip itself should be installed on our system, of course):
$ pip install openpyxl --user
You can notice that we launched pip without administrative privileges, and with the --user
option to install the package only for our user. This is the recommended way to use the package manager. Once the library is installed on our system, we can start working.
Creating a simple spreadsheet in memory
Let’s start easy. To create a workbook all we have to do is to import and work with the Workbook
class, which represents a container for all the other parts of a document. When we create an instance of the Workbook
class, a new spreadsheet is also created by default. We can access it via the active
property:
from openpyxl import Workbook
workbook = Workbook()
spreadsheet = workbook.active
When a new spreadsheet is created it contains no cells. They are created on the fly, so its better to access them directly in order to avoid wasting precious memory. We can reference a cell of the spreadsheet just like a dictionary key. For example, to get the value of the ‘A1’ cell, we would write:
a1_value = spreadsheet['A1']
Similarly, to assign a value to the same cell we would write:
spreadsheet['A1'] = 'Hello World'
An alternative way to access the spreadsheet cells is to use the cell()
method of the Worksheet
object, and pass the row/column coordinates as arguments:
# Get the value of the cell
a1_value = spreadsheet.cell(row=1, column=1)
# Populate the cell
spreadsheet.cell(row=1, column=1, value='Hello World')
To save the worksheet we created and manipulated, all we have to do is to use the save
method of the Workbook
object, and pass the name of the destination file as argument. For example, to save the worksheet as worksheet.xlsx
, we would run:
workbook.save('worksheet.xlsx')

As soon as we invoke this method, a file with the specified name will be created on our filesystem. Here is its content (in this case I opened with Libreoffice calc):
Adding a sheet to a workbook
In the previous example we saw how to access the active spreadsheet of a workbook. As we know, however, a workbook can contain multiple worksheets, so what if we want to create a new one? We can do it via the create_sheet
method of the Workbook
object:
new_sheet = workbook.create_sheet('new')
The create_sheet
method accepts two optional arguments: title
and index
. We can use the former (it should be a string) to assign a name to the new sheet, and the latter (int) to specify in what position the sheet should be inserted. The method creates and returns the new sheet. In the example above we created a new sheet using “new” as title. The title can be used to later retrieve the spreadsheet:
sheet = workbook['new']
Copying and moving sheets
To copy an existing sheet we can use the copy_worksheet
method and pass the worksheet that should be copied as argument. To copy the active worksheet, for example, we would write:
sheet_copy = workbook.copy_worksheet(workbook.active)
The method returns the created copy of the sheet, which in this case we referenced via the sheet_copy
variable.
To move an existing sheet into a determined position in the workbook, instead, we can use the move_sheet
method which accepts two arguments. The first one, sheet
, is mandatory, and is the worksheet that we want to move around, the second is optional (it defaults to 0
), and is the offset to use to specify the sheet position. Let’s see an example. In this case the default worksheet “Sheet” is the first one in the workbook. To move it in the second position, we would write:
workbook.move_sheet(workbook["Sheet"], 1)
We can obtain a list of all the sheets belonging to a workbook via the worksheets
property.
Removing a sheet
To remove a sheet from a workbook we use the remove
method of the Workbook
class. The method accepts one mandatory argument, which is the object representing the sheet we want to remove. Supposing we want to remove the “new” sheet from our workbook, we would write:
workbook.remove(workbook['new'])
Creating a workbook from file
Reading an existing excel spreadsheet file is quite simple with openpyxl. All we have to do is to load the load_workbook
function from the library. This function only mandatory parameter is filename
, which must be the path of the file we want to open. Supposing this file is called spreadsheet.xlsx
, we would write:
from openpyxl import load_workbook
workbook = load_workbook('spreadsheet.xlsx')
The method accepts also some optional parameters which are useful to modify how the file is handled:
Parameter | Explanation | Default |
---|---|---|
read_only | Open the file in reading-optimized mode. It can’t be edited | False |
keep_vba | Whether to preserve vba content | False |
data_only | Whether to preserve formula in cells or report just the value contained in it | False |
keep_links | Whether links to external workbooks should be preserved | True |
Once the we load the spreadsheet file we can access the spreadsheet(s) via the instance of the Workbook
class returned by load_workbook
.
Accessing multiple cells
What if we want to get the value of a range of cells, instead of the value of a single one? All we have to do is to specify the range with the following syntax:
cell_values = spreadsheet['A1':'D1']
The result of specifying a range will be a tuple containing a tuple for each row that is specified. In the example above, there is only one row, since we specified cells from A1
to D1
(they are indeed on the same row), so that would be the result:
((<Cell 'Sheet'.A1>, <Cell 'Sheet'.B1>, <Cell 'Sheet'.C1>, <Cell 'Sheet'.D1>),)
If we wanted to get the object representing the cells of the columns ‘A’ to ‘D’ of the first 3 rows, instead, we would write:
cell_values = spreadsheet['A1':'D3']
We would obtain the following result:
(
(<Cell 'Sheet'.A1>, <Cell 'Sheet'.B1>, <Cell 'Sheet'.C1>, <Cell 'Sheet'.D1>),
(<Cell 'Sheet'.A2>, <Cell 'Sheet'.B2>, <Cell 'Sheet'.C2>, <Cell 'Sheet'.D2>),
(<Cell 'Sheet'.A3>, <Cell 'Sheet'.B3>, <Cell 'Sheet'.C3>, <Cell 'Sheet'.D3>)
)
The tuple this time contains three tuples, one for each row, as we said before. To access all the cells of a column, instead we would just specify the column name, without any row number. For example, to get all the cells of the ‘A’ column we would write:
cells = spreadsheet['A']
To obtain all the cells of the columns A
to D
, instead, we would write:
cells = spreadsheet['A':'D']
Similarly, we can access whole rows by specifying a range of their numbers. To obtain all the cells of the first two rows, we would write:
cells = spreadsheet[1:3]
Iterate over rows and columns Using the iter_rows() and iter_cols() methods
Instead of specifying a range to access the value of a series of cells we can use the iter_rows()
and iter_cols()
methods of the spreadsheet. Both methods accepts the same optional arguments:
Parameter | Explanation |
---|---|
min_row | The smallest row index |
max_row | The largest row index |
min_col | The smallest column index |
max_col | The largest column index |
values_only | Whether only cell values should be returned |
In both methods with the min_row
/max_row
and min_col
/max_col
parameters we specify the range of rows and columns on which the iteration should take place. The difference between the two is that iter_rows()
returns the cells organized by rows, where iter_cols()
, instead returns them organized by columns. Let’s see some practical examples. Suppose we want to iterate over the first three rows from the first to the fifth column and want to obtain the cells organized by row. Here is what we would run:
for i in spreadsheet.iter_rows(min_row=1, max_row=3, min_col=1, max_col=5):
for cell in i:
print(cell)
Executing the code above returns the following result:
<Cell 'Sheet'.A1>
<Cell 'Sheet'.B1>
<Cell 'Sheet'.C1>
<Cell 'Sheet'.D1>
<Cell 'Sheet'.E1>
<Cell 'Sheet'.A2>
<Cell 'Sheet'.B2>
<Cell 'Sheet'.C2>
<Cell 'Sheet'.D2>
<Cell 'Sheet'.E2>
<Cell 'Sheet'.A3>
<Cell 'Sheet'.B3>
<Cell 'Sheet'.C3>
<Cell 'Sheet'.D3>
<Cell 'Sheet'.E3>
As you can see, the cells are organized by row. To obtain the same cells, this time organized by columns, we would use pass the same arguments to the iter_cols()
method:
for i in spreadsheet.iter_rows(min_row=1, max_row=3, min_col=1, max_col=5):
for cell in i:
print(i)
The same cells are returned; this time, organized in columns:
<Cell 'Sheet'.A1>
<Cell 'Sheet'.A2>
<Cell 'Sheet'.A3>
<Cell 'Sheet'.B1>
<Cell 'Sheet'.B2>
<Cell 'Sheet'.B3>
<Cell 'Sheet'.C1>
<Cell 'Sheet'.C2>
<Cell 'Sheet'.C3>
<Cell 'Sheet'.D1>
<Cell 'Sheet'.D2>
<Cell 'Sheet'.D3>
<Cell 'Sheet'.E1>
<Cell 'Sheet'.E2>
<Cell 'Sheet'.E3>
Conclusions
In this tutorial we learned how to work with Excel spreadsheet files using the Python openpyxl library. We saw how to create a workbook in memory or from a file, how to retrieve, create, copy, move and remove sheets, how to access a cell and a range of cells, and, finally how to iterate over rows and columns. Are you interested in more tutorials like this? Take a look at our Python articles!