CSV is the acronym of “Comma Separated Values”. A csv file is just a plain text document used to represent and exchange tabular data. Each row in a csv file represents an “entity”, and each column represents one of its attributes. Columns are usually separated by a comma, but other characters can be used as the field separator instead. In this tutorial we will see how to read and create csv files using Python, and specifically the csv module, which is part of the language standard library.
In this tutorial you will learn:
- How to read csv rows as a list of strings
- How to read a csv as a list of dictionaries
- How to create a csv using Python
- How to create a csv starting from a list of dictionaries
Software requirements and conventions used
Category | Requirements, Conventions or Software Version Used |
---|---|
System | Distribution independent |
Software | Python3 |
Other | Basic knowledge of Python and Object Oriented Programming |
Conventions | # – requires given linux-commands to be executed with root privileges, either directly as a root user or by use of the sudo command; $ – requires given linux-commands to be executed as a regular non-privileged user |
CSV – Comma Separated Value
As we already mentioned in the introduction of this tutorial, a csv is just a simple plain text file, formatted in a way which lets us represent and exchange tabular data. Each row in a csv file represents an entity of some kind, except the first row, which usually contains the field titles. Let’s see an example. Suppose we want to represent characters from the Lord Of The Rings book in csv format:
Name,Race
Frodo,hobbit
Aragorn,man
Legolas,elf
Gimli,dwarf
The one above is a trivial example of the content of a csv file. As you can see we used the , (comma) as field separator. We save that data in a file called lotr.csv. Let’s see how we can read it using the Python programming language and the csv module.
Reading a csv file
To interact with a csv file with Python, the first thing we have to do is to import the csv module. Let’s write a simple script, just a few lines of code:
#!/usr/bin/env python3
import csv

if __name__ == '__main__':
    with open('lotr.csv', newline='') as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            print(row)
In this example we suppose that the script we created above (let’s call it script.py) is in the same directory as the csv file, and that said directory is our current working one.
The first thing we did was to import the csv module; then we opened the file in read mode (the default) with a context manager, so that we are sure that the file object is always closed when the interpreter exits the with block, even if some kind of error occurs. You can also notice that we passed an empty string as the newline argument of the open function, which turns off newline translation so that line endings reach the csv module untouched. This is a safety measure, since, as stated in the csv module documentation:
If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.
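To make the quoted note more concrete, here is a minimal sketch that writes a row containing an embedded line break and reads it back, opening the file with newline='' both times. The roundtrip_row helper, the temporary file and the sample row are assumptions made up for illustration; they are not part of the csv module.

```python
import csv
import os
import tempfile


def roundtrip_row(row):
    """Write one row to a temporary csv file, then read it back."""
    # Create a real file on disk; close the raw descriptor so that we can
    # reopen the path with open(), as done in the tutorial script.
    fd, path = tempfile.mkstemp(suffix='.csv')
    os.close(fd)
    try:
        with open(path, 'w', newline='') as csvfile:
            csv.writer(csvfile).writerow(row)
        with open(path, newline='') as csvfile:
            # The reader yields one list of strings per csv row.
            return next(csv.reader(csvfile))
    finally:
        os.remove(path)


if __name__ == '__main__':
    # The second field contains an embedded line break.
    print(roundtrip_row(['Frodo', 'hobbit\nring bearer']))
```

The writer quotes the field containing the line break, and the reader should hand it back as a single field rather than splitting it into two rows.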
The csvfile object represents our opened file: we pass it as argument to the csv.reader function, which returns a reader object we reference via the reader variable. We use this object to iterate through each line of the file, which is returned as a list of strings. In this case we just print them. If we execute the script we obtain the following result:
$ ./script.py
['Name', 'Race']
['Frodo', 'hobbit']
['Aragorn', 'man']
['Legolas', 'elf']
['Gimli', 'dwarf']
That was pretty easy, wasn’t it? What if a character other than the comma is used as a field separator? In that case we could use the delimiter parameter of the csv.reader function to specify the character which should be used. Let’s say said character is |. We would write:
reader = csv.reader(csvfile, delimiter="|")
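As a self-contained sketch of the delimiter parameter, the snippet below parses pipe-separated text held in memory instead of a file on disk. The PIPE_DATA sample and the read_pipe_rows helper are hypothetical names introduced only for this example; io.StringIO stands in for the opened file.

```python
import csv
import io

# Hypothetical sample data using "|" instead of "," as field separator.
PIPE_DATA = "Name|Race\nFrodo|hobbit\nAragorn|man\n"


def read_pipe_rows(text):
    """Return (header, rows) parsed from pipe-delimited text."""
    # io.StringIO behaves like an opened text file for csv.reader.
    reader = csv.reader(io.StringIO(text, newline=''), delimiter='|')
    header = next(reader)  # the first row holds the field titles
    rows = list(reader)    # the remaining rows are the actual data
    return header, rows


if __name__ == '__main__':
    header, rows = read_pipe_rows(PIPE_DATA)
    print(header)
    for row in rows:
        print(row)
```

Calling next on the reader before looping is one simple way to keep the field titles separate from the data rows.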
Read the csv fields in a dictionary
The one we used above is probably the easiest way to read a csv file with Python. The csv module also defines the DictReader class, which lets us map each row in a csv file to a dictionary, where the keys are the field names and the values are their actual content in a row. Let’s see an example. Here is how we modify our script:
#!/usr/bin/env python3
import csv

if __name__ == '__main__':
    with open('lotr.csv', newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            print(row)
The mandatory first argument of the DictReader class constructor is the file object created when we opened the file. If we launch the script, this time we obtain the following result:
{'Name': 'Frodo', 'Race': 'hobbit'}
{'Name': 'Aragorn', 'Race': 'man'}
{'Name': 'Legolas', 'Race': 'elf'}
{'Name': 'Gimli', 'Race': 'dwarf'}
As already said, the fields contained in the first row are used as the dictionary keys; but what if the first row of the file doesn’t contain the field names? In that case we can specify them by using the fieldnames parameter of the DictReader class constructor:
reader = csv.DictReader(csvfile, fieldnames=['Name', 'Race'])
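Here is a small runnable sketch of that situation, assuming csv content that has no header row. The HEADERLESS_DATA sample and the read_headerless helper are made up for illustration, and io.StringIO replaces the opened file.

```python
import csv
import io

# Hypothetical file content with no header row.
HEADERLESS_DATA = "Frodo,hobbit\nAragorn,man\n"


def read_headerless(text):
    """Map each row to a dictionary using explicitly provided field names."""
    reader = csv.DictReader(
        io.StringIO(text, newline=''),
        fieldnames=['Name', 'Race'],
    )
    # Because fieldnames is given, the first row is treated as data.
    return [dict(row) for row in reader]


if __name__ == '__main__':
    for record in read_headerless(HEADERLESS_DATA):
        print(record)
```

Note that when fieldnames is passed for a file that does contain a header row, that header row is returned as an ordinary data row.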
Create a csv file
Until now we just saw how to read data from a csv file, both as a list of strings each representing a row, and as a dictionary. Now let’s see how to create a csv file. As always, we start with an example and then we explain it. Imagine we want to programmatically create the csv file we created manually before. Here is the code we would write:
#!/usr/bin/env python3
import csv

if __name__ == '__main__':
    with open('lotr.csv', 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        # Write the header first, then one row per character.
        for row in (('Name', 'Race'), ('Frodo', 'hobbit'), ('Aragorn', 'man'), ('Legolas', 'elf'), ('Gimli', 'dwarf')):
            writer.writerow(row)
The first thing you should notice is that this time we opened the lotr.csv file in write mode (w). In this mode a file is created if it doesn’t exist, and is truncated otherwise (check our article about performing input/output operations on files with Python if you want to know more about this subject).
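The truncating behavior of write mode can be observed with a short sketch like the following, which writes to the same file twice and then reads it back. It works on a temporary file rather than lotr.csv; the write_rows, read_rows and demo_truncation helpers are hypothetical names used only here.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Open path in write mode and write the given rows."""
    with open(path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        for row in rows:
            writer.writerow(row)


def read_rows(path):
    """Return all rows of the csv file as a list of lists of strings."""
    with open(path, newline='') as csvfile:
        return list(csv.reader(csvfile))


def demo_truncation():
    fd, path = tempfile.mkstemp(suffix='.csv')
    os.close(fd)
    try:
        write_rows(path, [('Name', 'Race'), ('Frodo', 'hobbit')])
        # Opening the file in 'w' mode again discards the previous content.
        write_rows(path, [('Name', 'Race'), ('Gimli', 'dwarf')])
        return read_rows(path)
    finally:
        os.remove(path)


if __name__ == '__main__':
    print(demo_truncation())
```

Only the rows from the second call should remain in the file, since the first content is discarded when the file is reopened in write mode.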
Instead of a reader object, this time we created a writer one, using the writer function provided in the csv module. The parameters this function accepts are very similar to those accepted by the reader one. We could, for example, specify an alternative delimiter using the parameter with the same name.
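As a hedged illustration of the delimiter parameter on the writing side, this sketch writes rows to an in-memory buffer instead of a file, so that the resulting text can be inspected directly. The rows_to_text helper is a made-up name, and io.StringIO is used here only as a stand-in for the opened file.

```python
import csv
import io


def rows_to_text(rows, delimiter=','):
    """Return the csv text produced for rows with the given delimiter."""
    # newline='' keeps the line endings exactly as the csv module emits them.
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer, delimiter=delimiter)
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == '__main__':
    sample = [('Name', 'Race'), ('Frodo', 'hobbit')]
    print(rows_to_text(sample, delimiter='|'))
```

With the same rows, changing only the delimiter argument changes the separator used in every line of the output.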
Since in our lotr.csv example we already know all the csv rows beforehand, we can avoid using a loop, and write all of them at once using the writerows method of the writer object:
#!/usr/bin/env python3
import csv

if __name__ == '__main__':
    with open('lotr.csv', 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerows((('Name', 'Race'), ('Frodo', 'hobbit'), ('Aragorn', 'man'), ('Legolas', 'elf'), ('Gimli', 'dwarf')))
Create a csv file with the DictWriter object
The csv module provides a DictWriter class, which lets us map a dictionary to a csv row. This can be very useful when the data we are working on comes this way and we want to represent it in tabular form. Let’s see an example.
Suppose our LOTR characters data is represented as a list of dictionaries (perhaps as it would be returned from an API call made with the requests module). Here is what we could write to create a csv based on it:
#!/usr/bin/env python3
import csv

characters_data = [
    {
        'Name': 'Frodo',
        'Race': 'hobbit'
    },
    {
        'Name': 'Aragorn',
        'Race': 'man'
    },
    {
        'Name': 'Legolas',
        'Race': 'elf'
    },
    {
        'Name': 'Gimli',
        'Race': 'dwarf'
    }
]

if __name__ == '__main__':
    # newline='' is used here too, as recommended by the csv module documentation.
    with open('lotr.csv', 'w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=('Name', 'Race'))
        writer.writeheader()
        writer.writerows(characters_data)
Let’s see what we did. First we created an instance of the DictWriter class, passing as arguments the file object (csvfile) and then fieldnames, which must be a sequence of values to be used as the csv field names, and determines in what order the values contained in each dictionary should be written to the file. While in the case of the DictReader class constructor this parameter is optional, here it is mandatory, and it’s easy to understand why: the dictionaries alone do not tell the writer which columns to produce, nor in which order.
After creating the writer object, we called its writeheader method: this method is used to create the initial csv row, containing the field names we passed in the constructor.
Finally, we called the writerows method to write all the csv rows at once, passing the list of dictionaries as argument (here we referenced them by the characters_data variable). All done!
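To see how fieldnames drives the column order, a minimal sketch can write the same dictionaries twice with the field names swapped. The dicts_to_text helper is a hypothetical name, and an in-memory io.StringIO buffer is assumed in place of the lotr.csv file so that the output can be printed directly.

```python
import csv
import io


def dicts_to_text(records, fieldnames):
    """Return the csv text DictWriter produces for records."""
    buffer = io.StringIO(newline='')
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()       # header row, in the order given by fieldnames
    writer.writerows(records)  # one csv row per dictionary
    return buffer.getvalue()


if __name__ == '__main__':
    data = [{'Name': 'Frodo', 'Race': 'hobbit'}]
    print(dicts_to_text(data, ('Name', 'Race')))
    print(dicts_to_text(data, ('Race', 'Name')))
```

The order of the keys inside each dictionary does not matter: both the header and the data rows follow the sequence passed as fieldnames.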
Conclusions
In this article we learned the basics of reading and creating csv files using the Python programming language. We saw how to read the rows of a csv file both as a list of strings and in a dictionary using a DictReader object, and how to create a new csv file writing one row at a time, or all rows at once. Finally, we saw how to create a csv file starting from a list of dictionaries, as could be returned from an API call. If you want to know more about the csv Python module please consult the official documentation.