Reading CSVs With Python's "csv" Module

Reading and Writing CSV Files Joe Tatusko 04:13

In this video, you’ll learn how to read standard CSV files using Python’s built-in csv module. There are two ways to read data from a CSV file using csv. The first uses csv.reader() and the second uses csv.DictReader().

csv.reader() returns each row as a list, so you access CSV data by index, which is ideal for simple CSV files. csv.DictReader(), on the other hand, lets you access values by column name, which is friendlier and easier to use, especially when working with large CSV files.

We’ll be using the following sample CSV file called employee_birthday.csv:

CSV

name,department,birthday month
John Smith,Accounting,November
Erica Meyers,IT,March

The following code samples show how to read CSV files using the two methods:

Using csv.reader():

Python

import csv

with open('employee_birthday.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader)  # Skip the header row
    for row in csv_reader:
        # Each row is a list, so columns are accessed by index
        print(f'\t{row[0]} works in the {row[1]} department, and was born in {row[2]}')

Using csv.DictReader():

Python

import csv

with open('employee_birthday.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    for row in csv_reader:
        # Each row is a dict keyed by the column names from the header row
        print(f'\t{row["name"]} works in the {row["department"]} department, and was born in {row["birthday month"]}')

00:00 Now it’s time to start using Python to read CSV files. Here, I’ve got a simple CSV file that contains some employee data for two employees and has their name, department, and birthday month. Open up a new Python script and import csv.

00:17 The first thing you need to do is actually open the CSV file.

00:27 Go ahead and call this csv_file. Next, create a csv_reader, which will just be csv.reader(). Pass in the csv_file and also a delimiter, which in this case, is just a simple comma (',').

00:45 Now you can create a line_count, which we’ll just set equal to 0, and since the csv_reader yields each row as a list, you can go ahead and loop through with for row in csv_reader:. The first row contains all that header information, so if line_count == 0: you can print f'Column names are',

01:16 and since that’s also a list of each item, just go ahead and .join() that row together. Increase the line_count, and handle everything else.

01:28 So, print(), and let’s start with a tab ('\t') here, and you can go ahead and just index off of that row, so the name will be the first item,

01:42 the department will be the second,

01:51 and the third item is the birthday month. Cool! Like before, increase the line_count and go ahead and print a summary, which will just be something like f'Processed {line_count} lines.' And that’s it! Save this, open up the terminal, and see what we get. All right, Column names are name, department, birthday month, and both employees’ information prints out here. And actually—ha ha—it looks like I printed the name twice.

02:30 This should be a 1 so that we access the second indexed item in there. Let’s try that again. There we go. Now we can see their department. Okay! So that’s pretty cool. You can open CSV files and parse them using Python.
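The script built up in this part of the video is not shown in full on the page, so here is a minimal sketch of what it might look like after the index fix. The SAMPLE string and the report() helper are assumptions added so the example runs without a file on disk; the video itself opens the file and prints directly.

```python
import csv
import io

# Stand-in for employee_birthday.csv (an assumption so the sketch is self-contained).
SAMPLE = (
    "name,department,birthday month\n"
    "John Smith,Accounting,November\n"
    "Erica Meyers,IT,March\n"
)


def report(csv_file):
    """Return the lines the script would print, given an open text file."""
    lines = []
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            # The first row holds the column names.
            lines.append(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            # Every other row is a list: name, department, birthday month.
            lines.append(
                f'\t{row[0]} works in the {row[1]} department, '
                f'and was born in {row[2]}'
            )
            line_count += 1
    lines.append(f'Processed {line_count} lines.')
    return lines


print("\n".join(report(io.StringIO(SAMPLE))))
```

With a real file, the same helper could be called as report(csv_file) inside a with open('employee_birthday.csv') as csv_file: block.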

02:47 Now, if your CSV files start getting larger, or if you have columns that you need to add or remove, the indexing can get very difficult, and as you can see, I made a typo here and printed the wrong thing.

02:58 Fortunately, csv has a different type of reader that allows you to use the header names. So go up here to the csv.reader, and you’re going to replace this with a DictReader, for dictionary.

03:10 Everything else will be the same. Go ahead and get rid of this else statement here and remove the indents on your every other row. And now, instead of picking apart the rows by their index, you can actually pass in the header name.

03:26 So what’s going on here is the DictReader (dictionary reader) looks at the first row and assumes that it contains the column names, so you can then refer to each value by its column name.

03:40 So if I go ahead, save this, we should be able to run it and get the same result. Yeah! So if you have that first row that contains the header information, you can use a different reader, and this is a little friendlier to use because you can actually see what you’re referring to. All right!
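As a rough sketch of the DictReader version described here (the else removed, values looked up by header name), something like the following should behave the same way. SAMPLE and report_by_name() are assumptions for illustration, not code shown in the video.

```python
import csv
import io

# Stand-in for employee_birthday.csv (an assumption so the sketch is self-contained).
SAMPLE = (
    "name,department,birthday month\n"
    "John Smith,Accounting,November\n"
    "Erica Meyers,IT,March\n"
)


def report_by_name(csv_file):
    """Return the lines the DictReader script would print."""
    lines = []
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            # Each row is a dict keyed by column name, so joining it joins the keys.
            lines.append(f'Column names are {", ".join(row)}')
            # Count the header line that DictReader already consumed.
            line_count += 1
        # No else: the first data row must be printed too.
        lines.append(
            f'\t{row["name"]} works in the {row["department"]} department, '
            f'and was born in {row["birthday month"]}'
        )
        line_count += 1
    lines.append(f'Processed {line_count} lines.')
    return lines


print("\n".join(report_by_name(io.StringIO(SAMPLE))))
```

Because the header is never handed to the loop, keeping the else branch here would swallow the first employee.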

03:59 So now you can read standard CSV files. In the next video, we’re going to talk about changing some of the reader parameters to deal with those nonstandard cases. Thanks for watching, and I’ll see you there!

newoptionz on June 1, 2019

The code did not recognize the file on Windows, so I modified the path as follows:

import csv
from pathlib import Path

data_folder1 = Path("C:/Users/Stephen/OneDrive/Documents/PythonCSV/")
data_folder1 = Path("C:\\Users\\Stephen\\OneDrive\\Documents\\PythonCSV\\")

file_to_open = data_folder1 / "employee_birthday.txt"

with open(file_to_open) as csv_file:

newoptionz on June 1, 2019

The magical powers of the backslash: in the code above, the second data_folder1 line needs double backslashes (\\), not single ones.

josephjaewookim on June 15, 2019

It would be nice if you provided the actual CSV file…

Dan Bader RP Team on June 16, 2019

@josephjaewookim: You can find the CSV file by clicking on the “Supporting Material” button. But I’m also adding it to the description above, thanks for the heads up.

Eric P on Aug. 16, 2019

Humm

Python 3.7.2

import csv
print(csv.__file__)

with open('employee_birthday.txt') as csv_file:
    csv_reader = csv.Reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        print(f'\t{row[0]} works in the {row[1]} department, and was born in {row[3]}')

gives me

/Users/epalmer/.virtualenvs/data_science/bin/python /Users/epalmer/projects_sorted/real_python/data_science/csv/read1.py
Traceback (most recent call last):
  File "/Users/epalmer/projects_sorted/real_python/data_science/csv/read1.py", line 4, in <module>
    csv_reader = csv.Reader(csv_file, delimiter=',')
AttributeError: module 'csv' has no attribute 'Reader'
/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/csv.py

Process finished with exit code 1

In pycharm and at the terminal.

Eric P on Aug. 16, 2019

Python 3 has a lowercase reader method in the csv module.

used in the repl to find out

dir(csv)

Also the code example under Using csv.Reader(): is not working. I’m just going to roll my own example because I have used the csv module before.

The Cool Ghoul on Sept. 17, 2019

I really like the tip about using the DictReader . Never knew that existed. I routinely read large CSV datasets. So this will be a big time saver and improve readability of my code.

Stibbsy on Oct. 17, 2019

Hey! Just wondering what the purpose of the line line_count=0 was in this snippet, as it doesn’t seem to get used anywhere?

with open('employee_birthday.txt') as csv_file:
    csv_reader = csv.Reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        print(f'\t{row[0]} works in the {row[1]} department, and was born in {row[3]}')

Joe Tatusko RP Team on Oct. 17, 2019

Hi Stibbsy! That line_count variable was in there to print out the number of lines processed in the csv, but would need to be incremented as each row is processed. I didn’t include that in the video, but left the variable in there. It’s not needed for the csv module to function correctly. Good find!

muralichintapalli on Feb. 1, 2020

There is no method csv.Reader; it’s csv.reader.

Ranit Pradhan on March 21, 2020

import csv
with open('employee_birthday.txt') as csv_file:
    csv_reader = csv.Reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        print(f'\t{row[0]} works in the {row[1]} department, and was born in {row[3]}')

Using this code, I got the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-4-fae0f779c01c> in <module>
      1 import csv
      2 with open('employee_birthday.txt') as csv_file:
----> 3     csv_reader = csv.Reader(csv_file, delimiter=',')
      4     line_count = 0
      5     for row in csv_reader:

AttributeError: module 'csv' has no attribute 'Reader'

I got the solution by changing csv.Reader to csv.reader and using row[2] in place of row[3], because row[3] was out of range.

Ranit Pradhan on March 21, 2020

import csv

with open('employee_birthday.txt') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        print(f'\t{row["name"]} works in the {row["department"]} department, and was born in {row["month"]}')

This needs to be written as {row["birthday month"]} in place of {row["month"]}, because that is the column name given in the employee_birthday text file.

wonderdog on March 28, 2020

Good tutorial and I am able to use this method for opening a single .csv file for reading - however I would like to open multiple .csv files at once.

How would a pythonista use the csv module - to open a variable number of .csv files for ‘reading’ (files that a user specifies at run time via the ‘input’ function) at the same time and dynamically assign a csv_reader, for each .csv file that is opened? Any help is much appreciated.
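One possible approach, sketched here under assumptions rather than offered as the definitive answer, is contextlib.ExitStack from the standard library: it keeps a variable number of files open at once and closes them all together. The read_all() name and the dictionary layout are hypothetical choices for illustration.

```python
import csv
from contextlib import ExitStack


def read_all(paths):
    """Open every path at the same time and return {path: list of rows}."""
    with ExitStack() as stack:
        # One csv.reader per file; ExitStack remembers each open file
        # and closes all of them when the with block ends.
        readers = {
            path: csv.reader(stack.enter_context(open(path, newline="")))
            for path in paths
        }
        # All files are open here, so the readers can be consumed in any order.
        return {path: list(reader) for path, reader in readers.items()}
```

The list of paths could come from the user at run time, for example paths = input("CSV files, separated by spaces: ").split(), followed by read_all(paths).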

farlesh1000 on April 21, 2020

Fun (unnecessary) way to create the txt from scratch:

from pathlib import Path

# Raw string, so the backslashes in the Windows path are not treated as escapes
project_folder = Path(r'C:\Users\user_name\project_file_folder_name')
file_to_open = project_folder / 'employee_birthday.txt'

fh = open(file_to_open, mode='w')  # fh stands for file handler object
# mode is the second positional parameter of open(), so you can also just pass 'w'

fh.write('name,department,birthday month\n'
         'John Smith,Accounting,November\n'
         'Erica Meyers,IT,March')
fh.close()

shallah richardson on May 25, 2020

Hi are there files for this lesson if so where are they? The supporting material button only takes me to another tutorial. Thanks

Jon Fincher RP Team on May 26, 2020

shallahrichardson, check out the linked article. The files referenced can be found if you scroll through that article.

shallah richardson on May 26, 2020

@Jon Fincher I’ve scrolled through all the comments and articles and I still don’t see the files.

Dan Bader RP Team on May 26, 2020

Here’s the direct link for the sample CSVs: realpython.com/courses/reading-and-writing-csv-files/downloads/reading-and-writing-csv-files-samples/

(They can also be found at the start of the course and under the Supporting Materials dropdown on each video lesson.)

theramstoss on June 18, 2020

Could you please explain why the python documentation requires newline = ‘’? I kind of understand it but not 100%. Thanks!

theramstoss on June 18, 2020

Also, the Python docs do not seem to specify ‘r’. Is the default ‘r’?
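On both questions, a small experiment may help. open() defaults to mode 'r', and the csv documentation recommends newline='' so that line endings inside quoted fields reach the csv module untouched. The sketch below writes one row whose second field contains an embedded \r\n and reads it back both ways; the scratch file name and the sample row are made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical scratch file used only for this experiment.
path = os.path.join(tempfile.mkdtemp(), "notes.csv")
row = ["John Smith", "line one\r\nline two"]

# newline='' on write: the csv writer's own line endings are stored unchanged.
with open(path, "w", newline="") as f:
    csv.writer(f).writerow(row)

# No mode given, so the file is opened with the default mode, 'r'.
with open(path, newline="") as f:
    default_mode = f.mode
    with_newline_arg = next(csv.reader(f))

# Without newline='', universal newlines turn the embedded \r\n into \n.
with open(path) as f:
    without_newline_arg = next(csv.reader(f))

print(default_mode)
print(with_newline_arg == row)
print(without_newline_arg == row)
```

The field survives the round trip only when the file is opened with newline=''. For files without line breaks inside quoted fields, the difference usually goes unnoticed when reading.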

UweSteiner on June 27, 2020

good overview of what’s possible with ZIP

Walt Sorensen on March 31, 2021

Why use this CSV reader from import csv over Pandas read CSV?

import pandas as pd
file = "csvfile.csv"
df = pd.read_csv(file)

(especially when considering working with data in pandas?)

Bartosz Zaczyński RP Team on April 1, 2021

@Walt Sorensen Well, not everyone needs or knows about Pandas in the first place. It depends on your use case, but one advantage of the built-in csv module over Pandas is that it’s available in every Python distribution, while Pandas is not. That might be a problem if you don’t have Internet access or admin privileges, which is common in corporate environments.

naomifos on April 2, 2021

I have a probably weird question. I know that CSV files are normally read horizontally, one row at a time. But the problem I have wants me to read the file vertically, so that each header is a key in a new dictionary and its value holds the things in that column. How would I go about doing that?

Bartosz Zaczyński RP Team on April 2, 2021

@naomifos With the pandas library, that’s what happens implicitly because it stores data in columnar format:

>>> import pandas as pd
>>> df = pd.read_csv("data.csv")
>>> df["first_name"]
0     Joe
1    Anna
2     Bob
Name: first_name, dtype: object

>>> list(df["first_name"])
['Joe', 'Anna', 'Bob']

If you wanted to simulate the same behavior with pure Python, this is how you could go about it:

import csv
from collections import defaultdict, namedtuple

columns = defaultdict(list)
with open("data.csv", encoding="UTF-8") as file:
    reader = csv.reader(file)
    field_names = next(reader)
    Record = namedtuple("Record", field_names)
    for row in reader:
        record = Record(*row)
        for key, value in record._asdict().items():
            columns[key].append(value)

print(columns["first_name"])

Eduardo on May 11, 2021

I’m trying it with a csv uploaded on Github running my code in Google Colab.

url = 'https://raw.githubusercontent.com/UpInside/play_import_csv/master/municipios.csv'
with open(url) as csv_file:
  csv_reader = csv.reader(csv_file, delimiter = ',')
  line_count = 0
  for row in csv_reader:
    if line_count == 0:
      print(f'Column names are {", ".join(row)}')
      line_count += 1
    else:
      print(f'\t{row[0]} is the UF of {row[2]} city.')
      line_count += 1
  print(f'Processed {line_count} lines.')

But the return is the error:

FileNotFoundError: [Errno 2] No such file or directory: 'https://raw.githubusercontent.com/UpInside/play_import_csv/master/municipios.csv'

Leodanis Pozo Ramos RP Team on May 11, 2021

@Eduardo The problem here seems to be that you’re using open() to open a URL instead of a local file. I think you should try downloading the file or using: docs.python.org/3/library/urllib.request.html#urllib.request.urlopen
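A minimal sketch of the urlopen() route might look like this. The helper names rows_from_binary() and rows_from_url() are hypothetical, and UTF-8 is an assumed encoding that may not match every file.

```python
import csv
import io
from urllib.request import urlopen


def rows_from_binary(binary_stream, encoding="utf-8"):
    """Decode a binary stream and parse it as CSV, returning a list of rows."""
    # urlopen() yields bytes, but csv.reader needs text, so wrap the stream.
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    return list(csv.reader(text_stream, delimiter=","))


def rows_from_url(url):
    """Download a CSV file over HTTP(S) and parse it."""
    with urlopen(url) as response:
        return rows_from_binary(response)
```

Calling rows_from_url(url) with the raw GitHub URL from the question would then return the rows as lists, which can be looped over just like csv_reader in the original snippet.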

Kim on Oct. 31, 2021

Hi, all.

I just went through this tutorial, and with DictReader it appears to have iterated through all of the rows but only printed rows 0 and 2, bypassing row 1. Here is the snippet, which I believe is identical to the script in the tutorial.

with open(r'C:\Users\xxxx\xxxx\csv_example.txt') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count ==0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            print(f'\t{row["name"]} works in the {row["department"]} department, and was born in {row["birthday month"]}.')
            line_count += 1
    print(f'Processed {line_count} lines.')

This code is similar to what I actually ran…I de-identified some details in my file location.

Any hints on the fix? Thanks!

Bartosz Zaczyński RP Team on Nov. 2, 2021

@Kim If you don’t explicitly provide a list of field names for your records, then DictReader will assume the first line in the CSV file is the header. It will try to get those field names from there, always skipping the first line during iteration:

name,department,birthday month
Joe Doe,IT,1999-01-01
Anna Smith,HR,2001-12-31
Bob Brown,Sales,2002-15-31

>>> import csv
>>> with open(r"example.csv") as csv_file:
...     csv_reader = csv.DictReader(csv_file, delimiter=",")
...     for row in csv_reader:
...         print(row)
...
{'name': 'Joe Doe', 'department': 'IT', 'birthday month': '1999-01-01'}
{'name': 'Anna Smith', 'department': 'HR', 'birthday month': '2001-12-31'}
{'name': 'Bob Brown', 'department': 'Sales', 'birthday month': '2002-15-31'}

Notice the loop starts at the second line in the file, which contains the actual data.
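Given that behavior, one way to print the column names without losing a data row is to read them from the reader's fieldnames attribute before the loop, so no if/else branch is needed. This is only a sketch: the SAMPLE string stands in for the example file above, and the printed list is an assumption used to collect the output.

```python
import csv
import io

# Stand-in for the example file above, so the sketch runs without a file on disk.
SAMPLE = (
    "name,department,birthday month\n"
    "Joe Doe,IT,1999-01-01\n"
    "Anna Smith,HR,2001-12-31\n"
    "Bob Brown,Sales,2002-15-31\n"
)

csv_reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=",")

# Accessing fieldnames reads the header line if it has not been read yet.
printed = [f'Column names are {", ".join(csv_reader.fieldnames)}']

data_rows = 0
for row in csv_reader:
    # Every iteration is a data row, so every iteration prints one.
    printed.append(
        f'\t{row["name"]} works in the {row["department"]} department, '
        f'and was born in {row["birthday month"]}.'
    )
    data_rows += 1

printed.append(f"Processed {data_rows} data rows.")
print("\n".join(printed))
```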

SirVeyor on Nov. 19, 2021

I learn just as much if not more from the comments. Thanks to all!

acamposla on Dec. 16, 2021

I am trying this approach

counter = 0
with open ("csv-sample-files/co-curricular-activities-ccas.csv") as singapore:
    singapore = csv.reader(singapore, delimiter = ",")
    line_count = 0
    for row in singapore:
        if line_count == 0:
            print(f"Las columnas son {', '.join(row)}")
            print(f"Hay un total de {len(row)} columnas")
            line_count +=1
        else:
            for i in range(5):
                print(f"\t{row[i]}")
                counter += 1
                if counter == 5:
                    print("---next line---")

I get this:

Las columnas son school_name, school_section, cca_grouping_desc, cca_generic_name, cca_customized_name
Hay un total de 5 columnas
    ADMIRALTY PRIMARY SCHOOL
    PRIMARY
    PHYSICAL SPORTS
    MODULAR CCA (SPORTS)
    SPORTS CLUB
---next line---
    ADMIRALTY PRIMARY SCHOOL
    PRIMARY
    VISUAL AND PERFORMING ARTS
    ART AND CRAFTS
    VISUAL ARTS CLUB
    ADMIRALTY PRIMARY SCHOOL
    PRIMARY
    CLUBS AND SOCIETIES
    ENGLISH LANGUAGE, DRAMA AND DEBATING
    ENGLISH LANGUAGE AND DRAMA
    ADMIRALTY PRIMARY SCHOOL

However, I would like to print “Next line” every 5 lines (columns).
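A likely cause is that counter keeps growing past 5 and is never reset, so the check only succeeds once. Since each row holds one value per column, a possible fix is to drop the counter and emit the separator right after the inner loop. The sketch below assumes that reading of the goal; SAMPLE reuses two rows from the output above, and format_rows() is a hypothetical helper.

```python
import csv
import io

# Two rows taken from the output above (a stand-in for the real file).
SAMPLE = (
    "school_name,school_section,cca_grouping_desc,cca_generic_name,cca_customized_name\n"
    "ADMIRALTY PRIMARY SCHOOL,PRIMARY,PHYSICAL SPORTS,MODULAR CCA (SPORTS),SPORTS CLUB\n"
    "ADMIRALTY PRIMARY SCHOOL,PRIMARY,VISUAL AND PERFORMING ARTS,ART AND CRAFTS,VISUAL ARTS CLUB\n"
)


def format_rows(csv_file):
    """Return one line per value, with a separator after each CSV row."""
    reader = csv.reader(csv_file, delimiter=",")
    header = next(reader)  # the first row holds the column names
    lines = [f"Columns: {', '.join(header)}"]
    for row in reader:
        for value in row:
            lines.append(f"\t{value}")
        # The inner loop has covered every column of this row.
        lines.append("---next line---")
    return lines


print("\n".join(format_rows(io.StringIO(SAMPLE))))
```

Keeping the counter would also work, as long as it is set back to 0 each time the separator is printed.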
