Using Data Classes in Python (Summary)

If you want to dive into all the details of data classes, have a look at PEP 557 as well as the discussions in the original GitHub repo.

In addition, Raymond Hettinger’s PyCon 2018 talk Dataclasses: The code generator to end all code generators is well worth watching.

In this course, you’ve learned how to:

  • Define your own data classes
  • Add default values to the fields in your data class
  • Customize the ordering of data class objects
  • Work with immutable data classes
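As a quick recap, the four points above can be combined in one small class. This is only a minimal sketch: the `Version` class and its fields are hypothetical names chosen for illustration, not code from the course.

```python
from dataclasses import dataclass, field, FrozenInstanceError


# order=True generates comparison methods; frozen=True makes instances immutable
@dataclass(order=True, frozen=True)
class Version:  # hypothetical example class
    major: int
    minor: int = 0  # field with a default value
    # compare=False leaves this field out of ordering and equality checks
    label: str = field(default="", compare=False)


v1 = Version(1, 2)
v2 = Version(1, 10, label="beta")

# Ordering compares (major, minor) as a tuple
print(sorted([v2, v1]))

# Assigning to a field of a frozen data class raises FrozenInstanceError
try:
    v1.minor = 5
except FrozenInstanceError:
    print("Version objects are immutable")
```

Because `label` is declared with `compare=False`, two versions that differ only in their label still compare as equal.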

If you aren’t using Python 3.7 or later, there’s also a data classes backport for Python 3.6. And now, go forth and write less code!

Jon Nyquist on Sept. 15, 2021

Nice class! (pun intended) Can you put a pandas dataframe in a data class? Would you want to?

Geir Arne Hjelle RP Team on Sept. 15, 2021

Hi Jon,

For your first question, I guess there are two answers. First, you can definitely define a data class where one of the fields is a pandas dataframe, for instance to keep some related information together with it:

from dataclasses import dataclass
import pandas as pd

@dataclass
class DataRecord:
    data: pd.DataFrame
    owner: str
    origin: str

things = DataRecord(
    pd.read_csv("things.csv"),
    owner="gahjelle",
    origin="things.csv"
)

Depending on your use case, something like this could be a good idea, although it has a couple of drawbacks as well:

  • You’d need to reach into .data to access dataframe methods, for instance things.data.query("value > 0")
  • All dataframe methods would still return a dataframe and not a DataRecord, so you would have to manually move the extra information around:
positive_things = DataRecord(
    things.data.query("value > 0"),
    owner=things.owner,
    origin=things.origin,
)

pandas dataframes also have an .attrs attribute, a dictionary attached to the dataframe, which may be a better option for storing this kind of metadata.
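Here is a minimal sketch of the .attrs approach. The in-memory dataframe and its column names are assumptions standing in for things.csv. Note that pandas documents .attrs as experimental, so whether the metadata carries over to derived dataframes can depend on the operation and the pandas version.

```python
import pandas as pd

# Hypothetical in-memory data standing in for things.csv
things = pd.DataFrame({"name": ["a", "b", "c"], "value": [1, -2, 3]})

# .attrs is a plain dictionary stored on the dataframe itself
things.attrs["owner"] = "gahjelle"
things.attrs["origin"] = "things.csv"

# Dataframe methods are available directly, without reaching into a .data field
positive_things = things.query("value > 0")

print(things.attrs)
```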

Another way to put a pandas dataframe into a data class would be to extract the information of the dataframe and put it into a data class with one field for each column in the dataframe.

You can then either specify the fields of the data class manually or get them from your dataframe's columns directly:

import pandas as pd
from dataclasses import make_dataclass

things = pd.read_csv("things.csv")
Things = make_dataclass("Things", things.columns)

things_dc = Things(**things.to_dict(orient="list"))

things_dc now contains all the information in the dataframe (except the index), with each field holding one column as a list. However, you've lost all the pandas methods, so it's much less usable for doing any kind of analysis.
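For comparison, here is a sketch of the manual variant, where the fields are spelled out by hand instead of being read from things.columns. The column names and the in-memory dataframe are assumptions made for illustration.

```python
from dataclasses import make_dataclass

import pandas as pd

# Hypothetical in-memory data standing in for things.csv
things = pd.DataFrame({"name": ["a", "b"], "value": [1, -2]})

# Specify each field manually as a (name, type) pair;
# every field will hold one whole column as a list
Things = make_dataclass("Things", [("name", list), ("value", list)])

things_dc = Things(**things.to_dict(orient="list"))
print(things_dc)
```

Listing the fields by hand lets you attach type annotations, at the cost of having to keep them in sync with the columns of the dataframe.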

So to sum up, you can put a pandas dataframe into a data class, but I don't think there are any general benefits in doing so. You may find a special use case where it could be convenient though.

Szabi Keresztes on Aug. 24, 2022

Thanks for introducing bpython, the enhanced repl is way better!

Dima on Aug. 24, 2023

Really nice tutorial, thank you!

I wish we could spend more time on __slots__, or that there were a separate, advanced tutorial.
