Using Data Classes in Python (Summary)
If you want to dive into all the details of data classes, have a look at PEP 557 as well as the discussions in the original GitHub repo.
In addition, Raymond Hettinger’s PyCon 2018 talk Dataclasses: The code generator to end all code generators is well worth watching.
In this course, you’ve learned how to:
- Define your own data classes
- Add default values to the fields in your data class
- Customize the ordering of data class objects
- Work with immutable data classes
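As a quick recap, the four points above can be combined in one small sketch. The `Version` class and its fields are hypothetical names chosen for illustration and do not come from the course:

```python
from dataclasses import FrozenInstanceError, dataclass, field


# order=True generates comparison methods; frozen=True makes instances immutable.
@dataclass(order=True, frozen=True)
class Version:
    major: int
    minor: int = 0  # default value
    # compare=False keeps this field out of equality and ordering checks.
    label: str = field(default="", compare=False)


releases = [Version(3, 7), Version(3, 6, label="backport"), Version(3)]

# Sorting compares the (major, minor) pairs in field order.
print(sorted(releases))

try:
    releases[0].minor = 8  # assignment is blocked on a frozen data class
except FrozenInstanceError:
    print("Version objects are immutable")
```

Fields are compared in the order they are defined, so ordering here depends on `major` first and `minor` second.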
If you aren’t using Python 3.7 or later, there’s also a data classes backport for Python 3.6. And now, go forth and write less code!
Congratulations, you made it to the end of the course! What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment in the discussion section and let us know.
00:00 Summary, well done. You’ve made it to the end of this course. In it, you’ve seen that data classes are a useful feature that was first introduced with Python 3.7, and that with them, you don’t have to write boilerplate code to get proper initialization, representation, and comparisons for your objects.
00:21 In this course, you’ve seen how to define your own data classes, how to add default values for fields in your data class, how to customize the ordering of data class objects, how to work with immutable data classes,
00:35 and how inheritance works for data classes.
00:40 As ever, we hope you found this course useful and we’ll see you again soon at realpython.com.
Geir Arne Hjelle RP Team on Sept. 15, 2021
Hi Jon,
for your first question I guess there are two answers. You can definitely define a data class where one of the fields is a pandas dataframe, for instance to keep some related information together with it:
from dataclasses import dataclass

import pandas as pd

@dataclass
class DataRecord:
    data: pd.DataFrame
    owner: str
    origin: str

things = DataRecord(
    pd.read_csv("things.csv"),
    owner="gahjelle",
    origin="things.csv"
)
Depending on your use case, something like this could be a good idea although it would have a couple of drawbacks as well:
- You’d need to reach into .data to access dataframe methods, for instance things.data.query("value > 0")
- All dataframe methods would still return a dataframe and not a DataRecord, so you would have to manually move the extra information around:
positive_things = DataRecord(
    things.data.query("value > 0"),
    owner=things.owner,
    origin=things.origin,
)
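One way to cut down on that manual copying might be `dataclasses.replace()`, which builds a new instance with only the named fields changed. The sketch below is an assumption-laden illustration: it uses a plain list as a stand-in for the dataframe so that it runs without pandas or a CSV file.

```python
from dataclasses import dataclass, replace


@dataclass
class DataRecord:
    data: list  # stand-in for pd.DataFrame in this sketch
    owner: str
    origin: str


things = DataRecord([3, -1, 4], owner="gahjelle", origin="things.csv")

# Only data is swapped out; owner and origin are carried over automatically.
positive_things = replace(things, data=[v for v in things.data if v > 0])
print(positive_things)
# DataRecord(data=[3, 4], owner='gahjelle', origin='things.csv')
```

With a real dataframe, the same call would look like `replace(things, data=things.data.query("value > 0"))`.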
pandas dataframes also have an .attrs attribute, which may be a better option for storing this kind of metadata.
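As a minimal sketch of the .attrs approach: the attribute is a plain dictionary attached to the dataframe. The small inline dataframe here is made up for illustration and stands in for things.csv. Note that the pandas documentation marks .attrs as experimental, and whether the metadata carries over to derived dataframes depends on the operation and the pandas version.

```python
import pandas as pd

# Inline stand-in for pd.read_csv("things.csv")
things = pd.DataFrame({"name": ["a", "b", "c"], "value": [1, -2, 3]})

# Store the metadata on the dataframe itself instead of in a wrapper class.
things.attrs["owner"] = "gahjelle"
things.attrs["origin"] = "things.csv"

print(things.attrs)
# {'owner': 'gahjelle', 'origin': 'things.csv'}
```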
Another way to put a pandas dataframe into a data class would be to extract the information of the dataframe and put it into a data class with one field for each column in the dataframe.
You can then either manually specify the fields of the data class or get them from your dataframe directly:
import pandas as pd
from dataclasses import make_dataclass

things = pd.read_csv("things.csv")
Things = make_dataclass("Things", things.columns)
things_dc = Things(**things.to_dict(orient="list"))
things_dc will now contain all the information in the dataframe (except the index). However, you’ve lost all the pandas methods, so it’s much less usable for doing any kind of analysis.
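To see what the resulting object looks like without needing a CSV file, here is a small sketch that uses a hand-written dictionary in place of things.to_dict(orient="list"). The column names and values are invented for illustration:

```python
from dataclasses import fields, make_dataclass

# Stand-in for things.to_dict(orient="list"): one list of values per column.
columns = {"name": ["a", "b"], "value": [1, -2]}

# Plain string field names are accepted; each field is typed as typing.Any.
Things = make_dataclass("Things", columns.keys())
things_dc = Things(**columns)

print(things_dc)
# Things(name=['a', 'b'], value=[1, -2])
print([f.name for f in fields(things_dc)])
# ['name', 'value']
```

This only works when every column name is a valid Python identifier, since the names become keyword arguments and attribute names.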
So to sum up, you can put a pandas dataframe into a data class, but I don’t think there are any general benefits in doing so. You may find a special use case where it could be convenient, though.
Szabi Keresztes on Aug. 24, 2022
Thanks for introducing bpython, the enhanced REPL is way better!
Dima on Aug. 24, 2023
Really nice tutorial, thank you!
I wish we could spend more time on __slots__, or that there were a separate, advanced tutorial.
Jon Nyquist on Sept. 15, 2021
Nice class! (pun intended) Can you put a pandas dataframe in a data class? Would you want to?