This is a digital book! Best practices for manipulating data with Pandas. This book will arm you with years of knowledge and experience, condensed into an easy-to-follow format. Rather than spending months reading blogs and websites and searching mailing lists and groups, let this book teach you how to write good Pandas code.
Patterns for Data Manipulation
Manipulate your data with ease.
Master predictable patterns to clean, visualize, prepare, pivot, summarize, and predict data. Write code that is easy to use, debug, come back to a week later, and put into production. Stop worrying about whether you will be able to use your Jupyter notebook when you come back tomorrow.
Learn how to quickly and reliably load, clean, and prepare data. Think about data processing like a "recipe".
I have taught thousands of students how to use Pandas. I have used Pandas since it came out. I have found many warts as well as best practices. You can learn to avoid the warts and adopt the best practices.
Learn how to wield Pandas for great good. Create win-win situations for you and your future data collaborators!
Meet a Satisfied Reader
Pandas is one of those libraries that suffers from the "guitar principle" (also known as the "Bushnell Principle" in the video game circles): it is easy to use, but difficult to master.
Truly, it is one of the most straightforward and powerful data manipulation libraries, yet, because it is so easy to use, no one really spends much time trying to understand the best, most pythonic way to employ the library to its full extent.
If you haven't read Matt Harrison's book and use Pandas, chances are you're like that Chad at the picnic or camping trip that pulls out his guitar to strum along the same basic chords for an hour straight... Well, NO MORE!
Matt Harrison is ready to drop some knowledge on you and have you riffing your own data manipulation solos like you're Slash in "November Rain", or Prince in "Purple Rain"...
The book goes beyond explaining the data structures and methods that underpin Pandas; it also provides a ton of practical advice regarding best practices in data manipulation and transformations.
For instance, by the time you're done you'll know which functions to use to leverage Pandas' vectorized structures to ensure your code is fast and efficient, which data types provide huge savings in terms of memory allocation, how to chain operations to ensure you're always accessing the correct intermediary dataframe, how to utilize indices to give you superpowers over your data, how to debug chains, merge, join, melt, style, and more.
It is, by far, the best book you can get yourself if you want to take your data science skills to the next level. After all, they say modern data science is 90% data cleaning. I mostly agree.
I have recommended this book to every member of my team. REQUIRED READING.
Highest possible recommendation.
NVidia Engineer Agrees
Here is a snapshot of a (small) subset of all of my Coding, Data Science and Machine Learning books. This collection would get you close to 98%-99% of all the necessary core skills to be a good Data Scientist.
What You'll Learn
Learn to create code that does this
Create Powerful Visualizations
These don't look like your typical Matplotlib charts!
Get your data looking how you want it to.
Make code that reads like a recipe.
Dozens of images to explain and visualize manipulation concepts.
The Contents of "Effective Pandas"
- Series & DataFrame Overview
- Series Deep Dive
- Learn to use operations and methods
- Learn to aggregate and summarize
- Convert to the correct type
- Conditional manipulation
- Handling missing data
- Interpolation of data
- Binning Data
- String manipulation
- Optimization with Cython
- Date Theory
- Date conversion
- Date methods
- Grouping by dates
- Shifting Data
- Rolling Data
- Cumulative Operations
- Styling Plots
- Categorical Counts
- DataFrame Deep Dive
- Summarizing Data
- Learning to .apply correctly
- Creating columns
- Memory usage
- Sorting columns
- Sorting indices
- Filtering rows
- Filtering columns
- Dummy columns
- Melting data
- Stacking data
- Unstacking data
- Joining Data
- Time Series
- Adding timezones
- Cleaning missing data
- Offset aliases
- Advanced anchoring
- Exporting Data
- Adding Rows
- Adding Columns
- Join types
- Join validation
- Styling Data
Reviews in the Wild
To start with machine learning, one of the most important skills you should build:
• Learning how to manipulate data
"Effective Pandas" is probably the best book on the market.
This will single-handedly give you everything you need!
Oh, and in case you don't know how to toy around with data in pandas DataFrames or just want to take it up to the next level, I can highly recommend Effective Pandas by my favorite Python teacher. I thought I knew it all, but this book taught me otherwise.
I've been using Pandas for about 10 years, and I still improved my Pandas skill working through the Effective Pandas...
Matt Harrison is the Pandas guru, which in my opinion is equivalent to being Canelo Alvarez of boxers or the Esther Perel of therapists; he’s mastered the fundamentals and knows more than most anyone about the Pandas package. Python can take you far but mastering the Pandas package is a goal every data scientist worth their weight/paycheck should have high in their priority list.
I’ve been following your work for a while and have found your pandas philosophy transformative and super helpful in my own work. Effective Pandas has been a great perspective on the library for me, even after years of using it!
i remember cursing under my breath whenever I had to review coworkers' pull requests containing Pandas code.
but this 'Effective Pandas' style is making me realize Pandas code can actually be aesthetically pleasing.
You know when you are reading a good book. You can’t put it down and have to keep reading it, page after page. It feels like binge-reading, the same way we binge-watch series on Netflix.
When I started learning Data Science, I felt truly overwhelmed by the number of books I could find. Between taking courses, practicing on projects, and competing on Kaggle…how could you find enough time to finish a book? The only answer is to pick books you can’t stop reading.
This is my go-to book for data manipulation with Pandas. The author concentrated on all the best practices that will make you a Jedi of Pandas. A lot of thought has gone into how to break down the subject matter so it is very well organized. It goes from working with series to plotting with Pandas or how to debug.
One distinctive feature of this book is that it explains the chaining method for writing code well. It introduces useful Pandas methods like pipe and assign that are not often seen in courses.
If you are learning Pandas or are an experienced Pandas user, you will for sure benefit from reading this book.
Bonus: At the end of each chapter, you will find exercises that make you practice right away the concepts you’ve just read. My personal advice: do them!
You'll keep Effective Pandas in arm's reach long after you're done.
Effective Pandas is 350+ pages of no-fluff, get-it-done explanations and advice to help you learn pandas or to level up your existing knowledge and skills.
Matt Harrison has done a masterful job putting together an impressive and orderly collection of helpful instructions and meaningful insights on the pandas library. He starts with a peek under the covers, so you get a good understanding of the core pandas objects - Series and DataFrames - and how they work. Every ensuing chapter builds incrementally on that foundation, with clear and pertinent examples throughout. The book is laid out in such a way that topics flow naturally one to the next and are conducive to a read-through for learning the pandas library piece by piece. But it's also organized into bite-size chunks that make it easy to come back to as a reference later when you need that one little piece of information that you can't quite remember.
One of the things that I really appreciate about Matt's presentation of pandas topics is that he is not shy, while also not being adamant, about sharing his thoughts on the best ways to do things. His coverage of chaining and the associated benefits to code clarity is a good case in point. Everyone is allowed their opinions, but my opinion is that his opinions on these things are pretty darn close to right.
(In fact, I like Effective Pandas so much that I bought it three times - the hard copy on Amazon, a stand-alone digital copy on Matt's web site (store.metasnake.com), and later another digital copy that came bundled with some courses, also from his site. Maybe that last one doesn't completely count...)
At any rate, the book is full of helpful content:
- Clear explanations and usage examples of a boatload of pandas Series and DataFrame attributes and methods.
- Valuable thoughts and suggestions on the best ways to combine pandas operations to be efficient and accurate.
- Effective coverage of all pandas table-stakes topics: loading data, filtering, inspecting dataframes, typing, sorting, merging/joining, grouping, aggregating, shaping, melting, plotting, exporting, debugging (and even more).
- Tips and insights on things like effectively chaining pandas methods, improving memory usage through manipulation of data types, making sure that appropriate constructs are used for accessing intermediate data frames in chained sequences (see notes on lambda functions in ch 10 and on the differences btw .query / .loc in ch 24, for example), and augmenting performance in some areas with Cython.
If you're looking for that one resource that will help you build out your knowledge of pandas, and that you'll go back to again and again, this is it. I highly recommend that you get it now.
And, of course, I also recommend checking out Matt's other book and course offerings at store.metasnake.com. There's a lot of good content there around Python and pandas that delivers value at much of the same level as what you'll see in his Effective Pandas book.
I'm currently working as a Data Scientist in the Marketing Science team at [Large Bay Area Company]. We use Pandas extensively for various things from experimentation, modeling, ML etc, to measure incremental benefits of our marketing campaigns. Most of what I know about pandas, numpy, sklearn etc. comes from what I learnt on the job. I realized that I never had an in-depth, rigorous, formal course that covers these topics and maybe was not fully using the capabilities or writing the most optimal code that can be written. I follow you on twitter and frequently see the code snippets you share over there and I thought I could get some value out of the material you cover on your website.
I'm only a few pages into the Effective Pandas book and so far I've found it to be very enjoyable even though the material is mostly known to me. I can't wait to get into the more advanced topics and start applying what I learn to my day-to-day work.
I'm halfway through @__mharrison__'s effective pandas book and I can say it's, by far, the best book out there for teaching pandas.
It's effortless to follow with many real-world examples and fantastic tips and coding practices.
Best book i have come across till date ☺. I am using pandas for quite a while but never knew i was doing it in dumb way until i started reading this book. I love the concept of chaining, avoiding apply & using vectors approach with numpy select/where, shift operation, assign, clip & many more. Still half way through this book but definitely recommend all analyst/applied scientists to read this 📙. Thanks Matt for writing such a wonderful 📙 ☺
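To make the "avoid apply, use vectors" idea in the review above concrete, here is a minimal sketch. It is not taken from the book; the `temp_c` column, the labels, and the thresholds are all made up for illustration. It contrasts a row-by-row `.apply` with a chained version that uses `.clip`, `.assign`, and NumPy's `select`.

```python
import numpy as np
import pandas as pd

# Hypothetical data, made up for this illustration.
df = pd.DataFrame({"temp_c": [-5, 12, 31, 48]})


def label_row(row):
    # Row-by-row version: Python code runs once per row.
    if row["temp_c"] < 0:
        return "freezing"
    if row["temp_c"] < 25:
        return "mild"
    return "hot"


slow = df.apply(label_row, axis=1)

# Vectorized version in one chain: cap outliers, then label whole columns at once.
result = df.assign(
    temp_c=lambda d: d["temp_c"].clip(upper=40),
    label=lambda d: np.select(
        [d["temp_c"] < 0, d["temp_c"] < 25],  # conditions, checked in order
        ["freezing", "mild"],                 # label for each condition
        default="hot",                        # label when no condition matches
    ),
)
```

Both versions should produce the same labels here; the vectorized one operates on whole columns instead of calling a Python function for every row, which usually matters more as the data grows.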
I highly recommend Effective Pandas by
I picked this up upon release & out of the dozen or so Pandas' books I've read, this is one I like most.
It's approachable while being thorough, two qualities that usually aren't present in the same book/course
I am amazed! Mind blowing Everything about Pandas in one book! It will be easier not only on Kaggle but in my personal projects as well. Great book Matt!
I can’t rate Effective Pandas highly enough.
I like how it gives you the visual representation of the syntax—an excellent resource.
I wish I would have read the book »Effective Pandas« earlier. It answers every question about Series & DataFrames I had during the last months. If you are working with pandas I absolutely recommend purchasing it. I learned a lot & it improved my understanding.
This book contains best practices with Pandas, essential for anyone who wants to improve their data manipulation skills.
I truly appreciate how Effective Pandas covers the full breadth of pandas' capabilities...it's the definitive pandas resource!
Bout a 1/3 through Effective Pandas and it’s best I’ve read on Pandas by far. Visuals are top notch.
This has revolutionized my workflow so far. Used to commit all the ‘sins’ - lots of intermediate dataframes, lots of single cell calculations/transformations in the notebook. And then you just lose track of what the started off as.
Amazing book. The best book on Python Pandas.
I got this book of Matt's on Pandas. The book is wonderfully organized, and each chapter builds up your knowledge of the core concepts and clearly charts the purpose of each concept. The author also provides multiple ways of achieving the same results. I like the way Matt has described each scenario, with a graphical/pictorial representation of each topic to give you visual clarity on what is happening underneath.
Reading the book, you can feel that the author has poured in his two-plus decades of experience and wants to share as much as possible with his audience so that they can implement each scenario.
I've spent a few weeks finishing this book and learned so much while reading.
Thanks for writing this book.
You've heard that data processing, machine learning, all takes 80% munging or data cleansing. So doesn't it make sense to get really good at that one activity that takes up most of your time? This book will help you do that. It covers everything in pandas from being memory efficient to using some functions over others. Plotting, string methods, aggregating, data frames themselves versus series, math methods. Looping over data, filtering, plotting, reshaping and pivoting. Everything you really need to know about handling data in an effective and proficient way. Do yourself and your career path a favor, get this book!
I bought Effective Pandas to finally grasp pandas concepts and hoped I would do it efficiently and effectively. What I got exceeded my expectations in many ways. Matt's conceptual design of this book is brilliant. The book gently starts with the Series introduction. Matt spends some time explaining almost all possible operations one might need to do with Series. This pays off very well once you get to the DataFrame chapters, as practically all you easily learned about Series can be applied to data frames as well. Also, Matt's operation-chaining concepts are phenomenal. Once you start chaining operations on a data frame, you will begin to understand what each operation is doing and what the following operation will do. Like a cooking recipe. Easy to understand, easy to learn, easy to write, and easy to recreate on your data frame. At the end of each chapter, Matt asks you to reinforce what you learned by practicing on a dataset of your choice.
This book can be used by a beginner to dive into the wonderful world of pandas possibilities, but at the same time, a seasoned user can use it to refine their pandas skills and do some things more effectively.
I thought I could take a break during the holidays, but Matt Harrison's book on Effective Pandas was too good to put away.
If you want to do Pandas the "right way", please study this book. I learned how to stop over-relying on the apply() function and how to do chaining properly.
The only book on Pandas you need.
This book is for anyone wanting to learn, currently learning or currently using pandas.
Clearly, it is a labor of love:
A lot of thought has gone into how to break down the subject matter so it is very well organised. The page count may seem long (approx. 450) but the book has a great form factor and the text is well-spaced, clear, and legible. So the pages are easy to read and not dense.
In addition, the descriptions are clear and with no waffle. I hate books that are 10x longer than they need to be. This is not one of those books.
One great aspect of Matt's work is his use and treatment of the pandas tool itself (try and get yourself on one of his free "Idiomatic pandas" webinars if you can). His use of chaining has been a game-changer for me and leads to clear and explainable code.
In summary, if you are looking for a book on pandas, this is the book you need - buy it without hesitation.
Dr B. (part 2)
Must Read Book on Pandas
After reading this book, you will never write pandas code the same way, using method chaining and functional notation, optimizing data types, styling dataframes, and so on.
The book is full of useful information and practical knowledge for the daily use of pandas, and also contains interesting focus on some external features such as Cython, CatBoost or sparklines.
Best book to learn Pandas
The book can be used by anyone from a beginner to an expert.
It is well organized and the concepts are explained in a clear stepwise manner.
Color illustrations help explain the concepts.
The book has changed my way of programming Pandas.
Very engaging and informative book. This has been helpful so far in my learning of Pandas.
I’m going through Effective Pandas, and it’s been great so far. Fantastic book!
The Definitive pandas Resource!
I'm a longtime pandas user, and I have to say that this is THE end-to-end pandas resource.
From doing SQL-like operations in-memory, to statistical analysis, to data visualization - this book covers pandas in its full breadth!
If you've been looking for a great pandas resource...you can stop because you just found it 😉!
Effective Pandas changed how I do data analysis…
I have personally attended "Effective Pandas" live workshop by Matt Harrison on 6/17/2021 and then purchased a PDF version of the book. The quality of his teaching is simply mind-blowing. He has a very deep domain knowledge on the topic of pandas. I come from R and statistics background, so I like having a very detailed background explanation of why some things are structured the way they are. Matt provides such deep insights in his explanations.
This book will change my focus in Machine learning studies. Thanks a lot !!
I am now a master in the method chaining. Thanks to this book.
This book really helped me making the most of pandas. It moves from the more procedural way of using pandas to best practices for method chaining and UDF
I can't ask for more
Being a developer and teacher myself, when I read a book like this, I like to pay attention and ponder not only the technical matters but the didactical aspects too.
Many authors know their stuff, some of them explain it clearly, but few go the extra mile in order to assure the reader understands. What I mean is that most authors try to get their word out, teachers go further and try to make the ideas get in readers' brains.
I feel Matt Harrison knows his stuff, but he also cares to teach it. For example: giving real world examples makes it easy to relate to the problem at hand. Short chapters are rightly sized knowledge pills. The summary and exercises at the end of each help to make sure one understands.
I feel clarity was in the author's head all the time, and he succeeded. At least for me, Effective Pandas was an easy read, without any major hurdle, covered everything I wanted to learn, and made me like the chaining style and consider adopting it.
There isn't a lot more I can ask from a book.
Note to self : 3 important practices to use more while reading ´Effective Pandas’:
- The tweak function to clean the df at the beginning of the Jupyter Notebook
- Assign method
I haven’t seen these 3 very often in the online courses. Apply much more.
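As a rough illustration of the "tweak function" and `.assign` practices this reviewer mentions, a sketch might look like the following. It is not code from the book: the name `tweak_cars`, the column names, and the sample rows are hypothetical. The idea shown is to put all cleanup in one function near the top of the notebook, so the raw data stays untouched and the cleaning steps can be re-run at any time.

```python
import pandas as pd

# Hypothetical raw data standing in for a freshly loaded CSV.
raw = pd.DataFrame({
    "Make ": ["Ford", "Kia", "Ford"],
    "city mpg": ["18", "27", None],
})


def tweak_cars(df):
    """Return a cleaned copy of the raw frame, written as one readable chain."""
    return (
        df
        # Normalize column names: "Make " -> "make", "city mpg" -> "city_mpg".
        .rename(columns=lambda name: name.strip().lower().replace(" ", "_"))
        # Drop rows where the mileage is missing.
        .dropna(subset=["city_mpg"])
        # Convert columns to more suitable types.
        .assign(
            city_mpg=lambda d: d["city_mpg"].astype("int64"),
            make=lambda d: d["make"].astype("category"),
        )
    )


cars = tweak_cars(raw)
```

Because every step returns a new object, `raw` is left as loaded, and rerunning `tweak_cars(raw)` gives the same result regardless of which notebook cells ran before it.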
I think Effective Pandas is great for anyone overwhelmed by docs on the internet about pandas.
Wonderful book with great examples
I have been using Pandas for a long time, so I picked this up thinking I already knew everything about Pandas but I was very wrong. It is a great book from the basics to complex topics and the examples are wonderful.
Best Pandas Resource
This book is really well organized, written, and laid out with excellent color illustrations.
What makes this book particularly invaluable are the learnings from Matt's teaching and consulting experience. Beyond just explaining functionality, Matt draws from practical experience to compare methods, highlight nuances, and explain common pitfalls.
Good for pandas best practices
I've learned and used pandas quite a bit mainly by googling problems, but never really knew what the best practices were. This book has taught me new concepts such as chaining and other ways to improve performance such as the categories topic. I would recommend if you want a good foundation on how to perform well using pandas.
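The categories point in the review above can be checked with a small experiment. This is only a sketch on made-up data (the `state` values are hypothetical, and the exact byte counts depend on your pandas version), comparing the memory used by a column of repeated strings before and after converting it to the `category` type.

```python
import pandas as pd

# Hypothetical column: a few distinct values repeated many times.
states = pd.Series(["Utah", "Idaho", "Nevada"] * 10_000, name="state")

# deep=True asks pandas to count the string data itself, not just pointers.
as_strings = states.memory_usage(deep=True)
as_category = states.astype("category").memory_usage(deep=True)

print(f"strings: {as_strings:,} bytes, category: {as_category:,} bytes")
```

The saving comes from storing each distinct value once plus a small integer code per row, so it is largest for columns with few unique values and can disappear for columns where nearly every value is different.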
Best Technical Book
I have started digging into your book, and it is one of the best technical books in the area of data handling and the best book in the PyData sphere I have owned and read.
I just can not say it enough, but I bought a lot of books the last 2 years regarding python but Effective Pandas is my most valued one. I have it almost always open and have learned so much from it and I'm still not finished with it. So thanks for the great work!! Hope to read more books from you in the future.
Effective Pandas is excellent, I’m using it for work now and its really helping me convert some reports from excel to python & pandas. My earlier attempts before were a bit like the XKCD comic...
"like a salad recipe written by a corporate lawyer using a phone autocorrect that only knew Excel formulas. It's like someone took a transcript of a couple arguing at Ikea and made random edits until it compiled without errors."
Best Book to Become a Data Scientist
Even if you are ultimately going to be working with terabytes of data, you’ll start out doing exploratory data analysis. The tool that you’ll use for that is most likely going to be Pandas. One of the best investments that you can make when becoming a data scientist is to become a Pandas expert, and there is no better book than Harrison’s to help you get there. Plus, many of the interview questions you will face during the hiring process will probably involve Pandas. Blow your interviewers out of the water by showing them corners of the Pandas library they didn’t even know!
Frequently asked questions
You’ve got questions. We’ve got answers.
What is Pandas useful for?
Any time you have "structured" or tabular data (like a database or from a spreadsheet), consider using Pandas. Pandas is built for working with this type of data and also leverages NumPy so it is speedy and memory efficient.
You will find it being used for:
- Data Analysis
- Statistical work
- Exploratory Data Analysis
- Data pipelines
- Machine Learning
- and more
What is the style of presentation?
The book introduces the features of Pandas by using real-world datasets. So you come across many of the annoying, hairy, and ugly issues and are shown how to deal with them. There is no creation of random data in the book used just to demonstrate features.
How is this different from your other Pandas books?
Great question. My first Pandas book, Learning the Pandas Library, was one of the first books on Pandas. After that book I had the chance to use Pandas even more, teaching and consulting with it. I had been planning on updating it after those experiences. In the meantime I was approached to do the second edition of the Pandas Cookbook. I had read the 1st edition and liked it. Though I added a few chapters to the cookbook, I also re-wrote almost all of the code. I think it is a great book. However, it is not "my" book.
Effective Pandas is my book and recommendation for all who want to write good Pandas code.
Who is the target audience for this book?
The target audience is anyone who wants to write better pandas code. See the review above. Newbies will start the correct way. Experienced users will pick up skills and tips to immediately write better pandas code.
How much Python do I need to use Pandas?
You should have some basic Python skills: Functions, loops, index operations. After that, I really encourage Lambdas (to get the most out of chaining). Comprehension constructs and argument unpacking come in useful as well.
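For a sense of how those Python features show up in practice, here is a small sketch. The `sales` data and column names are invented for illustration and are not from the book. It uses lambdas inside a chain, a dict comprehension to build several columns, and `**` argument unpacking to pass them to `.assign`.

```python
import pandas as pd

# Hypothetical data for illustration.
sales = pd.DataFrame({"units": [3, 0, 5], "price": [2.0, 4.0, 1.5]})

# Dict comprehension: one doubled column per source column.
# The c=col default pins each lambda to its own column name.
doubled = {
    f"{col}_x2": (lambda d, c=col: d[c] * 2)
    for col in ["units", "price"]
}

summary = (
    sales
    # The lambda receives the frame as it exists at this step of the chain.
    .loc[lambda d: d["units"] > 0]
    # ** unpacks the dict into keyword arguments for .assign.
    .assign(revenue=lambda d: d["units"] * d["price"], **doubled)
)
```

The lambdas matter because, mid-chain, there is no variable name for the intermediate frame; a lambda lets each step refer to "whatever the data looks like right now" rather than to the original `sales`.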
Your style of writing Pandas code does not play well with formatters like Black.
That's not really a question, but a fact. Sadly, Black doesn't format well-written Pandas code very well. Use Black if you must; I hand-format my Pandas code. 🤷‍♀️
I learn best by doing. Are there exercises?
Yes, every chapter has exercises. I don't have solutions in the book, but you can find community supported solutions here.
Is there any tool that can help me visualize similar to your drawings?
Please check out https://pandastutor.com/ which was inspired by my Pandas talks and training material!
Can you help my team learn Pandas?
That's what I do for a living. Reach out to me using the messaging widget in the lower right and we can chat.
Is there an errata?
I track issues with the book (breaking changes in Pandas, etc.) on GitHub. Please file an issue there if you have problems. https://github.com/mattharrison/effective_pandas_book
A pdf and an epub (not a physical book!)