Python for Data Analysis - A Critical Line-by-Line Review (medium.com)
9 points by TedPetrou 15 days ago | 1 comment




2 points by TedPetrou 15 days ago | link

Hey all,

I wrote a very detailed review of the book, Python for Data Analysis (2nd edition) by Wes McKinney.

Here is a high-level summary:

PDA is written very much like a reference manual, methodically covering one feature or operation before moving on to the next. If you want to learn pandas in that manner, the current version of the official documentation is a much more thorough reference guide.

There is very little actual data analysis and almost no teaching of common techniques or theory that are crucial to making sense of data.

The vast majority of examples use randomly generated or contrived data that bear little resemblance to real-world data.

For the most part, the operations are taught in isolation, independent of other parts of the pandas library. This is not how data analysis happens in the real world, where many commands from different sections of the library are combined to get a desired result.

Although the commands work with the current pandas version, 0.21, the book was clearly not updated past version 0.18, which was released in March of 2016. For example, the resample method gained the on parameter in version 0.19, yet the parameter is absent from PDA. The powerful and popular function merge_asof was also added in version 0.19 and is not mentioned once in the book.
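For anyone who has not used these two 0.19 additions, here is a minimal sketch of both. The trades and quotes frames and their column names are made up for illustration and do not come from the book or the review.

```python
import pandas as pd

# Hypothetical trade and quote data, invented for illustration only.
trades = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-01-01 09:00:10",
        "2016-01-01 09:00:40",
        "2016-01-01 09:01:20",
    ]),
    "price": [10.0, 10.5, 11.0],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-01-01 09:00:00",
        "2016-01-01 09:00:30",
        "2016-01-01 09:01:00",
    ]),
    "bid": [9.9, 10.4, 10.9],
})

# resample with on=: group by a datetime column instead of requiring
# the datetimes to be in the index first.
per_minute = trades.resample("1min", on="time")["price"].mean()

# merge_asof: for each trade, attach the most recent quote at or before
# the trade time (the default direction is backward). Both frames must
# be sorted by the key column.
matched = pd.merge_asof(trades, quotes, on="time")

print(per_minute)
print(matched)
```

Without on, the usual workaround is to call set_index("time") before resampling.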

There were numerous instances where it was clear that the book was not updated to show more modern code. For instance, the take method is almost never used anymore and has been completely replaced by the .iloc indexer. There were also many instances where code snippets could be rewritten with completely different syntax, which would result in much better performance and readability.
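To make the take versus .iloc point concrete, here is a small sketch on a made-up DataFrame (the data and variable names are hypothetical). For a list of integer positions, both calls select the same rows.

```python
import pandas as pd

# Hypothetical frame, invented for illustration only.
df = pd.DataFrame({"a": [10, 20, 30, 40], "b": list("wxyz")})
positions = [3, 0, 2]

# Older style: select rows by integer position with take.
via_take = df.take(positions)

# More common today: the .iloc indexer with the same positions.
via_iloc = df.iloc[positions]

print(via_take.equals(via_iloc))
```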

One of the most confusing things for newcomers to pandas is the multiple ways to select data with the indexers [], .loc, and .iloc. There are not enough detailed explanations for the reader to walk away with a thorough understanding of each.
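As a rough illustration of how the three indexers differ, here is a short sketch on a made-up DataFrame (the labels and values are hypothetical). It is not a complete treatment, only the cases that trip people up most often.

```python
import pandas as pd

# Hypothetical frame with string row labels, invented for illustration only.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["a", "b", "c"])

# []: a column label selects a column, a boolean Series filters rows.
column = df["x"]
masked = df[df["x"] > 1]

# .loc: selects by label; a label slice INCLUDES its end point.
by_label = df.loc["a":"b", "x"]

# .iloc: selects by integer position; a slice EXCLUDES its end point.
by_position = df.iloc[0:2, 0]

# Both slices pick rows "a" and "b" of column "x".
print(by_label.equals(by_position))
```

The inclusive label slice versus the exclusive positional slice is a frequent source of off-by-one mistakes.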




