This is a tutorial-style post that offers an alternative approach to dealing with large sets of training data, without resorting to copying and moving files to hard-coded directories named `train`, `validation` and `test`. Instead, you keep the files where they "naturally" reside on your system and track their locations with a Pandas DataFrame, feeding their names to the Keras generator. It scales well when dealing with millions of image files and hundreds of gigabytes of data.
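To illustrate the idea (this is a minimal sketch under my own assumptions, not the post's exact code), Keras's `flow_from_dataframe` can read images directly from paths stored in a DataFrame; the column names `filepath` and `label`, the example paths, and the 80/20 split are made up for the example:

```python
import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical DataFrame: one row per image, with its absolute path and label.
df = pd.DataFrame({
    "filepath": ["/data/photos/cats/001.jpg", "/data/photos/dogs/872.jpg"],
    "label": ["cat", "dog"],
})

# Split by assigning rows, not by moving files into train/validation folders.
train_df = df.sample(frac=0.8, random_state=42)
val_df = df.drop(train_df.index)

datagen = ImageDataGenerator(rescale=1.0 / 255)

# With directory=None, x_col must contain absolute paths; the generator
# loads and batches the images lazily, so nothing is copied on disk.
train_gen = datagen.flow_from_dataframe(
    train_df,
    directory=None,
    x_col="filepath",
    y_col="label",
    target_size=(224, 224),
    class_mode="categorical",
    batch_size=32,
)
```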
I own v1.0 and am currently reading through it. What are some of the differences in v2.0? One thing I was hoping for a little more detail on was the team process/governance side of things: for instance, how to leverage Scrum/Kanban methodologies as part of the analytics development process, and what some of the key differences are between analytics development and software development in the agile world.
Random forests have always been my favorite algorithm: it's easy to understand how they work, and they don't require many assumptions. RF is one of the first algorithms I go for when working on a new dataset.
Very cool. It would be nice to see some of the historical rating data annotated with major book releases. I notice that some of the books in the top lists were published within those timespans. Or maybe you explored that already and didn't see anything interesting.