Whoa, wait. His rationale for 5 teams being outliers seems pretty weak. If I'm reading this right, the outliers make up around 17% of his data points as well. Seems like a lot of data to disregard as "outliers".
This seems more like data shaping than noise reduction.
This is cool. I wish you had used the HTML for the IPython notebook instead of screenshots, but still very cool. Can you post the CSV files (or link them)?
I'm still trying to figure out how to post HTML IPython cells separately on the Ghost blogging platform, but all of the code is on my GitHub. Thanks for reading.