So, I have been looking at this competition's data in Excel... and I'd like to have a go at it, but I'm a little unsure where to start.
I know about classifiers and could build one in Excel (I presume I could probably do it in R with a 'black-box' solution if needed, but I like to do things in Excel first so I understand how they work). But that wouldn't help me with predictions on the test set.
So what sort of things should I be looking at? Articles or specific tools to use would be very much appreciated.
Thanks for your interest! There's no shortage of getting-into-ML material out on the internets, so I'll just try to address a couple of things I think might be helpful for you on the ground. (I'll go with R because you mentioned it and it's a nice choice for starting to make the jump from Excel.)
- (Bonus, not necessary) One book I've heard recommended quite a bit for people making the jump from Excel to R is "Data Smart: Using Data Science to Transform Information into Insight" by John Foreman, who is a super smart guy and head of data science at Mailchimp. He also writes in an approachable and entertaining style, which never hurts.
From there, you should at least know where you want to focus your efforts, and you'll have a decent beginner knowledge base to start tackling the problem in R with some popular machine learning classifiers.
Good luck, and feel free to be in touch! (isaac or peter at drivendata.org -- hit us up any time)