New ML Competition in public health: "Countable Care" (drivendata.org)
9 points by bull 3338 days ago | 3 comments


2 points by toast 3336 days ago | link

So, I have been looking at this competition's data in Excel, and I'd like to have a go at it, but I'm a little unsure where to start.

I know about classifiers and could build one in Excel. (I presume I could also do it in R with a 'black-box' solution if needed, but I like to do things in Excel first so I understand how they work.) But that wouldn't help me with predictions on the test set.

So what sort of things should I be looking at? Articles or specific tools to use would be very much appreciated.

-----

3 points by thriptic 3334 days ago | link

Check out "An Introduction to Statistical Learning". It walks through how to implement and tune many common machine learning algorithms in R.

-----

3 points by isms 3336 days ago | link

Thanks for your interest! The getting-into-ML genre is extremely plentiful out on the internets, so I'll just try to address a couple things I think might be helpful for you on the ground. (I'll go with R because you mentioned it and it's a nice choice to start making the jump from Excel.)

- Breeze through this R intro: http://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-I...

- Work through this longer one, playing along at home in your RStudio -- not trying to memorize, just get comfortable: http://cran.r-project.org/doc/manuals/R-intro.pdf

- Our e-buddy @trevs (DataTau & DrivenData user, and high finisher in our last competition!) put together a great getting started guide for Kaggle's practice challenge: http://trevorstephens.com/post/72916401642/titanic-getting-s...

- (Bonus, not necessary) One book I've heard recommended quite a bit for people making the jump from Excel to R is called "Data Smart: Using Data Science to Transform Information into Insight" - it's by John Foreman, who is a super smart guy and head of data science at Mailchimp. He also writes in an approachable and entertaining style which never hurts.

From there, you should at least know where you want to focus your efforts, and you'll have a decent beginner knowledge base to start tackling the problem in R with some popular machine learning classifiers.
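
On the "predictions on the test set" part of the question, here is a minimal sketch of the usual workflow: learn something from the labeled training rows, then apply it to the unlabeled test rows. It is written in dependency-free Python only to keep it short and transparent; the same steps carry over to R. Everything in it is an assumption for illustration: the toy rows, the labels "a" and "b", and the helper names `fit_centroids` and `predict` are made up, and the nearest-centroid rule is just a simple stand-in classifier, not the competition's data format or a recommended model.

```python
# Toy illustration only: the rows below are made up, not competition data.
from collections import defaultdict


def fit_centroids(rows, labels):
    """Learn one mean vector (centroid) per class from the training rows."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for i, value in enumerate(row):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, row):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical training set: two numeric features per row, with known labels.
train_rows = [[1.0, 1.0], [1.0, 3.0], [5.0, 5.0], [7.0, 5.0]]
train_labels = ["a", "a", "b", "b"]

# Hypothetical test set: same columns, labels unknown.
test_rows = [[0.0, 2.0], [6.0, 4.0]]

centroids = fit_centroids(train_rows, train_labels)
predictions = [predict(centroids, row) for row in test_rows]
print(predictions)
```

The per-class averages are the kind of thing you could compute in a spreadsheet; the point of the sketch is only that whatever is learned from the training rows gets reused, unchanged, on the test rows.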

Good luck, and feel free to be in touch! (isaac or peter at drivendata.org -- hit us up any time)

-----



