For those of us working on data science problems in industry, how do you present your findings to your supervisor/employer/boss?
I did some experiments using IPython/Jupyter notebooks, but I have now reverted to plain old Word docs and PowerPoint slides.
Any tips/suggestions/own experiences welcome.
Good question. For me it definitely depends on the audience.
IPython notebooks are great for technical audiences who might want to see some code, but for management I typically extract plots from the notebooks and put them into slides.
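For the "extract plots from the notebooks" step, here is a minimal sketch of one way to do it with only the Python standard library. It relies on the fact that a saved `.ipynb` file is JSON and that inline PNG figures are stored base64-encoded under each output's `data["image/png"]` key (nbformat 4). The function name `extract_png_plots` and the output file naming are my own assumptions for illustration, not part of any Jupyter API.

```python
import base64
import json
from pathlib import Path


def extract_png_plots(notebook_path, out_dir):
    """Write every PNG output embedded in a notebook to out_dir; return the paths.

    Hypothetical helper: assumes an nbformat 4 notebook saved with its outputs.
    """
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for cell_index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # only code cells carry outputs
        for output_index, output in enumerate(cell.get("outputs", [])):
            png = output.get("data", {}).get("image/png")
            if png is None:
                continue  # text, stream, or error output: nothing to save
            if isinstance(png, list):
                # multi-line strings may be stored as a list of lines
                png = "".join(png)
            target = out / f"cell{cell_index:03d}_output{output_index}.png"
            target.write_bytes(base64.b64decode(png))
            written.append(target)
    return written
```

Calling `extract_png_plots("analysis.ipynb", "slides/plots")` (both paths are placeholders) would leave one PNG per figure, ready to drop into slides. The notebook has to be saved with its outputs for this to find anything.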
I always feel it is easier to get my message across to engineers/data scientists than to management, though. That is, I feel like "plots in slides" are not always powerful enough to tell my story. It's not that management doesn't know enough stats or whatever to understand what is happening. It's more like if I show something like a correlation between x and y with z confidence, they might come back with questions like, "...interesting, but what does that mean?" So I have to be a little more creative with my messaging.
In response to '...but what does that mean?', I try to anticipate it and deploy the magic phrase, "...and here's what that means".
When managers review work, there's the 'If you give a mouse a cookie...' problem. At that stage, quick and dirty responses work well to prevent over-producing on idle curiosities. Rough syntax and output speed up the review cycle and help minimize throwaway work.
I mostly use RMarkdown, compiled to PDF or HTML, sometimes slides. All with the aid of RStudio.
As already mentioned, knowing the audience is essential. The more technical stuff goes in the appendix, as it is sometimes too much work to write one report per audience. In case someone really wants to look at the code, I store the R project on GitHub with all the technical details.
I am fortunate in that I work with a bunch of old school scientists who generally favor pen-and-paper approaches over anything digital and don't mind the cost of office supplies.
So my usual answer to this question (for basic exploratory analyses with project teams) is to use R to make extremely high resolution 36x42in (or whatever) plots of the data and then print them using our map plotter and hang them on the wall. This allows us to quickly draw and annotate on them when we are working with the data. It also helps overcome PowerPoint's little resolution problem (and more generally, the lack of resolution inherent to most business-class projectors).
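To put a rough number on that resolution gap, here is a small back-of-the-envelope sketch. The 300 dpi print resolution and the 1920x1080 projector are assumptions picked for illustration (I haven't stated our actual plotter or projector specs), and `pixel_size` is just a hypothetical helper name.

```python
def pixel_size(width_in, height_in, dpi):
    """Pixel dimensions of a plot of the given physical size at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


# Assumed values, for illustration only
PRINT_DPI = 300            # assumed plotter resolution
PROJECTOR = (1920, 1080)   # assumed projector resolution in pixels

poster = pixel_size(36, 42, PRINT_DPI)
poster_pixels = poster[0] * poster[1]
projector_pixels = PROJECTOR[0] * PROJECTOR[1]

print(poster)                             # (10800, 12600)
print(poster_pixels / projector_pixels)   # 65.625
```

Under those assumptions the printed sheet carries roughly 65 times as many pixels as the projected slide, which is why detail that is readable on the wall turns to mush on screen.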
If I'm showing lots of high resolution plots, I generally favor using something like Windows picture viewer over PowerPoint, because I can use one high resolution graph and zoom in on the relevant sections as I discuss.
I've been playing around with things like the threejs plugin for QGIS and the dygraphs package for R to generate self-contained, dynamic, and/or 3D visualizations for management. This gives a "wow" factor and keeps management from falling asleep during a data-heavy presentation - and the dygraphs or threejs maps can be recycled to my project teams for their casual use.
The resolution problem is one I struggle with frequently. I want to show 10 years of time series data at 12 locations at once and highlight the relationships between time series - doing that with enough detail and clarity to be useful is a big issue for me. The usual answer is either a really big high-res plot (maintains "connections" in the data but can be difficult to interpret and use during a presentation) or lots of lower resolution plots that window in on certain interesting phenomena (easier to interpret, but chops up the data and implies separate, rather than interconnected, phenomena).
I wish I were more versed in creating dynamic visualizations in dygraphs and the like, but becoming so would require me to drop everything and learn JavaScript, HTML and CSS - something I have been refusing to do on the grounds of preserving what little sanity I have left...
Right now, my data science duties are incidental to my primary role of doing application support for my company's Spotfire user community, so almost anything I do which other people are going to see is done in Spotfire. If it's more complex code, I might do a proof of concept in RStudio first.