While I agree with a lot of the points made in the Nature article "Scientists rise up against statistical significance" (https://www.nature.com/articles/d41586-019-00857-9), I worry that they are trading one measure that can be abused for another.
Hey, we built amie-fern to address the version control and reproducibility issues that come from rapid prototyping with Jupyter notebooks. It is a JupyterLab extension + web app that automatically tracks code, variables, data, and their dependencies in an interactive graph, so you can explore your results and create a script that gets you back to any point in your workflow. Check out our short video, sign up for free, and try it! We'd love to hear what you think. Thanks 🙏!
Even though this is a post about joining an engineering team, I think it's just as relevant for data scientists. I've spent a significant amount of time working with data and I consider myself a data scientist as well.
I created the Machine Learning Canvas to make it easier to ask the right questions at the beginning of an ML project, and to save people from wasting time and money due to a poor design of their ML system. I’m now releasing the first draft of a book that contains everything there is to know about this framework, in a 1-hour read.
And you can interact with this library through a whole ecosystem of clients:
- A web client: directly on bender's website, you can visualize the optimization process in graphs and compare the performance of different models on the same problem with a ranking board, which ultimately lets you pick the best model with the best hyperparameter set.
- A Python client and an R client: they give you automatic suggestions of hyperparameter sets to test from within your code.
Hi! I wrote that post! Another friend pointed out this other post about Gaussian Processes yesterday: https://www.jgoertler.com/visual-exploration-gaussian-proces... I think that post has some fun visualizations showing how different kernels work, but I tend to prefer my explanation. I'd love to get more eyes on it, and feedback specifically about whether I have any mistakes in there!
I was frustrated with how difficult I found making animations in matplotlib, so I wrote something to make it easy and called it celluloid. I found the idea in plotnine and simply took out the plotnine-specific code and generalized it some more (adding support for subplots).
The goal is that your visualization code shouldn't need to be modified at all, or as little as possible. With celluloid you take "photos" of your visualization to create each frame. Once all the frames have been captured you can create an animation with one call. The readme has more details.
Spatial co-location pattern mining refers to the task of discovering groups of objects or events that co-occur at many places. Extracting these patterns from spatial data is very difficult due to the complexity of spatial data types, spatial relationships, and spatial auto-correlation. We model co-location pattern discovery as a clique enumeration problem over a neighborhood graph (which is materialized using a distributed graph database). Further, we propose three new traversal-based algorithms, namely CliqueEnumG, CliqueEnumK, and CliqueExtend. These algorithms allow for a trade-off between time and memory requirements and support interactive data analysis without having to recompute all the intermediate results.
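The clique formulation can be illustrated with a small in-memory sketch. This is not CliqueEnumG, CliqueEnumK, or CliqueExtend, whose details are not given here; it swaps in the textbook Bron-Kerbosch algorithm with pivoting, and the toy events, the distance threshold, and the counting of maximal cliques per feature-type set are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist

# Hypothetical toy data: (feature type, x, y).
events = [
    ("cafe", 0.0, 0.0),
    ("bookstore", 0.5, 0.0),
    ("bank", 0.0, 0.5),
    ("cafe", 5.0, 5.0),
    ("bookstore", 5.4, 5.0),
    ("bank", 9.0, 9.0),
]


def neighborhood_graph(events, radius):
    """Adjacency sets linking events whose Euclidean distance is <= radius."""
    adj = {i: set() for i in range(len(events))}
    for i, j in combinations(range(len(events)), 2):
        if dist(events[i][1:], events[j][1:]) <= radius:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; yields each maximal clique as a frozenset."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbors among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


def colocation_counts(events, radius):
    """Count maximal cliques per set of feature types (at least two types)."""
    counts = {}
    for clique in maximal_cliques(neighborhood_graph(events, radius)):
        types = frozenset(events[i][0] for i in clique)
        if len(types) > 1:
            counts[types] = counts.get(types, 0) + 1
    return counts


counts = colocation_counts(events, radius=1.0)
```

With this toy data, each feature-type set that appears in `counts` is a candidate co-location pattern. A real system would replace the raw clique count with a proper prevalence measure and would not rebuild the neighborhood graph in memory for every query.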