It's better if you don't want to spend time on infrastructure. Running multiple fine-tuning experiments, or distributed experiments across multiple nodes, can become a big problem. Sharing resources (memory, CPU, and GPU) with the rest of the team is also not completely straightforward, and most of the time requires an in-house solution. This is why we built Polyaxon: to abstract away all this engineering work, so that data scientists can focus on developing machine learning and deep learning algorithms without worrying too much about infrastructure.
Totally! One of my previous projects was helping build out the Data Observatory at Carto (https://carto.com/data-observatory/, https://cartodb.github.io/bigmetadata/), so for sure this has been on my mind. Our focus right now is exposing a lot of the real-time/rolling segmentation that is possible (e.g. currently a daily active [driver, app user, buyer], normally a weekly active [driver, app user, buyer]), but soon we'll be mixing in some of those external variables.
Speed and connectivity, etc., are already being used to generate features.
The biggest benefit is that we can build very nuanced data models at the individual level that support many of the features we extract. But the primary motivation was to provide features that data scientists could model offline, with the resulting models put back into production on the mobile device.
That said, there are features that will require server-side computation, and those will come down the road.
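Purely as a hypothetical sketch of that offline-train / on-device-score pattern (the feature names, weights, and functions below are all made up for illustration, not the actual system):

```python
import numpy as np

# Hypothetical model artifact: coefficients of a logistic regression trained
# offline by data scientists, shipped to the device as plain numbers.
WEIGHTS = np.array([0.8, -0.3, 1.2])  # [avg_speed, conn_drops, active_today]
INTERCEPT = -0.5

def extract_features(speed_samples, conn_drops, active_today):
    """Turn raw on-device signals into the model's feature vector."""
    return np.array([
        float(np.mean(speed_samples)),   # average speed over the window
        float(conn_drops),               # connectivity drops in the window
        1.0 if active_today else 0.0,    # daily-active flag
    ])

def score(features):
    """Run the exported model locally; no server round trip needed."""
    return 1.0 / (1.0 + np.exp(-(WEIGHTS @ features + INTERCEPT)))

print(score(extract_features([12.3, 14.1, 13.7], conn_drops=2, active_today=True)))
```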
An overview of the "built-in" datasets in Python packages such as statsmodels, scikit-learn, and seaborn, plus code examples of how to get these datasets as pandas DataFrames in just one or two lines of code.
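For a taste, each of the three packages gets you a DataFrame in one or two lines (the dataset names below are just examples from each package's catalog):

```python
import seaborn as sns
import statsmodels.api as sm
from sklearn.datasets import load_iris

# seaborn: load_dataset() returns a pandas DataFrame directly
tips = sns.load_dataset("tips")

# statsmodels: bundled datasets expose a load_pandas() accessor
anes = sm.datasets.anes96.load_pandas().data

# scikit-learn: as_frame=True exposes the data as a .frame DataFrame
iris = load_iris(as_frame=True).frame
```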
I wanted to try out JupyterLab, but not having a Vim mode was too painful, so I made this. Right now it's a poor man's version of jupyter-vim-binding, but the basics work. If you like Vim and want to try out JupyterLab, give it a shot and let me know how it goes.
We talk about our setup for building and training models that make quick (millisecond) predictions, why we built a custom prediction library on top of Spark, and some of the tools that helped us industrialize the process.
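To hint at why a custom prediction path matters: running a single-row prediction through a Spark DataFrame pipeline carries significant per-request overhead, so one common approach (a sketch under assumptions, not necessarily the library described above) is to export a Spark-trained model's parameters once and score them with plain NumPy:

```python
import numpy as np

# A pyspark.ml LogisticRegressionModel trained offline exposes its learned
# parameters, which can be exported once and scored without a Spark session:
#     weights = model.coefficients.toArray()   # pyspark.ml API
#     intercept = float(model.intercept)
weights = np.array([0.42, -1.3, 0.07])  # placeholder exported coefficients
intercept = 0.15                        # placeholder exported intercept

def predict_proba(x):
    """Score one feature vector in microseconds, no Spark round trip."""
    return float(1.0 / (1.0 + np.exp(-(weights @ x + intercept))))

print(predict_proba(np.array([1.0, 0.5, 3.2])))
```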