
It's better if you don't want to spend time on infrastructure. Running multiple experiments for fine-tuning, or distributed experiments across multiple nodes, can become a big problem. Sharing resources (memory, CPU, and GPU) with the rest of the team is also not straightforward, and most of the time it requires an in-house solution. This is why we built Polyaxon: to abstract all this engineering work so that data scientists can focus on developing machine learning and deep learning algorithms without worrying too much about infrastructure.

How is this better than docker+ami+aws?

Here is a curated list of data science blogs. What are your favourite blogs?

Totally! One of my previous projects was helping build out the Data Observatory at Carto, so this has definitely been on my mind. Our focus right now is exposing a lot of the real-time/rolling segmentation that is possible (e.g. currently a daily active [driver, app user, buyer], normally a weekly active [driver, app user, buyer]), but we will soon be mixing in some of those external variables.

Speed and connectivity etc are already being used to generate features.

Thanks for the q!

Have you considered overlaying demographic information and regional weather info? Device connection speed? etc

The two biggest benefits: first, we can build very nuanced data models at the individual level that support many of the features being extracted. Second, and this was the primary motivation, we can provide features that data scientists can model offline, and the resulting models can be put back into production on the mobile device.

That said, there are features that will require server side, and those will come down the road.

We are building out features that fall into the following categories:

user features: e.g. 'at home'

basic app features: e.g. 'daily active app user' or 'is active'

custom app features: e.g. 'item in cart'

device features: e.g. lots of regional, temporal, and connectivity breakdowns

The above highlights a number of the semantic features. There are also a host of temporal characteristics based on long- and short-term time series.

Finally, we are working on profiles/segmentation, both general and real-time, e.g. '9-5er', 'workaholic', etc.
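
To make the 'daily active' versus 'weekly active' idea above concrete, here is a minimal sketch of how such rolling flags could be derived from the days on which a user was seen. The function name, the input format, and the 7-day window are assumptions for illustration only, not the poster's actual implementation.

```python
from datetime import date, timedelta


def activity_features(event_days, today):
    """Derive simple rolling activity flags from the days a user was seen.

    event_days: iterable of datetime.date values (hypothetical input format).
    today: the date the features are computed for.
    """
    days = set(event_days)
    # The 7-day window includes today and the six days before it (an assumption).
    last_7 = [today - timedelta(days=n) for n in range(7)]
    active_days_7d = sum(1 for d in last_7 if d in days)
    return {
        "daily_active": today in days,
        "weekly_active": active_days_7d > 0,
        "active_days_7d": active_days_7d,
    }


today = date(2018, 3, 14)
events = [date(2018, 3, 14), date(2018, 3, 12), date(2018, 3, 1)]
features = activity_features(events, today)
print(features)
```

A user whose daily flag is on while their usual pattern is weekly would be one candidate for the "currently daily, normally weekly" kind of segment mentioned elsewhere in the thread.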

What kind of feature can you provide? device level or app level?

What is the benefit of doing feature engineering on device?
1 point by agawronski 16 days ago | link | parent | on: R vs Python


for(i in c(1:10)) { printi; }

should be:

for(i in c(1:10)) { print(i); }

1 point by agawronski 16 days ago | link | parent | on: R vs Python

This very first entry is incorrect: Installing Libraries import pandas

should be: pip install pandas

or perhaps easy_install, or conda install ...

1 point by nickat 17 days ago | link | parent | on: Datasets in Python

An overview of the "built-in" datasets in Python packages such as statsmodels, scikit-learn, and seaborn, plus code examples of how to get these datasets in the form of pandas DataFrames in just 1-2 lines of code.
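
As an illustration of the "1-2 lines" claim, here is a small sketch using scikit-learn's bundled iris data. It assumes scikit-learn 0.23 or later (for the `as_frame` argument) with pandas installed, and the linked post may well use different calls.

```python
from sklearn.datasets import load_iris

# as_frame=True (scikit-learn 0.23+) returns pandas objects instead of NumPy arrays.
iris = load_iris(as_frame=True)
df = iris.frame  # four feature columns plus a "target" column

print(df.shape)
print(df.head())
```

The dataset ships with the package, so this should not need a network connection.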

i highly doubt it

this is really nice, pretty intuitive and clean interface. thanks :)


This dataset was scraped to predict player performances in future games for the purposes of building a fantasy team.
1 point by alexperrier 24 days ago | link | parent | on: AutoML on AWS

Using AutoML for feature engineering and the AWS Machine Learning service for model training to build a predictive analytics pipeline.

An example showing how to plot a confusion matrix that displays counts and labels (TN, FN, TP, FP) using sklearn.metrics.confusion_matrix and matplotlib.
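
For readers who want to see where the four labels come from, here is a minimal pure-Python sketch that counts the cells by hand and builds the "label = count" strings one could draw into each cell of a plot. It is a stand-in, not the linked example: the helper names are hypothetical, and labels are assumed to be encoded as 0 (negative) and 1 (positive). The row/column layout matches what sklearn.metrics.confusion_matrix returns for such labels, namely [[TN, FP], [FN, TP]].

```python
from collections import Counter


def binary_confusion_counts(y_true, y_pred):
    """Count TN, FP, FN, TP for labels encoded as 0 (negative) and 1 (positive)."""
    pairs = Counter(zip(y_true, y_pred))  # keys are (true label, predicted label)
    return {
        "TN": pairs[(0, 0)],
        "FP": pairs[(0, 1)],
        "FN": pairs[(1, 0)],
        "TP": pairs[(1, 1)],
    }


def labeled_cells(counts):
    """Arrange the counts as rows = true label, columns = predicted label."""
    return [
        [f"TN = {counts['TN']}", f"FP = {counts['FP']}"],
        [f"FN = {counts['FN']}", f"TP = {counts['TP']}"],
    ]


y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]
counts = binary_confusion_counts(y_true, y_pred)
cells = labeled_cells(counts)
for row in cells:
    print(" | ".join(row))
```

With matplotlib, each string in `cells` could be placed at its grid position as a text annotation over the heatmap.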
1 point by mohapsat 25 days ago | link | parent | on: Salesforce partners in India


Please don't advertise here.

1 point by jwkvam 29 days ago | link | parent | on: JupyterLab notebook vim mode

I wanted to try out JupyterLab but not having a Vim mode was too painful so I made this. Right now it's a poor man's version of jupyter-vim-binding [1], but the basics work. If you like Vim and want to try out JupyterLab give it a shot and let me know how it goes.


1 point by akaluismaia 31 days ago | link | parent | on: Gain Muscle Mass Naturally


What kind of notepad do you use?


We talk about our setup for building and training models that make quick (millisecond) predictions, as well as why we built a custom prediction library on top of Spark and some tools that helped industrialize the process.

Really enjoyed reading through this. Thanks for sharing.

Great Work!

You guys are shameless. Everywhere you go you are banned. Is there anything more to your company than pathetic self-promotion?

Is this worth the $10?

Mark, thanks for doing the benchmark. We are really excited to come out with flying colors and become the fastest GPU database on the market.
