1 point by blograbbit21 8 days ago | link | parent | on: Does Random Forest Overfit?

I do not think it does, but let's look deeper into it.

While I agree with a lot of the points made in the Nature article "Scientists rise up against statistical significance", I worry that they are swapping one measure that can be abused for another.

Hey, we built amie-fern to address the version control and reproducibility issues that come from rapid prototyping with Jupyter notebooks. It is a JupyterLab extension + web app that automatically tracks code, variables, data, and their dependencies in an interactive graph, so you can explore your results and create a script that gets you back to any point in your workflow. Check out our short video, sign up for free, and try it! We'd love to hear what you think. Thanks 🙏!

Author here:

Even though this is a post about joining an engineering team, I think it's just as relevant for data scientists. I've spent a significant amount of time working with data and I consider myself a data scientist as well.

I wrote up the exact steps I took to break into data science. Feel free to contact me with any questions!
1 point by lexda45 60 days ago | link | parent | on: Data Science Remote Jobs

It offers a lot of remote projects for data scientists.

How do you search for projects?

I created the Machine Learning Canvas to make it easier to ask the right questions at the beginning of an ML project, and to save people from wasting time and money due to a poor design of their ML system. I’m now releasing the first draft of a book that contains everything there is to know about this framework, in a 1-hour read.

The contest is now live!

Haha, so the author goes to interviews to map interview processes? ;-)

An introductory guide on how to do sales, revenue, conversion, etc. forecasting with Google Sheets only.

It's not clear to me why I would want to explore the hyperparameters in a web UI.

The goal here is to give the ML community an open-access, asynchronous, yet powerful hyperparameter optimization tool. It is just a beta for now and we need feedback, guys! :)

The project is based on our open source optimization library (for now based on a TPE-like algorithm):

You can interact with this library through a whole ecosystem of clients:

- A web client: directly on bender's website, you can visualize the optimization process on graphs and compare the performance of different models on the same problem with a ranking board, which ultimately lets you pick the best model with the best hyperparameter set.

- A Python client and an R client: they give you automatic suggestions of hyperparameter sets to test from within your code.

Everything is documented on this readthedocs:

I did not know Kubeflow let you use GPUs.
4 points by ajschumacher 106 days ago | link | parent | on: Gaussian Processes are Not So Fancy

Hi! I wrote that post! Another friend pointed out yesterday this other post about Gaussian Processes: I think that post has some fun visualizations for showing how different kernels work, but I tend to prefer my explanation. Would love to get more eyes on it and feedback specifically about whether I have any mistakes in there!
2 points by wminshew 141 days ago | link | parent | on: Emrys: p2p gpu compute marketplace

Hi -- founder here, open to questions. We are currently looking for beta users ($10 in compute credits ~= 50 hours on a GTX 1080 Ti at current pricing) willing to give feedback.

For users, emrys does ~4 things:

1. uploads a Python script and its requirements to the server, where a Docker image is built from them

2. syncs the data set, if it exists locally, to the server

3. auctions the job’s execution to suppliers meeting the user’s hardware requirements

4. streams output logs back to the user & downloads anything the Python script saved in ./output/

More information can be found in the docs or by contacting support.

1 point by jwkvam 151 days ago | link | parent | on: Matplotlib animations made easy

I was frustrated with how difficult I found making animations in matplotlib, so I wrote something to make it easy and called it celluloid. I found the idea in plotnine and simply took out the plotnine-specific code and generalized it some more (adding support for subplots).

The goal is that your visualization code shouldn't need to be modified at all, or as little as possible. With celluloid you take "photos" of your visualization to create each frame. Once all the frames have been captured, you can create an animation with one call. The readme has more details.
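The "take a photo per frame, then animate with one call" idea can be sketched with plain matplotlib. This is an illustration of the approach, not celluloid's own code: it assumes the frames are gathered by hand into a list and passed to matplotlib's `ArtistAnimation`, and the `frames` name and sine-wave data are made up for the example.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.animation import ArtistAnimation

fig, ax = plt.subplots()
xs = [i * 0.1 for i in range(63)]

# Hypothetical frame store: one list of artists per "photo".
frames = []
for step in range(20):
    ys = [math.sin(x + 0.3 * step) for x in xs]
    # Ordinary plotting code; the only extra work is remembering the artists.
    (line,) = ax.plot(xs, ys, color="tab:blue")
    frames.append([line])

# One call turns the captured frames into an animation.
animation = ArtistAnimation(fig, frames, interval=50)

# Saving is optional and needs a writer such as Pillow to be installed:
# animation.save("sine.gif", writer="pillow")
```

The plotting code inside the loop is what you would write for a static figure; the bookkeeping around it is the part a helper library can hide.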

I think the main audience for this is people who read the matplotlib animation tutorial and thought it was still too complex.

I'm curious if you all think this is useful or how it could be improved. Thanks!

> Important properties of the Normal distribution:

> * The total area under the curve is 1.

If you think this is a property of the "Normal distribution", rather than a property of distributions in general, let me go ahead and completely ignore your article. is another one
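A quick numerical check makes the point concrete. The sketch below integrates three different densities with a simple midpoint rule and gets an area of about 1 for each. The helper names, parameter values, and integration bounds are assumptions chosen for the example, not anything taken from the article.

```python
import math


def integrate(pdf, lo, hi, n=200_000):
    """Midpoint-rule approximation of the integral of pdf over [lo, hi]."""
    width = (hi - lo) / n
    return sum(pdf(lo + (i + 0.5) * width) for i in range(n)) * width


def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def exponential_pdf(x, rate=1.5):
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def uniform_pdf(x, a=0.0, b=4.0):
    return 1.0 / (b - a) if a <= x <= b else 0.0


# Bounds are wide enough that the truncated tails are negligible.
normal_area = integrate(normal_pdf, -10, 10)
exponential_area = integrate(exponential_pdf, 0, 40)
uniform_area = integrate(uniform_pdf, 0, 4)

for name, area in [("normal", normal_area),
                   ("exponential", exponential_area),
                   ("uniform", uniform_area)]:
    print(f"{name}: area under the curve is approximately {area:.6f}")
```

Unit area is part of the definition of a probability density, so it cannot distinguish the Normal distribution from any other.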
1 point by tomtx 155 days ago | link | parent | on: Building AI & Busting Silos

Why are there so few real AI products for customers in finance?

Why do you think there are so few whiskeys combining smokiness with complexity?

