> For example, in a drug therapy application, you may build a model to predict patient survival as a function of doses of different drugs; you would then use the paper's approach to find the doses that maximize predicted survival.
You mean a random survival forest?
So in this case the doses are the controllable independent variables?
I don't get how this works.
> the second step involves solving an optimization problem to find the drug therapy that maximizes the predicted survival of the given patient group subject to a constraint on the predicted toxicity.
Aren't you undoing CART and making it more of a bagging problem, which introduces the greedy algorithm problem that CART tries to solve with bagging?
I have to read more into this when I have time, but thank you for your work in this field. My thesis is also on tree-based algorithms. Always glad to see more papers on trees.
Interesting article, I admire your dedication to data collection! A zero-inflated Poisson model might be more suitable for your data, since there are a lot of days where you didn't find any coins. It assumes the data-generating process has two stages. The first is a Bernoulli trial with probability of success p; if there is no success, zero events are observed. In the second stage, for cases where the first stage was a success, the number of events follows a standard Poisson distribution (which can itself produce a zero). I think there's an R package to fit this kind of model.
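For anyone who wants to try the two-stage model described above without a dedicated package, here is a minimal sketch in plain Python. It fits the zero-inflated Poisson by a simple EM iteration, using the same parameterization as the comment (p is the probability of reaching the Poisson stage). The function names and the example counts are made up for illustration; they are not the data from the article.

```python
import math

def zip_loglik(counts, p, lam):
    """Log-likelihood of a zero-inflated Poisson.
    p   = probability of reaching the Poisson stage (stage-one "success")
    lam = Poisson rate in the second stage
    """
    ll = 0.0
    for y in counts:
        if y == 0:
            # A zero is either a stage-one failure or a Poisson zero.
            ll += math.log((1.0 - p) + p * math.exp(-lam))
        else:
            ll += math.log(p) - lam + y * math.log(lam) - math.lgamma(y + 1)
    return ll

def fit_zip(counts, n_iter=200):
    """Fit (p, lam) by EM. Assumes at least one nonzero count."""
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    if total == 0:
        raise ValueError("all counts are zero; lam cannot be estimated")
    # Starting guess: half the days are "active", rate = mean of nonzero days.
    p, lam = 0.5, total / (n - n_zero)
    for _ in range(n_iter):
        # E-step: probability that an observed zero came from the Poisson stage.
        pois_zero = p * math.exp(-lam)
        w_zero = pois_zero / ((1.0 - p) + pois_zero)
        # Expected number of observations that reached the Poisson stage.
        n_active = (n - n_zero) + n_zero * w_zero
        # M-step: closed-form updates.
        p = n_active / n
        lam = total / n_active
    return p, lam

# Hypothetical 30 days of coin counts (invented for this sketch).
counts = [0] * 20 + [1] * 4 + [2] * 3 + [3] * 2 + [5]
p_hat, lam_hat = fit_zip(counts)
print("p =", round(p_hat, 3), "lambda =", round(lam_hat, 3))
print("ZIP log-lik:    ", zip_loglik(counts, p_hat, lam_hat))
# A plain Poisson is the special case p = 1 with lam = sample mean.
print("Poisson log-lik:", zip_loglik(counts, 1.0, sum(counts) / len(counts)))
```

Comparing the two log-likelihoods gives a rough sense of whether the extra zero-inflation parameter is earning its keep; a formal comparison would need a proper test or an information criterion.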
Yeah, among other biases. "Voluntary response data are useless." I've also tried posting it on /r/datascience, but it appears not to have shown up in the sub, maybe because my Reddit account is too new. Do you have suggestions for other DS communities where I could solicit responses?
I think it's more likely than not that you're aware of this, but there's a large selection bias in collecting respondents from DataTau. Those who did not get a job in data science following a DS bootcamp are far less likely to be browsing DataTau than those who did.
Hey, I created this survey. The burning question with DS bootcamps seems to be: what are the actual employment prospects after such a short program? So this is more of an "outcome" survey than one on salaries.
I'd like to especially encourage those of you who have not received offers, or who had negative bootcamp experiences, to complete the survey. Your input is valuable and can help others decide on whether to pursue this avenue.
I will of course share the data after collecting enough responses.
Hi all -- author of the paper here. In this paper I consider the problem of how to take a tree ensemble model, like a random forest, and find how the independent variables should be set to maximize the predicted value given by the ensemble. For example, in a drug therapy application, you may build a model to predict patient survival as a function of doses of different drugs; you would then use the paper's approach to find the doses that maximize predicted survival.
The methodology is based on mixed-integer optimization and includes results on the strength of the formulation, how to approximate the formulation, and how to exploit the formulation structure to obtain solution methods for solving the problem at scale. The numerics include two case studies, one using data from the Merck Molecular Activity Challenge on Kaggle a few years ago, and one using a grocery store scanner data set.
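To make the problem statement concrete, here is a minimal Python sketch of what "maximizing the predicted value of a tree ensemble" means. This is not the mixed-integer formulation from the paper. It is brute-force enumeration on a toy ensemble, and the tree encoding, function names, and the two "dose" features are all hypothetical. It relies on the fact that an ensemble of threshold trees is piecewise constant, so checking one point per cell of the grid cut out by the split thresholds is enough to find the global maximum inside a box.

```python
from itertools import product

# A tree is either a leaf value (float) or a tuple
# (feature_index, threshold, left_subtree, right_subtree);
# go left when x[feature_index] <= threshold, otherwise right.

def predict_tree(tree, x):
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree

def predict_ensemble(trees, x):
    # Average of the individual tree predictions, as in a random forest.
    return sum(predict_tree(t, x) for t in trees) / len(trees)

def collect_thresholds(tree, out):
    # Gather every split threshold, grouped by feature index.
    if isinstance(tree, tuple):
        feature, threshold, left, right = tree
        out.setdefault(feature, set()).add(threshold)
        collect_thresholds(left, out)
        collect_thresholds(right, out)

def candidate_values(thresholds, lo, hi):
    # One representative per interval: each threshold t stands for
    # "just at or below t", and hi stands for the last interval.
    return sorted(t for t in thresholds if lo <= t < hi) + [hi]

def maximize_ensemble(trees, bounds):
    thresholds = {}
    for tree in trees:
        collect_thresholds(tree, thresholds)
    grids = [candidate_values(thresholds.get(j, set()), lo, hi)
             for j, (lo, hi) in enumerate(bounds)]
    best = max(product(*grids), key=lambda x: predict_ensemble(trees, x))
    return best, predict_ensemble(trees, best)

# Toy ensemble over two hypothetical doses: x[0] = dose A, x[1] = dose B.
tree1 = (0, 2.0, 0.3, (1, 1.0, 0.9, 0.5))
tree2 = (1, 3.0, (0, 4.0, 0.6, 0.2), 0.1)
trees = [tree1, tree2]
bounds = [(0.0, 5.0), (0.0, 5.0)]  # allowed range for each dose

best_x, best_value = maximize_ensemble(trees, bounds)
print("best doses:", best_x, "predicted value:", best_value)
```

The number of grid cells is the product of the per-feature candidate counts, so this enumeration blows up quickly with more features and deeper trees. That growth is the reason an exact method that scales needs something smarter than enumeration.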
If you have any comments or thoughts, I'd be very interested to hear them. Thank you!
I'm the author of this piece. It describes a very simple experiment in which I counted how much change I picked up for a month, and my analysis of that data. It might be especially interesting if you're interested in counting rare events or in cases where normal approximations aren't appropriate.