2 points by vvmisic 2481 days ago | link | parent

Hi all -- author of the paper here. In this paper I consider the problem of how to take a tree ensemble model, like a random forest, and find how the independent variables should be set to maximize the predicted value given by the ensemble. For example, in a drug therapy application, you may build a model to predict patient survival as a function of doses of different drugs; you would then use the paper's approach to find the doses that maximize predicted survival.
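
To make the problem statement concrete, here is a minimal sketch in Python. It is not the paper's mixed-integer formulation: it is plain brute-force enumeration, which only works for tiny ensembles. It relies on the fact that a tree ensemble's prediction is piecewise constant, so it is enough to try one representative value per interval between split points. The tree encoding, the function names, and the toy two-drug trees are all made up for illustration.

```python
from itertools import product

# Hypothetical encoding: a tree is either a leaf value (float) or a tuple
# (feature_index, threshold, left_subtree, right_subtree).
# A point goes left when x[feature_index] <= threshold.

def predict_tree(tree, x):
    """Walk one tree from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree

def predict_ensemble(trees, x):
    """Average the tree predictions, as a random forest regressor would."""
    return sum(predict_tree(t, x) for t in trees) / len(trees)

def collect_thresholds(tree, thresholds):
    """Record every split threshold in the tree, grouped by feature."""
    if isinstance(tree, tuple):
        feature, threshold, left, right = tree
        thresholds[feature].add(threshold)
        collect_thresholds(left, thresholds)
        collect_thresholds(right, thresholds)

def maximize_ensemble(trees, n_features):
    """Exhaustively search one representative point per cell of the split grid."""
    thresholds = [set() for _ in range(n_features)]
    for t in trees:
        collect_thresholds(t, thresholds)
    candidates = []
    for s in thresholds:
        pts = sorted(s)
        # Each threshold represents the interval ending at it; one extra
        # point above the largest threshold covers the last interval.
        candidates.append(pts + [pts[-1] + 1.0] if pts else [0.0])
    best = max(product(*candidates), key=lambda x: predict_ensemble(trees, x))
    return best, predict_ensemble(trees, best)

# Toy "survival" ensemble over the doses of two drugs (invented numbers).
tree_a = (0, 5.0, 0.25, (1, 3.0, 0.875, 0.5))
tree_b = (1, 2.0, 0.25, 0.75)

best_doses, best_value = maximize_ensemble([tree_a, tree_b], n_features=2)
print(best_doses, best_value)
```

The number of grid cells grows exponentially with the number of variables and split points, so this enumeration stops being usable for ensembles of realistic size.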

The methodology is based on mixed-integer optimization and includes results on the strength of the formulation, how to approximate it, and how to exploit its structure to solve the problem at scale. The numerics include two case studies: one using data from the Merck Molecular Activity Challenge on Kaggle a few years ago, and one using a grocery store scanner data set.

If you have any comments or thoughts, I'd be very interested to hear them. Thank you!



1 point by lackadaisically 2474 days ago | link

> For example, in a drug therapy application, you may build a model to predict patient survival as a function of doses of different drugs; you would then use the paper's approach to find the doses that maximize predicted survival.

You mean a random survival forest?

So in this case the doses are the controllable independent variables?

I don't get how this works.

> the second step involves solving an optimization problem to find the drug therapy that maximizes the predicted survival of the given patient group subject to a constraint on the predicted toxicity.

Aren't you undoing CART and making it more of a bagging problem, which introduces the greedy algorithm problem that CART tries to solve with bagging?

I'll have to read more into this when I have time, but thank you for your work in this field. My thesis is also on tree-based algorithms. Always glad to see more papers on trees.
