Hi there -- one of the authors of the paper here. Optimal stopping problems constitute an important class of stochastic control problems, with many real applications, such as pricing financial options. Typically they are solved using approximate dynamic programming methods, which involve coming up with some approximation of the value function or the continuation value function.

In this paper, we take a different approach, where we represent the stopping policy as a tree, and propose a methodology for learning this tree from the data; so in the same way that one comes up with a tree for predicting a binary label in classification, or predicting a continuous value in regression, one obtains a tree that prescribes an action for each possible state. We show using a standard benchmark problem in option pricing that these tree policies perform very well, while being as simple and interpretable as tree models used in other areas of machine learning. We appreciate any questions or comments!