It looks like Google implements Deep Learning with a cluster of machines to achieve data/model parallelism. On the other hand, most state-of-the-art published results are produced with a GPU/CUDA approach. Which one is the better approach?
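To make the comparison concrete, here is a minimal toy sketch (my own illustration, not Google's actual system) of what I mean by data parallelism: the minibatch is sharded across workers, each worker computes a gradient on its shard, and the gradients are averaged before a single parameter update. On a cluster the shards would live on separate machines; on a GPU the same batch would instead be processed by one device.

```python
import numpy as np

def gradient(w, X, y):
    # Gradient of 0.5 * mean((X @ w - y)^2) with respect to w.
    return X.T @ (X @ w - y) / len(y)

def data_parallel_step(w, X, y, n_workers=4, lr=0.1):
    # Shard the batch; each shard's gradient is computed independently
    # (in a real system, on a separate machine or GPU).
    X_shards = np.array_split(X, n_workers)
    y_shards = np.array_split(y, n_workers)
    grads = [gradient(w, Xs, ys) for Xs, ys in zip(X_shards, y_shards)]
    # "All-reduce" step: average the per-worker gradients.
    g = np.mean(grads, axis=0)
    return w - lr * g

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w = np.zeros(3)
for _ in range(500):
    w = data_parallel_step(w, X, y)
# w converges to w_true, matching a sequential full-batch update.
```

The point of the example: with equal-size shards the averaged gradient equals the full-batch gradient, so the cluster and single-device approaches compute the same update; the trade-off is communication cost versus per-device throughput.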