This is a blog post from a colleague that discusses the role of the choice of tree in hierarchical softmax in e.g. word2vec. It reproduces some experiments of Mnih and Hinton, but measures performance on the word analogy task (instead of language modelling).