This is a course I've started teaching. Topics covered include k-means, hierarchical clustering, t-SNE, PCA and NMF. Included are lots of exercises (as Jupyter notebooks) where these techniques can be practised, almost always on real-world data. I hope you like it!
"This data enables us to learn about individuals, and not just population averages."
Good luck selling that sort of unconsented* analysis as privacy-respecting!
*Marketing is about influencing without the express consent of the target (though rarely against a person's explicit will); influencing with consent is mostly the realm of self-help books, doctors, bank clerks and others.
Regarding the MSc, it's more than I could ask for. There are introductory courses in statistics, programming, databases, machine learning, etc., and there are specific elective courses like Big Data Systems, Natural Language Processing, etc.
TBH I think the math/stats background in particular is invaluable, something not really learned by reading data-science-specific books, and usually neglected. It might not be strictly prerequisite for working with machine learning algorithms, but it's a huge help when you want to delve deeper.
I wrote this post as a reference for quickly setting up high-end VMs on AWS.
The problem I usually faced as an MSc student in data science was that developing and running machine learning algorithms on my laptop took too much time.
The two alternatives were either to buy a high-end PC or to learn how to use cloud VMs.
Since I didn't have the budget for a high-end PC, I was left with creating VMs on AWS, though this had problems of its own: mainly, it takes a bit of time to create and configure each machine.
That's why I tried to automate this procedure and ended up with this guide.
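The kind of automation the guide describes could be sketched with boto3. This is a minimal sketch under assumptions of mine, not the guide's actual setup: the AMI ID, key name and instance type below are placeholders, and the helper names are invented for illustration.

```python
# Sketch of automating EC2 VM creation. Assumes boto3 is installed and
# AWS credentials are configured; the AMI ID, key name and instance type
# are placeholders, not values from the guide.

def build_instance_config(ami_id, instance_type, key_name):
    """Collect the launch parameters for a single on-demand instance."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "MinCount": 1,
        "MaxCount": 1,
    }

def launch(config):
    """Create the instance; requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the config helper works without it
    ec2 = boto3.resource("ec2")
    return ec2.create_instances(**config)

config = build_instance_config("ami-0123456789abcdef0", "p2.xlarge", "my-key")
# launch(config)  # uncomment to actually create the VM (incurs AWS charges)
print(config["InstanceType"])
```

Keeping the parameters in one helper makes it easy to script several machine sizes from the same code.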
Pseudo labeling is an interesting technique, and so is the code you provided.
However, I strongly disagree with your conclusion. The competition had a huge leaderboard shake-up, so saying you had a gain on the leaderboard is simply false: you finished 2551/3835 and lost 759 places on the private leaderboard. Therefore, saying that pseudo labeling improved your score is, in my opinion, not right.
Moreover, I haven't yet seen any Kaggle master use this technique to improve their model.
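For readers unfamiliar with the technique under discussion, here is a minimal sketch of pseudo labeling using a toy nearest-centroid classifier. The data, classifier and confidence threshold are all invented for illustration; they are not from the post being discussed.

```python
import numpy as np

# Toy pseudo-labeling loop: fit on labelled data, label the unlabelled
# points we are confident about, then refit on the augmented set.
rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
y_lab = np.array([0] * 20 + [1] * 20)
X_unlab = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])

def fit_centroids(X, y):
    """One centroid per class; our stand-in for 'training a model'."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict_with_margin(centroids, X):
    """Predict nearest centroid; the distance gap is a confidence proxy."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1), np.abs(d[:, 0] - d[:, 1])

centroids = fit_centroids(X_lab, y_lab)
pseudo_y, margin = predict_with_margin(centroids, X_unlab)
confident = margin > 1.0          # keep only confident pseudo-labels
X_aug = np.vstack([X_lab, X_unlab[confident]])
y_aug = np.concatenate([y_lab, pseudo_y[confident]])
centroids = fit_centroids(X_aug, y_aug)  # retrain on real + pseudo labels
print(len(y_aug))
```

The leaderboard criticism above is exactly about the risk in this loop: when the pseudo-labels inherit the model's own mistakes, retraining can amplify them rather than fix them.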
This is a blog post from a colleague that discusses the role of the choice of tree in hierarchical softmax in e.g. word2vec. It reproduces some experiments of Mnih and Hinton, but measures performance on the word analogy task (instead of language modelling).
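As background for the tree choice being discussed: word2vec's default hierarchical softmax uses a Huffman tree over word frequencies, so frequent words get short codes and cheap updates. A minimal sketch with the standard library (the vocabulary and counts are made up):

```python
import heapq
import itertools

# Build a Huffman tree over word frequencies, as word2vec's hierarchical
# softmax does by default; the vocabulary and counts here are invented.
freqs = {"the": 500, "cat": 40, "sat": 30, "quokka": 2}

counter = itertools.count()  # tie-breaker so the heap never compares dicts
heap = [(f, next(counter), {"word": w}) for w, f in freqs.items()]
heapq.heapify(heap)
while len(heap) > 1:
    f1, _, left = heapq.heappop(heap)
    f2, _, right = heapq.heappop(heap)
    heapq.heappush(heap, (f1 + f2, next(counter), {"left": left, "right": right}))
root = heap[0][2]

def codes(node, prefix=""):
    """Walk the tree; each leaf's root-to-leaf path is its binary code."""
    if "word" in node:
        return {node["word"]: prefix}
    return {**codes(node["left"], prefix + "0"), **codes(node["right"], prefix + "1")}

code = codes(root)
# Frequent words sit near the root: fewer inner nodes to update per example.
print(sorted(code.items()))
```

The post's point is that this frequency-based tree is a choice, not a necessity, and other trees change both speed and what the model learns.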