DataTau
An Introduction to Unsupervised Learning via Scikit Learn (github.io)
12 points by bugra 3420 days ago | 1 comment


2 points by jonan 3419 days ago

You're making the hierarchical varieties of clustering look worse than they are by constraining them to k=2.

Ideally, you'd look at the dendrograms and decide which level to cut at. So for the spiral at the bottom you might decide that k=1 is appropriate, and for the 3 blobs you'd decide that 2 or 3 are both acceptable.
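As an illustration of cutting one tree at several levels, here is a minimal sketch using SciPy's hierarchical clustering. It is not scikit-learn and not the article's code. The three-blob data set is synthetic, and `cut_at_largest_gap` is a hypothetical helper that applies one common heuristic: cut halfway through the biggest jump between consecutive merge heights.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic stand-in for a "3 blobs" data set: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Build the whole merge tree once; no number of clusters is fixed here.
Z = linkage(X, method="ward")


def cut_at_largest_gap(Z):
    """Hypothetical helper: cut the tree halfway through the largest jump
    between consecutive merge heights (a rough dendrogram-reading heuristic)."""
    heights = Z[:, 2]  # merge heights, non-decreasing for Ward linkage
    i = int(np.argmax(np.diff(heights)))
    threshold = (heights[i] + heights[i + 1]) / 2.0
    return fcluster(Z, t=threshold, criterion="distance")


# The same tree can be cut at different levels without refitting.
for k in (1, 2, 3):
    labels = fcluster(Z, t=k, criterion="maxclust")
    print(f"k={k}: cluster sizes {np.bincount(labels)[1:].tolist()}")

auto = cut_at_largest_gap(Z)
print("largest-gap cut gives", len(np.unique(auto)), "clusters")
```

In practice you would also inspect `scipy.cluster.hierarchy.dendrogram(Z)` by eye, since the largest-gap rule is only a rough guide.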

Also, why is average linkage paired with affinity="cityblock"? When using Ward's criterion you're using Euclidean distance, right?
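On the metric question: Ward's criterion merges the pair of clusters whose union least increases the within-cluster sum of squared Euclidean distances, so it is tied to Euclidean distance. Average linkage only averages pairwise dissimilarities and works with any metric. scikit-learn's `AgglomerativeClustering` likewise accepts only Euclidean distance with `linkage="ward"`, and newer releases call the `affinity` argument `metric`. The sketch below uses SciPy on synthetic two-blob data, which is an assumption for illustration. It checks that the increase in squared error caused by Ward's final merge equals half the squared merge height, which is how we understand SciPy's scaling of Ward heights.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic data: two 2-D blobs centred at (0, 0) and (6, 6).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=m, scale=0.6, size=(40, 2)) for m in (0.0, 6.0)])

# Average linkage only needs pairwise dissimilarities, so any metric works,
# here Manhattan (cityblock) distance.
Z_avg = linkage(X, method="average", metric="cityblock")

# Ward linkage; SciPy's metric defaults to "euclidean".
Z_ward = linkage(X, method="ward")


def sse(points):
    """Within-cluster sum of squared Euclidean distances to the centroid."""
    return float(((points - points.mean(axis=0)) ** 2).sum())


# Split the Ward tree at its last merge and compare the SSE increase of that
# merge with height**2 / 2 (SciPy's Ward height convention, as we read it).
labels = fcluster(Z_ward, t=2, criterion="maxclust")
delta = sse(X) - sse(X[labels == 1]) - sse(X[labels == 2])
print("SSE increase:", round(delta, 4))
print("height**2/2: ", round(Z_ward[-1, 2] ** 2 / 2, 4))

labels_avg = fcluster(Z_avg, t=2, criterion="maxclust")
print("average/cityblock cluster sizes:", np.bincount(labels_avg)[1:].tolist())
```

The identity holds because merging clusters A and B adds n_A n_B / (n_A + n_B) times the squared Euclidean distance between their centroids to the total squared error. That quantity has no counterpart for a cityblock metric.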

-----



