DataTau
Recommender Systems: Factorizing a 28 billion elements matrix with Apache Flink (data-artisans.com)
10 points by rmetzger 3347 days ago | 4 comments


2 points by SixSigma 3345 days ago

A full recalculation takes ~5 hours.

If the computation is block-based, would it be possible to do a partial recalculation, given that the corpus is mostly static?

-----

1 point by stsffap 3342 days ago

You could use the current item and user matrices as the seed for the ALS algorithm. If the new solution is close to the original one, it should converge faster.
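A minimal sketch of the warm-start idea in pure Python on toy data. Everything here is an assumption for illustration: the helper names (`als`, `half_step`, `objective`, `solve`), the plain L2 penalty `lam` (not weighted by the number of ratings per row), and the randomly generated rank-2 ratings. It is not Flink's ALS implementation, which runs the same alternating least-squares solves blockwise across a cluster. Passing the previous `U` and `V` seeds the iterations; because each half-step exactly minimizes the regularized loss for one factor matrix while the other is held fixed, the objective never increases, so a seed that is already near the optimum should need only a few iterations after a small update.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A is small, k x k)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def half_step(rows, fixed, entries_by_row, k, lam):
    """Re-solve every row that has ratings, holding the other factor matrix fixed."""
    for r, entries in entries_by_row.items():
        A = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]  # lam * I
        b = [0.0] * k
        for c, value in entries:
            v = fixed[c]
            for i in range(k):
                b[i] += value * v[i]
                for j in range(k):
                    A[i][j] += v[i] * v[j]
        rows[r] = solve(A, b)  # (V^T V + lam I) x = V^T r

def objective(ratings, U, V, lam):
    """Squared error on observed entries plus an L2 penalty on all factor vectors."""
    err = sum((val - sum(a * b for a, b in zip(U[u], V[i]))) ** 2
              for (u, i), val in ratings.items())
    return err + lam * sum(x * x for row in U + V for x in row)

def als(ratings, n_users, n_items, k=2, lam=0.1, iters=10, U=None, V=None, seed=0):
    """ALS on a sparse {(user, item): rating} dict; pass U and V to warm-start."""
    rng = random.Random(seed)
    U = [list(r) for r in U] if U is not None else [[rng.random() for _ in range(k)] for _ in range(n_users)]
    V = [list(r) for r in V] if V is not None else [[rng.random() for _ in range(k)] for _ in range(n_items)]
    by_user, by_item = {}, {}
    for (u, i), val in ratings.items():
        by_user.setdefault(u, []).append((i, val))
        by_item.setdefault(i, []).append((u, val))
    for _ in range(iters):
        half_step(U, V, by_user, k, lam)  # update user factors, items fixed
        half_step(V, U, by_item, k, lam)  # update item factors, users fixed
    return U, V

# Toy demo (hypothetical data): ratings generated from known rank-2 factors.
rng = random.Random(1)
n_users, n_items, k = 30, 12, 2
true_u = [[rng.random() for _ in range(k)] for _ in range(n_users)]
true_v = [[rng.random() for _ in range(k)] for _ in range(n_items)]

def rate(u, i):
    return sum(a * b for a, b in zip(true_u[u], true_v[i]))

ratings = {(u, i): rate(u, i) for u in range(n_users) for i in range(n_items) if rng.random() < 0.4}
U, V = als(ratings, n_users, n_items, k=k, iters=20)  # full (cold) computation
for u, i in [(0, 1), (3, 7), (10, 2)]:                # a few new ratings arrive
    ratings[(u, i)] = rate(u, i)
U2, V2 = als(ratings, n_users, n_items, k=k, iters=2, U=U, V=V)  # warm start
print(objective(ratings, U, V, 0.1), "->", objective(ratings, U2, V2, 0.1))
```

With the warm start, the two extra iterations begin from the previous optimum instead of random factors. Whether that beats a cold start on a real dataset, and by how much, is something to measure rather than assume.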

-----

2 points by arshak 3347 days ago

Have you considered a model-based approach? Vowpal Wabbit can achieve significant compression for collaborative filtering problems:

https://github.com/JohnLangford/vowpal_wabbit/wiki/Matrix-fa...

-----

2 points by sewen 3347 days ago

The article reports experiments on Google Compute Engine with a 40 million x 5 million sparse matrix (28 billion entries) and 50 latent factors.
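A quick back-of-the-envelope check of those numbers. It assumes the 28 billion figure counts observed (non-zero) entries, since 40 million x 5 million would be 2 x 10^14 cells in total, and it assumes 4-byte or 8-byte floats for the factors; the article's actual storage format is not stated here.

```python
n_users = 40_000_000        # rows, per the comment above
n_items = 5_000_000         # columns
nnz = 28_000_000_000        # assumed: observed (non-zero) entries
k = 50                      # latent factors

cells = n_users * n_items                  # total cells in the full matrix
density = nnz / cells                      # fraction of cells actually observed
factor_values = (n_users + n_items) * k    # entries in the user and item factor matrices
gib_float32 = factor_values * 4 / 2**30    # assuming 4-byte floats
gib_float64 = factor_values * 8 / 2**30    # assuming 8-byte floats

print(f"cells={cells:.2e} density={density:.4%} factors={factor_values:.2e}")
print(f"factor storage: {gib_float32:.2f} GiB (float32), {gib_float64:.2f} GiB (float64)")
```

So the matrix is only about 0.014% populated, and the two factor matrices together hold 2.25 billion values, an order of magnitude fewer than the 28 billion observed ratings they approximate.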

-----



