A Product similarity space with doc2vec (bookspace.co)
13 points by jeradf 2809 days ago | 5 comments


2 points by soates 2808 days ago | link

This is really cool! Great job.

-----

1 point by thegoz 2802 days ago | link

May I know what preprocessing steps you applied to the book text? Good job. Really useful.

-----

3 points by jeradf 2802 days ago | link

For each book, I combined its user reviews up to a max length of 10,0000 words. Any book with fewer than 500 total words was dropped. All punctuation and stop words were removed, and training was done using the doc2vec PV-DBOW method (`dm=0, dbow_words=True` in Gensim).
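
The steps above could be sketched roughly as follows. This is an illustration under assumptions, not the code behind the site: the function names, the tiny stop-word list, the lowercasing, and the `vector_size`, `epochs`, and `min_count` values are all hypothetical. The word cap is left as a required `max_words` argument, and the 500-word check is applied before stop-word removal, which is a guess about the order.

```python
import string

# Hypothetical, deliberately tiny stop-word list; the post does not say which list was used.
STOP_WORDS = frozenset({"a", "an", "and", "the", "is", "it", "of", "to", "in", "this", "was"})

MIN_WORDS = 500  # from the post: books with fewer total words were dropped


def tokenize(text):
    """Lowercase (an assumption), strip ASCII punctuation, and drop stop words."""
    table = str.maketrans("", "", string.punctuation)
    return [w for w in text.lower().translate(table).split() if w not in STOP_WORDS]


def build_corpus(reviews_by_book, max_words, min_words=MIN_WORDS):
    """Merge each book's reviews into one token list.

    Books with fewer than min_words raw words are skipped; the merged text
    is truncated to max_words raw words before cleaning.
    """
    corpus = {}
    for book_id, reviews in reviews_by_book.items():
        raw_words = " ".join(reviews).split()
        if len(raw_words) < min_words:
            continue
        corpus[book_id] = tokenize(" ".join(raw_words[:max_words]))
    return corpus


def train_pv_dbow(corpus, vector_size=100, epochs=20):
    """Train PV-DBOW with the settings from the post (dm=0, dbow_words on).

    Gensim is imported lazily so the preprocessing helpers work without it.
    vector_size, epochs, and min_count are illustrative values only.
    """
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    docs = [TaggedDocument(words=tokens, tags=[book_id])
            for book_id, tokens in corpus.items()]
    return Doc2Vec(docs, dm=0, dbow_words=1,
                   vector_size=vector_size, epochs=epochs, min_count=2)
```

With a trained model, nearest neighbours for a book can be looked up with `model.dv.most_similar(book_id)` in Gensim 4.x (`model.docvecs` in older releases).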

-----

1 point by thegoz 2801 days ago | link

You used only the reviews, without the text of the books themselves? If so, I am really impressed by how well it works.

-----

1 point by sparkEmpire 2808 days ago | link

Very interesting!

-----



