5 points by pmlandwehr 3480 days ago | link | parent

So, an important follow-up to this if you don't want to deal with Java: Myle Ott made a Python port of what I believe is the latest version. Its results are almost, but not quite, identical to the Java version's: 83 tweets differed in a text corpus of 1,000,000. You can get it on GitHub at https://github.com/myleott/ark-twokenize-py
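
If you want to run that kind of comparison on your own data, here's a rough sketch of one way to count differing tweets. It assumes you've already written each tokenizer's output one tweet per line, with tokens separated by spaces. The function name and the sample lines are made up for illustration; they are not real output from either tokenizer, and this isn't necessarily how the 83 figure was computed.

```python
from itertools import zip_longest


def count_differing_lines(lines_a, lines_b):
    """Return (differing, total) for two line-aligned tokenizer outputs.

    Hypothetical helper: each element is one tweet's tokens joined by spaces.
    """
    differing = 0
    total = 0
    for a, b in zip_longest(lines_a, lines_b, fillvalue=None):
        total += 1
        # A missing line on either side counts as a difference.
        # Compare token lists, so extra whitespace alone is not a mismatch.
        if a is None or b is None or a.split() != b.split():
            differing += 1
    return differing, total


# Made-up sample data standing in for the two tools' output.
java_out = ["I love this :)", "@user check http://t.co/x", "wow !!!"]
python_out = ["I love this :)", "@user check http://t.co/x", "wow ! ! !"]

print(count_differing_lines(java_out, python_out))  # (1, 3)
```

For real files you can pass two open file objects instead of lists, since iterating a file yields its lines.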


