DataTau
Industrial strength Python NLP library spacy is now 100% free (spacy.io)
27 points by elyase 3104 days ago | 6 comments


1 point by luto 3103 days ago | link

I was interested in the vector component, but it doesn't seem to work out of the box. I installed it with conda and downloaded all the data, but when I tried the vector example, apples.similarity(oranges), it threw an AttributeError. When I inspected the token object, it had no vector attribute either...

I guess it's back to gensim. :/

-----

3 points by syllogism 3102 days ago | link

The version on conda is out of date --- it's v0.89, the latest is v0.93.

I'm working on building the library for conda, so that it's up to date automatically.

In the meantime, "pip install spacy" should work.
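
A quick way to tell whether an environment still has the stale build is to compare the installed version against the one mentioned above. This is only a minimal sketch: the helper names (parse_version, installed_version, is_older) are made up for illustration, and the parser assumes purely numeric dotted versions like "0.89".

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.93' or 'v0.93' into (0, 93). Assumes numeric parts only."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_older(installed: str, latest: str) -> bool:
    """True if `installed` is strictly older than `latest`."""
    return parse_version(installed) < parse_version(latest)


# Version numbers taken from the comment above: conda had v0.89, latest was v0.93.
current = installed_version("spacy")
if current is None:
    print("spacy is not installed in this environment")
else:
    print("installed:", current)
```

With the numbers from this thread, is_older("0.89", "0.93") is true, which is the situation the conda install was in.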

-----

1 point by carbonatedmilk 3104 days ago | link

I tried to like spaCy, but it's so unpythonic it hurts. I actually prefer using CoreNLP's Python bindings because at least they return JSON. Not knocking it, it appears to be a good library, but it's just anti-patterns up the wazoo.

-----

6 points by syllogism 3104 days ago | link

(Author here.)

I do get what you mean, because I generally disprefer libraries that return their own objects, instead of plain data types.

But I wrote it this way because it's the only way I could actually get all the functionality I wanted into the API.

The thing is, there are all these different representations. You need to be able to iterate over words, sentences, phrases, entities, and the syntactic dependency tree. And you should be able to associate between these representations, e.g. find the word vector of an entity's syntactic head.

So, that's why you get this opaque sequence type, "Doc", and everything is a view from it. I think otherwise, the library just doesn't do enough.

If there's some aspect of your workflow that the library broke in particular, I'd appreciate it if you'd share it. For instance, I get that some people rely heavily on inspecting the docstrings, which I don't do --- so the docstrings not being consistently there is something I need to fix.
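
To make the "everything is a view from Doc" idea concrete, here is a toy sketch in plain Python. It is not spaCy's implementation or its API: the class names, the head-index encoding, the vectors, and the to_plain() method are all assumptions chosen for illustration. It shows one shared store, token and entity views that can navigate to each other (including "the vector of an entity's syntactic head"), and a plain-data export for people who would rather have JSON.

```python
import json
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class TokenData:
    text: str
    head: int                  # index of the syntactic head; the root points to itself
    vector: Tuple[float, ...]  # toy word vector


class Token:
    """A view of one position in a Doc; holds no data of its own."""

    def __init__(self, doc: "Doc", i: int) -> None:
        self.doc = doc
        self.i = i

    @property
    def text(self) -> str:
        return self.doc.data[self.i].text

    @property
    def vector(self) -> Tuple[float, ...]:
        return self.doc.data[self.i].vector

    @property
    def head(self) -> "Token":
        return Token(self.doc, self.doc.data[self.i].head)


class Span:
    """A view of a contiguous slice of a Doc, e.g. an entity."""

    def __init__(self, doc: "Doc", start: int, end: int, label: str = "") -> None:
        self.doc, self.start, self.end, self.label = doc, start, end, label

    def __iter__(self) -> Iterator[Token]:
        return (Token(self.doc, i) for i in range(self.start, self.end))

    @property
    def root(self) -> Token:
        # The span's root is the token whose head is itself or lies outside the span.
        for tok in self:
            h = tok.head.i
            if h == tok.i or not (self.start <= h < self.end):
                return tok
        raise ValueError("span has no root; head indices are inconsistent")


class Doc:
    """The single owner of the data; tokens and spans are views over it."""

    def __init__(self, data: List[TokenData], ents: List[Tuple[int, int, str]]) -> None:
        self.data = data
        self._ents = ents

    def __len__(self) -> int:
        return len(self.data)

    def __iter__(self) -> Iterator[Token]:
        return (Token(self, i) for i in range(len(self.data)))

    @property
    def ents(self) -> List[Span]:
        return [Span(self, s, e, label) for s, e, label in self._ents]

    def to_plain(self) -> List[Dict[str, object]]:
        # Plain lists and dicts, safe to pass to json.dumps.
        return [
            {"text": t.text, "head": t.head, "vector": list(t.vector)}
            for t in self.data
        ]


# "New York sleeps": New -> York -> sleeps (root); "New York" is one entity.
doc = Doc(
    data=[
        TokenData("New", head=1, vector=(1.0, 0.0)),
        TokenData("York", head=2, vector=(0.0, 1.0)),
        TokenData("sleeps", head=2, vector=(0.5, 0.5)),
    ],
    ents=[(0, 2, "GPE")],
)

entity = doc.ents[0]
head_of_entity = entity.root.head  # crosses from the entity view to the dependency tree
print(entity.root.text, "->", head_of_entity.text, head_of_entity.vector)
print(json.dumps(doc.to_plain()))
```

The trade-off is the one discussed in this thread: the views make cross-representation queries short, while a to_plain()-style export is what gives you back ordinary data types when that is all you need.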

-----

1 point by elyase 3104 days ago | link

Blog post: http://spacy.io/blog/spacy-now-mit/

-----

1 point by SixSigma 3104 days ago | link

I wonder if anyone paid $20k?

-----



