I was interested in the vector component, but it doesn't seem to work out of the box. I installed it with conda and downloaded all the data, but when I tried the vector example, apples.similarity(oranges), it threw an AttributeError. When I inspected the token object, it had no vector attribute either...
I tried to like spaCy, but it's so unpythonic it hurts. I actually prefer using CoreNLP's Python bindings because at least they return JSON.
Not knocking it, it appears to be a good library, but it's just anti-pattern up the wazoo.
I do get what you mean, because I generally disprefer libraries that return their own objects instead of plain data types.
But I wrote it this way because it's the only way I could actually get all the functionality I wanted into the API.
The thing is, there are all these different representations. You need to be able to iterate over words, sentences, phrases, entities, and the syntactic dependency tree. And you should be able to move between these representations, e.g. find the word vector of an entity's syntactic head.
So that's why you get this opaque sequence type, "Doc", and everything else is a view onto it. Otherwise, I think the library just wouldn't do enough.
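To make the "everything is a view" idea concrete, here is a minimal pure-Python sketch. It is not spaCy's implementation or its API: the class layout, the toy sentence, the hand-written head indices, and the two-dimensional vectors are all made up for illustration. It only shows how one container can own the data while tokens and entity spans are thin views that let you hop from an entity to its head word to that word's vector.

```python
from dataclasses import dataclass


class Doc:
    """Hypothetical container that owns all the data; Token and Span only index into it."""

    def __init__(self, words, heads, vectors, ents):
        self.words = list(words)
        self.heads = list(heads)      # heads[i] = index of word i's syntactic head
        self.vectors = list(vectors)  # one vector per word
        self._ents = list(ents)       # (start, end, label) triples, end exclusive

    def __len__(self):
        return len(self.words)

    def __getitem__(self, i):
        return Token(self, i)

    def __iter__(self):
        return (Token(self, i) for i in range(len(self)))

    @property
    def ents(self):
        return [Span(self, start, end, label) for start, end, label in self._ents]


@dataclass(frozen=True)
class Token:
    """A view of one word: it stores only the parent Doc and an index."""
    doc: Doc
    i: int

    @property
    def text(self):
        return self.doc.words[self.i]

    @property
    def head(self):
        return Token(self.doc, self.doc.heads[self.i])

    @property
    def vector(self):
        return self.doc.vectors[self.i]


@dataclass(frozen=True)
class Span:
    """A view of a contiguous run of words, e.g. a named entity."""
    doc: Doc
    start: int
    end: int
    label: str

    @property
    def text(self):
        return " ".join(self.doc.words[self.start:self.end])

    @property
    def root(self):
        # The span's head word: the first token whose own head is itself
        # (sentence root) or lies outside the span.
        for i in range(self.start, self.end):
            h = self.doc.heads[i]
            if h == i or not (self.start <= h < self.end):
                return Token(self.doc, i)
        raise ValueError("empty span has no root")


# Toy data (assumed, not produced by any parser): "is" is the sentence root.
doc = Doc(
    words=["New", "York", "is", "big"],
    heads=[1, 2, 2, 2],
    vectors=[[0.1, 0.2], [0.3, 0.4], [0.0, 0.0], [0.5, 0.6]],
    ents=[(0, 2, "GPE")],
)

ent = doc.ents[0]
print(ent.text, "->", ent.root.text, ent.root.vector)   # entity -> head word -> vector
print(ent.root.text, "attaches to", ent.root.head.text) # head word -> its governor
```

With plain lists or JSON you would have to carry the indices around yourself to make the same hop from entity to head to vector, which is the trade-off being described.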
If there's some aspect of your workflow in particular that the library broke, I'd appreciate it if you'd share it. For instance, I get that some people rely heavily on inspecting the docstrings, which I don't do, so docstrings not being consistently present is something I need to fix.