1 point by carbonatedmilk 3135 days ago

I tried to like spaCy, but it's so unpythonic it hurts. I actually prefer using CoreNLP's Python bindings because at least they return JSON. Not knocking it; it appears to be a good library, but it's just anti-patterns up the wazoo.


6 points by syllogism 3135 days ago

(Author here.)

I do get what you mean, because I generally disprefer libraries that return their own objects, instead of plain data types.

But I wrote it this way because it's the only way I could actually get all the functionality I wanted into the API.

The thing is, there are all these different representations. You need to be able to iterate over words, sentences, phrases, entities, and the syntactic dependency tree. And you should be able to move between these representations, e.g. find the word vector of an entity's syntactic head.

So, that's why you get this opaque sequence type, "Doc", and everything is a view from it. I think otherwise, the library just doesn't do enough.
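
To make the "everything is a view" idea concrete, here is a minimal sketch. It is a toy, not spaCy's actual implementation: the `Doc`, `Token`, and `Span` classes below are hypothetical stand-ins whose attribute names are only loosely modeled on the library, and the words, heads, and vectors are made-up data. The point it tries to show is that one container owns the data, and lightweight views let you hop from an entity to its head token to that token's vector.

```python
from dataclasses import dataclass


class Doc:
    """Toy container (hypothetical): owns the data; Token and Span are views."""

    def __init__(self, words, heads, vectors, ents):
        self.words = words      # token texts
        self.heads = heads      # heads[i] is the index of token i's syntactic head
        self.vectors = vectors  # one made-up vector per token
        self._ents = ents       # (start, end, label) triples, end exclusive

    def __len__(self):
        return len(self.words)

    def __iter__(self):
        return (Token(self, i) for i in range(len(self.words)))

    @property
    def ents(self):
        return [Span(self, s, e, label) for s, e, label in self._ents]


@dataclass(frozen=True)
class Token:
    doc: Doc
    i: int

    @property
    def text(self):
        return self.doc.words[self.i]

    @property
    def head(self):
        return Token(self.doc, self.doc.heads[self.i])

    @property
    def vector(self):
        return self.doc.vectors[self.i]


@dataclass(frozen=True)
class Span:
    doc: Doc
    start: int
    end: int
    label: str

    @property
    def text(self):
        return " ".join(self.doc.words[self.start:self.end])

    @property
    def root(self):
        # The token whose head is itself or lies outside the span.
        for i in range(self.start, self.end):
            h = self.doc.heads[i]
            if h == i or not (self.start <= h < self.end):
                return Token(self.doc, i)
        return None


# Made-up example: "Apple Inc. hired Ada", with "hired" as the sentence root.
doc = Doc(
    words=["Apple", "Inc.", "hired", "Ada"],
    heads=[1, 2, 2, 2],
    vectors=[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]],
    ents=[(0, 2, "ORG"), (3, 4, "PERSON")],
)
ent = doc.ents[0]
print(ent.text, "|", ent.root.text, "|", ent.root.head.text, "|", ent.root.vector)
```

With plain lists or JSON you would have to carry the indices around yourself and re-join the token, entity, and dependency tables by hand; the views do that bookkeeping at the cost of a custom type.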

If there's some aspect of your workflow that the library broke in particular, I'd appreciate it if you'd share it. For instance, I get that some people rely heavily on inspecting the docstrings, which I don't do, so that data not being consistently there is something I need to fix.
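
For anyone who wants to check where docstrings are missing, a rough sketch using only the standard library might look like the following. The helper name `missing_docstrings` and the `Example` class are hypothetical, made up for illustration; nothing here is part of spaCy.

```python
import inspect


def missing_docstrings(obj):
    """Return names of public methods, classes, and properties of obj lacking a docstring."""
    missing = []
    # inspect.getmembers returns (name, value) pairs sorted by name.
    for name, member in inspect.getmembers(obj):
        if name.startswith("_"):
            continue  # skip private and dunder attributes
        is_documentable = (
            inspect.isroutine(member)
            or inspect.isclass(member)
            or isinstance(member, property)
        )
        if is_documentable and not getattr(member, "__doc__", None):
            missing.append(name)
    return missing


class Example:
    def documented(self):
        """Has a docstring."""

    def undocumented(self):
        pass

    @property
    def also_undocumented(self):
        return 1


print(missing_docstrings(Example))
```

Pointing the helper at a class or module you care about gives a quick list of the spots where `help()` would come up empty.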
