It'd be cool if the results were clickable, so users could indicate the actual origins of their names. Could make for a super interesting follow-up post too.
I thought for a while about adding an "Is this correct?" kind of checkmark for people to click, but it greatly increases the complexity (I'd have to add a database, etc.). And out of the 20 or so people who try it every day, I'd guess only a small fraction would actually click it. Not to mention how many would click it truthfully... Many of the queries people enter are profanities :p
But I think using those sorts of inputs for correction in an active-learning style is generally interesting. No idea if there's an "active" version of Multinomial NB.
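For what it's worth, Multinomial NB is essentially a set of per-class counts, so it can at least be updated one labelled example at a time (scikit-learn's `MultinomialNB` has `partial_fit` for this). That is incremental updating rather than active learning proper, which would also choose which names to ask about. A minimal stdlib-only sketch, assuming character bigrams as features; the class name, the feature choice and the smoothing value are made up for illustration and are not what the site actually does:

```python
import math
from collections import Counter, defaultdict


class OnlineMultinomialNB:
    """Multinomial Naive Bayes that stores raw counts so it can be
    updated one labelled example at a time (hypothetical sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # Laplace smoothing (assumed value)
        self.class_counts = Counter()               # examples seen per class
        self.feature_counts = defaultdict(Counter)  # per-class bigram counts
        self.vocab = set()                          # all bigrams seen so far

    @staticmethod
    def features(name):
        # Character bigrams with start/end markers (an illustrative choice).
        padded = f"^{name.lower()}$"
        return [padded[i:i + 2] for i in range(len(padded) - 1)]

    def update(self, name, label):
        # One click on "Is this correct?" would translate into one call here.
        feats = self.features(name)
        self.class_counts[label] += 1
        self.feature_counts[label].update(feats)
        self.vocab.update(feats)

    def predict(self, name):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n / total)             # log prior from counts
            for f in self.features(name):
                score += math.log((counts[f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Since only counts are stored, there is no retraining step: each piece of feedback is a single `update(name, label)` call, and the next `predict` already reflects it. It does nothing about untruthful clicks, though.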
The training data is highly biased; you should take the prior probabilities into consideration. I tried a lot of Chinese names and none of them came out as Chinese. The result is not satisfactory at all.
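To illustrate the prior issue: Naive Bayes scores each class as log prior plus log likelihood, so a class that is rare in the training set can lose even when its likelihood is higher. A small sketch with made-up numbers (the log-likelihoods and priors below are invented purely for illustration, not taken from the actual model):

```python
import math


def posterior(log_likelihoods, priors):
    """Combine per-class log-likelihoods with class priors and normalise."""
    scores = {c: log_likelihoods[c] + math.log(priors[c]) for c in log_likelihoods}
    top = max(scores.values())
    # Subtract the max before exponentiating for numerical stability.
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}


# Hypothetical values: the name fits "chinese" better on likelihood alone...
log_lik = {"chinese": -10.0, "english": -12.0}

# ...but a prior estimated from skewed training data overrides that.
biased = posterior(log_lik, {"chinese": 0.02, "english": 0.98})
uniform = posterior(log_lik, {"chinese": 0.5, "english": 0.5})

print(max(biased, key=biased.get))    # class chosen under the skewed prior
print(max(uniform, key=uniform.get))  # class chosen under a uniform prior
```

If the classifier happens to be scikit-learn's `MultinomialNB`, the `fit_prior` and `class_prior` parameters control this; whether a uniform prior is the right choice depends on who actually uses the tool.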