Monday, October 06, 2008

CAPTCHA - Natural Language Processing


In my previous blog entry on CAPTCHA, I mentioned that using natural language is a better way to determine whether the user is human than showing heavily distorted images.

I gave the statement below as an example in my previous article.



Add 1234534512 to 1232429530.2322, and multiply the result with 12143298.4345 and divide it by 983435.234, and ignore this value, and put 4 in the below text box.


The above statement is very easy for humans but very hard for computers, because natural language processing is not yet fully developed. If anybody developed a system that could understand the above sentence, it would become the next big thing in computer science. I always wondered why so many companies do not follow this approach and instead keep using heavily distorted images, which are difficult even for humans and so defeat the basic purpose of CAPTCHA.
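To make the idea concrete, here is a minimal sketch in Python of how such a challenge could be generated and checked. The function names (make_challenge, check_response), the number ranges, and the fixed sentence template are my own assumptions for illustration, not an existing library or a finished design. A real deployment would need to vary the wording, since a single fixed template is much easier to recognize than free-form language.

```python
import random


def make_challenge(rng):
    """Build a decoy-arithmetic challenge.

    Returns (prompt, expected_answer). The arithmetic is never evaluated:
    the sentence tells the reader to ignore it and type a given digit.
    """
    # Large, awkward-looking operands; the ranges are arbitrary (assumption).
    a = rng.randint(10**9, 10**10 - 1)
    b = round(rng.uniform(10**9, 10**10), 4)
    c = round(rng.uniform(10**6, 10**8), 4)
    d = round(rng.uniform(10**5, 10**6), 3)
    # The only value that matters: the digit the user is asked to type.
    answer = rng.randint(0, 9)
    prompt = (
        f"Add {a} to {b}, multiply the result by {c}, divide it by {d}, "
        f"ignore this value, and put {answer} in the text box below."
    )
    return prompt, str(answer)


def check_response(expected, response):
    """Accept the response if it matches the expected digit, ignoring spaces."""
    return response.strip() == expected


if __name__ == "__main__":
    prompt, expected = make_challenge(random.Random())
    print(prompt)
    print("Correct!" if check_response(expected, input("> ")) else "Try again.")
```

The server would keep the expected answer in the user's session and compare it with whatever is typed into the text box.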

But many people use the internet extensively without a good knowledge of English. Some of them can understand single words, but not whole sentences. For them, understanding the above sentence may not be trivial. One does not need to know English to use mail and chat, and we cannot expect everyone to know English when the service itself does not require it. That is why we still have image CAPTCHAs in most places. However, a few websites do expect the user to solve a natural language CAPTCHA before posting content in English. If somebody is writing content in English, we can assume that they understand simple English, so a natural language CAPTCHA is workable in those cases. But most services on the internet do not require any knowledge of English, so we cannot use natural language CAPTCHAs for them.
