Mehul Harry's DevExpress Blog

ASP.NET Captcha – Image Generation

     

Last week, you saw a sneak peek of the new ASP.NET Captcha control that’s coming out in the DXperience v2010.1 release. In this follow-up post, you’ll learn more about the ASPxCaptcha control:

  • how it creates the Captcha images
  • why ASPxCaptcha doesn’t use background noise
  • how ASPxCaptcha performed in an OCR test versus a competitor’s Captcha control

Guidelines And Character Set

While designing the ASPxCaptcha, our team reviewed published guidelines and recommendations, in particular:

"Strong CAPTCHA Guidelines by Jonathan Wilkins" [http://www.scribd.com/doc/24497942/Strong-CAPTCHA-Guidelines-v1-2]

Following one of the article’s recommendations, the ASPxCaptcha’s default character set excludes symbols that are hard for end users to recognize. However, you can still define your own character set and code length using the ASPxCaptcha’s properties.

ASPxCaptcha CharacterSet Property
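
To make the character set idea concrete, here is a minimal Python sketch (not the ASPxCaptcha implementation, which is an ASP.NET control). The `AMBIGUOUS` set, `build_charset` and `generate_code` are hypothetical names chosen for illustration, and the excluded characters are an assumption, since this post does not list the control's actual defaults.

```python
import secrets
import string

# Assumption: glyphs that people commonly confuse with each other.
# The real ASPxCaptcha default set is not listed in this post.
AMBIGUOUS = set("0O1I5S2Z8B")


def build_charset(exclude=AMBIGUOUS):
    """Return uppercase letters and digits minus the easily confused ones."""
    candidates = string.ascii_uppercase + string.digits
    return "".join(ch for ch in candidates if ch not in exclude)


def generate_code(length=5, charset=None):
    """Pick `length` random characters from the charset (custom or default)."""
    charset = charset or build_charset()
    # secrets.choice draws from a cryptographically strong random source.
    return "".join(secrets.choice(charset) for _ in range(length))


print(generate_code())          # default set and length
print(generate_code(8, "ACEF"))  # custom set and length
```

Passing your own `charset` and `length` mirrors the idea of overriding the defaults through properties.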

Algorithm Challenge

The primary goal of a Captcha control is to produce images that are easy for people to decipher but difficult for machines. The problem of “machine recognition” really divides into three parts:

  1. Pre-processing - removal of the background and noise
  2. Segmentation - selection of regions in the original image that contain the individual characters
  3. Classification - identification of the characters in each region

The first and third problems are easily solved by most modern, publicly available optical character recognition (OCR) software.

Segmentation, however, is the one part that spammers still cannot crack easily. So far, there is no universal and trivial algorithm for this step of machine Captcha solving. Breaking it requires more research and computing power from the spammers. And lucky for us, most spammers either do not have this computing power or do not want to invest in it.

Revealing Microsoft Study

Don’t believe me? Check out this study conducted by Microsoft Research, which confirms these findings:

"Building Segmentation Based Human-Friendly Human Interaction Proofs (HIPS)" - [http://research.microsoft.com/en-us/um/people/kumarc/pubs/chellapilla_hip05.pdf]

The research reveals some very unexpected results.

According to the Microsoft research, machines recognize skewed, noisy images much better than humans do. So, we can conclude that noise and strong distortions are not very effective methods of protection. In fact, noise in the images mostly hampers recognition for end users.

This is why the ASPxCaptcha does not use noise in its images. Instead, the image generation algorithm concentrates on making segmentation hard, specifically by cutting away the gaps between the characters.
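
Here is a toy Python sketch of why removing gaps hurts segmentation. It assumes the simplest possible attacker, one that splits the image wherever it finds an empty pixel column. The `segment_by_gaps` and `remove_gaps` helpers and the tiny `SPACED` image are hypothetical illustrations, not the ASPxCaptcha algorithm or any real OCR product.

```python
def column_has_ink(image, x):
    """True if any row has a dark pixel ('#') in column x."""
    return any(row[x] == "#" for row in image)


def segment_by_gaps(image):
    """Split a binary image into (start, end) column ranges at empty columns."""
    width = len(image[0])
    segments, start = [], None
    for x in range(width):
        if column_has_ink(image, x):
            if start is None:
                start = x  # a new glyph region begins
        elif start is not None:
            segments.append((start, x))  # an empty column ends the region
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


def remove_gaps(image):
    """Drop every empty column so neighbouring glyphs touch."""
    keep = [x for x in range(len(image[0])) if column_has_ink(image, x)]
    return ["".join(row[x] for x in keep) for row in image]


# Three made-up "glyphs" separated by empty columns.
SPACED = [
    "##..#..###",
    "#.#.#..#..",
    "##..#..###",
]

print(len(segment_by_gaps(SPACED)))               # 3
print(len(segment_by_gaps(remove_gaps(SPACED))))  # 1
```

With the gaps in place, the naive splitter finds all three regions. Once the gaps are gone, it sees one connected blob, and the classification step has nothing clean to work with. A determined attacker would use smarter segmentation than this, which is where the extra research and computing cost comes in.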

OCR Test

And the results are easy to verify with any moderately priced OCR package (for example, FineReader). In our test, NONE of the ASPxCaptcha images were recognized!

For comparison, we ran one of our competitor’s Captcha controls through the OCR as well. The OCR identified the competitor’s Captcha about 90% of the time.

So What Are The Big Companies Using?

Giants like Google generate their Captcha images with an approach similar to the one the ASPxCaptcha uses.

[Images: a Google Captcha next to an ASPxCaptcha in Google style]

Vector Not Bitmap Fonts

The ASPxCaptcha uses vector fonts for characters instead of bitmap fonts.

Why does this matter to you? Because it gives you more customization options with the ASPxCaptcha!

All other implementations that we examined used bitmap fonts when removing the gaps between characters. That approach is easier to implement, but text rendered this way can’t be scaled up to a bigger image than the one it was designed for without a loss in quality. Therefore, these other Captchas are not as customizable as the ASPxCaptcha, which allows developers to customize its image size.

Anti-Aliasing FTW!

To give you better looking images, our ASP.NET team went the extra mile and developed a better way to render the Captcha image. Specifically, they developed a modified bilinear filter for anti-aliasing. This filter makes the image smoother and avoids the pixel staircase effect during skewing.

Anti-aliasing is the technique of minimizing the distortion artifacts known as aliasing when representing a high-resolution signal at a lower resolution. [Wikipedia]

This type of filtering is used for rendering textures in computer graphics. For example, when you are close to a wall, you see a blurred texture, not large square pixels. And this anti-aliasing also works very well for the ASPxCaptcha.
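
For readers curious about what a bilinear filter does, here is a minimal Python sketch of standard bilinear sampling on a grayscale grid. It shows the textbook technique only. The team's modification is not described in this post, and `bilinear_sample` is a hypothetical helper name.

```python
def bilinear_sample(pixels, x, y):
    """Sample a grayscale grid at fractional (x, y) by blending the
    four nearest pixels, weighted by distance."""
    height, width = len(pixels), len(pixels[0])
    # Clamp the coordinates so samples outside the image reuse the edge pixels.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0  # fractional offsets inside the pixel cell
    top = pixels[y0][x0] * (1 - fx) + pixels[y0][x1] * fx
    bottom = pixels[y1][x0] * (1 - fx) + pixels[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


grid = [
    [0, 100],
    [100, 200],
]
print(bilinear_sample(grid, 0.5, 0.5))  # 100.0
```

When a skew maps an output pixel to a fractional position in the source image, blending the neighbours this way gives a smooth edge, whereas picking the single nearest pixel produces the staircase effect.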

Coming in DXperience v2010.1

ASPxCaptcha will be available in DXperience v2010.1, which should be released around April.

DXperience? What's That?

DXperience is the .NET developer's secret weapon. Get full access to a complete suite of professional components that let you instantly drop in new features, designer styles and fast performance for your applications. Try a fully-functional version of DXperience for free now: http://www.devexpress.com/Downloads/NET/

Published Mar 04 2010, 12:15 PM by Mehul Harry (DevExpress)
Technorati tags: Features, v2010.1, DXperience, ASP.NET

Comments

 

Chris Walsh [DX-Squad] said:

Awesome post Mehul.  Great work Team!

March 4, 2010 3:51 PM
 

Mehul Harry (DevExpress) said:

Thanks Chris! Will try to get a screencast recorded for it soon.

March 4, 2010 4:06 PM
 

Henrik Brinch said:

Looking forward to go Captcha :)

Will the ASPxCaptcha control make use of a timing algorithm to determine whether the input was performed by a human or a computer? A human won't be able to enter a captcha in less than, e.g., X seconds, whereas a computer will do it in zero seconds. It would be great if you would also implement this!

March 9, 2010 3:45 PM
 

Mehul Harry (DevExpress) said:

@Henrik,

I don't believe the initial release will contain an input timing feature but you can create a suggestion once the ASPxCaptcha is released and we'll consider it for a future release. Thanks!

March 9, 2010 3:53 PM
 

majid said:

great control

March 16, 2010 3:48 AM
 

Jared Stenzel said:

I'd always wondered about how a captcha system worked. It's such a huge part of almost every kind of site in the current day and yet it constantly beats bot after bot.

This was a very informative and interesting post. Thanks for the inside look into the captcha.

April 4, 2010 11:32 AM

About Mehul Harry (DevExpress)

Mehul Harry is an ASP.NET technical evangelist at Developer Express. You can reach him directly at mharry@DevExpress.com. You can also follow him on Twitter: http://twitter.com/mehulharry