Wednesday, November 9, 2011

Complex Email Validation with Regular Expressions for Fun and Profit

a@b.com is a valid email address.

Well, not according to the regular expression pattern I've used to validate email addresses for the last several years. And with new domain name changes coming in 2013, I thought it was time to reexamine and revise my email validation best practices.

Here's my existing pattern, which works just fine about 90% of the time:
\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
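
To get a feel for where that pattern falls short, here is a minimal sketch that runs it against a few of the sample addresses used in the unit test later in this post. The class name OldPatternDemo and the ^...$ anchors are assumptions added for illustration; the anchors force the whole string to match rather than just a substring.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original code.
public static class OldPatternDemo
{
    // The existing pattern from above, wrapped in ^...$ (an assumption)
    // so that the entire input must match.
    private const string OldPattern =
        @"^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$";

    public static bool MatchesOldPattern(string emailAddress)
    {
        return Regex.IsMatch(emailAddress, OldPattern);
    }

    public static void Main()
    {
        string[] samples =
        {
            "niceandsimple@example.com",
            "\"Fred Bloggs\"@example.com",
            "customer/department=shipping@example.com",
            "!def!xyz%abc@example.com"
        };

        // Print each sample with the result of the old pattern.
        foreach (string sample in samples)
        {
            Console.WriteLine("{0} -> {1}", sample, MatchesOldPattern(sample));
        }
    }
}
```

Only the first sample passes. The other three are rejected because the old pattern allows only word characters and a handful of separators before the @ sign, so quoted strings and characters such as / = ! % never match.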

After reading an excellent (albeit ancient) blog post by Phil Haack (whose short and sweet pattern didn't accommodate a@b.com either), I found a good (and very legible) solution in a comment to said post by SeanG. What I like best about his solution is that he actually broke down the pattern into its component parts, instead of just slapping it down in onelongunintelligiblesinglestring. The commenting may seem a little excessive to some, but when it comes to regular expressions (which I don't tinker with very often), I prefer to be spoon fed.

The end result is a new static C# class based on that code.

One nice feature of this class is that not only can it be used for server-side validation, but it also exposes the pattern so it can be used for the ValidationExpression property on a RegularExpressionValidator control. And I've also handled the case where the email address is not required.


So now the pattern lives in one place, and when those new domain names start showing up in 2013 I only have to update it there.


using System;
using System.Text.RegularExpressions;

namespace Web.Business.Validators
{
    public static class EmailValidator
    {
        #region Properties

        public static string RegexPattern
        {
            get
            {

                // <any CHAR excepting <">, "\" & CR, and including linear-white-space>
                string qtext = "[^\\x0d\\x22\\x5c\\x80-\\xff]";

                // <any CHAR excluding "[", "]", "\" & CR, & including linear-white-space>
                string dtext = "[^\\x0d\\x5b-\\x5d\\x80-\\xff]";
                // *<any CHAR except specials, SPACE and CTLs>
                string atom = "[^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+";
                // "\" CHAR 
                string quoted_pair = "\\x5c[\\x00-\\x7f]";
                // <"> *(qtext/quoted-pair) <">
                string quoted_string = string.Format("\\x22({0}|{1})*\\x22", qtext, quoted_pair);
                //atom / quoted-string
                string word = string.Format("({0}|{1})", atom, quoted_string);
                // "[" *(dtext / quoted-pair) "]"
                string domain_literal = string.Format("\\x5b({0}|{1})*\\x5d", dtext, quoted_pair);

                // atom
                string domain_ref = atom; 
                // domain-ref / domain-literal
                string sub_domain = string.Format("({0}|{1})", domain_ref, domain_literal);
                // sub-domain *("." sub-domain)
                string domain = string.Format("{0}(\\x2e{0})*", sub_domain);
                // word *("." word) 
                string local_part = string.Format("{0}(\\x2e{0})*", word);
                // local-part "@" domain
                string addr_spec = string.Format("{0}\\x40{1}", local_part, domain);
                // add starting position and ending position
                string regexPattern = string.Format("^{0}$", addr_spec);

                return regexPattern;
            }
        }

        #endregion

        #region Public Methods

        public static bool IsValid(string emailAddress)
        {
            return IsValid(emailAddress, true);
        }

        /// <summary>
        /// RFC 822 compliant email address validation.
        /// See http://iamcal.com/publish/articles/php/parsing_email/ for an explanation.
        /// </summary>
        /// <param name="emailAddress">Email address to check.</param>
        /// <param name="isRequired">Is email address required?</param>
        /// <returns><c>false</c> if not valid email address, otherwise <c>true</c>.</returns>
        public static bool IsValid(string emailAddress, bool isRequired)
        {
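            // Treat null, empty, or whitespace-only input as "no address supplied".
            // Checking for null first avoids a NullReferenceException on Trim().
            if (emailAddress == null || emailAddress.Trim().Length == 0)
            {
                // A missing address is valid only when it is not required
                return !isRequired;
            }

            return new Regex(RegexPattern).IsMatch(emailAddress);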
        }

        #endregion
    }
}
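
One possible refinement, sketched here rather than taken from the original code: the class above rebuilds the pattern string and constructs a new Regex on every call to IsValid. The hypothetical CachedPatternValidator below compiles a pattern once and reuses it, and treats null the same way as an empty string. The class names, the constructor argument, and the simple stand-in pattern used in Main are all assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original class: it compiles a pattern
// once and reuses the Regex instance for every validation call.
public sealed class CachedPatternValidator
{
    private readonly Regex _regex;

    public CachedPatternValidator(string pattern)
    {
        if (pattern == null) throw new ArgumentNullException("pattern");
        _regex = new Regex(pattern, RegexOptions.Compiled);
    }

    public bool IsValid(string value, bool isRequired)
    {
        // Null, empty, and whitespace-only input all count as "no value supplied".
        if (value == null || value.Trim().Length == 0)
        {
            return !isRequired;
        }

        return _regex.IsMatch(value);
    }
}

public static class CachedPatternValidatorDemo
{
    public static void Main()
    {
        // Stand-in pattern for the demo only (an assumption); in this post's
        // setting you would pass EmailValidator.RegexPattern instead.
        var validator = new CachedPatternValidator(@"^[^@\s]+@[^@\s]+$");

        Console.WriteLine(validator.IsValid(null, false));             // optional and missing
        Console.WriteLine(validator.IsValid("a@b.com", true));         // required and present
        Console.WriteLine(validator.IsValid("Abc.example.com", true)); // no @ sign
    }
}
```

Whether the caching is worth it depends on how often validation runs; for an occasional form post the original approach is probably fine.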


And just for additional yucks, here's a unit test with some pretty wacky examples of both valid and invalid email addresses. (Test cases come from above blog post and Wikipedia.)

/// <summary>
/// A test for IsValid
/// </summary>
[TestMethod()]
public void IsValidTest()
{
    // EmailValidator is a static class, so its methods are called directly.

    // Empty values are valid when the email address is not required
    Assert.AreEqual(true, EmailValidator.IsValid(null, false));
    Assert.AreEqual(true, EmailValidator.IsValid(string.Empty, false));

    // Test valid email addresses
    Assert.AreEqual(true, EmailValidator.IsValid("a@b.com"));
    Assert.AreEqual(true, EmailValidator.IsValid("a@b.co"));
    Assert.AreEqual(true, EmailValidator.IsValid("a@b.c"));
    Assert.AreEqual(true, EmailValidator.IsValid("a.b.c'@example.com"));
    Assert.AreEqual(true, EmailValidator.IsValid(@"""Abc\@def""@example.com"));
    Assert.AreEqual(true, EmailValidator.IsValid(@"""Fred Bloggs""@example.com"));
    Assert.AreEqual(true, EmailValidator.IsValid(@"""Joe\\Blow""@example.com"));
    Assert.AreEqual(true, EmailValidator.IsValid(@"""Abc@def""@example.com"));
    Assert.AreEqual(true, EmailValidator.IsValid("customer/department=shipping@example.com"));
    Assert.AreEqual(true, EmailValidator.IsValid("$A12345@example.com"));
    Assert.AreEqual(true, EmailValidator.IsValid("!def!xyz%abc@example.com"));
    Assert.AreEqual(true, EmailValidator.IsValid("_somename@example.com"));
    Assert.AreEqual(true, EmailValidator.IsValid("niceandsimple@example.com"));
    Assert.AreEqual(true, EmailValidator.IsValid("a.little.unusual@example.com"));
    Assert.AreEqual(true, EmailValidator.IsValid("a.little.more.unusual@dept.example.com"));
    Assert.AreEqual(true, EmailValidator.IsValid(@"much.""more\ unusual""@example.com"));
    Assert.AreEqual(true, EmailValidator.IsValid(@"very.unusual.""@"".unusual.com@example.com"));
    Assert.AreEqual(true, EmailValidator.IsValid(@"very.""(),:;<>[]"".VERY.""very\\\ \@\""very"".unusual@strange.example.com"));

    // character @ is missing
    Assert.AreEqual(false, EmailValidator.IsValid("Abc.example.com"));

    // only one @ is allowed outside quotation marks
    Assert.AreEqual(false, EmailValidator.IsValid("A@b@c@example.com"));

    // none of the characters before the @ in this example is allowed outside quotation marks
    Assert.AreEqual(false, EmailValidator.IsValid(@"""(),:;<>[\]@example.com"));

    // quoted strings must be dot separated or the only element making up the local-part
    Assert.AreEqual(false, EmailValidator.IsValid(@"just""not""right@example.com"));

    // spaces, quotes and backslashes may only exist within quoted strings and when preceded by a backslash
    Assert.AreEqual(false, EmailValidator.IsValid(@"this\ is\""really\""not\\allowed@example.com"));
}


Share & Enjoy!

6 comments:

  1. i think this should be obvious, but i'm new to .net... how do you implement this with a custom validator?

    thanks!

  2. 1. Create a TextBox control on your page called "EmailTextBox".

    2. Create a RegularExpressionValidator control on your page called "EmailRegularExpressionValidator".

    3. Set the ControlToValidate property of EmailRegularExpressionValidator to "EmailTextBox".

    4. In the Page_Load event of your code-behind, set the ValidationExpression property of your RegularExpressionValidator control to the RegexPattern property of the EmailValidator class:
    EmailRegularExpressionValidator.ValidationExpression = EmailValidator.RegexPattern;

    5. Make sure you check the Page.IsValid property at the beginning of the OnClick event of your form submission button.
    protected void SubmitImageButton_Click(object sender, EventArgs e)
    {
      // Check for page validation
      Page.Validate();
      if (!Page.IsValid)
      {
        throw new ApplicationException("Validation error on page.");
      }

      // Data entered for all controls appears to be valid!
    }

    I hope that helps...if not, let me know.

  3. RFC 822 is not the standard any more. It hasn't been for a very long time. You want to look at RFC 5321.

    Replies
    1. Thanks Michael...will do, and then will post a code update. Cheers!

  4. This is an excellent component for verifying email addresses:
    http://www.kellermansoftware.com/p-37-net-email-validation.aspx

    Replies
    1. Thanks Asava - I'll check that out. Cheers!
