Validating an email address using a regular expression can be tricky. If you wanted to follow the official RFC you would have to use the following monstrosity:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
Obviously, that is a regular expression that is impossible to understand the meaning of, let alone a practical one to use in a real life web application. Consider that 99.9% of all email addresses use the following formats:
[email protected]
[email protected]
[email protected]
Throw in a few special characters that should be alloed, namely - . + and _ and we can create a regular expression to match almost any email address in use today. This is what we come up with:
[-0-9a-zA-Z.+_]+@[-0-9a-zA-Z.+_]+\.[a-zA-Z]{2,4}
This will match a character in the group [-0-9a-zA-Z.+_] one or more times, followed by an @ sign. Then we have the same group again, and a final dot followed by the top-level domain. We allow a top-level domain between two and four characters, upper case and lower case.
If you wanted to use this regular expression to verify an email address in PHP, it’s as simple as this line:
if (!preg_match("/[-0-9a-zA-Z.+_]+@[-0-9a-zA-Z.+_]+\.[a-zA-Z]{2,4}/", $email)) die("Invalid email address");
Enjoy, and leave any feedback you have in the comments section!