¿Cómo validar una dirección de correo electrónico utilizando una expresión regular?

A lo largo de los años he desarrollado lentamente una expresión regular que valida las direcciones de correo electrónico MOST correctamente, suponiendo que no utilicen una dirección IP como parte del servidor.

Lo uso en varios progtwigs PHP, y funciona la mayor parte del tiempo. Sin embargo, de vez en cuando me contacta alguien que tiene problemas con un sitio que lo usa, y termino teniendo que hacer algún ajuste (más recientemente me di cuenta de que no estaba permitiendo TLD de 4 caracteres).

¿Cuál es la mejor expresión regular que tiene o ha visto para validar correos electrónicos?

He visto varias soluciones que usan funciones que usan varias expresiones más cortas, pero prefiero tener una expresión larga y compleja en una función simple en lugar de varias expresiones cortas en una función más compleja.

La expresión regular totalmente compatible con RFC 822 es ineficaz y oscura debido a su longitud. Afortunadamente, RFC 822 fue reemplazado dos veces y la especificación actual para direcciones de correo electrónico es RFC 5322 . RFC 5322 conduce a una expresión regular que se puede entender si se estudia durante unos minutos y es lo suficientemente eficiente para su uso real.

Una expresión regular compatible con RFC 5322 se puede encontrar en la parte superior de la página en http://emailregex.com/ pero utiliza el patrón de dirección IP que está flotando en Internet con un error que permite 00 para cualquiera de los valores decimales de bytes sin signo en una dirección delimitada por puntos, que es ilegal. El rest parece ser consistente con la gramática RFC 5322 y pasa varias pruebas usando grep -Po , incluidos casos de nombres de dominio, direcciones IP, datos incorrectos y nombres de cuentas con y sin comillas.

Corrigiendo el error 00 en el patrón IP, obtenemos una expresión regular bastante rápida y funcional. (Raspe la versión representada, no la marca, para el código real).

(?: [a-z0-9! # $% & ‘* + / =? ^ _ `{|} ~ -] + (?: \. [a-z0-9! # $% &’ * + / =? ^ _ `{|} ~ -] +) * |” (?: [\ x01- \ x08 \ x0b \ x0c \ x0e- \ x1f \ x21 \ x23- \ x5b \ x5d- \ x7f] | \\ [\ x01- \ x09 \ x0b \ x0c \ x0e- \ x7f]) * “) @ (?: (?: [a-z0-9] (?: [a-z0-9 -] * [a-z0 -9])? \.) + [A-z0-9] (?: [A-z0-9 -] * [a-z0-9])? | \ [(? :(? 🙁 2 (5) [0-5] | [0-4] [0-9]) | 1 [0-9] [0-9] | [1-9]? [0-9])) \.) {3} ( ? 🙁 2 (5 [0-5] | [0-4] [0-9]) | 1 [0-9] [0-9] | [1-9]? [0-9]) | [ a-z0-9 -] * [a-z0-9]: (?: [\ x01- \ x08 \ x0b \ x0c \ x0e- \ x1f \ x21- \ x5a \ x53- \ x7f] | \\ [\ x01- \ x09 \ x0b \ x0c \ x0e- \ x7f]) +) \])

Aquí hay un diagtwig de la máquina de estados finitos para la expresión regular anterior, que es más claro que la propia expresión regular enter image description here

Los patrones más sofisticados en Perl y PCRE (biblioteca de expresiones regulares utilizada, por ejemplo, en PHP) pueden analizar correctamente RFC 5322 sin ningún problema . Python y C # también pueden hacer eso, pero usan una syntax diferente a las dos primeras. Sin embargo, si se ve obligado a utilizar uno de los muchos lenguajes de comparación de patrones menos potentes, entonces es mejor utilizar un analizador real.

También es importante comprender que validarlo según el RFC no le dice absolutamente nada sobre si esa dirección existe realmente en el dominio suministrado, o si la persona que ingresa la dirección es su verdadero propietario. Las personas firman otras listas de correo de esta manera todo el tiempo. La reparación requiere un tipo de validación más sofisticada que implique enviar a esa dirección un mensaje que incluya un token de confirmación destinado a ser ingresado en la misma página web que la dirección.

Los tokens de confirmación son la única forma de saber que tienes la dirección de la persona que ingresa. Esta es la razón por la que la mayoría de las listas de correo ahora usan ese mecanismo para confirmar las inscripciones. Después de todo, cualquiera puede dejar presionado president@whitehouse.gov , y eso incluso lo analizará como legal, pero no es probable que sea la persona en el otro extremo.

Para PHP, no debe usar el patrón dado en Validar una dirección de correo electrónico con PHP, de la manera correcta desde la cual cito:

Existe cierto peligro de que el uso común y la encoding descuidada generalizada establezcan un estándar de facto para las direcciones de correo electrónico que sea más restrictivo que el estándar formal registrado.

Eso no es mejor que todos los otros patrones no RFC. Ni siquiera es lo suficientemente inteligente como para manejar incluso el RFC 822 , y mucho menos el RFC 5322. Este , sin embargo, sí lo es.

Si quieres ser elegante y pedante, implementa un motor de estado completo . Una expresión regular solo puede actuar como un filtro rudimentario. El problema con las expresiones regulares es que decirle a alguien que su dirección de correo electrónico perfectamente válida no es válida (un falso positivo) porque su expresión regular no puede manejarlo es grosero y descortés desde la perspectiva del usuario. Un motor de estado para este fin puede validar e incluso corregir direcciones de correo electrónico que de otro modo se considerarían inválidas ya que desensambla la dirección de correo electrónico de acuerdo con cada RFC. Esto permite una experiencia potencialmente más agradable, como

La dirección de correo electrónico especificada ‘myemail @ address, com’ no es válida. ¿Quisiste decir ‘myemail@address.com’?

Consulte también Validar direcciones de correo electrónico , incluidos los comentarios. O comparar la dirección de correo electrónico Validar expresiones regulares .

Visualización de expresión regular

Demostración de Debuggex

No debe usar expresiones regulares para validar direcciones de correo electrónico.

En su lugar, use la clase MailAddress , así:

 try { address = new MailAddress(address).Address; } catch(FormatException) { //address is invalid } 

La clase MailAddress usa un analizador BNF para validar la dirección de acuerdo con RFC822.

Si realmente quieres usar una expresión regular, aquí está :

  (?: (?: \ r \ n)? [\ t]) * (?: (?: (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031 ] + (?: (?: (?: \ r \ n)? [\ t]
 ) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) | "(?: [^ \" \ R \\] | \\. | (?: (?: \ r \ n)? [\ t])) * "(? :( ?:
 \ r \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \ \ ". \ [\] \ 000- \ 031] + (? :(? :(
 ?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) "(?: [ ^ \ "\ r \\] | \\. | (?: (?: \ r \ n)? [ 
 \ t])) * "(?: (?: \ r \ n)? [\ t]) *)) * @ (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 0
 31] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\ ]])) | \ [([^ \ [\] \ r \\] | \\.) * \
 ] (?: (?: \ r \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] +
 (?: (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]]) ) | \ [([^ \ [\] \ r \\] | \\.) * \] (?:
 (?: \ r \ n)? [\ t]) *)) * | (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z
 | (? = [\ ["() <> @,;: \\". \ [\]])) | "(?: [^ \" \ r \\] | \\. | (? :( ?: \ r \ n)? [\ t])) "" (?: (?: \ r \ n)
 ? [\ t]) *) * \ < (?: (?: \ r \ n)? [\ t]) * (?: @ (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \
 r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) | \ [([^ \ [\ ] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [
  \ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)
 ? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) | \ [([^ \ [\] \ r \ \] | \\.) * \] (?: (?: \ r \ n)? [\ t]
 ) *)) * (?:, @ (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [
  \ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *
 ) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031 ] + (?: (?: (?: \ r \ n)? [\ t]
 ) + | \ Z | (? = [\ ["() <> @,;:: \\". \ [\]])) | \ [([^ \ [\] \ R \\] | \\ .) * \] (?: (?: \ r \ n)? [\ t]) *)) *)
 *: (?: (?: \ r \ n)? [\ t]) *)? (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) +
 | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) "(?: [^ \" \ R \\] | \\. | ?: (?: \ r \ n)? [\ t])) "" (?: (?: \ r)
 \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ " . \ [\] \ 000- \ 031] + (? :(? :( ?:
 \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,;:: \\". \ [\]])) | "(?: [^ \ "\ r \\] | \\. | (?: (?: \ r \ n)? [\ t
 ])) * "(?: (?: \ r \ n)? [\ t]) *)) * @ (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031
 ] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\] ])) | \ [([^ \ [\] \ r \\] | \\.) * \] (
 ?: (?: \ r \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?
 : (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]]) | \ [([^ \ [\] \ r \\] | \\.) * \] (? :(?
 : \ r \ n)? [\ t]) *)) * \> (?: (?: \ r \ n)? [\ t]) *) | (?: [^ () <> @ ,; : \\ ". \ [\] \ 000- \ 031] + (? :(?
 : (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) "(? : [^ \ "\ r \\] | \\. | (?: (?: \ r \ n)?
 [\ t])) * "(?: (?: \ r \ n)? [\ t]) *) *: (?: (?: \ r \ n)? [\ t]) * (?: (?: (?: [^ () <> @,;: \\ ". \ [\] 
 \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\" . \ [\]])) | "(?: [^ \" \ r \\] |
 \\. | (?: (?: \ r \ n)? [\ t])) * "(?: (?: \ r \ n)? [\ t]) *) (?: \. (? : (?: \ r \ n)? [\ t]) * (?: [^ () <>

 @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ [ "() <> @,;: \\". \ [\]])) | "
 (?: [^ \ "\ r \\] | \\. | (?: (?: \ r \ n)? [\ t]))" "(?: (?: \ r \ n)? [ \ t]) *)) * @ (?: (?: \ r \ n)? [\ t]
 ) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\
 ". \ [\]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) * ) (?: \. (?: (?: \ r \ n)? [\ t]) * (?
 : [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [
 \]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *)) * | (?: [^ () <> @,;: \\ ". \ [\] \ 000-
 \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [ \]])) | "(?: [^ \" \ r \\] | \\. |
 ?: (?: \ r \ n)? [\ t])) "" (?: (?: \ r \ n)? [\ t]) *) * \ < (?: (?: \ r \ n)? [\ t]) * (?: @ (?: [^ () <> @ ,;
 : \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ [" () <> @,;: \\ ". \ [\]])) | \ [([
 ^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ "
 . \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @, ;: \\ ". \ [\]])) | \ [([^ \ [\
 ] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *)) * (?:, @ (?: (?: \ r \ n )? [\ t]) * (?: [^ () <> @,;: \\ ". \
 [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\ ". \ [\]])) | \ [([^ \ [\] \
 r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] 
 \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\" . \ [\]])) | \ [([^ \ [\] \ r \\]
 | \\.) * \] (?: *?) *) *: (?: (?: \ r \ n)? [\ t]) * )? (?: [^ () <> @,;: \\ ". \ [\] \ 0
 00- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) | "(?: [^ \" \ r \\] | \\
 . | (?: (?: \ r \ n)? [\ t])) * "(?: (?: \ r \ n)? [\ t]) *) (?: \. (? :( ?: \ r \ n)? [\ t]) * (?: [^ () <> @,
 ;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ [" ( ) <> @,;: \\ ". \ [\]])) |" (?
 : [^ \ "\ r \\] | \\. | (?: (?: \ r \ n)? [\ t]))" "(?: (?: \ r \ n)? [\ t ]) *)) * @ (?: (?: \ r \ n)? [\ t]) *
 (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\".
 \ [\]]) | | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *) ( ?: \. (?: (?: \ r \ n)? [\ t]) * (?: [
 ^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | ( ? = [\ ["() <> @,;: \\". \ [\]
 ])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *)) * \> ( ?: (?: \ r \ n)? [\ t]) *) (?:, \ s * (
 ?: (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\
 ". \ [\]])) |" (?: [^ \ "\ r \\] | \\. | (?: (?: \ r \ n)? [\ t]))" "(? : (?: \ r \ n)? [\ t]) *) (?: \. (? :(
 ?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (? :(? :(? : \ r \ n)? [\ t]) + | \ Z | (? = [
 \ ["() <> @,;: \\". \ [\]])) | "(?: [^ \" \ r \\] | \\. | (?: (?: \ r \ n)? [\ t])) "" (?: (?: \ r \ n)? [\ t
 ]) *)) * @ (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T
 ]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\]])) | \ [([^ \ [\] \ R \\] | \ \.) * \] (?: (?: \ r \ n)? [\ t]) *) (?
 : \. (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + ( ?: (?: (?: \ r \ n)? [\ t]) + |
 \ Z | (? = [\ ["() <> @,;:: \\". \ [\]])) | \ [([^ \ [\] \ R \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *)) * | (?:
 [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\". \ [\
 ]])) | "(?: [^ \" \ r \\] | \\. | (?: (?: \ r \ n)? [\ t])) * "(?: (?: \ r \ n)? [\ t]) *) * \ < (?: (?: \ r \ n)
 ? [\ t]) * (?: @ (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["
 () <> @,;: \\ ". \ [\]]) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)
 ? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <>

 @,;: \\ ". \ [\]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *)) * (?:, @ (?: (?: \ r \ n)? [
  \ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ ["() <> @,
 ;: \\ ". \ [\]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *) (?: \. (?: (?: \ r \ n)? [\ t]
 ) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\
 ". \ [\]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) * )) *) *: (?: (?: \ r \ n)? [\ t]) *)?
 (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ R \ n)? [\ T]) + | \ Z | (? = [\ ["() <> @,;: \\".
 \ [\]])) | "(?: [^ \" \ r \\] | \\. | (?: (?: \ r \ n)? [\ t])) "" (? :( ?: \ r \ n)? [\ t]) *) (?: \. (? :( ?:
 \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z | (? = [\ [
 "() <> @,;: \\". \ [\]])) | "(?: [^ \" \ r \\] | \\. | (?: (?: \ r \ n) ? [\ t])) * "(?: (?: \ r \ n)? [\ t])
 *)) * @ (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t])
 + | \ Z | (? = [\ ["() <> @,;:: \\". \ [\]])) | \ [([^ \ [\] \ R \\] | \\. ) * \] (?: (?: \ r \ n)? [\ t]) *) (?: \
 . (?: (?: \ r \ n)? [\ t]) * (?: [^ () <> @,;: \\ ". \ [\] \ 000- \ 031] + (?: (?: (?: \ r \ n)? [\ t]) + | \ Z
 | (? = [\ ["() <> @,;: \\". \ [\]])) | \ [([^ \ [\] \ r \\] | \\.) * \] (?: (?: \ r \ n)? [\ t]) *)) * \> (? :(
 ?: \ r \ n)? [\ t]) *)) *)?; \ s *) 

Esta pregunta se hace mucho, pero creo que debería dar un paso atrás y preguntarse por qué quiere validar las direcciones de correo electrónico sintácticamente. ¿Cuál es el beneficio realmente?

  • No detectará errores tipográficos comunes.
  • No impide que las personas ingresen direcciones de correo electrónico no válidas o inventadas, o que ingresen la dirección de otra persona.

Si desea validar que un correo electrónico es correcto, no tiene más opción que enviar un correo electrónico de confirmación y hacer que el usuario responda a eso. En muchos casos, tendrá que enviar un correo de confirmación de todos modos por razones de seguridad o por razones éticas (por lo tanto, no puede, por ejemplo, firmar a alguien en un servicio en contra de su voluntad).

Depende de a qué te refieras con mejor: si estás hablando de atrapar todas las direcciones de correo electrónico válidas, utiliza lo siguiente:

 (?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?: \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:( ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0 31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\ ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+ (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?: (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n) ?[ \t])*)*\< (?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\ r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n) ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t] )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])* )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*) *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+ |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?: \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031 ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\]( ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(? :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(? :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(? :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)? [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]| \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<> @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|" (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(? :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[ \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000- \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|( ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\< (?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,; :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([ ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\" .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\ ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\ [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\ r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\] |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0 00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\ .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@, ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(? :[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])* (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[ ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\] ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*( ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:( ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[ \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(? :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+| \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?: [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\ ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\< (?:(?:\r\n) ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[" ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n) ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<> @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@, ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)? (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?: \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[ "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t]) *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]) +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\ .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:( ?:\r\n)?[ \t])*))*)?;\s*) 

( http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html ) Si está buscando algo más simple pero que captará la mayoría de las direcciones de correo electrónico válidas, pruebe algo como:

 "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$" 

EDITAR: Desde el enlace:

Esta expresión regular solo validará las direcciones a las que se les quitaron los comentarios y se los reemplazó por espacios en blanco (esto lo hace el módulo).

Todo depende de qué tan preciso quieras ser. Para mis propósitos, donde estoy tratando de mantener fuera cosas como bob @ aol.com (espacios en correos electrónicos) o steve (sin dominio) o mary@aolcom (sin período antes de .com), uso

 /^\S+@\S+\.\S+$/ 

Claro, coincidirá con las cosas que no son direcciones de correo electrónico válidas, pero es una cuestión de jugar la regla 90/10.

[ACTUALIZADO] He recostackdo todo lo que sé sobre la validación de la dirección de correo electrónico aquí: http://isemail.info , que ahora no solo valida sino que también diagnostica problemas con las direcciones de correo electrónico. Estoy de acuerdo con muchos de los comentarios aquí que la validación es solo una parte de la respuesta; ver mi ensayo en http://isemail.info/about .

is_email () sigue siendo, hasta donde sé, el único validador que le dirá definitivamente si una cadena determinada es una dirección de correo electrónico válida o no. Subí una nueva versión en http://isemail.info/

Cotejé casos de prueba de Cal Henderson, Dave Child, Phil Haack, Doug Lovell, RFC5322 y RFC 3696. 275 direcciones de prueba en total. Ejecuté todas estas pruebas contra todos los validadores libres que pude encontrar.

Trataré de mantener esta página actualizada a medida que las personas mejoren sus validadores. Gracias a Cal, Michael, Dave, Paul y Phil por su ayuda y cooperación en la comstackción de estas pruebas y la crítica constructiva de mi propio validador .

La gente debería conocer la errata contra RFC 3696 en particular. Tres de los ejemplos canónicos son, de hecho, direcciones inválidas. Y la longitud máxima de una dirección es de 254 o 256 caracteres, no 320.

Según la especificación W3C HTML5 :

 ^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$ 

Contexto:

Una dirección válida de correo electrónico es una cadena que coincide con la producción ABNF […].

Nota: Este requisito es una violación deliberada de RFC 5322 , que define una syntax para las direcciones de correo electrónico que es a la vez demasiado estricta (antes del carácter “@”), demasiado vaga (después del carácter “@”) y demasiado laxa ( permitiendo que los comentarios, los espacios en blanco y las comillas de maneras poco familiares para la mayoría de los usuarios) sean de utilidad práctica aquí.

La siguiente expresión regular compatible con JavaScript y Perl es una implementación de la definición anterior.

 /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/ 

Es fácil en Perl 5.10 o más nuevo:

 /(?(DEFINE) (?
(?&mailbox) | (?&group)) (? (?&name_addr) | (?&addr_spec)) (? (?&display_name)? (?&angle_addr)) (? (?&CFWS)? < (?&addr_spec) > (?&CFWS)?) (? (?&display_name) : (?:(?&mailbox_list) | (?&CFWS))? ; (?&CFWS)?) (? (?&phrase)) (? (?&mailbox) (?: , (?&mailbox))*) (? (?&local_part) \@ (?&domain)) (? (?&dot_atom) | (?&quoted_string)) (? (?&dot_atom) | (?&domain_literal)) (? (?&CFWS)? \[ (?: (?&FWS)? (?&dcontent))* (?&FWS)? \] (?&CFWS)?) (? (?&dtext) | (?&quoted_pair)) (? (?&NO_WS_CTL) | [\x21-\x5a\x5e-\x7e]) (? (?&ALPHA) | (?&DIGIT) | [!#\$%&'*+-/=?^_`{|}~]) (? (?&CFWS)? (?&atext)+ (?&CFWS)?) (? (?&CFWS)? (?&dot_atom_text) (?&CFWS)?) (? (?&atext)+ (?: \. (?&atext)+)*) (? [\x01-\x09\x0b\x0c\x0e-\x7f]) (? \\ (?&text)) (? (?&NO_WS_CTL) | [\x21\x23-\x5b\x5d-\x7e]) (? (?&qtext) | (?&quoted_pair)) (? (?&CFWS)? (?&DQUOTE) (?:(?&FWS)? (?&qcontent))* (?&FWS)? (?&DQUOTE) (?&CFWS)?) (? (?&atom) | (?&quoted_string)) (? (?&word)+) # Folding white space (? (?: (?&WSP)* (?&CRLF))? (?&WSP)+) (? (?&NO_WS_CTL) | [\x21-\x27\x2a-\x5b\x5d-\x7e]) (? (?&ctext) | (?&quoted_pair) | (?&comment)) (? \( (?: (?&FWS)? (?&ccontent))* (?&FWS)? \) ) (? (?: (?&FWS)? (?&comment))* (?: (?:(?&FWS)? (?&comment)) | (?&FWS))) # No whitespace control (? [\x01-\x08\x0b\x0c\x0e-\x1f\x7f]) (? [A-Za-z]) (? [0-9]) (? \x0d \x0a) (? ") (? [\x20\x09]) ) (?&address)/x

yo suelo

 ^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$ 

Which is the one used in ASP.NET by the RegularExpressionValidator.

Don’t know about best, but this one is at least correct, as long as the addresses have their comments stripped and replaced with whitespace.

Seriamente. You should use an already written library for validating emails. The best way is probably to just send a verification e-mail to that address.

The email addresses I want to validate are going to be used by an ASP.NET web application using the System.Net.Mail namespace to send emails to a list of people. So, rather than using some very complex regular expression, I just try to create a MailAddress instance from the address. The MailAddress construtor will throw an exception if the address is not formed properly. This way, I know I can at least get the email out of the door. Of course this is server-side validation but at a minimum you need that anyway.

 protected void emailValidator_ServerValidate(object source, ServerValidateEventArgs args) { try { var a = new MailAddress(txtEmail.Text); } catch (Exception ex) { args.IsValid = false; emailValidator.ErrorMessage = "email: " + ex.Message; } } 

Quick answer

Use the following regex for input validation:

([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)+

Addresses matched by this regex:

  • have a local part (ie the part before the @-sign) that is strictly compliant with RFC 5321/5322,
  • have a domain part (ie the part after the @-sign) that is a host name with at least two labels, each of which is at most 63 characters long.

The second constraint is a restriction on RFC 5321/5322.

Elaborate answer

Using a regular expression that recognizes email addresses could be useful in various situations: for example to scan for email addresses in a document, to validate user input, or as an integrity constraint on a data repository.

It should however be noted that if you want to find out if the address actually refers to an existing mailbox, there’s no substitute for sending a message to the address. If you only want to check if an address is grammatically correct then you could use a regular expression, but note that ""@[] is a grammatically correct email address that certainly doesn’t refer to an existing mailbox.

The syntax of email addresses has been defined in various RFCs , most notably RFC 822 and RFC 5322 . RFC 822 should be seen as the “original” standard and RFC 5322 as the latest standard. The syntax defined in RFC 822 is the most lenient and subsequent standards have restricted the syntax further and further, where newer systems or services should recognize obsolete syntax, but never produce it.

In this answer I’ll take “email address” to mean addr-spec as defined in the RFCs (ie jdoe@example.org , but not "John Doe" , nor some-group:jdoe@example.org,mrx@exampel.org; ).

There’s one problem with translating the RFC syntaxes into regexes: the syntaxes are not regular! This is because they allow for optional comments in email addresses that can be infinitely nested, while infinite nesting can’t be described by a regular expression. To scan for or validate addresses containing comments you need a parser or more powerful expressions. (Note that languages like Perl have constructs to describe context free grammars in a regex-like way.) In this answer I’ll disregard comments and only consider proper regular expressions.

The RFCs define syntaxes for email messages, not for email addresses as such. Addresses may appear in various header fields and this is where they are primarily defined. When they appear in header fields addresses may contain (between lexical tokens) whitespace, comments and even linebreaks. Semantically this has no significance however. By removing this whitespace, etc. from an address you get a semantically equivalent canonical representation . Thus, the canonical representation of first. last (comment) @ [3.5.7.9] is first.last@[3.5.7.9] .

Different syntaxes should be used for different purposes. If you want to scan for email addresses in a (possibly very old) document it may be a good idea to use the syntax as defined in RFC 822. On the other hand, if you want to validate user input you may want to use the syntax as defined in RFC 5322, probably only accepting canonical representations. You should decide which syntax applies to your specific case.

I use POSIX “extended” regular expressions in this answer, assuming an ASCII compatible character set.

RFC 822

I arrived at the following regular expression. I invite everyone to try and break it. If you find any false positives or false negatives, please post them in a comment and I’ll try to fix the expression as soon as possible.

([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*"))*@([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*]))*

I believe it’s fully complient with RFC 822 including the errata . It only recognizes email addresses in their canonical form. For a regex that recognizes (folding) whitespace see the derivation below.

The derivation shows how I arrived at the expression. I list all the relevant grammar rules from the RFC exactly as they appear, followed by the corresponding regex. Where an erratum has been published I give a separate expression for the corrected grammar rule (marked “erratum”) and use the updated version as a subexpression in subsequent regular expressions.

As stated in paragraph 3.1.4. of RFC 822 optional linear white space may be inserted between lexical tokens. Where applicable I’ve expanded the expressions to accommodate this rule and marked the result with “opt-lwsp”.

 CHAR =  =~ . CTL =  =~ [\x00-\x1F\x7F] CR =  =~ \r LF =  =~ \n SPACE =  =~ HTAB =  =~ \t < "> =  =~ " CRLF = CR LF =~ \r\n LWSP-char = SPACE / HTAB =~ [ \t] linear-white-space = 1*([CRLF] LWSP-char) =~ ((\r\n)?[ \t])+ specials = "(" / ")" / "< " / ">" / "@" / "," / ";" / ":" / "\" / < "> / "." / "[" / "]" =~ [][()<>@,;:\\".] quoted-pair = "\" CHAR =~ \\. qtext = , "\" & CR, and including linear-white-space> =~ [^"\\\r]|((\r\n)?[ \t])+ dtext =  =~ [^][\\\r]|((\r\n)?[ \t])+ quoted-string = < "> *(qtext|quoted-pair) < "> =~ "([^"\\\r]|((\r\n)?[ \t])|\\.)*" (erratum) =~ "(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*" domain-literal = "[" *(dtext|quoted-pair) "]" =~ \[([^][\\\r]|((\r\n)?[ \t])|\\.)*] (erratum) =~ \[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*] atom = 1* =~ [^][()<>@,;:\\". \x00-\x1F\x7F]+ word = atom / quoted-string =~ [^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*" domain-ref = atom sub-domain = domain-ref / domain-literal =~ [^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*] local-part = word *("." word) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"))* (opt-lwsp) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")(((\r\n)?[ \t])*\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"))* domain = sub-domain *("." sub-domain) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))* (opt-lwsp) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(((\r\n)?[ \t])*\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))* addr-spec = local-part "@" domain =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"))*@([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))* (opt-lwsp) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")((\r\n)?[ \t])*(\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")((\r\n)?[ \t])*)*@((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(((\r\n)?[ \t])*\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))* (canonical) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*"))*@([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*]))* 

RFC 5322

I arrived at the following regular expression. I invite everyone to try and break it. If you find any false positives or false negatives, please post them in a comment and I’ll try to fix the expression as soon as possible.

([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|\[[\t -Z^-~]*])

I believe it’s fully complient with RFC 5322 including the errata . It only recognizes email addresses in their canonical form. For a regex that recognizes (folding) whitespace see the derivation below.

The derivation shows how I arrived at the expression. I list all the relevant grammar rules from the RFC exactly as they appear, followed by the corresponding regex. For rules that include semantically irrelevant (folding) whitespace, I give a separate regex marked “(normalized)” that doesn’t accept this whitespace.

I ignored all the “obs-” rules from the RFC. This means that the regexes only match email addresses that are strictly RFC 5322 compliant. If you have to match “old” addresses (as the looser grammar including the “obs-” rules does), you can use one of the RFC 822 regexes from the previous paragraph.

 VCHAR = %x21-7E =~ [!-~] ALPHA = %x41-5A / %x61-7A =~ [A-Za-z] DIGIT = %x30-39 =~ [0-9] HTAB = %x09 =~ \t CR = %x0D =~ \r LF = %x0A =~ \n SP = %x20 =~ DQUOTE = %x22 =~ " CRLF = CR LF =~ \r\n WSP = SP / HTAB =~ [\t ] quoted-pair = "\" (VCHAR / WSP) =~ \\[\t -~] FWS = ([*WSP CRLF] 1*WSP) =~ ([\t ]*\r\n)?[\t ]+ ctext = %d33-39 / %d42-91 / %d93-126 =~ []!-'*-[^-~] ("comment" is left out in the regex) ccontent = ctext / quoted-pair / comment =~ []!-'*-[^-~]|(\\[\t -~]) (not regular) comment = "(" *([FWS] ccontent) [FWS] ")" (is equivalent to FWS when leaving out comments) CFWS = (1*([FWS] comment) [FWS]) / FWS =~ ([\t ]*\r\n)?[\t ]+ atext = ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "/" / "=" / "?" / "^" / "_" / "`" / "{" / "|" / "}" / "~" =~ [-!#-'*+/-9=?AZ^-~] dot-atom-text = 1*atext *("." 1*atext) =~ [-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)* dot-atom = [CFWS] dot-atom-text [CFWS] =~ (([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)? (normalized) =~ [-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)* qtext = %d33 / %d35-91 / %d93-126 =~ []!#-[^-~] qcontent = qtext / quoted-pair =~ []!#-[^-~]|(\\[\t -~]) (erratum) quoted-string = [CFWS] DQUOTE ((1*([FWS] qcontent) [FWS]) / FWS) DQUOTE [CFWS] =~ (([\t ]*\r\n)?[\t ]+)?"(((([\t ]*\r\n)?[\t ]+)?([]!#-[^-~]|(\\[\t -~])))+(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?)"(([\t ]*\r\n)?[\t ]+)? (normalized) =~ "([]!#-[^-~ \t]|(\\[\t -~]))+" dtext = %d33-90 / %d94-126 =~ [!-Z^-~] domain-literal = [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS] =~ (([\t ]*\r\n)?[\t ]+)?\[((([\t ]*\r\n)?[\t ]+)?[!-Z^-~])*(([\t ]*\r\n)?[\t ]+)?](([\t ]*\r\n)?[\t ]+)? (normalized) =~ \[[\t -Z^-~]*] local-part = dot-atom / quoted-string =~ (([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?"(((([\t ]*\r\n)?[\t ]+)?([]!#-[^-~]|(\\[\t -~])))+(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?)"(([\t ]*\r\n)?[\t ]+)? (normalized) =~ [-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+" domain = dot-atom / domain-literal =~ (([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?\[((([\t ]*\r\n)?[\t ]+)?[!-Z^-~])*(([\t ]*\r\n)?[\t ]+)?](([\t ]*\r\n)?[\t ]+)? (normalized) =~ [-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|\[[\t -Z^-~]*] addr-spec = local-part "@" domain =~ ((([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?"(((([\t ]*\r\n)?[\t ]+)?([]!#-[^-~]|(\\[\t -~])))+(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?)"(([\t ]*\r\n)?[\t ]+)?)@((([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?\[((([\t ]*\r\n)?[\t ]+)?[!-Z^-~])*(([\t ]*\r\n)?[\t ]+)?](([\t ]*\r\n)?[\t ]+)?) (normalized) =~ ([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|\[[\t -Z^-~]*]) 

Note that some sources (notably w3c ) claim that RFC 5322 is too strict on the local part (ie the part before the @-sign). This is because “..”, “a..b” and “a.” are not valid dot-atoms, while they may be used as mailbox names. The RFC, however, does allow for local parts like these, except that they have to be quoted. So instead of a..b@example.net you should write "a..b"@example.net , which is semantically equivalent.

Further restrictions

SMTP (as defined in RFC 5321 ) further restricts the set of valid email addresses (or actually: mailbox names). It seems reasonable to impose this stricter grammar, so that the matched email address can actually be used to send an email.

RFC 5321 basically leaves alone the “local” part (ie the part before the @-sign), but is stricter on the domain part (ie the part after the @-sign). It allows only host names in place of dot-atoms and address literals in place of domain literals.

The grammar presented in RFC 5321 is too lenient when it comes to both host names and IP addresses. I took the liberty of “correcting” the rules in question, using this draft and RFC 1034 as guidelines. Here’s the resulting regex.

([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)*|\[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)])

Note that depending on the use case you may not want to allow for a “General-address-literal” in your regex. Also note that I used a negative lookahead (?!IPv6:) in the final regex to prevent the “General-address-literal” part to match malformed IPv6 addresses. Some regex processors don’t support negative lookahead. Remove the substring |(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+ from the regex if you want to take the whole “General-address-literal” part out.

Here’s the derivation:

 Let-dig = ALPHA / DIGIT =~ [0-9A-Za-z] Ldh-str = *( ALPHA / DIGIT / "-" ) Let-dig =~ [0-9A-Za-z-]*[0-9A-Za-z] (regex is updated to make sure sub-domains are max. 63 charactes long - RFC 1034 section 3.5) sub-domain = Let-dig [Ldh-str] =~ [0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])? Domain = sub-domain *("." sub-domain) =~ [0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)* Snum = 1*3DIGIT =~ [0-9]{1,3} (suggested replacement for "Snum") ip4-octet = DIGIT / %x31-39 DIGIT / "1" 2DIGIT / "2" %x30-34 DIGIT / "25" %x30-35 =~ 25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9] IPv4-address-literal = Snum 3("." Snum) =~ [0-9]{1,3}(\.[0-9]{1,3}){3} (suggested replacement for "IPv4-address-literal") ip4-address = ip4-octet 3("." ip4-octet) =~ (25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3} (suggested replacement for "IPv6-hex") ip6-h16 = "0" / ( (%x49-57 / %x65-70 /%x97-102) 0*3(%x48-57 / %x65-70 /%x97-102) ) =~ 0|[1-9A-Fa-f][0-9A-Fa-f]{0,3} (not from RFC) ls32 = ip6-h16 ":" ip6-h16 / ip4-address =~ (0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3} (suggested replacement of "IPv6-addr") ip6-address = 6(ip6-h16 ":") ls32 / "::" 5(ip6-h16 ":") ls32 / [ ip6-h16 ] "::" 4(ip6-h16 ":") ls32 / [ *1(ip6-h16 ":") ip6-h16 ] "::" 3(ip6-h16 ":") ls32 / [ *2(ip6-h16 ":") ip6-h16 ] "::" 2(ip6-h16 ":") ls32 / [ *3(ip6-h16 ":") ip6-h16 ] "::" ip6-h16 ":" ls32 / [ *4(ip6-h16 ":") ip6-h16 ] "::" ls32 / [ *5(ip6-h16 ":") ip6-h16 ] "::" ip6-h16 / [ *6(ip6-h16 ":") ip6-h16 ] "::" =~ (((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?:: IPv6-address-literal = "IPv6:" ip6-address =~ IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::) Standardized-tag = Ldh-str =~ [0-9A-Za-z-]*[0-9A-Za-z] dcontent = %d33-90 / %d94-126 =~ [!-Z^-~] General-address-literal = Standardized-tag ":" 1*dcontent =~ [0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+ address-literal = "[" ( IPv4-address-literal / IPv6-address-literal / General-address-literal ) "]" =~ \[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)] Mailbox = Local-part "@" ( Domain / address-literal ) =~ ([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)*|\[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)]) 

User input validation

A common use case is user input validation, for example on an html form. In that case it’s usually reasonable to preclude address-literals and to require at least two labels in the hostname. Taking the improved RFC 5321 regex from the previous section as a basis, the resulting expression would be:

([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)+

I do not recommend restricting the local part further, eg by precluding quoted strings, since we don’t know what kind of mailbox names some hosts allow (like "a..b"@example.net or even "ab"@example.net ).

I also do not recommend explicitly validating against a list of literal top-level domains or even imposing length-constraints (remember how “.museum” invalidated [az]{2,4} ), but if you must:

([-!#-'*+/-9=?AZ^-~]+(\.[-!#-'*+/-9=?AZ^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?\.)*(net|org|com|info| etc… )

Make sure to keep your regex up-to-date if you decide to go down the path of explicit top-level domain validation.

Consideraciones adicionales

When only accepting host names in the domain part (after the @-sign), the regexes above accept only labels with at most 63 characters, as they should. However, they don’t enforce the fact that the entire host name must be at most 253 characters long (including the dots). Although this constraint is strictly speaking still regular, it’s not feasible to make a regex that incorporates this rule.

Another consideration, especially when using the regexes for input validation, is feedback to the user. If a user enters an incorrect address, it would be nice to give a little more feedback than a simple “syntactically incorrect address”. With “vanilla” regexes this is not possible.

These two considerations could be addressed by parsing the address. The extra length constraint on host names could in some cases also be addressed by using an extra regex that checks it, and matching the address against both expressions.

None of the regexes in this answer are optimized for performance. If performance is an issue, you should see if (and how) the regex of your choice can be optimized.

There are plenty examples of this out on the net (and I think even one that fully validates the RFC – but it’s tens/hundreds of lines long if memory serves). People tend to get carried away validating this sort of thing. Why not just check it has an @ and at least one . and meets some simple minimum length. It’s trivial to enter a fake email and still match any valid regex anyway. I would guess that false positives are better than false negatives.

While deciding which characters are allowed, please remember your apostrophed and hyphenated friends. I have no control over the fact that my company generates my email address using my name from the HR system. That includes the apostrophe in my last name. I can’t tell you how many times I have been blocked from interacting with a website by the fact that my email address is “invalid”.

This regex is from Perl’s Email::Valid library. I believe it to be the most accurate, it matches all 822. And, it is based on the regular expression in the O’Reilly book:

Regular expression built using Jeffrey Friedl’s example in Mastering Regular Expressions ( http://www.ora.com/catalog/regexp/ ).

 $RFC822PAT = < <'EOF'; [\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\ xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xf f\n\015()]*)*\)[\040\t]*)*(?:(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\x ff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"[^\\\x80-\xff\n\015 "]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*(?:\([^\\\x80-\ xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80 -\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]* )*(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\ \\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\ x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x8 0-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"[^\\\x80-\xff\n \015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*(?:\([^\\\x 80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^ \x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040 \t]*)*)*@[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([ ^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\ \\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\ x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80- \xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015() ]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\ x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\04 0\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\ n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\ 015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?! [^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\ ]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\ x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\01 5()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*|(?:[^(\040)<>@,;:". \\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff] )|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[^ ()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037]*(?:(?:\([^\\\x80-\xff\n\0 15()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][ ^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)|"[^\\\x80-\xff\ n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[^()<>@,;:".\\\[\]\ x80-\xff\000-\010\012-\037]*)*< [\040\t]*(?:\([^\\\x80-\xff\n\015()]*(? :(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80- \xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:@[\040\t]* (?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015 ()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015() ]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\0 40)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\ [^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\ xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]* )*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80 -\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x 80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t ]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\ \[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff]) *\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x 80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80 -\xff\n\015()]*)*\)[\040\t]*)*)*(?:,[\040\t]*(?:\([^\\\x80-\xff\n\015( )]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\ \x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*@[\040\t ]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\0 15()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015 ()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^( \040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]| \\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80 -\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015() ]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x 80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^ \x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040 \t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:". \\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff ])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\ \x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x 80-\xff\n\015()]*)*\)[\040\t]*)*)*)*:[\040\t]*(?:\([^\\\x80-\xff\n\015 ()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\ \\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)?(?:[^ (\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000- \037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\ n\015"]*)*")[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]| \([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\)) [^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80-\xff \n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\x ff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*( ?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\ 000-\037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\ xff\n\015"]*)*")[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\x ff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*) *\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*@[\040\t]*(?:\([^\\\x80-\x ff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80- \xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*) *(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\ ]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\] )[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80- \xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\x ff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*( ?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80 -\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x8 0-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?: \([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()] *(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*) *\)[\040\t]*)*)*>) EOF 

As you’re writing in PHP I’d advice you to use the PHP build-in validation for emails.

 filter_var($value, FILTER_VALIDATE_EMAIL) 

If you’re running a php-version lower than 5.3.6 please be aware of this issue: https://bugs.php.net/bug.php?id=53091

If you want more information how this buid-in validation works, see here: Does PHP’s filter_var FILTER_VALIDATE_EMAIL actually work?

Cal Henderson (Flickr) wrote an article called Parsing Email Adresses in PHP and shows how to do proper RFC (2)822-compliant Email Address parsing. You can also get the source code in php , python and ruby which is cc licensed .

I never bother creating with my own regular expression, because chances are that someone else has already come up with a better version. I always use regexlib to find one to my liking.

There is not one which is really usable.
I discuss some issues in my answer to Is there a php library for email address validation? , it is discussed also in Regexp recognition of email address hard?

In short, don’t expect a single, usable regex to do a proper job. And the best regex will validate the syntax, not the validity of an e-mail (jhohn@example.com is correct but it will probably bounce…).

One simple regular expression which would at least not reject any valid email address would be checking for something, followed by an @ sign and then something followed by a period and at least 2 somethings. It won’t reject anything, but after reviewing the spec I can’t find any email that would be valid and rejected.

email =~ /.+@[^@]+\.[^@]{2,}$/

You could use the one employed by the jQuery Validation plugin:

 /^((([az]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([az]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([az]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([az]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([az]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([az]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([az]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([az]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([az]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([az]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$/i 

For the most comprehensive evaluation of the best regular expression for validating an email address please see this link; ” Comparing E-mail Address Validating Regular Expressions ”

Here is the current top expression for reference purposes:

 /^([\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+\.)*[\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+@((((([a-z0-9]{1}[a-z0-9\-]{0,62}[a-z0-9]{1})|[az])\.)+[az]{2,6})|(\d{1,3}\.){3}\d{1,3}(\:\d{1,5})?)$/i 

Not to mention that non-Latin (Chinese, Arabic, Greek, Hebrew, Cyrillic and so on) domain names are to be allowed in the near future . Everyone has to change the email regex used, because those characters are surely not to be covered by [az]/i nor \w . They will all fail.

After all, the best way to validate the email address is still to actually send an email to the address in question to validate the address. If the email address is part of user authentication (register/login/etc), then you can perfectly combine it with the user activation system. Ie send an email with a link with an unique activation key to the specified email address and only allow login when the user has activated the newly created account using the link in the email.

If the purpose of the regex is just to quickly inform the user in the UI that the specified email address doesn’t look like in the right format, best is still to check if it matches basically the following regex:

 ^([^.@]+)(\.[^.@]+)*@([^.@]+\.)+([^.@]+)$ 

Simple como eso. Why on earth would you care about the characters used in the name and domain? It’s the client’s responsibility to enter a valid email address, not the server’s. Even when the client enters a syntactically valid email address like aa@bb.cc , this does not guarantee that it’s a legit email address. No one regex can cover that.

The HTML5 spec suggests a simple regex for validating email addresses:

 /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/ 

This intentionally doesn’t comply with RFC 5322 .

Note: This requirement is a willful violation of RFC 5322 , which defines a syntax for e-mail addresses that is simultaneously too strict (before the @ character), too vague (after the @ character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

The total length could also be limited to 254 characters, per RFC 3696 errata 1690 .

For a vivid demonstration, the following monster is pretty good but still does not correctly recognize all syntactically valid email addresses: it recognizes nested comments up to four levels deep.

This is a job for a parser, but even if an address is syntactically valid, it still may not be deliverable. Sometimes you have to resort to the hillbilly method of “Hey, y’all, watch ee-us!”

 // derivative of work with the following copyright and license: // Copyright (c) 2004 Casey West. All rights reserved. // This module is free software; you can redistribute it and/or // modify it under the same terms as Perl itself. // see http://search.cpan.org/~cwest/Email-Address-1.80/ private static string gibberish = @" (?-xism:(?:(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:\ s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^ \x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+)) |(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+ |\s+)*[^\x00-\x1F\x7F()<>\[\]:;@\,.\s]+(?-xism:(?-xism:\ s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^ \x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+)) |(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+ |\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:( ?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((? :\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x 0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*(?-xism:(?-xism:[ ^\\])|(?-xism:\\(?-xism:[^\x0A\x0D])))+(?-xism:(?-xi sm:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xis m:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\ ]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\ s*)+|\s+)*))+)?(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(? -xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism: \s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[ ^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*< (?-xism:(?-xi sm:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^( )\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*( ?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D])) |)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\,.\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.\s] +)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+)) |(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism: (?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s *\)\s*))+)*\s*\)\s*)+|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((? :\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x 0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xi sm:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)* (?-xism:(?-xism:[^\\])|(?-xism:\\(?-xism:[^\x0A\x0D] )))+(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\ ]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-x ism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+ )*\s*\)\s*))+)*\s*\)\s*)+|\s+)*))\@(?-xism:(?-xism:(?-xism:( ?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(? -xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^ ()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s *\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\,.\s]+( ?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.\s]+)*)(?-xism:(?-xism: \s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[ ^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+) )|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*) +|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism: (?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\(( ?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\ x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*\[(?:\s*(?-xism:(?-x ism:[^\[\]\\])|(?-xism:\\(?-xism:[^\x0A\x0D])))+)*\s*\](?-xi sm:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism: \\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:( ?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+ )*\s*\)\s*)+|\s+)*)))>(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?- xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\ s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^ \x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*))|(?-xism:(?-x ism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^ ()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s* (?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]) )|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F() <>\[\]:;@\,.\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.\s ]+)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+) )|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism :(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\ s*\)\s*))+)*\s*\)\s*)+|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\(( ?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\ x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-x ism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+) *(?-xism:(?-xism:[^\\])|(?-xism:\\(?-xism:[^\x0A\x0D ])))+(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\ \]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?- xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|) +)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*))\@(?-xism:(?-xism:(?-xism: (?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\( ?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[ ^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\ s*\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\,.\s]+ (?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.\s]+)*)(?-xism:(?-xism :\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism: [^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+ ))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s* )+|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism :(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\( (?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A \x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*\[(?:\s*(?-xism:(?- xism:[^\[\]\\])|(?-xism:\\(?-xism:[^\x0A\x0D])))+)*\s*\](?-x ism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism :\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism: (?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*)) +)*\s*\)\s*)+|\s+)*))))(?-xism:\s*\((?:\s*(?-xism:(?-xism:(? >[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?: \s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0 D]))|)+)*\s*\)\s*))+)*\s*\)\s*)*)" .Replace("", "\"") .Replace("\t", "") .Replace(" ", "") .Replace("\r", "") .Replace("\n", ""); private static Regex mailbox = new Regex(gibberish, RegexOptions.ExplicitCapture); 

Here’s the PHP I use. I’ve choosen this solution in the spirit of “false positives are better than false negatives” as declared by another commenter here AND with regards to keeping your response time up and server load down … there’s really no need to waste server resources with a regular expression when this will weed out most simple user error. You can always follow this up by sending a test email if you want.

 function validateEmail($email) { return (bool) stripos($email,'@'); } 

According to official standard RFC 2822 valid email regex is

 (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\]) 

if you want to use it in Java its really very easy

 import java.util.regex.*; class regexSample { public static void main(String args[]) { //Input the string for validation String email = "xyz@hotmail.com"; //Set the email pattern string Pattern p = Pattern.compile(" (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|" +"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")" + "@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\]"); //Match the given string with the pattern Matcher m = p.matcher(email); //check whether match is found boolean matchFound = m.matches(); if (matchFound) System.out.println("Valid Email Id."); else System.out.println("Invalid Email Id."); } } 

RFC 5322 standard:

Allows dot-atom local-part, quoted-string local-part, obsolete (mixed dot-atom and quoted-string) local-part, domain name domain, (IPv4, IPv6, and IPv4-mapped IPv6 address) domain literal domain, and (nested) CFWS.

 '/^(?!(?>(?1)"?(?>\\\[ -~]|[^"])"?(?1)){255,})(?!(?>(?1)"?(?>\\\[ -~]|[^"])"?(?1)){65,}@)((?>(?>(?>((?>(?>(?>\x0D\x0A)?[\t ])+|(?>[\t ]*\x0D\x0A)?[\t ]+)?)(\((?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?3)))*(?2)\)))+(?2))|(?2))?)([!#-\'*+\/-9=?^-~-]+|"(?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*(?2)")(?>(?1)\.(?1)(?4))*(?1)@(?!(?1)[a-z0-9-]{64,})(?1)(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>(?1)\.(?!(?1)[a-z0-9-]{64,})(?1)(?5)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?6)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?6)(?>:(?6)){0,6})?::(?7)?))|(?>(?>IPv6:(?>(?6)(?>:(?6)){5}:|(?!(?:.*[a-f0-9]:){6,})(?8)?::(?>((?6)(?>:(?6)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?9)){3}))\])(?1)$/isD' 

RFC 5321 standard:

Allows dot-atom local-part, quoted-string local-part, domain name domain, and (IPv4, IPv6, and IPv4-mapped IPv6 address) domain literal domain.

 '/^(?!(?>"?(?>\\\[ -~]|[^"])"?){255,})(?!"?(?>\\\[ -~]|[^"]){65,}"?@)(?>([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*|"(?>[ !#-\[\]-~]|\\\[ -~])*")@(?!.*[^.]{64,})(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?2)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?))|(?>(?>IPv6:(?>(?3)(?>:(?3)){5}:|(?!(?:.*[a-f0-9]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?6)){3}))\])$/iD' 

BASIC:

Allows dot-atom local-part and domain name domain (requiring at least two domain name labels with the TLD limited to 2-6 alphabetic characters).

 "/^(?!.{255,})(?!.{65,}@)([!#-'*+\/-9=?^-~-]+)(?>\.(?1))*@(?!.*[^.]{64,})(?>[a-z0-9](?>[a-z0-9-]*[a-z0-9])?\.){1,126}[az]{2,6}$/iD" 

Strange that you “cannot” allow 4 characters TLDs. You are banning people from .info and .name , and the length limitation stop .travel and .museum , but yes, they are less common than 2 characters TLDs and 3 characters TLDs.

You should allow uppercase alphabets too. Email systems will normalize the local part and domain part.

For your regex of domain part, domain name cannot starts with ‘-‘ and cannot ends with ‘-‘. Dash can only stays in between.

If you used the PEAR library, check out their mail function (forgot the exact name/library). You can validate email address by calling one function, and it validates the email address according to definition in RFC822.

 public bool ValidateEmail(string sEmail) { if (sEmail == null) { return false; } int nFirstAT = sEmail.IndexOf('@'); int nLastAT = sEmail.LastIndexOf('@'); if ((nFirstAT > 0) && (nLastAT == nFirstAT) && (nFirstAT < (sEmail.Length - 1))) { return (Regex.IsMatch(sEmail, @"^[az|0-9|AZ]*([_][az|0-9|AZ]+)*([.][az|0-9|AZ]+)*([.][az|0-9|AZ]+)*(([_][az|0-9|AZ]+)*)?@[az][az|0-9|AZ]*\.([az][az|0-9|AZ]*(\.[az][az|0-9|AZ]*)?)$")); } else { return false; } }