What is the best regular expression to check whether a string is a valid URL?

How can I check whether a given string is a valid URL?

My knowledge of regular expressions is basic, and it doesn't let me choose among the hundreds of regular expressions I have already seen on the web.

I wrote my URL pattern (really IRI, internationalized) to comply with RFC 3987 ( http://www.faqs.org/rfcs/rfc3987.html ). These are in PCRE syntax.

For absolute IRIs (internationalized):

/^[az](?:[-a-z0-9\+\.])*:(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4}:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5]))
{3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=@])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x
{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?$/i 

To allow relative IRIs as well:

 /^(?:[az](?:[-a-z0-9\+\.])*:(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4}:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-
5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=@])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFE
F}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?|(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(
?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4}:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=@])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3
FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=@])+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{900
00}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?)$/i 

How they were compiled (in PHP):

    <?php
    /* Regex convenience functions (character class, non-capturing group) */
    function cc($str, $suffix = '', $negate = false) {
        return '[' . ($negate ? '^' : '') . $str . ']' . $suffix;
    }
    function ncg($str, $suffix = '') {
        return '(?:' . $str . ')' . $suffix;
    }

    /* Preserved from RFC3986 */
    $ALPHA = 'a-z';
    $DIGIT = '0-9';
    $HEXDIG = $DIGIT . 'a-f';
    $sub_delims = '!\\$&\'\\(\\)\\*\\+,;=';
    $gen_delims = ':\\/\\?\\#\\[\\]@';
    $reserved = $gen_delims . $sub_delims;
    $unreserved = '-' . $ALPHA . $DIGIT . '\\._~';
    $pct_encoded = '%' . cc($HEXDIG) . cc($HEXDIG);
    $dec_octet = ncg(implode('|', array(
        cc($DIGIT),
        cc('1-9') . cc($DIGIT),
        '1' . cc($DIGIT) . cc($DIGIT),
        '2' . cc('0-4') . cc($DIGIT),
        '25' . cc('0-5')
    )));
    $IPv4address = $dec_octet . ncg('\\.' . $dec_octet, '{3}');
    $h16 = cc($HEXDIG, '{1,4}');
    $ls32 = ncg($h16 . ':' . $h16 . '|' . $IPv4address);
    $IPv6address = ncg(implode('|', array(
        ncg($h16 . ':', '{6}') . $ls32,
        '::' . ncg($h16 . ':', '{5}') . $ls32,
        ncg($h16, '?') . '::' . ncg($h16 . ':', '{4}') . $ls32,
        ncg($h16 . ':' . $h16, '?') . '::' . ncg($h16 . ':', '{3}') . $ls32,
        ncg(ncg($h16 . ':', '{0,2}') . $h16, '?') . '::' . ncg($h16 . ':', '{2}') . $ls32,
        ncg(ncg($h16 . ':', '{0,3}') . $h16, '?') . '::' . $h16 . ':' . $ls32,
        ncg(ncg($h16 . ':', '{0,4}') . $h16, '?') . '::' . $ls32,
        ncg(ncg($h16 . ':', '{0,5}') . $h16, '?') . '::' . $h16,
        ncg(ncg($h16 . ':', '{0,6}') . $h16, '?') . '::',
    )));
    $IPvFuture = 'v' . cc($HEXDIG, '+') . cc($unreserved . $sub_delims . ':', '+');
    $IP_literal = '\\[' . ncg(implode('|', array($IPv6address, $IPvFuture))) . '\\]';
    $port = cc($DIGIT, '*');
    $scheme = cc($ALPHA) . ncg(cc('-' . $ALPHA . $DIGIT . '\\+\\.'), '*');

    /* New or changed in RFC3987 */
    $iprivate = '\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}';
    $ucschar = '\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}' .
        '\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}' .
        '\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}' .
        '\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}' .
        '\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}' .
        '\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}';
    $iunreserved = '-' . $ALPHA . $DIGIT . '\\._~' . $ucschar;
    $ipchar = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . ':@'));
    $ifragment = ncg($ipchar . '|' . cc('\\/\\?'), '*');
    $iquery = ncg($ipchar . '|' . cc($iprivate . '\\/\\?'), '*');
    $isegment_nz_nc = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . '@'), '+');
    $isegment_nz = ncg($ipchar, '+');
    $isegment = ncg($ipchar, '*');
    $ipath_empty = '(?!' . $ipchar . ')';
    $ipath_rootless = ncg($isegment_nz) . ncg('\\/' . $isegment, '*');
    $ipath_noscheme = ncg($isegment_nz_nc) . ncg('\\/' . $isegment, '*');
    $ipath_absolute = '\\/' . ncg($ipath_rootless, '?'); // Spec says isegment-nz *( "/" isegment )
    $ipath_abempty = ncg('\\/' . $isegment, '*');
    // $ipath is not used below; the stray trailing ')' from the original listing is dropped here.
    $ipath = ncg(implode('|', array(
        $ipath_abempty,
        $ipath_absolute,
        $ipath_noscheme,
        $ipath_rootless,
        $ipath_empty
    )));
    $ireg_name = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . '@'), '*');
    $ihost = ncg(implode('|', array($IP_literal, $IPv4address, $ireg_name)));
    $iuserinfo = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . ':'), '*');
    $iauthority = ncg($iuserinfo . '@', '?') . $ihost . ncg(':' . $port, '?');
    $irelative_part = ncg(implode('|', array(
        '\\/\\/' . $iauthority . $ipath_abempty . '',
        '' . $ipath_absolute . '',
        '' . $ipath_noscheme . '',
        '' . $ipath_empty . ''
    )));
    $irelative_ref = $irelative_part . ncg('\\?' . $iquery, '?') . ncg('\\#' . $ifragment, '?');
    $ihier_part = ncg(implode('|', array(
        '\\/\\/' . $iauthority . $ipath_abempty . '',
        '' . $ipath_absolute . '',
        '' . $ipath_rootless . '',
        '' . $ipath_empty . ''
    )));
    $absolute_IRI = $scheme . ':' . $ihier_part . ncg('\\?' . $iquery, '?');
    $IRI = $scheme . ':' . $ihier_part . ncg('\\?' . $iquery, '?') . ncg('\\#' . $ifragment, '?');
    $IRI_reference = ncg($IRI . '|' . $irelative_ref);

Edit, March 7, 2011: Because of the way PHP handles backslashes in quoted strings, these are unusable by default. You will need to double-escape the backslashes, except where the backslash has a special meaning in a regex. You can do that this way:

    $escape_backslash = '/(?<!\\)\\(?![\[\]\\\^\$\.\|\*\+\(\)QEnrtaefvdwsDWSbAZzB1-9GX]|x\{[0-9a-f]{1,4}\}|\c[A-Z]|)/';
    $absolute_IRI = preg_replace($escape_backslash, '\\\\', $absolute_IRI);
    $IRI = preg_replace($escape_backslash, '\\\\', $IRI);
    $IRI_reference = preg_replace($escape_backslash, '\\\\', $IRI_reference);

I just wrote a blog post about a solution for recognizing URLs in the most commonly used formats, such as:

  • www.google.com
  • http://www.google.com
  • mailto:somebody@google.com
  • somebody@google.com
  • www.url-with-querystring.com/?url=has-querystring

The regular expression used is:

 /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)/ 

However, I would recommend going to http://blog.mattheworiordan.com/post/13174566389/url-regular-expression-for-links-with-or-without-the to see a working example.

What platform? If you are using .NET, use System.Uri.TryCreate, not a regular expression.

For example:

    static bool IsValidUrl(string urlString)
    {
        Uri uri;
        return Uri.TryCreate(urlString, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp
             || uri.Scheme == Uri.UriSchemeHttps
             || uri.Scheme == Uri.UriSchemeFtp
             || uri.Scheme == Uri.UriSchemeMailto
                /*...*/);
    }

    // In test fixture...
    [Test]
    void IsValidUrl_Test()
    {
        Assert.True(IsValidUrl("http://www.example.com"));
        Assert.False(IsValidUrl("javascript:alert('xss')"));
        Assert.False(IsValidUrl(""));
        Assert.False(IsValidUrl(null));
    }

(Thanks to @Yoshi for the tip about javascript:) 🙂

This is what RegexBuddy uses.

 (\b(https?|ftp|file)://)?[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|] 

It matches the following (inside the ** ** marks):

    **http://www.regexbuddy.com**
    **http://www.regexbuddy.com/**
    **http://www.regexbuddy.com/index.html**
    **http://www.regexbuddy.com/index.html?source=library**

You can download RegexBuddy at http://www.regexbuddy.com/download.html .
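To get a feel for what this pattern accepts, here is a minimal Python sketch. It assumes Python's `re` module treats the pattern the same way RegexBuddy does for these inputs, and the helper name `matches_whole` is made up for illustration. Because the scheme group is optional, the pattern also accepts plain words, so it seems better suited to finding URL-like text than to strict validation.

```python
import re

# The RegexBuddy pattern quoted above, split across lines but otherwise unchanged.
URL_LIKE = re.compile(
    r"(\b(https?|ftp|file)://)?"
    r"[-A-Za-z0-9+&@#/%?=~_|!:,.;]+"
    r"[-A-Za-z0-9+&@#/%=~_|]"
)


def matches_whole(text: str) -> bool:
    """Return True if the entire string matches the pattern (hypothetical helper)."""
    return URL_LIKE.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        "http://www.regexbuddy.com",
        "http://www.regexbuddy.com/index.html?source=library",
        "hello",      # a plain word: the scheme is optional, so this is accepted too
        "two words",  # a space is outside both character classes, so this is rejected
    ]
    for sample in samples:
        print(sample, "->", matches_whole(sample))
```

If you need the scheme to be mandatory, dropping the `?` after the first group is the obvious variation to try.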

Regarding eyelidlessness's answer, which reads "This is based on my reading of the URI specification": Thanks eyelidlessness, yours is the perfect solution I was looking for, since it is based on the URI spec! Excellent work. 🙂

I had to make two amendments. The first was to get the regex to match IP-address URLs correctly in PHP (v5.2.10) with the preg_match() function.

I had to add one more set of parentheses around the pipes on the line above "IP Address":

 )|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}(?# 

I'm not sure why.

I also reduced the minimum length of the top-level domain from 3 to 2 letters to support .co.uk and the like.

Final code:

 /^(https?|ftp):\/\/(?# protocol )(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+(?# username )(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?(?# password )@)?(?# auth requires @ )((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*(?# domain segments AND )[a-z][a-z0-9-]*[a-z0-9](?# top level domain OR )|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}(?# )(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])(?# IP address ))(:\d+)?(?# port ))(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*(?# path )(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)(?# query string )?)?)?(?# path and query string optional )(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?(?# fragment )$/i 

This modified version has not been checked against the URI specification, so I can't vouch for its compliance. It was altered to handle URLs in local network environments and two-character TLDs, as well as other kinds of web URL, and to work better in the PHP setup I use.

As PHP code:

    define('URL_FORMAT',
    '/^(https?):\/\/'.                                         // protocol
    '(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+'.         // username
    '(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?'.      // password
    '@)?(?#'.                                                  // auth requires @
    ')((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*'.           // domain segments AND
    '[a-z][a-z0-9-]*[a-z0-9]'.                                 // top level domain OR
    '|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}'.
    '(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])'.                 // IP address
    ')(:\d+)?'.                                                // port
    ')(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*'. // path
    '(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)'.      // query string
    '?)?)?'.                                                   // path and query string optional
    '(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?'.      // fragment
    '$/i');

Here is a test program in PHP that validates a variety of URLs using the regex:

    <?php
    define('URL_FORMAT',
    '/^(https?):\/\/'.                                         // protocol
    '(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+'.         // username
    '(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?'.      // password
    '@)?(?#'.                                                  // auth requires @
    ')((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*'.           // domain segments AND
    '[a-z][a-z0-9-]*[a-z0-9]'.                                 // top level domain OR
    '|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}'.
    '(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])'.                 // IP address
    ')(:\d+)?'.                                                // port
    ')(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*'. // path
    '(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)'.      // query string
    '?)?)?'.                                                   // path and query string optional
    '(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?'.      // fragment
    '$/i');

    /**
     * Verify the syntax of the given URL.
     *
     * @access public
     * @param $url The URL to verify.
     * @return boolean
     */
    function is_valid_url($url) {
        if (str_starts_with(strtolower($url), 'http://localhost')) {
            return true;
        }
        // preg_match() returns 1 on a match, so compare to get a real boolean.
        return preg_match(URL_FORMAT, $url) === 1;
    }

    /**
     * String starts with something
     *
     * This function will return true only if input string starts with
     * needle
     *
     * @param string $string Input string
     * @param string $needle Needle string
     * @return boolean
     */
    // PHP 8.0 ships a built-in str_starts_with(); only define ours on older versions.
    if (!function_exists('str_starts_with')) {
        function str_starts_with($string, $needle) {
            return substr($string, 0, strlen($needle)) == $needle;
        }
    }

    /**
     * Test a URL for validity and count results.
     * @param url url
     * @param expected expected result (true or false)
     */
    $numtests = 0;
    $passed = 0;
    function test_url($url, $expected) {
        global $numtests, $passed;
        $numtests++;
        $valid = is_valid_url($url);
        echo "URL Valid?: " . ($valid?"yes":"no") . " for URL: $url. Expected: ".($expected?"yes":"no").". ";
        if ($valid == $expected) {
            echo "PASS\n";
            $passed++;
        } else {
            echo "FAIL\n";
        }
    }

    echo "URL Tests:\n\n";

    test_url("http://localserver/projects/public/assets/javascript/widgets/UserBoxMenu/widget.css", true);
    test_url("http://www.google.com", true);
    test_url("http://www.google.co.uk/projects/my%20folder/test.php", true);
    test_url("https://myserver.localdomain", true);
    test_url("http://192.168.1.120/projects/index.php", true);
    test_url("http://192.168.1.1/projects/index.php", true);
    test_url("http://projectpier-server.localdomain/projects/public/assets/javascript/widgets/UserBoxMenu/widget.css", true);
    test_url("https://2.4.168.19/project-pier?c=test&a=b", true);
    test_url("https://localhost/a/b/c/test.php?c=controller&arg1=20&arg2=20", true);
    test_url("http://user:password@localhost/a/b/c/test.php?c=controller&arg1=20&arg2=20", true);

    echo "\n$passed out of $numtests tests passed.\n\n";
    ?>

Thanks again to eyelidlessness for the regex!

Mathias Bynens has a great article comparing many regular expressions: In search of the perfect URL validation regex

The best one posted is a little long, but it matches just about anything you can throw at it.

JavaScript version

 /^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$/i 

PHP version

 _^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]-*)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]-*)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$_iuS 
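For readers who want to try the pattern outside JavaScript or PHP, here is a rough Python port, split into commented pieces. It is only a sketch: it assumes Python's `re` module interprets the pattern the same way for the inputs shown, and the names `WEB_URL` and `is_public_web_url` are invented for this example. Note that the pattern deliberately rejects private-network addresses and hosts without a TLD.

```python
import re

# Port of the JavaScript version above; \uXXXX escapes are understood by Python's re.
WEB_URL = re.compile(
    r"^(?:(?:https?|ftp)://)"                           # scheme
    r"(?:\S+(?::\S*)?@)?"                               # optional user:pass@
    r"(?:"
    r"(?!(?:10|127)(?:\.\d{1,3}){3})"                   # no 10.x.x.x or 127.x.x.x
    r"(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})"        # no 169.254.x.x or 192.168.x.x
    r"(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})"   # no 172.16.x.x - 172.31.x.x
    r"(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])"               # first octet
    r"(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}"          # second and third octets
    r"(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))"        # last octet
    r"|"
    r"(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)"       # host name
    r"(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*"    # domain labels
    r"(?:\.(?:[a-z\u00a1-\uffff]{2,}))"                             # TLD
    r"\.?"                                                          # optional trailing dot
    r")"
    r"(?::\d{2,5})?"                                    # optional port
    r"(?:[/?#]\S*)?$",                                  # optional path, query, fragment
    re.IGNORECASE,
)


def is_public_web_url(text: str) -> bool:
    """Return True if text matches the ported pattern (hypothetical helper)."""
    return WEB_URL.match(text) is not None


if __name__ == "__main__":
    for candidate in ("http://www.example.com", "http://192.168.1.1", "http://localhost"):
        print(candidate, "->", is_public_web_url(candidate))
```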

The post Getting parts of a URL (Regex) discusses parsing a URL to identify its various components. If you want to check that a URL is well-formed, it should be sufficient for your needs.

If you need to check whether it is actually valid, you will eventually have to try to access whatever is on the other end.

In general, though, you are better off using a function provided by your framework or another library. Many platforms include functions that parse URLs. For example, there is Python's urlparse module (urllib.parse in Python 3), and in .NET you can use the constructor of the System.Uri class as a means of validating the URL.
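As an illustration of the library route, here is a small Python 3 sketch built on `urllib.parse.urlsplit`. The function name `looks_like_web_url` and the choice to accept only `http` and `https` are assumptions made for this example, not part of any standard API. The parser is lenient, so treat this as a well-formedness check rather than proof that the URL is usable.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only web URLs are wanted.
ALLOWED_SCHEMES = {"http", "https"}


def looks_like_web_url(text: str) -> bool:
    """Return True if text parses as an http(s) URL with a host (hypothetical helper)."""
    try:
        parts = urlsplit(text)
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs, e.g. a broken IPv6 literal.
        return False
    # urlsplit lower-cases the scheme; hostname is None when no host is present.
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.hostname)


if __name__ == "__main__":
    for candidate in ("http://www.example.com", "javascript:alert('xss')", "www.example.com", ""):
        print(repr(candidate), "->", looks_like_web_url(candidate))
```

Checking the scheme against an allow-list is what keeps inputs such as `javascript:` URLs out, which a pure syntax check would let through.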

This may not be a job for regular expressions, but for existing tools in your language of choice. You probably want to use existing code that has already been written, tested, and debugged.

In PHP, use the parse_url function.

Perl: URI module.

Ruby: URI module.

.NET: 'Uri' class

Regexes are not a magic wand you wave at every problem that happens to involve strings.

Non-validating URI reference parser

For reference, here is the IETF spec: ( TXT | HTML ). In particular, Appendix B, "Parsing a URI Reference with a Regular Expression", demonstrates how to parse a valid URI reference with a regex. It is described as

an example of a non-validating URI-reference parser that will take any given string and extract the URI components.

Here is the regular expression they provide:

  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))? 
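A short Python sketch of how this expression can be used follows. Per the appendix, group 2 is the scheme, group 4 the authority, group 5 the path, group 7 the query, and group 9 the fragment. The function name `split_uri` is made up for this example. Since every part of the pattern is optional, it matches any string, which is exactly why it is a parser and not a validator.

```python
import re

# The expression from the spec's Appendix B, unchanged.
URI_REFERENCE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(text: str) -> dict:
    """Split text into URI components without validating it (hypothetical helper)."""
    match = URI_REFERENCE.match(text)  # always matches: every group is optional
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


if __name__ == "__main__":
    print(split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
    print(split_uri("not a url at all"))
```

Components that are absent come back as `None`, while the path is always a string (possibly empty).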

As someone else said, it is probably best to leave this to a lib/framework you are already using.

This will match all URLs

  • with or without http/https
  • with or without www

… including subdomains and the new top-level domain name extensions such as .museum, .academy, .foundation, etc., which can have up to 63 characters (not just .com, .net, .info, etc.)

 (([\w]+:)?//)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)? 

Debido a que hoy la longitud máxima de la extensión de nombre de dominio de nivel superior disponible es de 13 caracteres, como. internacional , puede cambiar el número 63 en la expresión a 13 para evitar que alguien lo use indebidamente.
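To make the effect of that change concrete, here is a small Python sketch that isolates just the host portion of the expression and makes the TLD length limit a parameter. The function name `host_pattern` is hypothetical, and testing the host part on its own is a simplification of the full pattern.

```python
import re


def host_pattern(max_tld_len: int = 63) -> re.Pattern:
    """Build the host part of the expression with a configurable TLD length cap.

    Hypothetical helper: only the domain-label and TLD pieces of the full
    pattern are reproduced here.
    """
    return re.compile(r"([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,%d}" % max_tld_len)


if __name__ == "__main__":
    strict = host_pattern(13)
    loose = host_pattern(63)
    long_tld_host = "example." + "a" * 20  # a made-up 20-character TLD

    print(bool(strict.fullmatch("example.international")))  # 13-character TLD
    print(bool(strict.fullmatch(long_tld_host)))
    print(bool(loose.fullmatch(long_tld_host)))
```

The trade-off is the usual one: a tighter limit rejects more junk today but needs updating if longer TLDs are ever delegated.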

The same expression in JavaScript:

    var urlreg = /(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)?/;

    // Re-check the textarea on every input event and flag invalid content.
    $('textarea').on('input', function () {
        var url = $(this).val();
        $(this).toggleClass('invalid', urlreg.test(url) == false);
    });
    $('textarea').trigger('input');

    textarea { color: green; }
    .invalid { color: red; }

The best URL regex for me would be:

 "(([\w]+:)?//)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)?"

    function validateURL(textval) {
        // A regex literal keeps the backslash escapes intact. Inside a plain
        // string literal, "\." would collapse to "." and match any character.
        var urlregex = /^(http|https|ftp)\:\/\/([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(\/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$/;
        return urlregex.test(textval);
    }

Matches: http://site.com/dir/file.php?var=moo | ftp://user:pass@site.com:21/file/dir

Non-matches: site.com | http://site.com/dir//

    function validateURL(textval) {
        // Written as a regex literal so that "\." stays an escaped dot.
        var urlregex = /^(http|https|ftp)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?\/?([a-zA-Z0-9\-\._\?\,\'\/\\\+&%\$#\=~])*$/;
        return urlregex.test(textval);
    }

Matches: http://www.asdah.com/~joe | ftp://ftp.asdah.co.uk:2828/asdah%20asdah.gif | https://asdah.gov/asdh-ah.as

If you are really looking for the ultimate match, you will probably find it in "A Good Url Regular Expression?".

But a regex that really matches all possible domains and allows everything that is allowed according to the RFCs is horribly long and unreadable, trust me 😉

I wasn't able to find the regex I was looking for, so I modified an existing one to fulfill my requirements, and it seems to work fine now. My requirements were:

Here is what I came up with; any suggestions are appreciated:

    @Test
    public void testWebsiteUrl() {
        String regularExpression = "((http|ftp|https):\\/\\/)?[\\w\\-_]+(\\.[\\w\\-_]+)+([\\w\\-\\.,@?^=%&:/~\\+#]*[\\w\\-\\@?^=%&/~\\+#])?";

        assertTrue("www.google.com".matches(regularExpression));
        assertTrue("www.google.co.uk".matches(regularExpression));
        assertTrue("http://www.google.com".matches(regularExpression));
        assertTrue("http://www.google.co.uk".matches(regularExpression));
        assertTrue("https://www.google.com".matches(regularExpression));
        assertTrue("https://www.google.co.uk".matches(regularExpression));
        assertTrue("google.com".matches(regularExpression));
        assertTrue("google.co.uk".matches(regularExpression));
        assertTrue("google.mu".matches(regularExpression));
        assertTrue("mes.intnet.mu".matches(regularExpression));
        assertTrue("cse.uom.ac.mu".matches(regularExpression));

        assertTrue("http://www.google.com/path".matches(regularExpression));
        assertTrue("http://subdomain.web-site.com/cgi-bin/perl.cgi?key1=value1&key2=value2e".matches(regularExpression));
        assertTrue("http://www.google.com/?queryparam=123".matches(regularExpression));
        assertTrue("http://www.google.com/path?queryparam=123".matches(regularExpression));

        assertFalse("www..dr.google".matches(regularExpression));
        assertFalse("www:google.com".matches(regularExpression));
        assertFalse("https://www@.google.com".matches(regularExpression));

        assertFalse("https://www.google.com\"".matches(regularExpression));
        assertFalse("https://www.google.com'".matches(regularExpression));

        assertFalse("http://www.google.com/path'".matches(regularExpression));
        assertFalse("http://subdomain.web-site.com/cgi-bin/perl.cgi?key1=value1&key2=value2e'".matches(regularExpression));
        assertFalse("http://www.google.com/?queryparam=123'".matches(regularExpression));
        assertFalse("http://www.google.com/path?queryparam=12'3".matches(regularExpression));
    }

I wrote a small Groovy version that you can run.

It matches the following URLs (which is good enough for me):

    public static void main(args) {
        String url = "go to http://www.m.abut.ly/abc its awesome"
        url = url.replaceAll(/https?:\/\/w{0,3}\w*?\.(\w*?\.)?\w{2,3}\S*|www\.(\w*?\.)?\w*?\.\w{2,3}\S*|(\w*?\.)?\w*?\.\w{2,3}[\/\?]\S*/, { it -> "woof${it}woof" })
        println url
    }

http://google.com

http://google.com/help.php

http://google.com/help.php?a=5

http://www.google.com

http://www.google.com/help.php

http://www.google.com?a=5

google.com?a=5

google.com/help.php

google.com/help.php?a=5

http://www.m.google.com/help.php?a=5 (and all its permutations)

www.m.google.com/help.php?a=5 (and all its permutations)

m.google.com/help.php?a=5 (and all its permutations)

The important thing for URLs that don't start with http or www is that they must include a / or a ?.

I bet this can be tweaked a little more, but it does the job pretty well for being so short and compact, because you can pretty much split it into 3 parts:

find anything that starts with http: https?:\/\/w{0,3}\w*?\.\w{2,3}\S*

find anything that starts with www: www\.\w*?\.\w{2,3}\S*

or find anything that has some text, then a dot, then at least 2 letters, and then a ? or /: \w*?\.\w{2,3}[\/\?]\S*
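The three-way split can be sketched in Python by keeping each alternative in its own named constant and joining them with "|". The constant names and the find_urls helper are assumptions for this example. The sub-patterns are the three branches of the full expression in the Groovy snippet (which also allows one optional extra subdomain label), with the slashes left unescaped because Python does not need that.

```python
import re

# Branch 1: anything that starts with http:// or https://
WITH_SCHEME = r"https?://w{0,3}\w*?\.(\w*?\.)?\w{2,3}\S*"
# Branch 2: anything that starts with www.
WITH_WWW = r"www\.(\w*?\.)?\w*?\.\w{2,3}\S*"
# Branch 3: text, a dot, 2 to 3 word characters, then a "/" or "?"
BARE_HOST = r"(\w*?\.)?\w*?\.\w{2,3}[/?]\S*"

URL_FINDER = re.compile("|".join([WITH_SCHEME, WITH_WWW, BARE_HOST]))


def find_urls(text: str) -> list:
    """Return every substring of text that one of the three branches matches."""
    return [match.group(0) for match in URL_FINDER.finditer(text)]


if __name__ == "__main__":
    print(find_urls("go to http://www.m.abut.ly/abc its awesome"))
```

Keeping the branches separate makes it easier to tighten one case, for example the bare-host rule, without touching the other two.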

I use this regex:

 ((https?:)?//)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)?

To support both:

 http://stackoverflow.com https://stackoverflow.com 

And:

 //stackoverflow.com 
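The part of that expression responsible for accepting all three forms is the leading ((https?:)?//)? group. A minimal Python sketch of just that prefix follows; the helper name describe_prefix and its return labels are invented for illustration, and the rest of the URL is deliberately ignored.

```python
import re

# Only the leading part of the expression: an optional "http:" or "https:"
# followed by "//", with the whole prefix itself being optional.
SCHEME_PREFIX = re.compile(r"((https?:)?//)?")


def describe_prefix(url: str) -> str:
    """Report which of the supported prefixes the string starts with."""
    match = SCHEME_PREFIX.match(url)  # always matches, possibly the empty string
    if match.group(2):                # "http:" or "https:"
        return match.group(2)[:-1]
    if match.group(1):                # a bare "//"
        return "protocol-relative"
    return "no prefix"


if __name__ == "__main__":
    for sample in ("http://stackoverflow.com", "//stackoverflow.com"):
        print(sample, "->", describe_prefix(sample))
```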

I have been working on an in-depth article about URI validation using regular expressions. It is based on RFC3986.

Regular Expression URI Validation

Although the article is not yet complete, I have come up with a PHP function that does a pretty good job of validating HTTP and FTP URLs. Here is the current version:

    // function url_valid($url) { Rev:20110423_2000
    //
    // Return associative array of valid URI components, or FALSE if $url is not
    // RFC-3986 compliant. If the passed URL begins with: "www." or "ftp.", then
    // "http://" or "ftp://" is prepended and the corrected full-url is stored in
    // the return array with a key name "url". This value should be used by the caller.
    //
    // Return value: FALSE if $url is not valid, otherwise array of URI components:
    // e.g.
    // Given: "http://www.jmrware.com:80/articles?height=10&width=75#fragone"
    // Array(
    //    [scheme] => http
    //    [authority] => www.jmrware.com:80
    //    [userinfo] =>
    //    [host] => www.jmrware.com
    //    [IP_literal] =>
    //    [IPV6address] =>
    //    [ls32] =>
    //    [IPvFuture] =>
    //    [IPv4address] =>
    //    [regname] => www.jmrware.com
    //    [port] => 80
    //    [path_abempty] => /articles
    //    [query] => height=10&width=75
    //    [fragment] => fragone
    //    [url] => http://www.jmrware.com:80/articles?height=10&width=75#fragone
    // )
    function url_valid($url) {
        if (strpos($url, 'www.') === 0) $url = 'http://'. $url;
        if (strpos($url, 'ftp.') === 0) $url = 'ftp://'. $url;
        if (!preg_match('/# Valid absolute URI having a non-empty, valid DNS host.
            ^
            (?P<scheme>[A-Za-z][A-Za-z0-9+\-.]*):\/\/
            (?P<authority>
              (?:(?P<userinfo>(?:[A-Za-z0-9\-._~!$&\'()*+,;=:]|%[0-9A-Fa-f]{2})*)@)?
              (?P<host>
                (?P<IP_literal>
                  \[
                  (?:
                    (?P<IPV6address>
                      (?:                                                (?:[0-9A-Fa-f]{1,4}:){6}
                      |                                                ::(?:[0-9A-Fa-f]{1,4}:){5}
                      | (?:                          [0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){4}
                      | (?:(?:[0-9A-Fa-f]{1,4}:){0,1}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){3}
                      | (?:(?:[0-9A-Fa-f]{1,4}:){0,2}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){2}
                      | (?:(?:[0-9A-Fa-f]{1,4}:){0,3}[0-9A-Fa-f]{1,4})?::   [0-9A-Fa-f]{1,4}:
                      | (?:(?:[0-9A-Fa-f]{1,4}:){0,4}[0-9A-Fa-f]{1,4})?::
                      )
                      (?P<ls32>[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}
                      | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
                           (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
                      )
                    |   (?:(?:[0-9A-Fa-f]{1,4}:){0,5}[0-9A-Fa-f]{1,4})?::   [0-9A-Fa-f]{1,4}
                    |   (?:(?:[0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4})?::
                    )
                  | (?P<IPvFuture>[Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&\'()*+,;=:]+)
                  )
                  \]
                )
              | (?P<IPv4address>(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
                                   (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))
              | (?P<regname>(?:[A-Za-z0-9\-._~!$&\'()*+,;=]|%[0-9A-Fa-f]{2})+)
              )
              (?::(?P<port>[0-9]*))?
            )
            (?P<path_abempty>(?:\/(?:[A-Za-z0-9\-._~!$&\'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*)
            (?:\?(?P<query>       (?:[A-Za-z0-9\-._~!$&\'()*+,;=:@\\/?]|%[0-9A-Fa-f]{2})*))?
            (?:\#(?P<fragment>    (?:[A-Za-z0-9\-._~!$&\'()*+,;=:@\\/?]|%[0-9A-Fa-f]{2})*))?
            $
            /mx', $url, $m)) return FALSE;
        switch ($m['scheme']) {
        case 'https':
        case 'http':
            if ($m['userinfo']) return FALSE; // HTTP scheme does not allow userinfo.
            break;
        case 'ftps':
        case 'ftp':
            break;
        default:
            return FALSE; // Unrecognized URI scheme. Default to FALSE.
        }
        // Validate host name conforms to DNS "dot-separated-parts".
        if ($m['regname']) { // If host regname specified, check for DNS conformance.
            if (!preg_match('/# HTTP DNS host name.
                ^                      # Anchor to beginning of string.
                (?!.{256})             # Overall host length is less than 256 chars.
                (?:                    # Group dot separated host part alternatives.
                  [A-Za-z0-9]\.        # Either a single alphanum followed by dot
                |                      # or... part has more than one char (63 chars max).
                  [A-Za-z0-9]          # Part first char is alphanum (no dash).
                  [A-Za-z0-9\-]{0,61}  # Internal chars are alphanum plus dash.
                  [A-Za-z0-9]          # Part last char is alphanum (no dash).
                  \.                   # Each part followed by literal dot.
                )*                     # Zero or more parts before top level domain.
                (?:                    # Explicitly specify top level domains.
                  com|edu|gov|int|mil|net|org|biz|
                  info|name|pro|aero|coop|museum|
                  asia|cat|jobs|mobi|tel|travel|
                  [A-Za-z]{2})         # Country codes are exactly two alpha chars.
                  \.?                  # Top level domain can end in a dot.
                $                      # Anchor to end of string.
                /ix', $m['host'])) return FALSE;
        }
        $m['url'] = $url;
        for ($i = 0; isset($m[$i]); ++$i) unset($m[$i]);
        return $m; // return TRUE == array of useful named $matches plus the valid $url.
    }

This function utilizes two regexes; one to match a subset of valid generic URIs (absolute ones having a non-empty host), and a second to validate the DNS “dot-separated-parts” host name. Although this function currently validates only HTTP and FTP schemes, it is structured such that it can be easily extended to handle other schemes.
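The same two-stage idea can be sketched in Python. This is not a port of the PHP function: stage one uses the standard urllib.parse.urlsplit as a stand-in for the RFC 3986 expression, and the helper name check_url is an assumption for the example. Stage two is the DNS host expression from above rewritten in Python's verbose syntax. IP-address hosts skip the DNS check, as they do in the PHP version, and the "www." / "ftp." prefix repair is left out.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Stage 2: DNS "dot-separated-parts" host check, applied with fullmatch.
DNS_HOST = re.compile(
    r"""
    (?!.{256})                      # overall host length under 256 characters
    (?:
        [A-Za-z0-9]\.               # a single alphanumeric label, or
      | [A-Za-z0-9]                 # a label that starts with an alphanumeric,
        [A-Za-z0-9\-]{0,61}         # continues with alphanumerics or dashes,
        [A-Za-z0-9]\.               # and ends with an alphanumeric
    )*
    (?: com|edu|gov|int|mil|net|org|biz
      | info|name|pro|aero|coop|museum
      | asia|cat|jobs|mobi|tel|travel
      | [A-Za-z]{2} )               # country codes: exactly two letters
    \.?                             # the top-level domain may end in a dot
    """,
    re.VERBOSE | re.IGNORECASE,
)


def check_url(url: str) -> bool:
    """Two-stage check: generic URL syntax first, then the host name."""
    # Stage 1 (stand-in): let urlsplit take the place of the generic URI regex.
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    if parts.scheme not in {"http", "https", "ftp", "ftps"} or not parts.hostname:
        return False
    if parts.scheme in {"http", "https"} and (parts.username or parts.password):
        return False  # same rule as above: no userinfo for HTTP
    try:
        ipaddress.ip_address(parts.hostname)
        return True   # IP hosts are not subject to the DNS check
    except ValueError:
        pass
    return DNS_HOST.fullmatch(parts.hostname) is not None


if __name__ == "__main__":
    print(check_url("http://www.jmrware.com:80/articles?height=10&width=75#fragone"))
```

Separating the stages keeps the host rules, which are the part most likely to change (for example the list of top-level domains), in one small expression.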

This one works very well for me:

 (https?|ftp)://(www\d?|[a-zA-Z0-9]+)?\.[a-zA-Z0-9-]+(\:|\.)([a-zA-Z0-9.]+|(\d+)?)([/?:].*)?

Here’s a ready-to-go Java version from the Android source code. This is the best one I’ve found.

    public static final Matcher WEB = Pattern.compile(new StringBuilder()
        .append("((?:(http|https|Http|Https|rtsp|Rtsp):")
        .append("\\/\\/(?:(?:[a-zA-Z0-9\\$\\-\\_\\.\\+\\!\\*\\'\\(\\)")
        .append("\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,64}(?:\\:(?:[a-zA-Z0-9\\$\\-\\_")
        .append("\\.\\+\\!\\*\\'\\(\\)\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,25})?\\@)?)?")
        .append("((?:(?:[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}\\.)+") // named host
        .append("(?:") // plus top level domain
        .append("(?:aero|arpa|asia|a[cdefgilmnoqrstuwxz])")
        .append("|(?:biz|b[abdefghijmnorstvwyz])")
        .append("|(?:cat|com|coop|c[acdfghiklmnoruvxyz])")
        .append("|d[ejkmoz]")
        .append("|(?:edu|e[cegrstu])")
        .append("|f[ijkmor]")
        .append("|(?:gov|g[abdefghilmnpqrstuwy])")
        .append("|h[kmnrtu]")
        .append("|(?:info|int|i[delmnoqrst])")
        .append("|(?:jobs|j[emop])")
        .append("|k[eghimnrwyz]")
        .append("|l[abcikrstuvy]")
        .append("|(?:mil|mobi|museum|m[acdghklmnopqrstuvwxyz])")
        .append("|(?:name|net|n[acefgilopruz])")
        .append("|(?:org|om)")
        .append("|(?:pro|p[aefghklmnrstwy])")
        .append("|qa")
        .append("|r[eouw]")
        .append("|s[abcdeghijklmnortuvyz]")
        .append("|(?:tel|travel|t[cdfghjklmnoprtvwz])")
        .append("|u[agkmsyz]")
        .append("|v[aceginu]")
        .append("|w[fs]")
        .append("|y[etu]")
        .append("|z[amw]))")
        .append("|(?:(?:25[0-5]|2[0-4]") // or ip address
        .append("[0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])\\.(?:25[0-5]|2[0-4][0-9]")
        .append("|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(?:25[0-5]|2[0-4][0-9]|[0-1]")
        .append("[0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(?:25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}")
        .append("|[1-9][0-9]|[0-9])))")
        .append("(?:\\:\\d{1,5})?)") // plus option port number
        .append("(\\/(?:(?:[a-zA-Z0-9\\;\\/\\?\\:\\@\\&\\=\\#\\~") // plus option query params
        .append("\\-\\.\\+\\!\\*\\'\\(\\)\\,\\_])|(?:\\%[a-fA-F0-9]{2}))*)?")
        .append("(?:\\b|$)").toString()
    ).matcher("");

I found the following regex for URLs, tested successfully with 500+ URLs:

/\b(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:\/[^\s]*)?\b/gi

I know it looks ugly, but the good thing is that it works. 🙂

Explanation and demo with 581 random URLs on regex101.

Source: In search of the perfect URL validation regex

I tried to formulate my own version of a URL regex. My requirement was to capture instances in a String where a possible URL can be something like cse.uom.ac.mu, noting that it is preceded by neither http nor www.

    String regularExpression = "((((ht{2}ps?://)?)((w{3}\\.)?))?)[^.&&[a-zA-Z0-9]][a-zA-Z0-9.-]+[^.&&[a-zA-Z0-9]](\\.[a-zA-Z]{2,3})";

    assertTrue("www.google.com".matches(regularExpression));
    assertTrue("www.google.co.uk".matches(regularExpression));
    assertTrue("http://www.google.com".matches(regularExpression));
    assertTrue("http://www.google.co.uk".matches(regularExpression));
    assertTrue("https://www.google.com".matches(regularExpression));
    assertTrue("https://www.google.co.uk".matches(regularExpression));
    assertTrue("google.com".matches(regularExpression));
    assertTrue("google.co.uk".matches(regularExpression));
    assertTrue("google.mu".matches(regularExpression));
    assertTrue("mes.intnet.mu".matches(regularExpression));
    assertTrue("cse.uom.ac.mu".matches(regularExpression));

    //cannot contain 2 '.' after www
    assertFalse("www..dr.google".matches(regularExpression));

    //cannot contain 2 '.' just before com
    assertFalse("www.dr.google..com".matches(regularExpression));

    // to test case where url www must be followed with a '.'
    assertFalse("www:google.com".matches(regularExpression));

    // to test case where url www must be followed with a '.'
    //assertFalse("http://wwwe.google.com".matches(regularExpression));

    // to test case where www must be preceded with a '.'
    assertFalse("https://www@.google.com".matches(regularExpression));

For Python, this is the actual URL validating regex used in Django 1.5.1:

    import re

    regex = re.compile(
        r'^(?:http|ftp)s?://'  # http:// or https://
        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
        r'localhost|'  # localhost...
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|'  # ...or ipv4
        r'\[?[A-F0-9]*:[A-F0-9:]+\]?)'  # ...or ipv6
        r'(?::\d+)?'  # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)

This does both ipv4 and ipv6 addresses as well as ports and GET parameters.

Found in the code here, line 44.
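A short usage sketch for that expression follows. The pattern is repeated so the snippet runs on its own, the wrapper name is_valid_url is an assumption for this example, and the sample URLs are made up to exercise the domain, localhost, and IPv6 branches.

```python
import re

URL_REGEX = re.compile(
    r'^(?:http|ftp)s?://'  # http://, https://, ftp:// or ftps://
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain
    r'localhost|'  # localhost
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|'  # or IPv4
    r'\[?[A-F0-9]*:[A-F0-9:]+\]?)'  # or IPv6
    r'(?::\d+)?'  # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)


def is_valid_url(url: str) -> bool:
    """Return True when the whole string matches the expression."""
    return URL_REGEX.match(url) is not None


if __name__ == "__main__":
    for sample in ("http://example.com/path?q=1", "http://localhost:8000/",
                   "http://[::1]/", "example.com"):
        print(sample, is_valid_url(sample))
```

Note that a scheme is mandatory here, so a bare host such as example.com is rejected.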

What's wrong with plain and simple FILTER_VALIDATE_URL?

    $url = "http://www.example.com";
    if (!filter_var($url, FILTER_VALIDATE_URL)) {
        echo "URL is not valid";
    } else {
        echo "URL is valid";
    }

I know it's not exactly what the question asks, but it did the job for me when I needed to validate URLs, so I thought it might be useful to others who come across this post looking for the same thing.

The following RegEx will work:

 "@((((ht)|(f))tp[s]?://)|(www\.))([a-z][-a-z0-9]+\.)?([a-z][-a-z0-9]+\.)?[a-z][-a-z0-9]+\.[a-z]+[/]?[a-z0-9._\/~#&=;%+?-]*@si"

For convenience, here is a one-liner regexp for URLs that will also match localhost, where you are more likely to have ports than .com or similar.

 (http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}(\.[a-z]{2,6}|:[0-9]{3,4})\b([-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)

You don't specify which language you're using. If it is PHP, there is a native function for that:

    $url = 'http://www.yoururl.co.uk/sub1/sub2/?param=1&param2/';

    if ( ! filter_var( $url, FILTER_VALIDATE_URL ) ) {
        // Wrong
    } else {
        // Valid
    }

Returns the filtered data, or FALSE if the filter fails.

Check it here >>

Hope that helps.

I hope it’s helpful for you…

 ^(http|https):\/\/+[\www\d]+\.[\w]+(\/[\w\d]+)? 

This is a rather old thread now and the question asks for a regex based URL validator. I ran into the thread whilst looking for precisely the same thing. While it may well be possible to write a really comprehensive regex to validate URLs I eventually settled on another way to do things – by using PHP’s parse_url function.

It returns boolean false if the url cannot be parsed. Otherwise it returns the scheme, the host and other information. This may well not be enough for a comprehensive URL check on its own but can be drilled down into for further analysis. If the intent is to simply catch typos, invalid schemes etc it is perfectly adequate.

A regex to check a URL would be:

 ^http(s{0,1})://[a-zA-Z0-9_/\\-\\.]+\\.([A-Za-z/]{2,5})[a-zA-Z0-9_/\\&\\?\\=\\-\\.\\~\\%]*