The Javadoc of the JavaMail API says that the InternetAddress class uses "the syntax of RFC822".
https://javaee.github.io/javamail/docs/api/javax/mail/internet/InternetAddress.html
But what about the actual implementation? Does it behave differently from other email validation implementations such as `<input type="email">`?
I tested the following email address validation implementations and compared the results.

- The InternetAddress class *1 in JavaMail *2
- The `<input type="email">` validation described in the HTML Living Standard (25 June 2020) *3:
  `` /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/ ``
- "General Email Regex (RFC 5322 Official Standard)" described in Almost Perfect Email Regex *4:
  `` (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\]) ``
- The regexp used in Perl正規表現雑技 ("Perl regular expression tricks", popular in Japan) *5:
  `` /^(?:[-!#-'*+/-9=?A-Z^-~]+(?:\.[-!#-'*+/-9=?A-Z^-~]+)*|"(?:[!#-\[\]-~]|\\[\x09 -~])*")@[-!#-'*+/-9=?A-Z^-~]+(?:\.[-!#-'*+/-9=?A-Z^-~]+)*$/ ``

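
To try two of the anchored regexes above outside a browser, here is a minimal Java sketch using `java.util.regex`. It assumes that "valid" means the whole string matches (`Matcher.matches()`); the class `EmailRegexCheck`, the method `isValid`, and the sample rows are illustrative and not part of the original test. The JavaScript escape `\/` is written as a plain `/`, and the Almost Perfect Email Regex is left out for brevity. Because the table was produced with browser-based tools, this sketch is not guaranteed to reproduce every cell.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class: compiles two of the regexes listed above.
// Assumption: an address is "valid" only if the WHOLE string matches.
class EmailRegexCheck {
    // HTML Living Standard "valid e-mail address" pattern.
    // The JavaScript escape "\/" is written as a plain "/" in Java.
    static final Pattern HTML_LIVING_STANDARD = Pattern.compile(
        "^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
        + "(?:\\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$");

    // Regexp from the Perl page (*5): dot-atom or quoted-string local part,
    // then a dot-separated domain built from the same atom characters.
    static final Pattern PERL_REGEX = Pattern.compile(
        "^(?:[-!#-'*+/-9=?A-Z^-~]+(?:\\.[-!#-'*+/-9=?A-Z^-~]+)*"
        + "|\"(?:[!#-\\[\\]-~]|\\\\[\\x09 -~])*\")"
        + "@[-!#-'*+/-9=?A-Z^-~]+(?:\\.[-!#-'*+/-9=?A-Z^-~]+)*$");

    static boolean isValid(Pattern pattern, String address) {
        return pattern.matcher(address).matches();
    }

    public static void main(String[] args) {
        // A few addresses taken from the comparison table.
        List<String> addresses = List.of(
            "abc.def@example.com",   // row 1
            "abc..def@example.com",  // row 22
            "abc.def@localhost",     // row 26
            "abc.def@-.com");        // row 29
        for (String address : addresses) {
            System.out.printf("%-22s HTML=%-5b Perl=%b%n", address,
                isValid(HTML_LIVING_STANDARD, address), isValid(PERL_REGEX, address));
        }
    }
}
```

With these sample rows, the two patterns disagree on rows 22 and 29, in the same direction as the table: the HTML pattern allows consecutive dots but rejects a label starting with `-`, while the Perl regex does the opposite.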
I used http://www.htmq.com/html5/input_type_email.shtml to test the behavior of `<input type="email">`, and https://codepen.io/shingorow/pen/oBPZbL to test the regexps.
The results are shown in the table below.

| # | Test case | Address | JavaMail | HTML Living Standard | Almost Perfect Email Regex | Perl正規表現雑技 |
|---|---|---|---|---|---|---|
| 1 | ordinary address | `abc.def@example.com` | valid | valid | valid | valid |
| 2 | quoted with double quotes | `"abc.def"@example.com` | valid | invalid | valid | invalid |
| 3 | quoted with double quotes, containing newline + space | `"abc.\n def"@example.com` | valid | invalid (*) | invalid (*) | invalid (*) |
| 4 | double-quoted parts joined by "." | `"abc"."def"@example.com` | invalid *6 | invalid | invalid | invalid |
| 5 | quoted with single quotes | `'abc.def'@example.com` | valid | valid | valid | valid |
| 6 | quoted with single quotes, containing newline + space | `'abc.\n def'@example.com` | invalid *7 | invalid (*) | invalid (*) | invalid (*) |
| 7 | single-quoted parts joined by "." | `'abc'.'def'@example.com` | valid | valid | valid | valid |
| 8 | contains a comment in parentheses | `(abc)abc.def@example.com` | valid | invalid | invalid | invalid |
| 9 | contains a comment in parentheses followed by a space | `(abc) abc.def@example.com` | valid | invalid | invalid | invalid |
| 10 | address in angle brackets | `<abc.def@example.com>` | valid | invalid | invalid | invalid |
| 11 | address in angle brackets, containing a newline | `<abc\n .def@example.com>` | invalid *8 | invalid (*) | invalid (*) | invalid (*) |
| 12 | address in angle brackets, containing quotes | `<abc"def"ghi@example.com>` | invalid *9 | invalid | invalid | invalid |
| 13 | address with display name | `foo bar <abc.def@example.com>` | valid | invalid | invalid | invalid |
| 14 | address with display name (single-quoted) | `'foo bar' <abc.def@example.com>` | valid | invalid | invalid | invalid |
| 15 | address with display name (double-quoted) | `"foo bar" <abc.def@example.com>` | valid | invalid | invalid | invalid |
| 16 | domain in square brackets | `abc.def@[example.com]` | valid | invalid | invalid | invalid |
| 17 | domain in square brackets, containing a space | `abc.def@[exa mple.com]` | invalid *10 | invalid | invalid | invalid |
| 18 | domain in square brackets, containing an escaped character | `abc.def@[exa\nmple.com]` | invalid *11 | invalid (*) | invalid (*) | invalid (*) |
| 19 | domain in square brackets, containing quotes | `abc.def@[example."hoge".com]` | valid | invalid | invalid | invalid |
| 20 | starts with a comma | `,abc.def@example.com` | valid | invalid | invalid | invalid |
| 21 | starts with a semicolon | `;abc.def@example.com` | valid | invalid | invalid | invalid |
| 22 | contains consecutive dots | `abc..def@example.com` | invalid *12 | valid | invalid | invalid |
| 23 | local part ends with a dot | `abc.def.@example.com` | invalid *13 | valid | invalid | invalid |
| 24 | local part starts with a dot | `.abc.def@example.com` | invalid *14 | valid | invalid | invalid |
| 25 | contains a non-ASCII character | `⛄bc.def@example.com` | valid | invalid | invalid | invalid |
| 26 | domain without a TLD | `abc.def@localhost` | valid | valid | invalid | valid |
| 27 | domain with a 1-character TLD | `abc.def@e.c` | valid | valid | valid | valid |
| 28 | domain with a 2-character TLD | `abc.def@e.co` | valid | valid | valid | valid |
| 29 | domain not compliant with RFC 952 | `abc.def@-.com` | valid | invalid | invalid | valid |
| 30 | host specified as an IP address | `abc.def@203.0.113.1` | valid | valid | valid | valid |
| 31 | uppercase local part | `ABC.DEF@example.com` | valid | valid | invalid | valid |
| 32 | HTML tag in a double-quoted display name | `"<script>" <abc.def@example.com>` | valid | invalid | invalid | invalid |
| 33 | HTML tag in a single-quoted display name | `'<script>' <abc.def@example.com>` | invalid *15 | invalid | invalid | invalid |

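(*) I could not type a line break (CRLF) directly into the form, so I entered `\n` as a literal string. These cells therefore show the result for a backslash followed by `n`, not for an actual line break.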
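
Rows 22 to 24 are the only cases where the HTML Living Standard accepts an address that the other three reject. Its local-part character class simply includes `.`, whereas RFC 5322's dot-atom form requires one or more atext characters between single dots (the HTML spec itself describes its definition as a willful violation of RFC 5322). The atext set is exactly the HTML class with `.` removed. The minimal Java sketch below contrasts the two local-part rules; `LocalPartRules` and `localPart` are hypothetical helpers, and splitting on the last `@` only works for unquoted addresses like these rows.

```java
import java.util.regex.Pattern;

// Hypothetical helper class contrasting two rules for the part before "@".
class LocalPartRules {
    // RFC 5322 atext: the HTML Living Standard character class without ".".
    static final String ATEXT = "[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]";

    // HTML Living Standard: "." is just another allowed character.
    static final Pattern HTML_LOCAL = Pattern.compile("[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+");

    // RFC 5322 dot-atom: atoms separated by single dots,
    // so no leading, trailing, or consecutive dots.
    static final Pattern DOT_ATOM = Pattern.compile(ATEXT + "+(?:\\." + ATEXT + "+)*");

    // Assumption: unquoted address, so the local part is everything before the last "@".
    static String localPart(String address) {
        return address.substring(0, address.lastIndexOf('@'));
    }

    public static void main(String[] args) {
        String[] rows = {
            "abc.def@example.com",   // row 1
            "abc..def@example.com",  // row 22
            "abc.def.@example.com",  // row 23
            ".abc.def@example.com"   // row 24
        };
        for (String address : rows) {
            String local = localPart(address);
            System.out.printf("%-22s HTML=%-5b dot-atom=%b%n", address,
                HTML_LOCAL.matcher(local).matches(), DOT_ATOM.matcher(local).matches());
        }
    }
}
```

The HTML rule accepts all four local parts, while the dot-atom rule accepts only row 1. This matches the JavaMail errors *12 to *14.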
*1:https://javaee.github.io/javamail/docs/api/javax/mail/internet/InternetAddress.html
*2:https://javaee.github.io/javamail/
*3:https://html.spec.whatwg.org/multipage/input.html#valid-e-mail-address . The actual behavior of `<input type="email">` in Chrome 83 is the same.
*5:http://www.din.or.jp/~ohzaki/mail_regex.htm
Footnotes *6 to *15 are the messages of the `AddressException` thrown by JavaMail.
*6:Quote not at end of local address
*7:Local address contains control or whitespace
*8:Local address contains control or whitespace
*9:Quote not at start of local address
*10:Domain contains control or whitespace
*11:Domain contains control or whitespace
*12:Local address contains dot-dot
*13:Local address ends with dot
*14:Local address starts with dot
*15:Extra route-addr