Lazy Diary @ Hatena Blog

PowerShell / Java / miscellaneous things about software development, Tips & Gotchas. CC BY-SA 4.0/Apache License 2.0

What does "the syntax of RFC822" mean in javax.mail.internet.InternetAddress?

The Javadoc of the JavaMail API says that the InternetAddress class uses "the syntax of RFC822": https://javaee.github.io/javamail/docs/api/javax/mail/internet/InternetAddress.html

What does the actual implementation accept, then? Does it differ from other email validation implementations such as <input type="email">?

I tested the following email address validation implementations and compared the results.

  • InternetAddress *1 class in JavaMail *2
  • <input type="email"> implementation described in HTML Living Standard 25 June 2020 *3 , /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/
  • "General Email Regex (RFC 5322 Official Standard)" described in Almost Perfect Email Regex *4, (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
  • Regexp used in Perl正規表現雑技 *5 , /^(?:[-!#-'*+/-9=?A-Z^-~]+(?:\.[-!#-'*+/-9=?A-Z^-~]+)*|"(?:[!#-\[\]-~]|\\[\x09 -~])*")@[-!#-'*+/-9=?A-Z^-~]+(?:\.[-!#-'*+/-9=?A-Z^-~]+)*$/ (Popular in Japan)

I used http://www.htmq.com/html5/input_type_email.shtml to test the behavior of <input type="email">, and https://codepen.io/shingorow/pen/oBPZbL to test the regexps.
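The HTML Living Standard pattern from the list above can also be checked without a browser. Here is a minimal sketch in Java using java.util.regex. The class name HtmlEmailCheck and the method isValid are hypothetical names made up for this example, not part of JavaMail or of any browser. The only change to the pattern is dropping the backslash before "/", which Java does not need.

```java
import java.util.regex.Pattern;

// Hypothetical helper for this sketch; not part of JavaMail or any browser.
final class HtmlEmailCheck {
    // The pattern quoted from the HTML Living Standard, written as a Java string literal.
    private static final Pattern HTML_EMAIL = Pattern.compile(
        "^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
        + "@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
        + "(?:\\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$");

    // Returns true when the whole string matches the pattern.
    static boolean isValid(String address) {
        return HTML_EMAIL.matcher(address).matches();
    }

    public static void main(String[] args) {
        // A few addresses taken from the test cases in this article.
        String[] samples = {
            "abc.def@example.com",
            "\"abc.def\"@example.com",
            "abc..def@example.com",
            "abc.def@localhost",
            "abc.def@-.com"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + (isValid(s) ? "valid" : "invalid"));
        }
    }
}
```

Because "." is simply one of the characters allowed before the "@", this pattern has no rule about where dots may appear in the local address, and the domain part does not require a dot at all.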

The results are shown in the table below.

| # | Test case | Address | JavaMail | HTML Living Standard | Almost Perfect Email Regex | Perl正規表現雑技 |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | ordinary address | abc.def@example.com | valid | valid | valid | valid |
| 2 | quoted with double quotes | "abc.def"@example.com | valid | invalid | valid | invalid |
| 3 | quoted with double quotes, containing newline + space | "abc.\n def"@example.com | valid | invalid (*) | invalid (*) | invalid (*) |
| 4 | quoted with double quotes and joined by "." | "abc"."def"@example.com | invalid *6 | invalid | invalid | invalid |
| 5 | quoted with single quotes | 'abc.def'@example.com | valid | valid | valid | valid |
| 6 | quoted with single quotes, containing newline + space | 'abc.\n def'@example.com | invalid *7 | invalid (*) | invalid (*) | invalid (*) |
| 7 | quoted with single quotes and joined by "." | 'abc'.'def'@example.com | valid | valid | valid | valid |
| 8 | contains comment in parentheses | (abc)abc.def@example.com | valid | invalid | invalid | invalid |
| 9 | contains comment in parentheses and space | (abc) abc.def@example.com | valid | invalid | invalid | invalid |
| 10 | address in angle brackets | <abc.def@example.com> | valid | invalid | invalid | invalid |
| 11 | address in angle brackets with newline | <abc\n .def@example.com> | invalid *8 | invalid (*) | invalid (*) | invalid (*) |
| 12 | address in angle brackets with quote | <abc"def"ghi@example.com> | invalid *9 | invalid | invalid | invalid |
| 13 | address with real name | foo bar <abc.def@example.com> | valid | invalid | invalid | invalid |
| 14 | address with real name (single-quoted) | 'foo bar' <abc.def@example.com> | valid | invalid | invalid | invalid |
| 15 | address with real name (double-quoted) | "foo bar" <abc.def@example.com> | valid | invalid | invalid | invalid |
| 16 | domain in square brackets | abc.def@[example.com] | valid | invalid | invalid | invalid |
| 17 | domain in square brackets with space | abc.def@[exa mple.com] | invalid *10 | invalid | invalid | invalid |
| 18 | domain in square brackets with escaped character | abc.def@[exa\nmple.com] | invalid *11 | invalid (*) | invalid (*) | invalid (*) |
| 19 | domain in square brackets with quote | abc.def@[example."hoge".com] | valid | invalid | invalid | invalid |
| 20 | starts with comma | ,abc.def@example.com | valid | invalid | invalid | invalid |
| 21 | starts with semicolon | ;abc.def@example.com | valid | invalid | invalid | invalid |
| 22 | contains double dot | abc..def@example.com | invalid *12 | valid | invalid | invalid |
| 23 | local address ends with dot | abc.def.@example.com | invalid *13 | valid | invalid | invalid |
| 24 | local address starts with dot | .abc.def@example.com | invalid *14 | valid | invalid | invalid |
| 25 | contains non-ASCII character | ⛄bc.def@example.com | valid | invalid | invalid | invalid |
| 26 | domain without TLD | abc.def@localhost | valid | valid | invalid | valid |
| 27 | domain with 1-char TLD | abc.def@e.c | valid | valid | valid | valid |
| 28 | domain with 2-char TLD | abc.def@e.co | valid | valid | valid | valid |
| 29 | domain not compliant with RFC952 | abc.def@-.com | valid | invalid | invalid | valid |
| 30 | host specified as IP address | abc.def@203.0.113.1 | valid | valid | valid | valid |
| 31 | capitalized local address | ABC.DEF@example.com | valid | valid | invalid | valid |
| 32 | tags in the double-quoted name | "<script>" <abc.def@example.com> | valid | invalid | invalid | invalid |
| 33 | tags in the single-quoted name | '<script>' <abc.def@example.com> | invalid *15 | invalid | invalid | invalid |

(*) I could not input \n (CRLF) directly, so I typed "\n" as an ordinary string.
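Row 31 may look surprising: the "Almost Perfect Email Regex" rejects ABC.DEF@example.com. A likely explanation is that the pattern only lists a-z and was applied without a case-insensitive flag. The sketch below illustrates this in Java with an excerpt of that regex (only the unquoted local address and the host-name branch of the domain; the quoted-string and bracketed-IP branches are left out). The class name DotAtomExcerpt and its members are hypothetical names for this example, and using matches(), which requires the whole string to match, is an assumption about how the test page applied the pattern.

```java
import java.util.regex.Pattern;

// Hypothetical helper for this sketch; it covers only part of the original regex.
final class DotAtomExcerpt {
    // Excerpt: unquoted local address, "@", then at least two dot-separated labels.
    private static final String EXCERPT =
        "[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
        + "@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?";

    // Same excerpt compiled twice: as written, and with case-insensitive matching.
    static final Pattern STRICT_CASE = Pattern.compile(EXCERPT);
    static final Pattern IGNORE_CASE = Pattern.compile(EXCERPT, Pattern.CASE_INSENSITIVE);

    // Returns true when the whole string matches the given pattern.
    static boolean matches(Pattern pattern, String address) {
        return pattern.matcher(address).matches();
    }

    public static void main(String[] args) {
        String upper = "ABC.DEF@example.com";
        System.out.println("as written:       " + matches(STRICT_CASE, upper));
        System.out.println("case-insensitive: " + matches(IGNORE_CASE, upper));
    }
}
```

The same excerpt also shows why abc.def@localhost is rejected by this regex: the domain part requires at least one "label." before the final label.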

*1:https://javaee.github.io/javamail/docs/api/javax/mail/internet/InternetAddress.html

*2:https://javaee.github.io/javamail/

*3:https://html.spec.whatwg.org/multipage/input.html#valid-e-mail-address , the actual behavior of <input type="email"> on Chrome 83 is the same

*4:http://emailregex.com/

*5:http://www.din.or.jp/~ohzaki/mail_regex.htm

*6:Quote not at end of local address

*7:Local address contains control or whitespace

*8:Local address contains control or whitespace

*9:Quote not at start of local address

*10:Domain contains control or whitespace

*11:Domain contains control or whitespace

*12:Local address contains dot-dot

*13:Local address ends with dot

*14:Local address starts with dot

*15:Extra route-addr