The Java Pattern class (regexp) supports POSIX character classes such as \p{XDigit}. They are very useful when you want to check hex strings.
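For a simple hex check, a minimal sketch might compile the pattern once and reuse it rather than calling String.matches every time. The class and method names (HexCheck, isHex) are hypothetical, chosen only for illustration.

```java
import java.util.regex.Pattern;

public class HexCheck {
    // Compiled once and reused; with default flags \p{XDigit} is [0-9a-fA-F]
    private static final Pattern HEX = Pattern.compile("\\p{XDigit}+");

    // Returns true if s is a non-empty string made only of hex digits
    static boolean isHex(String s) {
        return HEX.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isHex("F8FF")); // true
        System.out.println(isHex("0x1A")); // false: 'x' is not a hex digit
        System.out.println(isHex(""));     // false: + needs at least one character
    }
}
```

Note that a "0x" prefix is rejected; strip it first if your input may contain one.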
In the Java API documentation, the POSIX character classes are marked "(US-ASCII only)". What does that mean?
https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#sum
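One way to see what "(US-ASCII only)" means in practice is to compare \p{XDigit} with the explicit class [0-9a-fA-F] over every Unicode code point. This is a sketch assuming default flags; the class name XDigitAsciiOnly is hypothetical. If the Javadoc definition holds, the two classes should never disagree, and \p{XDigit} should match exactly 22 code points (0-9, A-F, a-f).

```java
import java.util.regex.Pattern;

public class XDigitAsciiOnly {
    // POSIX class with default flags (no UNICODE_CHARACTER_CLASS)
    static final Pattern POSIX = Pattern.compile("\\p{XDigit}");
    // Explicit ASCII hex-digit class for comparison
    static final Pattern EXPLICIT = Pattern.compile("[0-9a-fA-F]");

    // One code point as a String (a surrogate pair above U+FFFF)
    static String str(int cp) {
        return new String(Character.toChars(cp));
    }

    // Returns the first code point where the two classes disagree, or -1 if none
    static int firstDifference() {
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            String s = str(cp);
            if (POSIX.matcher(s).matches() != EXPLICIT.matcher(s).matches()) {
                return cp;
            }
        }
        return -1;
    }

    // Counts how many code points the POSIX class matches in total
    static int countMatches() {
        int n = 0;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (POSIX.matcher(str(cp)).matches()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(firstDifference()); // -1
        System.out.println(countMatches());    // 22
    }
}
```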
These classes match only ASCII characters unless you set the UNICODE_CHARACTER_CLASS flag, which you can do with the embedded flag (?U).
// ASCII input "F8FF"
System.out.println("F8FF".matches("\\p{XDigit}+")); // true
System.out.println("F8FF".matches("(?U)\\p{XDigit}+")); // true
System.out.println("F8FF".matches("[0-9a-fA-F]+")); // true
System.out.println("F8FF".matches("(?U)[0-9a-fA-F]+")); // true
// Full-width input "Ｆ８ＦＦ" (U+FF26 U+FF18 U+FF26 U+FF26)
System.out.println("Ｆ８ＦＦ".matches("\\p{XDigit}+")); // false
System.out.println("Ｆ８ＦＦ".matches("(?U)\\p{XDigit}+")); // true
System.out.println("Ｆ８ＦＦ".matches("[0-9a-fA-F]+")); // false
System.out.println("Ｆ８ＦＦ".matches("(?U)[0-9a-fA-F]+")); // false
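The embedded (?U) is not the only way to set the flag: passing the Pattern.UNICODE_CHARACTER_CLASS constant to Pattern.compile should behave the same. A small sketch comparing both on the full-width input; the class and method names are hypothetical, and the input is written with Unicode escapes so the source stays ASCII.

```java
import java.util.regex.Pattern;

public class UnicodeFlagDemo {
    // Full-width "F8FF": U+FF26 U+FF18 U+FF26 U+FF26
    static final String FULL_WIDTH = "\uFF26\uFF18\uFF26\uFF26";

    // Flag set inside the regex with the embedded (?U)
    static boolean embedded(String s) {
        return s.matches("(?U)\\p{XDigit}+");
    }

    // Same flag passed to Pattern.compile as a constant
    static boolean compiledFlag(String s) {
        return Pattern.compile("\\p{XDigit}+", Pattern.UNICODE_CHARACTER_CLASS)
                .matcher(s).matches();
    }

    // No flag: the POSIX class stays US-ASCII only
    static boolean noFlag(String s) {
        return Pattern.compile("\\p{XDigit}+").matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(embedded(FULL_WIDTH));     // true
        System.out.println(compiledFlag(FULL_WIDTH)); // true
        System.out.println(noFlag(FULL_WIDTH));       // false
    }
}
```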
As you can see, when you specify (?U), \p{XDigit} also matches non-ASCII full-width characters (e.g. the full-width digits U+FF10-FF19). Matching full-width characters may be reasonable for a Unicode-aware POSIX class, but when you check hex strings in practice it is better not to match them. So, I think you don't have to worry about the UNICODE_CHARACTER_CLASS flag: as long as you leave it unset, \p{XDigit} matches only the ASCII hex digits [0-9a-fA-F].
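To see how far (?U)\p{XDigit} reaches beyond ASCII, a sketch can scan a range of code points. According to the Pattern Javadoc, under UNICODE_CHARACTER_CLASS \p{XDigit} is defined as [\p{gc=Nd}\p{IsHex_Digit}], so besides the full-width forms it should also accept decimal digits from other scripts, such as U+0663 (ARABIC-INDIC DIGIT THREE), which is not a hex digit at all. The class name UnicodeXDigitScan and its helpers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class UnicodeXDigitScan {
    // POSIX hex-digit class with the Unicode flag turned on
    static final Pattern U_XDIGIT = Pattern.compile("(?U)\\p{XDigit}");

    // True if the single code point cp is accepted by the pattern
    static boolean matches(int cp) {
        return U_XDIGIT.matcher(new String(Character.toChars(cp))).matches();
    }

    // Collects non-ASCII code points in [from, to] that the pattern accepts
    static List<String> nonAsciiMatches(int from, int to) {
        List<String> out = new ArrayList<>();
        for (int cp = Math.max(from, 0x80); cp <= to; cp++) {
            if (matches(cp)) {
                out.add(String.format("U+%04X", cp));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Full-width forms: should list U+FF10..U+FF19, U+FF21..U+FF26, U+FF41..U+FF46
        System.out.println(nonAsciiMatches(0xFF00, 0xFFEF));
        // Arabic-Indic digit three: a decimal digit (gc=Nd), not a hex digit
        System.out.println(matches(0x0663)); // true
    }
}
```

This is why leaving the flag off is the safer choice for a hex check: with (?U), "hex digit" effectively means "any Unicode decimal digit or hex digit".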