JDK Regular Expressions

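The JDK version used here is jdk1.7.0_79. The matching model used by its regular expression engine is described in "A Regular Expression Matching Model" (《一种正则表达式匹配模型》).
Matching involves two objects: the regular expression and the target string. Either may consist of any characters, printable or non-printable.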

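1. Syntax

Note that the same regular expression can mean different things depending on locale settings, operating system environment, and similar factors. For example, in locale-aware regex implementations, “[a-d]” is equivalent to “[abcd]” under “locale=C”, whereas under some other locale settings it is equivalent to “[aBbCcDd]”.
The syntax described below covers the general case only, not special cases. To make sure a regular expression behaves as you expect in your specific environment, test it beforehand.
Different syntax elements have different precedence. There is no need to memorize the precedence rules: grouping explicitly with a “()” pair is clearer.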

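1.1 Character classes

1.1.1 Definition

Character class	Definition
[abc]	Matches one of the three characters a, b, or c
[^abc]	Matches any single character except a, b, and c
[a-zA-Z]	Matches one of the 26 lowercase or 26 uppercase English letters
[^a-z]	Matches any single character outside the range a-z
[a-d[m-p]]	Union: equivalent to [a-dm-p]
[a-z&&[def]]	Intersection: matches d, e, or f
[a-z&&[^bc]]	Subtraction: matches a-z except b and c; equivalent to [ad-z]
[a-z&&[^m-p]]	Subtraction: matches a-z except m-p; equivalent to [a-lq-z]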

1.1.2 Experiments

Regular expression	Target string	Match result
`[abc][^abc]`	`ad`	I found the text “ad” starting at index 0 and ending at index 2.
`[a-z][^a-z]`	`aA`	I found the text “aA” starting at index 0 and ending at index 2.
`[a-h[w-z]][a-z&&[c]]`	`bcyc`	I found the text “bc” starting at index 0 and ending at index 2. I found the text “yc” starting at index 2 and ending at index 4.
`[a-h&&[^b-h]][a-d&&[^acd]]`	`ab`	I found the text “ab” starting at index 0 and ending at index 2.

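1.2 Predefined character classes

1.2.1 Definition

Predefined character class	Meaning
.	Matches any character except line terminators
\d	Equivalent to [0-9]
\D	Equivalent to [^\d]
\s	Equivalent to [ \t\n\x0B\f\r]; note the space character at the very beginning
\S	Equivalent to [^\s]
\w	Equivalent to [a-zA-Z_0-9]
\W	Equivalent to [^\w]

Note:
In the ASCII table, “\x0B” is the character with decimal value 11, the vertical tab.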

1.2.2 Experiments

Regular expression	Target string	Match result
`.\d\D`	`z0b`	I found the text “z0b” starting at index 0 and ending at index 3.
`a\sb\S`	`a bz`	I found the text “a bz” starting at index 0 and ending at index 4.
`\w\W\w`	`_-a`	I found the text “_-a” starting at index 0 and ending at index 3.

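1.3 Quantifiers

1.3.1 Definition

Greedy	Reluctant	Possessive	Meaning
X?	X??	X?+	Matches X once or not at all
X*	X*?	X*+	Matches X zero or more times
X+	X+?	X++	Matches X one or more times
X{n}	X{n}?	X{n}+	Matches X exactly n times; because the count is fixed, the greedy, reluctant, and possessive forms behave the same in the simple cases shown here
X{n,}	X{n,}?	X{n,}+	Matches X at least n times
X{n,m}	X{n,m}?	X{n,m}+	Matches X at least n and at most m times

Three points about quantifiers deserve emphasis:

A quantifier can modify not only a single character but also a character class or a group, as in “abc{n}”, “[abc]{n}”, and “(abc){n}”.
Quantifiers can produce zero-length matches during matching; do not overlook them.
Once a regular expression contains a quantifier, matching it against the target string is no longer deterministic, because the number of times the quantified subexpression (the subexpression the quantifier modifies) should match is not fixed in advance. In relation to this non-determinism, quantifiers fall into three types: greedy, reluctant, and possessive (the last is uncommon and can be skipped on a first reading).

For a regular expression containing a quantified subexpression, the matching process can be described by the pseudocode below. Assume the regular expression consists of three parts “abc”, where a and c are ordinary subexpressions and b is the quantified subexpression. This is the simple case; more complex cases work similarly.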

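// Pseudocode (Java-like, not compilable): a, b, and c are the three parts of the regex.
String regex = "abc";

public boolean match() {
    // Scan the target string from left to right for the next place where a matches.
    // If there is none, the whole match fails.
    while (findNextMatch(a)) {
        // b must match at least min times; max is the largest number of
        // repetitions that b can match at the current position.
        if (obtainType(b) == GREEDY) {
            // Greedy: try repetition counts from largest to smallest.
            for (len = max; len >= min; len--) {
                match b len times
                if (c matches) {
                    // c matched, so the whole match succeeds.
                    return true;
                }
            }
        } else if (obtainType(b) == RELUCTANT) {
            // Reluctant: try repetition counts from smallest to largest.
            for (len = min; len <= max; len++) {
                match b len times
                if (c matches) {
                    // c matched, so the whole match succeeds.
                    return true;
                }
            }
        } else {
            // Possessive: match max times directly and never give anything back.
            match b max times
            if (c matches) {
                // c matched, so the whole match succeeds.
                return true;
            }
        }
        // No repetition count worked: loop to look for the next place where a matches.
    }
    return false;
}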
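
To see the three branches of the pseudocode in practice, here is a small runnable sketch. It is an illustration only, not the engine's implementation; the class name QuantifierKindsDemo and the helper firstMatch are hypothetical names. It runs the greedy, reluctant, and possessive forms of the same quantifier against one target string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierKindsDemo {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "aabfbcdaaf";
        // Greedy: ".*" first takes everything, then gives characters back until "f" matches.
        System.out.println(firstMatch("aa.*f", s));   // aabfbcdaaf
        // Reluctant: ".*?" starts with nothing and grows until "f" matches.
        System.out.println(firstMatch("aa.*?f", s));  // aabf
        // Possessive: ".*+" takes everything and never gives back, so "f" cannot match.
        System.out.println(firstMatch("aa.*+f", s));  // null
    }
}
```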

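1.3.2 Experiments

In the “Explanation” column of the table below, “S” denotes the target string.

Regular expression	Target string	Match result	Explanation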
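`aa.?f`	`aabfbcdaaf`	I found the text “aabf” starting at index 0 and ending at index 4. I found the text “aaf” starting at index 7 and ending at index 10.	First attempt: “aa” matches characters 1 and 2 of S, “.?” matches character 3, and “f” matches character 4, so the match succeeds. Second attempt: “aa” matches characters 8 and 9 of S and “.?” matches character 10, so “f” fails; counts are tried from largest to smallest, so “.?” then matches zero characters and “f” matches character 10, and the match succeeds. Third attempt: S contains no further unmatched “aa”, so the match fails
`aa.??f`	`aabfbcdaaf`	I found the text “aabf” starting at index 0 and ending at index 4. I found the text “aaf” starting at index 7 and ending at index 10.	First attempt: “aa” matches characters 1 and 2 of S and “.??” matches zero characters, so “f” fails; counts are tried from smallest to largest, so “.??” then matches character 3 and “f” matches character 4, and the match succeeds. Second attempt: “aa” matches characters 8 and 9 of S, “.??” matches zero characters, and “f” matches character 10, so the match succeeds. Third attempt: S contains no further unmatched “aa”, so the match fails
`aa.?+f`	`aabfbcdaaf`	I found the text “aabf” starting at index 0 and ending at index 4.	First attempt: “aa” matches characters 1 and 2 of S, “.?+” matches character 3, and “f” matches character 4, so the match succeeds. Second attempt: “aa” matches characters 8 and 9 of S and “.?+” matches character 10, so “f” fails; the possessive quantifier gives nothing back and S contains no further unmatched “aa”, so the match fails
`aa.*f`	`aabfbcdaaf`	I found the text “aabfbcdaaf” starting at index 0 and ending at index 10.	None
`aa.*?f`	`aabfbcdaaf`	I found the text “aabf” starting at index 0 and ending at index 4. I found the text “aaf” starting at index 7 and ending at index 10.	None
`aa.*+f`	`aabfbcdaaf`	No match found.	None
`aa.+f`	`aabfbcdaaf`	I found the text “aabfbcdaaf” starting at index 0 and ending at index 10.	First attempt: “aa” matches characters 1 and 2 of S and “.+” matches characters 3-10, so “f” fails; counts are tried from largest to smallest, so “.+” then matches characters 3-9 and “f” matches character 10, and the match succeeds. Second attempt: S contains no further unmatched “aa”, so the match fails
`aa.+?f`	`aabfbcdaaf`	I found the text “aabf” starting at index 0 and ending at index 4.	First attempt: “aa” matches characters 1 and 2 of S, “.+?” matches character 3, and “f” matches character 4, so the match succeeds. Second attempt: “aa” matches characters 8 and 9 of S and “.+?” matches character 10, so “f” fails; counts are tried from smallest to largest, but at this position “.+?” can match at most once, so no larger count is available; S contains no further unmatched “aa”, so the match fails
`aa.++f`	`aabfbcdaaf`	No match found.	First attempt: “aa” matches characters 1 and 2 of S and “.++” matches characters 3-10, so “f” fails. Next, “aa” matches characters 8 and 9 of S and “.++” matches character 10, so “f” fails again; S contains no further unmatched “aa”, so the match fails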
`aa.{4}f`	`abbbbfccaabbbbf`	I found the text “aabbbbf” starting at index 8 and ending at index 15.	None
`aa.{4}?f`	`abbbbfccaabbbbf`	I found the text “aabbbbf” starting at index 8 and ending at index 15.	None
`aa.{4}+f`	`abbbbfccaabbbbf`	I found the text “aabbbbf” starting at index 8 and ending at index 15.	None
`aa.{2,}f`	`aabbfaabf`	I found the text “aabbfaabf” starting at index 0 and ending at index 9.	None
`aa.{2,}?f`	`aabbfaabf`	I found the text “aabbf” starting at index 0 and ending at index 5.	None
`aa.{2,}+f`	`aabbfaabf`	No match found.	None
`aa.{2,6}f`	`aabbfaabf`	I found the text “aabbfaabf” starting at index 0 and ending at index 9.	None
`aa.{2,6}?f`	`aabbfaabf`	I found the text “aabbf” starting at index 0 and ending at index 5.	None
`aa.{2,6}+f`	`aabbfaabf`	I found the text “aabbfaabf” starting at index 0 and ending at index 9.	None
`a?`	``	I found the text “” starting at index 0 and ending at index 0.	Zero-length match example (the target string is empty)
`a*`	``	I found the text “” starting at index 0 and ending at index 0.	Zero-length match example (the target string is empty)
`a?`	`abab`	I found the text “a” starting at index 0 and ending at index 1. I found the text “” starting at index 1 and ending at index 1. I found the text “a” starting at index 2 and ending at index 3. I found the text “” starting at index 3 and ending at index 3. I found the text “” starting at index 4 and ending at index 4.	Zero-length match example
`a*`	`abaab`	I found the text “a” starting at index 0 and ending at index 1. I found the text “” starting at index 1 and ending at index 1. I found the text “aa” starting at index 2 and ending at index 4. I found the text “” starting at index 4 and ending at index 4. I found the text “” starting at index 5 and ending at index 5.	Zero-length match example
`[abc]{3}`	`abccabaaaccbbbc`	I found the text “abc” starting at index 0 and ending at index 3. I found the text “cab” starting at index 3 and ending at index 6. I found the text “aaa” starting at index 6 and ending at index 9. I found the text “ccb” starting at index 9 and ending at index 12. I found the text “bbc” starting at index 12 and ending at index 15.	Matches more than just “aaa”, “bbb”, or “ccc”
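
Zero-length matches are easy to miss when reading output by eye, so it can help to list match positions explicitly. The sketch below is illustrative only; the class name ZeroLengthMatchDemo and the helper spans are hypothetical. It records each match as “start-end”, so a zero-length match shows up as a pair of equal indices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroLengthMatchDemo {
    // Lists every match as "start-end"; a zero-length match has start == end.
    static List<String> spans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(spans("a*", "abaab")); // [0-1, 1-1, 2-4, 4-4, 5-5]
        System.out.println(spans("a?", ""));      // [0-0]
    }
}
```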

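1.4 Capturing groups

1.4.1 Definition

A subexpression enclosed in a “()” pair within a regular expression is called a capturing group, and it takes part in matching as a single unit. The number of capturing groups in a regular expression equals the number of “()” pairs, and groups are numbered by the order in which their “(” appears, starting from group 1. There is also a group 0, a special group that stands for the whole regular expression. For example, ((A)(B(C))) has 4 groups: group 1 is “((A)(B(C)))”, group 2 is “(A)”, group 3 is “(B(C))”, group 4 is “(C)”, and group 0 is “((A)(B(C)))”. Likewise, (A)(B(C)) has 3 groups: group 1 is “(A)”, group 2 is “(B(C))”, group 3 is “(C)”, and group 0 is “(A)(B(C))”.
Within the regular expression, a back-reference of the form “\n”, where n is a group number, refers to the text that group matched in the target string. Note that “\0” cannot be used as a back-reference, because it is semantically invalid (in java.util.regex, “\0” instead introduces an octal escape of the form “\0n”). Proof: take a regular expression “abc\0”, where a, b, and c denote three parts. For the expression to be meaningful, at least one of a, b, and c is non-empty. Suppose a matching string T exists. Group 0 stands for the whole expression, so “\0” must match all of T, yet T must also contain the text matched by the non-empty part of “abc”. This is a contradiction, so the assumption is false and the regular expression has no matching string.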

1.4.2 Experiments

Regular expression	Target string	Match result
`(\d\d)\1`	`1212`	I found the text “1212” starting at index 0 and ending at index 4.
`(\d\d)\1`	`1234`	No match found.
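
The numbering rule (count the “(” characters from left to right) can be confirmed with Matcher.groupCount() and Matcher.group(int). The following is a minimal sketch; the class name GroupNumberingDemo and the helper groups are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Returns the text of group 0, 1, ..., groupCount() for the first match,
    // or an empty list if the regex does not match anywhere in the input.
    static List<String> groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(groups("((A)(B(C)))", "ABC")); // [ABC, ABC, A, BC, C]
        System.out.println(groups("(A)(B(C))", "ABC"));   // [ABC, A, BC, C]
        System.out.println(groups("(\\d\\d)\\1", "1212")); // [1212, 12]
    }
}
```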

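1.4.3 Special groups

1.4.3.1 Definition

Special group	Definition	Notes
(?idmsux)	Sets match flags; “idmsux” itself is not matched against the target string, and the construct is not an ordinary capturing group, so it is excluded from group counting and from “\n” back-references. See the “Equivalent embedded flag” column in section 2.1 (Pattern class)	Common
(?:subexpression)	The subexpression must be matched, but the construct is not an ordinary capturing group, so it is excluded from group counting and from “\n” back-references	Common
(?idmsux:subexpression)	A combination of “(?idmsux)” and “(?:subexpression)”; the subexpression must be matched	None
(?=X)	Zero-width positive lookahead; see [2][3]	None
(?!X)	Zero-width negative lookahead; see [2][3]	None
(?<=X)	Zero-width positive lookbehind; see [2][3]	None
(?<!X)	Zero-width negative lookbehind; see [2][3]	None
(?>X)	Independent (atomic) non-capturing group; see [2][3]	None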
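
Since the lookaround and atomic constructs are only referenced here, a few small cases may make them more concrete. This is an illustrative sketch with made-up inputs; the class name LookaroundDemo and the helper firstMatch are hypothetical. Each lookaround checks the text next to the current position without consuming it, and the atomic group refuses to give back what it has matched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // (?=X): digits only if followed by "px"; "px" is not part of the match.
        System.out.println(firstMatch("\\d+(?=px)", "10px 20em"));    // 10
        // (?!X): "foo" only if NOT followed by "bar".
        System.out.println(firstMatch("foo(?!bar)...", "foobar foobaz")); // foobaz
        // (?<=X): digits only if preceded by "$".
        System.out.println(firstMatch("(?<=\\$)\\d+", "cost $42"));   // 42
        // (?<!X): digits only if NOT preceded by "$" or another digit.
        System.out.println(firstMatch("(?<![$\\d])\\d+", "$42 and 7")); // 7
        // (?>X): the atomic group keeps every "a" it took, so "ab" cannot match.
        System.out.println(firstMatch("(?>a+)ab", "aaab"));           // null
        System.out.println(firstMatch("a+ab", "aaab"));               // aaab
    }
}
```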

1.4.3.2 Experiments

Regular expression	Target string	Match result	Number of capturing groups in the regular expression
`(?i)(ab)\1`	`ababiabi`	I found the text “abab” starting at index 0 and ending at index 4.	1
`(ac)(?:hello)(bd)\2`	`achellobdbdachellobdhello`	I found the text “achellobdbd” starting at index 0 and ending at index 11.	2

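1.5 Boundary matchers

1.5.1 Definition

Boundary matcher	Meaning
^	The index position is the beginning of a line
$	The index position is the end of a line, where the end of a line is determined after discarding the line terminator
\A	The index position is the beginning of the input
\Z	The index position is the end of the input, where the end of the input is determined after discarding the final line terminator
\z	The index position is the end of the input, where the end of the input is determined without discarding any line terminator
\b	The index position is a word boundary
\B	The index position is a non-word boundary
\G	The index position is the end of the previous match

A good way to think about boundary matchers: a boundary matcher in a regular expression adds an extra constraint on the match of the adjacent characters. Take the regular expression “\bhe\Bllo$”. When “h” is matched, the index position to its left must additionally be a word boundary; when “e” and “l” are matched, the index position between them must additionally be a non-word boundary; when “o” is matched, the index position to its right must additionally be the end of a line.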

1.5.2 Experiments

Regular expression	Target string	Match result
`^aab$`	`aab`	I found the text “aab” starting at index 0 and ending at index 3.
`^aab$`	`aab\naab`	No match found.
`^dog[\w\W]*$`	`dog(two tab chars before)`	No match found.
`^(\s)*dog[\w\W]*$`	`dog(two tab chars before)`	I found the text “ dog(two tab chars before)” starting at index 0 and ending at index 33.
`^dog\w*`	`dogblahblah`	I found the text “dogblahblah” starting at index 0 and ending at index 11.
`\Aaab`	`aab\naab`	I found the text “aab” starting at index 0 and ending at index 3.
`aab\Z`	`aabaab`	I found the text “aab” starting at index 3 and ending at index 6.
`aab\Z`	`aabaab\n`	I found the text “aab” starting at index 3 and ending at index 6.
`aab\z`	`aabaab\n`	No match found.
`aab\n\z`	`aabaab\n`	I found the text “aab\n” starting at index 3 and ending at index 7.
`he\Bllo\b \bworld`	`hello world`	I found the text “hello world” starting at index 0 and ending at index 11.
`\Gaab`	`aabaab`	I found the text “aab” starting at index 0 and ending at index 3. I found the text “aab” starting at index 3 and ending at index 6.
`\Gaab`	`aabcaab`	I found the text “aab” starting at index 0 and ending at index 3.
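
The difference between “\Z” and “\z”, and the chaining behavior of “\G”, can be reproduced without a console by putting the newline directly into a Java string. The sketch below is illustrative; the class name BoundaryDemo and the helper count are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts how many times find() succeeds for regex on input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // \Z ignores a final line terminator; \z does not.
        System.out.println(count("aab\\Z", "aabaab\n")); // 1
        System.out.println(count("aab\\z", "aabaab\n")); // 0
        // \G requires each match to start where the previous one ended.
        System.out.println(count("\\Gaab", "aabaab"));   // 2
        System.out.println(count("\\Gaab", "aabcaab"));  // 1
    }
}
```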

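1.6 Escaping

1.6.1 Definition

The metacharacters in regular expressions are <([{\^-=$!|]})?*+.>. They can be escaped in two ways: 1) put a “\” character in front of the metacharacter; 2) enclose the metacharacter between the “\Q” and “\E” markers.

Note:
A “\” character escapes the single metacharacter that follows it, whereas a “\Q” and “\E” pair escapes every metacharacter in the string it encloses.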

1.6.2 Experiments

Regular expression	Target string	Match result
`\.`	`.`	I found the text “.” starting at index 0 and ending at index 1.
`\Q.\E`	`.`	I found the text “.” starting at index 0 and ending at index 1.
`\Q+*()\E`	`+*()`	I found the text “+*()” starting at index 0 and ending at index 4.
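
When the text to be escaped is only known at run time, writing “\Q” and “\E” by hand is awkward. The sketch below uses Pattern.quote, a standard java.util.regex.Pattern method that is not covered in the text above; it wraps a string so that it is matched literally. The class name EscapeDemo and the helper literalMatch are hypothetical names.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // True if input is exactly the given literal text, with no metacharacter interpreted.
    static boolean literalMatch(String literal, String input) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Unescaped, "+" is a quantifier: the regex means "one or more 1, then 1=2".
        System.out.println(Pattern.matches("1+1=2", "11=2"));  // true
        System.out.println(Pattern.matches("1+1=2", "1+1=2")); // false
        // Quoted, every character is taken literally.
        System.out.println(literalMatch("1+1=2", "1+1=2"));    // true
        System.out.println(literalMatch("+*()", "+*()"));      // true
    }
}
```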

1.7 Operators

1.7.1 Definition

Operator	Definition
|	The “or” (alternation) operator

1.7.2 Experiments

Regular expression	Target string	Match result
`a(ac|bd)b`	`aacbdfabdb`	I found the text “aacb” starting at index 0 and ending at index 4. I found the text “abdb” starting at index 6 and ending at index 10.

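2. Usage

2.1 The Pattern class

2.1.1 Common usage

Pattern pattern = Pattern.compile(String regex);
Pattern pattern = Pattern.compile(String regex, int flags);

// Create the Matcher object for the target string "input"
Matcher matcher = pattern.matcher(CharSequence input);

// The following two are equivalent; both return the regular expression the Pattern was compiled from
String toString();
String pattern();

Pattern.compile(String regex) is equivalent to Pattern.compile(String regex, 0). The “flags” parameter specifies the match mode; the common match modes are described in the table below. To combine match modes, use a form such as int flags=Pattern.CANON_EQ | Pattern.CASE_INSENSITIVE.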

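Match mode	Description	Equivalent embedded flag
Pattern.CANON_EQ	The regular expression and the target string are each first reduced to their canonical equivalent form and then compared	None
Pattern.CASE_INSENSITIVE	The regular expression is matched against the target string without regard to case	(?i)
Pattern.COMMENTS	Allows whitespace and strings starting with “#” to appear in the regular expression; these are treated as comments and take no part in matching	(?x)
Pattern.DOTALL	Makes the “.” metacharacter match any character, including line terminators	(?s)
Pattern.LITERAL	Every character in the regular expression is treated as a literal character with no special meaning	None
Pattern.MULTILINE	Turns on multiline mode (the default is single-line mode), which changes the meaning of the “^” and “$” boundary matchers. In single-line mode, “^” is equivalent to “\A” and “$” is equivalent to “\Z”; in multiline mode, “^” is no longer equivalent to “\A” and “$” is no longer equivalent to “\Z”. See the experiments for details	(?m)
Pattern.UNICODE_CASE	Used together with Pattern.CASE_INSENSITIVE. With Pattern.CASE_INSENSITIVE alone, case-insensitivity applies to ASCII only; with Pattern.UNICODE_CASE also enabled, case-insensitivity follows Unicode	(?u)
Pattern.UNIX_LINES	By default the line terminators are “\n”, “\r\n”, “\r”, “\u0085”, “\u2028”, and “\u2029”; with Pattern.UNIX_LINES enabled, only “\n” is a line terminator	(?d)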

“Equivalent embedded flag” means that two code fragments like the following are equivalent.

String regex = "^aab$";
int flags = Pattern.MULTILINE;
Pattern pattern = Pattern.compile(regex, flags);

String regex = "(?m)^aab$";
Pattern pattern = Pattern.compile(regex);
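
The equivalence of a flag constant and its embedded form can be checked directly. Below is a minimal sketch; the class name FlagEquivalenceDemo and the helper count are hypothetical names. It counts matches of “^aab$” with the Pattern.MULTILINE flag, with the embedded “(?m)”, and with neither.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagEquivalenceDemo {
    // Counts how many times find() succeeds for regex (compiled with flags) on input.
    static int count(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "aab\naab";
        System.out.println(count("^aab$", Pattern.MULTILINE, input)); // 2
        System.out.println(count("(?m)^aab$", 0, input));             // 2
        System.out.println(count("^aab$", 0, input));                 // 0
    }
}
```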

2.1.2 Experiments

Regular expression	flags value	Target string	Match result	Notes
`a\u030A`	0	`\u00E5`	No match found.	“a\u030A” is not first reduced to its canonical equivalent “å”, whereas “\u00E5” is “å”, so the match fails
`a\u030A`	Pattern.CANON_EQ	`\u00E5`	I found the text “å” starting at index 0 and ending at index 1.	“a\u030A” is first reduced to its canonical equivalent “å”, and “\u00E5” is “å”, so the match succeeds
`abc`	0	`ABC`	No match found.	None
`abc`	Pattern.CASE_INSENSITIVE	`ABC`	I found the text “ABC” starting at index 0 and ending at index 3.	None
`abc#comment`	0	`abc`	No match found.	None
`abc#comment`	Pattern.COMMENTS	`abc`	I found the text “abc” starting at index 0 and ending at index 3.	None
`.`	0	`\n`	No match found.	None
`.`	Pattern.DOTALL	`\n`	I found the text “\n” starting at index 0 and ending at index 1.	None
`\d`	0	`1`	I found the text “1” starting at index 0 and ending at index 1.	None
`\d`	Pattern.LITERAL	`1`	No match found.	None
`^aab$`	0	`aab\naab`	No match found.	None
`^aab$`	Pattern.MULTILINE	`aab\naab`	I found the text “aab” starting at index 0 and ending at index 3. I found the text “aab” starting at index 4 and ending at index 7.	None
`á`	0	`Á`	No match found.	The program must run in a UTF-8 encoding environment
`á`	Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE	`Á`	I found the text “Á” starting at index 0 and ending at index 1.	The program must run in a UTF-8 encoding environment
`a\Z`	0	`a\r`	I found the text “a” starting at index 0 and ending at index 1.	“\r” is a line terminator
`a\Z`	0	`a\n`	I found the text “a” starting at index 0 and ending at index 1.	“\n” is a line terminator
`.`	0	`\r\n`	No match found.	“\r” and “\n” are line terminators
`.`	Pattern.UNIX_LINES	`\r\n`	I found the text “\r” starting at index 0 and ending at index 1.	“\n” is a line terminator; “\r” is not
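
Rows that depend on non-ASCII characters are sensitive to source and console encoding. One way around this, sketched below under the assumption that Unicode escapes are acceptable in the test source, is to write the characters as “\uXXXX” escapes in Java string literals. The class name FlagsDemo and the helper found are hypothetical names.

```java
import java.util.regex.Pattern;

class FlagsDemo {
    // True if regex, compiled with the given flags, matches somewhere in input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        int ci = Pattern.CASE_INSENSITIVE;
        // \u00E1 is "á" and \u00C1 is "Á": ASCII-only case folding does not relate them.
        System.out.println(found("\u00E1", ci, "\u00C1"));                        // false
        System.out.println(found("\u00E1", ci | Pattern.UNICODE_CASE, "\u00C1")); // true
        // "a" followed by the combining ring U+030A is canonically equivalent to U+00E5 ("å").
        System.out.println(found("a\u030A", 0, "\u00E5"));                        // false
        System.out.println(found("a\u030A", Pattern.CANON_EQ, "\u00E5"));         // true
        // With UNIX_LINES, "\r" is an ordinary character that "." can match.
        System.out.println(found(".", 0, "\r\n"));                                // false
        System.out.println(found(".", Pattern.UNIX_LINES, "\r\n"));               // true
    }
}
```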

2.2 The Matcher class

2.2.1 Common usage

Matching the entire target string against the regular expression:

public boolean matches()

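Searching the target string for substrings that match the regular expression:

// Look for the next match, starting at the end of the previous match
public boolean find()

// Reset the matcher, then look for the next match starting at the given index
public boolean find(int start)

// First obtain a match with find() or find(int start); then use the methods below
// to get the content and position of that match

// Equivalent to group(0)
public String group()

// Get the text matched by capturing group number "group"; 0 means the whole match
public String group(int group)

// Start position of the match (an index position in the matching model)
public int start()

// End position of the match (an index position in the matching model)
public int end()

// Start position of the text matched by the given group (an index position in the matching model)
public int start(int group)

// End position of the text matched by the given group (an index position in the matching model)
public int end(int group)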
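
The difference between find() and find(int start) is easiest to see in a short trace. The sketch below is illustrative only; the class name FindTraceDemo and the helper trace are hypothetical names. It records the start index of each successful call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindTraceDemo {
    // Records the start index of each successful find call, in call order.
    static List<Integer> trace(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<Integer> starts = new ArrayList<>();
        if (m.find()) starts.add(m.start());   // first match, scanning from index 0
        if (m.find(4)) starts.add(m.start());  // resets the matcher, scans from index 4
        if (m.find()) starts.add(m.start());   // continues after the previous match
        if (m.find(0)) starts.add(m.start());  // resets again, scans from index 0
        return starts;
    }

    public static void main(String[] args) {
        // "ab" occurs at indices 0, 3, and 6. find(4) skips the one at 3,
        // and the following find() has nothing left after index 8.
        System.out.println(trace("ab", "ab ab ab")); // [0, 6, 0]
    }
}
```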

2.2.2 Experiments

1. Experiment 1
Code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("a[0-9]b");

        Matcher matcher1 = pattern.matcher("a1b");
        Matcher matcher2 = pattern.matcher("a2");
        Matcher matcher3 = pattern.matcher("a3b4");

        System.out.println(matcher1.matches());
        System.out.println(matcher2.matches());
        System.out.println(matcher3.matches());
    }
}

Result:

true
false
false

2. Experiment 2
Code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(ad)((bc)*)(df)");

        Matcher matcher = pattern.matcher("addfadbcdfadbcbcdf");
        while (matcher.find()) {
            System.out.println(output(matcher));
        }

        matcher.find(3);
        System.out.println("\n---start from 3---");
        System.out.println(output(matcher));

        matcher.find(5);
        System.out.println("\n---start from 5---");
        System.out.println(output(matcher));
    }

    public static String output(Matcher matcher) {
        StringBuilder sb = new StringBuilder();
        sb.append("[match str \"");
        sb.append(matcher.group());
        sb.append("\" ");
        sb.append("start pos ");
        sb.append(matcher.start());
        sb.append(" end pos ");
        sb.append(matcher.end());
        sb.append(" : ");
        sb.append("first group str \"");
        sb.append(matcher.group(1));
        sb.append("\" ");
        sb.append("start pos ");
        sb.append(matcher.start(1));
        sb.append(" end pos ");
        sb.append(matcher.end(1));
        sb.append(" ; ");
        sb.append("second group str \"");
        sb.append(matcher.group(2));
        sb.append("\" ");
        sb.append("start pos ");
        sb.append(matcher.start(2));
        sb.append(" end pos ");
        sb.append(matcher.end(2));
        sb.append(" ; ");
        sb.append("third group str \"");
        sb.append(matcher.group(3));
        sb.append("\" ");
        sb.append("start pos ");
        sb.append(matcher.start(3));
        sb.append(" end pos ");
        sb.append(matcher.end(3));
        sb.append(" ; ");
        sb.append("fourth group str \"");
        sb.append(matcher.group(4));
        sb.append("\" ");
        sb.append("start pos ");
        sb.append(matcher.start(4));
        sb.append(" end pos ");
        sb.append(matcher.end(4));
        sb.append("]");
        return sb.toString();
    }
}

Result:

[match str "addf" start pos 0 end pos 4 : first group str "ad" start pos 0 end pos 2 ; second group str "" start pos 2 end pos 2 ; third group str "null" start pos -1 end pos -1 ; fourth group str "df" start pos 2 end pos 4]
[match str "adbcdf" start pos 4 end pos 10 : first group str "ad" start pos 4 end pos 6 ; second group str "bc" start pos 6 end pos 8 ; third group str "bc" start pos 6 end pos 8 ; fourth group str "df" start pos 8 end pos 10]
[match str "adbcbcdf" start pos 10 end pos 18 : first group str "ad" start pos 10 end pos 12 ; second group str "bcbc" start pos 12 end pos 16 ; third group str "bc" start pos 14 end pos 16 ; fourth group str "df" start pos 16 end pos 18]

---start from 3---
[match str "adbcdf" start pos 4 end pos 10 : first group str "ad" start pos 4 end pos 6 ; second group str "bc" start pos 6 end pos 8 ; third group str "bc" start pos 6 end pos 8 ; fourth group str "df" start pos 8 end pos 10]

---start from 5---
[match str "adbcbcdf" start pos 10 end pos 18 : first group str "ad" start pos 10 end pos 12 ; second group str "bcbc" start pos 12 end pos 16 ; third group str "bc" start pos 14 end pos 16 ; fourth group str "df" start pos 16 end pos 18]

3. Miscellaneous

Most of the experiments above were run with the following code:

import java.io.Console;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class Main {

    public static void main(String[] args) {
        Console console = System.console();
        if (console == null) {
            System.err.println("No console.");
            System.exit(1);
        }

        while (true) {
            Pattern pattern = Pattern.compile(console.readLine("%nEnter your regex: "));

            Matcher matcher = pattern.matcher(console.readLine("Enter input string to search: "));

            boolean found = false;
            while (matcher.find()) {
                console.format("I found the text" + " \"%s\" starting at " + "index %d and ending at index %d.%n",
                        matcher.group(), matcher.start(), matcher.end());
                found = true;
            }

            if (!found) {
                console.format("No match found.%n");
            }
        }
    }
}

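Note: some special characters cannot be entered through the command-line prompt that the code above provides. In that case, specify them explicitly in code, for example “\n” for the newline character.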

4. Common patterns

4.1 IP addresses

Naive version:

[1-9](\d){0,2}(\.[1-9](\d){0,2}){3}

Precise version:

(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])){3}
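
To see where the two versions differ, both can be run against a few sample addresses with Matcher.matches(), which requires the whole string to match. This is a minimal sketch with made-up inputs; the class name IpRegexDemo, its constants, and the helper isIp are hypothetical names.

```java
import java.util.regex.Pattern;

class IpRegexDemo {
    // One octet in the range 0-255, as in the precise version.
    static final String OCTET = "(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])";
    static final Pattern PRECISE = Pattern.compile(OCTET + "(\\." + OCTET + "){3}");
    static final Pattern NAIVE = Pattern.compile("[1-9](\\d){0,2}(\\.[1-9](\\d){0,2}){3}");

    // True if the whole string s matches the pattern p.
    static boolean isIp(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIp(PRECISE, "192.168.1.1"));   // true
        System.out.println(isIp(PRECISE, "256.1.1.1"));     // false: 256 is out of range
        System.out.println(isIp(NAIVE, "999.999.999.999")); // true: the naive version has no range check
        System.out.println(isIp(NAIVE, "0.0.0.0"));         // false: the naive version rejects a leading 0
        System.out.println(isIp(PRECISE, "0.0.0.0"));       // true
    }
}
```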

4.2 Email addresses

(\w)[\w\-]*(\.[\w\-]+)*@[\w\-]+(\.[\w\-]+)*

4.3 11-digit mobile phone numbers

1(\d){10}

4.4 Chinese characters

[\u4e00-\u9fa5]
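
As a quick check of this range, the sketch below counts the characters in a string that fall inside it. It is illustrative only; the class name ChineseCharDemo and the helper countChinese are hypothetical names, and the sample text is written with Unicode escapes so that the result does not depend on source file encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChineseCharDemo {
    // The regex engine itself interprets \u4e00 and \u9fa5, hence the doubled backslashes.
    static final Pattern CHINESE = Pattern.compile("[\\u4e00-\\u9fa5]");

    // Counts the characters of s that fall in the range U+4E00 to U+9FA5.
    static int countChinese(String s) {
        Matcher m = CHINESE.matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // \u4E2D\u6587 is the two-character word for "Chinese (language)".
        System.out.println(countChinese("JDK \u4E2D\u6587 regex")); // 2
        System.out.println(countChinese("plain ASCII"));            // 0
    }
}
```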

References:
[1] https://docs.oracle.com/javase/tutorial/essential/regex/index.html
[2] https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#jcc
[3] Mastering Regular Expressions, 3rd Edition
[4] https://books.google.com/books?id=HgtdsuQdOEUC&pg=PA600&lpg=PA600&dq=java+final+terminator&source=bl&ots=wWh7rkHxwz&sig=WR_9SYY6b1zwpBq0hR8IdeTenp8&hl=zh-CN&sa=X&ved=0ahUKEwjTl_-g-PjTAhXkj1QKHfUSBmkQ6AEIWTAH#v=onepage&q=java%20final%20terminator&f=false