2017-09-29

Matching Modes in Java Regular Expressions

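Preface: I started learning this topic from the article "Java正则表达中Greedy Reluctant Possessive 的区别" (the difference between Greedy, Reluctant and Possessive in Java regular expressions), but found that it does not explain things very clearly. After finally working it out, I am sharing my own understanding here as notes.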

The official Java documentation lists three sets of quantifier symbols for regular expressions: Greedy, Reluctant and Possessive.
[Figure: quantifier table from the official documentation (regex_quantifers)]

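Comparing the three columns in the official documentation reveals a pattern:
Greedy mode appends a quantifier to X: ?, *, +, {n}, {n,} or {n,m}.

Reluctant (lazy) mode appends an extra "?" after any of those quantifiers, for example X??.

Possessive mode appends an extra "+" after any of those quantifiers, for example X?+.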
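
As a quick illustration of the three notations, the sketch below applies the same bounded quantifier in each form to the input "aaaa". The class name QuantifierSyntaxDemo and the helper firstMatch are made up for this note and are not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierSyntaxDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "aaaa";
        System.out.println(firstMatch("a{1,3}", input));   // greedy: prints aaa
        System.out.println(firstMatch("a{1,3}?", input));  // reluctant: prints a
        System.out.println(firstMatch("a{1,3}+", input));  // possessive: prints aaa
    }
}
```

With nothing after the quantifier, greedy and possessive give the same result here. They only differ when a later part of the pattern needs characters that the quantifier has already taken.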

Let the code speak

1. Greedy mode

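import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    public static void main(String[] args) {
        String targetStr = "xfooxxxxxxxfoo";
        // Greedy: .* takes as much as it can, then backs off until foo matches.
        Pattern pattern = Pattern.compile(".*foo");
        Matcher matcher = pattern.matcher(targetStr);

        // Print every match together with its start and end offsets.
        while (matcher.find()) {
            System.out.println("find text = "
                + matcher.group()
                + "   start index = "
                + matcher.start()
                + "  end index = "
                + matcher.end());
        }
    }
}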

Output:

find text = xfooxxxxxxxfoo   start index = 0  end index = 14

2. Reluctant (lazy) mode

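import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantDemo {
    public static void main(String[] args) {
        String targetStr = "xfooxxxxxxxfoo";
        // Reluctant: .*? takes as little as it can, growing only until foo matches.
        Pattern pattern = Pattern.compile(".*?foo");

        Matcher matcher = pattern.matcher(targetStr);

        // Print every match together with its start and end offsets.
        while (matcher.find()) {
            System.out.println("find text = "
                + matcher.group()
                + "   start index = "
                + matcher.start()
                + "  end index = "
                + matcher.end());
        }
    }
}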

Output:

find text = xfoo   start index = 0  end index = 4
find text = xxxxxxxfoo   start index = 4  end index = 14

3. Possessive mode

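import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PossessiveDemo {
    public static void main(String[] args) {
        String targetStr = "xfooxxxxxxxfoo";
        // Possessive: .*+ takes as much as it can and never gives anything back.
        Pattern pattern = Pattern.compile(".*+foo");

        Matcher matcher = pattern.matcher(targetStr);

        // The loop body never runs here because find() returns false.
        while (matcher.find()) {
            System.out.println("find text = "
                + matcher.group()
                + "   start index = "
                + matcher.start()
                + "  end index = "
                + matcher.end());
        }
    }
}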

Nothing is printed: matcher.find() returns false, so there is no match.

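Now for how it works:

Greedy mode is called greedy because the quantifier first consumes as much input as it can; with .* that is the entire input string. If the rest of the pattern then fails to match, the matcher backs off one character at a time from the end and tries again, until the match succeeds or there are no characters left to give back.
Pattern: .*foo
Input: xfooxxxxxxxfoo
Result: find text = xfooxxxxxxxfoo start index = 0 end index = 14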
[Figure: greedy matching process]

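Key point: .* first consumes the entire input, so when the matcher reaches foo there is nothing left and the attempt fails. The matcher then makes .* give back one character and tries again, and so on. Once .* has given back the final three characters, foo matches them, matcher.find() returns true, and the result is printed.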
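
The back-off can be imitated by hand. The following is only a rough sketch of the idea, not how java.util.regex is implemented: the class GreedySimulation and its method matchEnd are hypothetical names, the literal foo is hard-coded, and the input is assumed to contain no line terminators (which . would not match by default).

```java
class GreedySimulation {
    // Mimics how ".*foo" is tried from a fixed start position:
    // let ".*" take as many characters as possible, then give them back one by one.
    // Returns the end index of the match, or -1 if every attempt fails.
    static int matchEnd(String input, int start) {
        for (int taken = input.length() - start; taken >= 0; taken--) {
            int afterStar = start + taken; // position where "foo" would have to begin
            if (input.startsWith("foo", afterStar)) {
                return afterStar + 3;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // ".*" starts with all 14 characters and succeeds after giving back 3.
        System.out.println(matchEnd("xfooxxxxxxxfoo", 0)); // prints 14
    }
}
```

Because the loop counts down from the longest candidate, the first success is the longest possible match, which is why the earlier foo at index 1 is never reported on its own.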

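Reluctant mode takes the opposite approach. It starts at the first character of the input and consumes as little as possible, reluctantly taking one more character only when the rest of the pattern fails to match, until a match is found or the whole input has been tried.
Pattern: .*?foo
Input: xfooxxxxxxxfoo
Result: find text = xfoo start index = 0 end index = 4
find text = xxxxxxxfoo start index = 4 end index = 14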

[Figure: reluctant matching process]

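This one should also be easy to follow. The idea mirrors greedy mode, except that the quantifier starts from zero characters at the front and grows, instead of starting from the whole input and shrinking.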
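
The same kind of hand-written imitation works for reluctant mode. Again this is only a sketch under simplifying assumptions: ReluctantSimulation and findAll are hypothetical names, foo is hard-coded, and the code imitates repeated find() calls rather than reproducing the real engine.

```java
import java.util.ArrayList;
import java.util.List;

class ReluctantSimulation {
    // Mimics repeated find() calls for ".*?foo": from each start position,
    // let ".*?" take zero characters first and grow by one until "foo" follows.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        int start = 0;
        while (start <= input.length()) {
            int end = -1;
            for (int taken = 0; start + taken <= input.length(); taken++) {
                if (input.startsWith("foo", start + taken)) {
                    end = start + taken + 3;
                    break; // stop at the first (shortest) success
                }
            }
            if (end < 0) {
                break; // no "foo" at or after start, so later starts would fail too
            }
            matches.add(input.substring(start, end));
            start = end; // the next search resumes where this match ended
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("xfooxxxxxxxfoo")); // prints [xfoo, xxxxxxxfoo]
    }
}
```

Counting up from zero means the first success is the shortest possible match, so the input splits into two matches instead of one.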

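Possessive mode is probably the hardest to understand (though it is actually very simple, haha):

Possessive mode, like greedy mode, consumes as much of the input as it can in one go, but it tries exactly once. A possessive quantifier never backs off.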

[Figure: possessive matching process]

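(See the first attempt in the greedy figure. The difference is that after this first attempt there is no second, backed-off attempt; the result of matcher.find() is returned directly.)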

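Following the greedy key point: on the first attempt .*+ consumes the input all the way to its end, which leaves nothing for foo to match. Because a possessive quantifier never gives characters back, the attempt fails. The same thing happens from every later starting position, so matcher.find() returns false.
Pattern: .*+foo
Input: xfooxxxxxxxfoo
Result:
// no match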
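
One way to check this reading is to compare the possessive quantifier with an independent (atomic) group, (?>X), which java.util.regex also supports and which likewise refuses to give back what it has matched. The sketch below is illustrative only; PossessiveVsAtomic and found are hypothetical names.

```java
import java.util.regex.Pattern;

class PossessiveVsAtomic {
    // Hypothetical helper: true if regex matches anywhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxxfoo";
        System.out.println(found(".*+foo", input));    // possessive: prints false
        System.out.println(found("(?>.*)foo", input)); // atomic group: prints false
        System.out.println(found(".*foo", input));     // greedy, for contrast: prints true
    }
}
```

Both non-backtracking forms fail for the same reason: .* has already taken the final foo and will not release it.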

By now it should all make sense!

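A brief summary:
Greedy: matches the largest number of repetitions that still lets the overall match succeed.
Reluctant: matches the smallest number of repetitions that still lets the overall match succeed.
Possessive: matches the largest number of repetitions, even if that makes the overall match fail.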
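
Capturing groups make the summary visible, because they show exactly how many characters the quantifier kept. This is a small sketch on a made-up input, "abc123"; the class QuantifierSplitDemo and the helper split are hypothetical names chosen for this note.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierSplitDemo {
    // Hypothetical helper: returns "group1|group2" for the first match, or "no match".
    static String split(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) + "|" + m.group(2) : "no match";
    }

    public static void main(String[] args) {
        String input = "abc123";
        System.out.println(split("(.*)(\\d+)", input));   // greedy: prints abc12|3
        System.out.println(split("(.*?)(\\d+)", input));  // reluctant: prints abc|123
        System.out.println(split("(.*+)(\\d+)", input));  // possessive: prints no match
    }
}
```

The greedy .* gives back just one digit, the minimum that \d+ needs. The reluctant .*? stops growing at the first digit. The possessive .*+ keeps everything, so \d+ has nothing to match.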

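Here is an example where possessive mode matches successfully.

Target string: adbcde
Pattern: [a-z]*+
Result: the match succeeds. The first matcher.find() returns true with matcher.group() equal to "adbcde" (start 0, end 6). A second find() then matches the empty string with start 6 and end 6, which is exactly the length of the target string. Note that every call to matcher.find() advances the matcher, so calling it again inside the loop body would skip the first match and leave only the empty one.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PossessiveSuccessDemo {
    public static void main(String[] args) {
        String targetStr = "adbcde";
        Pattern pattern = Pattern.compile("[a-z]*+");

        Matcher matcher = pattern.matcher(targetStr);

        // Call find() only in the loop condition; each call moves to the next match.
        while (matcher.find()) {
            System.out.println("find text = "
                + matcher.group()
                + "   start index = "
                + matcher.start()
                + "  end index = "
                + matcher.end());
        }
    }
}

Output:

find text = adbcde   start index = 0  end index = 6
find text =    start index = 6  end index = 6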
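
The example above has nothing after the possessive quantifier. As a further sketch, a possessive quantifier can also succeed with a trailing pattern, provided the quantified part cannot swallow what follows. Here x*+ can only take x characters, so it never eats into foo. The class PossessiveWithSuffixDemo and the helper findAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PossessiveWithSuffixDemo {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // x*+ stops at the first non-x character, so foo is still available.
        System.out.println(findAll("x*+foo", "xfooxxxxxxxfoo")); // prints [xfoo, xxxxxxxfoo]
    }
}
```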

Personally, I think possessive mode is not used very often, because it is so strict that it leaves no way out!

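Title: Matching Modes in Java Regular Expressions

Author: qingsong.xu

Published: 2017-09-29, 14:09

Last updated: 2017-09-30, 15:09

Original link: http://qingsong-xu.github.io/2017/09/29/Java正则表达式中的匹配模式/

License: "Attribution-NonCommercial-ShareAlike 3.0". Please keep the original link and author when reposting.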