String.replaceAll: the hyphen (-) is a regex special character

Requirement: replace each of the following characters with a space:

!#$%&()[]*+-@?{|}~¢£¤¥¦§©ª«¬­®¯°±²³µ¶¹º»¼«½¾¿×~‘’`_\\^þÞ¡¨!<>\'*˝´\"ſß÷ΓΔΘΛΞΠΣΦΨΩγδθΛΦЂЃЉЊЋЍЏБДЖЗИЙЛФЦШЧЩЪЫЬЭЮЯ‐–—―‘’‚「」„†‡…•‰‹›‽₂₁₀ⁿ⁾⁽⁼⁻⁺⁹⁸⁷⁶⁵⁴⁰⁄₃₄₅₆₇₈₉₊₋₌₎₍€℅ℓ№℗⅟⅞⅝⅜⅛⅚⅙⅘⅗⅖⅕⅔⅓℮Ω™℠←↑→↓↔↕↖↗↘↙∂∆∏∑−∙√fflffiflfiff◊≥≤≠≈∫∞ѲҐΏГПѝѢ

The natural choice is String's replaceAll. The JDK defines the method as follows:

    /**
     * Replaces each substring of this string that matches the given <a
     * href="../util/regex/Pattern.html#sum">regular expression</a> with the
     * given replacement.
     *
     * <p> An invocation of this method of the form
     * <i>str</i>{@code .replaceAll(}<i>regex</i>{@code ,} <i>repl</i>{@code )}
     * yields exactly the same result as the expression
     *
     * <blockquote>
     * <code>
     * {@link java.util.regex.Pattern}.{@link
     * java.util.regex.Pattern#compile compile}(<i>regex</i>).{@link
     * java.util.regex.Pattern#matcher(java.lang.CharSequence) matcher}(<i>str</i>).{@link
     * java.util.regex.Matcher#replaceAll replaceAll}(<i>repl</i>)
     * </code>
     * </blockquote>
     *
     *<p>
     * Note that backslashes ({@code \}) and dollar signs ({@code $}) in the
     * replacement string may cause the results to be different than if it were
     * being treated as a literal replacement string; see
     * {@link java.util.regex.Matcher#replaceAll Matcher.replaceAll}.
     * Use {@link java.util.regex.Matcher#quoteReplacement} to suppress the special
     * meaning of these characters, if desired.
     *
     * @param   regex
     *          the regular expression to which this string is to be matched
     * @param   replacement
     *          the string to be substituted for each match
     *
     * @return  The resulting {@code String}
     *
     * @throws  PatternSyntaxException
     *          if the regular expression's syntax is invalid
     *
     * @see java.util.regex.Pattern
     *
     * @since 1.4
     * @spec JSR-51
     */
    public String replaceAll(String regex, String replacement) {
        return Pattern.compile(regex).matcher(this).replaceAll(replacement);
    }
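
The Javadoc's note about backslashes and dollar signs in the replacement string is easy to overlook. Below is a minimal sketch of the difference; the class name `ReplacementDemo`, the helper `replaceLiteral`, and the sample strings are made up for illustration and are not part of the JDK.

```java
import java.util.regex.Matcher;

public class ReplacementDemo {
    // Hypothetical helper: replaces every match of regex with the
    // replacement taken literally, so "$" and "\" lose their special meaning.
    static String replaceLiteral(String input, String regex, String replacement) {
        return input.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // In a plain replacement string, "$1" and "$2" refer to capture groups.
        System.out.println("key=value".replaceAll("(\\w+)=(\\w+)", "$2=$1"));

        // With quoteReplacement, "$1" is inserted as the two characters '$' and '1'.
        System.out.println(replaceLiteral("cost: X", "X", "$1"));
    }
}
```

In the article's case the replacement is a single space, so no quoting is needed; the caveat only matters when the replacement text itself may contain `$` or `\`.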

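The first parameter is a regular expression, so the characters to be replaced go inside [] and the resulting character class is passed as the first argument. That is not the whole story, though: any of those characters that are special in a regular expression must be escaped.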

For the list of special characters, see the following link: link

Pull the special characters out and replace them in a separate call. The code is as follows:

result = result.replaceAll("[\\$\\(\\)\\*\\+\\.\\[\\]\\?\\\\^\\{\\}\\|]", " ");
        result = result.replaceAll("[!#%&-@~¢£¤¥¦§©ª«¬\u00AD®¯°±²³µ¶¹º»¼«½¾¿×~‘’`_þÞ¡¨!<>'˝´\"ſß÷ΓΔΘΛΞΠΣΦΨΩγδθΛΦЂЃЉЊЋЍЏБДЖЗИЙЛФЦШЧЩЪЫЬЭЮЯ‐–—―‘’‚「」„†‡…•‰‹›‽₂₁₀ⁿ⁾⁽⁼⁻⁺⁹⁸⁷⁶⁵⁴⁰⁄₃₄₅₆₇₈₉₊₋₌₎₍€℅ℓ№℗⅟⅞⅝⅜⅛⅚⅙⅘⅗⅖⅕⅔⅓℮Ω™℠←↑→↓↔↕↖↗↘↙∂∆∏∑−∙√fflffiflfiff◊≥≤≠≈∫∞ѲҐΏГПѝѢ]", " ");

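Testing this showed that digits were being replaced as well, which was odd. Bisecting the character list to find the culprit eventually pointed to &-@: the hyphen is also a special character, and inside [] it denotes a range. Any character whose ASCII code lies between & (38) and @ (64), such as the digits, parentheses, the asterisk, and the plus sign, matches the expression. Moving the hyphen into the first pattern and escaping it fixes the problem: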

result = result.replaceAll("[\\$\\(\\)\\*\\+\\.\\[\\]\\?\\\\^\\{\\}\\|\\-]", " ");
        result = result.replaceAll("[!#%&@~¢£¤¥¦§©ª«¬\u00AD®¯°±²³µ¶¹º»¼«½¾¿×~‘’`_þÞ¡¨!<>'˝´\"ſß÷ΓΔΘΛΞΠΣΦΨΩγδθΛΦЂЃЉЊЋЍЏБДЖЗИЙЛФЦШЧЩЪЫЬЭЮЯ‐–—―‘’‚「」„†‡…•‰‹›‽₂₁₀ⁿ⁾⁽⁼⁻⁺⁹⁸⁷⁶⁵⁴⁰⁄₃₄₅₆₇₈₉₊₋₌₎₍€℅ℓ№℗⅟⅞⅝⅜⅛⅚⅙⅘⅗⅖⅕⅔⅓℮Ω™℠←↑→↓↔↕↖↗↘↙∂∆∏∑−∙√fflffiflfiff◊≥≤≠≈∫∞ѲҐΏГПѝѢ]", " ");
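
To see the range behaviour in isolation, and as one possible way to avoid hand-escaping, here is a small sketch. The class `CharClassDemo` and the helper `toCharClass` are hypothetical names, not JDK APIs. The helper assumes a non-empty input and puts a backslash in front of every ASCII character that is not a letter or digit, relying on the rule documented in `java.util.regex.Pattern` that a backslash may precede any non-alphabetic character.

```java
public class CharClassDemo {
    // Hypothetical helper: builds a character class that matches exactly the
    // characters in chars. ASCII symbols are backslash-escaped so that
    // '-', '^', '[', ']', '&' and '\' cannot act as regex syntax.
    static String toCharClass(String chars) {
        StringBuilder sb = new StringBuilder("[");
        chars.codePoints().forEach(cp -> {
            boolean asciiSymbol = cp < 128 && !Character.isLetterOrDigit(cp);
            if (asciiSymbol) {
                sb.append('\\');
            }
            sb.appendCodePoint(cp);
        });
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        String input = "a1(b)+c-d@e";

        // Unescaped: "&-@" is the range 38..64, so '1', '(', ')', '+' also match.
        System.out.println(input.replaceAll("[&-@]", " "));

        // Escaped by hand: only '&', '-' and '@' match.
        System.out.println(input.replaceAll("[&\\-@]", " "));

        // Escaped by the helper: same result as the hand-escaped pattern.
        System.out.println(input.replaceAll(toCharClass("&-@"), " "));
    }
}
```

With a helper like this, the whole character list could be passed through one call instead of being split into a "special" and a "normal" pattern, at the cost of a few redundant backslashes.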