Defending my website against XSS (cross-site scripting) attacks, with a focus on rich-text editors, and against SQL injection

Date: 2019-11-18


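Yesterday this blog was hit by an XSS (cross-site scripting) attack and fell within three minutes. The attacker's method was actually very simple and took no real skill. All I can do is lament that I previously had no defenses whatsoever.

These are some of the records left in the database. At the end, the attacker planted a script that pops up alert boxes in an infinite loop; with that script in place, he probably could not enter anything further himself even if he wanted to.

Something like this: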

<html>
     <body onload='while(true){alert(1)}'>
     </body>
</html>

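I immediately realized how serious this was: it showed that my blog had a severe security problem. An XSS attack can lead to a user's cookies, and with them the user's server-side session, being hijacked, and the consequences are serious, even though the attacker may need nothing more than a few unsophisticated scripts to pull it off.

The next day I spent some time learning how to defend against it, and looked into SQL injection while I was at it.

SQL injection stems from building SQL statements by string concatenation, so user input needs to be passed as parameters instead. Because I use JPA, I have no SQL concatenation problem, but it still seemed worthwhile to validate some of the user input. My blog system is not complicated: it has four tables in total, Article, User, Message and Comment.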
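To make the concatenation problem concrete, here is a minimal sketch. The class name SqlConcatDemo, the query text and the column name are hypothetical and not taken from the blog's code; the sketch only builds strings and never touches a database.

```java
// Hypothetical demo: table and column names are assumptions, not the blog's real schema.
final class SqlConcatDemo {

    // Vulnerable pattern: user input is spliced directly into the statement text.
    static String concatenated(String userName) {
        return "SELECT * FROM User WHERE name = '" + userName + "'";
    }

    // Parameterized pattern: the statement text is fixed and the value is bound separately
    // (with JPA, for example, through Query.setParameter("name", value)).
    static final String PARAMETERIZED = "SELECT u FROM User u WHERE u.name = :name";

    public static void main(String[] args) {
        String malicious = "x' OR '1'='1";
        // The quote inside the input closes the string literal and rewrites the WHERE clause.
        System.out.println(concatenated(malicious));
        // With a bound parameter the same input never becomes part of the statement text.
        System.out.println(PARAMETERIZED);
    }
}
```

With concatenation the attacker's quote characters become part of the SQL itself; with a named parameter the statement text stays constant regardless of the input.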

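The only values that are both supplied by users and used in database queries are the user name, the password and the article title. Values generated by the back end, such as article dates, need no such handling.

These three fields can be validated with a custom annotation.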

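import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import javax.validation.Constraint;
import javax.validation.ConstraintValidator;
import javax.validation.ConstraintValidatorContext;
import javax.validation.Payload;

/**
 * @ClassName: IsValidString
 * @Description: Custom annotation for validating front-end and back-end parameters;
 *               checks whether a string contains illegal characters
 * @author 無名
 * @date 2016-7-25 8:22:58 PM
 * @version 1.0
 */
@Target({ElementType.FIELD, ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
@Constraint(validatedBy = IsValidString.ValidStringChecker.class)
@Documented
public @interface IsValidString {
    String message() default "The string is invalid.";

    Class<?>[] groups() default {};

    Class<? extends Payload>[] payload() default {};

    class ValidStringChecker implements ConstraintValidator<IsValidString, String> {
        @Override
        public void initialize(IsValidString arg0) {
        }

        @Override
        public boolean isValid(String strValue, ConstraintValidatorContext context) {
            // Add the validation logic here.
            return true;
        }
    }
}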

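Once the custom annotation is defined, just add @IsValidString to the corresponding entity class fields.

However, since I haven't yet worked out how to intercept the exception thrown when the custom annotation's validation fails, I do the validation in the controller class for now.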

    public static boolean contains_sqlinject_illegal_ch(String str_input) {
        //"[`~!@#$%^&*()+=|{}':;',\\[\\].<>/?~！@#￥%……&*（）——+|{}【】‘；：」「’。，、？]"
        String regEx = "['=<>;\"]";
        Pattern p = Pattern.compile(regEx);
        Matcher m = p.matcher(str_input);
        return m.find();
    }

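The characters rejected are ' " = < > ;

I think these few are enough. Rejecting < and > also happens to take care of XSS for these fields.

XSS in general, however, still gives me a headache. My blog system uses the wangEditor web text editor, and what it sends to the back end contains many legitimate HTML tags that express the article's formatting. So I can't simply filter out characters such as < and > across the board.

For example, type <html><body onload='while(true){alert(1)}'></body></html> into the editor and submit it. What the back end receives is:

The escaped &lt; and &gt; in the middle are legitimate < and > characters meant for display on the page, while the tags around them are the normal HTML tags that the editor generates to control formatting.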
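The escaping described here can be sketched in a few lines of Java. This is only an illustration of the idea, not wangEditor's actual implementation; the class and method names are made up for the example.

```java
// Hypothetical helper: shows what "escaping" means for text typed into the editor.
final class HtmlEscapeDemo {

    // Replace the characters that are significant in HTML with entity references.
    // '&' is handled first so that the ampersands introduced by the later
    // replacements are not escaped a second time.
    static String escapeHtml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;")
                   .replace("'", "&#39;");
    }

    public static void main(String[] args) {
        // The browser shows the result as literal text instead of interpreting it as a tag.
        System.out.println(escapeHtml("<body onload='while(true){alert(1)}'>"));
    }
}
```

Text escaped this way is harmless when rendered. The trouble only starts when markup reaches the back end without going through this step.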

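The problem is that if someone clicks the editor's "source code" button and, among the normal HTML tags generated by the editor, types <html><body onload='while(true){alert(1)}'></body></html>, the back end receives it completely unchanged as <html><body onload='while(true){alert(1)}'></body></html>: the < and > are not turned into &lt; and &gt;.

This is a real headache. I keep wondering why this editor offers a damn view-source feature at all, since it means I can't escape < and > uniformly.

Under these circumstances, I can only filter out a set of HTML tags that I am sure are harmful, and, as everyone knows, this kind of blacklist check is not secure enough. (2016-12-30: the function below definitely won't do. It is written very stupidly; further down I get rid of it in favor of whitelist validation implemented with regular expressions.)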

    /*
     * Cross-site scripting (XSS) is a type of computer security vulnerability
     * typically found in web applications. XSS enables attackers to inject
     * client-side scripts into web pages viewed by other users. A cross-site
     * scripting vulnerability may be used by attackers to bypass access
     * controls such as the same-origin policy. Cross-site scripting carried out
     * on websites accounted for roughly 84% of all security vulnerabilities
     * documented by Symantec as of 2007. Their effect may range from a petty
     * nuisance to a significant security risk, depending on the sensitivity of
     * the data handled by the vulnerable site and the nature of any security
     * mitigation implemented by the site's owner.(From en.wikipedia.org)
     */
    public static boolean contains_xss_illegal_str(String str_input) {
        return str_input.contains("<html") || str_input.contains("<HTML")
                || str_input.contains("<body") || str_input.contains("<BODY")
                || str_input.contains("<script") || str_input.contains("<SCRIPT")
                || str_input.contains("<link") || str_input.contains("<LINK")
                || str_input.contains("%3Cscript") || str_input.contains("%3Chtml")
                || str_input.contains("%3Cbody") || str_input.contains("%3Clink")
                || str_input.contains("%3CSCRIPT") || str_input.contains("%3CHTML")
                || str_input.contains("%3CBODY") || str_input.contains("%3CLINK")
                || str_input.contains("<META") || str_input.contains("<meta")
                || str_input.contains("%3Cmeta") || str_input.contains("%3CMETA")
                || str_input.contains("<style") || str_input.contains("<STYLE")
                || str_input.contains("%3CSTYLE") || str_input.contains("%3Cstyle")
                || str_input.contains("<xml") || str_input.contains("<XML")
                || str_input.contains("%3Cxml") || str_input.contains("%3CXML");
    }
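A quick experiment shows why a blacklist like this is weak. The sketch below repeats the same case-sensitive substring checks in a loop (the class name BlacklistBypassDemo is made up for the example) and feeds it two inputs the list does not anticipate.

```java
// Hypothetical demo class; the fragments are the ones checked by the blacklist function above.
final class BlacklistBypassDemo {

    private static final String[] BLOCKED = {
        "<html", "<HTML", "<body", "<BODY", "<script", "<SCRIPT", "<link", "<LINK",
        "%3Cscript", "%3Chtml", "%3Cbody", "%3Clink", "%3CSCRIPT", "%3CHTML", "%3CBODY", "%3CLINK",
        "<META", "<meta", "%3Cmeta", "%3CMETA", "<style", "<STYLE", "%3CSTYLE", "%3Cstyle",
        "<xml", "<XML", "%3Cxml", "%3CXML"
    };

    // Returns true if the input contains any blocked fragment (case-sensitive).
    static boolean containsBlocked(String input) {
        for (String fragment : BLOCKED) {
            if (input.contains(fragment)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsBlocked("<script>alert(1)</script>"));    // caught
        System.out.println(containsBlocked("<ScRiPt>alert(1)</ScRiPt>"));    // mixed case slips through
        System.out.println(containsBlocked("<img src=x onerror=alert(1)>")); // tag not on the list
    }
}
```

Mixed-case tag names and event-handler attributes on tags that are not listed both pass the check, which is exactly the weakness of enumerating what is forbidden.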

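I'm considering removing the editor's view-source feature.

Beyond that, I still need to study XSS defense systematically. I've started reading the book 《白帽子講web安全》, and it seems good.

I'll add to this post when I have new insights.

Addendum, 2016-12-30:

Today I read 《白帽子講web安全》 and indeed learned a lot. It covers the rich-text editor case: because a rich-text editor itself emits some normal HTML tags, whitelist validation is required. Only tags known to be safe are allowed; everything other than the tags the editor uses is filtered out. This is the whitelist approach, and it is the truly sound one.

This afternoon I also worked on how to write the regular expression: <([^(a)(img)(div)(p)(span)(pre)(br)(code)(b)(u)(i)(strike)(font)(blockquote)(ul)(li)(ol)(table)(tr)(td)(/)][^>]*)> (Night of 2016-12-30 to 2016-12-31: I found that this regex is wrong; see the further addenda below.)

[^] means negation.

The regex above is meant to match any tag other than a, img, div, and so on.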

    /*
     * Cross-site scripting (XSS) is a type of computer security vulnerability
     * typically found in web applications. XSS enables attackers to inject
     * client-side scripts into web pages viewed by other users. A cross-site
     * scripting vulnerability may be used by attackers to bypass access
     * controls such as the same-origin policy. Cross-site scripting carried out
     * on websites accounted for roughly 84% of all security vulnerabilities
     * documented by Symantec as of 2007. Their effect may range from a petty
     * nuisance to a significant security risk, depending on the sensitivity of
     * the data handled by the vulnerable site and the nature of any security
     * mitigation implemented by the site's owner.(From en.wikipedia.org)
     */
    public static boolean contains_xss_illegal_str(String str_input) {
        final String REGULAR_EXPRESSION =
                "<([^(a)(img)(div)(p)(span)(pre)(br)(code)(b)(u)(i)(strike)(font)(blockquote)(ul)(li)(ol)(table)(tr)(td)(/)][^>]*)>";
        Pattern pattern = Pattern.compile(REGULAR_EXPRESSION);
        Matcher matcher = pattern.matcher(str_input);
        return matcher.find();
    }

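Addendum, night of 2016-12-30 to 2016-12-31:

Experiments showed that the regular expression I wrote earlier does not work. I also found that this regex is very hard to write and takes real skill, at least for a beginner like me who is not even comfortable with basic regular expressions.

This kind of "not" cannot be expressed simply with the [^] mentioned above, because [^] cannot negate a whole string. For example, a[^bc]d means that the single character between a and d must not be b or c.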
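The failure is easy to reproduce. In the sketch below (the class name CharClassDemo is made up), the first-attempt regex is compiled unchanged. Because [^...] is a single negated character class, it only rejects tags whose first character is one of the listed letters or symbols, so <script> (s appears in "span") and <body> (b is listed) are never flagged.

```java
import java.util.regex.Pattern;

// Hypothetical demo class around the first-attempt regex from the text.
final class CharClassDemo {

    // [^...] is ONE negated character class: every letter and bracket inside it
    // is treated as an individual character, not as a tag name.
    static final Pattern FIRST_ATTEMPT = Pattern.compile(
            "<([^(a)(img)(div)(p)(span)(pre)(br)(code)(b)(u)(i)(strike)(font)(blockquote)"
            + "(ul)(li)(ol)(table)(tr)(td)(/)][^>]*)>");

    // Returns true if the regex finds a tag it considers illegal.
    static boolean flagged(String html) {
        return FIRST_ATTEMPT.matcher(html).find();
    }

    public static void main(String[] args) {
        System.out.println(flagged("<script>alert(1)</script>"));     // not flagged: 's' is in the class
        System.out.println(flagged("<body onload='alert(1)'>"));      // not flagged: 'b' is in the class
        System.out.println(flagged("<html>"));                        // flagged: 'h' is not in the class
    }
}
```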

To negate a whole string, use an expression of this form: ^(?!.*helloworld).*$
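A small check of that expression, as a sketch (the class and method names are made up): the negative lookahead (?!.*helloworld) fails at the start of the line if "helloworld" occurs anywhere later in it, so only lines without that substring match.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the negative-lookahead expression above.
final class NegativeLookaheadDemo {

    static final Pattern NO_HELLOWORLD = Pattern.compile("^(?!.*helloworld).*$");

    // Returns true if the whole line matches, i.e. it does not contain "helloworld".
    static boolean accepts(String line) {
        return NO_HELLOWORLD.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("hello world"));        // no contiguous "helloworld": accepted
        System.out.println(accepts("say helloworld now")); // contains "helloworld": rejected
    }
}
```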

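With that in mind, the following regex expresses an HTML tag that is not <p>:

<((?!p)[^>])> Here the trailing [^>] means there is exactly one character between < and >, and (?!p) means that character is not p.

Written as <((?!p)[^>]*)>, it means any number of characters between < and >, the first of which is not p.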

    @Test
    public void test_Xss_check() {
        System.out.println("begin");
        String str_input = "<p>";
        final String REGULAR_EXPRESSION = "<((?!p)[^>])>";
        Pattern pattern = Pattern.compile(REGULAR_EXPRESSION);
        Matcher matcher = pattern.matcher(str_input);
        // "<p>" is excluded by the lookahead, so "yes" is not printed for this input.
        if (matcher.find()) {
            System.out.println("yes");
        }
    }

So how do we match an HTML tag that is neither AA nor BB?

<((?!p)(?!a)[^>]*)> matches HTML tags that start with neither p nor a!

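What we need to match is: HTML tags that are not <p>, not <ul>, not <li>, and so on, and that do not start with <a , do not start with <img , do not start with </, and so on. How should that be written?

Start with a simple example: <(((?!p )(?!a )[^>]*)((?!p)(?!a).))> matches tags that are not <p xxxx>, not <a xxxx>, not <p> and not <a>.

For example, the string <pasd> matches, while <p xxxx> does not and <p> does not. One imprecision, however, is that <ppp> and <aaa> do not match either. There are other problems too; for instance, I don't know how to express "any tag other than <table>".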
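The behaviour of these two lookahead patterns can be checked with a short sketch (the class name LookaheadTagDemo and the sample inputs other than those named in the text are my own). It also shows that the prefix version rejects <pre> along with <p>, since both start with p.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the two lookahead-based tag patterns discussed above.
final class LookaheadTagDemo {

    // Tags whose content starts with neither 'p' nor 'a'.
    static final Pattern NOT_P_OR_A_PREFIX = Pattern.compile("<((?!p)(?!a)[^>]*)>");

    // Tags that do not start with "p " or "a " and whose last character is neither 'p' nor 'a'.
    static final Pattern SIMPLE_EXAMPLE =
            Pattern.compile("<(((?!p )(?!a )[^>]*)((?!p)(?!a).))>");

    static boolean prefixMatch(String html) {
        return NOT_P_OR_A_PREFIX.matcher(html).find();
    }

    static boolean exampleMatch(String html) {
        return SIMPLE_EXAMPLE.matcher(html).find();
    }

    public static void main(String[] args) {
        System.out.println(prefixMatch("<div>"));      // starts with 'd': matched
        System.out.println(prefixMatch("<pre>"));      // starts with 'p': not matched, like <p>
        System.out.println(exampleMatch("<pasd>"));    // matched
        System.out.println(exampleMatch("<p xxxx>"));  // starts with "p ": not matched
        System.out.println(exampleMatch("<ppp>"));     // ends with 'p': not matched (the imprecision)
    }
}
```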

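All in all, this regex feels very hard to write and is beyond my ability. So in the end I decided to use a regex to pick out candidate HTML tags first, and then do the whitelist filtering in Java code.

The regex used to pick out HTML tags is <(?!a )(?!p )(?!img )(?!code )(?!span )(?!pre )(?!font )(?!/)[^>]*>. It skips <a xxx>, <img xx>, </> and so on, because those are treated as legal by default. The tags it does find are stored in a List and then checked against the whitelist.

The code is as follows: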

    @Test
    public void test_Xss_check() {
        String str_input =
                "<a ss><script>sds<body><a></adsd><d/s><p dsd><pp><a><dsds>dsdas<font ds>" +
                "<fontdsdsd><font>das<oooioacc><pp sds><script><code ><br><code><ccc><abug>";
        System.out.println("String inputed:" + str_input);
        final String REGULAR_EXPRESSION =
                "<(?!a )(?!p )(?!img )(?!code )(?!span )(?!pre )(?!font )(?!/)[^>]*>";
        final Pattern PATTERN = Pattern.compile(REGULAR_EXPRESSION);
        final Matcher MATCHER = PATTERN.matcher(str_input);
        List<String> str_lst = new ArrayList<String>();
        while (MATCHER.find()) {
            str_lst.add(MATCHER.group());
        }
        final String  LEGAL_TAGS = "<a><img><div><p><span><pre><br><code>" +
                "<b><u><i><strike><font><blockquote><ul><li><ol><table><tr><td>";
        for (String str:str_lst) {
            if (!LEGAL_TAGS.contains(str)) {
                    System.out.println(str + " is illegal");
            }
        }
    }

The code above prints:

String inputed:<a ss><script>sds<body><a></adsd><d/s><p dsd><pp><a><dsds>dsdas<font ds><fontdsdsd><font>das<oooioacc><pp sds><script><code ><br><code><ccc><abug>
<script> is illegal
<body> is illegal
<d/s> is illegal
<pp> is illegal
<dsds> is illegal
<fontdsdsd> is illegal
<oooioacc> is illegal
<pp sds> is illegal
<script> is illegal
<ccc> is illegal
<abug> is illegal

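January 1, 2017

Happy New Year. Still, I have to report the latest development in this XSS whitelist validation. Yesterday I deployed the validation method described above. That script kiddie came back. As the text above shows, what I have so far only restricts input to a limited set of HTML tags; it places no restrictions on tag attributes. So the script kiddie went after exactly that: for example, setting a p tag to absolute positioning, pinning it to a chosen spot, setting its width and height, and so on.

What's more, many tags accept attributes such as onclick and onload.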
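The gap is easy to demonstrate with a sketch that reuses the tag-level check from the earlier test (the class name TagOnlyWhitelistDemo and the sample inputs are my own, and the regex is written with span). Any tag that starts with a whitelisted name followed by a space is skipped before its attributes are ever looked at.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: tag-level whitelist in the style of the earlier test.
final class TagOnlyWhitelistDemo {

    static final Pattern CANDIDATES = Pattern.compile(
            "<(?!a )(?!p )(?!img )(?!code )(?!span )(?!pre )(?!font )(?!/)[^>]*>");

    static final String LEGAL_TAGS = "<a><img><div><p><span><pre><br><code>"
            + "<b><u><i><strike><font><blockquote><ul><li><ol><table><tr><td>";

    // Collects every candidate tag that is not found in the whitelist string.
    static List<String> illegalTags(String html) {
        List<String> illegal = new ArrayList<>();
        Matcher matcher = CANDIDATES.matcher(html);
        while (matcher.find()) {
            if (!LEGAL_TAGS.contains(matcher.group())) {
                illegal.add(matcher.group());
            }
        }
        return illegal;
    }

    public static void main(String[] args) {
        // Both inputs carry dangerous attributes, yet nothing is reported.
        System.out.println(illegalTags("<p onclick=\"alert(1)\">hi</p>"));
        System.out.println(illegalTags(
                "<p style=\"position:absolute;top:0;left:0;width:100%;height:100%\">x</p>"));
        // A disallowed tag name is still caught.
        System.out.println(illegalTags("<script>x</script>"));
    }
}
```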

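So the method described above is not enough either. But validating attributes as well felt like too much trouble for me, so I went looking online to see how other people handle it, and eventually found jsoup, an open-source jar.

https://jsoup.org/download

After adding the jar, this is all it takes: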

articleContent = Jsoup.clean(articleContent, Whitelist.basicWithImages());

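Damn. If a wheel already exists, use it as soon as you can. Building my own was too hard and cost me five days.

Finally, to all the script monkeys of the world: may 2017 bring you nothing but bad luck!!!

January 10, 2017

After adding the jsoup filter last time, I noticed some problems when writing blog posts. Clearly some markup that should not be filtered was being filtered out.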

articleContent = Jsoup.clean(articleContent,Whitelist.basicWithImages());

I felt this needed further work.

I set a breakpoint and debugged. As an example, I wrote the following HTML code in a blog post:

<html>
     <body>
    <audio controls="controls" autoplay="autoplay" height="100" width="100">
            <source src="<%=basePath %>music/Breath and Life.mp3" type="audio/mp3" />
          <source src="<%=basePath %>music/Breath and Life.ogg" type="audio/ogg" />
          <embed height="100" width="100" src="<%=basePath %>music/Breath and Life.mp3" />
     </audio>
    <script type="text/javascript" src="<%=basePath %>js/global.js"></script>
    <script type="text/javascript" src="<%=basePath %>js/photos.js"></script>
    </body>
</html>

The string the rich-text editor sends to the back end is:

hello，日向blog<pre style="max-width:100%;overflow-x:auto;"><code class="html hljs xml"
codemark="1"><html>
<body>
<audio controls="controls" autoplay="autoplay" height="100" width="100">
<source src="<%=basePath %>music/Breath and Life.mp3" type="audio/mp3" />
<source src="<%=basePath %>music/Breath and Life.ogg" type="audio/ogg" />
<embed height="100" width="100" src="<%=basePath %>music/Breath and Life.mp3" />
</audio>
<script type="text/javascript" src="<%=basePath
%>js/global.js"></script>
<script type="text/javascript" src="<%=basePath
%>js/photos.js"></script>
</body>
</html></code></pre>

After jsoup filtering, the value is:

hello，日向blog
<pre><code><html>
<body>
<audio controls="controls" autoplay="autoplay" height="100" width="100">
<source src="<%=basePath %>music/Breath and Life.mp3" type="audio/mp3" />
<source src="<%=basePath %>music/Breath and Life.ogg" type="audio/ogg" />
<embed height="100" width="100" src="<%=basePath %>music/Breath and Life.mp3" />
</audio>
<script type="text/javascript" src="<%=basePath %>js/global.js"></script>
<script type="text/javascript" src="<%=basePath %>js/photos.js"></script>
</body>
</html></code></pre>

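Clearly the style attribute of the pre tag and the class attributes of the span and code tags were filtered out, yet these attributes are harmless and necessary. So we need to change jsoup's stock whitelist.

Looking at the code, jsoup's filtering is driven by the Whitelist.basicWithImages() argument, which is a whitelist.

Its source code is: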

    /**
     <p>
     This whitelist allows a fuller range of text nodes: <code>a, b, blockquote, br, cite, code, dd, dl, dt, em, i, li,
     ol, p, pre, q, small, span, strike, strong, sub, sup, u, ul</code>, and appropriate attributes.
     </p>
     <p>
     Links (<code>a</code> elements) can point to <code>http, https, ftp, mailto</code>, and have an enforced
     <code>rel=nofollow</code> attribute.
     </p>
     <p>
     Does not allow images.
     </p>

     @return whitelist
     */
    public static Whitelist basic() {
        return new Whitelist()
                .addTags(
                        "a", "b", "blockquote", "br", "cite", "code", "dd", "dl", "dt", "em",
                        "i", "li", "ol", "p", "pre", "q", "small", "span", "strike", "strong", "sub",
                        "sup", "u", "ul")

                .addAttributes("a", "href")
                .addAttributes("blockquote", "cite")
                .addAttributes("q", "cite")

                .addProtocols("a", "href", "ftp", "http", "https", "mailto")
                .addProtocols("blockquote", "cite", "http", "https")
                .addProtocols("cite", "cite", "http", "https")

                .addEnforcedAttribute("a", "rel", "nofollow")
                ;

    }

    /**
     This whitelist allows the same text tags as {@link #basic}, and also allows <code>img</code> tags, with appropriate
     attributes, with <code>src</code> pointing to <code>http</code> or <code>https</code>.

     @return whitelist
     */
    public static Whitelist basicWithImages() {
        return basic()
                .addTags("img")
                .addAttributes("img", "align", "alt", "height", "src", "title", "width")
                .addProtocols("img", "src", "http", "https")
                ;
    }

After my modification it reads:

   public static Whitelist basic() {
       return new Whitelist()
               .addTags(
                       "a", "b", "blockquote", "br", "cite", "code", "dd", "dl", "dt", "em",
                       "i", "li", "ol", "p", "pre", "q", "small", "span", "strike", "strong", "sub",
                       "sup", "u", "ul")

               .addAttributes("a", "href")
               .addAttributes("blockquote", "cite")
               .addAttributes("q", "cite")
               .addAttributes("code", "class")
               .addAttributes("span", "class")
               .addAttributes("pre", "style")

               .addProtocols("a", "href", "ftp", "http", "https", "mailto")
               .addProtocols("blockquote", "cite", "http", "https")
               .addProtocols("cite", "cite", "http", "https")

               .addEnforcedAttribute("a", "rel", "nofollow")
               ;

   }