Learning WebMagic: Parsing JSON

What this post addresses

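When a page renders its data with front-end AJAX, its JavaScript first requests an ajaxUrl to fetch data in JSON format, then parses that data and renders it into the designated places on the page.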

 

The wrong way

When my crawler needed to scrape the JSON returned by the ajaxUrl, this is what I wrote at first:

// Note: the approach below is WRONG. I wrote this JSON-parsing code by guesswork, before reading the official demo.
JsonPathSelector json = new JsonPathSelector(page.getRawText());
List<String> name = json.selectList("$.data.itemList[*].brand.name");
List<String> uri = json.selectList("$.data.itemList[*].brand.uri");

Only after reading the official us.codecraft.webmagic.selector.JsonPathSelectorTest did I realize that I had passed the wrong arguments.

 

The right way

The constructor JsonPathSelector(String jsonPathStr) takes jsonPathStr, that is, the extraction rule (the JSONPath expression) as a string.

The methods String select(String text) and List<String> selectList(String text) both take text, that is, the JSON string to extract from.
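Applied to the wrong snippet above, the fix might look like the following sketch. The class name BrandJsonExample, the helper methods, and the sample JSON with its brand values are made up for illustration; only the two JSONPath rules come from the original snippet. In a real PageProcessor the JSON text would be page.getRawText().

```java
import us.codecraft.webmagic.selector.JsonPathSelector;

import java.util.List;

public class BrandJsonExample {

    // Hypothetical payload, shaped to match the rules used in the wrong snippet.
    static final String SAMPLE_JSON =
            "{\"data\":{\"itemList\":["
            + "{\"brand\":{\"name\":\"Acme\",\"uri\":\"/brand/acme\"}},"
            + "{\"brand\":{\"name\":\"Globex\",\"uri\":\"/brand/globex\"}}"
            + "]}}";

    // The extraction rule goes into the constructor ...
    static final JsonPathSelector NAME_RULE =
            new JsonPathSelector("$.data.itemList[*].brand.name");
    static final JsonPathSelector URI_RULE =
            new JsonPathSelector("$.data.itemList[*].brand.uri");

    // ... and the JSON text goes into select / selectList.
    static List<String> brandNames(String json) {
        return NAME_RULE.selectList(json);
    }

    static List<String> brandUris(String json) {
        return URI_RULE.selectList(json);
    }

    public static void main(String[] args) {
        System.out.println(brandNames(SAMPLE_JSON));
        System.out.println(brandUris(SAMPLE_JSON));
    }
}
```

Keeping the selectors as constants is a choice made for this sketch: the rule is fixed at construction and the JSON arrives per call, so the same instance can be applied to any number of responses.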

 

Demo

package us.codecraft.webmagic.selector;

import org.junit.Test;

import java.util.List;

import static org.assertj.core.api.Assertions.assertThat;

/**
 * @author code4crafter@gmai.com <br>
 */
public class JsonPathSelectorTest {

    private String text = "{ \"store\": {\n" +
            "    \"book\": [ \n" +
            "      { \"category\": \"reference\",\n" +
            "        \"author\": \"Nigel Rees\",\n" +
            "        \"title\": \"Sayings of the Century\",\n" +
            "        \"price\": 8.95\n" +
            "      },\n" +
            "      { \"category\": \"fiction\",\n" +
            "        \"author\": \"Evelyn Waugh\",\n" +
            "        \"title\": \"Sword of Honour\",\n" +
            "        \"price\": 12.99,\n" +
            "        \"isbn\": \"0-553-21311-3\"\n" +
            "      }\n" +
            "    ],\n" +
            "    \"bicycle\": {\n" +
            "      \"color\": \"red\",\n" +
            "      \"price\": 19.95\n" +
            "    }\n" +
            "  }\n" +
            "}";

    @Test
    public void testJsonPath() {
        System.out.println("JSON to parse: " + text);

        JsonPathSelector jsonPathSelector = new JsonPathSelector("$.store.book[*].author");
        String select = jsonPathSelector.select(text);
        List<String> list = jsonPathSelector.selectList(text);

        assertThat(select).isEqualTo("Nigel Rees");
        assertThat(list).contains("Nigel Rees","Evelyn Waugh");

        jsonPathSelector = new JsonPathSelector("$.store.book[?(@.category == 'reference')]");
        list = jsonPathSelector.selectList(text);
        select = jsonPathSelector.select(text);

        System.out.println("Result of select:\t" + select);
        System.out.println("Result of selectList:\t" + list);

        assertThat(select).isEqualTo("{\"author\":\"Nigel Rees\",\"price\":8.95,\"category\":\"reference\",\"title\":\"Sayings of the Century\"}");
        assertThat(list).contains("{\"author\":\"Nigel Rees\",\"price\":8.95,\"category\":\"reference\",\"title\":\"Sayings of the Century\"}");
    }
}

 

My take

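I don't think this design is ideal. Within a single page the JSON string stays the same while the extraction rules differ. If every rule needs its own new JsonPathSelector, that is a lot of objects to create. It also means extraction rules are written in a different style from the XPath approach below: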

String brand_price = html.xpath("//span[@id=\"item-sellprice\"]/text()").toString();
String brand_img = html.xpath("//img[@id=\"brand-img\"]/@src").toString();
String brand_describe = html.xpath("//p[@id=\"brand-describe\"]/text()").toString();
String location_text = html.xpath("//span[@id=\"location-text\"]/text()").toString();
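For comparison, WebMagic also ships a us.codecraft.webmagic.selector.Json wrapper whose jsonPath(...) method reads much like the xpath(...) calls above. The sketch below assumes a WebMagic version that includes this class; in a PageProcessor, page.getJson() would presumably supply the wrapper instead of new Json(...). The class name JsonChainExample, the helper methods, and the trimmed sample JSON are made up for illustration.

```java
import us.codecraft.webmagic.selector.Json;

import java.util.List;

public class JsonChainExample {

    // Trimmed-down, hypothetical version of the store JSON used in the demo.
    static final String SAMPLE_JSON =
            "{\"store\":{\"book\":["
            + "{\"author\":\"Nigel Rees\"},{\"author\":\"Evelyn Waugh\"}],"
            + "\"bicycle\":{\"color\":\"red\"}}}";

    // Single value: wrap the JSON text, then apply a rule, xpath-style.
    static String bicycleColor(String text) {
        return new Json(text).jsonPath("$.store.bicycle.color").toString();
    }

    // Multiple values: all() returns every match as a string.
    static List<String> authors(String text) {
        return new Json(text).jsonPath("$.store.book[*].author").all();
    }

    public static void main(String[] args) {
        System.out.println(bicycleColor(SAMPLE_JSON));
        System.out.println(authors(SAMPLE_JSON));
    }
}
```

This only changes the calling style so that the JSON is wrapped once and each rule is passed per call; it makes no claim about how many objects are created internally.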

I doubt I'm the only one who has run into this, so I'm writing it down. Heh.
