Scraping 51job with Java, Saving to MySQL, and Analyzing the Data

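This was my final project for a practical-training course in the second semester of my sophomore year. I decided to scrape job listings. I originally planned to use Python, but on second thought I gave Java a try.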

I had only been teaching myself Java for about a month and wanted to practice thinking in an object-oriented way.

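After looking around online: sites like Lagou generate their pages dynamically, while 51job serves static pages, which are much easier to scrape, so I settled on 51job.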

 

Prerequisites:

Create a Maven project to make dependency management easy.

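Use httpclient 3.1 and jsoup 1.8.3 for fetching pages and extracting information; these two versions are widely used. (The code below ends up fetching and parsing pages with jsoup alone.)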

mysql-connector-java 8.0.13 for importing the data into the database; it supports MySQL 8.0+.

For the analysis: tablesaw (optional; any library you are comfortable with will do).

 

Take the "big data + Shanghai" search URL below as an example; any similar search URL works.

https://search.51job.com/list/020000,000000,0000,00,9,99,%25E5%25A4%25A7%25E6%2595%25B0%25E6%258D%25AE,2,1.html?lang=c&stype=&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=
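The keyword in this URL is URL-encoded twice: 大数据 ("big data") becomes %E5%A4%A7%E6%95%B0%E6%8D%AE after one pass, and the second pass turns every % into %25. A minimal sketch of building such a list URL for another keyword, assuming (inferred from the two URLs in this post, not from any documentation) that 020000 is the Shanghai area code and that the number before .html is the page index. buildSearchUrl is a hypothetical helper, and the query string is left out for brevity.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SearchUrlBuilder {
    // Path prefix copied from the URLs in this post (020000 is assumed to mean Shanghai)
    static final String PREFIX = "https://search.51job.com/list/020000,000000,0000,00,9,99,";

    // URL-encodes the keyword twice, as in the example URL
    static String encodeTwice(String keyword) {
        try {
            String once = URLEncoder.encode(keyword, "UTF-8"); // each UTF-8 byte becomes %XX
            return URLEncoder.encode(once, "UTF-8");           // every "%" becomes "%25"
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Hypothetical helper; assumes the number before ".html" is the page index (1 = first page)
    static String buildSearchUrl(String keyword, int page) {
        return PREFIX + encodeTwice(keyword) + ",2," + page + ".html";
    }

    public static void main(String[] args) {
        // The escapes spell the three characters of the "big data" keyword
        System.out.println(buildSearchUrl("\u5927\u6570\u636e", 1));
        System.out.println(buildSearchUrl("java", 1));
    }
}
```

ASCII keywords such as java pass through both rounds unchanged, which matches the "Java + Shanghai" URL used later. Whether the site returns identical results without the original query string has not been checked here.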

 

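I first sketched the rough set of features and revised the design several times. In the end, the clearest approach was to use a JobBean container (a List<JobBean>) as the common medium that every feature reads from or writes to.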

 

First, implement scraping the pages and saving the results locally.

Create the JobBean class:

public class JobBean {
    private String jobName;
    private String company;
    private String address;
    private String salary;
    private String date;
    private String jobURL;
    
    public JobBean(String jobName, String company, String address, String salary, String date, String jobURL) {
        this.jobName = jobName;
        this.company = company;
        this.address = address;
        this.salary = salary;
        this.date = date;
        this.jobURL = jobURL;
    }
    
    
    
    @Override
    public String toString() {
        return "jobName=" + jobName + ", company=" + company + ", address=" + address + ", salary=" + salary
                + ", date=" + date + ", jobURL=" + jobURL;
    }



    public String getJobName() {
        return jobName;
    }
    public void setJobName(String jobName) {
        this.jobName = jobName;
    }
    public String getCompany() {
        return company;
    }
    public void setCompany(String company) {
        this.company = company;
    }
    public String getAddress() {
        return address;
    }
    public void setAddress(String address) {
        this.address = address;
    }
    public String getSalary() {
        return salary;
    }
    public void setSalary(String salary) {
        this.salary = salary;
    }
    public String getDate() {
        return date;
    }
    public void setDate(String date) {
        this.date = date;
    }
    public String getJobURL() {
        return jobURL;
    }
    public void setJobURL(String jobURL) {
        this.jobURL = jobURL;
    }
}

Next, write a utility class for saving the container, so that it can be saved at any stage:

import java.io.*;
import java.util.*;

/** Utilities for:
 * 1. Saving a JobBean container to a local file
 * 2. Loading the local file back into a JobBean container (with filtering)
 * @author PowerZZJ
 *
 */
public class JobBeanUtils {
    
    /** Appends one JobBean to the local file
     * @param job
     */
    public static void saveJobBean(JobBean job) {
        try(BufferedWriter bw =
                new BufferedWriter(
                        new FileWriter("JobInfo.txt",true))){
            String jobInfo = job.toString();
            bw.write(jobInfo);
            bw.newLine();
            bw.flush();
        }catch(Exception e) {
            System.out.println("Failed to save JobBean");
            e.printStackTrace();
        }
    }
    
    /** Saves a JobBean container to the local file (appends, so delete JobInfo.txt before a fresh run)
     * @param jobBeanList JobBean container
     */
    public static void saveJobBeanList(List<JobBean> jobBeanList) {
        System.out.println("Backing up the container to the local file");
        for(JobBean jobBean : jobBeanList) {
            saveJobBean(jobBean);
        }
        System.out.println("Backup finished, "+jobBeanList.size()+" records in total");
    }
    
    /** Loads the local file into a JobBean container (with filtering)
     * @return JobBean container
     */
    public static List<JobBean> loadJobBeanList(){
        List<JobBean> jobBeanList = new ArrayList<>();
        try(BufferedReader br = 
                new BufferedReader(
                        new FileReader("JobInfo.txt"))){
            String str = null;
            while((str=br.readLine())!=null) {
                // Some company names contain "," and break the layout; such lines normally
                // throw or fail the checks below, and are skipped
                try {
                    String[] datas = str.split(","); 
                    // Offsets are the lengths of the "key=" prefixes written by toString(), including
                    // the space after each comma: "jobName=" is 8 chars, " company=" is 9, and so on
                    String jobName = datas[0].substring(8);
                    String company = datas[1].substring(9);
                    String address = datas[2].substring(9);
                    String salary = datas[3].substring(8);
                    String date = datas[4].substring(6);
                    String jobURL = datas[5].substring(8);
                    // Keep the line only if no field is empty, the salary is a range, and the URL starts with http
                    if (jobName.equals("") || company.equals("") || address.equals("") || salary.equals("")
                            || !(salary.contains("-"))|| date.equals("") || !(jobURL.startsWith("http")))
                        continue;
                    JobBean jobBean = new JobBean(jobName, company, address, salary, date, jobURL);
                    // Add it to the container
                    jobBeanList.add(jobBean);
                }catch(Exception e) {
                    System.out.println("Local load filter: skipping malformed line: "+str);
                    continue;
                }
            }
            System.out.println("Load finished, "+jobBeanList.size()+" records read in total");
            return jobBeanList;
        }catch(Exception e) {
            System.out.println("Failed to load JobBean");
            e.printStackTrace();
        }
        return jobBeanList;
    }
}
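loadJobBeanList relies on fixed offsets (8 for "jobName=", 9 for " company=", and so on). As an alternative sketch, not the post's code, the parser below splits on ", ", the exact separator toString() writes, and checks each key instead of counting characters. A company name with a bare "," then survives, and a line whose values contain ", " is rejected instead of shifting every later field. It returns a plain Map so it runs on its own; parseLine and the sample lines are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class JobLineParser {
    // Keys in the order JobBean.toString() writes them
    static final String[] KEYS = {"jobName", "company", "address", "salary", "date", "jobURL"};

    /**
     * Parses one line of JobInfo.txt into key/value pairs.
     * Returns null when the line does not split into exactly the six expected fields.
     */
    static Map<String, String> parseLine(String line) {
        String[] parts = line.split(", ");
        if (parts.length != KEYS.length) {
            return null; // some value itself contained ", "; skip the line
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < KEYS.length; i++) {
            String prefix = KEYS[i] + "=";
            if (!parts[i].startsWith(prefix)) {
                return null; // fields are out of order or misaligned
            }
            fields.put(KEYS[i], parts[i].substring(prefix.length()));
        }
        return fields;
    }

    public static void main(String[] args) {
        String ok = "jobName=Data Engineer, company=ACME,Inc, address=Shanghai, "
                + "salary=1-1.5, date=06-15, jobURL=https://example.com/1.html";
        System.out.println(parseLine(ok)); // company keeps its bare comma
        String bad = "jobName=Data Engineer, company=ACME, Inc, address=Shanghai, "
                + "salary=1-1.5, date=06-15, jobURL=https://example.com/1.html";
        System.out.println(parseLine(bad)); // null: 7 parts after splitting
    }
}
```

Plugging this into loadJobBeanList would mean building the JobBean from the map's six values whenever parseLine returns non-null, then applying the same emptiness, salary-range and URL checks.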

Next comes the key part: the scraping itself.

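Each job listing sits in an element with class el, which contains the information we need. The first .el is the header row, so it has to be removed.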

Each .el contains spans with classes t1, t2, t3, t4 and t5; just take them out one by one, in order.

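Then inspect the "next page" (下一页) element: it sits under an element with class bk. Be careful here: there are two .bk elements. The first is the previous-page link and only the second is the next-page link. Before I noticed this, my crawler got stuck in an infinite loop...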

Finally, a spider() method ties together extracting the information and moving on to the next page.

import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/** Scrapes job information from 51job search result pages
 * @author PowerZZJ
 *
 */
public class Spider {
    // Which page the crawl has reached
    private static int pageCount = 1;
    
    private String strURL;
    private String nextPageURL;
    private Document document;// the whole page
    private List<JobBean> jobBeanList;
    
    public Spider(String strURL) {
        this.strURL = strURL;
        nextPageURL = strURL;// the traversal starts at the given URL
        jobBeanList = new ArrayList<JobBean>();
        
    }
    
    /** Fetches and parses a whole page
     * @param strURL page URL
     * @return the parsed page, or null on failure
     */
    public Document getDom(String strURL) {
        try {
            URL url = new URL(strURL);
            // Fetch and parse, with a 4000 ms timeout
            document = Jsoup.parse(url, 4000);
            return document;
        }catch(Exception e) {
            System.out.println("getDom failed");
            e.printStackTrace();
        }
        return null;
    }
    

    /** Turns the jobs on the current page into JobBean objects and adds them to the container
     * @param document the whole page
     */
    public void getPageInfo(Document document) {
        // The CSS selector "#resultList .el" returns the .el rows
        Elements elements = document.select("#resultList .el");
        if(elements.isEmpty()) {
            return; // no rows: avoid calling remove(0) on an empty list
        }
        // The first .el is the header row; drop it
        elements.remove(0);
        // Spans 0-4 (t1-t5) hold title, company, location, salary and date
        for(Element element: elements) {
            Elements elementsSpan = element.select("span");
            String jobURL = elementsSpan.select("a").attr("href");
            String jobName = elementsSpan.get(0).select("a").attr("title");
            String company = elementsSpan.get(1).select("a").attr("title");
            String address = elementsSpan.get(2).text();
            String salary = elementsSpan.get(3).text();
            String date = elementsSpan.get(4).text();
            // Build the JobBean
            JobBean jobBean = new JobBean(jobName, company, address, salary, date, jobURL);
            // Add it to the container
            jobBeanList.add(jobBean);
        }
    }
    
    /** Gets the URL of the next page
     * @param document the whole page
     * @return the next page's URL, or null if there is none
     */
    public String getNextPageURL(Document document) {
        try {
            Elements elements = document.select(".bk");
            // The second .bk is "next page"; the first is "previous page"
            Element element = elements.get(1);
            // attr() returns "" (never null) when the link is missing, e.g. on the last page
            String url = element.select("a").attr("href");
            if(!url.isEmpty()) {
                System.out.println("---------"+(pageCount++)+"--------");
                return url;
            }
        }catch(Exception e) {
            System.out.println("Failed to get the next page URL");
            e.printStackTrace();
        }
        return null;
    }
    
    
    /** Starts the crawl
     * 
     */
    public void spider() {
        // Stop when there is no next-page link (null or empty)
        while(nextPageURL != null && !nextPageURL.isEmpty()) {
            // Fetch the whole page
            document = getDom(nextPageURL);
            if(document == null) {
                break; // fetch failed: stop instead of hitting a NullPointerException
            }
            // Add this page's jobs to the container
            getPageInfo(document);
            // Find the next page's URL
            nextPageURL = getNextPageURL(document);
        }
    }
    
    // Returns the JobBean container
    public List<JobBean> getJobBeanList() {
        return jobBeanList;
    }
}
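The infinite loop mentioned above presumably came from following the previous-page .bk link, which leads back to a page already seen. A minimal sketch of the same traversal with an extra safety net: besides stopping on a null or empty link, it records visited URLs and stops as soon as one repeats. Pages are simulated with an in-memory map so it runs offline; crawl and the page names are hypothetical stand-ins for getDom and getNextPageURL.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PaginationSketch {
    /**
     * Follows next-page links starting at startUrl.
     * nextLink maps a page URL to its next-page href ("" or absent means last page),
     * standing in for getNextPageURL(getDom(url)). Returns the pages visited, in order.
     */
    static List<String> crawl(String startUrl, Map<String, String> nextLink) {
        List<String> visitedInOrder = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String url = startUrl;
        // seen.add returns false for a repeated URL, which ends the loop
        while (url != null && !url.isEmpty() && seen.add(url)) {
            visitedInOrder.add(url);  // getPageInfo(document) would run here
            url = nextLink.get(url);  // getNextPageURL(document) would run here
        }
        return visitedInOrder;
    }

    public static void main(String[] args) {
        Map<String, String> links = new HashMap<>();
        links.put("p1", "p2");
        links.put("p2", "p3");
        links.put("p3", "p2"); // faulty link pointing backwards, like taking the wrong .bk
        System.out.println(crawl("p1", links)); // [p1, p2, p3]
    }
}
```

A HashSet of strings costs little even for the 270-odd pages crawled later, and it turns a selector mistake into an early stop rather than a crawl that never ends.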

Then test the scraping and saving features:

import java.util.ArrayList;
import java.util.List;

public class Test1 {
    public static void main(String[] args) {
        List<JobBean> jobBeanList = new ArrayList<>();
        // big data + Shanghai
        String strURL = "https://search.51job.com/list/020000,000000,0000,00,9,99,%25E5%25A4%25A7%25E6%2595%25B0%25E6%258D%25AE,2,1.html?lang=c&stype=1&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=";        

        // Test Spider and saving
        Spider spider = new Spider(strURL);
        spider.spider();
        // Get the JobBean container after scraping
        jobBeanList = spider.getJobBeanList();
        
        // Save the JobBean list locally via the JobBean utility class
        JobBeanUtils.saveJobBeanList(jobBeanList);
    
        // Load the JobBean list back from the local file, with filtering
        jobBeanList = JobBeanUtils.loadJobBeanList();
        
    }
}

After that, JobInfo.txt exists locally.

Next, put the JobBean container into MySQL. My database is named 51job and the table is named jobInfo; all columns are strings (emmm, strings it is).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ConnectMySQL {
    // Database settings
    private static final String DBaddress = "jdbc:mysql://localhost/51job?serverTimezone=UTC";
    private static final String userName = "root";
    private static final String password = "your_password"; // replace with your own MySQL password
    
    private Connection conn;
    
    // Load the driver and connect to the database
    public ConnectMySQL() {
        LoadDriver();
        // Connect to the database
        try {
            conn = DriverManager.getConnection(DBaddress, userName, password);
        } catch (SQLException e) {
            System.out.println("Database connection failed");
            e.printStackTrace();
        }
    }
    
    // Load the driver (JDBC 4+ drivers register themselves, so this is optional but harmless)
    private void LoadDriver() {
        try {
            Class.forName("com.mysql.cj.jdbc.Driver");
            System.out.println("Driver loaded");
        } catch (Exception e) {
            System.out.println("Failed to load the driver");
        }
    }
    
    // Get the connection (null if connecting failed)
    public Connection getConn() {
        return conn;
    }
}
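Keeping a real password in source code is risky once the code is shared. A minimal sketch of one common alternative, reading the credentials from environment variables. The variable names MYSQL_USER and MYSQL_PASSWORD are hypothetical, and the settings map is passed in (System.getenv() in real use) so the class can be tested without touching the real environment.

```java
import java.util.HashMap;
import java.util.Map;

class DbSettings {
    // Same JDBC URL as ConnectMySQL
    static final String DB_ADDRESS = "jdbc:mysql://localhost/51job?serverTimezone=UTC";

    final String userName;
    final String password;

    // Hypothetical variable names; pass System.getenv() in real code, or a plain map in tests
    DbSettings(Map<String, String> env) {
        this.userName = env.getOrDefault("MYSQL_USER", "root");
        String pw = env.get("MYSQL_PASSWORD");
        if (pw == null || pw.isEmpty()) {
            throw new IllegalStateException("MYSQL_PASSWORD is not set");
        }
        this.password = pw;
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("MYSQL_PASSWORD", "example-only");
        DbSettings s = new DbSettings(env);
        System.out.println(s.userName + " @ " + DB_ADDRESS);
        // Real use: new DbSettings(System.getenv()), then
        // DriverManager.getConnection(DB_ADDRESS, s.userName, s.password)
    }
}
```

Failing fast when the password is missing gives a clearer message than the generic "connection failed" printed later.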

Next, write the utility class for the database operations.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;


public class DBUtils {
    
    /** Stores a JobBean container in the database (rows that fail to insert are skipped)
     * @param conn database connection
     * @param jobBeanList JobBean container
     */
    public static void insert(Connection conn, List<JobBean> jobBeanList) {
        System.out.println("Inserting data");
        // "?" placeholders let the driver escape the values, so a ' in a job or company name
        // no longer breaks the statement (and cannot inject SQL)
        String command = "insert into jobInfo values(?,?,?,?,?,?)";
        try (PreparedStatement ps = conn.prepareStatement(command)) {
            for(JobBean j: jobBeanList) {
                try {
                    ps.setString(1, j.getJobName());
                    ps.setString(2, j.getCompany());
                    ps.setString(3, j.getAddress());
                    ps.setString(4, j.getSalary());
                    ps.setString(5, j.getDate());
                    ps.setString(6, j.getJobURL());
                    ps.executeUpdate();
                } catch (Exception e) {
                    System.out.println("Skipped a row that could not be inserted: "+j.getJobName());
                }
            }
        } catch (Exception e) {
            System.out.println("Failed to prepare the insert statement");
            e.printStackTrace();
        }
        System.out.println("Insert finished");
    }
    
    /** Reads the JobBean container back from the database
     * @param conn database connection
     * @return JobBean container (empty if the query fails)
     */
    public static List<JobBean> select(Connection conn){
        List<JobBean> jobBeanList  = new ArrayList<JobBean>();

        String command = "select * from jobInfo";
        try (PreparedStatement ps = conn.prepareStatement(command);
             ResultSet rs = ps.executeQuery()) {
            while(rs.next()) {
                JobBean jobBean = new JobBean(rs.getString(1), 
                            rs.getString(2), 
                            rs.getString(3), 
                            rs.getString(4),
                            rs.getString(5),
                            rs.getString(6));

                jobBeanList.add(jobBean);
            }
        } catch (Exception e) {
            System.out.println("Database query failed");
            e.printStackTrace();
        }
        // Return the (possibly empty) list rather than null, so callers can loop over it safely
        return jobBeanList;
    }
}

 

Then test it:

import java.sql.Connection;
import java.util.ArrayList;
import java.util.List;

public class Test2 {
    public static void main(String[] args) {
        List<JobBean> jobBeanList = new ArrayList<>();
        jobBeanList = JobBeanUtils.loadJobBeanList();

        // Database test
        ConnectMySQL cm = new ConnectMySQL();
        Connection conn = cm.getConn();
        
        // Insert test
        DBUtils.insert(conn, jobBeanList);
        // Select test
        jobBeanList = DBUtils.select(conn);
        for(JobBean j: jobBeanList) {
            System.out.println(j);
        }
    }
}

 

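As the screenshot above shows, even though the search was "big data + Shanghai", unrelated positions such as operations engineer still appear; they will be filtered out later. For now, everything goes into the database.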

Now an end-to-end test of all the features: delete JobInfo.txt and recreate the database.

import java.sql.Connection;
import java.util.ArrayList;
import java.util.List;


public class TestMain {
    public static void main(String[] args) {
        List<JobBean> jobBeanList = new ArrayList<>();
        // big data + Shanghai
        String strURL = "https://search.51job.com/list/020000,000000,0000,00,9,99,%25E5%25A4%25A7%25E6%2595%25B0%25E6%258D%25AE,2,1.html?lang=c&stype=1&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=";        
//        // Java + Shanghai
//        String strURL = "https://search.51job.com/list/020000,000000,0000,00,9,99,java,2,1.html?lang=c&stype=&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=";
        
        // Test all features
        // The scraper
        Spider jobSpider = new Spider(strURL);
        jobSpider.spider();
        // The JobBean list after scraping
        jobBeanList = jobSpider.getJobBeanList();
        
        // Save the JobBean list locally via the JobBean utility class
        JobBeanUtils.saveJobBeanList(jobBeanList);
    
        // Load the JobBean list back from the local file, with filtering
        jobBeanList = JobBeanUtils.loadJobBeanList();
    
        // Connect to the database and get the connection
        ConnectMySQL cm = new ConnectMySQL();
        Connection conn = cm.getConn();
        
        // Store the JobBean container in the database via the DB utility class
        DBUtils.insert(conn, jobBeanList);
        
//        // Query the database via the DB utility class; returns a JobBean list
//        jobBeanList = DBUtils.select(conn);
//        
//        for(JobBean j: jobBeanList) {
//            System.out.println(j);
//        }
    }
}

Each of these features can be used on its own; you don't have to chain them all together like this.

Next: read from the database, apply some simple filtering, and then analyze.

First, a mind map of the plan:

The first step is filtering by keyword and by date.

 

import java.util.ArrayList;
import java.util.Calendar;
import java.util.List;

public class BaseFilter {
    private List<JobBean> jobBeanList;
    // Removing from a list inside a for-each loop throws ConcurrentModificationException
    // (its iterator is fail-fast), so collect the items to delete here and call removeAll afterwards
    private List<JobBean> removeList;
    
    public BaseFilter(List<JobBean> jobBeanList) {
        removeList =  new ArrayList<JobBean>();
        // Keep a reference to the caller's list: filtering modifies that same list,
        // so calling getJobBeanList() afterwards is optional
        this.jobBeanList = jobBeanList;
        printNum();
    }
    
    // Print how many JobBeans are in the container
    public void printNum() {
        System.out.println("Now "+jobBeanList.size()+" records in total");
    }
    

    /** Filters by job title
     * @param containJobName keep only jobs whose title contains this keyword
     */
    public void filterJobName(String containJobName) {
        for(JobBean j: jobBeanList) {
            if(!j.getJobName().contains(containJobName)) {
                removeList.add(j);
            }
        }
        jobBeanList.removeAll(removeList);
        removeList.clear();
        printNum();
    }
    
    /** Filters by date, keeping only jobs posted today (the date field looks like "MM-dd")
     */
    public void filterDate() {
        Calendar now=Calendar.getInstance();
        int nowMonth = now.get(Calendar.MONTH)+1; // Calendar.MONTH is 0-based
        int nowDay = now.get(Calendar.DATE);
        
        for(JobBean j: jobBeanList) {
            String[] date = j.getDate().split("-");
            int jobMonth = Integer.valueOf(date[0]);
            int jobDay = Integer.valueOf(date[1]);
            if(!(jobMonth==nowMonth && jobDay==nowDay)) {
                removeList.add(j);
            }
        }
        jobBeanList.removeAll(removeList);
        removeList.clear();
        printNum();
    }
    
    public List<JobBean> getJobBeanList(){
        return jobBeanList;
    }
    
}
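Since Java 8, Collection.removeIf replaces the removeList-plus-removeAll pattern with a single call and sidesteps the ConcurrentModificationException mentioned in the comment. A minimal sketch of both BaseFilter rules on plain strings: the keyword filter on titles, and the "posted today" filter using java.time.MonthDay to compare "MM-dd" dates. The method names and sample data are hypothetical, and "today" is a parameter so the example is deterministic.

```java
import java.time.MonthDay;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FilterSketch {
    // List dates look like "06-15" (month-day, no year)
    static final DateTimeFormatter MM_DD = DateTimeFormatter.ofPattern("MM-dd");

    // Keep only titles containing the keyword (same rule as filterJobName)
    static void keepTitlesContaining(List<String> titles, String keyword) {
        titles.removeIf(t -> !t.contains(keyword));
    }

    // Keep only dates equal to the given month and day (same rule as filterDate)
    static void keepPostedOn(List<String> dates, MonthDay today) {
        dates.removeIf(d -> !MonthDay.parse(d, MM_DD).equals(today));
    }

    public static void main(String[] args) {
        List<String> titles = new ArrayList<>(Arrays.asList("Data Analyst", "Ops Engineer", "Big Data Dev"));
        keepTitlesContaining(titles, "Data");
        System.out.println(titles); // [Data Analyst, Big Data Dev]

        List<String> dates = new ArrayList<>(Arrays.asList("06-15", "06-14", "06-15"));
        keepPostedOn(dates, MonthDay.of(6, 15)); // real use: MonthDay.now()
        System.out.println(dates); // [06-15, 06-15]
    }
}
```

Inside BaseFilter the same idea would read jobBeanList.removeIf(j -> !j.getJobName().contains(containJobName)); note that removeIf needs a mutable list, which the ArrayList returned by DBUtils.select is.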

Test the filtering:

import java.sql.Connection;
import java.util.ArrayList;
import java.util.List;


public class Test3 {
    public static void main(String[] args) {
        List<JobBean> jobBeanList = new ArrayList<>();
        // Read the JobBean container from the database
        ConnectMySQL cm = new ConnectMySQL();
        Connection conn = cm.getConn();
        jobBeanList = DBUtils.select(conn);
        
        BaseFilter bf = new BaseFilter(jobBeanList);
        // Filter by date
        bf.filterDate();
        // Filter by keyword (simplified Chinese, as in the scraped data: 数据 = data, 分析 = analysis)
        bf.filterJobName("数据");
        bf.filterJobName("分析");
        
        for(JobBean j: jobBeanList) {
            System.out.println(j);
        }
    }
}

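Up to this point the features are general-purpose. The analysis that follows depends on the occupation or on what you need, though it is mostly similar.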

Here I analyze the "big data + Shanghai" results. To keep more data, I only require the job title to contain "数据" (data), which leaves 247 records.

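I used tablesaw because someone recommended it, but whenever I ran into a problem I could find almost nothing about it on Baidu; there is only the official documentation, which I read over and over. It also cannot draw charts on its own and needs additional dependencies for that, so I just print tables... I no longer feel like digging into visualization (why didn't I use Python...).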

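The analysis doesn't need much object-oriented design, so it is basically written straight through in one main method. See the official documentation for details of the API; treat this as a look at the results.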

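Salaries are normalized to 万/月 (units of 10,000 yuan per month), using the lower bound of each salary range.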
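The unit handling in the analysis code can be isolated into a small function. A minimal sketch, assuming salary strings shaped like "1-1.5万/月", "6-8千/月" or "15-20万/年" (the three units the analysis code handles); unlike the post's code it returns both ends of the range, which also fits the min/max salary columns suggested at the end of the post. The Chinese units are written as Unicode escapes (\u4e07 is 万, \u5343 is 千, \u6708 is 月, \u5e74 is 年) so the file compiles under any source encoding; the class and method names are hypothetical.

```java
class SalaryNormalizer {
    // Unit suffixes as 51job writes them, stored as Unicode escapes to keep the file ASCII-only
    static final String WAN_PER_MONTH = "\u4e07/\u6708";  // wan/month: 10k yuan per month
    static final String QIAN_PER_MONTH = "\u5343/\u6708"; // qian/month: 1k yuan per month
    static final String WAN_PER_YEAR = "\u4e07/\u5e74";   // wan/year: 10k yuan per year

    /**
     * Converts a range such as "6-8" followed by QIAN_PER_MONTH into {low, high}
     * in 10k yuan per month, rounded to two decimals. Returns null for any other unit or shape.
     */
    static double[] toWanPerMonth(String salary) {
        double divisor;
        String unit;
        if (salary.endsWith(WAN_PER_MONTH)) {
            divisor = 1; unit = WAN_PER_MONTH;
        } else if (salary.endsWith(QIAN_PER_MONTH)) {
            divisor = 10; unit = QIAN_PER_MONTH;
        } else if (salary.endsWith(WAN_PER_YEAR)) {
            divisor = 12; unit = WAN_PER_YEAR;
        } else {
            return null;
        }
        String[] range = salary.substring(0, salary.length() - unit.length()).split("-");
        if (range.length != 2) {
            return null;
        }
        try {
            return new double[] {
                round2(Double.parseDouble(range[0]) / divisor),
                round2(Double.parseDouble(range[1]) / divisor)
            };
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Same rounding as the analysis code: two decimal places
    static double round2(double x) {
        return Math.round(x * 100) / 100.0;
    }

    public static void main(String[] args) {
        double[] r = toWanPerMonth("6-8" + QIAN_PER_MONTH);
        System.out.println(r[0] + " - " + r[1]); // 0.6 - 0.8
    }
}
```

Returning null instead of 0 for unknown units makes it harder to accidentally put a bogus salary into the table.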

import static tech.tablesaw.aggregate.AggregateFunctions.*;

import java.sql.Connection;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import tech.tablesaw.api.*;

public class Analyze {
    public static void main(String[] args) {
        List<JobBean> jobBeanList = new ArrayList<>();

        ConnectMySQL cm = new ConnectMySQL();
        Connection conn = cm.getConn();
        jobBeanList = DBUtils.select(conn);
        
        BaseFilter bf = new BaseFilter(jobBeanList);
        bf.filterDate();
        // The scraped data is simplified Chinese, so the keyword is too (数据 = data)
        bf.filterJobName("数据");
        int nums = jobBeanList.size();
        
        // Analysis
        // Build the column arrays (sorted by salary later)
        String[] jobNames = new String[nums];
        String[] companys = new String[nums];
        String[] addresss = new String[nums];
        double[] salarys = new double[nums];
        String[] jobURLs = new String[nums];
        int n = 0; // number of rows actually filled in
        for(int i=0; i<nums; i++) {
            JobBean j = jobBeanList.get(i);
            String jobName = j.getJobName();
            String company = j.getCompany();
            // Extract the district name, e.g. "上海-浦东新区" -> "浦东新区"
            String address;
            if(j.getAddress().contains("-")) {
                address = j.getAddress().split("-")[1];
            }else{
                address = j.getAddress();
            }
            
            // Normalize the lower bound of the salary range to 万/月 (10,000 yuan per month)
            String sSalary = j.getSalary();
            double dSalary;
            if(sSalary.contains("万/月")) {
                dSalary = Double.valueOf(sSalary.split("-")[0]);
            }else if(sSalary.contains("千/月")) {
                dSalary = Double.valueOf(sSalary.split("-")[0])/10;
                dSalary = (double) Math.round(dSalary * 100) / 100;
            }else if(sSalary.contains("万/年")) {
                dSalary = Double.valueOf(sSalary.split("-")[0])/12;
                dSalary = (double) Math.round(dSalary * 100) / 100;
            }else {
                // Any other unit is skipped
                System.out.println("Salary conversion failed: "+sSalary);
                continue;
            }
            String jobURL = j.getJobURL();
            
            // Write at index n (not i) so skipped rows leave no null gaps in the columns
            jobNames[n] = jobName;
            companys[n] = company;
            addresss[n] = address;
            salarys[n] = dSalary;
            jobURLs[n] = jobURL;
            n++;
        }
        // Trim the arrays to the rows that were actually filled in
        jobNames = Arrays.copyOf(jobNames, n);
        companys = Arrays.copyOf(companys, n);
        addresss = Arrays.copyOf(addresss, n);
        salarys = Arrays.copyOf(salarys, n);
        jobURLs = Arrays.copyOf(jobURLs, n);
        
        Table jobInfo = Table.create("Job Info")
                .addColumns(
                    StringColumn.create("jobName", jobNames),
                    StringColumn.create("company", companys),
                    StringColumn.create("address", addresss),
                    DoubleColumn.create("salary", salarys),
                    StringColumn.create("jobURL", jobURLs)
                        );
        
//        System.out.println("All of Shanghai");
//        System.out.println(salaryInfo(jobInfo));
        
        
        List<Table> addressJobInfo = new ArrayList<>();
        // Split by district (values are simplified Chinese, as in the scraped data)
        Table ShanghaiJobInfo = chooseByAddress(jobInfo, "上海");
        Table jingAnJobInfo = chooseByAddress(jobInfo, "静安区");
        Table puDongJobInfo = chooseByAddress(jobInfo, "浦东新区");
        Table changNingJobInfo = chooseByAddress(jobInfo, "长宁区");
        Table minHangJobInfo = chooseByAddress(jobInfo, "闵行区");
        Table xuHuiJobInfo = chooseByAddress(jobInfo, "徐汇区");
        // Too few jobs in these districts
//        Table songJiangJobInfo = chooseByAddress(jobInfo, "松江区");
//        Table yangPuJobInfo = chooseByAddress(jobInfo, "杨浦区");
//        Table hongKouJobInfo = chooseByAddress(jobInfo, "虹口区");
//        Table OtherInfo = chooseByAddress(jobInfo, "异地招聘");
//        Table puTuoJobInfo = chooseByAddress(jobInfo, "普陀区");
        
        addressJobInfo.add(jobInfo);
        // Jobs whose location is just "上海", with no district given
        addressJobInfo.add(ShanghaiJobInfo);
        addressJobInfo.add(jingAnJobInfo);
        addressJobInfo.add(puDongJobInfo);
        addressJobInfo.add(changNingJobInfo);
        addressJobInfo.add(minHangJobInfo);
        addressJobInfo.add(xuHuiJobInfo);
//        addressJobInfo.add(songJiangJobInfo);
//        addressJobInfo.add(yangPuJobInfo);
//        addressJobInfo.add(hongKouJobInfo);
//        addressJobInfo.add(puTuoJobInfo);
//        addressJobInfo.add(OtherInfo);

        for(Table t: addressJobInfo) {
            System.out.println(salaryInfo(t));
        }
        
        for(Table t: addressJobInfo) {
            System.out.println(sortBySalary(t).first(10));
        }
        
    }
    
    // Salary mean, standard deviation, median, max and min
    public static Table salaryInfo(Table t) {        
        return t.summarize("salary",mean,stdDev,median,max,min).apply();
    }
    
    // Sort by salary, descending
    public static Table sortBySalary(Table t) {
        return t.sortDescendingOn("salary");
    }
    
    // Select the rows of one district (column 2 is address)
    public static Table chooseByAddress(Table t, String address) {
        Table t2 = Table.create(address)
                .addColumns(
                    StringColumn.create("jobName"),
                    StringColumn.create("company"),
                    StringColumn.create("address"),
                    DoubleColumn.create("salary"),
                    StringColumn.create("jobURL"));
        for(Row r: t) {
            if(r.getString(2).equals(address)) {
                t2.addRow(r);
            }
        }
        return t2;
    }
}
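chooseByAddress plus salaryInfo produce per-district statistics with tablesaw. For comparison, a minimal plain-Java sketch using Collectors.groupingBy and DoubleSummaryStatistics, which gives count, min, max and mean (median and standard deviation are not included). The Job class and the sample locations and salaries are hypothetical; district() applies the same split("-")[1] rule as the analysis code, so 上海-浦东新区 would become 浦东新区.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class DistrictStats {
    // Hypothetical minimal row: raw location and salary already normalized to 10k yuan per month
    static final class Job {
        final String location;
        final double salary;
        Job(String location, double salary) {
            this.location = location;
            this.salary = salary;
        }
    }

    // Same rule as the analysis code: "City-District" -> "District", otherwise unchanged
    static String district(String location) {
        return location.contains("-") ? location.split("-")[1] : location;
    }

    // Salary statistics per district, in a TreeMap so districts print in sorted order
    static Map<String, DoubleSummaryStatistics> statsByDistrict(List<Job> jobs) {
        return jobs.stream().collect(Collectors.groupingBy(
                (Job j) -> district(j.location),
                TreeMap::new,
                Collectors.summarizingDouble((Job j) -> j.salary)));
    }

    public static void main(String[] args) {
        List<Job> jobs = Arrays.asList(
                new Job("Shanghai-Pudong", 1.5),
                new Job("Shanghai-Pudong", 2.5),
                new Job("Shanghai-Jingan", 1.0),
                new Job("Shanghai", 0.8));
        statsByDistrict(jobs).forEach((d, s) ->
                System.out.println(d + ": n=" + s.getCount() + ", mean=" + s.getAverage()
                        + ", max=" + s.getMax()));
    }
}
```

Grouping once over all rows avoids scanning the whole table again for every district, as chooseByAddress does, though for a few hundred rows either approach is fast.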

The first half of the output is the salary summary for each district.

 

The second half lists the 10 highest-paying jobs in each district. As you can see, tablesaw's table output is about as ugly as it gets...

The jobURL values can be opened directly in a browser.

 

Testing with a different URL

I'm looking for a Java development job.

Replace strURL in TestMain with the "Java + Shanghai" URL:

https://search.51job.com/list/020000,000000,0000,00,9,99,java,2,1.html?lang=c&stype=&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=

Delete JobInfo.txt and recreate the database.

Run it: it crawled more than 270 pages. The local JobInfo.txt:

The database:

 

Then, in Analyze, change the keyword in bf.filterJobName(...) from "数据" (data) to "Java", add a second filter on "开发" (development), and run it.

All the information comes out. As for the analysis, I'll just comment a little based on the tables for now...

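What I'd like to add later is to crawl each jobURL as well and compile statistics on the job requirements. I haven't done this yet; if I'm still interested over the summer vacation, I'll probably give it a go.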
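That requirements statistic is not implemented in this post. As a rough illustration only: once the description text of each job page has been fetched (for example with the same Jsoup approach as getDom), counting how many postings mention each skill keyword could look like the sketch below. The keyword list, the method name, and the sample descriptions are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class RequirementCounter {
    /**
     * For each keyword, counts how many descriptions mention it at least once
     * (case-insensitive substring match; each posting counts once per keyword).
     */
    static Map<String, Integer> countPostingsMentioning(List<String> descriptions, List<String> keywords) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps the keyword order for printing
        for (String k : keywords) {
            counts.put(k, 0);
        }
        for (String text : descriptions) {
            String lower = text.toLowerCase(Locale.ROOT);
            for (String k : keywords) {
                if (lower.contains(k.toLowerCase(Locale.ROOT))) {
                    counts.merge(k, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> descriptions = Arrays.asList(
                "Familiar with Java, Spring and MySQL",
                "Experience with Hadoop and Spark; Java is a plus",
                "Python or SQL skills required");
        List<String> keywords = Arrays.asList("Java", "MySQL", "Spark", "Python");
        System.out.println(countPostingsMentioning(descriptions, keywords));
        // {Java=2, MySQL=1, Spark=1, Python=1}
    }
}
```

Plain substring matching is crude: "Java" also matches "JavaScript", so a real version would need word boundaries or a curated keyword list.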

The database schema could also be improved: split salary into minimum and maximum columns stored as double, which would make later analysis easier.
