Pair Assignment 2: Hot-Word Statistics for Paper Abstracts and Advanced Requirements

Date: 2019-12-08


Assignment format

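Course: Software Engineering 1916|W (Fuzhou University)
Assignment: Pair Assignment 2: Hot-Word Statistics for Paper Abstracts and Advanced Requirements
Pair student IDs: 221600118, 221600120
Division of labor: we designed the approach together. 221600118 mainly wrote the code, and 221600120 mainly did the research and wrote the blog post.
Assignment goals:
1. Basic requirement: implement a console program that counts the frequency of words in a text file.
2. Advanced requirement: building on the basic requirement, implement a hot-word counter for top-conference papers.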
Github-221600120
Github-221600118
Commit history:

Assignment body

PSP table

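PSP2.1	Personal Software Process Stages	Estimated time (min)	Actual time (min)
Planning	Planning
• Estimate	• Estimate how much time this task will take
Development	Development
• Analysis	• Requirements analysis (including learning new technologies)	60	100
• Design Spec	• Write the design document	50	50
• Design Review	• Design review	40	45
• Coding Standard	• Coding standard (set appropriate conventions for the current development)
• Design	• Detailed design	60	60
• Coding	• Detailed coding	600	650
• Code Review	• Code review	50	60
• Test	• Testing (self-testing, fixing code, committing fixes)	50	200
Reporting	Reporting
• Test Report	• Test report	50	100
• Size Measurement	• Workload measurement	30	30
• Postmortem & Process Improvement Plan	• Postmortem and process improvement plan	20	20
	Total	1010	1345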

Approach

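After receiving the assignment, we first had to choose a language. We chose Java because 221600118 is more proficient in it and its class library is powerful. Next we considered how to meet the requirements. Because of our limited programming skills, in the end we completed only the basic requirement. The requirements split roughly into three functions, so we wrote three classes: one for character counting, one for line counting, and one for word counting. We then combined the three classes. For research, we mainly used Baidu and Google to look up the Java API. We also consulted books on Java programming in the library.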

Implementation

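Three classes implement character counting, line counting, and word counting respectively, and Main calls them. The character count is the number of ASCII characters read from the file.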
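The character rule can be checked without preparing a file by running the same loop on an in-memory string. The sketch below is hypothetical: the class CharCountDemo and its method countChars are not part of the original project. It mirrors the loop in CountChar, which counts only ASCII characters and does not count '\n'. As a result, a Windows "\r\n" line break contributes one character, through '\r'.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class CharCountDemo {
    // Same rule as CountChar.getNumber, applied to a String instead of a file
    static int countChars(String text) {
        int num = 0;
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            int cc;
            while ((cc = br.read()) != -1) {
                if (cc == '\n') continue; // '\n' is not counted; '\r' is
                if (cc <= 127) num++;     // only ASCII characters are counted
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return num;
    }

    public static void main(String[] args) {
        System.out.println(countChars("abc\r\nde\n"));    // 6
        System.out.println(countChars("\u4E2D\u6587ab")); // 2: the two Chinese characters are skipped
    }
}
```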

The line count is determined by the line breaks that are read, minus the lines that contain no valid characters.
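The rule "all lines minus the lines without valid characters" can be written out literally. This sketch uses hypothetical names (LineRuleDemo, isBlank, countLines). It treats a line as blank when it contains no visible ASCII character (codes 33 to 126), which is the same test that CountLine uses. The result is therefore identical to counting the non-blank lines directly.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class LineRuleDemo {
    // A line is "blank" if it has no visible ASCII character (33-126)
    static boolean isBlank(String line) {
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (ch >= 33 && ch < 127) return false;
        }
        return true;
    }

    // Line count = all lines returned by readLine() minus the blank ones
    static int countLines(String text) {
        int total = 0, blank = 0;
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                total++;
                if (isBlank(line)) blank++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total - blank;
    }

    public static void main(String[] args) {
        System.out.println(countLines("hello\n\n   \t\nworld\n")); // 2 (4 lines, 2 of them blank)
    }
}
```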

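Word counting was the hardest part. According to the requirements, a word must begin with 4 consecutive letters, and it ends at the next separator. So when processing, we first check the first four characters specially. If they are all letters, we keep appending letters or digits to the current word until a separator is reached. The word is then stored in a HashMap whose key-value pairs are word-frequency. If the word is already in the HashMap, its frequency is incremented by one. The same procedure is then applied to the rest of the string.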
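The same word rule can also be expressed with regular expressions. This is an alternative technique, not the character-by-character method described above. The lowercased text is split at every run of non-alphanumeric characters, and a token is kept only if it matches [a-z]{4}[a-z0-9]*. The class and method names are hypothetical. Treating every non-alphanumeric character, including non-ASCII ones, as a separator is an assumption about the requirement.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

public class WordRuleDemo {
    // Separators: any run of characters that are not ASCII letters or digits
    private static final Pattern SEPARATOR = Pattern.compile("[^a-z0-9]+");
    // A word: 4 letters, then any letters or digits
    private static final Pattern WORD = Pattern.compile("[a-z]{4}[a-z0-9]*");

    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> freq = new HashMap<>();
        for (String token : SEPARATOR.split(text.toLowerCase(Locale.ROOT))) {
            if (WORD.matcher(token).matches()) {
                freq.merge(token, 1, Integer::sum); // frequency + 1
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = countWords("Hello world, file123 123file abc HELLO!");
        System.out.println(freq.get("hello"));           // 2
        System.out.println(freq.get("file123"));         // 1
        System.out.println(freq.containsKey("123file")); // false
        System.out.println(freq.containsKey("file"));    // false
    }
}
```

Locale.ROOT keeps lowercasing independent of the machine's locale. For example, under a Turkish locale, 'I' would otherwise become a non-ASCII dotless i, which this sketch would then treat as a separator.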

Performance

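Because of our limited programming skills, we were unable to improve the program's performance further. The most expensive part of the program is the function that decides whether a sequence of characters is a word (the4).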

Key code

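import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class CountChar {

    public static int getNumber(String path) {// character count
        int num = 0;
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(path)))) {
            int cc;
            while ((cc = br.read()) != -1) {
                // '\n' is not counted, so a Windows "\r\n" line break counts once (via '\r')
                if (cc == '\n') continue;
                // only ASCII characters (0-127) are counted
                if (cc <= 127) num++;
            }
            return num;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return 0;
    }
}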

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class CountLine {// line count

    public static int getLine(String path) {
        int lines = 0;
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(path)))) {
            String line;
            while ((line = br.readLine()) != null) {
                // a line counts only if it contains a valid (visible ASCII, 33-126) character;
                // empty lines skip the loop and are never counted
                for (int i = 0; i < line.length(); i++) {
                    char ch = line.charAt(i);
                    if (ch >= 33 && ch < 127) {
                        lines++;
                        break; // one valid character is enough
                    }
                }
            }
            return lines;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return 0;
    }
}

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Locale;

public class CountWords {// word count
    // word -> frequency
    public static HashMap<String, Integer> hash = new HashMap<String, Integer>();

    public static HashMap<String, Integer> getWords(String path) {
        hash.clear(); // drop the results of any earlier call
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(path)))) {
            String line;
            while ((line = br.readLine()) != null) getWord(line);
            return hash;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return new HashMap<String, Integer>();
    }

    // Find the words in one line and add them to hash
    public static void getWord(String line) {
        if (line.length() < 4) return; // too short to contain a word

        char[] c = line.toLowerCase(Locale.ROOT).toCharArray();
        int i = 0;
        while (i < c.length) {
            // skip separators (every non-alphanumeric character)
            if (!isAZ(c[i]) && !isNUM(c[i])) {
                i++;
                continue;
            }
            // c[i] starts a run of letters/digits; find where the run ends
            int end = i;
            while (end < c.length && (isAZ(c[end]) || isNUM(c[end]))) end++;

            // the run is a word only if it starts with 4 letters
            if (the4(c, i, end)) {
                String word = new String(c, i, end - i);
                hash.merge(word, 1, Integer::sum); // frequency + 1
            }
            i = end; // continue with the rest of the line
        }
    }

    // Decide whether the run c[index..end) is a word: its first 4 characters must be letters
    public static boolean the4(char[] c, int index, int end) {
        if (end - index < 4) return false;
        for (int k = index; k < index + 4; k++) {
            if (!isAZ(c[k])) return false;
        }
        return true;
    }

    public static boolean isAZ(int ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
    }

    public static boolean isNUM(int ch) {
        return ch >= '0' && ch <= '9';
    }
}

Unit testing

The unit-test data include files made by mixing the provided samples and files generated randomly.
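Random input files like these can be produced with a small seeded generator. This is a sketch under stated assumptions: the class name, the character set (ASCII letters, digits, a few separators, tabs and '\n'), and the default output file name are all hypothetical. Fixing the seed means that any file which exposes a bug can be regenerated exactly.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

public class RandomTextGenerator {
    // Hypothetical alphabet: letters, digits, some separators, a tab and a line break
    private static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ,.!-\t\n";

    // Build `length` random characters; the same seed always gives the same text
    static String randomText(long seed, int length) {
        Random rnd = new Random(seed);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get(args.length > 0 ? args[0] : "random_input.txt");
        Files.write(out, randomText(42L, 10000).getBytes(StandardCharsets.UTF_8));
        System.out.println("wrote " + out.toAbsolutePath());
    }
}
```

Every generated character is ASCII. The expected CountChar result for such a file is therefore its length minus the number of '\n' characters, which gives an independent value to compare against.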

Unit test code

Test results for some of the files

Difficulties encountered

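At first we used readLine to read each whole line as a string. Because readLine strips the line terminator, this made the character count wrong. We then switched to read, which reads every character of the file directly, and processed those characters instead.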
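This pitfall is easy to reproduce. The small demo below reads the same text both ways; ReadLineDemo, charsViaReadLine and charsViaRead are illustrative names only. BufferedReader.readLine() returns each line without its terminator, whether that terminator is "\n" or "\r\n", so the terminators vanish from the count. BufferedReader.read(), by contrast, returns every character.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReadLineDemo {
    // Sum of line lengths from readLine(): line terminators are not included
    static int charsViaReadLine(String text) {
        int total = 0;
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) total += line.length();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // Characters returned one at a time by read(): nothing is dropped
    static int charsViaRead(String text) {
        int total = 0;
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            while (br.read() != -1) total++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        String text = "ab\r\ncd\n";
        System.out.println(charsViaReadLine(text)); // 4
        System.out.println(charsViaRead(text));     // 7
    }
}
```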

Teammate evaluations:

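221600118: manages time better than I do. When I slacked off, they pushed me to catch up, acting as a supervisor.

221600120: works conscientiously and responsibly. When we ran into something we did not understand, they looked up material online so that we could study it and figure it out together. They also write some kinds of code quite fluently.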
