Word Frequency Statistics
1. Project requirements and basic functionality
Project requirements
Basic functionality
2. PSP table
Status | Stage | Estimated time/min | Actual time/min |
Accept | [Planning] | 30 | 20 |
Accept | Time estimation | 30 | 20 |
Accept | [Development] | 1330 | 1910 |
Accept | Requirements analysis | 20 | 30 |
Accept | Design document | 30 | 30 |
Accept | Design review | 10 | 5 |
Accept | Coding standard | 10 | 5 |
Accept | Detailed design | 60 | 60 |
Accept | Coding | 600 | 1000 |
Accept | Code review | 300 | 300 |
Accept | Testing | 300 | 480 |
Accept | [Time recording] | 10 | 10 |
Accept | [Test report] | 30 | 60 |
Accept | [Workload calculation] | 10 | 10 |
Accept | [Summary and improvement] | 60 | 60 |
Accept | [Total] | 1470 | 2090 |
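3. Approach
Data structures:
Global variables
unsigned long characterNum;  // number of characters
unsigned long wordNum;       // number of words
unsigned long lineNum;       // number of lines
Words and their occurrence counts are stored in dynamically allocated arrays of structs
struct wordInfo
{
    char* wordStr;            // the word as stored
    char** nextWordPoint;     // strings of the words that have followed this word
    int* nextWordFrequency;   // occurrence count of each following word
    int presentNextWordNum;   // current capacity of the two arrays above
    int frequency;            // occurrence count of this word
    int strlength;            // capacity of wordStr
    int wordLength;           // excludes the trailing digits
};
struct alphaArray
{
    wordInfo* wordArray;          // hash table of words for one first letter
    int presentWordArrayLength;   // current capacity of wordArray
};
struct wordStatisticsResult
{
    char* wordStr;
    int wordFrequency;
};
struct phaseStatisticsResult
{
    char* firstStr;
    char* secondStr;
    int phaseFrequency;
};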
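File traversal:
Windows: implemented with the _findfirst and _findnext functions, based on a reference example
Linux: implemented with the readdir function, based on a reference example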
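The two platform-specific traversals could also be replaced by one portable routine. The following is only a sketch under the assumption that a C++17 compiler is available; it uses std::filesystem instead of the _findfirst/readdir approach this project actually takes, and listFilesRecursively is a hypothetical name, not part of the project.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: collects every regular file under `root`,
// descending into subdirectories. Returns an empty list when `root`
// cannot be opened instead of terminating the program.
std::vector<std::string> listFilesRecursively(const std::string& root)
{
    assert(!root.empty());
    std::vector<std::string> files;
    std::error_code ec;
    fs::recursive_directory_iterator it(root, ec), end;
    while (!ec && it != end)
    {
        std::error_code typeEc;              // errors on a single entry are ignored
        if (it->is_regular_file(typeEc))
            files.push_back(it->path().string());
        it.increment(ec);                    // non-throwing advance
    }
    return files;
}
```

Each returned path could then be passed to the per-file counting function, so only one traversal routine would need to be maintained for both platforms.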
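Implementation plan:
1> Main function:
Initialize the variables
Traverse every file in the given folder
Open each qualifying file read-only
Count words and phrases
Repeat until every file has been visited
Close the file
Output the statistics
2> Word counting:
Iterate over the characters and count them
Check whether each character is a newline and count lines
Set up a buffer that holds the consecutive characters of one word
Collect the word string
Compute the word's hash value (ELFHash as the hash function, quadratic probing for collision resolution)
Determine the storage position from the first letter and the hash value, then store the word information
Store the address of the current word in the previous word's struct so that phrase frequencies can be counted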
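The hashing and probing steps of the word-counting plan can be sketched in isolation. This is a simplified illustration, not the project's code: Slot, elfHash, sameWord and insertWord are hypothetical names, the table is a single std::vector rather than one table per first letter, the probe sequence is the textbook home + i*i form, and the trailing-digit rule is left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Slot {
    std::string word;    // an empty string marks a free slot
    int frequency = 0;
};

// ELF hash over the lowercased key, so "That" and "THAT" hash alike.
unsigned long elfHash(const std::string& key)
{
    unsigned long hash = 0, x = 0;
    for (unsigned char c : key)
    {
        hash = (hash << 4) + (unsigned long)std::tolower(c);
        if ((x = hash & 0xF0000000UL) != 0)
        {
            hash ^= (x >> 24);
            hash &= ~x;
        }
    }
    return hash & 0x7FFFFFFFUL;
}

// Case-insensitive comparison of two words.
bool sameWord(const std::string& a, const std::string& b)
{
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); i++)
        if (std::tolower((unsigned char)a[i]) != std::tolower((unsigned char)b[i]))
            return false;
    return true;
}

// Stores `word` or bumps its count. Returns the slot index, or -1 when
// no usable slot is found within table.size() probes.
int insertWord(std::vector<Slot>& table, const std::string& word)
{
    assert(!table.empty());
    const unsigned long capacity = table.size();
    const unsigned long home = elfHash(word) % capacity;
    for (unsigned long i = 0; i < capacity; i++)
    {
        const unsigned long pos = (home + i * i) % capacity;   // quadratic probe
        Slot& slot = table[pos];
        if (slot.word.empty()) { slot.word = word; slot.frequency = 1; return (int)pos; }
        if (sameWord(slot.word, word)) { slot.frequency++; return (int)pos; }
    }
    return -1;
}
```

The returned index plays the role of the word's position in the table, which is what the phrase-counting step needs from the word-counting step.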
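3> Phrase counting:
Store the word
Return the current word's position in the word table
If it is not the first word
Look up the string pointer from that position
Search the previous word's struct for that pointer
If it exists, increment the corresponding count
If it does not exist, store the pointer and initialize its count to 1
Record the position
A final pass over the table then yields the frequency of every phrase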
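The phrase-counting steps amount to keeping, for every word, a list of the words that have followed it. A minimal sketch of that idea follows; it assumes words are identified by their table index rather than by string pointer as in the project, and Successor, WordEntry, recordPhrase and countPhrases are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Successor {
    int wordIndex;   // table position of the following word
    int count;       // how often it followed
};

struct WordEntry {
    std::vector<Successor> next;   // words seen directly after this one
};

// Records that the word at `current` directly followed the word at `previous`.
void recordPhrase(std::vector<WordEntry>& words, int previous, int current)
{
    assert(previous >= 0 && (std::size_t)previous < words.size());
    for (Successor& s : words[previous].next)
        if (s.wordIndex == current) { s.count++; return; }   // known phrase
    words[previous].next.push_back({current, 1});             // new phrase
}

// Feeds a sequence of word positions; the first word has no predecessor.
void countPhrases(std::vector<WordEntry>& words, const std::vector<int>& sequence)
{
    int previous = -1;
    for (int current : sequence)
    {
        if (previous != -1) recordPhrase(words, previous, current);
        previous = current;
    }
}
```

For the sequence 0, 1, 0, 1, 2 this records the phrase (0, 1) twice and the phrases (1, 0) and (1, 2) once each; walking every entry's list afterwards gives all phrase frequencies.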
4. Code implementation
(1) Initializing the word table
void dictionaryInit(struct alphaArray* dictionary)
{
    int i, j, k;
    characterNum = 0;
    wordNum = 0;
    lineNum = 0;
    for (i = 0; i < alphabet; i++)
    {
        struct alphaArray* page = dictionary + i;   // table for one first letter
        page->wordArray = (wordInfo*)malloc(sizeof(wordInfo) * wordArrayLength);
        if (page->wordArray == NULL) exit(-1);
        page->presentWordArrayLength = wordArrayLength;
        for (j = 0; j < page->presentWordArrayLength; j++)
        {
            wordInfo* word = page->wordArray + j;
            word->wordStr = (char*)malloc(sizeof(char) * wordStrLength);
            if (word->wordStr == NULL) exit(-1);
            word->wordStr[0] = '\0';   // an empty string marks a free slot
            word->frequency = 0;
            word->strlength = wordStrLength;
            word->wordLength = 0;
            word->nextWordPoint = (char**)malloc(sizeof(char*) * nextWordNum);
            if (word->nextWordPoint == NULL) exit(-1);
            word->nextWordFrequency = (int*)malloc(sizeof(int) * nextWordNum);
            if (word->nextWordFrequency == NULL) exit(-1);
            for (k = 0; k < nextWordNum; k++)
            {
                word->nextWordPoint[k] = NULL;
                word->nextWordFrequency[k] = 0;
            }
            word->presentNextWordNum = nextWordNum;
        }
    }
}
This allocates the initial memory and sets every value to zero.
(2) Traversing the folder
a) Windows
// _findfirst, _findnext and _findclose are declared in <io.h>.
void traverseFileandCount(char* filePath, struct alphaArray* dictionary)
{
    _finddata_t FileInfo;
    char* presentPath;
    char* newPath;
    presentPath = (char*)malloc(sizeof(char) * filePathLength);
    if (presentPath == NULL) exit(-1);
    newPath = (char*)malloc(sizeof(char) * filePathLength);
    if (newPath == NULL) exit(-1);
    // Build the search pattern "<filePath>\*".
    strcpy_s(presentPath, filePathLength, filePath);
    strcat_s(presentPath, filePathLength, "\\*");
    // _findfirst returns intptr_t; a long would truncate the handle on 64-bit builds.
    intptr_t Handle = _findfirst(presentPath, &FileInfo);
    if (Handle == -1) exit(-1);
    do {
        if (FileInfo.attrib & _A_SUBDIR)
        {
            // Recurse into every subdirectory except "." and "..".
            if ((strcmp(FileInfo.name, ".") != 0) && (strcmp(FileInfo.name, "..") != 0))
            {
                generatePath(FileInfo, filePath, newPath);
                traverseFileandCount(newPath, dictionary);
            }
        }
        else
        {
            // A regular file: build its full path and count its contents.
            generatePath(FileInfo, filePath, presentPath);
            count(presentPath, dictionary);
        }
    } while (_findnext(Handle, &FileInfo) == 0);
    _findclose(Handle);
    free(presentPath);
    free(newPath);
}
b) Linux
// opendir, readdir and closedir are declared in <dirent.h>.
void traverseFileandCount(char* path, struct alphaArray* dictionary)
{
    DIR* pDir;                    // directory stream
    struct dirent* ent = NULL;    // current directory entry
    char childpath[512];          // full path of the current entry
    pDir = opendir(path);
    if (pDir == NULL) exit(-1);   // invalid path: exit, as the Windows version does
    memset(childpath, 0, sizeof(childpath));
    while ((ent = readdir(pDir)) != NULL)
    {
        // d_type is an enumeration, not a bit mask, so compare with ==.
        if (ent->d_type == DT_DIR)
        {
            // Skip the current-directory and parent-directory entries.
            if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
                continue;
            snprintf(childpath, sizeof(childpath), "%s/%s", path, ent->d_name);
            // Recurse into the subdirectory.
            traverseFileandCount(childpath, dictionary);
        }
        else
        {
            // A file: childpath is the path handed to the counting function.
            snprintf(childpath, sizeof(childpath), "%s/%s", path, ent->d_name);
            count(childpath, dictionary);
        }
    }
    closedir(pDir);   // release the directory stream
}
(3) Counting
void count(char* path, struct alphaArray* dictionary)
{
    FILE* fp;
    bool firstWordSign = true;     // true until the first word of the file is stored
    int i = 0;                     // write position in tempWordStr
    int finalAlphaPosition = 0;    // index of the last letter in the current word
    int presentWordOffset;
    int ch;                        // int rather than char, so EOF is distinguishable from data
    char* tempWordStr;
    unsigned long hash;
    struct wordInfo* lastWordInfo = NULL, *presentWordInfo = NULL;
    tempWordStr = (char*)malloc(sizeof(char) * wordStrLength);
    if (tempWordStr == NULL) exit(-1);
    if (fopen_s(&fp, path, "r") != 0) exit(-1);
    do
    {
        ch = fgetc(fp);
        characterNumandLineNum(ch);
        if (!isDigitorAlpha(ch))
        {
            // A separator ends the current candidate word.
            tempWordStr[i] = '\0';
            if (isWord(tempWordStr))
            {
                hash = storeTempWord(dictionary, tempWordStr, finalAlphaPosition);
                getOffset(presentWordOffset, tempWordStr[0]);
                presentWordInfo = ((dictionary + presentWordOffset)->wordArray + hash);
                if (!firstWordSign)
                    storePhaseInfo(lastWordInfo, presentWordInfo);
                lastWordInfo = presentWordInfo;
                firstWordSign = false;
            }
            i = 0;
            finalAlphaPosition = 0;
            tempWordStr[0] = '\0';
        }
        else if (i < wordStrLength - 1)   // leave room for the terminating '\0'
        {
            if (isAlpha(ch)) finalAlphaPosition = i;
            tempWordStr[i++] = (char)ch;
        }
    } while (ch != EOF);
    free(tempWordStr);
    lineNum++;   // count the last line of the file
    fclose(fp);
}
(4) Storing word information
unsigned long storeTempWord(struct alphaArray* dictionary, char* tempWordArray, int lastAlphaPosition)
{
    unsigned long hash = 0;
    int i = 0, offset;
    struct alphaArray* page;
    struct wordInfo* slot;
    hash = ELFHash(tempWordArray, lastAlphaPosition);
    getOffset(offset, tempWordArray[0]);
    page = dictionary + offset;   // table selected by the first letter
    hash = hash % (page->presentWordArrayLength);
    slot = page->wordArray + hash;
    // Probe until a free slot or the slot already holding this word is found.
    while (!isEmpty(slot->wordStr) && isDifferent(slot, tempWordArray, lastAlphaPosition))
    {
        i++;
        if (i > (page->presentWordArrayLength))
        {
            // Growing changes the modulus, so existing words no longer sit at hash % length.
            enlargeWordArrayLength(page);
            i = 0;
        }
        hash += i * i;
        hash = hash % (page->presentWordArrayLength);
        slot = page->wordArray + hash;
    }
    // Truncate words that do not fit into the slot's buffer.
    if ((int)strlen(tempWordArray) >= slot->strlength)
        *(tempWordArray + slot->strlength - 1) = '\0';
    if (isEmpty(slot->wordStr))
    {
        // The size argument of strcpy_s is the capacity of the destination buffer.
        strcpy_s(slot->wordStr, slot->strlength, tempWordArray);
        slot->wordLength = lastAlphaPosition;
    }
    else
    {
        // Keep the form of the word that comes first in dictionary order.
        if (strcmp(slot->wordStr, tempWordArray) > 0)
            strcpy_s(slot->wordStr, slot->strlength, tempWordArray);
    }
    slot->frequency++;
    wordNum++;
    return hash;
}
(5) Storing phrase information
void storePhaseInfo(struct wordInfo* lastWordInfo, struct wordInfo* presentWordInfo)
{
    int i, k;
    int capacity = lastWordInfo->presentNextWordNum;
    // Used entries are contiguous; the first zero count marks the end of the list.
    for (i = 0; i < capacity && lastWordInfo->nextWordFrequency[i] != 0; i++)
    {
        if (lastWordInfo->nextWordPoint[i] == presentWordInfo->wordStr)
        {
            lastWordInfo->nextWordFrequency[i]++;   // phrase seen before
            return;
        }
    }
    if (i == capacity)
    {
        // The list is full: double both arrays and clear the new half.
        lastWordInfo->nextWordPoint = (char**)realloc(lastWordInfo->nextWordPoint, sizeof(char*) * capacity * 2);
        if (lastWordInfo->nextWordPoint == NULL) exit(-1);
        lastWordInfo->nextWordFrequency = (int*)realloc(lastWordInfo->nextWordFrequency, sizeof(int) * capacity * 2);
        if (lastWordInfo->nextWordFrequency == NULL) exit(-1);
        for (k = capacity; k < capacity * 2; k++)
        {
            lastWordInfo->nextWordPoint[k] = NULL;
            lastWordInfo->nextWordFrequency[k] = 0;
        }
        lastWordInfo->presentNextWordNum = capacity * 2;
    }
    // New phrase: store the pointer with a count of 1.
    lastWordInfo->nextWordPoint[i] = presentWordInfo->wordStr;
    lastWordInfo->nextWordFrequency[i] = 1;
}
(6) ELF hash function
unsigned long ELFHash(char* tempWordArray, int lastAlphaPosition)
{
    unsigned long hash = 0, x = 0;
    int i;
    char c;
    // Only the characters up to the last letter are hashed, so trailing digits are ignored.
    for (i = 0; i <= lastAlphaPosition; i++)
    {
        c = tempWordArray[i];
        // Fold upper case to lower case so that the hash is case-insensitive.
        if (c >= 'A' && c <= 'Z') c = c - 'A' + 'a';
        hash = (hash << 4) + c;
        if ((x = hash & 0xf0000000) != 0)
        {
            hash ^= (x >> 24);
            hash &= ~x;
        }
    }
    hash &= 0x7fffffff;
    return hash;
}
(7) Top ten words and phrases by frequency
void topFrequencyWordStatistics(struct alphaArray* dictionary, struct wordStatisticsResult* topFrequencyWord)
{
    int i = 0, j = 0;
    int minWordFrequency = 0;   // lowest frequency currently in the top list
    for (i = 0; i < topFrequencyWordNum; i++)
    {
        (topFrequencyWord + i)->wordStr = NULL;
        (topFrequencyWord + i)->wordFrequency = 0;
    }
    for (i = 0; i < alphabet; i++)
    {
        for (j = 0; j < (dictionary + i)->presentWordArrayLength; j++)
        {
            if (((dictionary + i)->wordArray + j)->frequency > minWordFrequency)
                updateTopFrequencyWord(topFrequencyWord, ((dictionary + i)->wordArray + j), minWordFrequency);
        }
    }
    sortTopFrequencyWord(topFrequencyWord);
    puts("Top 10 word:");
    for (i = 0; i < topFrequencyWordNum; i++)
        printf("%s\t%d\n", (topFrequencyWord + i)->wordStr, (topFrequencyWord + i)->wordFrequency);
    printf("\n");
}

void updateTopFrequencyWord(struct wordStatisticsResult* topFrequencyWord, struct wordInfo* dictionary_i_j, int &minWordFrequency)
{
    int i = 0;
    for (i = 0; i < topFrequencyWordNum; i++)
    {
        if ((topFrequencyWord + i)->wordFrequency == minWordFrequency)
        {
            (topFrequencyWord + i)->wordStr = dictionary_i_j->wordStr;
            (topFrequencyWord + i)->wordFrequency = dictionary_i_j->frequency;
            minWordFrequency = dictionary_i_j->frequency;
            break;   // replace one entry only; without this, later entries with the same frequency are overwritten too
        }
    }
    // Recompute the lowest frequency in the list.
    for (i = 0; i < topFrequencyWordNum; i++)
    {
        if ((topFrequencyWord + i)->wordFrequency < minWordFrequency)
            minWordFrequency = (topFrequencyWord + i)->wordFrequency;
    }
}

// Selection sort into descending order: each pass moves the smallest remaining entry to the end.
void sortTopFrequencyWord(struct wordStatisticsResult* topFrequencyWord)
{
    int i, j, last, minPosition;
    struct wordStatisticsResult tempWord;
    for (i = 0; i < topFrequencyWordNum - 1; i++)
    {
        last = topFrequencyWordNum - i - 1;
        minPosition = 0;
        for (j = 1; j <= last; j++)
        {
            if (topFrequencyWord[j].wordFrequency < topFrequencyWord[minPosition].wordFrequency)
                minPosition = j;
        }
        tempWord = topFrequencyWord[minPosition];
        topFrequencyWord[minPosition] = topFrequencyWord[last];
        topFrequencyWord[last] = tempWord;
    }
}

void topFrequencyPhaseStatistics(struct alphaArray* dictionary, struct phaseStatisticsResult* topFrequencyPhase)
{
    int i = 0, j = 0, k = 0;
    int minPhaseFrequency = 0;   // lowest frequency currently in the top list
    for (i = 0; i < topFrequencyPhaseNum; i++)
    {
        (topFrequencyPhase + i)->firstStr = NULL;
        (topFrequencyPhase + i)->secondStr = NULL;
        (topFrequencyPhase + i)->phaseFrequency = 0;
    }
    for (i = 0; i < alphabet; i++)
    {
        for (j = 0; j < (dictionary + i)->presentWordArrayLength; j++)
        {
            for (k = 0; k < ((dictionary + i)->wordArray + j)->presentNextWordNum; k++)
            {
                if (*(((dictionary + i)->wordArray + j)->nextWordFrequency + k) > minPhaseFrequency)
                    updateTopFrequencyPhase(topFrequencyPhase, ((dictionary + i)->wordArray + j), k, minPhaseFrequency);
            }
        }
    }
    sortTopFrequencyPhase(topFrequencyPhase);
    puts("Top 10 phase:");
    for (i = 0; i < topFrequencyPhaseNum; i++)
        printf("%s %s\t%d\n", (topFrequencyPhase + i)->firstStr, (topFrequencyPhase + i)->secondStr, (topFrequencyPhase + i)->phaseFrequency);
    printf("\n");
}

void updateTopFrequencyPhase(struct phaseStatisticsResult* topFrequencyPhase, wordInfo* dictionary_i_j, int offset, int &minPhaseFrequency)
{
    int i = 0;
    for (i = 0; i < topFrequencyPhaseNum; i++)
    {
        if ((topFrequencyPhase + i)->phaseFrequency == minPhaseFrequency)
        {
            (topFrequencyPhase + i)->firstStr = dictionary_i_j->wordStr;
            (topFrequencyPhase + i)->secondStr = *(dictionary_i_j->nextWordPoint + offset);
            (topFrequencyPhase + i)->phaseFrequency = *(dictionary_i_j->nextWordFrequency + offset);
            minPhaseFrequency = (topFrequencyPhase + i)->phaseFrequency;
            break;   // replace one entry only, as in updateTopFrequencyWord
        }
    }
    // Recompute the lowest frequency in the list.
    for (i = 0; i < topFrequencyPhaseNum; i++)
    {
        if ((topFrequencyPhase + i)->phaseFrequency < minPhaseFrequency)
            minPhaseFrequency = (topFrequencyPhase + i)->phaseFrequency;
    }
}

// Selection sort into descending order, as in sortTopFrequencyWord.
void sortTopFrequencyPhase(struct phaseStatisticsResult* topFrequencyPhase)
{
    int i, j, last, minPosition;
    struct phaseStatisticsResult tempPhase;
    for (i = 0; i < topFrequencyPhaseNum - 1; i++)
    {
        last = topFrequencyPhaseNum - i - 1;
        minPosition = 0;
        for (j = 1; j <= last; j++)
        {
            if (topFrequencyPhase[j].phaseFrequency < topFrequencyPhase[minPosition].phaseFrequency)
                minPosition = j;
        }
        tempPhase = topFrequencyPhase[minPosition];
        topFrequencyPhase[minPosition] = topFrequencyPhase[last];
        topFrequencyPhase[last] = tempPhase;
    }
}
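The top-ten selection keeps a fixed list and sorts it at the end. As a point of comparison, here is a sketch of the same selection written with std::partial_sort. It is an alternative, not the project's method; WordCount and topK are hypothetical names, and breaking ties alphabetically is an assumption made here so that the output order is deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct WordCount {
    std::string word;
    int frequency;
};

// Returns the k most frequent entries, highest frequency first.
// Entries with equal frequency are ordered alphabetically (an assumption).
std::vector<WordCount> topK(std::vector<WordCount> all, std::size_t k)
{
    auto better = [](const WordCount& a, const WordCount& b) {
        if (a.frequency != b.frequency) return a.frequency > b.frequency;
        return a.word < b.word;
    };
    k = std::min(k, all.size());   // fewer than k words is not an error
    std::partial_sort(all.begin(), all.begin() + k, all.end(), better);
    all.erase(all.begin() + k, all.end());
    return all;
}
```

Because k is clamped to the number of available entries, an empty folder or a file with fewer than ten distinct words simply yields a shorter list.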
(8) Output
void outputResult(struct alphaArray* dictionary)
{
    puts("Statistics result:");
    printf("characterNum:%lu\n", characterNum);
    printf("wordNum:%lu\n", wordNum);
    printf("lineNum:%lu\n\n", lineNum);
}

void outputToFile(struct wordStatisticsResult* topFrequencyWord, struct phaseStatisticsResult* topFrequencyPhase)
{
    int i = 0;
    FILE* fp;
    // The output path is hard-coded; "wb" keeps the explicit \r\n line endings unchanged.
    if (fopen_s(&fp, "D:\\RGhw\\result.txt", "wb") != 0 || fp == NULL) exit(-1);
    fputs("characterNum:", fp);
    fprintf(fp, "%lu\r\n", characterNum);
    fputs("wordNum:", fp);
    fprintf(fp, "%lu\r\n", wordNum);
    fputs("lineNum:", fp);
    fprintf(fp, "%lu\r\n\r\n", lineNum);
    fputs("Top 10 frequency words:\r\n", fp);
    for (i = 0; i < topFrequencyWordNum; i++)
        fprintf(fp, "%s: %d\r\n", (topFrequencyWord + i)->wordStr, (topFrequencyWord + i)->wordFrequency);
    fputs("\r\n", fp);
    fputs("Top 10 frequency phases:\r\n", fp);
    for (i = 0; i < topFrequencyPhaseNum; i++)
        fprintf(fp, "%s %s: %d\r\n", (topFrequencyPhase + i)->firstStr, (topFrequencyPhase + i)->secondStr, (topFrequencyPhase + i)->phaseFrequency);
    fputs("\r\n", fp);
    fclose(fp);
}
(9) Releasing memory
void dictionaryDestroy(struct alphaArray* dictionary)
{
    int i, j;
    for (i = 0; i < alphabet; i++)
    {
        // Free the buffers owned by each slot, then the table itself.
        for (j = 0; j < ((dictionary + i)->presentWordArrayLength); j++)
        {
            free(((dictionary + i)->wordArray + j)->wordStr);
            free(((dictionary + i)->wordArray + j)->nextWordPoint);
            free(((dictionary + i)->wordArray + j)->nextWordFrequency);
        }
        free((dictionary + i)->wordArray);
    }
}
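5. Performance analysis
(1) CPU and GPU usage
(2) CPU usage by function
(3) CPU usage details of the main function
(4) CPU usage details of the traverse-and-count function (traverseFileandCount)
(5) CPU usage details of the counting function (count)
Analysis: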
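6. Test cases and analysis
(1) Test set provided by the teaching assistant
Run time: 32 seconds (release build). The results are shown below (my program's output is on the left and the teaching assistant's on the right, here and in the later cases; note that my line count and word count are printed in a different order from the teaching assistant's):
The first three values each differ by about 100, which may be due to differences in the counting method
The word and phrase statistics match the teaching assistant's
(2) Empty folder
(3) Empty file
(4) File containing a single word
(5) Variants of the same word are output in dictionary order
File contents:
Result:
(6) Phrases are output in dictionary order
File contents:
Result:
(7) Files of different types
Folder:
Result:
(8) Invalid path
My program exits immediately (exit(-1)) without printing an error message.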
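One way to make this failure visible is to report the reason before giving up. The sketch below shows the idea for opening a single file; openForReading is a hypothetical helper, not part of the program, and the same pattern would apply to the opendir and _findfirst calls.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical helper: opens `path` for reading. On failure it prints
// the system's reason to stderr and returns NULL, so the caller can
// decide whether to skip the file or exit.
FILE* openForReading(const char* path)
{
    assert(path != NULL);
    FILE* fp = std::fopen(path, "r");
    if (fp == NULL)
        std::fprintf(stderr, "cannot open '%s': %s\n", path, std::strerror(errno));
    return fp;
}
```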
(9) First version of the test set
(10) Image files
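7. Known problems
After the program first ran successfully, I ran it on the test set and found that the word THAT was output twice, which means the same word had been stored in two different positions. This puzzled me at first. The cause turned out to be the dynamic memory. To keep the program robust, the word table grows dynamically: when it can no longer hold a new word, the program requests twice the space. What I overlooked is that the storage position is the hash value modulo the table capacity, so when the capacity changes, the position computed for a word changes as well. As a result, the same word ended up stored in different places. The fix I came up with is to search in turn within one initial-size region, then two initial-size regions, and so on, which guarantees that every word has exactly one definite position. However, the deadline was close by the time I found the problem, so I simply enlarged the initial capacity to work around it.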
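A common remedy for this kind of duplicate is to re-insert every stored word whenever the table grows, so that each word always sits on the probe path computed from the current capacity. The following is a sketch of that alternative, not the fix used in the project: Slot, probe, findSlot, grow and insertWord are hypothetical names, and std::hash stands in for ELFHash to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct Slot {
    std::string word;    // an empty string marks a free slot
    int frequency = 0;
};

std::size_t probe(std::size_t home, std::size_t i, std::size_t capacity)
{
    return (home + i * i) % capacity;   // quadratic probing
}

// Returns the slot holding `word`, or the first free slot on its probe
// path, or table.size() when neither is found within capacity probes.
std::size_t findSlot(const std::vector<Slot>& table, const std::string& word)
{
    const std::size_t capacity = table.size();
    const std::size_t home = std::hash<std::string>()(word) % capacity;
    for (std::size_t i = 0; i < capacity; i++)
    {
        const std::size_t pos = probe(home, i, capacity);
        if (table[pos].word.empty() || table[pos].word == word) return pos;
    }
    return capacity;
}

void insertWord(std::vector<Slot>& table, const std::string& word, int count = 1);

// Doubles the table and re-inserts every word at its new position.
void grow(std::vector<Slot>& table)
{
    std::vector<Slot> old;
    old.swap(table);
    table.resize(old.size() * 2);
    for (const Slot& s : old)
        if (!s.word.empty()) insertWord(table, s.word, s.frequency);
}

void insertWord(std::vector<Slot>& table, const std::string& word, int count)
{
    assert(!table.empty());
    std::size_t pos = findSlot(table, word);
    while (pos == table.size())   // probe path exhausted: grow and retry
    {
        grow(table);
        pos = findSlot(table, word);
    }
    table[pos].word = word;
    table[pos].frequency += count;
}
```

Rehashing costs one pass over the table per growth, but lookups stay simple because only the current capacity ever matters.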
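8. Summary and reflection
Overall, because I made a rough plan at the start, the work went fairly smoothly. I got stuck twice: on the dynamic memory code and on using the virtual machine. The word table uses dynamic memory, so the code has to check whether there is enough space and reallocate when there is not. My thinking was not clear enough when I wrote this part, and it took a lot of time. Once the program ran successfully, I started on the portability changes. For testing I installed an Ubuntu virtual machine. After one successful test the virtual machine suddenly broke; I reinstalled it three times and it still failed (exhausting), so in the end I could not verify the function that writes the output file.
On coding standards, I improved a little compared with before. This time I paid particular attention to variable and function names to make the code more readable. I also split long functions into smaller ones wherever I could, although some functions still run to forty or fifty lines.
On time management, all I can say is that I did the software engineering homework first and the homework for my other courses afterwards.
Shortcomings: I am not proficient with virtual machines and could not resolve problems quickly; the performance analysis is not detailed enough; the code is verbose and hard to read.
I will keep practising and improving in future programming work.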