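The spam-SMS filtering app SMSFilters needs the Jieba word segmentation library to tokenize messages before processing them with TF-IDF. The segmentation library is written in C++, which means a C++ library has to be integrated into Swift.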
In the official guide "Using Swift with Cocoa and Objective-C", Apple only describes how to integrate Swift code with Objective-C code and does not mention C++. Later I found this passage in the official documentation:
You cannot import C++ code directly into Swift. Instead, create an Objective-C or C wrapper for C++ code.
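In other words, C++ code cannot be imported directly, but it can be wrapped in Objective-C or C. This project therefore wraps the library in Objective-C and calls the wrapper from Swift. What follows is a walkthrough of that process; the demo code is in SwiftJiebaDemo.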
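The quoted passage also allows a plain C wrapper. As a rough sketch of what that alternative could look like, the block below hides a C++ class behind an opaque handle and `extern "C"` functions. Everything in it (`ToySegmenter`, `toy_segmenter_create`, `toy_segmenter_cut`, `toy_collect_word`) is hypothetical and made up for illustration; the toy class only splits on whitespace and is not part of cppjieba.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a C++-only class such as a segmenter.
// It splits on whitespace only; this is not how Jieba works.
class ToySegmenter {
public:
    std::vector<std::string> Cut(const std::string& sentence) const {
        std::vector<std::string> words;
        std::istringstream in(sentence);
        for (std::string word; in >> word;) {
            words.push_back(word);
        }
        return words;
    }
};

extern "C" {

// Opaque handle: a C caller only ever holds a pointer to this struct.
struct toy_segmenter {
    ToySegmenter impl;
};

// Called once per word; `context` is passed through untouched.
typedef void (*toy_word_callback)(const char* word, void* context);

toy_segmenter* toy_segmenter_create(void) {
    return new toy_segmenter();
}

void toy_segmenter_destroy(toy_segmenter* handle) {
    delete handle;
}

// Cuts `sentence`, reports each word through `callback`, returns the word count.
std::size_t toy_segmenter_cut(const toy_segmenter* handle, const char* sentence,
                              toy_word_callback callback, void* context) {
    if (handle == nullptr || sentence == nullptr) {
        return 0;
    }
    const std::vector<std::string> words = handle->impl.Cut(sentence);
    for (const std::string& word : words) {
        if (callback != nullptr) {
            callback(word.c_str(), context);
        }
    }
    return words.size();
}

// Example callback: appends each word to the std::vector behind `context`.
void toy_collect_word(const char* word, void* context) {
    static_cast<std::vector<std::string>*>(context)->push_back(word);
}

}  // extern "C"
```

Only C types (`const char*`, `void*`, function pointers) cross the boundary, so a C or Objective-C caller never sees `std::string` or `std::vector`. This article takes the Objective-C route instead, which lets the wrapper hand back `NSString` and `NSMutableArray` values directly.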
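The process has three steps: preparing the C++ library, wrapping it in Objective-C, and calling the wrapper from Swift.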
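The demo uses yanyiwu/cppjieba, the C++ version of the "Jieba" Chinese word segmenter. Merge its include/cppjieba directory with the limonp dependency, add hmm_model and jieba.dict from dict as the base data, and expose the JiebaInit and JiebaCut interfaces: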

    //
    //  Segmentor.cpp
    //  iosjieba
    //
    //  Created by yanyiwu on 14/12/24.
    //  Copyright (c) 2014年 yanyiwu. All rights reserved.
    //

    #include "Segmentor.h"
    #include <cassert>
    #include <iostream>

    using namespace cppjieba;

    // Single shared segmenter, created lazily by JiebaInit.
    cppjieba::MixSegment * globalSegmentor;

    void JiebaInit(const string& dictPath, const string& hmmPath, const string& userDictPath)
    {
        // Only the first call constructs the segmenter.
        if(globalSegmentor == NULL) {
            globalSegmentor = new MixSegment(dictPath, hmmPath, userDictPath);
        }
        cout << __FILE__ << __LINE__ << endl;
    }

    void JiebaCut(const string& sentence, vector<string>& words)
    {
        // JiebaInit must have been called first.
        assert(globalSegmentor);
        globalSegmentor->Cut(sentence, words);
        cout << __FILE__ << __LINE__ << endl;
        cout << words << endl;
    }

and the header:

    //
    //  Segmentor.h
    //  iosjieba
    //
    //  Created by yanyiwu on 14/12/24.
    //  Copyright (c) 2014年 yanyiwu. All rights reserved.
    //

    #ifndef __iosjieba__Segmentor__
    #define __iosjieba__Segmentor__

    #include <stdio.h>
    #include "cppjieba/MixSegment.hpp"
    #include <string>
    #include <vector>

    extern cppjieba::MixSegment * globalSegmentor;

    void JiebaInit(const std::string& dictPath, const std::string& hmmPath, const std::string& userDictPath);

    void JiebaCut(const std::string& sentence, std::vector<std::string>& words);

    #endif /* defined(__iosjieba__Segmentor__) */

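JiebaInit and JiebaCut follow an initialize-once pattern around a global pointer. The sketch below reproduces that pattern with a stand-in class so it can be compiled without cppjieba. `WhitespaceSegment`, `SketchInit`, `SketchCut` and `globalSketchSegmentor` are hypothetical names, and the stand-in splits on whitespace only.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for cppjieba::MixSegment: splits on whitespace only.
class WhitespaceSegment {
public:
    explicit WhitespaceSegment(const std::string& dictPath) : dictPath_(dictPath) {}

    void Cut(const std::string& sentence, std::vector<std::string>& words) const {
        words.clear();
        std::istringstream in(sentence);
        for (std::string word; in >> word;) {
            words.push_back(word);
        }
    }

    const std::string& dictPath() const { return dictPath_; }

private:
    std::string dictPath_;
};

// Shared instance, mirroring the global pointer in Segmentor.cpp.
WhitespaceSegment* globalSketchSegmentor = nullptr;

// Creates the segmenter on the first call; later calls are no-ops,
// so the dictionary path from the first call wins.
void SketchInit(const std::string& dictPath) {
    if (globalSketchSegmentor == nullptr) {
        globalSketchSegmentor = new WhitespaceSegment(dictPath);
    }
}

// Requires SketchInit to have been called first.
void SketchCut(const std::string& sentence, std::vector<std::string>& words) {
    assert(globalSketchSegmentor != nullptr);
    globalSketchSegmentor->Cut(sentence, words);
}
```

Because the state lives in a global, any caller can cut text once initialization has happened somewhere, and a second initialization with different paths is silently ignored. The check-then-create step is not thread-safe as written, which may matter if initialization can be triggered from more than one thread.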
The directory layout is as follows:

    $ tree iosjieba
    iosjieba
    ├── Segmentor.cpp
    ├── Segmentor.h
    ├── cppjieba
    │   ├── DictTrie.hpp
    │   ├── FullSegment.hpp
    │   ├── HMMModel.hpp
    │   ├── HMMSegment.hpp
    │   ├── Jieba.hpp
    │   ├── KeywordExtractor.hpp
    │   ├── MPSegment.hpp
    │   ├── MixSegment.hpp
    │   ├── PosTagger.hpp
    │   ├── PreFilter.hpp
    │   ├── QuerySegment.hpp
    │   ├── SegmentBase.hpp
    │   ├── SegmentTagged.hpp
    │   ├── TextRankExtractor.hpp
    │   ├── Trie.hpp
    │   ├── Unicode.hpp
    │   └── limonp
    │       ├── ArgvContext.hpp
    │       ├── BlockingQueue.hpp
    │       ├── BoundedBlockingQueue.hpp
    │       ├── BoundedQueue.hpp
    │       ├── Closure.hpp
    │       ├── Colors.hpp
    │       ├── Condition.hpp
    │       ├── Config.hpp
    │       ├── FileLock.hpp
    │       ├── ForcePublic.hpp
    │       ├── LocalVector.hpp
    │       ├── Logging.hpp
    │       ├── Md5.hpp
    │       ├── MutexLock.hpp
    │       ├── NonCopyable.hpp
    │       ├── StdExtension.hpp
    │       ├── StringUtil.hpp
    │       ├── Thread.hpp
    │       └── ThreadPool.hpp
    └── iosjieba.bundle
        └── dict
            ├── hmm_model.utf8
            ├── jieba.dict.small.utf8
            └── user.dict.utf8

Next, integrate it into the project. First create an empty project, iOSJiebaDemo, and add iosjieba to it.
Screenshots: Single View App, SwiftJiebaDemo, Add SwiftJiebaDemo.
Add iosjieba:
See the code: https://github.com/qiwihui/Sw...
This step wraps the C++ interface in Objective-C and exposes it to Swift. The wrapper exposes only two methods, objcJiebaInit and objcJiebaCut.

    //
    //  iosjiebaWrapper.h
    //  SMSFilters
    //
    //  Created by Qiwihui on 1/14/19.
    //  Copyright © 2019 qiwihui. All rights reserved.
    //

    #import <Foundation/Foundation.h>

    @interface JiebaWrapper : NSObject

    - (void) objcJiebaInit: (NSString *) dictPath forPath: (NSString *) hmmPath forDictPath: (NSString *) userDictPath;
    - (void) objcJiebaCut: (NSString *) sentence toWords: (NSMutableArray *) words;

    @end


    //
    //  iosjiebaWrapper.mm
    //  iOSJiebaTest
    //
    //  Created by Qiwihui on 1/14/19.
    //  Copyright © 2019 Qiwihui. All rights reserved.
    //

    #import <Foundation/Foundation.h>
    #import "iosjiebaWrapper.h"
    #include "Segmentor.h"
    #include <algorithm>
    #include <string>
    #include <vector>

    @implementation JiebaWrapper

    - (void) objcJiebaInit: (NSString *) dictPath forPath: (NSString *) hmmPath forDictPath: (NSString *) userDictPath {
        // Convert the NSString paths to UTF-8 C strings for the C++ API.
        const char *cDictPath = [dictPath UTF8String];
        const char *cHmmPath = [hmmPath UTF8String];
        const char *cUserDictPath = [userDictPath UTF8String];

        JiebaInit(cDictPath, cHmmPath, cUserDictPath);
    }

    - (void) objcJiebaCut: (NSString *) sentence toWords: (NSMutableArray *) words {
        const char* cSentence = [sentence UTF8String];

        // Copy any existing entries of `words` into the C++ vector.
        std::vector<std::string> wordsList;
        for (NSUInteger i = 0; i < [words count]; i++) {
            wordsList.push_back([(NSString *)words[i] UTF8String]);
        }
        JiebaCut(cSentence, wordsList);

        // Replace the contents of `words` with the segmentation result.
        [words removeAllObjects];
        std::for_each(wordsList.begin(), wordsList.end(), [&words](std::string str) {
            id nsstr = [NSString stringWithUTF8String:str.c_str()];
            [words addObject:nsstr];
        });
    }

    @end

See the code: https://github.com/qiwihui/Sw...
Calling Objective-C interfaces from Swift is covered in detail in the official documentation and in many blog posts.
Create the {project_name}-Bridging-Header.h header file, here SwiftJiebaDemo-Bridging-Header.h, import the wrapper header written earlier, and set the header path SwiftJiebaDemo/SwiftJiebaDemo-Bridging-Header.h under Targets -> Build Settings -> Objective-C Bridging Header.

    //
    //  SwiftJiebaDemo-Bridging-Header.h
    //  SwiftJiebaDemo
    //
    //  Created by Qiwihui on 1/15/19.
    //  Copyright © 2019 Qiwihui. All rights reserved.
    //

    #ifndef SwiftJiebaDemo_Bridging_Header_h
    #define SwiftJiebaDemo_Bridging_Header_h

    #import "iosjiebaWrapper.h"

    #endif /* SwiftJiebaDemo_Bridging_Header_h */

Rename the wrapper's implementation file from .m to .mm, that is, iosjiebaWrapper.m becomes iosjiebaWrapper.mm.

See the code: https://github.com/qiwihui/Sw...
Jieba has to be initialized before it can segment any text.

    import Foundation

    class Classifier {

        init() {
            // Dictionary files shipped inside iosjieba.bundle.
            let dictPath = Bundle.main.resourcePath! + "/iosjieba.bundle/dict/jieba.dict.small.utf8"
            let hmmPath = Bundle.main.resourcePath! + "/iosjieba.bundle/dict/hmm_model.utf8"
            let userDictPath = Bundle.main.resourcePath! + "/iosjieba.bundle/dict/user.dict.utf8"

            JiebaWrapper().objcJiebaInit(dictPath, forPath: hmmPath, forDictPath: userDictPath)
        }

        func tokenize(_ message: String) -> [String] {
            print("tokenize...")
            let words = NSMutableArray()
            JiebaWrapper().objcJiebaCut(message, toWords: words)
            return words as! [String]
        }
    }

Console output:
As shown, the test sentence 小明碩士畢業於中國科學院計算所,後在日本京都大學深造 is segmented into ["小明", "碩士", "畢業", "於", "中國科學院", "計算所", ",", "後", "在", "日本", "京都大學", "深造"], which completes the integration.
See the code: https://github.com/qiwihui/Sw...
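Because I do not know much about how compiling and linking work, and am new to iOS development, I ran into many problems along the way and needed two weeks to solve them. I am recording some of them here for future reference.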
"cassert" file not found
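Rename the .m file to .mm. A .m file is compiled as Objective-C, which cannot include C++ headers such as <cassert>; the .mm extension makes Xcode compile it as Objective-C++.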
compiler not finding <tr1/unordered_map>
Set C++ Standard Library to LLVM libc++.
Reference: mac c++ compiler not finding <tr1/unordered_map>
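For context, libc++ does not ship the old `tr1/` headers, and the C++11 replacement for `std::tr1::unordered_map` is `std::unordered_map` from `<unordered_map>`. As a small illustration of the standard container, the sketch below counts tokens; `count_tokens` is a hypothetical helper, not something from cppjieba or the demo project.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Counts how often each token occurs, using the C++11 container that
// replaces std::tr1::unordered_map.
std::unordered_map<std::string, int> count_tokens(const std::vector<std::string>& tokens) {
    std::unordered_map<std::string, int> counts;
    for (const std::string& token : tokens) {
        ++counts[token];  // operator[] value-initializes missing entries to 0
    }
    return counts;
}
```

Code written against `<unordered_map>` builds with either standard library setting, so it does not depend on which one the project selects.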
warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
Change Build Settings -> C++ Standard Library from libstdc++ to libc++.
use of unresolved identifier
This happens when Target Membership is set incorrectly as files are added to the project. Tick every target that uses the files.
Related reference: Understanding The "Use of Unresolved Identifier" Error In Xcode