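As the summary describes, we now need to fix the data request errors caused by emoji input.
Root cause:
When encoded as UTF-8, an emoji takes at least 2 bytes; a few take 3, most take 4, and, more outrageously, some input methods (a certain "dog"-branded one, for example) produce emoji that are 11 bytes long (these are sequences of several code points). Our backend database is MySQL using its utf8 character set, and that utf8 stores at most three bytes per character. So when an emoji longer than three bytes is written to the database, the statement fails with an error (Incorrect string value '\xx\xx\xx\xx' for column 'field' at row 1). Put simply, it doesn't fit. Switching MySQL to utf8mb4 would solve this, but changing a database's encoding is a sensitive operation, and for various reasons the problem has to be solved on the client.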
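To make the three-byte limit concrete, here is a minimal sketch in plain C (which also compiles unchanged inside an Objective-C .m file). The function name `has_four_byte_utf8` is hypothetical and not part of the project; it assumes the input is already valid UTF-8 and simply reports whether any character would need four bytes, which is the case the utf8 column rejects.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the UTF-8 buffer contains a
 * 4-byte sequence, i.e. a lead byte of the form 11110xxx.
 * Assumes the input is valid UTF-8, so continuation bytes (10xxxxxx)
 * can never match the lead-byte mask. */
bool has_four_byte_utf8(const unsigned char *bytes, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if ((bytes[i] & 0xF8) == 0xF0) {
            return true;
        }
    }
    return false;
}
```

A check like this only catches the characters that fail to store; it says nothing about 3-byte emoji, which is why the rest of the post goes further.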
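Requirements analysis:
Our app is an automotive utility app. The data users submit is precise and meaningful, and emoji are dispensable for us. Until now we neither filtered nor restricted input (except in fields with a required format). Now that we've hit this problem, we might as well filter out emoji input entirely.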
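Approach analysis:
With the requirement settled, two problems remain: first, how to filter UITextField input globally (we don't use UITextView and the like); second, how to detect and filter emoji. We'll discuss the two separately.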
Digging into the problems:
1. Globally intercepting UITextField input
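Option 1: everything typed into a UITextField eventually goes to the server through the network interface, so we could filter all params in the network layer and guarantee that no emoji reaches the server. But that only fixes the server error. The client experience would be poor (the user can type an emoji, yet it is gone when the data is fetched back). Besides, the network package is shared code that serves the whole app and even several apps, and we can't afford the testing cost of changing it. Option 1 is rejected.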
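For reference, the kind of filtering Option 1 implies is small in itself. The following is only a sketch in plain C under stated assumptions: the name `strip_four_byte_utf8` is hypothetical, the input is assumed to be valid UTF-8, and it drops only 4-byte sequences (the ones the database cannot store), not the 3-byte emoji discussed later.

```c
#include <stddef.h>

/* Hypothetical helper: copies src to dst while dropping every 4-byte
 * UTF-8 sequence. Returns the number of bytes written.
 * dst must have room for at least len bytes. Assumes valid UTF-8. */
size_t strip_four_byte_utf8(const unsigned char *src, size_t len,
                            unsigned char *dst) {
    size_t out = 0;
    size_t i = 0;
    while (i < len) {
        if ((src[i] & 0xF8) == 0xF0) {
            /* Skip the lead byte and its three continuation bytes,
             * without running past the end of a truncated buffer. */
            i += (len - i >= 4) ? 4 : (len - i);
        } else {
            dst[out++] = src[i++];
        }
    }
    return out;
}
```

The silent dropping is exactly the experience problem described above: the user sees the emoji while typing, and it disappears after a round trip.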
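Option 2: because this app's UI is written entirely in code, we could create an EmojiTextField subclass that proxies the system delegate internally and blocks emoji as they are typed. This gives the best input experience, and what the user sees matches what is stored in the database. However, it requires replacing every UITextField in the project. Xcode makes that easy, but the approach is rather crude, so we keep it as a fallback.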
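Option 3: the runtime. I have always felt that at least half of Objective-C's power comes from the runtime: it can modify method tables while the program is running, and categories let us extend and override methods. Together these make option 3 feasible.
Implementation: add a global category, here called UITextField+Emoji, and implement +load in it (this method is called when the class is loaded into memory). In +load we use the runtime to swap the initializers and the delegate-related methods. In the initializers we take over the delegate that business code would set. When the user types, our proxy delegate is triggered first, performs the emoji check, and then forwards the call to the business code's delegate.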
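Our project uses a .pch file for global header imports, so we import the UITextField+Emoji category there:
/// Filter emoji characters
#import "UITextField+EmojiText.h"
In the .m file, we swap the relevant method implementations inside +load so that the original selectors point to our IMPs: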
#import <objc/runtime.h>

void exchangeMethod(Class class, SEL oSEL, SEL nSEL) {
    Method oMethod = class_getInstanceMethod(class, oSEL);
    Method nMethod = class_getInstanceMethod(class, nSEL);
    // Try to add oSEL to this class first. This succeeds only if the class
    // does not implement oSEL itself, which keeps us from swapping an
    // implementation inherited from a superclass.
    BOOL ok = class_addMethod(class, oSEL, method_getImplementation(nMethod), method_getTypeEncoding(nMethod));
    if (ok) {
        class_replaceMethod(class, nSEL, method_getImplementation(oMethod), method_getTypeEncoding(oMethod));
    } else {
        method_exchangeImplementations(oMethod, nMethod);
    }
}

+ (void)load {
    // setDelegate: intercept delegate assignment so emoji filtering always applies
    exchangeMethod([self class], @selector(setDelegate:), @selector(emoji_setDelegate:));
    // delegate getter: return the delegate set by business code, so set and get stay consistent
    exchangeMethod([self class], @selector(delegate), @selector(emoji_delegate));
    // The various initializers
    exchangeMethod([self class], @selector(init), @selector(emoji_init));
    exchangeMethod([self class], @selector(initWithFrame:), @selector(emoji_initWithFrame:));
    exchangeMethod([self class], @selector(initWithCoder:), @selector(emoji_initWithCoder:));
    // Release internally held resources (this file is compiled without ARC)
    exchangeMethod([self class], @selector(dealloc), @selector(emoji_dealloc));
}
The implementations of the swapped-in methods:
- (id)emoji_init {
    id ret = [self emoji_init];
    // After the swap, setDelegate: runs emoji_setDelegate:. Calling it here
    // ensures that fields whose business code never sets a delegate still filter emoji.
    self.delegate = nil;
    return ret;
}

- (id)emoji_initWithFrame:(CGRect)frame {
    id ret = [self emoji_initWithFrame:frame];
    self.delegate = nil;
    return ret;
}

- (id)emoji_initWithCoder:(NSCoder *)aDecoder {
    id ret = [self emoji_initWithCoder:aDecoder];
    self.delegate = nil;
    return ret;
}

- (void)emoji_setDelegate:(id<UITextFieldDelegate>)delegate {
    // After the swap, [self emoji_delegate] is the original getter.
    // If no delegate has been installed yet, install our internal proxy;
    // otherwise only replace the proxy's originalDelegate.
    id<UITextFieldDelegate> del = [self emoji_delegate];
    if (!del) {
        EmojiDelegate *emojiDelegate = [[EmojiDelegate alloc] initWithTextField:self];
        emojiDelegate.originalDelegate = delegate;
        [self emoji_setDelegate:emojiDelegate];
    } else {
        EmojiDelegate *emojiDelegate = (EmojiDelegate *)del;
        emojiDelegate.originalDelegate = delegate;
    }
}

- (id<UITextFieldDelegate>)emoji_delegate {
    return ((EmojiDelegate *)[self emoji_delegate]).originalDelegate;
}

- (void)emoji_dealloc {
    // The EmojiDelegate was created with alloc and is still retained,
    // so it has to be released once manually.
    [[self emoji_delegate] release];
    [self emoji_setDelegate:nil];
    [self emoji_dealloc];
}
The EmojiDelegate implementation is a plain and simple proxy:
@interface EmojiDelegate : NSObject<UITextFieldDelegate>
@property(nonatomic, weak) UITextField *textField;
@property(nonatomic, weak) id<UITextFieldDelegate> originalDelegate;
@property(nonatomic, strong) NSString *prevText; // Last accepted text
- (id)initWithTextField:(UITextField *)textField;
@end

@implementation EmojiDelegate

- (id)initWithTextField:(UITextField *)textField {
    self = [super init];
    if (self) {
        self.textField = textField;
        [textField addTarget:self action:@selector(textFieldDidChange:) forControlEvents:UIControlEventEditingChanged];
    }
    return self;
}

- (void)dealloc {
    [_textField removeTarget:self action:@selector(textFieldDidChange:) forControlEvents:UIControlEventEditingChanged];
    self.originalDelegate = nil;
    self.prevText = nil;
    [super dealloc];
}

- (BOOL)textFieldShouldBeginEditing:(UITextField *)textField {
    if ([self.originalDelegate respondsToSelector:@selector(textFieldShouldBeginEditing:)]) {
        return [self.originalDelegate textFieldShouldBeginEditing:textField];
    }
    return YES;
}

- (void)textFieldDidBeginEditing:(UITextField *)textField {
    if ([self.originalDelegate respondsToSelector:@selector(textFieldDidBeginEditing:)]) {
        [self.originalDelegate textFieldDidBeginEditing:textField];
    }
}

- (BOOL)textFieldShouldEndEditing:(UITextField *)textField {
    if ([self.originalDelegate respondsToSelector:@selector(textFieldShouldEndEditing:)]) {
        return [self.originalDelegate textFieldShouldEndEditing:textField];
    }
    return YES;
}

- (void)textFieldDidEndEditing:(UITextField *)textField {
    if ([self.originalDelegate respondsToSelector:@selector(textFieldDidEndEditing:)]) {
        [self.originalDelegate textFieldDidEndEditing:textField];
    }
}

- (BOOL)textField:(UITextField *)textField shouldChangeCharactersInRange:(NSRange)range replacementString:(NSString *)string {
    if (string.length == 0) {
        return YES;
    }
    /// Filter emoji
    // Reject input from the system emoji keyboard
    if ([[[textField textInputMode] primaryLanguage] isEqualToString:@"emoji"]) {
        return NO;
    }
    // Check the replacement string for emoji characters
    if ([string containEmoji]) {
        return NO;
    }
    if ([self.originalDelegate respondsToSelector:@selector(textField:shouldChangeCharactersInRange:replacementString:)]) {
        return [self.originalDelegate textField:textField shouldChangeCharactersInRange:range replacementString:string];
    }
    return YES;
}

- (BOOL)textFieldShouldClear:(UITextField *)textField {
    if ([self.originalDelegate respondsToSelector:@selector(textFieldShouldClear:)]) {
        return [self.originalDelegate textFieldShouldClear:textField];
    }
    // Match UIKit's default when the delegate does not implement this method
    return YES;
}

- (BOOL)textFieldShouldReturn:(UITextField *)textField {
    if ([self.originalDelegate respondsToSelector:@selector(textFieldShouldReturn:)]) {
        return [self.originalDelegate textFieldShouldReturn:textField];
    }
    // Match UIKit's default when the delegate does not implement this method
    return YES;
}

/**
 * Observe UITextField text changes to catch emoji inserted through
 * Chinese input method candidate suggestions.
 */
- (void)textFieldDidChange:(UITextField *)textField {
    if ([textField markedTextRange] == nil) {
        NSString *text = textField.text;
        if ([text containEmoji]) {
            // selectedRange / setSelectedRange: are not UITextField API;
            // they are assumed to come from a helper category not shown here.
            // If location is below 2 the subtraction wraps around and the
            // clamp below moves the caret to the end of the restored text.
            NSUInteger location = [textField selectedRange].location - 2;
            textField.text = _prevText;
            if (location > _prevText.length) {
                location = _prevText.length;
            }
            [textField setSelectedRange:NSMakeRange(location, 0)];
        } else {
            self.prevText = text;
        }
    }
}

@end
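The key piece here is the containEmoji method, which checks whether the input string contains any emoji.
That settles global interception. The emoji filtering itself is discussed next.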
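2. Filtering emoji out of strings
Like most people, I started by searching Baidu, Google, and Bing. I found plenty of solutions, but none of them worked well in practice. Emoji turn out to be far more troublesome than expected.
A quick primer before we start. I won't go into the origin of emoji; it is enough to know that they were added to Unicode in some version, and not as one contiguous block. In other words, emoji code points follow no pattern, so the only option is hard-coded matching. But there are hundreds, even thousands of emoji, and matching them one by one would be silly, so we need to narrow the range.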
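Presumably everyone uses UTF-8 these days. It is a variable-length encoding, and any variable-length format needs a header that describes the length, followed by body units. UTF-8 is no different.
If the first bit of a byte is 0, the byte is a single-byte character, and the 7 bits after the 0 are the data, giving the character's code point in Unicode.
If the first bit is 1, the byte belongs to a multi-byte character. If the second bit is 0, the byte is a data byte that follows a lead byte. If the byte starts with several 1s, the number of 1s is the number of bytes in the character (including the current byte). For example: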
110xxxxx // two bytes: must be followed by one data byte starting with 10
>>110xxxxx 10xxxxxx
1110xxxx // three bytes: followed by two data bytes starting with 10
>>1110xxxx 10xxxxxx 10xxxxxx
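The lead-byte rule above can be sketched as a small plain C function (it also compiles inside an Objective-C .m file). The name `utf8_seq_len` is hypothetical; the function only classifies a single byte and does not validate the continuation bytes that follow it.

```c
/* Hypothetical helper: returns the total sequence length announced by
 * a UTF-8 lead byte, or 0 if the byte is a continuation byte
 * (10xxxxxx) or not a valid lead byte. */
int utf8_seq_len(unsigned char lead) {
    if ((lead & 0x80) == 0x00) return 1; /* 0xxxxxxx: single byte  */
    if ((lead & 0xE0) == 0xC0) return 2; /* 110xxxxx: two bytes    */
    if ((lead & 0xF0) == 0xE0) return 3; /* 1110xxxx: three bytes  */
    if ((lead & 0xF8) == 0xF0) return 4; /* 11110xxx: four bytes   */
    return 0;
}
```

Masking with `&` and comparing against the prefix is equivalent to the `|`-based tests used in the post's own code; it is just the more common way to write the same check.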
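Extending this pattern, the scheme could describe even longer sequences, but standard UTF-8 (RFC 3629) caps a single character at 4 bytes, which is enough for every code point up to U+10FFFF. Emoji are spread across the 2-, 3-, and 4-byte ranges, and anything longer than 4 bytes is a sequence of several code points. Most 2-byte emoji are really text characters, so we let those through. Emoji of 4 bytes or more can all be filtered. The visible text we care about mostly lives in the 3-byte range, so the part that needs careful filtering is the 3-byte emoji (3-byte emoji can already be stored in the database, but for a consistent experience we still filter them). Fortunately there are not many 3-byte emoji, so hard-coded matching is acceptable.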
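To see which length class a given character falls into, the code point boundaries can be checked directly. This is a minimal sketch in plain C; `utf8_len_for_code_point` is a hypothetical name, and the function ignores the surrogate range U+D800 to U+DFFF, which is not valid in UTF-8.

```c
#include <stdint.h>

/* Hypothetical helper: number of bytes UTF-8 needs for one code point,
 * or 0 if the value is above U+10FFFF. Surrogates are not rejected. */
int utf8_len_for_code_point(uint32_t cp) {
    if (cp <= 0x7F) return 1;     /*  7 data bits */
    if (cp <= 0x7FF) return 2;    /* 11 data bits */
    if (cp <= 0xFFFF) return 3;   /* 16 data bits */
    if (cp <= 0x10FFFF) return 4; /* 21 data bits */
    return 0;
}
```

For example, U+00A9 (copyright sign) needs 2 bytes, U+2600 (sun) needs 3, and U+1F600 (grinning face) needs 4, which matches the three groups described above.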
Based on data found on the Unicode website, match the 3-byte code points:
- (BOOL)emojiInUnicode:(unichar)code {
    if (code == 0x0023 || code == 0x002A || (code >= 0x0030 && code <= 0x0039)
        || code == 0x00A9 || code == 0x00AE || code == 0x203C || code == 0x2049
        || code == 0x2122 || code == 0x2139 || (code >= 0x2194 && code <= 0x2199)
        || code == 0x21A9 || code == 0x21AA || code == 0x231A || code == 0x231B
        || code == 0x2328 || code == 0x23CF || (code >= 0x23E9 && code <= 0x23F3)
        || (code >= 0x23F8 && code <= 0x23FA) || code == 0x24C2 || code == 0x25AA
        || code == 0x25AB || code == 0x25B6 || code == 0x25C0
        || (code >= 0x25FB && code <= 0x25FE) || (code >= 0x2600 && code <= 0x2604)
        || code == 0x260E || code == 0x2611 || code == 0x2614 || code == 0x2615
        || code == 0x2618 || code == 0x261D || code == 0x2620 || code == 0x2622
        || code == 0x2623 || code == 0x2626 || code == 0x262A || code == 0x262E
        || code == 0x262F || (code >= 0x2638 && code <= 0x263A)
        || (code >= 0x2648 && code <= 0x2653) || code == 0x2660 || code == 0x2663
        || code == 0x2665 || code == 0x2666 || code == 0x2668 || code == 0x267B
        || code == 0x267F || (code >= 0x2692 && code <= 0x2694) || code == 0x2696
        || code == 0x2697 || code == 0x2699 || code == 0x269B || code == 0x269C
        || code == 0x26A0 || code == 0x26A1 || code == 0x26AA || code == 0x26AB
        || code == 0x26B0 || code == 0x26B1 || code == 0x26BD || code == 0x26BE
        || code == 0x26C4 || code == 0x26C5 || code == 0x26C8 || code == 0x26CE
        || code == 0x26CF || code == 0x26D1 || code == 0x26D3 || code == 0x26D4
        || code == 0x26E9 || code == 0x26EA || (code >= 0x26F0 && code <= 0x26F5)
        || (code >= 0x26F7 && code <= 0x26FA) || code == 0x26FD || code == 0x2702
        || code == 0x2705 || (code >= 0x2708 && code <= 0x270D) || code == 0x270F
        || code == 0x2712 || code == 0x2714 || code == 0x2716 || code == 0x271D
        || code == 0x2721 || code == 0x2728 || code == 0x2733 || code == 0x2734
        || code == 0x2744 || code == 0x2747 || code == 0x274C || code == 0x274E
        || (code >= 0x2753 && code <= 0x2755) || code == 0x2757 || code == 0x2763
        || code == 0x2764 || (code >= 0x2795 && code <= 0x2797) || code == 0x27A1
        || code == 0x27B0 || code == 0x27BF || code == 0x2934 || code == 0x2935
        || (code >= 0x2B05 && code <= 0x2B07) || code == 0x2B1B || code == 0x2B1C
        || code == 0x2B50 || code == 0x2B55 || code == 0x3030 || code == 0x303D
        || code == 0x3297 || code == 0x3299
        // Second batch
        || code == 0x23F0) {
        return YES;
    }
    return NO;
}
There is also a very old emoji set that uses the Unicode Private Use Area. It is basically obsolete now, but we filter it anyway:
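/**
 * An unofficial set that uses the Unicode Private Use Area
 * high byte e0 - e5, low byte 01 - 59
 */
- (BOOL)emojiInSoftBankUnicode:(unichar)code {
    // code must be unsigned: with a signed short, values from 0xE000 up are
    // negative, the shifted high byte is negative too, and the range check never matches.
    return ((code >> 8) >= 0xE0 && (code >> 8) <= 0xE5 && (Byte)(code & 0xFF) < 0x60);
}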
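One thing to watch with this kind of high-byte range check is the integer type holding the code point. A value such as 0xE001 does not fit in a signed 16-bit type as a positive number, so the comparison against 0xE0 only works when the value is unsigned. The sketch below, in plain C with a hypothetical name `in_softbank_range`, uses `uint16_t` to make that explicit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the code point lies in the private-use
 * block described above (high byte 0xE0..0xE5, low byte below 0x60).
 * The parameter is unsigned so that values >= 0x8000 stay positive. */
bool in_softbank_range(uint16_t code) {
    uint8_t high = (uint8_t)(code >> 8);
    uint8_t low = (uint8_t)(code & 0xFF);
    return high >= 0xE0 && high <= 0xE5 && low < 0x60;
}
```

In Objective-C the natural equivalent of `uint16_t` here is `unichar`, which is an unsigned 16-bit type.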
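The last piece is filtering the input string itself: 1- and 2-byte characters are let through, characters longer than 3 bytes are all treated as emoji, and the code point of every 3-byte character is checked: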
- (BOOL)containEmoji {
    NSUInteger len = [self lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
    if (len < 3) { // Only strings of at least 3 bytes need checking (some emoji are just 3 bytes)
        return NO;
    }
    // Only 3-byte characters are inspected; anything longer is treated as emoji
    NSData *data = [self dataUsingEncoding:NSUTF8StringEncoding];
    Byte *bts = (Byte *)[data bytes];
    Byte bt;
    unichar v; // unsigned, so code points from 0x8000 up are not read as negative
    for (NSUInteger i = 0; i < len; i++) {
        bt = bts[i];
        if ((bt | 0x7F) == 0x7F) { // 0xxxxxxx: ASCII
            continue;
        }
        if ((bt | 0x1F) == 0xDF) { // 110xxxxx: two-byte character
            i += 1;
            continue;
        }
        if ((bt | 0x0F) == 0xEF) { // 1110xxxx: three-byte character (the main filtering target)
            // Compute the Unicode code point
            v = bt & 0x0F;
            v = v << 6;
            v |= bts[i + 1] & 0x3F;
            v = v << 6;
            v |= bts[i + 2] & 0x3F;
            // NSLog(@"%02X%02X", (Byte)(v >> 8), (Byte)(v & 0xFF));
            if ([self emojiInSoftBankUnicode:v] || [self emojiInUnicode:v]) {
                return YES;
            }
            i += 2;
            continue;
        }
        if ((bt | 0x3F) == 0xBF) { // 10xxxxxx: a data byte, skip it
            continue;
        }
        return YES; // Anything else is longer than three bytes: treat it as emoji
    }
    return NO;
}
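The code point arithmetic in the three-byte branch can be checked in isolation. Below is a minimal sketch in plain C; `decode_three_byte` is a hypothetical name, and the function assumes `p` points at a valid 3-byte UTF-8 sequence (lead byte 1110xxxx followed by two 10xxxxxx bytes).

```c
#include <stdint.h>

/* Hypothetical helper: decodes one 3-byte UTF-8 sequence into its
 * 16-bit code point: 4 bits from the lead byte, then 6 bits from each
 * of the two continuation bytes. Assumes p points at valid input. */
uint16_t decode_three_byte(const unsigned char *p) {
    return (uint16_t)(((p[0] & 0x0F) << 12)
                    | ((p[1] & 0x3F) << 6)
                    |  (p[2] & 0x3F));
}
```

For instance, the bytes E2 98 80 decode to U+2600, which is in the emoji table, while E4 B8 AD decode to U+4E2D, an ordinary Chinese character that passes the filter.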
Finally, wrap these methods in an NSString (Emoji) category.
Done.