Foreword
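My previous post, "LINQ: Advanced - Overview of LINQ Standard Query Operators" (90+ likes), was well received, but I kept feeling that something was missing. As the saying goes, "What you learn on paper always feels shallow; to truly understand something you must practice it." That's right: hands-on practice! This time, let's look at some techniques for working with strings. They may help us think about problems from a different angle and get out of our mental blind spots.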
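LINQ can be used to query and transform strings and collections of strings. It is especially useful for semi-structured data in text files. LINQ queries can be combined with traditional string functions and regular expressions.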
    // Requires: using System; using System.Linq;
    const string text = @"Historically, the world of data and the world of objects" +
                        @" have not been well integrated. Programmers work in C# or Visual Basic" +
                        @" and also in SQL or XQuery. On the one side are concepts such as classes," +
                        @" objects, fields, inheritance, and .NET Framework APIs. On the other side" +
                        @" are tables, columns, rows, nodes, and separate languages for dealing with" +
                        @" them. Data types often require translation between the two worlds; there are" +
                        @" different standard functions. Because the object world has no notion of query, a" +
                        @" query can only be represented as a string without compile-time type checking or" +
                        @" IntelliSense support in the IDE. Transferring data from SQL tables or XML trees to" +
                        @" objects in memory is often tedious and error-prone.";

    const string searchTerm = "data";

    // Split the string into an array of words
    var source = text.Split(new[] { '.', '?', '!', ' ', ';', ':', ',' }, StringSplitOptions.RemoveEmptyEntries);

    // Create the query; the comparison ignores case
    var matchQuery = from word in source
                     where string.Equals(word, searchTerm, StringComparison.InvariantCultureIgnoreCase)
                     select word;

    // Count the matches
    var wordCount = matchQuery.Count();
    Console.WriteLine($"{wordCount} occurrence(s) of the search term \"{searchTerm}\" were found.");
    // Requires: using System; using System.Linq;
    const string text = @"Historically, the world of data and the world of objects " +
                        @"have not been well integrated. Programmers work in C# or Visual Basic " +
                        @"and also in SQL or XQuery. On the one side are concepts such as classes, " +
                        @"objects, fields, inheritance, and .NET Framework APIs. On the other side " +
                        @"are tables, columns, rows, nodes, and separate languages for dealing with " +
                        @"them. Data types often require translation between the two worlds; there are " +
                        @"different standard functions. Because the object world has no notion of query, a " +
                        @"query can only be represented as a string without compile-time type checking or " +
                        @"IntelliSense support in the IDE. Transferring data from SQL tables or XML trees to " +
                        @"objects in memory is often tedious and error-prone.";

    // Split the block of text into sentences
    var sentences = text.Split('.', '?', '!');

    // Define the search terms; this list could also be built dynamically at run time
    string[] wordsToMatch = { "Historically", "data", "integrated" };

    var match = from sentence in sentences
                let t = sentence.Split(new char[] { '.', '?', '!', ' ', ';', ':', ',' }, StringSplitOptions.RemoveEmptyEntries)
                // Remove duplicates, then compare the size of the intersection with the number of search terms
                where t.Distinct().Intersect(wordsToMatch).Count() == wordsToMatch.Length
                select sentence;

    foreach (var s in match)
    {
        Console.WriteLine(s);
    }
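When the query runs, it first splits the text into sentences and then splits each sentence into an array of strings that holds its words. For each of these arrays, the Distinct<TSource> method removes all duplicate words, and the query then performs an Intersect<TSource> operation on the word array and the wordsToMatch array. If the count of the intersection equals the count of the wordsToMatch array, every search word was found in the sentence, and the original sentence is returned.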
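The count comparison described above can be sketched on its own as a small helper. This is only an illustration, not the original author's code: the name ContainsAllWords and the choice of StringComparer.OrdinalIgnoreCase are assumptions added here to show how the same idea could be made case-insensitive. Intersect already yields each common element once, so the sketch skips the explicit Distinct call on the sentence words.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not from the original post): returns true when every
// word in wordsToMatch occurs in the sentence, ignoring case.
static bool ContainsAllWords(string sentence, string[] wordsToMatch)
{
    var separators = new[] { '.', '?', '!', ' ', ';', ':', ',' };
    var words = sentence.Split(separators, StringSplitOptions.RemoveEmptyEntries);

    // Intersect is a set operation, so each common word is counted once.
    // The result is compared with the number of distinct search terms.
    var comparer = StringComparer.OrdinalIgnoreCase;
    return words.Intersect(wordsToMatch, comparer).Count()
           == wordsToMatch.Distinct(comparer).Count();
}

string[] terms = { "historically", "data", "integrated" };

bool hit = ContainsAllWords(
    "Historically, the world of data and the world of objects have not been well integrated",
    terms);
bool miss = ContainsAllWords("Programmers work in C# or Visual Basic", terms);

Console.WriteLine($"{hit} {miss}");
```

Passing a comparer to Intersect keeps the case handling in one place; lowering every word before comparing would work too, at the cost of extra string allocations.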
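Because the String class implements the generic IEnumerable<T> interface, any string can be queried as a sequence of characters. However, this is not a common use of LINQ. For complex pattern matching operations, use the Regex class.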
The following example queries a string to find the digits it contains.
    // Requires: using System; using System.Linq;
    const string aString = "ABCDE99F-J74-12-89A";

    // Select only the characters that are digits
    var digits = from ch in aString
                 where char.IsDigit(ch)
                 select ch;

    Console.Write("digit: ");

    foreach (var n in digits)
    {
        Console.Write($"{n} ");
    }

    Console.WriteLine();

    // Select all characters before the first '-'
    var query = aString.TakeWhile(x => x != '-');

    foreach (var ch in query)
    {
        Console.Write(ch);
    }
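Since the text recommends the Regex class for anything beyond character-by-character checks, here is a minimal sketch of the same string handled with a pattern. The pattern [0-9]+ and the variable names are assumptions made for illustration; the sketch extracts each run of digits as a number, which the per-character query above cannot do directly.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string aString = "ABCDE99F-J74-12-89A";

// [0-9]+ matches each run of consecutive ASCII digits.
// Cast<Match>() turns the MatchCollection into a typed sequence for LINQ.
var numbers = Regex.Matches(aString, "[0-9]+")
                   .Cast<Match>()
                   .Select(m => int.Parse(m.Value))
                   .ToList();

Console.WriteLine(string.Join(", ", numbers)); // prints: 99, 74, 12, 89

// For comparison: counting single digit characters with plain LINQ.
int digitCount = aString.Count(char.IsDigit);
Console.WriteLine(digitCount); // prints: 8
```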
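This example shows how to use the Regex class to create a regular expression for more complex matching in text strings. A LINQ query makes it easy to filter exactly the files you want to search with the regular expression, and to shape the results.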
    // Requires: using System; using System.IO; using System.Linq; using System.Text.RegularExpressions;
    // Change the path to match your Visual Studio version
    const string folder = @"C:\Program Files (x86)\Microsoft Visual Studio 14.0\";
    var fileInfos = GetFiles(folder);
    // Create a regular expression that finds every "Visual <product>" occurrence
    var searchTerm = new Regex(@"Visual (Basic|C#|C\+\+|J#|SourceSafe|Studio)");

    // Search every ".html" file
    // The where clause keeps only the files that contain matches
    // [Note] The range variable in the select clause must be explicitly typed, because MatchCollection is not a generic IEnumerable collection
    var query = from fileInfo in fileInfos
                where fileInfo.Extension == ".html"
                let text = File.ReadAllText(fileInfo.FullName)
                let matches = searchTerm.Matches(text)
                where matches.Count > 0
                select new
                {
                    name = fileInfo.FullName,
                    matchValue = from Match match in matches select match.Value
                };

    Console.WriteLine($"The term \"{searchTerm}\" was found in:");

    foreach (var q in query)
    {
        // Trim the folder path from the file name
        Console.WriteLine($"{q.name.Substring(folder.Length - 1)}");

        // Print the matched values
        foreach (var v in q.matchValue)
        {
            Console.WriteLine(v);
        }
    }
    // Requires: using System.Collections.Generic; using System.IO; using System.Linq;
    private static IList<FileInfo> GetFiles(string path)
    {
        var files = Directory.GetFiles(path, "*.*", SearchOption.AllDirectories);

        return files.Select(file => new FileInfo(file)).ToList();
    }
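You can also query the MatchCollection object returned by a Regex search. In this example, only the value of each match is produced in the results. However, you can also use LINQ to perform all kinds of filtering, sorting, and grouping on that collection.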
[Note] Because MatchCollection is a non-generic IEnumerable collection, you must explicitly state the type of the range variable in the query.
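To make the note concrete, the sketch below groups and sorts the matches of a MatchCollection with an explicitly typed range variable. The input string and the names grouped and ranked are made up for illustration. As far as I know, newer runtimes (.NET Core 2.0 and later) also expose MatchCollection as IEnumerable<Match>, but the explicit type works on both old and new frameworks.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample text invented for this sketch.
var matches = Regex.Matches(
    "Visual Basic, Visual C#, Visual Studio, Visual Basic",
    @"Visual (Basic|C#|C\+\+|J#|SourceSafe|Studio)");

// "from Match m in matches" states the element type explicitly;
// the compiler translates it into matches.Cast<Match>().
var grouped = from Match m in matches
              group m by m.Value into g
              orderby g.Count() descending, g.Key
              select new { Product = g.Key, Count = g.Count() };

var ranked = grouped.ToList();

foreach (var item in ranked)
{
    Console.WriteLine($"{item.Product}: {item.Count}");
}
```

The same pipeline can be written in method syntax by starting with matches.Cast<Match>() and chaining GroupBy and OrderByDescending.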
names1.txt:
Bankov, Peter
Holm, Michael
Garcia, Hugo
Potra, Cristina
Noriega, Fabricio
Aw, Kam Foo
Beebe, Ann
Toyoshima, Tim
Guy, Wey Yuan
Garcia, Debra
names2.txt:
Liu, Jinghao
Bankov, Peter
Holm, Michael
Garcia, Hugo
Beebe, Ann
Gilchrist, Beth
Myrcha, Jacek
Giakoumakis, Leo
McLin, Nkenge
El Yassir, Mehdi
    // Requires: using System; using System.IO; using System.Linq;
    // Create the data sources
    var names1Text = File.ReadAllLines(@"names1.txt");
    var names2Text = File.ReadAllLines(@"names2.txt");

    // Create the query; method syntax must be used here
    var query = names1Text.Except(names2Text);

    // Execute the query
    Console.WriteLine("The following lines are in names1.txt but not names2.txt");
    foreach (var name in query)
    {
        Console.WriteLine(name);
    }
scores.csv:
111, 97, 92, 81, 60
112, 75, 84, 91, 39
113, 88, 94, 65, 91
114, 97, 89, 85, 82
115, 35, 72, 91, 70
116, 99, 86, 90, 94
117, 93, 92, 80, 87
118, 92, 90, 83, 78
119, 68, 79, 88, 92
120, 99, 82, 81, 79
121, 96, 85, 91, 60
122, 94, 92, 91, 91
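    // Create the data source
    var scores = File.ReadAllLines(@"scores.csv");
    // Can be changed to any value from 0 to 4
    const int sortField = 1;

    // Demonstrates returning a query from a method.
    // The method returns the query variable, not the query results.
    // The query is executed here, in the foreach loop.
    foreach (var score in RunQuery(scores, sortField))
    {
        Console.WriteLine(score);
    }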
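    private static IEnumerable<string> RunQuery(IEnumerable<string> score, int num)
    {
        // Split each line so it can be sorted by the chosen field.
        // Note: the field is compared as text, which matches numeric order
        // only while all values have the same number of digits.
        var query = from line in score
                    let fields = line.Split(',')
                    orderby fields[num] descending
                    select line;

        return query;
    }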
This example also shows how to return a query variable from a method.
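Returning a query variable means the caller receives a description of the work, not its results. The sketch below tries to make that visible: a row added to the source after the query is created still appears when the query finally runs. SortByNumericField is a hypothetical variant of RunQuery that parses the field as an integer, so the ordering stays numeric even when values differ in length; the sample rows are invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of RunQuery: parses the chosen field so that the
// sort is numeric rather than textual.
static IEnumerable<string> SortByNumericField(IEnumerable<string> lines, int fieldIndex)
{
    return from line in lines
           let fields = line.Split(',')
           orderby int.Parse(fields[fieldIndex].Trim()) descending
           select line;
}

var scores = new List<string> { "111, 97, 92", "112, 100, 84", "113, 8, 94" };

// Nothing is sorted yet; only the query definition is returned.
var query = SortByNumericField(scores, 1);

// This row is added after the query was created...
scores.Add("114, 99, 70");

// ...but it is still included, because execution happens here.
var sorted = query.ToList();

foreach (var line in sorted)
{
    Console.WriteLine(line);
}
```

Calling ToList (or ToArray) is the usual way to take a snapshot when later changes to the source should not affect the result.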
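A comma-separated value (CSV) file is a text file that is commonly used to store spreadsheet data or other tabular data represented by rows and columns. By using the Split method to separate the fields, it is very easy to query and manipulate CSV files with LINQ. In fact, the same technique can be used to reorder the parts of any structured line of text; it is not limited to CSV files.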
spreadsheet1.csv:
Adams,Terry,120
Fakhouri,Fadi,116
Feng,Hanying,117
Garcia,Cesar,114
Garcia,Debra,115
Garcia,Hugo,118
Mortensen,Sven,113
O'Donnell,Claire,112
Omelchenko,Svetlana,111
Tucker,Lance,119
Tucker,Michael,122
Zabokritski,Eugene,121
    // Requires: using System; using System.IO; using System.Linq;
    // Data source
    var lines = File.ReadAllLines(@"spreadsheet1.csv");
    // Move the field in column 2 to the front, then append columns 1 and 0 in reverse order
    var query = from line in lines
                let t = line.Split(',')
                orderby t[2]
                select $"{t[2]}, {t[1]} {t[0]}";

    foreach (var q in query)
    {
        Console.WriteLine(q);
    }

    // Write the result to a file
    File.WriteAllLines("spreadsheet2.csv", query);
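This example shows how to merge files that contain lines of text and then sort the results. Specifically, it shows how to perform a simple concatenation, a union, and an intersection on the two sets of text lines.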
names1.txt:
Bankov, Peter
Holm, Michael
Garcia, Hugo
Potra, Cristina
Noriega, Fabricio
Aw, Kam Foo
Beebe, Ann
Toyoshima, Tim
Guy, Wey Yuan
Garcia, Debra
names2.txt:
Liu, Jinghao
Bankov, Peter
Holm, Michael
Garcia, Hugo
Beebe, Ann
Gilchrist, Beth
Myrcha, Jacek
Giakoumakis, Leo
McLin, Nkenge
El Yassir, Mehdi
    // Requires: using System; using System.IO; using System.Linq;
    var names1Text = File.ReadAllLines(@"names1.txt");
    var names2Text = File.ReadAllLines(@"names2.txt");

    // Simple concatenation and sort. Duplicates are preserved.
    var concatQuery = names1Text.Concat(names2Text).OrderBy(x => x);
    OutputQueryResult(concatQuery, "Simple concatenate and sort. Duplicates are preserved:");

    // Union based on the default string comparer; duplicate names are removed.
    var unionQuery = names1Text.Union(names2Text).OrderBy(x => x);
    OutputQueryResult(unionQuery, "Union removes duplicate names:");

    // Find the names that occur in both files
    var intersectQuery = names1Text.Intersect(names2Text).OrderBy(x => x);
    OutputQueryResult(intersectQuery, "Merge based on intersect:");

    // Find the matching field in each list. Combine the two results with Concat,
    // then sort with the default string comparer
    const string nameMatch = "Garcia";
    var matchQuery1 = from name in names1Text
                      let t = name.Split(',')
                      where t[0] == nameMatch
                      select name;
    var matchQuery2 = from name in names2Text
                      let t = name.Split(',')
                      where t[0] == nameMatch
                      select name;

    var temp = matchQuery1.Concat(matchQuery2).OrderBy(x => x);
    OutputQueryResult(temp, $"Concat based on partial name match \"{nameMatch}\":");
    private static void OutputQueryResult(IEnumerable<string> lines, string title)
    {
        Console.WriteLine(Environment.NewLine + title);
        foreach (var line in lines)
        {
            Console.WriteLine(line);
        }

        // Count() enumerates the query a second time
        Console.WriteLine($"{lines.Count()} total names in list");
    }
    // Requires: using System; using System.IO; using System.Linq;
    // Each line of names.csv contains a last name, a first name, and an ID number, separated by commas. For example: Omelchenko,Svetlana,111
    var names = File.ReadAllLines(@"names.csv");
    // Each line of scores.csv contains an ID number and four test scores, separated by commas. For example: 111,97,92,81,60
    var scores = File.ReadAllLines(@"scores.csv");

    // Merge the data sources by using an anonymous type.
    // [Note] The ExamScores member is created dynamically as a list of ints.
    // Skip the first item of the split string, because it is the student ID, not an exam score
    var students = from name in names
                   let t = name.Split(',')
                   from score in scores
                   let t2 = score.Split(',')
                   where t[2] == t2[0]
                   select new
                   {
                       // names.csv stores the last name first
                       LastName = t[0],
                       FirstName = t[1],
                       ID = Convert.ToInt32(t[2]),
                       ExamScores = (from scoreAsText in t2.Skip(1)
                                     select Convert.ToInt32(scoreAsText)).ToList()
                   };

    foreach (var student in students)
    {
        Console.WriteLine(
            $"The average score of {student.FirstName} {student.LastName} is {student.ExamScores.Average()}.");
    }
names1.txt:
Bankov, Peter
Holm, Michael
Garcia, Hugo
Potra, Cristina
Noriega, Fabricio
Aw, Kam Foo
Beebe, Ann
Toyoshima, Tim
Guy, Wey Yuan
Garcia, Debra
names2.txt:
Liu, Jinghao
Bankov, Peter
Holm, Michael
Garcia, Hugo
Beebe, Ann
Gilchrist, Beth
Myrcha, Jacek
Giakoumakis, Leo
McLin, Nkenge
El Yassir, Mehdi
    // Requires: using System; using System.IO; using System.Linq;
    var fileA = File.ReadAllLines(@"names1.txt");
    var fileB = File.ReadAllLines(@"names2.txt");

    // Union: concatenate and remove duplicate names
    var mergeQuery = fileA.Union(fileB);
    // Group the names by the first letter of the last name
    var query = from name in mergeQuery
                let t = name.Split(',')
                group name by t[0][0] into g
                orderby g.Key
                select g;

    // Note the nested foreach loops
    foreach (var g in query)
    {
        var fileName = @"testFile_" + g.Key + ".txt";
        Console.WriteLine(g.Key + ":");

        // Write each group to its own file
        using (var sw = new StreamWriter(fileName))
        {
            foreach (var name in g)
            {
                sw.WriteLine(name);
                Console.WriteLine(" " + name);
            }
        }
    }
scores.csv: This file represents spreadsheet data. Column 1 is the student ID, and columns 2 through 5 are test scores.
111, 97, 92, 81, 60
112, 75, 84, 91, 39
113, 88, 94, 65, 91
114, 97, 89, 85, 82
115, 35, 72, 91, 70
116, 99, 86, 90, 94
117, 93, 92, 80, 87
118, 92, 90, 83, 78
119, 68, 79, 88, 92
120, 99, 82, 81, 79
121, 96, 85, 91, 60
122, 94, 92, 91, 91
names.csv: This file represents a spreadsheet that contains each student's last name, first name, and student ID.
Omelchenko,Svetlana,111
O'Donnell,Claire,112
Mortensen,Sven,113
Garcia,Cesar,114
Garcia,Debra,115
Fakhouri,Fadi,116
Feng,Hanying,117
Garcia,Hugo,118
Tucker,Lance,119
Adams,Terry,120
Zabokritski,Eugene,121
Tucker,Michael,122
    // Requires: using System; using System.IO; using System.Linq;
    var names = File.ReadAllLines(@"names.csv");
    var scores = File.ReadAllLines(@"scores.csv");

    // Name:  Last[0], First[1], ID[2]
    //        Omelchenko, Svetlana, 111
    // Score: StudentID[0], Exam1[1], Exam2[2], Exam3[3], Exam4[4]
    //        111, 97, 92, 81, 60

    // This query joins the two spreadsheets on the ID
    var query = from name in names
                let t1 = name.Split(',')
                from score in scores
                let t2 = score.Split(',')
                where t1[2] == t2[0]
                orderby t1[0]
                select $"{t1[0]},{t2[1]},{t2[2]},{t2[3]},{t2[4]}";

    // Output
    OutputQueryResult(query, "Merge two spreadsheets:");
    private static void OutputQueryResult(IEnumerable<string> lines, string title)
    {
        Console.WriteLine(Environment.NewLine + title);
        foreach (var line in lines)
        {
            Console.WriteLine(line);
        }

        // Count() enumerates the query a second time
        Console.WriteLine($"{lines.Count()} total names in list");
    }
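The query above pairs every name with every score line and then filters on the ID. A possible alternative, sketched here under the assumption that the ID is the only join key, is the join clause, which expresses the same equality match directly; Enumerable.Join builds a lookup over the inner sequence instead of comparing every pair. The inline arrays are a small subset of the sample data and stand in for the two files.

```csharp
using System;
using System.Linq;

// A few rows from the sample data, inlined so the sketch runs without files.
string[] names = { "Omelchenko,Svetlana,111", "Mortensen,Sven,113", "Garcia,Cesar,114" };
string[] scores = { "111, 97, 92, 81, 60", "113, 88, 94, 65, 91", "114, 97, 89, 85, 82" };

var merged = (from name in names
              let n = name.Split(',')
              // Equijoin on the student ID: names column 2, scores column 0.
              join score in scores on n[2] equals score.Split(',')[0]
              let s = score.Split(',')
              orderby n[0]
              select $"{n[0]},{s[1]},{s[2]},{s[3]},{s[4]}").ToList();

foreach (var line in merged)
{
    Console.WriteLine(line);
}
```

For a dozen rows the difference is not noticeable; the join form mainly pays off as the files grow, and it also states the intent more plainly.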
scores.csv: Assume that the first column is the student ID and that the following columns are the scores of four exams.
111, 97, 92, 81, 60
112, 75, 84, 91, 39
113, 88, 94, 65, 91
114, 97, 89, 85, 82
115, 35, 72, 91, 70
116, 99, 86, 90, 94
117, 93, 92, 80, 87
118, 92, 90, 83, 78
119, 68, 79, 88, 92
120, 99, 82, 81, 79
121, 96, 85, 91, 60
122, 94, 92, 91, 91
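    // Requires: using System; using System.IO;
    var scores = File.ReadAllLines(@"scores.csv");

    // The column to compute. Zero-based among the exam columns, so 3 selects the fourth exam.
    const int examNum = 3;

    // scores.csv format:
    // Student ID  Exam#1  Exam#2  Exam#3  Exam#4
    // 111,        97,     92,     81,     60

    // +1 skips the first (ID) column
    // Compute a single column
    SingleColumn(scores, examNum + 1);

    Console.WriteLine();

    // Compute multiple columns
    MultiColumns(scores);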
    // Requires: using System; using System.Collections.Generic; using System.Linq;
    private static void SingleColumn(IEnumerable<string> strs, int examNum)
    {
        Console.WriteLine("Single Column Query:");

        // The query has two steps:
        //  1. Split the string
        //  2. Convert the value of the column to compute to an int
        // examNum is the column index; because column 0 holds the ID,
        // it also equals the 1-based exam number printed below.
        var query = from str in strs
                    let t = str.Split(',')
                    select Convert.ToInt32(t[examNum]);

        // Compute statistics for the chosen column
        var average = query.Average();
        var max = query.Max();
        var min = query.Min();

        Console.WriteLine($"Exam #{examNum}: Average:{average:##.##} High Score:{max} Low Score:{min}");
    }

    private static void MultiColumns(IEnumerable<string> strs)
    {
        Console.WriteLine("Multi Column Query:");

        // Query steps:
        //  1. Split the string
        //  2. Skip the ID column (the first column)
        //  3. Convert every score in the current line to an int and select the whole sequence as one result row
        var query = from str in strs
                    let t1 = str.Split(',')
                    let t2 = t1.Skip(1)
                    select (from t in t2
                            select Convert.ToInt32(t));

        // Execute the query and cache the results to improve performance
        var results = query.ToList();
        // Find the number of columns in the results
        var count = results[0].Count();

        // Compute the statistics
        // Loop once for each score column
        for (var i = 0; i < count; i++)
        {
            var query2 = from result in results
                         select result.ElementAt(i);

            var average = query2.Average();
            var max = query2.Max();
            var min = query2.Min();

            // +1 because #1 means the first exam
            Console.WriteLine($"Exam #{i + 1} Average: {average:##.##} High Score: {max} Low Score: {min}");
        }
    }
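The query works by using the Split method to convert each line of text into an array. Each array element represents a column. Finally, the text in each column is converted to its numeric representation. If your file is tab-separated, just change the argument of the Split method to \t.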
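As a quick illustration of the last remark, here is the single-column calculation applied to tab-separated lines. The inline array stands in for a hypothetical tab-separated version of scores.csv (no such file appears in the original post), and the column index is an arbitrary choice; the only change to the technique is the '\t' argument passed to Split.

```csharp
using System;
using System.Linq;

// Inline sample rows standing in for a tab-separated scores file (assumption).
string[] lines =
{
    "111\t97\t92\t81\t60",
    "112\t75\t84\t91\t39",
    "113\t88\t94\t65\t91",
};

// Column 0 is the student ID, so column 1 is the first exam.
const int column = 1;

// Split on the tab character instead of a comma, then convert to int.
var values = (from line in lines
              let fields = line.Split('\t')
              select int.Parse(fields[column])).ToList();

double average = values.Average();
int max = values.Max();
int min = values.Min();

Console.WriteLine($"Average: {average:F2} High Score: {max} Low Score: {min}");
```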
[Original post] http://www.cnblogs.com/liqingwen/p/5814204.html
[Reference] https://msdn.microsoft.com/zh-cn/library/bb397915(v=vs.100).aspx