Our team has recently been rebuilding an old project on ABP, and the database has also moved from SQL Server to MySQL. A quick aside: the old product ran on Windows Server 2008, SQL Server 2008 R2, and .NET Framework 4.5, and we are now finally embracing .NET Core. Back to the topic. The single tables currently hold anywhere from 100k+ to 1M+ rows, and both sizes are benchmarked below. With the database switch and the changes to the table structure, a data migration is unavoidable. There aren't that many migration options; below are the two approaches I tried and measured.
```csharp
private static async Task BatchInsertTestUsers(List<TestUser> testUsers)
{
    var prefix = "INSERT INTO users (Id,Name,Age) VALUES";
    using (IDbConnection conn = new MySqlConnection(DataMigrationConfig.MySqlConstr))
    {
        var sqlText = new StringBuilder();
        sqlText.Append(prefix);
        foreach (var testUser in testUsers)
        {
            sqlText.Append($"({testUser.Id},'{testUser.Name}',{testUser.Age}),");
        }

        // Trim the trailing comma, then execute the whole multi-row INSERT at once.
        var insertSql = sqlText.ToString().Substring(0, sqlText.ToString().LastIndexOf(','));
        await conn.ExecuteAsync(insertSql);
    }
}
```
BatchInsertTestUsers concatenates the given collection into a single SQL statement and executes it.

```csharp
public static Task RunMultiTasks(List<TestUser> users)
{
    var tasks = new List<Task>();
    var pageSize = 10000;
    var writeCount = (users.Count() / pageSize) + 2;
    for (var i = 1; i < writeCount; i++)
    {
        var skipCount = (i - 1) * pageSize;
        var batchInsertList = users.Skip(skipCount).Take(pageSize).ToList();
        // Return the inner task so Task.WaitAll really waits for the insert to finish.
        var task = Task.Run(() => BatchInsertTestUsers(batchInsertList));
        tasks.Add(task);
    }

    var sw = new Stopwatch();
    sw.Start();
    Task.WaitAll(tasks.ToArray());
    sw.Stop();
    Console.WriteLine($"Multi-threaded batch insert took: {sw.ElapsedMilliseconds} ms");
    return Task.FromResult(0);
}
```
RunMultiTasks splits the data into batches and inserts 10,000 rows at a time. I only came across MySqlBulkLoader because of SqlBulkCopy on the SQL Server side. MySqlBulkLoader cannot import a collection directly; the data first has to be exported to a .csv file, which is then read back in for the import.
```csharp
public static async Task Export(string filePath, List<TestUser> items)
{
    IExporter exporter = new CsvExporter();
    await exporter.Export(filePath, items);
}
```
```csharp
public static void Load(string filePath, string tableName)
{
    using MySqlConnection conn = new MySqlConnection(DataMigrationConfig.MySqlConstr);
    var bulk = new MySqlBulkLoader(conn)
    {
        FieldTerminator = ",",
        FieldQuotationCharacter = '"',
        EscapeCharacter = '"',
        LineTerminator = "\r\n",
        FileName = filePath,
        Local = true,
        NumberOfLinesToSkip = 1, // skip the CSV header row
        TableName = tableName,
        CharacterSet = "utf8mb4",
    };
    bulk.Load();
}
```
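One prerequisite worth calling out: with Local = true the file is streamed from the client, and both the MySQL server and the .NET connector have to allow that. A minimal sketch, assuming the MySqlConnector driver and a placeholder connection string (check the option name against your driver version):

```csharp
// Server side: local_infile must be enabled, e.g. SET GLOBAL local_infile = 1;
// Client side (MySqlConnector): opt in through the connection string.
const string MySqlConstr =
    "Server=localhost;Port=3306;Database=migration_test;" +
    "Uid=root;Pwd=your_password;AllowLoadLocalInfile=true";
```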
Local = true reads a local file for the import. The tests ran against MySQL inside a Docker container on a spinning disk; if you are on an SSD, the results should be even better. The entity used throughout the tests:

```csharp
public class TestUser
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
}
```
The benchmark measures insert performance at 10k, 100k, and 1M rows, and the impact of having indexes enabled versus disabled.

```csharp
class Program
{
    static async Task Main(string[] args)
    {
        var testData = DataGen.Run(100 * 10000);
        await RunMultiTasks(testData);
        await RunMySqlLoaderTask(testData);
    }

    public static async Task RunMultiTasks(List<TestUser> users)
    {
        await DataMigrateTask.RunMultiTasks(users);
    }

    public static async Task RunMySqlLoaderTask(List<TestUser> users)
    {
        var fileName = "users";
        var filePath = Directory.GetCurrentDirectory() + "\\" + fileName + ".csv";
        await DataMigrateTask.Export(filePath, users);
        var sw = new Stopwatch();
        sw.Start();
        DataMigrateTask.Load(filePath, "users");
        sw.Stop();
        Console.WriteLine($"MySqlBulkLoader took: {sw.ElapsedMilliseconds} ms");
    }
}
```
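The code above doesn't show how the "indexes disabled" runs were prepared. A common way to do it is to drop the secondary indexes before the load and rebuild them afterwards; here is a minimal sketch using Dapper, with a hypothetical secondary index idx_users_name on the Name column:

```csharp
private static async Task LoadWithoutIndexes(string filePath)
{
    using var conn = new MySqlConnection(DataMigrationConfig.MySqlConstr);

    // Drop the secondary index so the load doesn't have to maintain it row by row.
    await conn.ExecuteAsync("DROP INDEX idx_users_name ON users;");

    DataMigrateTask.Load(filePath, "users");

    // Rebuild the index once, after all rows are in place.
    await conn.ExecuteAsync("CREATE INDEX idx_users_name ON users (Name);");
}
```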
After all that, this is the part that actually matters.
| Approach | 10k rows | 100k rows | 1M rows |
| --- | --- | --- | --- |
| RunMultiTasks | 367 ms | 3548 ms | 91263 ms |
| RunMySqlLoaderTask | 2031 ms | 1597 ms | 13105 ms |
| RunMultiTasks (indexes disabled) | 233 ms | 3230 ms | 67040 ms |
| RunMySqlLoaderTask (indexes disabled) | 1785 ms | 1367 ms | 12456 ms |
The numbers above are only a rough reference, but once the data volume gets large the advantage of MySqlBulkLoader is clear; for fewer than 10k rows the multi-threaded batch insert performs better. If you're interested, feel free to grab the code and play with it. If you have a better approach, I'd be glad to hear about it.
One last note: when importing null values with MySqlBulkLoader, use the literal NULL in the CSV, not the \N described in the MySQL documentation.
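For illustration only: if Name could be null and you were building the CSV rows by hand (the exporter above already handles the formatting for you), a hypothetical field formatter following that rule might look like this:

```csharp
// Hypothetical helper: quote normal values, emit the unquoted literal NULL
// for missing ones, matching the note above.
static string ToCsvField(string value) =>
    value is null ? "NULL" : $"\"{value.Replace("\"", "\"\"")}\"";

// Example row for a user with no name: 42,NULL,18
```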