How to speed up slow MySQL imports

This half-baked data scientist had to wrangle data yet again. The dataset arrived as a 3.6 GB zip file, which unzipped into a whopping 12 GB SQL file. Fine, time to wrestle with SQL data again. Step one: set up a database and import the data.

Anyone who has imported a large SQL dump knows that with MySQL's default settings the import is still painfully slow, especially when there is a lot of data. This dataset ended up at around 10 million rows, so unsurprisingly the import crawled, and some database tuning was in order.

There are two places to tune. The first is innodb_flush_log_at_trx_commit. The official manual explains the possible values as follows:

Controls the balance between strict ACID compliance for commit operations and higher performance that is possible when commit-related I/O operations are rearranged and done in batches. You can achieve better performance by changing the default value but then you can lose up to a second of transactions in a crash.

The default value of 1 is required for full ACID compliance. With this value, the contents of the InnoDB log buffer are written out to the log file at each transaction commit and the log file is flushed to disk.

With a value of 0, the contents of the InnoDB log buffer are written to the log file approximately once per second and the log file is flushed to disk. No writes from the log buffer to the log file are performed at transaction commit. Once-per-second flushing is not guaranteed to happen every second due to process scheduling issues. Because the flush to disk operation only occurs approximately once per second, you can lose up to a second of transactions with any mysqld process crash.

With a value of 2, the contents of the InnoDB log buffer are written to the log file after each transaction commit and the log file is flushed to disk approximately once per second. Once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues. Because the flush to disk operation only occurs approximately once per second, you can lose up to a second of transactions in an operating system crash or a power outage.

InnoDB log flushing frequency is controlled by innodb_flush_log_at_timeout, which allows you to set log flushing frequency to N seconds (where N is 1 ... 2700, with a default value of 1). However, any mysqld process crash can erase up to N seconds of transactions.

DDL changes and other internal InnoDB activities flush the InnoDB log independent of the innodb_flush_log_at_trx_commit setting.

InnoDB crash recovery works regardless of the innodb_flush_log_at_trx_commit setting. Transactions are either applied entirely or erased entirely.

For durability and consistency in a replication setup that uses InnoDB with transactions:

If binary logging is enabled, set sync_binlog=1.

Always set innodb_flush_log_at_trx_commit=1.

Caution
Many operating systems and some disk hardware fool the flush-to-disk operation. They may tell mysqld that the flush has taken place, even though it has not. In this case, the durability of transactions is not guaranteed even with the setting 1, and in the worst case, a power outage can corrupt InnoDB data. Using a battery-backed disk cache in the SCSI disk controller or in the disk itself speeds up file flushes, and makes the operation safer. You can also try to disable the caching of disk writes in hardware caches.

In short:

  • 1 (the default): slowest. Every transaction commit writes the log and flushes it to disk. This is the safest option.
  • 0: fastest. The log is flushed to disk roughly once per second, with no guarantee. Transaction commits do not trigger log writes. Very unsafe: if mysqld crashes, up to the last second of data is lost.
  • 2: a compromise. Transaction commits write to the log, but the log is still flushed only about once per second, with no guarantee. In this mode, even if mysqld crashes, as long as the operating system keeps running the data still makes it to disk.

As the manual notes, with some disk systems even a flush cannot guarantee the data was actually persisted. I have personally had files copied to a (mechanical) hard drive where the machine died and, after a reboot, less than half the data was still there. It turned out the data had only reached the drive's write cache and had never hit the platters.

This parameter can be set in my.ini, but since this is a one-off job, and my local MySQL runs in Docker where fiddling with config files is a hassle, setting it straight from the mysql command line is good enough:

mysql> set GLOBAL innodb_flush_log_at_trx_commit = 0;
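Two things worth remembering about this: `SET GLOBAL` does not survive a server restart, and it affects every session, not just the import. So a reasonable sketch of the full routine is to verify the value took effect and then restore full durability once the import is done:

```sql
mysql> SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';  -- confirm the new value is active
-- ... run the import ...
mysql> SET GLOBAL innodb_flush_log_at_trx_commit = 1;         -- restore full ACID durability afterwards
```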

The second place to tune is the parameters passed to the client when importing the SQL file:

net_buffer_length

Each client thread is associated with a connection buffer and result buffer. Both begin with a size given by net_buffer_length but are dynamically enlarged up to max_allowed_packet bytes as needed. The result buffer shrinks to net_buffer_length after each SQL statement.

This variable should not normally be changed, but if you have very little memory, you can set it to the expected length of statements sent by clients. If statements exceed this length, the connection buffer is automatically enlarged. The maximum value to which net_buffer_length can be set is 1MB.

max_allowed_packet

The maximum size of one packet or any generated/intermediate string, or any parameter sent by the mysql_stmt_send_long_data() C API function. The default is 4MB.

The packet message buffer is initialized to net_buffer_length bytes, but can grow up to max_allowed_packet bytes when needed. This value by default is small, to catch large (possibly incorrect) packets.

You must increase this value if you are using large BLOB columns or long strings. It should be as big as the largest BLOB you want to use. The protocol limit for max_allowed_packet is 1GB. The value should be a multiple of 1024; nonmultiples are rounded down to the nearest multiple.

When you change the message buffer size by changing the value of the max_allowed_packet variable, you should also change the buffer size on the client side if your client program permits it. The default max_allowed_packet value built in to the client library is 1GB, but individual client programs might override this. For example, mysql and mysqldump have defaults of 16MB and 24MB, respectively. They also enable you to change the client-side value by setting max_allowed_packet on the command line or in an option file.

The session value of this variable is read only. The client can receive up to as many bytes as the session value. However, the server will not send to the client more bytes than the current global max_allowed_packet value. (The global value could be less than the session value if the global value is changed after the client connects.)

Note that you should check the server-side settings first; the client-side values must not exceed the server's.

mysql>show variables like 'max_allowed_packet'; 
mysql>show variables like 'net_buffer_length';
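If the server-side values turn out to be small, they can be raised with `SET GLOBAL` (the session value of max_allowed_packet is read-only, as the manual quote above notes, and new connections pick up the new global value). A minimal sketch, with sizes chosen to match the client-side flags used below:

```sql
mysql> SET GLOBAL max_allowed_packet = 16777216;  -- 16 MB; applies to connections opened after this
mysql> SET GLOBAL net_buffer_length = 1048576;    -- 1 MB, the maximum this variable allows
```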

As it happens, I am using the MariaDB Docker image, where both values are already set very high. The manual also says the mysql client's built-in defaults are large enough, but in my tests explicitly passing them still made the import a bit faster, and I have no idea why.

mysql -h127.0.0.1 -uroot -proot123 data_base_name --max_allowed_packet=16777216 --net_buffer_length=16384 < your_sql_script.sql
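Putting the two tweaks together, the whole import boils down to something like the following shell sketch (the host, credentials, database name, and file name are the same placeholders used above):

```shell
# trade durability for throughput during the bulk load
mysql -h127.0.0.1 -uroot -proot123 -e "SET GLOBAL innodb_flush_log_at_trx_commit = 0;"

# import with larger client-side packet/buffer sizes, timing the run
time mysql -h127.0.0.1 -uroot -proot123 data_base_name \
    --max_allowed_packet=16777216 --net_buffer_length=16384 < your_sql_script.sql

# restore full durability once the import finishes
mysql -h127.0.0.1 -uroot -proot123 -e "SET GLOBAL innodb_flush_log_at_trx_commit = 1;"
```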

That said, even though the import was much faster, it still took a few hours to finish. This dataset was mostly text, so maybe that is the reason, or maybe there is some other setting I am not aware of.

Incidentally, for convenience I later moved the data into MongoDB. It takes up quite a bit more space there, but the import, equally single-threaded and with a fair amount of data processing mixed in, finished in under an hour.

This half-baked data scientist has more data wrangling ahead...

(* ̄︿ ̄)
