qhooge, 2012-03-22 21:07:47
Copyright © 2009, The e. Publishing Dept. of Morpho Studio (Spruce Int. Found.® ) All rights reserved.
It is very important to distinguish between the two basic types of file system operations: reads and writes.
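Some may say this is a simple or even silly question. But bear with me: reads and writes are the two I/O subsystems of a file system, and their data paths differ greatly, which means the ways to improve read performance and write performance differ as well.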
You can use zpool iostat or iostat(1M) to find out how your workload splits between reads and writes.
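As a minimal sketch, assuming you have copied the "read" and "write" numbers from the operations columns of zpool iostat (the sample values below are hypothetical), the read/write mix is simply the ratio of the two:

```python
def read_write_mix(read_ops: float, write_ops: float) -> tuple[float, float]:
    """Return (read_fraction, write_fraction) for a pair of operation counts."""
    total = read_ops + write_ops
    if total == 0:
        return (0.0, 0.0)  # idle pool: no mix to report
    return (read_ops / total, write_ops / total)


# Hypothetical numbers taken from the "operations" read/write columns.
reads_per_sec, writes_per_sec = 450, 150
read_frac, write_frac = read_write_mix(reads_per_sec, writes_per_sec)
print(f"reads: {read_frac:.0%}, writes: {write_frac:.0%}")  # reads: 75%, writes: 25%
```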
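Again, these different performance perspectives mean different things when you optimize; you just need to know which particular kind of problem you are facing. In addition, both reads and writes come in two different performance patterns: random and sequential.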
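Here's the good news: through the magic of copy-on-write, ZFS automatically turns random writes into sequential writes. This is a class of performance problem that few other file systems take into account.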
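* Set realistic expectations: ZFS is great, yes. But you still have to obey the laws of physics. A 10,000 rpm disk cannot deliver more than about 166 random IOPS, because 10,000 revolutions per minute divided by 60 seconds per minute is about 166 revolutions per second. This means the head can position itself over a random block only about 166 times per second. Any more seeks than that, and your reads/writes are not really random. This is how the theoretical maximum number of random IOPS for a disk is calculated.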
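Similarly, RAID-Z means that each RAID-Z group only delivers roughly the IOPS of a single disk, because every file system I/O is spread across all the disks of the RAID-Z group in parallel.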
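The arithmetic above can be sketched in a few lines of Python. This follows the article's simplified model (one random head positioning per revolution, each RAID-Z group roughly equal to one disk); it ignores seek time, so real disks usually deliver fewer random IOPS than this upper bound. The function names are illustrative only.

```python
def max_random_iops(rpm: float) -> float:
    """Upper bound on random IOPS: one head positioning per revolution."""
    return rpm / 60  # revolutions per minute -> revolutions per second


def raidz_random_iops(rpm: float, raidz_groups: int) -> float:
    """Each RAID-Z group delivers roughly the random IOPS of a single disk."""
    return raidz_groups * max_random_iops(rpm)


print(int(max_random_iops(10_000)))       # 166
print(int(raidz_random_iops(10_000, 2)))  # 333
```

Note that adding disks to a single RAID-Z group does not raise this number; only adding more groups does.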
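* Set performance goals: What exactly is "too slow"? What performance would be acceptable? How much performance are you getting now, and how much do you want?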
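Setting performance goals is important because they tell you when you're done. There are always ways to improve performance, but improving performance at any cost is pointless. Know when you're done, and then celebrate!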
#1: Add Enough RAM
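How much RAM do you need? As a rough rule of thumb, divide your total disk capacity by 1,000, then add 1 GB for the operating system. This means that for every 1 TB of data you'll need at least 1 GB of RAM for caching ZFS metadata, plus extra memory for the operating system and other applications.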
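The rule of thumb translates directly into a tiny helper. This is a sketch only: the 1 GB OS reserve comes from the rule above, and memory for other applications is deliberately left out and must be added on top.

```python
def min_ram_gb(total_disk_tb: float, os_reserve_gb: float = 1.0) -> float:
    """Rule of thumb: total disk capacity / 1000, plus a reserve for the OS.

    Working in GB: 1 TB = 1000 GB, so dividing by 1000 gives 1 GB per TB.
    Memory for other applications is NOT included.
    """
    total_disk_gb = total_disk_tb * 1000
    return total_disk_gb / 1000 + os_reserve_gb


print(min_ram_gb(12))  # 13.0
```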
#2: Add More RAM
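ZFS uses every piece of RAM it finds to cache data. It has very sophisticated caching algorithms: it tries to cache both the most recently used and the most frequently used data, adaptively balancing between the two based on how the data is actually used. ZFS also has advanced prefetching capabilities that can greatly improve sequential read performance for different kinds of data.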
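If you'd like a more automated way to assess your RAM needs, Ben Rockwood has written a great tool called arc_summary (ARC stands for the ZFS Adaptive Replacement Cache). Its two "Ghost" values tell you exactly how much more memory would noticeably improve your ZFS performance, based on your workload over a past period of time.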
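As a rough, hypothetical sketch of the idea behind those ghost values (not arc_summary's actual code), the Python below reads counters in the "name type data" layout used by the ZFS arcstats kstat (for example /proc/spl/kstat/zfs/arcstats on Linux; the sample values are made up). It reports the ARC hit ratio plus the share of misses that were ghost hits, i.e. misses on recently evicted blocks that a larger ARC might have served. It assumes ghost hits are counted as a subset of misses; verify that against your platform's counters.

```python
def parse_arcstats(text: str) -> dict[str, int]:
    """Parse 'name type data' lines into {name: value}, skipping headers."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def arc_report(stats: dict[str, int]) -> tuple[float, float]:
    """Return (ARC hit ratio, share of misses that were ghost hits)."""
    hits, misses = stats["hits"], stats["misses"]
    ghost_hits = stats["mru_ghost_hits"] + stats["mfu_ghost_hits"]
    hit_ratio = hits / (hits + misses)
    # A ghost hit is a miss on a block the ARC evicted recently:
    # a larger ARC could have turned it into a hit.
    ghost_share = ghost_hits / misses if misses else 0.0
    return hit_ratio, ghost_share


# Hypothetical counter values in the arcstats layout.
SAMPLE = """\
name                            type data
hits                            4    900000
misses                          4    100000
mru_ghost_hits                  4    15000
mfu_ghost_hits                  4    25000
"""

ratio, ghost = arc_report(parse_arcstats(SAMPLE))
print(f"hit ratio {ratio:.1%}, ghost hits {ghost:.1%} of misses")
# hit ratio 90.0%, ghost hits 40.0% of misses
```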
If you want to influence the balance between user data and metadata in the ZFS ARC cache, check out the primarycache filesystem property that you can set using the zfs(1M) command. For RAM-starved servers with a lot of random reads, it may make sense to restrict the precious RAM cache to metadata and use an L2ARC, explained in tip #4 below.
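As a minimal sketch, the Python below only builds and prints the zfs(1M) command that restricts a dataset's ARC caching to metadata; it does not run it. The dataset name "tank/data" is a hypothetical placeholder, and the allowed values (all, none, metadata) are those of the primarycache property.

```python
import shlex

# Values the primarycache property accepts.
PRIMARYCACHE_VALUES = {"all", "none", "metadata"}


def primarycache_cmd(dataset: str, value: str) -> list[str]:
    """Build (but do not run) a 'zfs set primarycache=...' command line."""
    if value not in PRIMARYCACHE_VALUES:
        raise ValueError(f"unsupported primarycache value: {value!r}")
    return ["zfs", "set", f"primarycache={value}", dataset]


# "tank/data" is a hypothetical dataset name.
print(shlex.join(primarycache_cmd("tank/data", "metadata")))
# zfs set primarycache=metadata tank/data
```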
#3: Add Even More RAM to Improve Deduplication Performance
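In an earlier article, I wrote about the basics of ZFS Deduplication. If you plan to use this feature, keep in mind that ZFS maintains a table containing the location and checksum of every block stored in the file system, so that it can determine whether a particular block has already been written and safely mark it as a duplicate.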
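Deduplication saves storage space, and because it avoids unnecessary read and write IOPS, it can also improve ZFS performance. The price, however, is that you need more RAM to hold the dedup table (ZFS dedup table); otherwise the extra, slow disk I/O will actually degrade file system performance.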
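So how big is the ZFS dedup table? In a recent article, Richard Elling points out that the dedup table has one entry per data block, and each entry uses about 250 bytes. Assuming an 8K block size, every 1 TB of user data needs about 32 GB of RAM to hold the table. If you mostly store large files, you'll have a larger average block size, say 64K, and then about 4 GB of RAM is enough to hold the entire dedup table.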
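Elling's arithmetic can be checked with a few lines of Python. This sketch assumes exactly 250 bytes per entry and binary units (TiB, GiB), which is why it prints 31.25 and 3.91 rather than the rounded 32 GB and 4 GB.

```python
TIB = 2**40
GIB = 2**30


def ddt_ram_bytes(user_data_bytes: int, avg_block_bytes: int,
                  entry_bytes: int = 250) -> int:
    """Approximate dedup-table size: one ~250-byte entry per block."""
    blocks = user_data_bytes // avg_block_bytes
    return blocks * entry_bytes


for block_kib in (8, 64):
    need = ddt_ram_bytes(TIB, block_kib * 1024)
    print(f"{block_kib}K blocks: {need / GIB:.2f} GiB of RAM per TiB")
# 8K blocks: 31.25 GiB of RAM per TiB
# 64K blocks: 3.91 GiB of RAM per TiB
```

The table size is inversely proportional to the average block size: eight times larger blocks mean one eighth of the entries.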
#4: Use SSDs to Improve Read Performance
You can add SSDs to a pool as cache (L2ARC) devices very easily with the zpool(1M) command; see the "Cache devices" section of its man page.
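As a minimal sketch, the Python below only builds and prints the zpool(1M) command line for adding cache devices rather than running it; the pool name "tank" and the Solaris-style device names are hypothetical placeholders for your own.

```python
import shlex


def add_cache_cmd(pool: str, devices: list[str]) -> list[str]:
    """Build (but do not run) 'zpool add <pool> cache <devices...>'."""
    if not devices:
        raise ValueError("at least one cache device is required")
    # No mirroring needed: L2ARC contents can always be re-read from disk.
    return ["zpool", "add", pool, "cache", *devices]


# "tank", "c1t2d0" and "c1t3d0" are hypothetical pool/device names.
print(shlex.join(add_cache_cmd("tank", ["c1t2d0", "c1t3d0"])))
# zpool add tank cache c1t2d0 c1t3d0
```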
SSDs can deliver two orders of magnitude better IOPS than traditional hard disks, and they're much cheaper on a per-GB basis than RAM.
They form an excellent layer of cache between the ZFS RAM-based ARC and the actual disk storage.
You don't need to observe any reliability requirements when configuring L2ARC devices: If they fail, no data is lost because it can always be retrieved from disk.
This means that L2ARC devices can be cheap, but before you start putting USB sticks into your server, you should make sure they deliver a good performance benefit over your rotating disks :).
SSDs come in various sizes: From drop-in-replacements for existing SATA disks in the range of 32GB to the Oracle Sun F20 PCI card with 96GB of flash and built-in SAS controllers (which is one of the secrets behind Oracle Exadata V2's breakthrough performance), to the mighty fast Oracle Sun F5100 flash array (which is the secret behind Oracle's current TPC-C and other world records) with a whopping 1.96TB of pure flash memory and over a million IOPS. Nice!
And since the dedup table is stored in the ZFS ARC and consequently spills off into the L2ARC if available, using SSDs as cache devices will also benefit deduplication performance.
#5: Use SSDs to Improve Write Performance
Most write performance problems are related to synchronous writes. These are mostly found in file servers and database servers.
With synchronous writes, ZFS needs to wait until each particular IO is written to stable storage, and if that's your disk, then it'll need to wait until the rotating rust has spun into the right place, the hard disk's arm moved to the right position, and finally, until the block has been written. This is mechanical, it's latency-bound, it's slow.
See Roch's excellent article on ZFS NFS performance for a more detailed discussion on this.
SSDs can change the whole game for synchronous writes because they have 100x better latency: No moving parts, no waiting, instant writes, instant performance.
So if you're suffering from a high load in synchronous writes, add SSDs as ZFS log devices (aka ZIL, Logzillas) and watch your synchronous writes fly. Check out the zpool(1M) man page under the "Intent Log" section for more details.
Make sure you mirror your ZIL devices: They are there to guarantee the POSIX requirement for "stable storage" so they must function reliably, otherwise data may be lost on power or system failure.
Also, make sure you use high quality SLC Flash Memory devices, because they can give you reliable write transactions. Cheaper MLC cells can damage existing data if the power fails during write operations, something you really don't want.
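Following the same idea, this sketch builds (without executing) a zpool command that adds a mirrored pair of SSDs as log devices. The pool and device names are hypothetical, and you should confirm the syntax against the "Intent Log" section of your zpool(1M) man page.

```python
import shlex


def add_mirrored_log_cmd(pool: str, dev_a: str, dev_b: str) -> list[str]:
    """Build (but do not run) 'zpool add <pool> log mirror <a> <b>'."""
    if dev_a == dev_b:
        raise ValueError("a mirrored log needs two different devices")
    # Mirror the ZIL: it must survive a device failure to keep sync-write guarantees.
    return ["zpool", "add", pool, "log", "mirror", dev_a, dev_b]


# Hypothetical pool and SSD device names.
print(shlex.join(add_mirrored_log_cmd("tank", "c2t0d0", "c2t1d0")))
# zpool add tank log mirror c2t0d0 c2t1d0
```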
#6: Use Mirroring
Many people configure their storage for maximum capacity. They just look at how many TB they can get out of their system. After all, storage is expensive, isn't it?
Wrong. Storage capacity is cheap. Every 18 months or so, the same disk only costs half as much, or you can buy double the capacity for the same price, depending on how you view it.
But storage performance can be precious. So why squeeze the last GB out of your storage if capacity is cheap anyway? Wouldn't it make more sense to trade in capacity for speed?
This is what mirroring disks offer as opposed to RAID-Z or RAID-Z2:
For a more detailed discussion on this, I highly recommend Richard Elling's post on ZFS RAID recommendations: Space, performance and MTTDL.
Also, there's some more discussion on this in my earlier RAID-GREED-article.
Bottom line: If you want performance, use mirroring.
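A rough model of this trade-off, assuming (as in the RAID-Z point near the top) that each RAID-Z vdev delivers about one disk's worth of random IOPS, while a mirror can serve random reads from any of its sides but must write to all of them. The 12 disks, 1 TB and 166 IOPS are hypothetical inputs; the results are illustrative estimates, not benchmarks.

```python
def mirror_pool(n_disks: int, disk_tb: float, disk_iops: float,
                ways: int = 2) -> dict[str, float]:
    vdevs = n_disks // ways
    return {
        "capacity_tb": vdevs * disk_tb,         # one disk's worth per mirror
        "read_iops": vdevs * ways * disk_iops,  # any side can serve a read
        "write_iops": vdevs * disk_iops,        # every side must be written
    }


def raidz_pool(n_disks: int, disk_tb: float, disk_iops: float,
               disks_per_vdev: int, parity: int = 1) -> dict[str, float]:
    vdevs = n_disks // disks_per_vdev
    return {
        "capacity_tb": vdevs * (disks_per_vdev - parity) * disk_tb,
        "read_iops": vdevs * disk_iops,   # each RAID-Z vdev ~ one disk
        "write_iops": vdevs * disk_iops,
    }


# Same hypothetical 12 disks (1 TB, ~166 random IOPS each) in three layouts.
layouts = {
    "6 x 2-way mirror": mirror_pool(12, 1, 166),
    "2 x 6-disk RAID-Z": raidz_pool(12, 1, 166, disks_per_vdev=6),
    "1 x 12-disk RAID-Z2": raidz_pool(12, 1, 166, disks_per_vdev=12, parity=2),
}
for name, est in layouts.items():
    print(f"{name}: {est}")
```

In this model the mirrored layout gives up 4 TB of capacity in exchange for several times the random IOPS.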
#7: Add More Disks
Our next tip was already buried inside tip #6: Add more disks. The more vdevs ZFS has to play with, the more shoulders it can place its load on and the faster your storage performance will become.
This works both for increasing IOPS and for increasing bandwidth, and it'll also add to your storage space, so there's nothing to lose by adding more disks to your pool.
But keep in mind that the performance benefit of adding more disks (and of using mirrors instead of RAID-Z(2)) only accelerates aggregate performance. The performance of every single I/O operation is still confined to that of a single disk's I/O performance.
So, adding more disks does not substitute for adding SSDs or RAM, but it'll certainly help aggregate IOPS and bandwidth for the cases where lots of concurrent IOPS and bigger overall bandwidth are needed.
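To make the aggregate-versus-single-I/O distinction concrete, here is a toy Python model, assuming each vdev delivers a fixed number of random IOPS (the 166 figure from the 10,000 rpm example) and ignoring queuing; it is an illustration, not a performance predictor.

```python
def pool_estimate(vdevs: int, disk_iops: float) -> tuple[float, float]:
    """Return (aggregate IOPS, service time of one I/O in ms).

    Aggregate IOPS scales with the number of vdevs, but each individual I/O
    is still served by one disk, so its service time does not shrink.
    """
    aggregate_iops = vdevs * disk_iops
    single_io_ms = 1000 / disk_iops
    return aggregate_iops, single_io_ms


for vdevs in (1, 4, 16):
    iops, ms = pool_estimate(vdevs, 166)
    print(f"{vdevs:>2} vdevs: ~{iops:.0f} IOPS total, ~{ms:.1f} ms per I/O")
```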
#8 Leave Enough Free Space
Don't wait until your pool is full before adding new disks, though.
ZFS uses copy-on-write, which means that it writes new data into free blocks, and the new state only becomes valid once the überblock has been updated.
This is great for performance because it gives ZFS the opportunity to turn random writes into sequential writes - by choosing the right blocks out of the list of free blocks so they're nicely in order and thus can be written to quickly.
That is, when there are enough blocks.
Because if you don't have enough free blocks in your pool, ZFS will be limited in its choice, and that means it won't be able to choose enough blocks that are in order, and hence it won't be able to create an optimal set of sequential writes, which will impact write performance.
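The effect of a fuller pool can be illustrated with a deliberately simplified toy model (this is not ZFS's real allocator, which works on metaslabs and space maps): scatter allocated blocks randomly across a disk and measure the longest contiguous run of free blocks still available for a sequential write.

```python
import random


def longest_free_run(free: list[bool]) -> int:
    """Length of the longest stretch of consecutive free blocks."""
    best = run = 0
    for is_free in free:
        run = run + 1 if is_free else 0
        best = max(best, run)
    return best


def simulate(fill_fraction: float, n_blocks: int = 100_000, seed: int = 42) -> int:
    """Randomly mark blocks as used and return the longest free run."""
    rng = random.Random(seed)
    free = [rng.random() >= fill_fraction for _ in range(n_blocks)]
    return longest_free_run(free)


for fill in (0.50, 0.80, 0.95):
    print(f"{fill:.0%} full -> longest free run: {simulate(fill)} blocks")
```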
As a rule of thumb, don't let your pool become more full than about 80% of its capacity. Once it reaches that point, you should start adding more disks so ZFS has enough free blocks to choose from in sequential write order.
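A trivial check for this rule of thumb, assuming you feed it the SIZE and ALLOC values that zpool list reports for a pool, converted to bytes (the numbers here are hypothetical):

```python
def check_pool_fill(size_bytes: int, alloc_bytes: int,
                    threshold: float = 0.80) -> tuple[float, bool]:
    """Return (fill fraction, whether it is time to add disks)."""
    if size_bytes <= 0:
        raise ValueError("pool size must be positive")
    fill = alloc_bytes / size_bytes
    return fill, fill >= threshold


TIB = 2**40
fill, add_disks = check_pool_fill(size_bytes=10 * TIB, alloc_bytes=int(8.5 * TIB))
print(f"{fill:.0%} full, add disks: {add_disks}")  # 85% full, add disks: True
```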
#9: Hire A ZFS Expert
There's a reason why this point comes up almost last: in the vast majority of ZFS performance cases, one or more of #1-#8 above are almost always the solution.
And they're cheaper than hiring a ZFS performance expert who will likely tell you to add more RAM, or add SSDs or switch from RAID-Z to mirroring after looking at your configuration for a couple of minutes anyway!
But sometimes, a performance problem can be really tricky. You may think it's a storage performance problem, but instead your application may be suffering from an entirely different effect.
Or maybe there are some complex dependencies going on, or some other unusual interaction between CPUs, memory, networking, I/O and storage.
Or perhaps you're hitting a bug or some other strange phenomenon?
So, if all else fails and none of the above options seem to help, contact your favorite Oracle/Sun representative (or send me a mail) and ask for a performance workshop quote.
If your performance problem is really that hard, we want to know about it.
#10: Be An Evil Tuner - But Know What You're Doing
If you don't want to go for option #9 and you know what you're doing, you can check out the ZFS Evil Tuning Guide.
There's a reason it's called "evil": ZFS is not supposed to be tuned. The default values are almost always the right values, and most of the time, changing them won't help, unless you really know what you're doing. So, handle with care.
Still, when people encounter a ZFS performance problem, they tend to Google "ZFS tuning", then they'll find the Evil Tuning Guide, then think that performance is just a matter of setting that magic variable in /etc/system.
This is simply not true.
Measuring performance in a standardized way, setting goals, then sticking to them helps. Adding RAM helps. Using SSDs helps. Thinking about the right number and RAID level of disks helps. Letting ZFS breathe helps.
But tuning kernel parameters is reserved for very special cases, and then you're probably much better off hiring an expert to help you do that correctly.
Bonus: Some Miscellaneous Settings
If you look through the zfs(1M) man page, you'll notice a few performance related properties you can set.
They're not general cures for all performance problems (otherwise they'd be set by default), but they can help in specific situations. Here are a few:
Your Turn
Sorry for the long article. I hope the table of contents at the beginning makes it more digestible, and I hope it's useful to you as a little checklist for ZFS performance planning and for dealing with ZFS performance problems.
Let me know if you want me to split up longer articles like these (though this one is really meant to remain together).
Now it's your turn: What is your experience with ZFS performance? What options from the above list did you implement for what kind of application/problem and what were your results? What helped and what didn't and what are your own ZFS performance secrets?
Share your ZFS performance expertise in the comments section and help others get the best performance out of ZFS!