我做爲SQL Server DBA工做了8年多,管理和優化服務器的性能。在個人空閒時間,我想爲宇宙和個人同事作一些有用的事情。這就是咱們最終爲SQL Server和Azure 提供免費索引維護工具的方法。git
每隔一段時間,人們在處理他們的優先事項時,可能就像一個手指式電池 - 一個激勵充電只持續一閃,而後一切都消失了。直到最近,我在這一輩子活觀察中也不例外。我常常被想法創造屬於我本身的想法所困擾,但優先級從一個變爲另外一個而且沒有完成任何事情。sql
DevArt開發用於開發和管理SQL Server,MySQL和Oracle數據庫的軟件,對個人動機和專業成長產生了很大的影響。數據庫
在他們來以前,我對建立本身的產品的具體細節知之甚少,但在此過程當中,我得到了不少關於SQL Server內部結構的知識。一年多以來,我一直致力於優化產品線中的查詢,逐漸開始瞭解市場上哪些功能比另外一種功能更受歡迎。緩存
在某個階段,製做一個新的利基產品的想法出如今我面前,但因爲某些狀況,這個想法沒有成功。那時,基本上我沒有爲公司內部的新項目找到足夠的資源而不影響核心業務。服務器
在一個嶄新的地方工做,並試圖本身建立一個項目讓我不斷妥協。製造一個擁有全部花裏胡哨的大產品的最初想法很快就會中止並逐漸轉變爲一個不一樣的方向 - 將計劃的功能分解爲單獨的迷你工具並相互獨立地實現它們。網絡
所以,SQL Index Manager誕生了,它是SQL Server和Azure的免費索引維護工具。主要想法是將RedGate和Devart公司的商業替代品做爲基礎,並嘗試在我本身的項目中改進其功能。函數
口頭上說,一切聽起來都很簡單......只需觀看幾個激勵視頻,打開「Rocky Balboa」模式,開始製做一款很酷的產品。但讓咱們面對音樂,一切都不那麼樂觀,由於在使用系統表函數時存在許多陷阱,sys.dm_db_index_physical_stats
同時,它是惟一能夠從中獲取有關索引碎片的最新信息的地方。工具
從開發的最初幾天起,就有很好的機會在標準方案中製造沉悶的方式,並複製已經調試過的競爭應用程序的邏輯,同時添加一些自組織。但在分析了元數據的查詢後,我想作一些更優化的事情,因爲大公司的官僚主義,它們永遠不會出如今他們的產品中。性能
在分析RedGate SQL索引管理器(v1.1.9.1378 - 每一個用戶155美圓)時,您能夠看到應用程序使用一種很是簡單的方法:使用第一個查詢,咱們得到用戶表和視圖的列表,而後第二個,咱們返回所選數據庫中全部索引的列表。優化
SELECT objects.name AS tableOrViewName , objects.object_id AS tableOrViewId , schemas.name AS schemaName , CAST(ISNULL(lobs.NumLobs, 0) AS BIT) AS ContainsLobs , o.is_memory_optimized FROM sys.objects AS objects JOIN sys.schemas AS schemas ON schemas.schema_id = objects.schema_id LEFT JOIN ( SELECT object_id , COUNT(*) AS NumLobs FROM sys.columns WITH (NOLOCK) WHERE system_type_id IN (34, 35, 99) OR max_length = -1 GROUP BY object_id ) AS lobs ON objects.object_id = lobs.object_id LEFT JOIN sys.tables AS o ON o.object_id = objects.object_id WHERE objects.type = 'U' OR objects.type = 'V' SELECT i.object_id AS tableOrViewId , i.name AS indexName , i.index_id AS indexId , i.allow_page_locks AS allowPageLocks , p.partition_number AS partitionNumber , CAST((c.numPartitions - 1) AS BIT) AS belongsToPartitionedIndex FROM sys.indexes AS i JOIN sys.partitions AS p ON p.index_id = i.index_id AND p.object_id = i.object_id JOIN ( SELECT COUNT(*) AS numPartitions , object_id , index_id FROM sys.partitions GROUP BY object_id , index_id ) AS c ON c.index_id = i.index_id AND c.object_id = i.object_id WHERE i.index_id > 0 -- ignore heaps AND i.is_disabled = 0 AND i.is_hypothetical = 0
接下來,在while
每一個索引分區的循環中,發送請求以肯定其大小和碎片級別。在掃描結束時,客戶端上會顯示重量小於進入閾值的索引。
EXEC sp_executesql N' SELECT index_id, avg_fragmentation_in_percent, page_count FROM sys.dm_db_index_physical_stats(@databaseId, @objectId, @indexId, @partitionNr, NULL)' , N'@databaseId int,@objectId int,@indexId int,@partitionNr int' , @databaseId = 7, @objectId = 2133582639, @indexId = 1, @partitionNr = 1 EXEC sp_executesql N' SELECT index_id, avg_fragmentation_in_percent, page_count FROM sys.dm_db_index_physical_stats(@databaseId, @objectId, @indexId, @partitionNr, NULL)' , N'@databaseId int,@objectId int,@indexId int,@partitionNr int' , @databaseId = 7, @objectId = 2133582639, @indexId = 2, @partitionNr = 1 EXEC sp_executesql N' SELECT index_id, avg_fragmentation_in_percent, page_count FROM sys.dm_db_index_physical_stats(@databaseId, @objectId, @indexId, @partitionNr, NULL)' , N'@databaseId int,@objectId int,@indexId int,@partitionNr int' , @databaseId = 7, @objectId = 2133582639, @indexId = 3, @partitionNr = 1
在分析此應用程序的邏輯時,您可能會發現各類缺點。例如,在發送請求以前,不會檢查當前分區是否包含任何行以從掃描中排除空分區。
可是問題在另外一個方面表現得更加尖銳 - 對服務器的請求數量大約等於來自的總行數sys.partitions
。鑑於真實數據庫能夠包含數萬個分區,這種細微差異可能致使對服務器的大量相似請求。在數據庫位於遠程服務器上的狀況下,因爲每一個請求的執行中的網絡延遲增長,掃描時間將更長,即便是最簡單的一個。
與RedGate不一樣,由DevArt開發的相似產品 - 用於SQL Server的dbForge索引管理器(v1.10.38 - 每用戶99美圓)在一個大型查詢中接收信息,而後在客戶端上顯示全部內容:
SELECT SCHEMA_NAME(o.[schema_id]) AS [schema_name] , o.name AS parent_name , o.[type] AS parent_type , i.name , i.type_desc , s.avg_fragmentation_in_percent , s.page_count , p.partition_number , p.[rows] , ISNULL(lob.is_lob_legacy, 0) AS is_lob_legacy , ISNULL(lob.is_lob, 0) AS is_lob , CASE WHEN ds.[type] = 'PS' THEN 1 ELSE 0 END AS is_partitioned FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) s JOIN sys.partitions p ON s.[object_id] = p.[object_id] AND s.index_id = p.index_id AND s.partition_number = p.partition_number JOIN sys.indexes i ON i.[object_id] = s.[object_id] AND i.index_id = s.index_id LEFT JOIN ( SELECT c.[object_id] , index_id = ISNULL(i.index_id, 1) , is_lob_legacy = MAX(CASE WHEN c.system_type_id IN (34, 35, 99) THEN 1 END) , is_lob = MAX(CASE WHEN c.max_length = -1 THEN 1 END) FROM sys.columns c LEFT JOIN sys.index_columns i ON c.[object_id] = i.[object_id] AND c.column_id = i.column_id AND i.index_id > 0 WHERE c.system_type_id IN (34, 35, 99) OR c.max_length = -1 GROUP BY c.[object_id], i.index_id ) lob ON lob.[object_id] = i.[object_id] AND lob.index_id = i.index_id JOIN sys.objects o ON o.[object_id] = i.[object_id] JOIN sys.data_spaces ds ON i.data_space_id = ds.data_space_id WHERE i.[type] IN (1, 2) AND i.is_disabled = 0 AND i.is_hypothetical = 0 AND s.index_level = 0 AND s.alloc_unit_type_desc = 'IN_ROW_DATA' AND o.[type] IN ('U', 'V')
消除了競爭產品中相似請求的面紗的主要問題,可是這種實現的缺點是沒有額外的參數傳遞給sys.dm_db_index_physical_stats
能夠限制對明顯沒必要要的索引的掃描的函數。實際上,這會致使獲取系統中全部索引的信息以及掃描階段沒必要要的磁盤負載。
值得一提的是,從中獲取的源碼數據sys.dm_db_index_physical_stats
並未永久緩存在緩衝池中,所以在獲取有關索引碎片的信息時最小化物理讀取是個人應用程序開發過程當中的優先任務之一。
通過屢次實驗,我設法將掃描分爲兩部分,將兩種方法結合起來。最初,一個大型請求經過過濾那些未包含在過濾範圍中的分區來預先肯定分區的大小:
INSERT INTO #AllocationUnits (ContainerID, ReservedPages, UsedPages) SELECT [container_id] , SUM([total_pages]) , SUM([used_pages]) FROM sys.allocation_units WITH(NOLOCK) GROUP BY [container_id] HAVING SUM([total_pages]) BETWEEN @MinIndexSize AND @MaxIndexSize
接下來,咱們只獲取包含數據的分區,以免從空索引中進行沒必要要的讀取。
SELECT [object_id]
, [index_id]
, [partition_id]
, [partition_number]
, [rows]
, [data_compression]
INTO #Partitions FROM sys.partitions WITH(NOLOCK) WHERE [object_id] > 255 AND [rows] > 0 AND [object_id] NOT IN (SELECT * FROM #ExcludeList)
根據設置,僅獲取用戶想要分析的索引類型(支持堆,羣集/非羣集索引和列存儲)。
INSERT INTO #Indexes SELECT ObjectID = i.[object_id] , IndexID = i.index_id , IndexName = i.[name] , PagesCount = a.ReservedPages , UnusedPagesCount = a.ReservedPages - a.UsedPages , PartitionNumber = p.[partition_number] , RowsCount = ISNULL(p.[rows], 0) , IndexType = i.[type] , IsAllowPageLocks = i.[allow_page_locks] , DataSpaceID = i.[data_space_id] , DataCompression = p.[data_compression] , IsUnique = i.[is_unique] , IsPK = i.[is_primary_key] , FillFactorValue = i.[fill_factor] , IsFiltered = i.[has_filter] FROM #AllocationUnits a JOIN #Partitions p ON a.ContainerID = p.[partition_id] JOIN sys.indexes i WITH(NOLOCK) ON i.[object_id] = p.[object_id] AND p.[index_id] = i.[index_id] WHERE i.[type] IN (0, 1, 2, 5, 6) AND i.[object_id] > 255
以後,咱們添加了一些魔法,而且......對於全部小的索引,咱們經過重複調用sys.dm_db_index_physical_stats
具備全部參數的完整指示的函數來肯定碎片的級別。
INSERT INTO #Fragmentation (ObjectID, IndexID, PartitionNumber, Fragmentation) SELECT i.ObjectID , i.IndexID , i.PartitionNumber , r.[avg_fragmentation_in_percent] FROM #Indexes i CROSS APPLY sys.dm_db_index_physical_stats_ (@DBID, i.ObjectID, i.IndexID, i.PartitionNumber, 'LIMITED') r WHERE i.PagesCount <= @PreDescribeSize AND r.[index_level] = 0 AND r.[alloc_unit_type_desc] = 'IN_ROW_DATA' AND i.IndexType IN (0, 1, 2)
接下來,咱們經過過濾掉額外的數據將全部可能的信息返回給客戶端:
SELECT i.ObjectID
, i.IndexID
, i.IndexName
, ObjectName = o.[name]
, SchemaName = s.[name]
, i.PagesCount
, i.UnusedPagesCount
, i.PartitionNumber
, i.RowsCount
, i.IndexType
, i.IsAllowPageLocks
, u.TotalWrites
, u.TotalReads
, u.TotalSeeks
, u.TotalScans
, u.TotalLookups
, u.LastUsage
, i.DataCompression
, f.Fragmentation
, IndexStats = STATS_DATE(i.ObjectID, i.IndexID) , IsLobLegacy = ISNULL(lob.IsLobLegacy, 0) , IsLob = ISNULL(lob.IsLob, 0) , IsSparse = CAST(CASE WHEN p.ObjectID IS NULL THEN 0 ELSE 1 END AS BIT) , IsPartitioned = CAST(CASE WHEN dds.[data_space_id] _ IS NOT NULL THEN 1 ELSE 0 END AS BIT) , FileGroupName = fg.[name] , i.IsUnique , i.IsPK , i.FillFactorValue , i.IsFiltered , a.IndexColumns , a.IncludedColumns FROM #Indexes i JOIN sys.objects o WITH(NOLOCK) ON o.[object_id] = i.ObjectID JOIN sys.schemas s WITH(NOLOCK) ON s.[schema_id] = o.[schema_id] LEFT JOIN #AggColumns a ON a.ObjectID = i.ObjectID AND a.IndexID = i.IndexID LEFT JOIN #Sparse p ON p.ObjectID = i.ObjectID LEFT JOIN #Fragmentation f ON f.ObjectID = i.ObjectID AND f.IndexID = i.IndexID AND f.PartitionNumber = i.PartitionNumber LEFT JOIN ( SELECT ObjectID = [object_id] , IndexID = [index_id] , TotalWrites = NULLIF([user_updates], 0) , TotalReads = NULLIF([user_seeks] + [user_scans] + [user_lookups], 0) , TotalSeeks = NULLIF([user_seeks], 0) , TotalScans = NULLIF([user_scans], 0) , TotalLookups = NULLIF([user_lookups], 0) , LastUsage = ( SELECT MAX(dt) FROM ( VALUES ([last_user_seek]) , ([last_user_scan]) , ([last_user_lookup]) , ([last_user_update]) ) t(dt) ) FROM sys.dm_db_index_usage_stats WITH(NOLOCK) WHERE [database_id] = @DBID ) u ON i.ObjectID = u.ObjectID AND i.IndexID = u.IndexID LEFT JOIN #Lob lob ON lob.ObjectID = i.ObjectID AND lob.IndexID = i.IndexID LEFT JOIN sys.destination_data_spaces dds WITH(NOLOCK) _ ON i.DataSpaceID = dds.[partition_scheme_id] AND i.PartitionNumber = dds.[destination_id] JOIN sys.filegroups fg WITH(NOLOCK) _ ON ISNULL(dds.[data_space_id], i.DataSpaceID) = fg.[data_space_id] WHERE o.[type] IN ('V', 'U') AND ( f.Fragmentation >= @Fragmentation OR i.PagesCount > @PreDescribeSize OR i.IndexType IN (5, 6) )
以後,點請求肯定大型索引的碎片級別。
EXEC sp_executesql N' DECLARE @DBID INT = DB_ID() SELECT [avg_fragmentation_in_percent] FROM sys.dm_db_index_physical_stats(@DBID, @ObjectID, @IndexID, @PartitionNumber, ''LIMITED'') WHERE [index_level] = 0 AND [alloc_unit_type_desc] = ''IN_ROW_DATA''' , N'@ObjectID int,@IndexID int,@PartitionNumber int' , @ObjectId = 1044198770, @IndexId = 1, @PartitionNumber = 1 EXEC sp_executesql N' DECLARE @DBID INT = DB_ID() SELECT [avg_fragmentation_in_percent] FROM sys.dm_db_index_physical_stats(@DBID, @ObjectID, @IndexID, @PartitionNumber, ''LIMITED'') WHERE [index_level] = 0 AND [alloc_unit_type_desc] = ''IN_ROW_DATA''' , N'@ObjectID int,@IndexID int,@PartitionNumber int' , @ObjectId = 1552724584, @IndexId = 0, @PartitionNumber = 1
因爲這種方法,在生成請求時,我設法解決了競爭對手應用程序中遇到的掃描性能問題。這多是它的終結,但在開發過程當中,逐漸出現了各類新的想法,這使得擴大個人產品的應用範圍成爲可能。
最初,實現了對使用的支持WAIT_AT_LOW_PRIORITY
,而後可使用DATA_COMPRESSION
和FILL_FACTOR
重建索引。
該應用程序已被「撒上」之前未計劃的功能,如維護列存儲:
SELECT *
FROM ( SELECT IndexID = [index_id] , PartitionNumber = [partition_number] , PagesCount = SUM([size_in_bytes]) / 8192 , UnusedPagesCount = ISNULL(SUM(CASE WHEN [state] = 1 _ THEN [size_in_bytes] END), 0) / 8192 , Fragmentation = CAST(ISNULL(SUM(CASE WHEN [state] = 1 _ THEN [size_in_bytes] END), 0) * 100. / SUM([size_in_bytes]) AS FLOAT) FROM sys.fn_column_store_row_groups(@ObjectID) GROUP BY [index_id] , [partition_number] ) t WHERE Fragmentation >= @Fragmentation AND PagesCount BETWEEN @MinIndexSize AND @MaxIndexSize
或者根據如下信息建立非聚簇索引的能力dm_db_missing_index
:
SELECT ObjectID = d.[object_id]
, UserImpact = gs.[avg_user_impact]
, TotalReads = gs.[user_seeks] + gs.[user_scans]
, TotalSeeks = gs.[user_seeks]
, TotalScans = gs.[user_scans]
, LastUsage = ISNULL(gs.[last_user_scan], gs.[last_user_seek])
, IndexColumns =
CASE WHEN d.[equality_columns] IS NOT NULL _AND d.[inequality_columns] IS NOT NULL THEN d.[equality_columns] + ', ' + d.[inequality_columns] WHEN d.[equality_columns] IS NOT NULL AND d.[inequality_columns] IS NULL THEN d.[equality_columns] ELSE d.[inequality_columns] END , IncludedColumns = d.[included_columns] FROM sys.dm_db_missing_index_groups g WITH(NOLOCK) JOIN sys.dm_db_missing_index_group_stats gs WITH(NOLOCK) _ ON gs.[group_handle] = g.[index_group_handle] JOIN sys.dm_db_missing_index_details d WITH(NOLOCK) _ ON g.[index_handle] = d.[index_handle] WHERE d.[database_id] = DB_ID()
關鍵的是,開發計劃並無就此結束,由於我渴望進一步開發這個應用程序網站源碼。下一步是添加查找重複(已完成)或未使用索引的功能,以及實現對在SQL Server中維護統計信息的徹底支持。
如今市場上有不少付費解決方案。我想相信,因爲自由定位,更優化的查詢以及各類有用的gismos的可用性,這個產品確定會在平常任務中變得有用。