You may have wondered why VirtualAlloc allocates memory at 64K boundaries even though
page granularity is 4K.
You have the Alpha AXP processor to thank for that.
On the Alpha AXP, there is no "load 32-bit integer" instruction. To load a 32-bit
integer, you actually load two 16-bit integers and combine them.
So if allocation granularity were finer than 64K, a DLL that got relocated in memory
would require two fixups per relocatable address: one to the upper 16 bits and one
to the lower 16 bits. And things get worse if this changes a carry or borrow between
the two halves. (For example, moving an address 4K from 0x1234F000 to 0x12350000
forces both the low and high parts of the address to change. Even though the
amount of motion was far less than 64K, it still had an impact on the high part due
to the carry.)
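The two-fixup problem in the example above can be seen by splitting the addresses into their 16-bit words. A small Python sketch (the helper name is mine, and it uses the plain unsigned split; the signed encoding the Alpha actually uses is described next):

```python
def halves(addr):
    """Upper and lower 16-bit words of a 32-bit address (plain unsigned
    split, ignoring for the moment the signed encoding described below)."""
    return addr >> 16, addr & 0xFFFF

before = halves(0x1234F000)            # (0x1234, 0xF000)
after  = halves(0x1234F000 + 0x1000)   # (0x1235, 0x0000): a 4K move, yet both halves change
```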
But wait, there’s more.
The Alpha AXP actually combines two signed 16-bit integers to form a 32-bit
integer. For example, to load the value 0x1234ABCD, you would first use the LDAH instruction
to load the value 0x1235 into the high word of the destination register. Then you
would use the LDA instruction to add the signed value -0x5433. (Since 0x5433 = 0x10000
– 0xABCD.) The result is then the desired value of 0x1234ABCD.
    LDAH t1, 0x1235(zero)   // t1 = 0x12350000
    LDA  t1, -0x5433(t1)    // t1 = t1 - 0x5433 = 0x1234ABCD
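The displacement pair that such an LDAH/LDA sequence encodes can be computed as follows (a sketch; the function name is my own):

```python
def ldah_lda_pair(value):
    """Displacements for the LDAH/LDA pair that materialize a 32-bit value.
    The low half is a signed 16-bit number, so the high half must absorb
    a carry whenever the low word of the value is 0x8000 or greater."""
    high = (value + 0x8000) >> 16       # round so the remainder fits in signed 16 bits
    low = value - (high << 16)          # in [-0x8000, 0x7FFF]
    assert (high << 16) + low == value
    return high, low

pair = ldah_lda_pair(0x1234ABCD)        # (0x1235, -0x5433), as in the listing above
```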
So if a relocation caused an address to move between the "lower half" of a 64K block
and the "upper half", additional fixing-up would have to be done to ensure that the
arithmetic for the top half of the address was adjusted properly. Since compilers
like to reorder instructions, that LDAH instruction could be far, far away, so the
relocation record for the bottom half would have to have some way of finding the matching
top half.
What’s more, the compiler is clever and if it needs to compute addresses for two variables
that are in the same 64K region, it shares the LDAH instruction between them. If it
were possible to relocate by a value that wasn’t a multiple of 64K, then the compiler
would no longer be able to do this optimization since it’s possible that after the
relocation, the two variables no longer belonged to the same 64K block.
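The effect of relocation on this shared-LDAH optimization can be sketched like so (the helper name is mine, and the addresses are made-up examples):

```python
def high_half(addr):
    """The LDAH half of an address; two addresses with the same high half
    can share a single LDAH instruction."""
    return (addr + 0x8000) >> 16

a, b = 0x12340100, 0x12347F00           # two variables in one 64K block
shared = high_half(a) == high_half(b)   # True: one LDAH serves both

# Relocate by 4K, which is not a multiple of 64K:
broken = high_half(a + 0x1000) == high_half(b + 0x1000)   # False: the LDAH can no longer be shared

# Relocate by 64K, and the sharing survives:
still_ok = high_half(a + 0x10000) == high_half(b + 0x10000)   # True
```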
Forcing memory allocations at 64K granularity solves all these problems.
If you have been paying really close attention, you’d have seen that this also explains
why there is a 64K "no man's land" near the 2GB boundary. Consider the method for
computing the value 0x7FFFABCD: Since the lower 16 bits are in the upper half of the
64K range, the value needs to be computed by subtraction rather than addition. The
naïve solution would be to use
    LDAH t1, 0x8000(zero)   // t1 = 0x80000000, right?
    LDA  t1, -0x5433(t1)    // t1 = t1 - 0x5433 = 0x7FFFABCD, right?
Except that this doesn’t work. The Alpha AXP is a 64-bit processor, and 0x8000 does
not fit in a 16-bit signed integer, so you have to use -0x8000, a negative number.
What actually happens is
    LDAH t1, -0x8000(zero)  // t1 = 0xFFFFFFFF`80000000
    LDA  t1, -0x5433(t1)    // t1 = t1 - 0x5433 = 0xFFFFFFFF`7FFFABCD
You need to add a third instruction to clear the high 32 bits. The clever trick for
this is to add zero and tell the processor to treat the result as a 32-bit integer
and sign-extend it to 64 bits.
    ADDL t1, zero, t1       // t1 = t1 + 0, with L suffix
                            // L suffix means sign extend result from 32 bits to 64
                            // t1 = 0x00000000`7FFFABCD
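The three-instruction sequence can be simulated in a few lines of Python, modeling 64-bit registers and sign-extension (the helper `sx` is mine):

```python
MASK64 = (1 << 64) - 1

def sx(v, bits):
    """Sign-extend the low `bits` bits of v to a Python int."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >= (1 << (bits - 1)) else v

# LDAH t1, -0x8000(zero): shift the signed 16-bit displacement left 16
t1 = (sx(-0x8000, 16) << 16) & MASK64   # 0xFFFFFFFF_80000000
# LDA t1, -0x5433(t1): add the signed 16-bit displacement
t1 = (t1 + sx(-0x5433, 16)) & MASK64    # 0xFFFFFFFF_7FFFABCD
# ADDL t1, zero, t1: 32-bit add of zero, result sign-extended back to 64 bits
t1 = sx(t1, 32) & MASK64                # 0x00000000_7FFFABCD
```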
If addresses within 64K of the 2GB boundary were permitted, then every memory address
computation would have to insert that third ADDL instruction just in case the address
got relocated to the "danger zone" near the 2GB boundary.
This was an awfully high price to pay to get access to that last 64K of address space
(a 50% performance penalty for all address computations to protect against a case
that in practice would never happen), so roping off that area as permanently invalid
was a more prudent choice.