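Supplying the CPU with a sufficient and steady stream of instructions and data is one of the perennial topics in computer architecture. To supply instructions, we design branch prediction; to supply data, we design caches. In fact, both instructions and data have to be fetched from memory, so from this angle the cache plays the more important role.
In this section we analyze the implementation of the OR1200 cache.
As always, to study something we first need to understand where it comes from and why it exists, and the cache is no exception.
Caches were introduced to solve the memory wall problem. CPU clock frequencies and processing power keep rising, and although memory systems have improved too, the gap between memory and the CPU keeps growing. The result is like trying to pour dumplings out of a teapot (a Chinese saying: plenty inside, but it cannot come out fast enough), and this is the so-called memory wall. The cache exists precisely to address this problem.
Regarding caches, we first need to know about mapping schemes, write policies, replacement policies, cache optimization techniques and so on. These were all covered earlier and will not be repeated here; if anything is unclear, see: http://blog.csdn.net/rill_zhen/article/details/9491095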
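Before analyzing the OR1200 cache implementation in detail, it is worth understanding how a cache works in general. To show the process clearly, I use an example that extends the one from the MMU analysis.
In the OR1200 MMU analysis, we assumed an example in which the MMU translated the virtual address of the variable test (0x2008) into the physical address 0x1006008.
Address translation is only the first step of a CPU memory access. Once the translation is done, the physical address is not used to access the external SDRAM directly; instead, the MMU passes it to the cache. On a cache hit, the cache acknowledges the CPU directly; only on a cache miss does it access the next-level cache or the external SDRAM.
That is the overall flow, but what exactly happens inside the cache?
How does it proceed once the physical address of test is known? Let us illustrate with a direct-mapped cache of 8 KB with 512 lines, each 16 bytes wide, as shown in the figure below: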
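This figure shows the working details clearly.
Notes:
a. The cache is direct mapped.
b. The total capacity is 8 KB, exactly one memory page.
c. The cache has 512 cache lines, also called cache entries.
d. Each cache line holds 16 bytes of data.
e. Because the cache is direct mapped, no replacement algorithm is needed: each address can live in only one line, so whichever line misses is the one replaced.
f. Write policy: write-through or write-back.
g. Since a cache is generally transparent to the software programming model, it rarely needs to interact with software and only requires basic control, such as locking a particular way, flushing the cache, or, if LRU replacement is used, keeping the LRU values up to date. This is very different from the MMU, which requires a great deal of software intervention and control.
h. A brief description of how it works:
First, the cache splits the index field of the address using modulo (%) arithmetic; the modulus depends on the number of cache lines and the number of bytes per line. Because 512 lines × 16 B = 8 KB, exactly one page, the index field is the 13-bit page offset, which is identical in the virtual and the physical address. With 512 lines of 16 B, it is split into a cache line index (9 bits, selecting the line) and an in-line offset (4 bits, selecting the byte within the line).
The cache uses the line index to select one line and checks that line's valid flag. If the line is valid, it compares the line's tag with the PPN produced by the MMU (the comparison is needed because one cache line can correspond to many memory addresses). If the tag matches the PPN, it is a cache hit; if either check fails, it is a cache miss, and the cache refills the line with a burst access (in this example 4 beats of 4 B each, exactly 16 B).
i. Cache operations
Flush: the cache simply clears the valid flag (in a write-back cache, a dirty line must be written back to memory first).
Lock: suppose a program runs for a long time. To keep other programs from evicting its cache lines when they miss, the lines it uses can be locked. Locking can be done per way or per line (the cache is divided into several groups, each containing several lines; such a group is called a way).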
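The lookup described in item h can be sketched as a small Python model. It is a sketch under the example's assumptions (32-bit physical addresses, 8 KB direct mapped, 512 lines of 16 bytes, 4-beat refill bursts); it tracks only valid bits and tags, and the names split_address and DirectMappedCache are hypothetical, not taken from OR1200.

```python
LINE_BYTES = 16                 # bytes per line (item d)
NUM_LINES = 512                 # 512 lines * 16 B = 8 KB (items b, c)
OFFSET_BITS = 4                 # log2(LINE_BYTES)
INDEX_BITS = 9                  # log2(NUM_LINES)


def split_address(paddr):
    """Split a 32-bit address into (tag, line index, byte offset)."""
    offset = paddr & (LINE_BYTES - 1)                 # which byte in the line
    index = (paddr >> OFFSET_BITS) & (NUM_LINES - 1)  # which line
    tag = paddr >> (OFFSET_BITS + INDEX_BITS)         # upper 19 bits, i.e. the PPN
    return tag, index, offset


class DirectMappedCache:
    """Tags and valid bits only; the data array is left out of the model."""

    def __init__(self):
        self.valid = [False] * NUM_LINES
        self.tags = [0] * NUM_LINES
        self.bus_beats = 0      # 4-byte transfers issued to refill lines

    def access(self, paddr):
        """Return True on a hit; on a miss refill the line and return False."""
        tag, index, _ = split_address(paddr)
        if self.valid[index] and self.tags[index] == tag:
            return True
        # Miss (invalid line or tag mismatch): burst-read the whole line,
        # LINE_BYTES // 4 = 4 beats of 4 bytes, replacing whatever was there.
        self.bus_beats += LINE_BYTES // 4
        self.valid[index] = True
        self.tags[index] = tag
        return False

    def invalidate(self, paddr):
        """Flush as in item i: clear the valid bit of the addressed line."""
        _, index, _ = split_address(paddr)
        self.valid[index] = False


cache = DirectMappedCache()
print(split_address(0x1006008))   # (2051, 0, 8): tag 0x803, line 0, byte 8
print(split_address(0x2008))      # (1, 0, 8): virtual address, same index and offset
print(cache.access(0x1006008))    # False: cold miss
print(cache.access(0x100600C))    # True: same 16-byte line
print(cache.access(0x1008008))    # False: same index, different tag (conflict)
```

The second call shows why the index can be taken before translation finishes: the virtual and physical addresses of test share the same line index and offset, and only the tag comparison needs the PPN.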
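The above describes a direct-mapped cache. Caches using the other two mapping schemes work in broadly the same way; they differ in how a line is located and in the replacement policy (the write policy, write-through or write-back, is chosen independently of the mapping scheme).
The working mechanism of a fully associative cache is shown in the figure below:
A set-associative cache lies between direct mapping and full associativity, so we will not go into it further.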
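To make the difference between the mapping schemes concrete, the sketch below keeps the same 8 KB / 16-byte-line geometry and computes how many index and tag bits an N-way set-associative organization would need; 1 way is direct mapped and 512 ways is fully associative. The helper names are hypothetical and 32-bit addresses are assumed.

```python
CACHE_BYTES = 8 * 1024
LINE_BYTES = 16
ADDR_BITS = 32


def log2(n):
    """Exact log2 for a power of two."""
    assert n > 0 and n & (n - 1) == 0, "expects a power of two"
    return n.bit_length() - 1


def geometry(ways):
    """(sets, index bits, tag bits) of an N-way cache with the geometry above."""
    lines = CACHE_BYTES // LINE_BYTES     # 512 lines whatever the mapping
    sets = lines // ways                  # a lookup searches one set of `ways` lines
    offset_bits = log2(LINE_BYTES)
    index_bits = log2(sets)
    return sets, index_bits, ADDR_BITS - index_bits - offset_bits


for ways in (1, 2, 4, 512):               # 1 = direct mapped, 512 = fully associative
    sets, index_bits, tag_bits = geometry(ways)
    print(f"{ways}-way: {sets} sets, {index_bits} index bits, "
          f"{tag_bits} tag bits, {ways} tag comparison(s) per lookup")
```

As the number of ways grows, index bits move into the tag and each lookup has to compare more tags in parallel; that is the hardware cost paid for fewer conflict misses.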
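With the general mechanism understood, the OR1200 cache implementation is relatively easy to analyze. A cache only holds a copy of a subset of memory and has no address space of its own for software to program, so it interacts little with software, which makes the analysis simpler still.
The OR1200 cache is direct mapped, 8 KB in size, with 512 entries; each line holds 16 bytes and consists of a 1-bit flag (valid), a 19-bit tag and 16×8 bits of data.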
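The 19-bit tag follows from the geometry: 32 address bits minus 9 index bits and 4 offset bits. The sketch below packs one tag-RAM word and totals the storage. Placing the valid flag above the tag is an assumed layout for illustration, not necessarily how or1200_dc_tag.v arranges its bits, and the function names are hypothetical.

```python
NUM_LINES = 512
LINE_BYTES = 16
TAG_BITS = 32 - 9 - 4        # address bits minus index and offset bits = 19


def pack_tag_entry(valid, tag):
    """One tag-RAM word: valid flag placed above the 19-bit tag (assumed layout)."""
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag does not fit in 19 bits")
    return (int(valid) << TAG_BITS) | tag


def unpack_tag_entry(word):
    """Inverse of pack_tag_entry: return (valid, tag)."""
    return bool(word >> TAG_BITS), word & ((1 << TAG_BITS) - 1)


tag_ram_bits = NUM_LINES * (1 + TAG_BITS)      # 512 * 20 bits
data_ram_bits = NUM_LINES * LINE_BYTES * 8     # 512 * 128 bits = 8 KB
print(f"tag RAM {tag_ram_bits} bits, data RAM {data_ram_bits} bits, "
      f"overhead {tag_ram_bits / data_ram_bits:.1%}")
print(hex(pack_tag_entry(True, 0x803)))        # 0x80803: valid line for test's page
```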
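We have already described this kind of cache in detail above, and the OR1200 cache is no exception.
In OR1200, the qmem module forms the first-level cache, the dcache/icache the second level, and the sb (store buffer) module a third level for data.
Below is a block diagram of the memory system of the whole ORDB2A development board, which shows clearly the data paths of the system's memory subsystem.
The qmem module is the first-level cache. or1200_defines.v describes qmem as follows, which tells us its purpose, its significance, its capacity and so on.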
/////////////////////////////////////////////////
//
// Quick Embedded Memory (QMEM)
//

//
// Quick Embedded Memory
//
// Instantiation of dedicated insn/data memory (RAM or ROM).
// Insn fetch has effective throughput 1insn / clock cycle.
// Data load takes two clock cycles / access, data store
// takes 1 clock cycle / access (if there is no insn fetch)).
// Memory instantiation is shared between insn and data,
// meaning if insn fetch are performed, data load/store
// performance will be lower.
//
// Main reason for QMEM is to put some time critical functions
// into this memory and to have predictable and fast access
// to these functions. (soft fpu, context switch, exception
// handlers, stack, etc)
//
// It makes design a bit bigger and slower. QMEM sits behind
// IMMU/DMMU so all addresses are physical (so the MMUs can be
// used with QMEM and QMEM is seen by the CPU just like any other
// memory in the system). IC/DC are sitting behind QMEM so the
// whole design timing might be worse with QMEM implemented.
//
//`define OR1200_QMEM_IMPLEMENTED

//
// Base address and mask of QMEM
//
// Base address defines first address of QMEM. Mask defines
// QMEM range in address space. Actual size of QMEM is however
// determined with instantiated RAM/ROM. However bigger
// mask will reserve more address space for QMEM, but also
// make design faster, while more tight mask will take
// less address space but also make design slower. If
// instantiated RAM/ROM is smaller than space reserved with
// the mask, instatiated RAM/ROM will also be shadowed
// at higher addresses in reserved space.
//
`define OR1200_QMEM_IADDR  32'h0080_0000
`define OR1200_QMEM_IMASK  32'hfff0_0000 // Max QMEM size 1MB
`define OR1200_QMEM_DADDR  32'h0080_0000
`define OR1200_QMEM_DMASK  32'hfff0_0000 // Max QMEM size 1MB

//
// QMEM interface byte-select capability
//
// To enable qmem_sel* ports, define this macro.
//
//`define OR1200_QMEM_BSEL

//
// QMEM interface acknowledge
//
// To enable qmem_ack port, define this macro.
//
//`define OR1200_QMEM_ACK
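The comments describe the base/mask pair only in words. One reading, sketched below in Python, is a masked compare: an address belongs to QMEM when the bits selected by the mask equal the base. The function names are hypothetical and the hit logic of or1200_qmem_top.v is not reproduced in this article, so treat this as an interpretation of the comments rather than the RTL. The second helper illustrates the "shadowed at higher addresses" remark, assuming a power-of-two RAM that only sees the low address bits.

```python
QMEM_DADDR = 0x0080_0000
QMEM_DMASK = 0xFFF0_0000    # low 20 bits free, so the window is at most 1 MB


def qmem_hit(addr, base=QMEM_DADDR, mask=QMEM_DMASK):
    """True if addr falls in the window reserved for QMEM (masked compare)."""
    return (addr & mask) == (base & mask)


def qmem_ram_offset(addr, ram_bytes):
    """Offset seen by a power-of-two RAM smaller than the window: only the low
    address bits reach it, so the RAM repeats (is shadowed) across the window."""
    return addr & (ram_bytes - 1)


for a in (0x0080_0000, 0x008F_FFFC, 0x0090_0000, 0x0100_6008):
    print(hex(a), qmem_hit(a))
print(hex(qmem_ram_offset(0x0080_2004, 8 * 1024)))  # 0x4: aliases the first 8 KB
```

A wider mask leaves fewer bits to compare, which fits the comment that a bigger mask makes the design faster.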
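The qmem module consists of a single RTL file, or1200_qmem_top.v. Analyzing code is not as simple as copying it and adding a few comments. To stay focused, now that we know what qmem does in general, we look at its core, the FSM: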
`define OR1200_QMEMFSM_IDLE   3'd0
`define OR1200_QMEMFSM_STORE  3'd1
`define OR1200_QMEMFSM_LOAD   3'd2
`define OR1200_QMEMFSM_FETCH  3'd3

//
// QMEM control FSM
//
always @(`OR1200_RST_EVENT rst or posedge clk)
  if (rst == `OR1200_RST_VALUE) begin
    state <= `OR1200_QMEMFSM_IDLE;
    qmem_dack <= 1'b0;
    qmem_iack <= 1'b0;
  end
  else case (state) // synopsys parallel_case
    `OR1200_QMEMFSM_IDLE: begin
      if (qmemdmmu_cycstb_i & daddr_qmem_hit & qmemdcpu_we_i & qmem_ack) begin
        state <= `OR1200_QMEMFSM_STORE;
        qmem_dack <= 1'b1;
        qmem_iack <= 1'b0;
      end
      else if (qmemdmmu_cycstb_i & daddr_qmem_hit & qmem_ack) begin
        state <= `OR1200_QMEMFSM_LOAD;
        qmem_dack <= 1'b1;
        qmem_iack <= 1'b0;
      end
      else if (qmemimmu_cycstb_i & iaddr_qmem_hit & qmem_ack) begin
        state <= `OR1200_QMEMFSM_FETCH;
        qmem_iack <= 1'b1;
        qmem_dack <= 1'b0;
      end
    end
    `OR1200_QMEMFSM_STORE: begin
      if (qmemdmmu_cycstb_i & daddr_qmem_hit & qmemdcpu_we_i & qmem_ack) begin
        state <= `OR1200_QMEMFSM_STORE;
        qmem_dack <= 1'b1;
        qmem_iack <= 1'b0;
      end
      else if (qmemdmmu_cycstb_i & daddr_qmem_hit & qmem_ack) begin
        state <= `OR1200_QMEMFSM_LOAD;
        qmem_dack <= 1'b1;
        qmem_iack <= 1'b0;
      end
      else if (qmemimmu_cycstb_i & iaddr_qmem_hit & qmem_ack) begin
        state <= `OR1200_QMEMFSM_FETCH;
        qmem_iack <= 1'b1;
        qmem_dack <= 1'b0;
      end
      else begin
        state <= `OR1200_QMEMFSM_IDLE;
        qmem_dack <= 1'b0;
        qmem_iack <= 1'b0;
      end
    end
    `OR1200_QMEMFSM_LOAD: begin
      if (qmemdmmu_cycstb_i & daddr_qmem_hit & qmemdcpu_we_i & qmem_ack) begin
        state <= `OR1200_QMEMFSM_STORE;
        qmem_dack <= 1'b1;
        qmem_iack <= 1'b0;
      end
      else if (qmemdmmu_cycstb_i & daddr_qmem_hit & qmem_ack) begin
        state <= `OR1200_QMEMFSM_LOAD;
        qmem_dack <= 1'b1;
        qmem_iack <= 1'b0;
      end
      else if (qmemimmu_cycstb_i & iaddr_qmem_hit & qmem_ack) begin
        state <= `OR1200_QMEMFSM_FETCH;
        qmem_iack <= 1'b1;
        qmem_dack <= 1'b0;
      end
      else begin
        state <= `OR1200_QMEMFSM_IDLE;
        qmem_dack <= 1'b0;
        qmem_iack <= 1'b0;
      end
    end
    `OR1200_QMEMFSM_FETCH: begin
      if (qmemdmmu_cycstb_i & daddr_qmem_hit & qmemdcpu_we_i & qmem_ack) begin
        state <= `OR1200_QMEMFSM_STORE;
        qmem_dack <= 1'b1;
        qmem_iack <= 1'b0;
      end
      else if (qmemdmmu_cycstb_i & daddr_qmem_hit & qmem_ack) begin
        state <= `OR1200_QMEMFSM_LOAD;
        qmem_dack <= 1'b1;
        qmem_iack <= 1'b0;
      end
      else if (qmemimmu_cycstb_i & iaddr_qmem_hit & qmem_ack) begin
        state <= `OR1200_QMEMFSM_FETCH;
        qmem_iack <= 1'b1;
        qmem_dack <= 1'b0;
      end
      else begin
        state <= `OR1200_QMEMFSM_IDLE;
        qmem_dack <= 1'b0;
        qmem_iack <= 1'b0;
      end
    end
    default: begin
      state <= `OR1200_QMEMFSM_IDLE;
      qmem_dack <= 1'b0;
      qmem_iack <= 1'b0;
    end
  endcase
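Analysis:
As the code shows, qmem has four states: IDLE, STORE, LOAD and FETCH. For easier reading I drew qmem's state diagram below, with the states and transition conditions; it is self-explanatory.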
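The transition conditions can be condensed into one next-state function. In this Python sketch, the hypothetical parameters stand for the listing's signals: dcycstb = qmemdmmu_cycstb_i, dhit = daddr_qmem_hit, we = qmemdcpu_we_i, icycstb = qmemimmu_cycstb_i, ihit = iaddr_qmem_hit, ack = qmem_ack.

```python
IDLE, STORE, LOAD, FETCH = range(4)


def qmem_next(dcycstb, dhit, we, icycstb, ihit, ack=True):
    """Next (state, qmem_dack, qmem_iack) of the QMEM control FSM.

    Every state in the listing tests the same three conditions in the same
    order, so the current state does not matter: a data store wins over a
    data load, which wins over an instruction fetch. Only IDLE lacks the
    final else, but its acks are already 0, so the result is the same.
    """
    if dcycstb and dhit and we and ack:
        return STORE, 1, 0
    if dcycstb and dhit and ack:
        return LOAD, 1, 0
    if icycstb and ihit and ack:
        return FETCH, 0, 1
    return IDLE, 0, 0


# A data load and an instruction fetch in the same cycle: the load is served
# first and the fetch has to wait for a later cycle.
print(qmem_next(dcycstb=1, dhit=1, we=0, icycstb=1, ihit=1))  # (2, 1, 0), i.e. LOAD
```

This priority is what the defines comment means by data accesses lowering fetch throughput when instruction and data share one QMEM.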
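The data cache and the instruction cache work alike, so only the data cache is analyzed here.
The data cache holds a subset of external memory, and its role is that of a cache in the usual sense.
Only a few points are noted here:
a. Prefetch: while the cache is idle, some data from memory can be loaded into it in advance to lower the miss rate.
b. Invalidation control: if certain cache lines have special requirements, software can mark them invalid.
c. Locking: described at the beginning of this section.
The dcache consists of four files: or1200_dc_top.v, or1200_dc_fsm.v, or1200_dc_tag.v and or1200_dc_ram.v. Only its core, the FSM in or1200_dc_fsm.v, is covered here: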
`define OR1200_DCFSM_IDLE        3'd0
`define OR1200_DCFSM_CLOADSTORE  3'd1
`define OR1200_DCFSM_LOOP2       3'd2
`define OR1200_DCFSM_LOOP3       3'd3
`define OR1200_DCFSM_LOOP4       3'd4
`define OR1200_DCFSM_FLUSH5      3'd5
`define OR1200_DCFSM_INV6        3'd6 //invalidate
`define OR1200_DCFSM_WAITSPRCS7  3'd7

//
// Main DC FSM
//
always @(posedge clk or `OR1200_RST_EVENT rst) begin
  if (rst == `OR1200_RST_VALUE) begin
    state <= `OR1200_DCFSM_IDLE;
    addr_r <= 32'd0;
    hitmiss_eval <= 1'b0;
    store <= 1'b0;
    load <= 1'b0;
    cnt <= `OR1200_DCLS'd0;
    cache_miss <= 1'b0;
    cache_dirty_needs_writeback <= 1'b0;
    cache_inhibit <= 1'b0;
    did_early_load_ack <= 1'b0;
    cache_spr_block_flush <= 1'b0;
    cache_spr_block_writeback <= 1'b0;
  end
  else
    case (state) // synopsys parallel_case

      `OR1200_DCFSM_IDLE : begin
        if (dc_en & (dc_block_flush | dc_block_writeback)) begin
          cache_spr_block_flush <= dc_block_flush;
          cache_spr_block_writeback <= dc_block_writeback;
          hitmiss_eval <= 1'b1;
          state <= `OR1200_DCFSM_FLUSH5;
          addr_r <= spr_dat_i;
        end
        else if (dc_en & dcqmem_cycstb_i) begin
          state <= `OR1200_DCFSM_CLOADSTORE;
          hitmiss_eval <= 1'b1;
          store <= dcqmem_we_i;
          load <= !dcqmem_we_i;
        end
      end // case: `OR1200_DCFSM_IDLE

      `OR1200_DCFSM_CLOADSTORE: begin
        hitmiss_eval <= 1'b0;
        if (hitmiss_eval) begin
          cache_inhibit <= dcqmem_ci_i; // Check for cache inhibit here
          cache_miss <= tagcomp_miss;
          cache_dirty_needs_writeback <= dirty;
          addr_r <= lsu_addr;
        end

        // Evaluate any cache line load/stores in first cycle:
        if (hitmiss_eval & tagcomp_miss & !(store & writethrough) & !dcqmem_ci_i) begin
          // Miss - first either:
          // 1) write back dirty line
          if (dirty) begin
            // Address for writeback
            addr_r <= {tag, lsu_addr[`OR1200_DCINDXH:2],2'd0};
            load <= 1'b0;
            store <= 1'b1;
`ifdef OR1200_VERBOSE
            $display("%t: dcache miss and dirty", $time);
`endif
          end
          // 2) load requested line
          else begin
            addr_r <= lsu_addr;
            load <= 1'b1;
            store <= 1'b0;
          end // else: !if(dirty)
          state <= `OR1200_DCFSM_LOOP2;
          // Set the counter for the burst accesses
          cnt <= ((1 << `OR1200_DCLS) - 4);
        end
        else if (// Strobe goes low
                 !dcqmem_cycstb_i |
                 // Cycle finishes
                 (!hitmiss_eval & (biudata_valid | biudata_error)) |
                 // Cache hit in first cycle....
                 (hitmiss_eval & !tagcomp_miss & !dcqmem_ci_i &
                  // .. and you're not doing a writethrough store..
                  !(store & writethrough))) begin
          state <= `OR1200_DCFSM_IDLE;
          load <= 1'b0;
          store <= 1'b0;
          cache_inhibit <= 1'b0;
          cache_dirty_needs_writeback <= 1'b0;
        end
      end // case: `OR1200_DCFSM_CLOADSTORE

      `OR1200_DCFSM_LOOP2 : begin // loop/abort
        if (!dc_en| biudata_error) begin
          state <= `OR1200_DCFSM_IDLE;
          load <= 1'b0;
          store <= 1'b0;
          cnt <= `OR1200_DCLS'd0;
        end
        if (biudata_valid & (|cnt)) begin
          cnt <= cnt - 4;
          addr_r[`OR1200_DCLS-1:2] <= addr_r[`OR1200_DCLS-1:2] + 1;
        end
        else if (biudata_valid & !(|cnt)) begin
          state <= `OR1200_DCFSM_LOOP3;
          addr_r <= lsu_addr;
          load <= 1'b0;
          store <= 1'b0;
        end

        // Track if we did an early ack during a load
        if (load_miss_ack)
          did_early_load_ack <= 1'b1;
      end // case: `OR1200_DCFSM_LOOP2

      `OR1200_DCFSM_LOOP3: begin // figure out next step
        if (cache_dirty_needs_writeback) begin
          // Just did store of the dirty line so now load new one
          load <= 1'b1;
          // Set the counter for the burst accesses
          cnt <= ((1 << `OR1200_DCLS) - 4);
          // Address of line to be loaded
          addr_r <= lsu_addr;
          cache_dirty_needs_writeback <= 1'b0;
          state <= `OR1200_DCFSM_LOOP2;
        end // if (cache_dirty_needs_writeback)
        else if (cache_spr_block_flush | cache_spr_block_writeback) begin
          // Just wrote back the line to memory, we're finished.
          cache_spr_block_flush <= 1'b0;
          cache_spr_block_writeback <= 1'b0;
          state <= `OR1200_DCFSM_WAITSPRCS7;
        end
        else begin
          // Just loaded a new line, finish up
          did_early_load_ack <= 1'b0;
          state <= `OR1200_DCFSM_LOOP4;
        end
      end // case: `OR1200_DCFSM_LOOP3

      `OR1200_DCFSM_LOOP4: begin
        state <= `OR1200_DCFSM_IDLE;
      end

      `OR1200_DCFSM_FLUSH5: begin
        hitmiss_eval <= 1'b0;
        if (hitmiss_eval & !tag_v) begin
          // Not even cached, just ignore
          cache_spr_block_flush <= 1'b0;
          cache_spr_block_writeback <= 1'b0;
          state <= `OR1200_DCFSM_WAITSPRCS7;
        end
        else if (hitmiss_eval & tag_v) begin
          // Tag is valid - what do we do?
          if ((cache_spr_block_flush | cache_spr_block_writeback) & dirty) begin
            // Need to writeback
            // Address for writeback (spr_dat_i has already changed so
            // use line number from addr_r)
            addr_r <= {tag, addr_r[`OR1200_DCINDXH:2],2'd0};
            load <= 1'b0;
            store <= 1'b1;
`ifdef OR1200_VERBOSE
            $display("%t: block flush: dirty block", $time);
`endif
            state <= `OR1200_DCFSM_LOOP2;
            // Set the counter for the burst accesses
            cnt <= ((1 << `OR1200_DCLS) - 4);
          end
          else if (cache_spr_block_flush & !dirty) begin
            // Line not dirty, just need to invalidate
            state <= `OR1200_DCFSM_INV6;
          end // else: !if(dirty)
          else if (cache_spr_block_writeback & !dirty) begin
            // Nothing to do - line is valid but not dirty
            cache_spr_block_writeback <= 1'b0;
            state <= `OR1200_DCFSM_WAITSPRCS7;
          end
        end // if (hitmiss_eval & tag_v)
      end

      `OR1200_DCFSM_INV6: begin
        cache_spr_block_flush <= 1'b0;
        // Wait until SPR CS goes low before going back to idle
        if (!spr_cswe)
          state <= `OR1200_DCFSM_IDLE;
      end

      `OR1200_DCFSM_WAITSPRCS7: begin
        // Wait until SPR CS goes low before going back to idle
        if (!spr_cswe)
          state <= `OR1200_DCFSM_IDLE;
      end

    endcase // case (state)

end // always @ (posedge clk or `OR1200_RST_EVENT rst)
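To make it easier to understand, I drew its state diagram below. In short: IDLE starts either a normal access (CLOADSTORE) or an SPR-requested block flush/writeback (FLUSH5); CLOADSTORE checks for a hit in its first cycle; on a cacheable miss, a dirty victim line is first written back with a burst in LOOP2, then the requested line is loaded with another burst; LOOP3 decides what comes next, and INV6/WAITSPRCS7 wait for the SPR access to finish.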
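Two details of the dcache FSM are easy to check with a small model: how cnt and addr_r walk through one line during a LOOP2 burst, and which states a miss visits. The Python sketch below assumes OR1200_DCLS = 4 (16-byte lines), no bus errors, and a cacheable access that is not a write-through store; it also encodes the FLUSH5 decision for SPR block flush/writeback. The function names are hypothetical.

```python
DCLS = 4   # log2 of the line size in bytes (16-byte lines)


def burst_word_addresses(start_addr, dcls=DCLS):
    """Addresses transferred by one LOOP2 burst.

    cnt starts at (1 << dcls) - 4 and drops by 4 on each biudata_valid while
    it is non-zero; each time addr_r[dcls-1:2] is incremented, so the word
    index wraps within the line. The biudata_valid that arrives with cnt == 0
    ends the burst and moves the FSM to LOOP3.
    """
    word_mask = (1 << (dcls - 2)) - 1          # 3 for 4 words per line
    cnt = (1 << dcls) - 4
    addr = start_addr
    visited = [addr]
    while cnt != 0:
        cnt -= 4
        word = ((addr >> 2) + 1) & word_mask
        addr = (addr & ~(word_mask << 2)) | (word << 2)
        visited.append(addr)
    return visited


def miss_path(dirty):
    """States a cacheable load/store miss walks through (no bus errors)."""
    path = ["CLOADSTORE"]
    if dirty:
        path += ["LOOP2 (write back dirty line)", "LOOP3"]
    path += ["LOOP2 (load requested line)", "LOOP3", "LOOP4", "IDLE"]
    return path


def flush5_action(op, valid, dirty):
    """FLUSH5 decision for an SPR-requested block 'flush' or 'writeback'."""
    if not valid:
        return "ignore, go to WAITSPRCS7"
    if dirty:
        return "write back the line via LOOP2/LOOP3"
    if op == "flush":
        return "invalidate in INV6"
    return "nothing to do, go to WAITSPRCS7"


print([hex(a) for a in burst_word_addresses(0x1006008)])
print(" -> ".join(miss_path(dirty=True)))
print(flush5_action("flush", valid=True, dirty=False))
```

The burst starts at the word the LSU asked for and wraps around the line, and a dirty miss costs two bursts (write back, then load).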
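The store buffer is essentially a FIFO, acting somewhat like a write-back cache. Its function was analyzed earlier; see section 2.1 of http://blog.csdn.net/rill_zhen/article/details/9491095.
For the depth of this FIFO, or1200_defines.v has the following definitions (the width of each entry, 4 + dw + aw bits holding a store's byte selects, data and address, follows from fifo_dat_i in or1200_sb.v):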
//
// Number of store buffer entries
//
// Verified number of entries are 4 and 8 entries
// (2 and 3 for OR1200_SB_LOG). OR1200_SB_ENTRIES must
// always match 2**OR1200_SB_LOG.
// To disable store buffer, undefine
// OR1200_SB_IMPLEMENTED.
//
`define OR1200_SB_LOG      2 // 2 or 3
`define OR1200_SB_ENTRIES  4 // 4 or 8
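The sb module consists of two files, or1200_sb.v and or1200_sb_fifo.v. As its name suggests, the second is a FIFO, physically a dual-port RAM, so only the first is analyzed here. The code is short, only a little over 150 lines; the main part is as follows: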
module or1200_sb(
  // RISC clock, reset
  clk, rst,

  // Internal RISC bus (SB)
  sb_en,

  // Internal RISC bus (DC<->SB)
  dcsb_dat_i, dcsb_adr_i, dcsb_cyc_i, dcsb_stb_i, dcsb_we_i, dcsb_sel_i, dcsb_cab_i,
  dcsb_dat_o, dcsb_ack_o, dcsb_err_o,

  // BIU bus
  sbbiu_dat_o, sbbiu_adr_o, sbbiu_cyc_o, sbbiu_stb_o, sbbiu_we_o, sbbiu_sel_o, sbbiu_cab_o,
  sbbiu_dat_i, sbbiu_ack_i, sbbiu_err_i
);

parameter dw = `OR1200_OPERAND_WIDTH;
parameter aw = `OR1200_OPERAND_WIDTH;

//
// RISC clock, reset
//
input           clk;         // RISC clock
input           rst;         // RISC reset

//
// Internal RISC bus (SB)
//
input           sb_en;       // SB enable

//
// Internal RISC bus (DC<->SB)
//
input  [dw-1:0] dcsb_dat_i;  // input data bus
input  [aw-1:0] dcsb_adr_i;  // address bus
input           dcsb_cyc_i;  // WB cycle
input           dcsb_stb_i;  // WB strobe
input           dcsb_we_i;   // WB write enable
input           dcsb_cab_i;  // CAB input
input  [3:0]    dcsb_sel_i;  // byte selects
output [dw-1:0] dcsb_dat_o;  // output data bus
output          dcsb_ack_o;  // ack output
output          dcsb_err_o;  // err output

//
// BIU bus
//
output [dw-1:0] sbbiu_dat_o; // output data bus
output [aw-1:0] sbbiu_adr_o; // address bus
output          sbbiu_cyc_o; // WB cycle
output          sbbiu_stb_o; // WB strobe
output          sbbiu_we_o;  // WB write enable
output          sbbiu_cab_o; // CAB input
output [3:0]    sbbiu_sel_o; // byte selects
input  [dw-1:0] sbbiu_dat_i; // input data bus
input           sbbiu_ack_i; // ack output
input           sbbiu_err_i; // err output

`ifdef OR1200_SB_IMPLEMENTED

//
// Internal wires and regs
//
wire [4+dw+aw-1:0] fifo_dat_i;  // FIFO data in
wire [4+dw+aw-1:0] fifo_dat_o;  // FIFO data out
wire               fifo_wr;
wire               fifo_rd;
wire               fifo_full;
wire               fifo_empty;
wire               sel_sb;
reg                sb_en_reg;
reg                outstanding_store;
reg                fifo_wr_ack;

//
// FIFO data in/out
//
assign fifo_dat_i = {dcsb_sel_i, dcsb_dat_i, dcsb_adr_i};
assign {sbbiu_sel_o, sbbiu_dat_o, sbbiu_adr_o} = sel_sb ? fifo_dat_o : {dcsb_sel_i, dcsb_dat_i, dcsb_adr_i};

//
// Control
//
assign fifo_wr = dcsb_cyc_i & dcsb_stb_i & dcsb_we_i & ~fifo_full & ~fifo_wr_ack;
assign fifo_rd = ~outstanding_store;
assign dcsb_dat_o = sbbiu_dat_i;
assign dcsb_ack_o = sel_sb ? fifo_wr_ack : sbbiu_ack_i;
assign dcsb_err_o = sel_sb ? 1'b0 : sbbiu_err_i; // SB never returns error
assign sbbiu_cyc_o = sel_sb ? outstanding_store : dcsb_cyc_i;
assign sbbiu_stb_o = sel_sb ? outstanding_store : dcsb_stb_i;
assign sbbiu_we_o = sel_sb ? 1'b1 : dcsb_we_i;
assign sbbiu_cab_o = sel_sb ? 1'b0 : dcsb_cab_i;
assign sel_sb = sb_en_reg & (~fifo_empty | (fifo_empty & outstanding_store));

//
// SB enable
//
always @(posedge clk or `OR1200_RST_EVENT rst)
  if (rst == `OR1200_RST_VALUE)
    sb_en_reg <= 1'b0;
  else if (sb_en & ~dcsb_cyc_i)
    sb_en_reg <= 1'b1; // enable SB when there is no dcsb transfer in progress
  else if (~sb_en & (~fifo_empty | (fifo_empty & outstanding_store)))
    sb_en_reg <= 1'b0; // disable SB when there is no pending transfers from SB

//
// Store buffer FIFO instantiation
//
or1200_sb_fifo or1200_sb_fifo (
  .clk_i(clk),
  .rst_i(rst),
  .dat_i(fifo_dat_i),
  .wr_i(fifo_wr),
  .rd_i(fifo_rd),
  .dat_o(fifo_dat_o),
  .full_o(fifo_full),
  .empty_o(fifo_empty)
);

//
// fifo_rd
//
always @(posedge clk or `OR1200_RST_EVENT rst)
  if (rst == `OR1200_RST_VALUE)
    outstanding_store <= 1'b0;
  else if (sbbiu_ack_i)
    outstanding_store <= 1'b0;
  else if (sel_sb | fifo_wr)
    outstanding_store <= 1'b1;

//
// fifo_wr_ack
//
always @(posedge clk or `OR1200_RST_EVENT rst)
  if (rst == `OR1200_RST_VALUE)
    fifo_wr_ack <= 1'b0;
  else if (fifo_wr)
    fifo_wr_ack <= 1'b1;
  else
    fifo_wr_ack <= 1'b0;

`else // !OR1200_SB_IMPLEMENTED

assign sbbiu_dat_o = dcsb_dat_i;
assign sbbiu_adr_o = dcsb_adr_i;
assign sbbiu_cyc_o = dcsb_cyc_i;
assign sbbiu_stb_o = dcsb_stb_i;
assign sbbiu_we_o = dcsb_we_i;
assign sbbiu_cab_o = dcsb_cab_i;
assign sbbiu_sel_o = dcsb_sel_i;
assign dcsb_dat_o = sbbiu_dat_i;
assign dcsb_ack_o = sbbiu_ack_i;
assign dcsb_err_o = sbbiu_err_i;

`endif

endmodule
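At the transaction level, the store buffer accepts a store as soon as the FIFO has room (fifo_wr, acknowledged one cycle later through fifo_wr_ack) and drains entries to the BIU in order. While entries are pending, sel_sb hands the BIU to the FIFO and dcsb_ack_o follows fifo_wr_ack, which a load never raises, so a load generally waits until the buffer has drained. The Python sketch below models this simplified behavior; it is not cycle accurate, it ignores the outstanding_store and sb_en_reg details, and its class and method names are hypothetical.

```python
from collections import deque

SB_LOG = 2
SB_ENTRIES = 1 << SB_LOG     # must equal 2**OR1200_SB_LOG, as or1200_defines.v says
DW = AW = 32                 # OR1200_OPERAND_WIDTH


def pack_entry(sel, dat, adr):
    """Same field order as fifo_dat_i = {dcsb_sel_i, dcsb_dat_i, dcsb_adr_i}."""
    return (sel << (DW + AW)) | (dat << AW) | adr


def unpack_entry(word):
    """Inverse of pack_entry: return (sel, dat, adr)."""
    return word >> (DW + AW), (word >> AW) & ((1 << DW) - 1), word & ((1 << AW) - 1)


class StoreBuffer:
    """Transaction-level sketch, not cycle accurate: a store is acknowledged as
    soon as it is queued, and the bus side drains the oldest entry first."""

    def __init__(self):
        self.fifo = deque()

    def store(self, sel, dat, adr):
        """True if the store was queued (acked), False if the FIFO is full."""
        if len(self.fifo) == SB_ENTRIES:
            return False
        self.fifo.append(pack_entry(sel, dat, adr))
        return True

    def load_may_proceed(self):
        """Simplification: loads pass through only when no store is buffered."""
        return not self.fifo

    def drain_one(self):
        """Complete one bus write: return (sel, dat, adr) of the oldest store."""
        return unpack_entry(self.fifo.popleft()) if self.fifo else None


sb = StoreBuffer()
for i in range(5):
    print("store", i, "acked" if sb.store(0xF, i, 0x100 + 4 * i) else "stalled")
print(sb.load_may_proceed(), sb.drain_one())
```

With four entries, the fifth back-to-back store stalls until the BIU retires the oldest one, which is the situation where the buffer no longer hides memory latency.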
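The biu (bus interface unit) module is or1200_top's window for exchanging data with the outside world; OR1200 instantiates two of them, dbiu and ibiu. Besides exchanging data, the biu module also handles things such as byte-alignment checks.
The module is essentially a wrapper around the Wishbone master/slave logic. If you are familiar with the Wishbone protocol, it is easy to follow. I have written about Wishbone before; see: http://blog.csdn.net/rill_zhen/article/details/8659788
The biu module consists of a single file, or1200_wb_biu.v, which mainly generates the Wishbone bus timing. I will not go into detail, but for completeness its main code is listed below: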
module or1200_wb_biu(
  // RISC clock, reset and clock control
  clk, rst, clmode,

  // WISHBONE interface
  wb_clk_i, wb_rst_i, wb_ack_i, wb_err_i, wb_rty_i, wb_dat_i,
  wb_cyc_o, wb_adr_o, wb_stb_o, wb_we_o, wb_sel_o, wb_dat_o,
`ifdef OR1200_WB_CAB
  wb_cab_o,
`endif
`ifdef OR1200_WB_B3
  wb_cti_o, wb_bte_o,
`endif

  // Internal RISC bus
  biu_dat_i, biu_adr_i, biu_cyc_i, biu_stb_i, biu_we_i, biu_sel_i, biu_cab_i,
  biu_dat_o, biu_ack_o, biu_err_o
);

parameter dw = `OR1200_OPERAND_WIDTH;
parameter aw = `OR1200_OPERAND_WIDTH;
parameter bl = 4; /* Can currently be either 4 or 8 - the two optional line
                     sizes for the OR1200. */

//
// RISC clock, reset and clock control
//
input           clk;       // RISC clock
input           rst;       // RISC reset
input  [1:0]    clmode;    // 00 WB=RISC, 01 WB=RISC/2, 10 N/A, 11 WB=RISC/4

//
// WISHBONE interface
//
input           wb_clk_i;  // clock input
input           wb_rst_i;  // reset input
input           wb_ack_i;  // normal termination
input           wb_err_i;  // termination w/ error
input           wb_rty_i;  // termination w/ retry
input  [dw-1:0] wb_dat_i;  // input data bus
output          wb_cyc_o;  // cycle valid output
output [aw-1:0] wb_adr_o;  // address bus outputs
output          wb_stb_o;  // strobe output
output          wb_we_o;   // indicates write transfer
output [3:0]    wb_sel_o;  // byte select outputs
output [dw-1:0] wb_dat_o;  // output data bus
`ifdef OR1200_WB_CAB
output          wb_cab_o;  // consecutive address burst
`endif
`ifdef OR1200_WB_B3
output [2:0]    wb_cti_o;  // cycle type identifier
output [1:0]    wb_bte_o;  // burst type extension
`endif

//
// Internal RISC interface
//
input  [dw-1:0] biu_dat_i; // input data bus
input  [aw-1:0] biu_adr_i; // address bus
input           biu_cyc_i; // WB cycle
input           biu_stb_i; // WB strobe
input           biu_we_i;  // WB write enable
input           biu_cab_i; // CAB input
input  [3:0]    biu_sel_i; // byte selects
output [31:0]   biu_dat_o; // output data bus
output          biu_ack_o; // ack output
output          biu_err_o; // err output

//
// Registers
//
wire            wb_ack;    // normal termination
reg  [aw-1:0]   wb_adr_o;  // address bus outputs
reg             wb_cyc_o;  // cycle output
reg             wb_stb_o;  // strobe output
reg             wb_we_o;   // indicates write transfer
reg  [3:0]      wb_sel_o;  // byte select outputs
`ifdef OR1200_WB_CAB
reg             wb_cab_o;  // CAB output
`endif
`ifdef OR1200_WB_B3
reg  [2:0]      wb_cti_o;  // cycle type identifier
reg  [1:0]      wb_bte_o;  // burst type extension
`endif
`ifdef OR1200_NO_DC
reg  [dw-1:0]   wb_dat_o;  // output data bus
`else
assign wb_dat_o = biu_dat_i; // No register on this - straight from DCRAM
`endif

`ifdef OR1200_WB_RETRY
reg [`OR1200_WB_RETRY-1:0] retry_cnt; // Retry counter
`else
wire retry_cnt;
assign retry_cnt = 1'b0;
`endif
`ifdef OR1200_WB_B3
reg  [3:0]      burst_len;   // burst counter
`endif

reg             biu_stb_reg; // WB strobe
wire            biu_stb;     // WB strobe
reg             wb_cyc_nxt;  // next WB cycle value
reg             wb_stb_nxt;  // next WB strobe value
reg  [2:0]      wb_cti_nxt;  // next cycle type identifier value

reg             wb_ack_cnt;  // WB ack toggle counter
reg             wb_err_cnt;  // WB err toggle counter
reg             wb_rty_cnt;  // WB rty toggle counter
reg             biu_ack_cnt; // BIU ack toggle counter
reg             biu_err_cnt; // BIU err toggle counter
reg             biu_rty_cnt; // BIU rty toggle counter
wire            biu_rty;     // BIU rty indicator

reg  [1:0]      wb_fsm_state_cur;    // WB FSM - surrent state
reg  [1:0]      wb_fsm_state_nxt;    // WB FSM - next state
wire [1:0]      wb_fsm_idle  = 2'h0; // WB FSM state - IDLE
wire [1:0]      wb_fsm_trans = 2'h1; // WB FSM state - normal TRANSFER
wire [1:0]      wb_fsm_last  = 2'h2; // EB FSM state - LAST transfer

//
// WISHBONE I/F <-> Internal RISC I/F conversion
//
//assign wb_ack = wb_ack_i;
assign wb_ack = wb_ack_i & !wb_err_i & !wb_rty_i;

//
// WB FSM - register part
//
always @(posedge wb_clk_i or `OR1200_RST_EVENT wb_rst_i) begin
  if (wb_rst_i == `OR1200_RST_VALUE)
    wb_fsm_state_cur <= wb_fsm_idle;
  else
    wb_fsm_state_cur <= wb_fsm_state_nxt;
end

//
// WB burst tength counter
//
always @(posedge wb_clk_i or `OR1200_RST_EVENT wb_rst_i) begin
  if (wb_rst_i == `OR1200_RST_VALUE) begin
    burst_len <= 0;
  end
  else begin
    // burst counter
    if (wb_fsm_state_cur == wb_fsm_idle)
      burst_len <= bl[3:0] - 2;
    else if (wb_stb_o & wb_ack)
      burst_len <= burst_len - 1;
  end
end

//
// WB FSM - combinatorial part
//
always @(wb_fsm_state_cur or burst_len or wb_err_i or wb_rty_i or wb_ack or
         wb_cti_o or wb_sel_o or wb_stb_o or wb_we_o or biu_cyc_i or
         biu_stb or biu_cab_i or biu_sel_i or biu_we_i) begin
  // States of WISHBONE Finite State Machine
  case(wb_fsm_state_cur)
    // IDLE
    wb_fsm_idle : begin
      wb_cyc_nxt = biu_cyc_i & biu_stb;
      wb_stb_nxt = biu_cyc_i & biu_stb;
      wb_cti_nxt = {!biu_cab_i, 1'b1, !biu_cab_i};
      if (biu_cyc_i & biu_stb)
        wb_fsm_state_nxt = wb_fsm_trans;
      else
        wb_fsm_state_nxt = wb_fsm_idle;
    end
    // normal TRANSFER
    wb_fsm_trans : begin
      wb_cyc_nxt = !wb_stb_o | !wb_err_i & !wb_rty_i &
                   !(wb_ack & wb_cti_o == 3'b111);
      wb_stb_nxt = !wb_stb_o | !wb_err_i & !wb_rty_i & !wb_ack |
                   !wb_err_i & !wb_rty_i & wb_cti_o == 3'b010 ;
      wb_cti_nxt[2] = wb_stb_o & wb_ack & burst_len == 'h0 | wb_cti_o[2];
      wb_cti_nxt[1] = 1'b1 ;
      wb_cti_nxt[0] = wb_stb_o & wb_ack & burst_len == 'h0 | wb_cti_o[0];

      if ((!biu_cyc_i | !biu_stb | !biu_cab_i | biu_sel_i != wb_sel_o |
           biu_we_i != wb_we_o) & wb_cti_o == 3'b010)
        wb_fsm_state_nxt = wb_fsm_last;
      else if ((wb_err_i | wb_rty_i | wb_ack & wb_cti_o==3'b111) & wb_stb_o)
        wb_fsm_state_nxt = wb_fsm_idle;
      else
        wb_fsm_state_nxt = wb_fsm_trans;
    end
    // LAST transfer
    wb_fsm_last : begin
      wb_cyc_nxt = !wb_stb_o | !wb_err_i & !wb_rty_i &
                   !(wb_ack & wb_cti_o == 3'b111);
      wb_stb_nxt = !wb_stb_o | !wb_err_i & !wb_rty_i &
                   !(wb_ack & wb_cti_o == 3'b111);
      wb_cti_nxt[2] = wb_ack & wb_stb_o | wb_cti_o[2];
      wb_cti_nxt[1] = 1'b1 ;
      wb_cti_nxt[0] = wb_ack & wb_stb_o | wb_cti_o[0];
      if ((wb_err_i | wb_rty_i | wb_ack & wb_cti_o == 3'b111) & wb_stb_o)
        wb_fsm_state_nxt = wb_fsm_idle;
      else
        wb_fsm_state_nxt = wb_fsm_last;
    end
    // default state
    default:begin
      wb_cyc_nxt = 1'bx;
      wb_stb_nxt = 1'bx;
      wb_cti_nxt = 3'bxxx;
      wb_fsm_state_nxt = 2'bxx;
    end
  endcase
end

//
// WB FSM - output signals
//
always @(posedge wb_clk_i or `OR1200_RST_EVENT wb_rst_i) begin
  if (wb_rst_i == `OR1200_RST_VALUE) begin
    wb_cyc_o <= 1'b0;
    wb_stb_o <= 1'b0;
    wb_cti_o <= 3'b111;
    wb_bte_o <= (bl==8) ? 2'b10 : (bl==4) ? 2'b01 : 2'b00;
`ifdef OR1200_WB_CAB
    wb_cab_o <= 1'b0;
`endif
    wb_we_o <= 1'b0;
    wb_sel_o <= 4'hf;
    wb_adr_o <= {aw{1'b0}};
`ifdef OR1200_NO_DC
    wb_dat_o <= {dw{1'b0}};
`endif
  end
  else begin
    wb_cyc_o <= wb_cyc_nxt;

    if (wb_ack & wb_cti_o == 3'b111)
      wb_stb_o <= 1'b0;
    else
      wb_stb_o <= wb_stb_nxt;
`ifndef OR1200_NO_BURSTS
    wb_cti_o <= wb_cti_nxt;
`endif
    wb_bte_o <= (bl==8) ? 2'b10 : (bl==4) ? 2'b01 : 2'b00;
`ifdef OR1200_WB_CAB
    wb_cab_o <= biu_cab_i;
`endif
    // we and sel - set at beginning of access
    if (wb_fsm_state_cur == wb_fsm_idle) begin
      wb_we_o <= biu_we_i;
      wb_sel_o <= biu_sel_i;
    end
    // adr - set at beginning of access and changed at every termination
    if (wb_fsm_state_cur == wb_fsm_idle) begin
      wb_adr_o <= biu_adr_i;
    end
    else if (wb_stb_o & wb_ack) begin
      if (bl==4) begin
        wb_adr_o[3:2] <= wb_adr_o[3:2] + 1;
      end
      if (bl==8) begin
        wb_adr_o[4:2] <= wb_adr_o[4:2] + 1;
      end
    end
`ifdef OR1200_NO_DC
    // dat - write data changed after avery subsequent write access
    if (!wb_stb_o) begin
      wb_dat_o <= biu_dat_i;
    end
`endif
  end
end

//
// WB & BIU termination toggle counters
//
always @(posedge wb_clk_i or `OR1200_RST_EVENT wb_rst_i) begin
  if (wb_rst_i == `OR1200_RST_VALUE) begin
    wb_ack_cnt <= 1'b0;
    wb_err_cnt <= 1'b0;
    wb_rty_cnt <= 1'b0;
  end
  else begin
    // WB ack toggle counter
    if (wb_fsm_state_cur == wb_fsm_idle | !(|clmode))
      wb_ack_cnt <= 1'b0;
    else if (wb_stb_o & wb_ack)
      wb_ack_cnt <= !wb_ack_cnt;
    // WB err toggle counter
    if (wb_fsm_state_cur == wb_fsm_idle | !(|clmode))
      wb_err_cnt <= 1'b0;
    else if (wb_stb_o & wb_err_i)
      wb_err_cnt <= !wb_err_cnt;
    // WB rty toggle counter
    if (wb_fsm_state_cur == wb_fsm_idle | !(|clmode))
      wb_rty_cnt <= 1'b0;
    else if (wb_stb_o & wb_rty_i)
      wb_rty_cnt <= !wb_rty_cnt;
  end
end

always @(posedge clk or `OR1200_RST_EVENT rst) begin
  if (rst == `OR1200_RST_VALUE) begin
    biu_stb_reg <= 1'b0;
    biu_ack_cnt <= 1'b0;
    biu_err_cnt <= 1'b0;
    biu_rty_cnt <= 1'b0;
`ifdef OR1200_WB_RETRY
    retry_cnt <= {`OR1200_WB_RETRY{1'b0}};
`endif
  end
  else begin
    // BIU strobe
    if (biu_stb_i & !biu_cab_i & biu_ack_o)
      biu_stb_reg <= 1'b0;
    else
      biu_stb_reg <= biu_stb_i;
    // BIU ack toggle counter
    if (wb_fsm_state_cur == wb_fsm_idle | !(|clmode))
      biu_ack_cnt <= 1'b0 ;
    else if (biu_ack_o)
      biu_ack_cnt <= !biu_ack_cnt ;
    // BIU err toggle counter
    if (wb_fsm_state_cur == wb_fsm_idle | !(|clmode))
      biu_err_cnt <= 1'b0 ;
    else if (wb_err_i & biu_err_o)
      biu_err_cnt <= !biu_err_cnt ;
    // BIU rty toggle counter
    if (wb_fsm_state_cur == wb_fsm_idle | !(|clmode))
      biu_rty_cnt <= 1'b0 ;
    else if (biu_rty)
      biu_rty_cnt <= !biu_rty_cnt ;
`ifdef OR1200_WB_RETRY
    if (biu_ack_o | biu_err_o)
      retry_cnt <= {`OR1200_WB_RETRY{1'b0}};
    else if (biu_rty)
      retry_cnt <= retry_cnt + 1'b1;
`endif
  end
end

assign biu_stb = biu_stb_i & biu_stb_reg;

//
// Input BIU data bus
//
assign biu_dat_o = wb_dat_i;

//
// Input BIU termination signals
//
assign biu_rty = (wb_fsm_state_cur == wb_fsm_trans) & wb_rty_i & wb_stb_o &
                 (wb_rty_cnt ~^ biu_rty_cnt);
assign biu_ack_o = (wb_fsm_state_cur == wb_fsm_trans) & wb_ack & wb_stb_o &
                   (wb_ack_cnt ~^ biu_ack_cnt);
assign biu_err_o = (wb_fsm_state_cur == wb_fsm_trans) & wb_err_i & wb_stb_o &
                   (wb_err_cnt ~^ biu_err_cnt)
`ifdef OR1200_WB_RETRY
                   | biu_rty & retry_cnt[`OR1200_WB_RETRY-1];
`else
                   ;
`endif

endmodule
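The burst logic is easier to follow as a per-beat trace. The Python sketch below derives wb_adr_o and wb_cti_o for each beat from burst_len, wb_cti_nxt and the address increment in the listing. It assumes a consecutive-address (cab) access, a slave that acks every beat with no err/rty, and clmode = 00, so the toggle counters play no role; wb_burst is a hypothetical name.

```python
CTI_INCR = 0b010                  # Wishbone B3 cycle type: incrementing burst
CTI_END = 0b111                   # Wishbone B3 cycle type: end of burst
BTE_FOR_BL = {4: 0b01, 8: 0b10}   # wb_bte_o: 4-beat / 8-beat wrap burst


def wb_burst(start_addr, bl=4):
    """(wb_adr_o, wb_cti_o) for each beat of a CAB burst, assuming the slave
    acks every beat without err/rty and clmode = 00 (WB clock = CPU clock)."""
    word_mask = bl - 1                  # wb_adr_o[3:2] for bl=4, [4:2] for bl=8
    burst_len = bl - 2                  # loaded while the FSM is in IDLE
    cti = CTI_INCR                      # {!biu_cab_i, 1'b1, !biu_cab_i} with cab = 1
    adr = start_addr
    beats = []
    for _ in range(bl):
        beats.append((adr, cti))
        # On the ack of this beat:
        if burst_len == 0:              # wb_cti_nxt[2] and [0] get set
            cti = CTI_END
        burst_len -= 1
        word = ((adr >> 2) + 1) & word_mask   # address wraps inside the line
        adr = (adr & ~(word_mask << 2)) | (word << 2)
    return beats                        # an ack with cti == 111 drops wb_stb_o


for adr, cti in wb_burst(0x1006008):
    print(hex(adr), format(cti, "03b"))
print("wb_bte_o =", format(BTE_FOR_BL[4], "02b"))
```

Because only wb_adr_o[3:2] (or [4:2] for 8-word lines) is incremented, the burst wraps inside the line, matching the wrap burst announced on wb_bte_o, and the beat tagged 111 is the last one.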
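That finally brings this part to a close. To relax, here is a small puzzle.
Many people may have met this software written-test question: in general, which of the following two programs runs faster (assuming an 8 KB cache)?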
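The two programs of the question are not reproduced in this text. A common version of such questions compares walking a two-dimensional array row by row versus column by column. Purely as a hypothetical illustration (a 128 × 128 array of 4-byte ints at an assumed, line-aligned base address), the sketch below counts misses for both orders on the 8 KB direct-mapped cache with 16-byte lines described at the beginning of this section.

```python
LINE_BYTES, NUM_LINES = 16, 512    # the 8 KB direct-mapped cache used above
ROWS = COLS = 128                  # hypothetical int array a[128][128], 4-byte ints
BASE = 0x1000_0000                 # hypothetical base address, line aligned


def count_misses(addresses):
    """Direct-mapped lookup: one tag per line, replace on every miss."""
    tags = [None] * NUM_LINES
    misses = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        index, tag = line % NUM_LINES, line // NUM_LINES
        if tags[index] != tag:
            tags[index] = tag
            misses += 1
    return misses


row_major = (BASE + 4 * (i * COLS + j) for i in range(ROWS) for j in range(COLS))
col_major = (BASE + 4 * (i * COLS + j) for j in range(COLS) for i in range(ROWS))

print("row by row:      ", count_misses(row_major))   # one miss per 16-byte line
print("column by column:", count_misses(col_major))
```

With 512-byte rows, rows 16 apart fall on the same cache line index, so walking down a column evicts each line before its neighboring words are used, while the row-by-row walk misses only once per 16-byte line.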
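With this, we have completed the analysis of OpenRISC's MMU and cache system and should now have a complete, clear and thorough understanding of memory organization, a very important part of computer architecture.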
1. ORPSoC RTL code
2. OpenRISC arch-manual