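OpenRisc-41: Analysis of the or1200 cache module

Introduction

Supplying the CPU with a sufficient, steady stream of instructions and of data are two perennial topics in computer architecture design. To keep the instruction stream flowing we design branch predictors; to keep the data stream flowing we design caches. In fact, both insn and data accesses have to go to memory, so from that point of view the cache carries the more important role.

In this section we analyze how the cache part of the or1200 is implemented.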


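1, Why caches exist

As always, to study something you first need to understand where it came from, and the cache is no exception.

The cache exists to solve the memory wall problem. CPU clock frequencies and processing power keep rising, and although memory systems have improved too, the gap between them and the CPU keeps widening. The result is like trying to pour dumplings out of a teapot: the CPU has the capacity, but data cannot get through the narrow spout fast enough. This is the so-called memory wall, and the cache was introduced precisely to address it.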


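2, Cache basics

To understand caches we first need to know about mapping schemes, write policies, replacement policies, cache optimization techniques, and related topics. These have been covered in an earlier post and are not repeated here; if in doubt, please refer to: http://blog.csdn.net/rill_zhen/article/details/9491095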


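3, How a cache works

1> Basic operation

Before analyzing the concrete implementation of the or1200 cache, we should first understand how a cache works in general. To show the process clearly I use a hypothetical example, which extends the one used in the MMU module analysis.

In that MMU example, the MMU translated the virtual address of the variable test (0x2008) into the physical address (0x1006008).

When the CPU accesses memory, virtual-to-physical translation is only the first step. After translation, the address is not used to access the external SDRAM directly. Instead, the MMU first sends the physical address to the cache: on a cache hit the cache acks the CPU directly, and only on a cache miss does the access go on to the next cache level or to the external SDRAM.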


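2> How a direct-mapped cache works

The above describes the general flow, but what exactly happens inside the cache?

What happens once the physical address of test is available? We explain it below using a direct-mapped cache of 8K in size, with 512 lines of 16 bytes each, as shown in the figure below:

The figure makes the working details clear.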


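Notes:

a, The cache is direct mapped.

b, The total capacity of the cache is 8K (512 lines x 16 bytes), which is exactly one memory page.

c, The whole cache has 512 cache lines, also called cache entries.

d, Each cache line caches 16 bytes of data.

e, Because the cache is direct mapped, there is no replacement algorithm to speak of: whichever line takes a cache miss is the line that gets replaced.

f, There are two write policies: write through and write back.

g, A cache is generally transparent to the software programming model, so it rarely needs to interact with software. Only the most basic control is required, for example choosing which way to lock, flushing the cache, or keeping the LRU values up to date when an LRU replacement algorithm is used. This is very different from the MMU, which needs a great deal of software intervention and control.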

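h, A brief description of the working mechanism:

First, the cache applies a modulo operation (%) to the index field of the virtual address; the modulus depends on the number of cache lines and the amount of data each line holds. In this example there are 512 cache lines and each line holds 16 B, so the index has to be split into a cache line index (which selects the line) and an offset within the line (which selects the byte in that line). With an 8K cache and 8K pages these bits lie entirely inside the page offset, so they are the same in the virtual and the physical address.

Using the cache line index, the cache locates the specific line and checks its valid flag. If the line is valid, its tag is then compared with the PPN produced by the MMU (this is necessary because one cache line may correspond to many memory addresses). If the tag matches the PPN, it is a cache hit; if either of the two conditions fails, it is a cache miss. On a miss, the cache performs a burst access (a burst of 4 beats of 4 B each in this example, exactly 16 B) to refill that cache line.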


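i, Cache operations

Flush: the cache simply clears the valid flag to 0.

Lock: suppose a program runs for a long time. To prevent other programs from evicting its cache lines when they take cache misses, the cache lines used by this program can be locked. Locking can be done per way or per line (the whole cache is divided into several groups, each containing a number of lines, and one such group is called a way).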
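The lookup, refill, and flush steps described above can be sketched as a small behavioral model. This is only an illustration in Python, not the or1200 RTL: the names `split_address` and `DirectMappedCache` are hypothetical, only tags and valid flags are modelled (no data), and the sizes are taken from the example (512 lines of 16 bytes).

```python
LINE_BYTES = 16   # bytes per cache line (from the example)
NUM_LINES = 512   # number of cache lines (from the example)
OFFSET_BITS = 4   # log2(LINE_BYTES)
INDEX_BITS = 9    # log2(NUM_LINES)


def split_address(paddr):
    """Split a physical address into (tag, line index, byte offset)."""
    offset = paddr & (LINE_BYTES - 1)
    index = (paddr >> OFFSET_BITS) % NUM_LINES
    tag = paddr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class DirectMappedCache:
    """Hypothetical tag/valid model of the direct-mapped example cache."""

    def __init__(self):
        self.valid = [False] * NUM_LINES
        self.tags = [0] * NUM_LINES

    def access(self, paddr):
        """Return True on a hit; on a miss, refill the line and return False."""
        tag, index, _ = split_address(paddr)
        if self.valid[index] and self.tags[index] == tag:
            return True
        # Miss: a real cache would burst-read 4 x 4 B here to refill the line.
        self.valid[index] = True
        self.tags[index] = tag
        return False

    def flush(self):
        """Flush by clearing every valid flag."""
        self.valid = [False] * NUM_LINES


cache = DirectMappedCache()
print([hex(field) for field in split_address(0x1006008)])
print(cache.access(0x1006008), cache.access(0x100600C))
```

For the physical address 0x1006008 from the MMU example, the split gives tag 0x803, line index 0, and byte offset 8. The first access misses and refills line 0, and the second access to the same line hits. An address exactly 8K away (for example 0x1008008) maps to the same line with a different tag and would evict it, which is the "one cache line may correspond to many memory addresses" case mentioned above.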


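3> How a fully associative cache works

The above describes how a direct-mapped cache works. Caches using the other two mapping schemes work in much the same way; they differ in how a cache line is searched for, in the replacement policy, and in the write policy.

The working mechanism of a fully associative cache is shown in the figure below:



4> How a set-associative cache works

It sits between direct mapped and fully associative and is not described further here.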
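One way to see how the three mapping schemes relate is a single model with a configurable number of sets and ways: one way per set behaves as direct mapped, and a single set behaves as fully associative. The sketch below is a hedged illustration only. The class name `SetAssociativeCache` is hypothetical, and the LRU replacement policy is an assumption of this sketch, not something taken from the or1200.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Tag-only cache model; LRU replacement is an assumption of this sketch."""

    def __init__(self, num_sets, ways, line_bytes=16):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        # One ordered dict per set: keys are tags, least recently used first.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, paddr):
        """Return True on a hit; on a miss, fill a way and return False."""
        line_number = paddr // self.line_bytes
        index = line_number % self.num_sets
        tag = line_number // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)       # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)    # evict the least recently used tag
        lines[tag] = True
        return False


direct = SetAssociativeCache(num_sets=512, ways=1)   # direct mapped, 8K
full = SetAssociativeCache(num_sets=1, ways=512)     # fully associative, 8K
a, b = 0x1006008, 0x1008008                          # two addresses 8K apart
print([direct.access(x) for x in (a, b, a, b)])
print([full.access(x) for x in (a, b, a, b)])
```

In the direct-mapped configuration the two addresses keep evicting each other, so every access misses. In the fully associative configuration both lines coexist, so the second round of accesses hits. A 2-way configuration with 256 sets sits in between and also keeps both lines in this particular case.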


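4, Analysis of the or1200 cache system

With the working mechanism of a cache understood, analyzing the concrete implementation of the or1200 cache becomes relatively easy. Because a cache holds only a subset of memory and has no independent programming space, it interacts little with software, which makes the analysis simpler still.


1> How the or1200 cache works

The or1200 cache is direct mapped, 8K in size, with 512 entries in total. Each line caches 16 bytes and consists of a 1-bit flag, a 19-bit tag, and 16*8 bits of data.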
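The 19-bit tag follows from the other numbers. As a quick check, the arithmetic can be written out as below. This assumes a 32-bit physical address, which is an assumption of this sketch (it matches the 32-bit addresses used elsewhere in this post); the variable names are illustrative only.

```python
ADDR_BITS = 32    # assumed physical address width
LINE_BYTES = 16   # bytes cached per line
NUM_LINES = 512   # number of entries

offset_bits = (LINE_BYTES - 1).bit_length()      # bits selecting a byte in the line
index_bits = (NUM_LINES - 1).bit_length()        # bits selecting the line
tag_bits = ADDR_BITS - index_bits - offset_bits  # what remains is the tag
bits_per_line = 1 + tag_bits + LINE_BYTES * 8    # flag + tag + data
total_bits = NUM_LINES * bits_per_line

print(offset_bits, index_bits, tag_bits)
print(bits_per_line, total_bits)
```

With these numbers the offset is 4 bits, the index is 9 bits, and the tag is 32 - 9 - 4 = 19 bits. Each line therefore stores 1 + 19 + 128 = 148 bits, so the 8K of cached data comes with 512 * 20 = 10240 bits of flag and tag overhead.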

We have already described in detail how this kind of cache works, and the or1200 cache is no exception.


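2> Composition of the or1200 cache

In the or1200, the qmem module acts as the level-1 cache, dcache/icache form the level-2 cache, and the sb module forms a level-3 cache for data.

Below is a block diagram of the memory system of the whole ordb2a development board, which clearly shows the data paths of the memory subsystem.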



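3> qmem module analysis

1》Overall analysis

The qmem module is the level-1 cache. or1200_defines.v contains the following description of qmem, from which we can learn its purpose, rationale, capacity, and so on.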


 

/////////////////////////////////////////////////
//
// Quick Embedded Memory (QMEM)
//

//
// Quick Embedded Memory
//
// Instantiation of dedicated insn/data memory (RAM or ROM).
// Insn fetch has effective throughput 1insn / clock cycle.
// Data load takes two clock cycles / access, data store
// takes 1 clock cycle / access (if there is no insn fetch)).
// Memory instantiation is shared between insn and data,
// meaning if insn fetch are performed, data load/store
// performance will be lower.
//
// Main reason for QMEM is to put some time critical functions
// into this memory and to have predictable and fast access
// to these functions. (soft fpu, context switch, exception
// handlers, stack, etc)
//
// It makes design a bit bigger and slower. QMEM sits behind
// IMMU/DMMU so all addresses are physical (so the MMUs can be
// used with QMEM and QMEM is seen by the CPU just like any other
// memory in the system). IC/DC are sitting behind QMEM so the
// whole design timing might be worse with QMEM implemented.
//
//`define OR1200_QMEM_IMPLEMENTED

//
// Base address and mask of QMEM
//
// Base address defines first address of QMEM. Mask defines
// QMEM range in address space. Actual size of QMEM is however
// determined with instantiated RAM/ROM. However bigger
// mask will reserve more address space for QMEM, but also
// make design faster, while more tight mask will take
// less address space but also make design slower. If
// instantiated RAM/ROM is smaller than space reserved with
// the mask, instantiated RAM/ROM will also be shadowed
// at higher addresses in reserved space.
//
`define OR1200_QMEM_IADDR	32'h0080_0000
`define OR1200_QMEM_IMASK	32'hfff0_0000 // Max QMEM size 1MB
`define OR1200_QMEM_DADDR	32'h0080_0000
`define OR1200_QMEM_DMASK	32'hfff0_0000 // Max QMEM size 1MB

//
// QMEM interface byte-select capability
//
// To enable qmem_sel* ports, define this macro.
//
//`define OR1200_QMEM_BSEL

//
// QMEM interface acknowledge
//
// To enable qmem_ack port, define this macro.
//
//`define OR1200_QMEM_ACK
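
The base address and mask defined above can be read as a simple address decoder. The following Python sketch shows one plausible interpretation: an address belongs to QMEM when its masked bits equal the base address, and a RAM smaller than the reserved window repeats (is shadowed) inside it. The functions `qmem_hit` and `qmem_offset` and the 8 KiB RAM size are hypothetical; the exact compare used by the RTL is not shown in this excerpt, so treat this as an assumption.

```python
QMEM_IADDR = 0x00800000   # OR1200_QMEM_IADDR from the listing above
QMEM_IMASK = 0xFFF00000   # OR1200_QMEM_IMASK: reserves a 1 MB window


def qmem_hit(addr, base=QMEM_IADDR, mask=QMEM_IMASK):
    """Assumed decode: the masked address must equal the base address."""
    return (addr & mask) == base


def qmem_offset(addr, ram_bytes, mask=QMEM_IMASK):
    """Offset into a RAM of ram_bytes bytes (a power of two is assumed)."""
    window_offset = addr & ~mask & 0xFFFFFFFF
    # A RAM smaller than the window repeats at higher addresses (shadowing).
    return window_offset % ram_bytes


print(qmem_hit(0x00800000), qmem_hit(0x008FFFFC), qmem_hit(0x00900000))
print(hex(qmem_offset(0x00802004, ram_bytes=0x2000)))
```

Under these assumptions, every address from 0x0080_0000 to 0x008F_FFFF is claimed by QMEM, and with a hypothetical 8 KiB RAM the address 0x0080_2004 lands on the same location as 0x0080_0004. A wider mask means fewer address bits to compare, which is consistent with the comment that a bigger mask makes the design faster.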

 

 

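2》RTL code analysis of the qmem module

 

The qmem module has only one RTL file, or1200_qmem_top.v. Code analysis is more than copying and pasting code and adding a few comments. To keep the focus, now that we know roughly what qmem does, we look at its core code, namely its FSM, shown below: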


 

`define OR1200_QMEMFSM_IDLE	3'd0
`define OR1200_QMEMFSM_STORE	3'd1
`define OR1200_QMEMFSM_LOAD	3'd2
`define OR1200_QMEMFSM_FETCH	3'd3


//
// QMEM control FSM
//
always @(`OR1200_RST_EVENT rst or posedge clk)
	if (rst == `OR1200_RST_VALUE) begin
		state <=  `OR1200_QMEMFSM_IDLE;
		qmem_dack <=  1'b0;
		qmem_iack <=  1'b0;
	end
	else case (state)	// synopsys parallel_case
		`OR1200_QMEMFSM_IDLE: begin
			if (qmemdmmu_cycstb_i & daddr_qmem_hit & qmemdcpu_we_i & qmem_ack) begin
				state <=  `OR1200_QMEMFSM_STORE;
				qmem_dack <=  1'b1;
				qmem_iack <=  1'b0;
			end
			else if (qmemdmmu_cycstb_i & daddr_qmem_hit & qmem_ack) begin
				state <=  `OR1200_QMEMFSM_LOAD;
				qmem_dack <=  1'b1;
				qmem_iack <=  1'b0;
			end
			else if (qmemimmu_cycstb_i & iaddr_qmem_hit & qmem_ack) begin
				state <=  `OR1200_QMEMFSM_FETCH;
				qmem_iack <=  1'b1;
				qmem_dack <=  1'b0;
			end
		end
		`OR1200_QMEMFSM_STORE: begin
			if (qmemdmmu_cycstb_i & daddr_qmem_hit & qmemdcpu_we_i & qmem_ack) begin
				state <=  `OR1200_QMEMFSM_STORE;
				qmem_dack <=  1'b1;
				qmem_iack <=  1'b0;
			end
			else if (qmemdmmu_cycstb_i & daddr_qmem_hit & qmem_ack) begin
				state <=  `OR1200_QMEMFSM_LOAD;
				qmem_dack <=  1'b1;
				qmem_iack <=  1'b0;
			end
			else if (qmemimmu_cycstb_i & iaddr_qmem_hit & qmem_ack) begin
				state <=  `OR1200_QMEMFSM_FETCH;
				qmem_iack <=  1'b1;
				qmem_dack <=  1'b0;
			end
			else begin
				state <=  `OR1200_QMEMFSM_IDLE;
				qmem_dack <=  1'b0;
				qmem_iack <=  1'b0;
			end
		end
		`OR1200_QMEMFSM_LOAD: begin
			if (qmemdmmu_cycstb_i & daddr_qmem_hit & qmemdcpu_we_i & qmem_ack) begin
				state <=  `OR1200_QMEMFSM_STORE;
				qmem_dack <=  1'b1;
				qmem_iack <=  1'b0;
			end
			else if (qmemdmmu_cycstb_i & daddr_qmem_hit & qmem_ack) begin
				state <=  `OR1200_QMEMFSM_LOAD;
				qmem_dack <=  1'b1;
				qmem_iack <=  1'b0;
			end
			else if (qmemimmu_cycstb_i & iaddr_qmem_hit & qmem_ack) begin
				state <=  `OR1200_QMEMFSM_FETCH;
				qmem_iack <=  1'b1;
				qmem_dack <=  1'b0;
			end
			else begin
				state <=  `OR1200_QMEMFSM_IDLE;
				qmem_dack <=  1'b0;
				qmem_iack <=  1'b0;
			end
		end
		`OR1200_QMEMFSM_FETCH: begin
			if (qmemdmmu_cycstb_i & daddr_qmem_hit & qmemdcpu_we_i & qmem_ack) begin
				state <=  `OR1200_QMEMFSM_STORE;
				qmem_dack <=  1'b1;
				qmem_iack <=  1'b0;
			end
			else if (qmemdmmu_cycstb_i & daddr_qmem_hit & qmem_ack) begin
				state <=  `OR1200_QMEMFSM_LOAD;
				qmem_dack <=  1'b1;
				qmem_iack <=  1'b0;
			end
			else if (qmemimmu_cycstb_i & iaddr_qmem_hit & qmem_ack) begin
				state <=  `OR1200_QMEMFSM_FETCH;
				qmem_iack <=  1'b1;
				qmem_dack <=  1'b0;
			end
			else begin
				state <=  `OR1200_QMEMFSM_IDLE;
				qmem_dack <=  1'b0;
				qmem_iack <=  1'b0;
			end
		end
		default: begin
			state <=  `OR1200_QMEMFSM_IDLE;
			qmem_dack <=  1'b0;
			qmem_iack <=  1'b0;
		end
	endcase


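Analysis:

 

qmem has 4 states in total. For easier reading I drew its state diagram, shown below. It lists the states and the transition conditions and is self-explanatory, so no further description is given.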
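
Reading the FSM listing, every state evaluates the same priority chain: a data store wins over a data load, which wins over an instruction fetch, and otherwise the FSM is idle. A hedged Python sketch of that next-state function is given below. The function name `qmem_next` and its argument names are hypothetical shorthand for the RTL signals, and the `ack` default of True is an assumption made only to keep the example short.

```python
IDLE, STORE, LOAD, FETCH = "IDLE", "STORE", "LOAD", "FETCH"


def qmem_next(d_req, d_hit, d_we, i_req, i_hit, ack=True):
    """Return (next_state, qmem_dack, qmem_iack).

    d_req/d_hit/d_we stand for qmemdmmu_cycstb_i, daddr_qmem_hit, qmemdcpu_we_i;
    i_req/i_hit stand for qmemimmu_cycstb_i, iaddr_qmem_hit; ack is qmem_ack.
    """
    if d_req and d_hit and d_we and ack:
        return STORE, 1, 0      # data store has the highest priority
    if d_req and d_hit and ack:
        return LOAD, 1, 0       # then data load
    if i_req and i_hit and ack:
        return FETCH, 0, 1      # then instruction fetch
    # In the RTL the IDLE state simply holds its registers here, which gives
    # the same result because both acks are already 0 whenever state is IDLE.
    return IDLE, 0, 0


print(qmem_next(True, True, True, True, True))
print(qmem_next(False, False, False, True, True))
```

Because the chain does not depend on the current state, the state register mainly records which kind of access was acknowledged last. The sketch also makes the comment in the defines file concrete: when a data access and an instruction fetch are requested in the same cycle, the fetch has to wait.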



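4> dcache module analysis

The data cache and the instruction cache work in a similar way, so only the data cache is analyzed here.

1》Overall analysis

The data cache holds a subset of external memory, and its role is that of a cache in the usual sense.

Only the following points are noted here:

a, Cache prefetch: while the cache is idle, some data in memory can be loaded into the cache in advance to reduce the probability of a cache miss.

b, Cache invalidation control: if certain cache lines have special requirements, software can mark those lines invalid.

c, Cache locking, which was introduced earlier in this section.


2》Code analysis

The dcache consists of four files: or1200_dc_top.v, or1200_dc_fsm.v, or1200_dc_tag.v, and or1200_dc_ram.v. Only the core is covered here, namely the FSM in or1200_dc_fsm.v, whose code is shown below: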


 

`define OR1200_DCFSM_IDLE	3'd0
`define OR1200_DCFSM_CLOADSTORE	3'd1
`define OR1200_DCFSM_LOOP2	3'd2
`define OR1200_DCFSM_LOOP3	3'd3
`define OR1200_DCFSM_LOOP4	3'd4
`define OR1200_DCFSM_FLUSH5	3'd5
`define OR1200_DCFSM_INV6	3'd6 //invalidate
`define OR1200_DCFSM_WAITSPRCS7	3'd7


//
// Main DC FSM
//

always @(posedge clk or `OR1200_RST_EVENT rst)
begin
	if (rst == `OR1200_RST_VALUE) 
		begin
			state <=  `OR1200_DCFSM_IDLE;
			addr_r <=  32'd0;
			hitmiss_eval <=  1'b0;
			store <=  1'b0;
			load <=  1'b0;
			cnt <=  `OR1200_DCLS'd0;
			cache_miss <=  1'b0;
			cache_dirty_needs_writeback <= 1'b0;
			cache_inhibit <=  1'b0;
			did_early_load_ack <= 1'b0;
			cache_spr_block_flush <= 1'b0;
			cache_spr_block_writeback <= 1'b0;
		end
	else
		case (state)	// synopsys parallel_case
			`OR1200_DCFSM_IDLE :
			begin
				if (dc_en & (dc_block_flush | dc_block_writeback))
				begin
					cache_spr_block_flush <= dc_block_flush;
					cache_spr_block_writeback <= dc_block_writeback;
					hitmiss_eval <= 1'b1;
					state <= `OR1200_DCFSM_FLUSH5;
					addr_r <=  spr_dat_i;
				end
			else if (dc_en & dcqmem_cycstb_i)
				begin
					state <= `OR1200_DCFSM_CLOADSTORE;
					hitmiss_eval <=  1'b1;
					store <=  dcqmem_we_i;
					load <=  !dcqmem_we_i;
				end
	     
	     
          end // case: `OR1200_DCFSM_IDLE
	  
		`OR1200_DCFSM_CLOADSTORE:
		begin
			hitmiss_eval <=  1'b0;
			if (hitmiss_eval)
				begin
					cache_inhibit <=  dcqmem_ci_i; // Check for cache inhibit here
					cache_miss <=  tagcomp_miss;
					cache_dirty_needs_writeback <= dirty;
					addr_r <=  lsu_addr;
				end

			// Evaluate any cache line load/stores in first cycle:
			if (hitmiss_eval & tagcomp_miss & !(store & writethrough) & !dcqmem_ci_i)
				begin
					// Miss - first either:
					//  1) write back dirty line 
					if (dirty)
						begin
						// Address for writeback
							addr_r <=  {tag, lsu_addr[`OR1200_DCINDXH:2],2'd0};
							load <= 1'b0;
							store <= 1'b1;
							`ifdef OR1200_VERBOSE		     
							$display("%t: dcache miss and dirty", $time);
							`endif
						end
						//  2) load requested line
					else 
						begin
							addr_r <=  lsu_addr;
							load <= 1'b1;
							store <= 1'b0;
						end // else: !if(dirty)
					state <= `OR1200_DCFSM_LOOP2;		  
					// Set the counter for the burst accesses
					cnt <=  ((1 << `OR1200_DCLS) - 4);
				end
			else if (// Strobe goes low
		      !dcqmem_cycstb_i |
		      // Cycle finishes
		      (!hitmiss_eval & (biudata_valid | biudata_error)) |
		      // Cache hit in first cycle....
		      (hitmiss_eval & !tagcomp_miss & !dcqmem_ci_i &
		      // .. and you're not doing a writethrough store..
		      !(store & writethrough))) 
				begin
					state <=  `OR1200_DCFSM_IDLE;
					load <=  1'b0;
					store <= 1'b0;
					cache_inhibit <= 1'b0;
					cache_dirty_needs_writeback <= 1'b0;
				end	     
		end // case: `OR1200_DCFSM_CLOADSTORE	  
	  
		`OR1200_DCFSM_LOOP2 : 
		begin // loop/abort	     
			if (!dc_en| biudata_error)
				begin
					state <=  `OR1200_DCFSM_IDLE;
					load <=  1'b0;
					store <= 1'b0;
					cnt <= `OR1200_DCLS'd0;
				end
			if (biudata_valid & (|cnt))
				begin
					cnt <=  cnt - 4;
					addr_r[`OR1200_DCLS-1:2] <=  addr_r[`OR1200_DCLS-1:2] + 1;
				end
			else if (biudata_valid & !(|cnt))
				begin
					state <= `OR1200_DCFSM_LOOP3;
					addr_r <=  lsu_addr;
					load <= 1'b0;
					store <= 1'b0;
				end

			// Track if we did an early ack during a load
			if (load_miss_ack)
				did_early_load_ack <= 1'b1;
	     

		end // case: `OR1200_DCFSM_LOOP2
	  
		`OR1200_DCFSM_LOOP3:
		begin // figure out next step
			if (cache_dirty_needs_writeback)
				begin
					// Just did store of the dirty line so now load new one
					load <= 1'b1;
					// Set the counter for the burst accesses
					cnt <=  ((1 << `OR1200_DCLS) - 4);
					// Address of line to be loaded
					addr_r <=  lsu_addr;
					cache_dirty_needs_writeback <= 1'b0;
					state <= `OR1200_DCFSM_LOOP2;
				end // if (cache_dirty_needs_writeback)
			else if (cache_spr_block_flush | cache_spr_block_writeback)
				begin
					// Just wrote back the line to memory, we're finished.
					cache_spr_block_flush <= 1'b0;
					cache_spr_block_writeback <= 1'b0;
					state <= `OR1200_DCFSM_WAITSPRCS7;
				end
			else
				begin
					// Just loaded a new line, finish up
					did_early_load_ack <= 1'b0;
					state <= `OR1200_DCFSM_LOOP4;
				end
		end // case: `OR1200_DCFSM_LOOP3

		`OR1200_DCFSM_LOOP4: 
		begin
			state <=  `OR1200_DCFSM_IDLE;
		end

		`OR1200_DCFSM_FLUSH5: 
		begin
			hitmiss_eval <= 1'b0;
			if (hitmiss_eval & !tag_v)
				begin
					// Not even cached, just ignore
					cache_spr_block_flush <= 1'b0;
					cache_spr_block_writeback <= 1'b0;
					state <=  `OR1200_DCFSM_WAITSPRCS7;
				end
			else if (hitmiss_eval & tag_v)
				begin
					// Tag is valid - what do we do?
					if ((cache_spr_block_flush | cache_spr_block_writeback) & dirty)
						begin
							// Need to writeback
							// Address for writeback (spr_dat_i has already changed so
							// use line number from addr_r)
							addr_r <=  {tag, addr_r[`OR1200_DCINDXH:2],2'd0};
							load <= 1'b0;
							store <= 1'b1;
							`ifdef OR1200_VERBOSE		     
								$display("%t: block flush: dirty block", $time);
							`endif
							state <= `OR1200_DCFSM_LOOP2;		  
							// Set the counter for the burst accesses
							cnt <=  ((1 << `OR1200_DCLS) - 4);
						end
					else if (cache_spr_block_flush & !dirty)
						begin
							// Line not dirty, just need to invalidate
							state <=  `OR1200_DCFSM_INV6;
						end // else: !if(dirty)
					else if (cache_spr_block_writeback & !dirty)
						begin
							// Nothing to do - line is valid but not dirty
							cache_spr_block_writeback <= 1'b0;
							state <=  `OR1200_DCFSM_WAITSPRCS7;
						end
				end // if (hitmiss_eval & tag_v)
		end
		`OR1200_DCFSM_INV6: 
		begin
			cache_spr_block_flush <= 1'b0;
			// Wait until SPR CS goes low before going back to idle
			if (!spr_cswe)
			state <=  `OR1200_DCFSM_IDLE;
		end
		`OR1200_DCFSM_WAITSPRCS7: 
		begin
			// Wait until SPR CS goes low before going back to idle
			if (!spr_cswe)
				state <=  `OR1200_DCFSM_IDLE;
		end

	endcase // case (state)
      
end // always @ (posedge clk or `OR1200_RST_EVENT rst)
   


 

To make this easier to understand, I drew its state diagram, shown below:
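
Two parts of the FSM listing are easy to lose in the detail: the decision taken in the CLOADSTORE state, and the burst counter used in LOOP2. The Python sketch below models both under stated assumptions. `miss_plan` and `burst_addresses` are hypothetical names, `DCLS = 4` is the assumed value of OR1200_DCLS for a 16-byte line, and the "single bus access" label for write-through stores and cache-inhibited accesses is this sketch's reading of the code path that stays in CLOADSTORE until the bus responds.

```python
DCLS = 4  # assumed value of OR1200_DCLS: log2 of the 16-byte line size


def miss_plan(tag_miss, dirty, store, writethrough, cache_inhibit):
    """Bus activity chosen in CLOADSTORE when the hit/miss result is evaluated."""
    if tag_miss and not (store and writethrough) and not cache_inhibit:
        if dirty:
            # LOOP2 runs twice: once to write back, then LOOP3 restarts it to load.
            return ["write back dirty line", "load requested line"]
        return ["load requested line"]
    if not tag_miss and not cache_inhibit and not (store and writethrough):
        return []                      # hit served from the cache, back to IDLE
    return ["single bus access"]       # write-through store or inhibited access


def burst_addresses(start_addr, dcls=DCLS):
    """Addresses of one line burst, mirroring cnt and addr_r in LOOP2."""
    line_mask = (1 << dcls) - 1
    cnt = (1 << dcls) - 4              # same initial value as in the RTL
    addr = start_addr
    beats = []
    while True:
        beats.append(addr)
        if cnt == 0:
            return beats
        cnt -= 4
        # Only bits [dcls-1:2] increment, so the address wraps inside the line.
        addr = (addr & ~line_mask) | ((addr + 4) & line_mask)


print(miss_plan(True, True, False, False, False))
print([hex(a) for a in burst_addresses(0x1006008)])
```

For the example address 0x1006008 the burst starts at the requested word and wraps around inside the 16-byte line, giving four beats at offsets 8, C, 0, and 4. Starting at the requested word is what allows the early load acknowledge tracked by did_early_load_ack.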



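5> sb module analysis

1》Overall analysis

The store buffer is essentially a FIFO and is comparable to a write-back cache. Its function and related analysis were covered earlier; please refer to section 2.1 of: http://blog.csdn.net/rill_zhen/article/details/9491095

The depth and width of this FIFO are defined in or1200_defines.v as follows: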


 

//
// Number of store buffer entries
//
// Verified number of entries are 4 and 8 entries
// (2 and 3 for OR1200_SB_LOG). OR1200_SB_ENTRIES must
// always match 2**OR1200_SB_LOG.
// To disable store buffer, undefine
// OR1200_SB_IMPLEMENTED.
//
`define OR1200_SB_LOG		2	// 2 or 3
`define OR1200_SB_ENTRIES	4	// 4 or 8


 


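2》Code analysis

The sb module consists of two files, or1200_sb.v and or1200_sb_fifo.v. As its file name suggests, the second is a FIFO, physically implemented as a dual-port RAM. Only the first is analyzed here; its main code is shown below.

The code is short, only a little over 150 lines.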


 

module or1200_sb(
	// RISC clock, reset
	clk, rst,

	// Internal RISC bus (SB)
	sb_en,

	// Internal RISC bus (DC<->SB)
	dcsb_dat_i, dcsb_adr_i, dcsb_cyc_i, dcsb_stb_i, dcsb_we_i, dcsb_sel_i, dcsb_cab_i,
	dcsb_dat_o, dcsb_ack_o, dcsb_err_o,

	// BIU bus
	sbbiu_dat_o, sbbiu_adr_o, sbbiu_cyc_o, sbbiu_stb_o, sbbiu_we_o, sbbiu_sel_o, sbbiu_cab_o,
	sbbiu_dat_i, sbbiu_ack_i, sbbiu_err_i
);

parameter dw = `OR1200_OPERAND_WIDTH;
parameter aw = `OR1200_OPERAND_WIDTH;

//
// RISC clock, reset
//
input			clk;		// RISC clock
input			rst;		// RISC reset

//
// Internal RISC bus (SB)
//
input			sb_en;		// SB enable

//
// Internal RISC bus (DC<->SB)
//
input	[dw-1:0]	dcsb_dat_i;	// input data bus
input	[aw-1:0]	dcsb_adr_i;	// address bus
input			dcsb_cyc_i;	// WB cycle
input			dcsb_stb_i;	// WB strobe
input			dcsb_we_i;	// WB write enable
input			dcsb_cab_i;	// CAB input
input	[3:0]		dcsb_sel_i;	// byte selects
output	[dw-1:0]	dcsb_dat_o;	// output data bus
output			dcsb_ack_o;	// ack output
output			dcsb_err_o;	// err output

//
// BIU bus
//
output	[dw-1:0]	sbbiu_dat_o;	// output data bus
output	[aw-1:0]	sbbiu_adr_o;	// address bus
output			sbbiu_cyc_o;	// WB cycle
output			sbbiu_stb_o;	// WB strobe
output			sbbiu_we_o;	// WB write enable
output			sbbiu_cab_o;	// CAB input
output	[3:0]		sbbiu_sel_o;	// byte selects
input	[dw-1:0]	sbbiu_dat_i;	// input data bus
input			sbbiu_ack_i;	// ack output
input			sbbiu_err_i;	// err output

`ifdef OR1200_SB_IMPLEMENTED

//
// Internal wires and regs
//
wire	[4+dw+aw-1:0]	fifo_dat_i;	// FIFO data in
wire	[4+dw+aw-1:0]	fifo_dat_o;	// FIFO data out
wire			fifo_wr;
wire			fifo_rd;
wire			fifo_full;
wire			fifo_empty;
wire			sel_sb;
reg			sb_en_reg;
reg			outstanding_store;
reg			fifo_wr_ack;

//
// FIFO data in/out
//
assign fifo_dat_i = {dcsb_sel_i, dcsb_dat_i, dcsb_adr_i};
assign {sbbiu_sel_o, sbbiu_dat_o, sbbiu_adr_o} = sel_sb ? fifo_dat_o : {dcsb_sel_i, dcsb_dat_i, dcsb_adr_i};

//
// Control
//
assign fifo_wr = dcsb_cyc_i & dcsb_stb_i & dcsb_we_i & ~fifo_full & ~fifo_wr_ack;
assign fifo_rd = ~outstanding_store;
assign dcsb_dat_o = sbbiu_dat_i;
assign dcsb_ack_o = sel_sb ? fifo_wr_ack : sbbiu_ack_i;
assign dcsb_err_o = sel_sb ? 1'b0 : sbbiu_err_i;	// SB never returns error
assign sbbiu_cyc_o = sel_sb ? outstanding_store : dcsb_cyc_i;
assign sbbiu_stb_o = sel_sb ? outstanding_store : dcsb_stb_i;
assign sbbiu_we_o = sel_sb ? 1'b1 : dcsb_we_i;
assign sbbiu_cab_o = sel_sb ? 1'b0 : dcsb_cab_i;
assign sel_sb = sb_en_reg & (~fifo_empty | (fifo_empty & outstanding_store));

//
// SB enable
//
always @(posedge clk or `OR1200_RST_EVENT rst)
	if (rst == `OR1200_RST_VALUE)
		sb_en_reg <= 1'b0;
	else if (sb_en & ~dcsb_cyc_i)
		sb_en_reg <=  1'b1; // enable SB when there is no dcsb transfer in progress
	else if (~sb_en & (~fifo_empty | (fifo_empty & outstanding_store)))
		sb_en_reg <=  1'b0; // disable SB when there is no pending transfers from SB

//
// Store buffer FIFO instantiation
//
or1200_sb_fifo or1200_sb_fifo (
	.clk_i(clk),
	.rst_i(rst),
	.dat_i(fifo_dat_i),
	.wr_i(fifo_wr),
	.rd_i(fifo_rd),
	.dat_o(fifo_dat_o),
	.full_o(fifo_full),
	.empty_o(fifo_empty)
);

//
// fifo_rd
//
always @(posedge clk or `OR1200_RST_EVENT rst)
	if (rst == `OR1200_RST_VALUE)
		outstanding_store <=  1'b0;
	else if (sbbiu_ack_i)
		outstanding_store <=  1'b0;
	else if (sel_sb | fifo_wr)
		outstanding_store <=  1'b1;

//
// fifo_wr_ack
//
always @(posedge clk or `OR1200_RST_EVENT rst)
	if (rst == `OR1200_RST_VALUE)
		fifo_wr_ack <=  1'b0;
	else if (fifo_wr)
		fifo_wr_ack <=  1'b1;
	else
		fifo_wr_ack <=  1'b0;

`else	// !OR1200_SB_IMPLEMENTED

assign sbbiu_dat_o = dcsb_dat_i;
assign sbbiu_adr_o = dcsb_adr_i;
assign sbbiu_cyc_o = dcsb_cyc_i;
assign sbbiu_stb_o = dcsb_stb_i;
assign sbbiu_we_o = dcsb_we_i;
assign sbbiu_cab_o = dcsb_cab_i;
assign sbbiu_sel_o = dcsb_sel_i;
assign dcsb_dat_o = sbbiu_dat_i;
assign dcsb_ack_o = sbbiu_ack_i;
assign dcsb_err_o = sbbiu_err_i;

`endif

endmodule
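
The behavior of the store buffer can be summarized with a small FIFO model: a store is accepted and acknowledged as long as the FIFO is not full, and the buffered stores are retired to the bus later, oldest first. The sketch below is a simplified illustration, not a cycle-accurate model of or1200_sb.v. The class `StoreBuffer` and its methods are hypothetical, byte selects are stored but not applied, and memory is represented by a plain dict.

```python
from collections import deque


class StoreBuffer:
    """Hypothetical behavioral model of a store buffer FIFO."""

    def __init__(self, log_entries=2):
        # OR1200_SB_ENTRIES must always match 2**OR1200_SB_LOG.
        self.entries = 2 ** log_entries
        self.fifo = deque()

    def store(self, addr, data, sel=0xF):
        """CPU side: accept (ack) the store unless the FIFO is full."""
        if len(self.fifo) >= self.entries:
            return False
        self.fifo.append((sel, data, addr))   # same field order as fifo_dat_i
        return True

    def drain_one(self, memory):
        """Bus side: retire the oldest buffered store into memory, if any."""
        if not self.fifo:
            return False
        _sel, data, addr = self.fifo.popleft()
        memory[addr] = data                   # byte selects ignored in this sketch
        return True


sb = StoreBuffer(log_entries=2)
memory = {}
accepted = [sb.store(0x1000 + 4 * i, i) for i in range(5)]
print(accepted)
sb.drain_one(memory)
print(memory, sb.store(0x2000, 99))
```

With four entries, the first four stores are accepted immediately and the fifth has to wait. Once the bus side has retired the oldest entry, a new store is accepted again. This is the sense in which the store buffer hides write latency from the CPU.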


 

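6> biu module analysis

1》Overall analysis

The biu (bus interface unit) module is the window through which or1200_top exchanges data with the outside world. The or1200 instantiates two of them, dbiu and ibiu. Besides exchanging data with the outside, the biu module also handles functions such as checking byte alignment.

The module is mainly a wrapper around a wishbone slave and master. If you are familiar with the wishbone bus protocol, it is much easier to read. I have written about wishbone before; please refer to: http://blog.csdn.net/rill_zhen/article/details/8659788


2》Code analysis

The biu module consists of one file, or1200_wb_biu.v, which mainly contains the logic that generates wishbone protocol timing. It is not discussed in detail here; for completeness, its main code is shown below: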


 

module or1200_wb_biu(
		     // RISC clock, reset and clock control
		     clk, rst, clmode,

		     // WISHBONE interface
		     wb_clk_i, wb_rst_i, wb_ack_i, wb_err_i, wb_rty_i, wb_dat_i,
		     wb_cyc_o, wb_adr_o, wb_stb_o, wb_we_o, wb_sel_o, wb_dat_o,
`ifdef OR1200_WB_CAB
		     wb_cab_o,
`endif
`ifdef OR1200_WB_B3
		     wb_cti_o, wb_bte_o,
`endif

		     // Internal RISC bus
		     biu_dat_i, biu_adr_i, biu_cyc_i, biu_stb_i, biu_we_i, biu_sel_i, biu_cab_i,
		     biu_dat_o, biu_ack_o, biu_err_o
		     );

   parameter dw = `OR1200_OPERAND_WIDTH;
   parameter aw = `OR1200_OPERAND_WIDTH;
   parameter bl = 4; /* Can currently be either 4 or 8 - the two optional line
		      sizes for the OR1200. */
		      
   
   //
   // RISC clock, reset and clock control
   //
   input				clk;		// RISC clock
   input				rst;		// RISC reset
   input [1:0] 				clmode;		// 00 WB=RISC, 01 WB=RISC/2, 10 N/A, 11 WB=RISC/4

   //
   // WISHBONE interface
   //
   input				wb_clk_i;	// clock input
   input				wb_rst_i;	// reset input
   input				wb_ack_i;	// normal termination
   input				wb_err_i;	// termination w/ error
   input				wb_rty_i;	// termination w/ retry
   input [dw-1:0] 			wb_dat_i;	// input data bus
   output				wb_cyc_o;	// cycle valid output
   output [aw-1:0] 			wb_adr_o;	// address bus outputs
   output				wb_stb_o;	// strobe output
   output				wb_we_o;	// indicates write transfer
   output [3:0] 			wb_sel_o;	// byte select outputs
   output [dw-1:0] 			wb_dat_o;	// output data bus
`ifdef OR1200_WB_CAB
   output				wb_cab_o;	// consecutive address burst
`endif
`ifdef OR1200_WB_B3
   output [2:0] 			wb_cti_o;	// cycle type identifier
   output [1:0] 			wb_bte_o;	// burst type extension
`endif

   //
   // Internal RISC interface
   //
   input [dw-1:0] 			biu_dat_i;	// input data bus
   input [aw-1:0] 			biu_adr_i;	// address bus
   input				biu_cyc_i;	// WB cycle
   input				biu_stb_i;	// WB strobe
   input				biu_we_i;	// WB write enable
   input				biu_cab_i;	// CAB input
   input [3:0] 				biu_sel_i;	// byte selects
   output [31:0] 			biu_dat_o;	// output data bus
   output				biu_ack_o;	// ack output
   output				biu_err_o;	// err output

   //
   // Registers
   //
   wire 				wb_ack;		// normal termination
   reg [aw-1:0] 			wb_adr_o;	// address bus outputs
   reg 					wb_cyc_o;	// cycle output
   reg 					wb_stb_o;	// strobe output
   reg 					wb_we_o;	// indicates write transfer
   reg [3:0] 				wb_sel_o;	// byte select outputs
`ifdef OR1200_WB_CAB
   reg 					wb_cab_o;	// CAB output
`endif
`ifdef OR1200_WB_B3
   reg [2:0] 				wb_cti_o;	// cycle type identifier
   reg [1:0] 				wb_bte_o;	// burst type extension
`endif
`ifdef OR1200_NO_DC   
   reg [dw-1:0] 			wb_dat_o;	// output data bus
`else   
   assign wb_dat_o = biu_dat_i;    // No register on this - straight from DCRAM
`endif
   
`ifdef OR1200_WB_RETRY
   reg [`OR1200_WB_RETRY-1:0] 		retry_cnt;	// Retry counter
`else
   wire 				retry_cnt;
   assign retry_cnt = 1'b0;
`endif
`ifdef OR1200_WB_B3
   reg [3:0] 				burst_len;	// burst counter
`endif

   reg  				biu_stb_reg;	// WB strobe
   wire  				biu_stb;	// WB strobe
   reg 					wb_cyc_nxt;	// next WB cycle value
   reg 					wb_stb_nxt;	// next WB strobe value
   reg [2:0] 				wb_cti_nxt;	// next cycle type identifier value

   reg 					wb_ack_cnt;	// WB ack toggle counter
   reg 					wb_err_cnt;	// WB err toggle counter
   reg 					wb_rty_cnt;	// WB rty toggle counter
   reg 					biu_ack_cnt;	// BIU ack toggle counter
   reg 					biu_err_cnt;	// BIU err toggle counter
   reg 					biu_rty_cnt;	// BIU rty toggle counter
   wire 				biu_rty;	// BIU rty indicator

   reg [1:0] 				wb_fsm_state_cur;	// WB FSM - current state
   reg [1:0] 				wb_fsm_state_nxt;	// WB FSM - next state
   wire [1:0] 				wb_fsm_idle	= 2'h0;	// WB FSM state - IDLE
   wire [1:0] 				wb_fsm_trans	= 2'h1;	// WB FSM state - normal TRANSFER
   wire [1:0] 				wb_fsm_last	= 2'h2;	// WB FSM state - LAST transfer

   //
   // WISHBONE I/F <-> Internal RISC I/F conversion
   //
   //assign wb_ack = wb_ack_i;
   assign wb_ack = wb_ack_i & !wb_err_i & !wb_rty_i;

   //
   // WB FSM - register part
   // 
   always @(posedge wb_clk_i or `OR1200_RST_EVENT wb_rst_i) begin
      if (wb_rst_i == `OR1200_RST_VALUE) 
	wb_fsm_state_cur <=  wb_fsm_idle;
      else 
	wb_fsm_state_cur <=  wb_fsm_state_nxt;
   end

   //
   // WB burst length counter
   // 
   always @(posedge wb_clk_i or `OR1200_RST_EVENT wb_rst_i) begin
      if (wb_rst_i == `OR1200_RST_VALUE) begin
	 burst_len <= 0;
      end
      else begin
	 // burst counter
	 if (wb_fsm_state_cur == wb_fsm_idle)
	   burst_len <=  bl[3:0] - 2;
	 else if (wb_stb_o & wb_ack)
	   burst_len <=  burst_len - 1;
      end
   end

   // 
   // WB FSM - combinatorial part
   // 
   always @(wb_fsm_state_cur or burst_len or wb_err_i or wb_rty_i or wb_ack or 
	    wb_cti_o or wb_sel_o or wb_stb_o or wb_we_o or biu_cyc_i or 
	    biu_stb or biu_cab_i or biu_sel_i or biu_we_i) begin
      // States of WISHBONE Finite State Machine
      case(wb_fsm_state_cur)
	// IDLE 
	wb_fsm_idle : begin
	   wb_cyc_nxt = biu_cyc_i & biu_stb;
	   wb_stb_nxt = biu_cyc_i & biu_stb;
	   wb_cti_nxt = {!biu_cab_i, 1'b1, !biu_cab_i};
	   if (biu_cyc_i & biu_stb)
	     wb_fsm_state_nxt = wb_fsm_trans;
	   else
	     wb_fsm_state_nxt = wb_fsm_idle;
	end
	// normal TRANSFER
	wb_fsm_trans : begin
	   wb_cyc_nxt = !wb_stb_o | !wb_err_i & !wb_rty_i & 
			!(wb_ack & wb_cti_o == 3'b111);
	   
	   wb_stb_nxt = !wb_stb_o | !wb_err_i & !wb_rty_i & !wb_ack | 
			!wb_err_i & !wb_rty_i & wb_cti_o == 3'b010 ;
	   wb_cti_nxt[2] = wb_stb_o & wb_ack & burst_len == 'h0 | wb_cti_o[2];
	   wb_cti_nxt[1] = 1'b1  ;
	   wb_cti_nxt[0] = wb_stb_o & wb_ack & burst_len == 'h0 | wb_cti_o[0];

	   if ((!biu_cyc_i | !biu_stb | !biu_cab_i | biu_sel_i != wb_sel_o | 
		biu_we_i != wb_we_o) & wb_cti_o == 3'b010)
	     wb_fsm_state_nxt = wb_fsm_last;
	   else if ((wb_err_i | wb_rty_i | wb_ack & wb_cti_o==3'b111) & 
		    wb_stb_o)
	     wb_fsm_state_nxt = wb_fsm_idle;
	   else
	     wb_fsm_state_nxt = wb_fsm_trans;
	end
	// LAST transfer
	wb_fsm_last : begin
	   wb_cyc_nxt = !wb_stb_o | !wb_err_i & !wb_rty_i & 
			!(wb_ack & wb_cti_o == 3'b111);
	   wb_stb_nxt = !wb_stb_o | !wb_err_i & !wb_rty_i & 
			!(wb_ack & wb_cti_o == 3'b111);
	   wb_cti_nxt[2] = wb_ack & wb_stb_o | wb_cti_o[2];
	   wb_cti_nxt[1] = 1'b1                  ;
	   wb_cti_nxt[0] = wb_ack & wb_stb_o | wb_cti_o[0];
	   if ((wb_err_i | wb_rty_i | wb_ack & wb_cti_o == 3'b111) & wb_stb_o)
	     wb_fsm_state_nxt = wb_fsm_idle;
	   else
	     wb_fsm_state_nxt = wb_fsm_last;
	end
	// default state
	default:begin
	   wb_cyc_nxt = 1'bx;
	   wb_stb_nxt = 1'bx;
	   wb_cti_nxt = 3'bxxx;
	   wb_fsm_state_nxt = 2'bxx;
	end
      endcase
   end

   //
   // WB FSM - output signals
   // 
   always @(posedge wb_clk_i or `OR1200_RST_EVENT wb_rst_i) begin
      if (wb_rst_i == `OR1200_RST_VALUE) begin
	 wb_cyc_o	<=  1'b0;
	 wb_stb_o	<=  1'b0;
	 wb_cti_o	<=  3'b111;
	 wb_bte_o	<=  (bl==8) ? 2'b10 : (bl==4) ? 2'b01 : 2'b00;
`ifdef OR1200_WB_CAB
	 wb_cab_o	<=  1'b0;
`endif
	 wb_we_o		<=  1'b0;
	 wb_sel_o	<=  4'hf;
	 wb_adr_o	<=  {aw{1'b0}};
`ifdef OR1200_NO_DC	 
	 wb_dat_o	<=  {dw{1'b0}};
`endif	 
      end
      else begin
	 wb_cyc_o	<=  wb_cyc_nxt;

         if (wb_ack & wb_cti_o == 3'b111) 
           wb_stb_o        <=  1'b0;
         else
           wb_stb_o        <=  wb_stb_nxt;
`ifndef OR1200_NO_BURSTS
	 wb_cti_o	<=  wb_cti_nxt;
`endif	 
	 wb_bte_o	<=  (bl==8) ? 2'b10 : (bl==4) ? 2'b01 : 2'b00;
`ifdef OR1200_WB_CAB
	 wb_cab_o	<=  biu_cab_i;
`endif
	 // we and sel - set at beginning of access 
	 if (wb_fsm_state_cur == wb_fsm_idle) begin
	    wb_we_o		<=  biu_we_i;
	    wb_sel_o	<=  biu_sel_i;
	 end
	 // adr - set at beginning of access and changed at every termination 
	 if (wb_fsm_state_cur == wb_fsm_idle) begin
	    wb_adr_o	<=  biu_adr_i;
	 end 
	 else if (wb_stb_o & wb_ack) begin
	    if (bl==4) begin
	       wb_adr_o[3:2]	<=  wb_adr_o[3:2] + 1;
	    end
	    if (bl==8) begin
	       wb_adr_o[4:2]	<=  wb_adr_o[4:2] + 1;
	    end
	 end
`ifdef OR1200_NO_DC	 
	 // dat - write data changed after every subsequent write access
	 if (!wb_stb_o) begin
	    wb_dat_o 	<=  biu_dat_i;
	 end
`endif	 
      end
   end

   //
   // WB & BIU termination toggle counters
   // 
   always @(posedge wb_clk_i or `OR1200_RST_EVENT wb_rst_i) begin
      if (wb_rst_i == `OR1200_RST_VALUE) begin
	 wb_ack_cnt	<=  1'b0;
	 wb_err_cnt	<=  1'b0;
	 wb_rty_cnt	<=  1'b0;
      end
      else begin
	 // WB ack toggle counter
	 if (wb_fsm_state_cur == wb_fsm_idle | !(|clmode))
	   wb_ack_cnt	<=  1'b0;
	 else if (wb_stb_o & wb_ack)
	   wb_ack_cnt	<=  !wb_ack_cnt;
	 // WB err toggle counter
	 if (wb_fsm_state_cur == wb_fsm_idle | !(|clmode))
	   wb_err_cnt	<=  1'b0;
	 else if (wb_stb_o & wb_err_i)
	   wb_err_cnt	<=  !wb_err_cnt;
	 // WB rty toggle counter
	 if (wb_fsm_state_cur == wb_fsm_idle | !(|clmode))
	   wb_rty_cnt	<=  1'b0;
	 else if (wb_stb_o & wb_rty_i)
	   wb_rty_cnt	<=  !wb_rty_cnt;
      end
   end

   always @(posedge clk or `OR1200_RST_EVENT rst) begin
      if (rst == `OR1200_RST_VALUE) begin
         biu_stb_reg	<=  1'b0;
	 biu_ack_cnt	<=  1'b0;
	 biu_err_cnt	<=  1'b0;
	 biu_rty_cnt	<=  1'b0;
`ifdef OR1200_WB_RETRY
	 retry_cnt	<= {`OR1200_WB_RETRY{1'b0}};
`endif
      end
      else begin
	 // BIU strobe
	 if (biu_stb_i & !biu_cab_i & biu_ack_o)
	   biu_stb_reg	<=  1'b0;
	 else
	   biu_stb_reg	<=  biu_stb_i;
	 // BIU ack toggle counter
	 if (wb_fsm_state_cur == wb_fsm_idle | !(|clmode))
	   biu_ack_cnt	<=  1'b0 ;
	 else if (biu_ack_o)
	   biu_ack_cnt	<=  !biu_ack_cnt ;
	 // BIU err toggle counter
	 if (wb_fsm_state_cur == wb_fsm_idle | !(|clmode))
	   biu_err_cnt	<=  1'b0 ;
	 else if (wb_err_i & biu_err_o)
	   biu_err_cnt	<=  !biu_err_cnt ;
	 // BIU rty toggle counter
	 if (wb_fsm_state_cur == wb_fsm_idle | !(|clmode))
	   biu_rty_cnt	<=  1'b0 ;
	 else if (biu_rty)
	   biu_rty_cnt	<=  !biu_rty_cnt ;
`ifdef OR1200_WB_RETRY
	 if (biu_ack_o | biu_err_o)
	   retry_cnt	<=  {`OR1200_WB_RETRY{1'b0}};
	 else if (biu_rty)
	   retry_cnt	<=  retry_cnt + 1'b1;
`endif
      end
   end

   assign biu_stb = biu_stb_i & biu_stb_reg;

   //
   // Input BIU data bus
   //
   assign	biu_dat_o	= wb_dat_i;

   //
   // Input BIU termination signals 
   //
   assign	biu_rty		= (wb_fsm_state_cur == wb_fsm_trans) & wb_rty_i & wb_stb_o & (wb_rty_cnt ~^ biu_rty_cnt);
   assign	biu_ack_o	= (wb_fsm_state_cur == wb_fsm_trans) & wb_ack & wb_stb_o & (wb_ack_cnt ~^ biu_ack_cnt);
   assign	biu_err_o	= (wb_fsm_state_cur == wb_fsm_trans) & wb_err_i & wb_stb_o & (wb_err_cnt ~^ biu_err_cnt)
`ifdef OR1200_WB_RETRY
     | biu_rty & retry_cnt[`OR1200_WB_RETRY-1];
`else
   ;
`endif


endmodule
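
For a line refill the biu issues a wishbone burst. In the listing, wb_cti_o is 3'b010 (incrementing burst) during the burst and becomes 3'b111 (end of burst) for the last beat, while wb_bte_o selects a 4-beat or 8-beat wrap depending on bl. The Python sketch below reproduces that beat sequence under simplifying assumptions: every beat is acknowledged immediately, and no error or retry occurs. The function `burst_beats` is a hypothetical helper, not part of the RTL.

```python
CTI_INCREMENTING = 0b010   # wishbone cycle type: incrementing burst
CTI_END_OF_BURST = 0b111   # wishbone cycle type: end of burst


def burst_beats(start_addr, bl=4):
    """Return the list of (address, cti) pairs for a bl-beat wrapping burst."""
    wrap_mask = bl * 4 - 1     # bl=4 wraps bits [3:2], bl=8 wraps bits [4:2]
    burst_len = bl - 2         # value loaded into burst_len while the FSM is idle
    addr, cti = start_addr, CTI_INCREMENTING
    beats = []
    while True:
        beats.append((addr, cti))
        if cti == CTI_END_OF_BURST:
            return beats
        if burst_len == 0:
            cti = CTI_END_OF_BURST   # the next beat is the last one
        burst_len -= 1
        # Only the word-select bits increment, so the address wraps in the line.
        addr = (addr & ~wrap_mask) | ((addr + 4) & wrap_mask)


for addr, cti in burst_beats(0x1006008):
    print(hex(addr), format(cti, "03b"))
```

For bl = 4 this yields three beats with CTI 010 followed by one beat with CTI 111, at addresses that wrap within the 16-byte line. This also explains why burst_len is initialized to bl - 2: the counter reaches zero on the second-to-last beat, which is when the cycle type has to switch for the final one.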


 

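5, A small question

We can finally take a break, so here is a small question to relax with.

Many people have probably met this question in a software written test: of the two programs below, which one generally takes less time to execute (assume the cache size is 8K)?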





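6, Summary

With this, we have completed the analysis of the OpenRISC MMU and cache system, and have gained a complete and clear understanding of memory organization, a very important part of computer architecture.


7, References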

1,ORPSoC RTL code

2,OpenRISC arch-manual
