Research and Implementation of Load Balancing on SDN Networks

Date: 2019-11-10


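Why software-defined networking is needed

1. Networks lack scalability, and innovation is stalling

Our latest research found that almost one in two organizations cites the need to scale network functions as the main business trigger for adopting SDN, ahead of any other catalyst. This statistic is not surprising: our customers need a network flexible enough to support the business, because every function is trying to respond faster to changing market conditions.
This challenge is independent of industry. In almost every industry imaginable, companies are trying to support more and more applications and devices as they add new capabilities to their products and services. Network capacity and complexity often hold this development back, or at least delay a company's ability to innovate.
SDN offers a potential solution to this problem. It provides a way to manage network functions centrally, so that application changes are rolled out across many devices from a single point rather than device by device. As an organization's needs evolve, this greatly reduces the time needed to scale.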

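2. Market opportunities are being lost for lack of speed

Any IT strategy must be grounded in its target business outcomes, which means it should support competitive advantage. In markets where windows of opportunity are getting ever shorter, that advantage is lost if a company cannot innovate quickly.
This is a key driver of SDN adoption. SDN is a centralized, policy-based way of managing IT assets, which means companies can innovate faster. Every new application can be rolled out from the center, and devices can configure themselves automatically through their link to the controller and the new policies already set there.
In a world where customer needs must be met but change quickly and unpredictably, SDN can close the gap we increasingly see around the "fail fast" mindset. New products and services reach their target markets faster and can be updated or replaced at any time.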

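3. Companies want to innovate quickly

Companies also tell us that having the agility and flexibility to improve services across the business is critical. The speed of SDN deployment is therefore seen as another driver of adoption. Making each business unit faster and more independent is essential. In many of the businesses we work with, it is not unusual to hear that different departments are trying to innovate independently of one another, only to find that their IT infrastructure does not let them move at the pace they want. For business users who access on-demand services outside of work, this is frustrating.
Against this background, SDN further encourages an organization's capacity to innovate, which shows in how far it can experiment with and launch new initiatives, whether internal or customer-facing. SDN offers a practical answer to network complexity that would otherwise threaten experimentation and transformation.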

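4. Security concerns hold back creativity

In a world where organizations have never been more aware of cybersecurity and the rising level of threat, fear of a major breach or failure can stifle innovation. Companies worry that moving too fast or working with new partners will expose them to more vulnerabilities. Understandably, they respond by focusing on resilience, but this often comes at the expense of the agility needed to improve services across the business.
SDN can strengthen enterprise security in both technology and practice. On the one hand, a fully enclosed network carrying encrypted traffic is inherently more secure than a company's traditional network solution. On the other hand, SDN gives organizations the opportunity to build existing application security into the user's virtual environment.
This means companies can manage their IT resilience better while still pursuing the innovation they urgently want.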

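5. Efficiency is critical to long-term innovation

If failing fast is an important principle for many organizations in this new world of transformation, then failure also has to be cheap. Companies that try out new applications and trial new products and services on top of expensive, cumbersome IT infrastructure will quickly be overwhelmed.
SDN is managed centrally, with no need to reconfigure individual devices for each new iteration of an application, and that can be of great value. The longer-term opportunity, however, may be to adopt SDN as one step toward network transformation, because enterprise-wide virtualization will give the organization a lean and efficient base for the challenges of the next five years and beyond.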

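How load balancing changes in emerging network environments

Keeping network services stable and efficient in a complex, changing network environment is one of the main problems a load-balancing mechanism has to solve. Because of the shortcomings of the traditional network architecture, load balancing has found it hard to make major advances. The new SDN network architecture offers a different line of approach and a route to improving load-balancing mechanisms. This article implements a load-balancing policy on an SDN architecture represented by OpenFlow, with the aim of improving network performance.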

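Common load-balancing algorithms

Software load balancing distributes and balances traffic in software. It is divided into layer-4 and layer-7 approaches. The OSI network model has seven layers: a scheme that distributes traffic at the transport layer (layer 4) is called layer-4 load balancing, LVS being an example, while one that distributes traffic at the application layer (layer 7) is called layer-7 load balancing, Nginx being an example.
The two differ somewhat in performance and flexibility. Layer-4 load balancing performs better and can typically reach a processing rate of several hundred thousand per second, whereas layer-7 load balancing typically reaches only tens of thousands per second. Software load balancing also has an obvious advantage: it is cheap. It can be deployed on ordinary servers with no extra purchase and needs only some engineering effort for tuning, which is why it is the approach most widely used by internet companies. SDN load balancing naturally falls within software load balancing as well.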

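1. Random

Random selection picks a server at random, with probabilities set by weight. Over a small sample, collisions are likely (several requests landing on the same server), but the larger the call volume, the more even the distribution becomes. Applying weights as probabilities also keeps the spread fairly even, which makes it easy to adjust provider weights dynamically.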
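The behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the article: the function name `pick_weighted_random` and the two-server pool are hypothetical, and a fixed seed is used only so the run is repeatable.

```python
import random
from collections import Counter


def pick_weighted_random(weights, rng):
    """Return one server name, chosen with probability proportional to its weight."""
    servers = list(weights)
    return rng.choices(servers, weights=[weights[s] for s in servers], k=1)[0]


# Hypothetical pool: server "b" carries twice the weight of server "a".
weights = {"a": 1, "b": 2}
rng = random.Random(0)  # fixed seed so repeated runs give the same picks

few = Counter(pick_weighted_random(weights, rng) for _ in range(10))
many = Counter(pick_weighted_random(weights, rng) for _ in range(30000))
print("10 picks:   ", dict(few))
print("30000 picks:", dict(many))
```

With only ten picks the split can be lopsided; over tens of thousands of picks the ratio between "b" and "a" settles close to the 2:1 weight ratio.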

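2. Round robin and weighted round robin

Round robin suits a server pool in which every server has the same processing capacity and the work per request varies little. It cycles through the servers in turn, with the cycle ratio set by the normalized weights. Its weakness is that requests pile up on a slow provider: if the second machine is very slow but has not failed, each request dispatched to it stalls there, and over time requests keep accumulating on the second machine. Weighted round robin attaches a weight to each server in the rotation. For example, if server 1 has weight 1, server 2 has weight 2, and server 3 has weight 3, the order is 1-2-2-3-3-3-1-2-2-3-3-3-...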
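A small sketch may make the two rotations concrete. It assumes the simplest possible weighted scheme, in which each server is repeated as many times as its weight; the function names are illustrative only, and production balancers often interleave the picks more smoothly.

```python
from itertools import cycle, islice


def round_robin(servers):
    """Yield servers in a fixed rotation, one request each."""
    return cycle(list(servers))


def weighted_round_robin(weights):
    """Yield each server `weight` times per rotation, in listed order."""
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)


plain = list(islice(round_robin([1, 2, 3]), 6))
weighted = list(islice(weighted_round_robin({1: 1, 2: 2, 3: 3}), 12))
print(plain)     # [1, 2, 3, 1, 2, 3]
print(weighted)  # [1, 2, 2, 3, 3, 3, 1, 2, 2, 3, 3, 3]
```

The weighted output reproduces the 1-2-2-3-3-3 sequence from the example with weights 1, 2, and 3.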

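3. Least connections and weighted least connections

Least connections sends each request to whichever server is currently handling the fewest connections (sessions). Even when servers differ in processing capacity and the work per request also varies, it can reduce server load to some extent.
Weighted least connections attaches a weight to each server in the least-connections algorithm. The number of connections each server should handle is assigned in advance, and client requests are forwarded to the server with the fewest connections.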
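The selection rule can be sketched as follows. The weighted variant here compares the ratio of active connections to weight, which is one common reading of the algorithm and an assumption on my part, since the text does not spell out how the weight enters the comparison. Server names and counts are made up.

```python
def least_connections(active):
    """Return the server with the fewest active connections."""
    return min(active, key=active.get)


def weighted_least_connections(active, weights):
    """Return the server with the lowest connections-to-weight ratio."""
    return min(active, key=lambda server: active[server] / weights[server])


# Hypothetical snapshot: s2 holds more connections but has three times the weight.
active = {"s1": 4, "s2": 6}
weights = {"s1": 1, "s2": 3}

print(least_connections(active))                    # s1
print(weighted_least_connections(active, weights))  # s2 (6/3 = 2 is below 4/1 = 4)
```

In a real balancer the `active` counts would be updated as connections open and close; here they are a static snapshot.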

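4. Hashing

With consistent hashing, requests with the same parameters are always sent to the same provider. When a provider goes down, the requests originally destined for it are spread across the other providers by means of virtual nodes, without causing drastic reshuffling.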
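To show how virtual nodes limit the reshuffling, here is a minimal consistent-hash ring. The class name, the choice of MD5 as the hash, and the figure of 100 virtual nodes per server are all assumptions made for illustration.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """A hash ring in which every node owns several virtual points."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash value, node name)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key):
        """Return the node owning the first virtual point at or after the key's hash."""
        if not self._ring:
            raise LookupError("the ring has no nodes")
        idx = bisect.bisect_right(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["s1", "s2", "s3"])
keys = [f"request-{i}" for i in range(1000)]
before = {k: ring.get(k) for k in keys}

ring.remove("s2")  # simulate provider s2 going down
after = {k: ring.get(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
```

Only keys that used to map to the removed node change owner; every other key keeps its server, which is the "no drastic reshuffling" property.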

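5. IP address hashing

This algorithm computes a hash over the source and destination IP addresses and forwards all packets from the same sender (or to the same destination) to the same server. When a client has a series of operations that require repeated communication with one server, the algorithm works per flow (session) and ensures that traffic from the same client is always handled by the same server.

6. URL hashing

This algorithm computes a hash over the URL in the client's request and forwards requests for the same URL to the same server.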
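Both hash-based schemes reduce to "hash a field of the request, then take it modulo the number of servers". The sketch below assumes SHA-256 as a stable hash (Python's built-in `hash()` for strings varies between processes) and hashes only the source IP in the first function; the function names and addresses are hypothetical.

```python
import hashlib
import ipaddress


def _bucket(data, n):
    """Map bytes to an index in range(n) using a hash that is stable across runs."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % n


def pick_by_source_ip(src_ip, servers):
    """Send every packet from the same source address to the same server."""
    return servers[_bucket(ipaddress.ip_address(src_ip).packed, len(servers))]


def pick_by_url(url, servers):
    """Send every request for the same URL to the same server."""
    return servers[_bucket(url.encode("utf-8"), len(servers))]


servers = ["10.0.0.1", "10.0.0.2"]
first = pick_by_source_ip("10.0.0.3", servers)
again = pick_by_source_ip("10.0.0.3", servers)
by_url = pick_by_url("http://example.com/index.html", servers)
print(first, again, by_url)
```

Unlike the consistent-hash ring, plain modulo hashing remaps most keys whenever the number of servers changes.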

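Solution

When many users access a single server concurrently, the server may slow down or even crash. To address this, our group proposes spreading the users' traffic across different servers, that is, implementing load balancing. Load balancing in traditional networks suffers from costly hardware and an architecture that is hard to build. Our solution therefore uses software-defined networking (SDN) to balance network traffic, implemented and deployed as a Python script on a standalone SDN controller, POX.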

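1. Load-balancing architecture

The SDN load-balancing implementation is deployed as a three-layer architecture: data layer, control layer, and application layer. The POX controller uses a RESTful API as the northbound interface between the control layer and the application layer, while the southbound interface connects to the network topology emulated in Mininet. Hosts issue requests with the GET method; the POX controller monitors the network and installs flow tables on the switches; and the switches forward data according to those flow tables, selecting different servers. This balances the traffic, raises throughput, and relieves pressure on the servers. The figure shows the three-layer architecture.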

2. Policy algorithm

ip_loadbalancer.py

The official load-balancing code, which implements the random algorithm, is as follows:


# Copyright 2013,2014 James McCauley
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
A very sloppy IP load balancer.

Run it with --ip=<Service IP> --servers=IP1,IP2,...

By default, it will do load balancing on the first switch that connects.  If
you want, you can add --dpid=<dpid> to specify a particular switch.

Please submit improvements. :)
"""

from pox.core import core
import pox
log = core.getLogger("iplb")

from pox.lib.packet.ethernet import ethernet, ETHER_BROADCAST
from pox.lib.packet.ipv4 import ipv4
from pox.lib.packet.arp import arp
from pox.lib.addresses import IPAddr, EthAddr
from pox.lib.util import str_to_bool, dpid_to_str, str_to_dpid

import pox.openflow.libopenflow_01 as of

import time
import random

FLOW_IDLE_TIMEOUT = 10
FLOW_MEMORY_TIMEOUT = 60 * 5


class MemoryEntry (object):
  """
  Record for flows we are balancing

  Table entries in the switch "remember" flows for a period of time, but
  rather than set their expirations to some long value (potentially leading
  to lots of rules for dead connections), we let them expire from the
  switch relatively quickly and remember them here in the controller for
  longer.

  Another tactic would be to increase the timeouts on the switch and use
  the Nicira extension which can match packets with FIN set to remove them
  when the connection closes.
  """
  def __init__ (self, server, first_packet, client_port):
    self.server = server
    self.first_packet = first_packet
    self.client_port = client_port
    self.refresh()

  def refresh (self):
    self.timeout = time.time() + FLOW_MEMORY_TIMEOUT

  @property
  def is_expired (self):
    return time.time() > self.timeout

  @property
  def key1 (self):
    ethp = self.first_packet
    ipp = ethp.find('ipv4')
    tcpp = ethp.find('tcp')

    return ipp.srcip,ipp.dstip,tcpp.srcport,tcpp.dstport

  @property
  def key2 (self):
    ethp = self.first_packet
    ipp = ethp.find('ipv4')
    tcpp = ethp.find('tcp')

    return self.server,ipp.srcip,tcpp.dstport,tcpp.srcport


class iplb (object):
  """
  A simple IP load balancer

  Give it a service_ip and a list of server IP addresses.  New TCP flows
  to service_ip will be randomly redirected to one of the servers.

  We probe the servers to see if they're alive by sending them ARPs.
  """
  def __init__ (self, connection, service_ip, servers = []):
    self.service_ip = IPAddr(service_ip)
    self.servers = [IPAddr(a) for a in servers]
    self.con = connection
    self.mac = self.con.eth_addr
    self.live_servers = {} # IP -> MAC,port

    try:
      self.log = log.getChild(dpid_to_str(self.con.dpid))
    except:
      # Be nice to Python 2.6 (ugh)
      self.log = log

    self.outstanding_probes = {} # IP -> expire_time

    # How quickly do we probe?
    self.probe_cycle_time = 5

    # How long do we wait for an ARP reply before we consider a server dead?
    self.arp_timeout = 3

    # We remember where we directed flows so that if they start up again,
    # we can send them to the same server if it's still up.  Alternate
    # approach: hashing.
    self.memory = {} # (srcip,dstip,srcport,dstport) -> MemoryEntry

    self._do_probe() # Kick off the probing

    # As part of a gross hack, we now do this from elsewhere
    #self.con.addListeners(self)

  def _do_expire (self):
    """
    Expire probes and "memorized" flows

    Each of these should only have a limited lifetime.
    """
    t = time.time()

    # Expire probes
    for ip,expire_at in self.outstanding_probes.items():
      if t > expire_at:
        self.outstanding_probes.pop(ip, None)
        if ip in self.live_servers:
          self.log.warn("Server %s down", ip)
          del self.live_servers[ip]

    # Expire old flows
    c = len(self.memory)
    self.memory = {k:v for k,v in self.memory.items()
                   if not v.is_expired}
    if len(self.memory) != c:
      self.log.debug("Expired %i flows", c-len(self.memory))

  def _do_probe (self):
    """
    Send an ARP to a server to see if it's still up
    """
    self._do_expire()

    server = self.servers.pop(0)
    self.servers.append(server)

    r = arp()
    r.hwtype = r.HW_TYPE_ETHERNET
    r.prototype = r.PROTO_TYPE_IP
    r.opcode = r.REQUEST
    r.hwdst = ETHER_BROADCAST
    r.protodst = server
    r.hwsrc = self.mac
    r.protosrc = self.service_ip
    e = ethernet(type=ethernet.ARP_TYPE, src=self.mac,
                 dst=ETHER_BROADCAST)
    e.set_payload(r)
    #self.log.debug("ARPing for %s", server)
    msg = of.ofp_packet_out()
    msg.data = e.pack()
    msg.actions.append(of.ofp_action_output(port = of.OFPP_FLOOD))
    msg.in_port = of.OFPP_NONE
    self.con.send(msg)

    self.outstanding_probes[server] = time.time() + self.arp_timeout

    core.callDelayed(self._probe_wait_time, self._do_probe)

  @property
  def _probe_wait_time (self):
    """
    Time to wait between probes
    """
    r = self.probe_cycle_time / float(len(self.servers))
    r = max(.25, r) # Cap it at four per second
    return r

  def _pick_server (self, key, inport):
    """
    Pick a server for a (hopefully) new connection
    """
    return random.choice(self.live_servers.keys())

  def _handle_PacketIn (self, event):
    inport = event.port
    packet = event.parsed

    def drop ():
      if event.ofp.buffer_id is not None:
        # Kill the buffer
        msg = of.ofp_packet_out(data = event.ofp)
        self.con.send(msg)
      return None

    tcpp = packet.find('tcp')
    if not tcpp:
      arpp = packet.find('arp')
      if arpp:
        # Handle replies to our server-liveness probes
        if arpp.opcode == arpp.REPLY:
          if arpp.protosrc in self.outstanding_probes:
            # A server is (still?) up; cool.
            del self.outstanding_probes[arpp.protosrc]
            if (self.live_servers.get(arpp.protosrc, (None,None))
                == (arpp.hwsrc,inport)):
              # Ah, nothing new here.
              pass
            else:
              # Ooh, new server.
              self.live_servers[arpp.protosrc] = arpp.hwsrc,inport
              self.log.info("Server %s up", arpp.protosrc)
        return

      # Not TCP and not ARP.  Don't know what to do with this.  Drop it.
      return drop()

    # It's TCP.

    ipp = packet.find('ipv4')

    if ipp.srcip in self.servers:
      # It's FROM one of our balanced servers.
      # Rewrite it BACK to the client

      key = ipp.srcip,ipp.dstip,tcpp.srcport,tcpp.dstport
      entry = self.memory.get(key)

      if entry is None:
        # We either didn't install it, or we forgot about it.
        self.log.debug("No client for %s", key)
        return drop()

      # Refresh time timeout and reinstall.
      entry.refresh()

      #self.log.debug("Install reverse flow for %s", key)

      # Install reverse table entry
      mac,port = self.live_servers[entry.server]

      actions = []
      actions.append(of.ofp_action_dl_addr.set_src(self.mac))
      actions.append(of.ofp_action_nw_addr.set_src(self.service_ip))
      actions.append(of.ofp_action_output(port = entry.client_port))
      match = of.ofp_match.from_packet(packet, inport)

      msg = of.ofp_flow_mod(command=of.OFPFC_ADD,
                            idle_timeout=FLOW_IDLE_TIMEOUT,
                            hard_timeout=of.OFP_FLOW_PERMANENT,
                            data=event.ofp,
                            actions=actions,
                            match=match)
      self.con.send(msg)

    elif ipp.dstip == self.service_ip:
      # Ah, it's for our service IP and needs to be load balanced

      # Do we already know this flow?
      key = ipp.srcip,ipp.dstip,tcpp.srcport,tcpp.dstport
      entry = self.memory.get(key)
      if entry is None or entry.server not in self.live_servers:
        # Don't know it (hopefully it's new!)
        if len(self.live_servers) == 0:
          self.log.warn("No servers!")
          return drop()

        # Pick a server for this flow
        server = self._pick_server(key, inport)
        self.log.debug("Directing traffic to %s", server)
        entry = MemoryEntry(server, packet, inport)
        self.memory[entry.key1] = entry
        self.memory[entry.key2] = entry

      # Update timestamp
      entry.refresh()

      # Set up table entry towards selected server
      mac,port = self.live_servers[entry.server]

      actions = []
      actions.append(of.ofp_action_dl_addr.set_dst(mac))
      actions.append(of.ofp_action_nw_addr.set_dst(entry.server))
      actions.append(of.ofp_action_output(port = port))
      match = of.ofp_match.from_packet(packet, inport)

      msg = of.ofp_flow_mod(command=of.OFPFC_ADD,
                            idle_timeout=FLOW_IDLE_TIMEOUT,
                            hard_timeout=of.OFP_FLOW_PERMANENT,
                            data=event.ofp,
                            actions=actions,
                            match=match)
      self.con.send(msg)


# Remember which DPID we're operating on (first one to connect)
_dpid = None


def launch (ip, servers, dpid = None):
  global _dpid
  if dpid is not None:
    _dpid = str_to_dpid(dpid)

  servers = servers.replace(","," ").split()
  servers = [IPAddr(x) for x in servers]
  ip = IPAddr(ip)

  # We only want to enable ARP Responder *only* on the load balancer switch,
  # so we do some disgusting hackery and then boot it up.
  from proto.arp_responder import ARPResponder
  old_pi = ARPResponder._handle_PacketIn
  def new_pi (self, event):
    if event.dpid == _dpid:
      # Yes, the packet-in is on the right switch
      return old_pi(self, event)
  ARPResponder._handle_PacketIn = new_pi

  # Hackery done.  Now start it.
  from proto.arp_responder import launch as arp_launch
  arp_launch(eat_packets=False,**{str(ip):True})
  import logging
  logging.getLogger("proto.arp_responder").setLevel(logging.WARN)

  def _handle_ConnectionUp (event):
    global _dpid
    if _dpid is None:
      _dpid = event.dpid

    if _dpid != event.dpid:
      log.warn("Ignoring switch %s", event.connection)
    else:
      if not core.hasComponent('iplb'):
        # Need to initialize first...
        core.registerNew(iplb, event.connection, IPAddr(ip), servers)
        log.info("IP Load Balancer Ready.")
      log.info("Load Balancing on %s", event.connection)

      # Gross hack
      core.iplb.con = event.connection
      event.connection.addListeners(core.iplb)

  core.openflow.addListenerByName("ConnectionUp", _handle_ConnectionUp)
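One detail of the listing that is easy to miss is that each flow is remembered under two keys, `key1` for packets travelling from the client to the service IP and `key2` for the replies coming back from the chosen server. The standalone sketch below mirrors that idea with plain tuples instead of POX packet objects. The function names, the dictionary layout, and all addresses are hypothetical and chosen only for illustration.

```python
def forward_key(client_ip, service_ip, client_port, service_port):
    """Key for a client -> service packet: (srcip, dstip, srcport, dstport)."""
    return (client_ip, service_ip, client_port, service_port)


def reverse_key(server_ip, client_ip, service_port, client_port):
    """Key for a server -> client reply, read from the reply's own headers."""
    return (server_ip, client_ip, service_port, client_port)


memory = {}


def remember(server_ip, client_ip, service_ip, client_port, service_port):
    """Store one record under both keys, as key1 and key2 do in the listing."""
    entry = {"server": server_ip, "client": client_ip, "client_tcp_port": client_port}
    memory[forward_key(client_ip, service_ip, client_port, service_port)] = entry
    memory[reverse_key(server_ip, client_ip, service_port, client_port)] = entry
    return entry


# Hypothetical addresses: client 10.0.0.3 talks to service IP 10.0.1.1,
# and the balancer has chosen backend 10.0.0.1 for this flow.
remember("10.0.0.1", "10.0.0.3", "10.0.1.1", 45000, 80)

outbound = memory[("10.0.0.3", "10.0.1.1", 45000, 80)]  # the client's next packet
inbound = memory[("10.0.0.1", "10.0.0.3", 80, 45000)]   # the server's reply
print(outbound is inbound, outbound["server"])
```

Because both lookups return the same record, the controller can rewrite the destination on the way in and restore the service IP as the source on the way out without a second table.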

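Implementing the solution

This solution was implemented on the SDNHub_tutorial_VM_64 system (you can of course also do it on Ubuntu). POX is the controller, Mininet builds the virtual topology, and the servers are simple HTTP servers created with Python. After logging in, create a simple topology with "sudo mn --topo single,6 --controller=remote,port=6633", where 6633 is the port of the POX controller.
Step 1. Create the topology

The experimental topology, shown in the figure, consists of six hosts with addresses 10.0.0.x (x = 1 to 6) and one switch, with the POX controller connected to the switch.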

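Step 2. Start the servers
Open hosts h1 and h2 with xterm [host]; h1 and h2 act as the servers. The experiment does not place heavy demands on the servers, so Python's SimpleHTTPServer module is used as the HTTP server to answer requests, with the HTTP server port set to 80.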
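For readers on Python 3, where SimpleHTTPServer no longer exists as a separate module, a rough stand-in can be built on `http.server`. The sketch below is an assumption-laden illustration rather than the article's setup: it binds to 127.0.0.1 on an ephemeral port instead of port 80, and it answers every GET with the server's own name so that it is easy to see which backend replied. The helpers `make_handler` and `start_server` are hypothetical names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(name):
    """Build a handler class whose GET reply states which server answered."""
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = f"served by {name}\n".encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # silence per-request logging for this demo

    return Handler


def start_server(name, port=0):
    """Start a server in a background thread; port 0 lets the OS pick a free port."""
    httpd = HTTPServer(("127.0.0.1", port), make_handler(name))
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    return httpd


httpd = start_server("h1")
conn = http.client.HTTPConnection("127.0.0.1", httpd.server_address[1], timeout=5)
conn.request("GET", "/")
reply = conn.getresponse().read().decode("utf-8")
conn.close()
httpd.shutdown()
httpd.server_close()
print(reply, end="")
```

Inside a Mininet host the same handler could be bound to the host's own address and to port 80, which requires root privileges.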

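Step 3. Controller and load-balancing policy
Running "sudo ./pox.py log.level --DEBUG misc.ip_loadbalancer" starts the POX controller and loads ip_loadbalancer at the same time; as the module's docstring notes, it also takes --ip and --servers arguments. ip_loadbalancer implements the load-balancing policy, and the POX controller handles the traffic. When "IP Load Balancer Ready." and "Server ... up" appear in the log, everything is running: load balancing and the HTTP servers are up.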

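Step 4. Send requests
Open the remaining hosts as clients and use curl to send a GET request, a single request packet, to servers server1 and server2. If the request succeeds, the server returns a web page.

Use several hosts at once and repeat the request many times while watching the path the traffic takes in POX. Analysing the traffic shows that when several hosts send requests, they end up on different servers and the traffic follows different paths.
On the server side, h1 and h2 can unpack the packets they receive. Throughout these GET requests h1 and h2 work in parallel, and identical requests are not necessarily handled by the same server.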
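Why repeated requests from one host can land on different servers follows from the flow key used in the listing: each new TCP connection normally gets a fresh source port, so the controller sees a new flow and picks again. The simulation below is a sketch of that decision logic only, with no POX or Mininet involved. `SERVICE_IP`, the port numbers, and the `simulate` function are assumptions for illustration.

```python
import random
from collections import Counter

SERVICE_IP = "10.0.1.1"  # hypothetical service address; the article does not state one


def simulate(clients, requests_per_client, live_servers, reuse_connection=False, seed=0):
    """Tally (client, server) pairs when unknown flows get a random live server."""
    rng = random.Random(seed)
    memory = {}        # flow key -> server chosen for that flow
    tally = Counter()  # (client, server) -> number of requests
    for client in clients:
        src_port = 40000
        for _ in range(requests_per_client):
            if not reuse_connection:
                src_port += 1  # a new TCP connection means a new source port
            key = (client, SERVICE_IP, src_port, 80)
            server = memory.get(key)
            if server is None or server not in live_servers:
                server = rng.choice(sorted(live_servers))
                memory[key] = server
            tally[(client, server)] += 1
    return tally


clients = ["10.0.0.3", "10.0.0.4"]
servers = {"10.0.0.1", "10.0.0.2"}

new_flows = simulate(clients, 100, servers)                      # like one curl call per request
sticky = simulate(clients, 100, servers, reuse_connection=True)  # one long-lived connection
one_up = simulate(clients, 100, {"10.0.0.1"})                    # second server down

print(sorted(new_flows.items()))
print(sorted(sticky.items()))
print(sorted(one_up.items()))
```

With a new connection per request, both servers receive traffic from each client; with a reused connection, each client stays on whichever server was picked first; and with only one live server, everything goes there.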

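Step 5. Packet-capture test
To rule out the possibility that the result was due to chance, we repeated the test several times and analysed the captures with Wireshark. This time, instead of sending a large number of repeated packets, we sent them in stages: after the first request packet we waited a while and then sent another. The captures from the first and second runs are shown below: the first request was handled by h2 and the second by h1.
First run:
Second run: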

Experiment video (readers comfortable with English may want to follow along): YouTube
