Forwarding and routing
2 important network-layer functions – forwarding and routing.
Forwarding: The router-local action of transferring a packet from an input link interface to the appropriate output link interface (within a single router).
Terms ‘forwarding’ and ‘switching’ are often used interchangeably.
Every router has a forwarding table. A router forwards a packet by examining the value of a field in the arriving packet’s header, and then using this header value to index into the router’s forwarding table. The value stored in the forwarding table entry for that header indicates the router’s outgoing link interface to which that packet is to be forwarded.
Routing: The network-wide process that determines the end-to-end paths that packets take from source to destination (involves all of a network’s routers).
Switches & routers
Packet switch: A general packet-switching device that transfers a packet from an input link interface to an output link interface, according to the value in a field in the header of the packet.
Link-layer switches: Packet switches that base their forwarding decision on values in the fields of the link-layer frame. [layer 2 devices]
Routers: Packet switches that base their forwarding decision on the value in the network-layer field. [layer 3 devices]
Connection setup
Connection setup: The process of the routers along the chosen path from source to destination handshaking with each other in order to set up state before network-layer data packets can begin to flow.
Connection setup is another important network-layer function, required by some network-layer architectures (e.g. ATM, frame relay, MPLS).
Network service models
Network service model: defines the characteristics of end-to-end transport of packets between sending and receiving end systems.
Some possible services that the network layer could provide:
Guaranteed delivery; Guaranteed delivery with bounded delay; In-order packet delivery; Guaranteed minimal bandwidth; Guaranteed maximum jitter; Security services, etc.
The Internet’s network layer provides a single service – best-effort service.
Virtual circuit and datagram network
A network layer can provide connection and connectionless services between 2 hosts, which have some parallels with transport-layer connection-oriented and connectionless services. The crucial differences are:
1) In the network layer, host-to-host services are provided by the network layer for the transport layer;
In the transport layer, process-to-process services are provided by the transport layer for the application layer.
2) In many major computer network architectures, the network layer provides either a host-to-host connectionless service or a host-to-host connection service, but not both.
3) The transport-layer connection-oriented service is implemented at the edge of the network in the end systems;
The network-layer connection service is implemented in the routers in the network core as well as in the end systems.
Virtual-circuit networks and datagram networks are 2 fundamental classes of computer networks.
Virtual-circuit (VC) networks: Computer networks that provide only a connection service at the network layer. E.g. ATM, frame relay.
Datagram networks: Computer networks that provide only a connectionless service at the network layer. E.g. The Internet.
Virtual-circuit networks
Virtual circuits (VCs): Network-layer connections used in VC networks. A VC consists of (1) a path (a series of links and routers) between the source and destination hosts; (2) VC numbers, one number for each link along the path; (3) entries in the forwarding table in each router along the path.
Each router has number translation in its forwarding table, so that it can replace the VC number of each traversing packet with a new VC number according to the forwarding table.
Signaling messages: The messages that the end systems send into the network to initiate or terminate a VC, and the messages passed between the routers to set up the VC.
Signaling protocols: The protocols used to exchange signaling messages.
e.g. The forwarding table in a VC network router:
When a new VC is established across a router, an entry is added to the forwarding table;
When a VC terminates, the appropriate entries in each table along its path are removed.
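The table behaviour described above can be sketched as a small dictionary. This is a minimal illustration, not a real router implementation: the interface numbers, the VC numbers and the function names `forward_vc`, `setup_vc` and `teardown_vc` are all made up for the example.

```python
# Hypothetical VC forwarding table for a single router.
# Key:   (incoming interface, incoming VC number)
# Value: (outgoing interface, outgoing VC number)
vc_table = {
    (1, 12): (2, 22),
    (2, 63): (1, 18),
    (3, 7): (2, 17),
}


def forward_vc(table, in_interface, in_vc):
    """Look up a packet and return its output interface and new VC number."""
    return table[(in_interface, in_vc)]


def setup_vc(table, in_interface, in_vc, out_interface, out_vc):
    """VC setup: add one entry to this router's table."""
    table[(in_interface, in_vc)] = (out_interface, out_vc)


def teardown_vc(table, in_interface, in_vc):
    """VC teardown: remove the entry for the terminated VC."""
    table.pop((in_interface, in_vc), None)


# A packet arriving on interface 1 with VC number 12 leaves on
# interface 2 carrying VC number 22.
print(forward_vc(vc_table, 1, 12))
```

Because the key includes the incoming interface, the same VC number can be reused on different links, which is what keeps the VC field short.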
Why doesn’t a packet just keep the same VC number on each of the links along its route?
1) To reduce the length of the VC field in the packet header;
2) To simplify VC setup (otherwise the routers would have to exchange and process a substantial number of messages to agree on a common VC number).
Each router must maintain connection state information for the ongoing connections.
3 phases in a VC:
1) VC setup:
The sending transport layer contacts the network layer, specifies the receiver’s address; ->
The network layer determines the path, and the VC number for each link along the path; ->
The network adds an entry in the forwarding table in each router along the path.
(The network layer may also reserve resources along the path.)
2) Data transfer:
Once the VC has been established, packets begin to flow along the VC.
3) VC teardown:
The sender or receiver informs the network layer of its desire to terminate the VC; ->
The network layer informs the end system on the other side of the call termination, and updates the forwarding tables in each of the routers on the path to indicate that the VC no longer exists.
Distinction between the VC setup at the network layer and connection setup at the transport layer:
Transport-layer connection setup involves only the 2 end systems; only the 2 end systems are aware of the transport-layer connection.
Network-layer VC setup involves the routers along the path; each router is aware of all the VCs passing through it.
Datagram networks
In a datagram network, each time an end system wants to send a packet, it stamps the packet with the address of the destination end system and then pops the packet into the network.
Each router has a forwarding table that maps destination addresses to link interfaces. When a packet arrives at a router, the router looks up the appropriate output link interface and forwards the packet.
The router matches a prefix of the packet’s destination address with the entries in the table.
When there are multiple matches, the router uses the longest prefix matching rule – to choose the longest matching entry in the table.
e.g. The forwarding table in a datagram network router
(Meaning: )
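A minimal sketch of the longest prefix matching rule, using Python's standard `ipaddress` module. The four table entries and the function name `lookup` are illustrative assumptions, not taken from a real router.

```python
import ipaddress

# Hypothetical forwarding table: (address prefix, output link interface).
forwarding_table = [
    (ipaddress.ip_network("200.23.16.0/21"), 0),
    (ipaddress.ip_network("200.23.24.0/24"), 1),
    (ipaddress.ip_network("200.23.24.0/21"), 2),
    (ipaddress.ip_network("0.0.0.0/0"), 3),  # default entry, matches everything
]


def lookup(table, destination):
    """Return the interface of the longest prefix that matches destination."""
    address = ipaddress.ip_address(destination)
    matches = [(net.prefixlen, interface)
               for net, interface in table
               if address in net]
    if not matches:
        return None
    # The entry with the largest prefix length is the longest match.
    return max(matches)[1]


# 200.23.24.5 matches the /24, the second /21 and the default entry;
# the /24 entry is the longest match, so interface 1 is chosen.
print(lookup(forwarding_table, "200.23.24.5"))
```

A linear scan is used here only for readability; it says nothing about how real routers organise the lookup.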
Inside a router
4 router components
Input ports:
Physical layer function – to terminate an incoming physical link at a router;
Link-layer functions – to interoperate with the link layer at the other side of the incoming link;
Lookup function – to use the forwarding table to determine the output port to which an arriving packet will be forwarded via the switching fabric.
(Control packets: packets carrying routing protocol information, are forwarded from an input port to the routing processor.)
Switching fabric: the fabric through which packets are switched from an input port to an output port.
Output ports: store packets received from the switching fabric and transmit them on the outgoing link (by performing link-layer and physical-layer functions).
When a link is bidirectional, an output port will typically be paired with the input port for that link on the same line card.
Line card: a printed circuit board containing one or more input ports, which is connected to the switching fabric.
Routing processor:
Executes the routing protocols;
Maintains (updates) routing tables and attached link state information;
Computes the forwarding table for the router;
Performs the network management functions.
The router forwarding plane functions (i.e. the forwarding functions of a router) are almost always implemented in hardware, because they must operate at the nanosecond time scale – far too fast for software implementation.
The router control plane functions (i.e. the control functions of a router) are usually implemented in software, because they operate at the millisecond or second timescale.
For today's Internet routers and routing algorithms, the network-wide routing control plane is decentralized -- with different pieces executing at different routers and interacting by sending control messages to each other.
Input Port Processing
A shadow copy of the forwarding table (maintained by the routing processor) is stored at each input port, thus forwarding decisions can be made locally.
「Match plus action」 is a general abstraction performed in many networked devices; the input port processing in a router is a special case. (「match」 – looks up the destination IP address, 「action」 – sends the packet into the switching fabric.)
Switching
Switching can be accomplished in a number of ways:
1) via memory
early routers – switching is done under direct control of the routing processor.
modern routers – the lookup of the destination address and the storing of the packet are performed by processing on the input line cards.
2) via a bus
An input port transfers a packet directly to the output port over a shared bus (without intervention by the routing processor).
The input port pre-pends a switch-internal label to the packet. All output ports receive the packet, but only the output port that matches the label will keep it.
3) via an interconnection network
crossbar switch: An interconnection network consisting of 2N buses that connect N input ports to N output ports.
Output Processing
Where does queueing occur?
How much buffer is required to absorb the fluctuations in traffic load?
B – The amount of buffering; RTT – An average round-trip time; C – The link capacity; N – the number of TCP flows.
Early rule (for small N): B = RTT·C
Recent rule (for large N): B = RTT·C/√N
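The two rules of thumb can be compared with a short calculation. The function name `buffer_bits` and the sample numbers (250 ms RTT, 10 Gbps link, 10,000 flows) are assumptions chosen for illustration.

```python
import math


def buffer_bits(rtt_seconds, capacity_bps, flows=1):
    """B = RTT * C / sqrt(N); with flows=1 this reduces to B = RTT * C."""
    return rtt_seconds * capacity_bps / math.sqrt(flows)


rtt = 0.25          # average round-trip time in seconds (assumed)
capacity = 10e9     # link capacity in bits per second (assumed)

early = buffer_bits(rtt, capacity)                 # early rule, small N
recent = buffer_bits(rtt, capacity, flows=10_000)  # recent rule, large N

print(f"early rule:  {early / 1e9:.2f} Gbit")
print(f"recent rule: {recent / 1e6:.2f} Mbit")
```

With these inputs the buffer requirement shrinks by a factor of √N = 100 under the recent rule.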
Packet queues may form at both the input ports and the output ports.
At output port –
1) A packet scheduler must choose one packet among those queued for transmission.
2) When there is not enough memory, a decision must be made to either drop the arriving packet (i.e. drop-tail) or remove one or more already-queued packets.
Active queue management (AQM) algorithms: Algorithms of packet-dropping and marking policies, used to drop (or mark the header of) a packet before the buffer is full.
Random Early Detection (RED) algorithm: One of the AQM algorithms. Under RED, a weighted average is maintained for the length of the output queue.
If the (average) queue length < the minimum threshold min_th, the (newly arriving) packet will be admitted to the queue;
If the queue length > the maximum threshold max_th, the packet will be marked or dropped;
If the queue length is in [min_th, max_th], the packet will be marked or dropped with a probability that is some function of the queue length, min_th, and max_th.
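The three cases above can be sketched as follows. The notes only say the probability is "some function" of the queue length and the thresholds; the linear ramp, the cap `max_p` and the averaging weight used here are assumptions for the sake of the example, and the sketch assumes min_th < max_th.

```python
import random


def update_average(average, queue_length, weight=0.002):
    """Weighted (exponential) moving average of the output queue length."""
    return (1 - weight) * average + weight * queue_length


def drop_probability(average, min_th, max_th, max_p=0.1):
    """Probability of marking or dropping a newly arriving packet."""
    if average < min_th:
        return 0.0   # below the minimum threshold: always admit
    if average > max_th:
        return 1.0   # above the maximum threshold: always mark or drop
    # Assumed shape: probability grows linearly from 0 to max_p
    # as the average moves from min_th to max_th.
    return max_p * (average - min_th) / (max_th - min_th)


def should_drop(average, min_th, max_th, max_p=0.1, rng=random.random):
    """Random decision for one arriving packet."""
    return rng() < drop_probability(average, min_th, max_th, max_p)


print(drop_probability(3, 5, 15))
print(drop_probability(10, 5, 15))
print(drop_probability(20, 5, 15))
```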
At input port – If the switch fabric is not fast enough, packet queueing can also occur.
Head-of-the-line (HOL) blocking: The phenomenon that a queued packet in an input queue must wait for transfer through the fabric (even though its output port is free) because it is blocked by another packet at the head of the line.
IP
The Internet’s network layer has 3 major components – 1) the IP protocol; 2) the Internet routing protocols, e.g. RIP, OSPF, BGP; 3) a facility to report errors in datagrams and respond to requests for certain network-layer information (ICMP).
IPv4
IPv4 Datagram format
Version number (4 bits)
Header length (4 bits): used to determine where the data actually begins.
Type of service (TOS): used to allow different types of IP datagrams to be distinguished, e.g. datagrams that require low delay, high throughput or reliability.
Datagram length (16 bits): the total length of the datagram (header+data).
Identifier, flags, fragmentation offset: have to do with IP fragmentation.
Time-to-live (TTL): used to ensure datagrams do not circulate forever. Each time the datagram is processed by a router, the TTL is decremented by 1. If the TTL reaches 0, the datagram is dropped.
Protocol: indicates the specific transport-layer protocol to which this datagram should be passed (used only when the datagram reaches its final destination).
Header checksum: aids in detecting bit errors; it must be recomputed and stored again at each router, because the TTL field (and possibly the options fields) changes.
Why does TCP/IP perform error checking in both the transport and network layers?
1) IP layer only checksums the IP header, TCP/UDP checksums the entire TCP/UDP segment; 2) TCP/UDP and IP do not necessarily have to belong to the same protocol stack.
Source and destination IP address
Options: allow an IP header to be extended. (Options were dropped from the IPv6 header.)
Data (payload): contains the transport-layer segment in most circumstances, can carry other data such as ICMP messages.
A typical IP datagram has a total of 20 bytes of header (assuming no options).
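A short sketch of how the header checksum interacts with the TTL field. The checksum is the 16-bit one's complement of the one's complement sum of the header's 16-bit words; the sample header bytes and the function names `ipv4_checksum` and `decrement_ttl` are assumptions made for this example, and the sketch assumes a 20-byte header with no options and a TTL greater than 0.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's complement of the one's complement sum of the 16-bit words."""
    if len(header) % 2:
        header += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def decrement_ttl(header: bytes) -> bytes:
    """Model one router hop: TTL minus 1, then recompute the checksum."""
    hop = bytearray(header)
    hop[8] -= 1                             # byte 8 holds the TTL
    hop[10:12] = b"\x00\x00"                # checksum field is zero while summing
    hop[10:12] = struct.pack("!H", ipv4_checksum(bytes(hop)))
    return bytes(hop)


# Made-up 20-byte header with the checksum field (bytes 10-11) set to zero.
sample = bytes.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")

version = sample[0] >> 4                    # upper 4 bits of byte 0
header_bytes = (sample[0] & 0x0F) * 4       # header length is in 32-bit words
print(version, header_bytes, sample[8], sample[9])
print(hex(ipv4_checksum(sample)))
```

A receiver can verify a header by running the same sum over all 20 bytes including the stored checksum; the result is 0 when no bit error is detected.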
IP Datagram Fragmentation
MTU (the maximum transmission unit): The maximum amount of data that a link-layer frame can carry, depends on the link-layer protocol.
Problem: For a router that interconnects several links with different MTUs, how can an oversized IP datagram be squeezed into a link-layer frame (to send it over a link that has a smaller MTU)?
Solution: Fragment the data in the IP datagram into two or more smaller IP datagrams (i.e. fragments) and encapsulate these fragments in separate link-layer frames.
How to reassemble fragments before they reach the transport layer at the destination?
1) The sending host stamps the datagram with identification number, source address and destination address;
2) When a router needs to fragment a datagram, it stamps each resulting fragment with the identification number, source address and destination address of the original datagram:
The flag bit of the last fragment is set to 0, flag bits of all the other fragments are set to 1;
The offset field of each fragment specifies where the fragment fits within the original datagram.
3) At the destination, the payload of the datagram is passed to the transport layer only after the IP layer has fully reconstructed the original IP datagram.
Datagrams are reassembled in the end systems rather than network routers. (to keep the network core simple)
e.g. P337, Figure 4.14, TABLE 4.2.
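The fragmentation arithmetic can be sketched in a few lines. The function name `fragment` and the 4,000-byte datagram on a 1,500-byte-MTU link are assumptions for illustration; the sketch assumes a 20-byte header with no options. The offset field counts in units of 8 bytes, so every fragment except the last must carry a multiple of 8 data bytes.

```python
def fragment(total_length, mtu, header_length=20):
    """Split a datagram into fragments that fit the outgoing link's MTU.

    Returns one dict per fragment with the number of data bytes,
    the offset field (in 8-byte units) and the flag bit.
    """
    payload = total_length - header_length
    # Largest data size per fragment, rounded down to a multiple of 8.
    max_data = (mtu - header_length) // 8 * 8
    fragments = []
    position = 0
    while position < payload:
        size = min(max_data, payload - position)
        is_last = position + size >= payload
        fragments.append({
            "data_bytes": size,
            "offset": position // 8,
            "flag": 0 if is_last else 1,   # 0 marks the last fragment
        })
        position += size
    return fragments


for piece in fragment(4000, 1500):
    print(piece)
```

The destination can tell that it holds the complete datagram once it has the fragment with flag 0 and no gaps remain between the offsets.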
Costs of fragmentation: 1) complicates routers and end systems; 2) can be used to create lethal DoS attacks.
IPv6 does away with fragmentation at routers.
IPv4 Addressing
Interface: The boundary between the router/host and any one of its physical links. (a host typically has only a single link)
IP requires each host and router interface to have its own IP address. Each IP address is 32 bits (4 bytes) long.
Dotted-decimal notation: Each byte of the IP address is written in its decimal form, separated by a dot from other bytes. E.g. 11000001 00100000 11011000 00001001 -> 193.32.216.9.
Subnet: Detach each interface from its host or router, creating islands of isolated networks, with interfaces terminating the end points of the isolated networks. Each of these isolated networks is a subnet.
Subnet mask: /x notation that indicates the leftmost x bits of the 32-bit address define the subnet address. E.g. subnet address 223.1.1.0/24.
CIDR (Classless Interdomain Routing): The Internet’s address assignment strategy. As with subnet addressing, an IP address has the form a.b.c.d/x.
Prefix: The x leading bits of address a.b.c.d/x. An organization is typically assigned a range of addresses with a common prefix, only these x leading bits are considered by routers outside, the remaining 32-x bits are considered when forwarding packets at routers inside.
Classful addressing: The addressing scheme used before CIDR, which constrained the network portion of an IP address to be 8, 16 or 24 bits in length. Subnets with 8-, 16- and 24-bit subnet addresses were known as class A, B and C networks.
IP broadcast address: 255.255.255.255. When a host sends a datagram with this destination address, the message is delivered to all hosts on the same subnet. Routers optionally forward the message into neighboring subnets as well (though they usually don’t).
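The notation above can be checked with Python's standard `ipaddress` module. This is a small sketch; the helper name `bits_to_dotted` and the host addresses tested against the subnet are assumptions made for the example.

```python
import ipaddress


def bits_to_dotted(bits):
    """Convert four space-separated bytes in binary to dotted-decimal."""
    return ".".join(str(int(byte, 2)) for byte in bits.split())


print(bits_to_dotted("11000001 00100000 11011000 00001001"))

# a.b.c.d/x: the leftmost x bits form the prefix (subnet address).
subnet = ipaddress.ip_network("223.1.1.0/24")
print(subnet.netmask)        # the /24 mask written in dotted-decimal
print(subnet.num_addresses)  # 2 ** (32 - 24) addresses share the prefix

# Hosts whose first 24 bits equal the prefix belong to the subnet.
print(ipaddress.ip_address("223.1.1.4") in subnet)
print(ipaddress.ip_address("223.1.2.1") in subnet)
```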
1. Obtain a block of IP addresses (for use within an organization’s subnet)
ICANN allocates addresses to ISPs (through its regional Internet registries) -> The network administrator gets a smaller block of addresses from the organization’s ISP.
2. Assign individual address to the host and router interfaces (in the organization)
Router addresses are typically configured manually by the system administrator;
Host addresses can also be configured manually, but more often this is done using DHCP.
DHCP (Dynamic Host Configuration Protocol): A client-server protocol used to allocate IP addresses to hosts automatically. A client is typically a newly arriving host wanting to obtain network configuration information.
A network administrator can configure DHCP so that a host 1) receives the same IP address each time it connects to the network; or 2) is assigned a temporary IP address that will be different each time it connects to the network.
DHCP is often referred to as a plug-and-play protocol.
DHCP also allows a host to learn additional information, e.g. its subnet mask; the address of its first-hop router (i.e. the default gateway); the address of its local DNS server.
DHCP is widely used in residential Internet access networks and wireless LANs.
4-step process of DHCP
1) DHCP server discovery
The client passes an IP datagram that encapsulates a DHCP discover message (a UDP packet to port 67), with the broadcast destination IP address (255.255.255.255) and the 「this host」 source IP address (0.0.0.0), to the link layer.
2) DHCP server offer(s)
A DHCP server responds with a DHCP offer message (contains the transaction ID of the received discover message, the proposed IP address for the client, the network mask and an IP address lease time) that is broadcast to all nodes on the subnet.
3) DHCP request
The client chooses from among one or more server offers and responds to one with a DHCP request message, echoing back the configuration parameters.
4) DHCP ACK
The server responds with a DHCP ACK message, confirming the requested parameters.
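The four steps can be traced with a toy in-memory model. Nothing here sends real packets or follows the DHCP wire format: the class `ToyDhcpServer`, the function `join_network`, the message dictionaries, the address pool and the transaction IDs are all hypothetical and exist only to show the order of the exchange.

```python
class ToyDhcpServer:
    """In-memory stand-in for a DHCP server; no real packets are sent."""

    def __init__(self, pool, lease_seconds=3600):
        self.free = list(pool)   # addresses not yet leased
        self.offers = {}         # transaction ID -> offered address
        self.leases = {}         # client ID -> leased address
        self.lease_seconds = lease_seconds

    def on_discover(self, xid):
        """Step 2: answer a discover message with an offer."""
        if not self.free:
            return None
        address = self.free[0]
        self.offers[xid] = address
        return {"type": "offer", "xid": xid, "your_ip": address,
                "mask": "255.255.255.0", "lease": self.lease_seconds}

    def on_request(self, xid, client_id, requested_ip):
        """Step 4: confirm the requested parameters with an ACK."""
        if self.offers.get(xid) != requested_ip or requested_ip not in self.free:
            return None          # toy model: ignore requests that match no offer
        self.free.remove(requested_ip)
        del self.offers[xid]
        self.leases[client_id] = requested_ip
        return {"type": "ack", "xid": xid, "your_ip": requested_ip,
                "lease": self.lease_seconds}


def join_network(server, client_id, xid):
    """Client side: discover (step 1), then request (step 3)."""
    offer = server.on_discover(xid)
    if offer is None:
        return None
    ack = server.on_request(xid, client_id, offer["your_ip"])
    return ack["your_ip"] if ack else None


server = ToyDhcpServer(["223.1.2.4", "223.1.2.5"])
print(join_network(server, "host-a", xid=654))
print(join_network(server, "host-b", xid=655))
```

The transaction ID is what lets the client pair an offer with its own discover message, since offers are broadcast to every node on the subnet.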
NAT
Problem: In a realm with private addresses, addresses only have meaning to devices within that network, then how is addressing handled when packets are sent to or received from the global Internet?
Solution: NAT (network address translation). The NAT-enabled router behaves to the outside world as a single device with a single IP address.
How does the NAT-enabled router know the internal host to which it should forward a given datagram?
Use a NAT translation table:
1) A host in the home network sends a request datagram to a server outside;
2) The NAT router receives the datagram, replaces the source IP address with its WAN-side IP address, replaces the source port number with a newly generated source port number (that is not currently in its NAT translation table), adds an entry to the NAT translation table, and sends out the datagram;
3) The server outside responds with a datagram sent to the NAT router;
4) The NAT router indexes the NAT translation table to obtain the appropriate destination IP address and port number, rewrites the datagram and forwards it to the host.
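The four steps above can be sketched with a pair of dictionaries acting as the NAT translation table. The class name `NatRouter`, the starting port 5001 and all addresses are assumptions for illustration; datagrams are reduced to (source IP, source port, destination IP, destination port) tuples, and port exhaustion is not handled.

```python
class NatRouter:
    """Toy NAT router that rewrites addresses and ports on tuples."""

    def __init__(self, wan_ip, first_port=5001):
        self.wan_ip = wan_ip
        self.next_port = first_port
        self.wan_to_lan = {}   # WAN-side port -> (LAN IP, LAN port)
        self.lan_to_wan = {}   # (LAN IP, LAN port) -> WAN-side port

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Step 2: rewrite the source and record the mapping."""
        key = (src_ip, src_port)
        if key not in self.lan_to_wan:
            # Pick a port number that is not currently in the table.
            while self.next_port in self.wan_to_lan:
                self.next_port += 1
            self.lan_to_wan[key] = self.next_port
            self.wan_to_lan[self.next_port] = key
        return (self.wan_ip, self.lan_to_wan[key], dst_ip, dst_port)

    def inbound(self, src_ip, src_port, dst_ip, dst_port):
        """Step 4: use the destination port to find the internal host."""
        entry = self.wan_to_lan.get(dst_port)
        if entry is None:
            return None        # no mapping: the datagram cannot be delivered
        lan_ip, lan_port = entry
        return (src_ip, src_port, lan_ip, lan_port)


nat = NatRouter("138.76.29.7")
sent = nat.outbound("10.0.0.1", 3345, "128.119.40.186", 80)
print(sent)
reply = nat.inbound("128.119.40.186", 80, "138.76.29.7", sent[1])
print(reply)
```

The `None` returned for an unknown port is the behaviour behind the NAT traversal problem: a peer outside cannot open a connection to a host that has not created a mapping first.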
Major problem: NAT traversal for P2P applications – a Peer B behind a NAT cannot accept a TCP connection from a Peer A.
Solution: connection reversal – Peer A (if not behind a NAT) can first contact an intermediate Peer C, to which B has established an ongoing TCP connection, then ask B via C to initiate a TCP connection back to A.
UPnP
UPnP (Universal Plug and Play): A protocol that allows an external host to initiate communication sessions to NATed hosts using either TCP or UDP. (Provides an effective NAT traversal solution.)
With UPnP, an application running in a host can request a NAT mapping between its (private IP address, private port number) and a (public IP address, public port number), thus the outside nodes can communicate with it.
ICMP
ICMP (Internet Control Message Protocol): used by hosts and routers to communicate network-layer information. Typical use: error reporting.
An ICMP message contains 1) a type field; 2) a code field; 3) the header and the first 8 bytes of the IP datagram that caused the message to be generated.
e.g. some well-known ICMP message types
ICMP architecturally lies just above IP (i.e. ICMP messages are carried as IP payload like TCP or UDP).
The Traceroute program is implemented with ICMP messages.
IPv6
IPv6 Datagram Format
Version (4 bits)
Traffic class (8 bits): similar to the TOS field in IPv4, can be used to give priority to certain datagrams (e.g. within a flow, from certain applications).
Flow label (20 bits): identifies a flow of datagrams. (IPv6 allows labeling of packets belonging to particular flows for which the sender requests special handling.)
Payload length (16 bits): the number of bytes in the datagram following the fixed-length 40-byte header.
Next header: identifies the protocol (e.g. TCP, UDP) to which the data field will be delivered, similar to the protocol field in IPv4.
Hop limit: decremented by 1 by each router that forwards the datagram; if it reaches 0, the datagram is discarded.
Source and destination addresses
Data: the payload portion.
IPv4 fields no longer present in IPv6: 1) Fragmentation/Reassembly; 2) header checksum; 3) Options.
If a router receives an IPv6 datagram that is too large, it simply drops the datagram and sends a 「Packet Too Big」 ICMPv6 error message back so that the sender can resend the data using a smaller datagram size.
Possible Options of How To Transition from IPv4 to IPv6
1. Dual-stack approach
IPv6/IPv4 node: A node that has the ability to send and receive both IPv4 and IPv6 datagrams; it can determine whether another node is IPv6-capable or IPv4-only and use the corresponding datagrams when interoperating.
If either the sender or the receiver is only IPv4-capable, an IPv4 datagram must be used.
Cons: In conversion from IPv6 to IPv4, information in IPv6-specific fields will be lost.
2. Tunneling
Tunnel: The intervening set of IPv4 routers between two IPv6 routers.
The IPv6 node on the sending side of the tunnel (B) puts the entire IPv6 datagram in the data field of an IPv4 datagram ->
The routers in the tunnel route this IPv4 datagram ->
The IPv6 node on the receiving side of the tunnel (E) extracts the IPv6 datagram and goes on routing it.
Routing Algorithms
Ways to classify routing algorithms:
1. global or decentralized
Global routing algorithm: has complete information about connectivity and link costs before the calculation. E.g. LS.
Decentralized routing algorithm: Each node begins with only the knowledge of the costs of its own directly attached links, then calculates the path in an iterative distributed manner. E.g. DV.
2. static or dynamic
Static routing algorithm: routes change very slowly, often as a result of human intervention.
Dynamic routing algorithm: routes change as the network traffic loads or topology change.
3. load-sensitive or load-insensitive
load-sensitive algorithm: link costs vary dynamically to reflect the current level of congestion in the underlying link.
load-insensitive algorithm: A link’s cost does not explicitly reflect its current level of congestion. (Today’s Internet routing algorithms)
LS Algorithm
The link-state (LS) routing algorithm: The network topology and all link costs are known to every node (accomplished by a link-state broadcast algorithm), so each node can run the algorithm and compute the set of least-cost paths.
Dijkstra’s algorithm:
u – source node;
D(v) – cost of the least-cost path from u to destination v as of this iteration;
p(v) – previous node (v’s neighbour) along the current least-cost path from u to v;
N’ – a subset of nodes, v is in N’ if the least-cost path from u to v is definitively known.
e.g.
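A minimal sketch of Dijkstra's algorithm using the notation above (D, p, N'). The six-node graph and its link costs are an assumed example, and this version uses a priority queue (`heapq`) to pick the next node rather than scanning all nodes, which is one common way to implement the same idea.

```python
import heapq


def dijkstra(graph, source):
    """Return (D, p): least-path costs and previous nodes from source."""
    D = {node: float("inf") for node in graph}
    p = {}
    D[source] = 0
    n_prime = set()                 # N': nodes whose least cost is final
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node in n_prime:
            continue                # stale queue entry, already finalised
        n_prime.add(node)
        for neighbour, link_cost in graph[node].items():
            new_cost = cost + link_cost
            if new_cost < D[neighbour]:
                D[neighbour] = new_cost
                p[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))
    return D, p


# Assumed example topology: undirected links with symmetric costs.
graph = {
    "u": {"v": 2, "x": 1, "w": 5},
    "v": {"u": 2, "x": 2, "w": 3},
    "w": {"u": 5, "v": 3, "x": 3, "y": 1, "z": 5},
    "x": {"u": 1, "v": 2, "w": 3, "y": 1},
    "y": {"x": 1, "w": 1, "z": 2},
    "z": {"w": 5, "y": 2},
}

D, p = dijkstra(graph, "u")
print(D)
print(p)
```

Following p backwards from any destination recovers the least-cost path; for example z -> y -> x -> u in this graph.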
DV Algorithm
The Bellman-Ford equation: d_x(y) = min_v{c(x,v) + d_v(y)}
d_x(y) – the cost of the least-cost path from x to y.
v – one of x’s neighbors.
The distance-vector (DV) routing algorithm: is 1) distributed – each node receives information from its neighbors, calculates and distributes the result back; 2) iterative – the process continues until no more information is exchanged; 3) asynchronous – does not require all nodes to operate in lockstep.
Each node x maintains the following routing information:
1) The cost c(x,v) for each neighbour v;
2) distance vector D_x = [D_x(y): y in N], i.e. x’s estimate of its costs to all destinations y;
3) distance vector D_v = [D_v(y): y in N] of each neighbour v.
e.g.
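A small sketch of the distance-vector iteration built on the Bellman-Ford equation. For simplicity it runs in synchronous rounds, which the real algorithm does not require; the three-node topology, the function name `distance_vector` and the assumption that every node has at least one neighbour are choices made for this example.

```python
INF = float("inf")


def distance_vector(costs):
    """Iterate the Bellman-Ford update until no distance vector changes."""
    nodes = sorted(costs)
    # Each node starts out knowing only the costs of its attached links.
    D = {x: {y: (0 if x == y else costs[x].get(y, INF)) for y in nodes}
         for x in nodes}
    changed = True
    while changed:
        changed = False
        # Vectors as advertised to neighbours at the start of this round.
        advertised = {x: dict(D[x]) for x in nodes}
        for x in nodes:
            for y in nodes:
                if x == y:
                    continue
                # Minimum over x's neighbours v of c(x,v) + D_v(y).
                best = min(costs[x][v] + advertised[v][y] for v in costs[x])
                if best != D[x][y]:
                    D[x][y] = best
                    changed = True
    return D


# Assumed example: c(x,y) = 2, c(y,z) = 1, c(x,z) = 7.
costs = {
    "x": {"y": 2, "z": 7},
    "y": {"x": 2, "z": 1},
    "z": {"x": 7, "y": 1},
}

for node, vector in distance_vector(costs).items():
    print(node, vector)
```

After one round x learns that reaching z through y costs 3 rather than 7, and the second round produces no further changes, so the iteration stops.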
For the routing loop caused by link-cost changes and for poisoned reverse, see P376-378.
The DV algorithm may encounter a routing loop, i.e. a packet destined for x that arrives at y or z bounces back and forth between the 2 nodes until the forwarding tables are changed.
This specific looping scenario can be avoided by using poisoned reverse: if z routes through y to reach x, z lies to y by advertising D_z(x)=∞, so y never attempts to route to x via z. Likewise, once y routes through z to reach x, y poisons the reverse path by informing z that D_y(x)=∞. (Poisoned reverse does not solve the general count-to-infinity problem: loops involving 3 or more nodes are not detected.)
Compare LS & DV
Message complexity
Speed of convergence:
LS is an O(|N|^2) algorithm requiring O(|N||E|) messages;
the time needed for DV to converge can depend on many factors, and DV may suffer from routing loops.
Robustness:
Route calculations are somewhat separated under LS, providing a degree of robustness;
An incorrect node calculation can be diffused through the entire network under DV.
LS and DV are essentially the only routing algorithms used in practice today in the Internet.
Other routing algorithms can include:
1) Algorithms based on viewing packet traffic as flows between sources and destinations;
2) Circuit-switched routing algorithms – of interest to packet-switched data networking.
Hierarchical Routing
Problem: In practice, the model of 「all routers executing the same routing algorithm」 is too simplistic.
Solution: to organize routers into ASs.
Autonomous system (AS): A collection of routers under the same administrative and technical control that all run the same routing algorithm and have information about each other.
Intra-AS routing protocol: a.k.a. interior gateway protocol. The routing algorithm running within an AS (e.g. LS, DV), used to determine routing paths that are internal to the AS.
Gateway routers: The one or more routers in an AS that are responsible for forwarding packets to destinations outside the AS.
How to route a packet to a destination outside the AS?
1) If the source AS has only 1 link that leads outside, just route the packet to the gateway router (through the path determined by the intra-AS routing algorithm), then the gateway forwards it to the outside.
2) If the source AS has 2 or more links that lead outside, let the inter-AS routing protocol handle the tasks of 「obtaining reachability information from neighboring ASs」 and 「propagating the reachability information to all routers internal to the AS」. Two communicating ASs must run the same inter-AS routing protocol.
- What if the destination is reachable via multiple gateways?
- A router needs to determine to which gateway it should direct the packet. E.g. one approach often employed – hot-potato routing (i.e. choose the gateway router that has the smallest least-cost path from the router).
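Hot-potato routing reduces to a minimum over the candidate gateways. In this sketch the function name `hot_potato`, the gateway names and the cost values are hypothetical; the costs stand for the least-cost intra-AS path from the router to each gateway, as computed by the intra-AS routing protocol.

```python
def hot_potato(intra_as_cost, gateways):
    """Pick the gateway with the smallest least-cost path from this router.

    intra_as_cost maps a gateway name to the router's least cost to reach it;
    gateways lists the gateways through which the destination is reachable.
    """
    return min(gateways, key=lambda gateway: intra_as_cost[gateway])


# Hypothetical costs from one router to the gateways of its AS.
cost_to_gateway = {"gw1": 4, "gw2": 2, "gw3": 7}

# The destination is reachable via gw1 and gw3 only, so gw2 is not a candidate.
print(hot_potato(cost_to_gateway, ["gw1", "gw3"]))
```

The choice ignores everything that happens after the packet leaves the AS; the router only tries to hand the packet off as cheaply as possible.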