Aggressive BGP fall-over behavior

From CT3

By Ivan Pepelnjak

BGP support for Fast Peering Session Deactivation described in the IP Corner article »Designing Fast Converging BGP Networks« is a feature that tracks the reachability of the BGP neighbor in the IP routing table and disconnects the BGP session as soon as the neighbor is no longer reachable.

The IP Corner article mentions that this feature might be too aggressive in environments with mixed routing protocols due to the small interval between the loss of the primary IP route and the installation of the backup IP route. However, BGP sessions might also be lost in pure link-state IGP environments if a directly connected interface of a BGP router fails.

The following topology will be used to illustrate this undesired behavior:

Figure 1: Test network topology

Two BGP routers (A1 and A2) are connected through the core network (C1 and C2) and through a lower-speed (256 kbps) backup link. Default OSPF costs are used on all links; the primary path from A1 to A2 therefore goes through C1 and C2.
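The path selection can be checked with a little arithmetic. The following Python sketch assumes the usual IOS defaults, none of which are stated in the article: an OSPF reference bandwidth of 100 Mbps, a default serial interface bandwidth of 1544 kbps, and a cost of 1 for the loopback interface. Under these assumptions the backup path metric works out to 391, which agrees with the metric shown in the debugging printouts of the directly connected failure.

```python
# Sketch: OSPF path metrics in the test topology.
# All three defaults below are assumptions about IOS behavior, not values
# taken from the article.
REFERENCE_BANDWIDTH_BPS = 100_000_000  # assumed default OSPF reference bandwidth
DEFAULT_SERIAL_KBPS = 1544             # assumed default serial interface bandwidth
LOOPBACK_COST = 1                      # assumed cost of the destination loopback

BACKUP_LINK_KBPS = 256                 # "bandwidth 256" on the A1-A2 link


def ospf_cost(bandwidth_kbps: int) -> int:
    """Interface cost: reference bandwidth / interface bandwidth, minimum 1."""
    return max(1, REFERENCE_BANDWIDTH_BPS // (bandwidth_kbps * 1000))


# Primary path A1 -> C1 -> C2 -> A2: three serial links plus A2's loopback.
primary_metric = 3 * ospf_cost(DEFAULT_SERIAL_KBPS) + LOOPBACK_COST

# Backup path A1 -> A2: one 256 kbps link plus A2's loopback.
backup_metric = ospf_cost(BACKUP_LINK_KBPS) + LOOPBACK_COST

print(f"primary metric: {primary_metric}")  # 193
print(f"backup metric:  {backup_metric}")   # 391
```

Because the primary metric is lower, only the path through C1 is installed in the routing table of A1 while all links are up.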

Core link failure

When a core link fails (for example, the core serial interface on C1), the following sequence of events unfolds on C1 and A1:

  1. C1 detects link loss and adjusts its IP routing tables.
  2. After the OSPF process detects interface loss, C1 generates a modified router LSA and floods it.
  3. A1 receives the modified router LSA from C1 and schedules SPF (delayed for 5 seconds unless tuned with the timers throttle spf command).
  4. SPF computes the new network topology and installs the backup route toward A2.

A1 keeps the existing route toward A2 until SPF replaces it with the backup route. An IP route toward A2 is therefore always present in the IP routing table, and the BGP session is not disconnected.

Figure 2: Core link failure

Selected (and heavily edited) debugging printouts taken on A1 illustrate the process:

08:09:08.263: OSPF: received update from 10.0.1.2, Serial1/0
08:09:08.263: OSPF: Rcv Update Type 1, LSID 10.0.1.2, Adv rtr 10.0.1.2, age 1, seq 0x80000008
...
08:09:13.267: OSPF: Begin SPF for topology Base with MTID 0 at 1083.844ms, process time 988ms

Directly connected failure

The behavior of A1 is completely different if the directly connected link (A1 to C1) fails:

  1. A1 detects interface loss and removes all routes using that interface from the IP routing table.
  2. After the OSPF process detects interface loss, A1 generates a modified router LSA and floods it.
  3. A1 schedules SPF (delayed for 5 seconds unless tuned with the timers throttle spf command).
  4. SPF computes the new network topology and installs the backup route toward A2.

Figure 3: Failure of a directly connected link

The route toward A2 is lost between the moment the interface loss is detected and the moment SPF computes new network topology. The BGP session with A2 is thus disconnected even though a backup route exists and is eventually used.

Debugging printouts (heavily edited) generated by A1 illustrate this behavior:

08:07:06.519: RT: interface Serial1/0 removed from routing table
08:07:06.527: RT: delete route to 10.0.1.4 via 10.0.7.10, Serial1/0
08:07:06.531: RT: no routes to 10.0.1.4, flushing
08:07:06.567: %BGP-5-ADJCHANGE: neighbor 10.0.1.4 Down Route to peer lost
08:07:07.583: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial1/0, changed state to down
…
08:07:16.519: %OSPF-5-ADJCHG: Process 1, Nbr 10.0.1.2 on Serial1/0 from FULL to DOWN, Neighbor Down: Interface down or detached
…
08:07:17.027: OSPF: Build router LSA for area 0, router ID 10.0.1.1, seq 0x80000006, process 1
…
08:07:22.007: OSPF: Begin SPF for topology Base with MTID 0 at 972.580ms, process time 780ms
08:07:22.007:       spf_time 00:16:07.580, wait_interval 5000ms
08:07:22.019: RT: updating ospf 10.0.1.4/32 (0x0) via 10.0.7.6 Se1/1
08:07:22.023: RT: add 10.0.1.4/32 via 10.0.7.6, ospf metric [110/391]

The actual sequence of events depends on whether the interface loses physical carrier (in which case the interface loss is immediate) or layer-2 connectivity (detected through layer-2 keepalives).
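The intervals hidden in the printouts above are easier to see when computed. The following Python sketch copies the timestamps from the debugging output; the helper name ms_between and the variable names are illustrative only.

```python
from datetime import datetime

# Timestamps copied from the A1 debugging printouts above.
ROUTE_FLUSHED = "08:07:06.531"   # RT: no routes to 10.0.1.4, flushing
BGP_DOWN = "08:07:06.567"        # %BGP-5-ADJCHANGE ... Route to peer lost
LSA_BUILT = "08:07:17.027"       # OSPF: Build router LSA
SPF_STARTED = "08:07:22.007"     # OSPF: Begin SPF
ROUTE_RESTORED = "08:07:22.023"  # RT: add 10.0.1.4/32 via 10.0.7.6


def ms_between(start: str, end: str) -> int:
    """Milliseconds between two hh:mm:ss.mmm timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return round(delta.total_seconds() * 1000)


print("route flushed -> BGP session down:", ms_between(ROUTE_FLUSHED, BGP_DOWN), "ms")
print("router LSA built -> SPF started:  ", ms_between(LSA_BUILT, SPF_STARTED), "ms")
print("route flushed -> backup installed:", ms_between(ROUTE_FLUSHED, ROUTE_RESTORED), "ms")
```

In this test run the BGP session was torn down 36 ms after the route was flushed, while the backup route appeared only about 15.5 seconds later.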

Workarounds

Optimizing OSPF convergence might result in fast discovery of the backup path and thus avoid the loss of the BGP session. For this to work, OSPF has to detect the neighbor loss and install the backup path before layer 1 or layer 2 reports the interface loss. To improve OSPF convergence, use the following techniques:

  • Fast OSPF neighbor loss detection configured with the ip ospf dead-interval minimal interface configuration command, or BFD-assisted neighbor loss detection.
  • Fast SPF response configured with timers throttle spf router configuration command.

Even a very fast converging OSPF network might not prevent route loss in case of carrier loss on an optical link. The behavior might also be platform-, hardware- and software release specific.
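To get a feeling for the timers involved, the following Python sketch models the two tuning knobs with the values used in the A1 configuration (hello-multiplier 5 and timers throttle spf 2 1000 2000). It is a simplified model, not a description of the IOS implementation: it assumes that the dead interval is one second when the minimal keyword is used, and that under continuous topology changes the SPF wait starts at spf-start, continues with spf-hold and then doubles until it reaches spf-max-wait. The function names are illustrative.

```python
def fast_hello_interval_ms(hello_multiplier: int) -> int:
    """Hello interval when the dead interval is fixed at one second."""
    return 1000 // hello_multiplier


def spf_wait_intervals(spf_start: int, spf_hold: int, spf_max_wait: int, count: int) -> list:
    """Simplified model of consecutive SPF waits (ms) while changes keep arriving.

    The first SPF run waits spf_start; the following waits begin at spf_hold
    and double each time, capped at spf_max_wait.
    """
    waits = [spf_start]
    hold = spf_hold
    while len(waits) < count:
        waits.append(min(hold, spf_max_wait))
        hold *= 2
    return waits[:count]


# Values taken from the A1 configuration in this article.
print(fast_hello_interval_ms(5))              # 200
print(spf_wait_intervals(2, 1000, 2000, 4))   # [2, 1000, 2000, 2000]
```

With these values the first SPF run is scheduled almost immediately after a topology change, so the neighbor loss detection time is likely to dominate the convergence time.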

A network design in which every BGP router has two equal-cost paths into the network core is the only reliable means of preventing BGP session loss when using Fast Session Deactivation. When a BGP router has two equal-cost paths, both of them are installed in the IP routing table and thus an alternate path is used immediately following a link failure.
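The difference between the single-path and the equal-cost design can be illustrated with a toy routing table model. The following Python sketch is only a model of the behavior described above, not of the IOS routing table; the class and function names are illustrative, and the second uplink (Serial1/2 with next hop 10.0.7.22) is hypothetical and does not exist in the test network.

```python
class Rib:
    """Toy routing table: each prefix maps to a set of (next_hop, interface) paths."""

    def __init__(self):
        self.routes = {}

    def add(self, prefix, next_hop, interface):
        self.routes.setdefault(prefix, set()).add((next_hop, interface))

    def interface_down(self, interface):
        """Remove all paths using the interface; return prefixes left with no path."""
        flushed = []
        for prefix in list(self.routes):
            remaining = {path for path in self.routes[prefix] if path[1] != interface}
            if remaining:
                self.routes[prefix] = remaining
            else:
                del self.routes[prefix]
                flushed.append(prefix)
        return flushed


def session_survives(rib, peer_prefix, failed_interface):
    """Fast Session Deactivation model: the session drops if the peer route is flushed."""
    return peer_prefix not in rib.interface_down(failed_interface)


PEER = "10.0.1.4/32"

# Test network: a single best path toward A2 through Serial1/0.
single_path = Rib()
single_path.add(PEER, "10.0.7.10", "Serial1/0")
print(session_survives(single_path, PEER, "Serial1/0"))  # False

# Hypothetical design with a second equal-cost uplink into the core.
equal_cost = Rib()
equal_cost.add(PEER, "10.0.7.10", "Serial1/0")
equal_cost.add(PEER, "10.0.7.22", "Serial1/2")  # hypothetical interface and next hop
print(session_survives(equal_cost, PEER, "Serial1/0"))   # True
```

In the equal-cost case the route toward the peer never disappears from the table, so there is no window in which the BGP session could be torn down.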

Router configurations

The following configurations were used in the tests. All routers were running IOS release 12.2(33)SRC3.

Configuration of A1

version 12.2
service timestamps debug datetime msec
service timestamps log datetime msec
no service password-encryption
!
hostname A1
!
logging buffered 4096
!
no aaa new-model
ip subnet-zero
!
ip cef
no ip domain lookup
ip host A1 10.0.1.1
ip host C1 10.0.1.2
ip host C2 10.0.1.3
ip host A2 10.0.1.4
!
interface Loopback0
 ip address 10.0.1.1 255.255.255.255
 ip ospf 1 area 0
!
interface Serial1/0
 description Link to C1(ROUTER) s1/0
 ip address 10.0.7.9 255.255.255.252
 encapsulation ppp
 ip ospf dead-interval minimal hello-multiplier 5
 ip ospf 1 area 0
 keepalive 1
 serial restart-delay 0
!
interface Serial1/1
 description Link to A2(ROUTER) s1/1
 bandwidth 256
 ip address 10.0.7.5 255.255.255.252
 encapsulation ppp
 ip ospf 1 area 0
 keepalive 1
 serial restart-delay 0
!
router ospf 1
 log-adjacency-changes
 timers throttle spf 2 1000 2000
!
router bgp 65000
 no synchronization
 bgp log-neighbor-changes
 neighbor 10.0.1.4 remote-as 65000
 neighbor 10.0.1.4 update-source Loopback0
 neighbor 10.0.1.4 fall-over
 no auto-summary
!
ip classless
!
line con 0
 exec-timeout 0 0
 privilege level 15
 logging synchronous
 transport preferred none
 stopbits 1
line aux 0
 stopbits 1
!
ntp logging
end 

Configuration of A2

version 12.2
service timestamps debug datetime msec
service timestamps log datetime msec
no service password-encryption
!
hostname A2
!
logging buffered 4096
!
no aaa new-model
ip subnet-zero
!
ip cef
no ip domain lookup
ip host A1 10.0.1.1
ip host C1 10.0.1.2
ip host C2 10.0.1.3
ip host A2 10.0.1.4
!
interface Loopback0
 ip address 10.0.1.4 255.255.255.255
 ip ospf 1 area 0
!
interface Serial1/0
 description Link to C2(ROUTER) s1/0
 ip address 10.0.7.14 255.255.255.252
 encapsulation ppp
 ip ospf 1 area 0
 keepalive 1
 serial restart-delay 0
!
interface Serial1/1
 description Link to A1(ROUTER) s1/1
 bandwidth 256
 ip address 10.0.7.6 255.255.255.252
 encapsulation ppp
 ip ospf 1 area 0
 keepalive 1
 serial restart-delay 0
!
router ospf 1
 log-adjacency-changes
!
router bgp 65000
 no synchronization
 bgp log-neighbor-changes
 neighbor 10.0.1.1 remote-as 65000
 neighbor 10.0.1.1 update-source Loopback0
 neighbor 10.0.1.1 fall-over
 no auto-summary
!
ip classless
!
line con 0
 exec-timeout 0 0
 privilege level 15
 logging synchronous
 transport preferred none
 stopbits 1
!
ntp logging
end 

Configuration of C1

version 12.2
service timestamps debug datetime msec
service timestamps log datetime msec
no service password-encryption
!
hostname C1
!
logging buffered 4096
!
no aaa new-model
ip subnet-zero
!
ip cef
no ip domain lookup
ip host A1 10.0.1.1
ip host C1 10.0.1.2
ip host C2 10.0.1.3
ip host A2 10.0.1.4
!
interface Loopback0
 ip address 10.0.1.2 255.255.255.255
!
interface Serial1/0
 description Link to A1(ROUTER) s1/0
 ip address 10.0.7.10 255.255.255.252
 encapsulation ppp
 ip ospf dead-interval minimal hello-multiplier 5
 keepalive 1
 serial restart-delay 0
!
interface Serial1/1
 description Link to C2(ROUTER) s1/1
 ip address 10.0.7.17 255.255.255.252
 encapsulation ppp
 ip ospf dead-interval minimal hello-multiplier 5
 keepalive 1
 serial restart-delay 0
!
router ospf 1
 log-adjacency-changes
 network 0.0.0.0 255.255.255.255 area 0
!
ip classless
!
line con 0
 exec-timeout 0 0
 privilege level 15
 logging synchronous
 transport preferred none
 stopbits 1
!
ntp logging
end 

Configuration of C2

version 12.2
service timestamps debug datetime msec
service timestamps log datetime msec
no service password-encryption
!
hostname C2
!
logging buffered 4096
!
no aaa new-model
ip subnet-zero
!
ip cef
no ip domain lookup
ip host A1 10.0.1.1
ip host C1 10.0.1.2
ip host C2 10.0.1.3
ip host A2 10.0.1.4
!
interface Loopback0
 ip address 10.0.1.3 255.255.255.255
!
interface Serial1/0
 description Link to A2(ROUTER) s1/0
 ip address 10.0.7.13 255.255.255.252
 encapsulation ppp
 keepalive 1
 serial restart-delay 0
!
interface Serial1/1
 description Link to C1(ROUTER) s1/1
 ip address 10.0.7.18 255.255.255.252
 encapsulation ppp
 ip ospf dead-interval minimal hello-multiplier 5
 keepalive 1
 serial restart-delay 0
!
router ospf 1
 log-adjacency-changes
 network 0.0.0.0 255.255.255.255 area 0
!
ip classless
!
line con 0
 exec-timeout 0 0
 privilege level 15
 logging synchronous
 transport preferred none
 stopbits 1
!
ntp logging
end 

Additional Resources  

Configuring BGP on Cisco Routers (BGP) course