What is Spanning-Tree
- a Layer 2 protocol used to prevent loops; it prevents only Layer 2 loops (switching loops).
- loops are caused by redundant links, or a ring topology, between switches.
- we use redundant links to survive link failures.
- if not implemented correctly, spanning tree can take down the whole network.
Operation steps
- Elect the Root Bridge (root switch)
- lowest Bridge ID wins:
- lowest priority first
- then lowest mac-address (when the priorities tie)
- Elect the Root Port (on non-root switches)
- best (lowest) accumulated path cost to the root bridge
- then lowest Bridge ID (of the sender)
- then lowest port priority (of the sender)
- then lowest port number (of the sender)
- Block all other segments (links)
- each link is a connection between 2 switches.
- the switch that sends the superior BPDU on that link keeps its port forwarding, and the other switch blocks its port
- the same rules used for electing the root port apply.
If you don't know what these are, don't worry; for now let's just see them in a real situation, and I will explain more later.
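The comparison rules in the steps above boil down to "lowest tuple wins". Here is a minimal Python sketch of the first two steps, assuming a Bridge ID is modeled as a (priority, MAC-as-integer) tuple; the function and class names are illustrative only, not from any real STP implementation. Sw-1's and Sw-3's MAC addresses come from this article, while Sw-2's MAC and the port priorities (128) are hypothetical.

```python
from dataclasses import dataclass


def mac_to_int(mac: str) -> int:
    """Turn a Cisco-style dotted MAC such as aabb.cc00.0100 into an integer."""
    return int(mac.replace(".", ""), 16)


def bridge_id(priority: int, mac: str) -> tuple:
    """Bridge ID as a tuple: priority is compared first, then the MAC address."""
    return (priority, mac_to_int(mac))


def elect_root(bridges: dict) -> str:
    """Step 1: the switch with the lowest Bridge ID becomes the root bridge."""
    return min(bridges, key=lambda name: bridges[name])


@dataclass(frozen=True)
class ReceivedBpdu:
    port: str                  # local port the BPDU arrived on
    root_path_cost: int        # accumulated cost to the root through this port
    sender_bid: tuple          # Bridge ID of the neighbor that sent the BPDU
    sender_port_priority: int  # port priority of the sending port
    sender_port_number: int    # port number of the sending port

    def rank(self) -> tuple:
        # Tie-breakers in order; lower wins at every step.
        return (self.root_path_cost, self.sender_bid,
                self.sender_port_priority, self.sender_port_number)


def elect_root_port(bpdus: list) -> str:
    """Step 2: the root port is the local port that received the best BPDU."""
    return min(bpdus, key=ReceivedBpdu.rank).port


bridges = {
    "Sw-1": bridge_id(32768, "aabb.cc00.0100"),
    "Sw-2": bridge_id(32768, "aabb.cc00.0200"),  # hypothetical MAC
    "Sw-3": bridge_id(32768, "aabb.cc00.0300"),
}
print(elect_root(bridges))  # Sw-1

# A non-root switch hearing the root directly (cost 100) and via Sw-3 (cost 200).
via_sw3 = ReceivedBpdu("E0/0", 200, bridges["Sw-3"], 128, 1)
via_sw1 = ReceivedBpdu("E0/1", 100, bridges["Sw-1"], 128, 1)
print(elect_root_port([via_sw3, via_sw1]))  # E0/1
```

Step 3 (blocking) reuses the same tuple ordering: on each link, the switch advertising the lower tuple keeps forwarding.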
Example 1: applying the operation steps
1: Elect Root Bridge
- all switches have the same priority (tie)
- Sw-1: aabb.cc00.0100
- Sw-1 has the lowest mac-address, so it is the root
2: Elect the Root Port
- the best path is found without a tie
- Sw 3 E0/2 (cost 200)
- Sw 3 E0/2 is the Root Port (cost 100)
- Sw 2 E0/0 (cost 200)
- Sw 2 E0/1 is the Root Port (cost 100)
3: Block what is left (the segment between Sw-2 and Sw-3)
- both switches have the same cost to the root (tie)
- Sw-2 has the lower Bridge ID (the priorities tie, so its lower mac-address wins) (winner)
- Sw-2 E0/1 stays forwarding.
- Sw-3 E0/1 blocks (loser)
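Example 1 can be replayed with a short Python sketch. Assumptions: every link costs 100 (like the direct links above), all priorities are 32768, and Sw-2's MAC (aabb.cc00.0200) is hypothetical because the article doesn't show it; Sw-3's MAC comes from the debug output later in the article. Port IDs are not modeled, so this only shows which switch wins the Sw-2/Sw-3 segment.

```python
import heapq

PRIORITY = 32768  # all switches tie on priority, as in Example 1
MACS = {
    "Sw-1": "aabb.cc00.0100",
    "Sw-2": "aabb.cc00.0200",  # hypothetical: Sw-2's MAC is not shown in the article
    "Sw-3": "aabb.cc00.0300",
}
LINKS = {("Sw-1", "Sw-2"): 100, ("Sw-1", "Sw-3"): 100, ("Sw-2", "Sw-3"): 100}


def bid(switch):
    """Bridge ID as (priority, MAC as integer); lower is better."""
    return (PRIORITY, int(MACS[switch].replace(".", ""), 16))


def neighbors(switch):
    """Yield (neighbor, link cost) pairs for a switch."""
    for (a, b), cost in LINKS.items():
        if a == switch:
            yield b, cost
        elif b == switch:
            yield a, cost


def root_path_costs(root):
    """Dijkstra from the root: the best accumulated path cost of every switch."""
    best = {root: 0}
    queue = [(0, root)]
    while queue:
        cost, switch = heapq.heappop(queue)
        if cost > best[switch]:
            continue  # stale queue entry
        for neighbor, link_cost in neighbors(switch):
            new_cost = cost + link_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return best


def segment_winner(a, b, costs):
    """The switch advertising the superior BPDU on the a-b link keeps forwarding:
    lower root path cost first, then lower Bridge ID."""
    return min((a, b), key=lambda s: (costs[s], bid(s)))


root = min(MACS, key=bid)
costs = root_path_costs(root)
winner = segment_winner("Sw-2", "Sw-3", costs)
print("root:", root, "costs:", costs, "Sw-2/Sw-3 winner:", winner)
```

It reports Sw-1 as the root, a root path cost of 100 for both Sw-2 and Sw-3 (the 200-cost paths through the neighbor lose), and Sw-2 as the winner of the Sw-2/Sw-3 segment, so Sw-3's port on that segment blocks.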
If you are new to spanning-tree and don't understand the terms, don't worry; I am just giving you an idea with this.
The previous example shows what happened; the rest of the article explains how it happened.
Spanning-Tree timers
- Hello
- the interval between sending BPDUs; it's 2 seconds by default
- Message-Age
- it's not a fixed time; instead it works like a hop count: each switch adds 1 to the age.
- Max-Age
- the time before spanning-tree deletes stored BPDU information; 20 seconds by default
- Forward-Time
- the time a port spends in each of the listening and learning states; 15 seconds by default
- Diameter of the STP domain (dia)
- 7 hops by default
- Many more
- http://www.cisco.com/c/en/us/support/docs/lan-switching/spanning-tree-protocol/19120-122.html
Port States
- common spanning-tree
- Disabled
- Blocking
- lasts 20 seconds; to be accurate, it lasts (Max-Age – Message-Age)
- if no BPDU is received within that time, the port moves to the listening state
- Listening
- lasts 15 seconds (default Forward-Time)
- the port sends and receives BPDUs
- the switch does not add any received mac-address to the mac-address table
- Learning
- lasts 15 seconds (default Forward-Time)
- the port keeps sending and receiving BPDUs
- the switch starts adding mac-addresses to the mac-address table
- Forwarding
- the port will send and receive frames
Points to keep in mind about the states of common spanning-tree:
- it takes 30 seconds for a port to become operational (2x Forward-Time)
- it can take up to 50 seconds for the backup port to become operational (20 + 15 + 15)
- there are ways to avoid such a long period; I will mention them in the enhancements section.
- rapid spanning-tree
- discarding (instead of disabled)
- discarding (instead of blocking)
- discarding (instead of listening)
- learning (same)
- forwarding (same)
- we haven't talked about rapid spanning-tree yet; there is no big difference here, so don't worry about it for now
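To make the timer arithmetic above concrete, here is a small Python sketch of common (802.1D) spanning-tree convergence. Assumptions: the default timers from this article, treated as exact (real switches add small processing delays), and Message-Age counted in whole seconds the way this article does. The function names are illustrative only.

```python
MAX_AGE = 20        # seconds, default
FORWARD_DELAY = 15  # seconds, default (Forward-Time)


def new_port_to_forwarding(forward_delay: int = FORWARD_DELAY) -> int:
    """A port that just came up spends one forward delay listening and one learning."""
    return 2 * forward_delay


def blocked_port_to_forwarding(message_age: int = 0, max_age: int = MAX_AGE,
                               forward_delay: int = FORWARD_DELAY) -> int:
    """Indirect failure: the blocked port first waits for its stored BPDU to expire
    (Max-Age - Message-Age), then goes through listening and learning."""
    return (max_age - message_age) + 2 * forward_delay


print(new_port_to_forwarding())       # 30
print(blocked_port_to_forwarding())   # 50
print(blocked_port_to_forwarding(2))  # 48: a BPDU that is already 2 hops old expires sooner
```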
Port Roles
- ports in the forwarding state have one of 2 roles:
- Root port
- the best port to reach the Root Bridge
- the state is forwarding and the role is Root
- Designated port
- every port except the root port and the blocked ports
- all Root Bridge ports are designated
- the state is forwarding and the role is Designated
- ports in the blocked state have one of 2 roles:
- Alternate
- the backup path to the Root Bridge (a backup for the root port)
- the state is blocked and the role is Alternate
- Backup
- the backup for a port on the same segment
- you will see this with a hub (2 ports of the same switch connected to the same shared segment)
- the state is blocked and the role is Backup
Types of Spanning-Tree
1: spanning-tree (802.1D) (Mono Spanning-Tree) (MST)
- the original version of spanning-tree, published in 1990
- there is another version, published in 1998 (802.1D-1998)
2: Per vlan Spanning-tree (PVST)
- for Cisco switches only. Cisco switches don't run the standard 802.1D, which runs 1 instance for all VLANs; instead Cisco made PVST, which runs a different instance for every VLAN.
- each VLAN's traffic flow can differ from the other VLANs, depending on where the root bridge for that VLAN is.
- Cisco used the 802.1D BPDU format
- on trunks, these BPDUs are encapsulated in the ISL header, which carries the VLAN ID + a BPDU flag.
- at the time, only Cisco switches had trunks, which were ISL trunks.
3: Per vlan Spanning-tree Plus (PVST+) (Shared Spanning-Tree) (SSTP)
- runs by default on cisco switches
- the (+) is an enhancement that makes switches use a Cisco proprietary multicast address in addition to the IEEE multicast address, for when you have a non-Cisco switch between Cisco switches.
- this Cisco multicast address allows PVST+ BPDUs to be flooded across non-Cisco switches.
- the original PVST wasn't supported over 802.1Q trunks.
- more
4: Rapid Spanning-tree (802.1w) (new name 802.1D-2004)
- enhanced version of spanning tree to speed up convergence by introducing the concept of proposal and agreement.
- in 2004, 802.1D-1998 was withdrawn, and 802.1w plus some additions is now called 802.1D-2004. !! strange !!
5: Rapid per Vlan Spanning-tree (Rapid PVST+)
- for cisco switches only
- Cisco again made its switches run a different instance of 802.1w for each VLAN.
6: Multiple Spanning-Tree (MSTP) (802.1s) (802.1Q-2005)
Well, the takeaway from this is: when you want to read the IEEE standard for rapid spanning-tree, go for 802.1D-2004.
What is Per-VLAN Spanning-Tree (PVST+)
This photo shows why we use spanning-tree per VLAN.
Important notes:
- switch 1 is the root bridge for all VLANs
- all the traffic between Accounting and Human-Resources goes through Sw-1, because the link between Sw-2 and Sw-3 is blocked.
The fix: give each VLAN its own instance of spanning-tree.
In such a simple topology we can place the root bridge of each VLAN wherever we want, so we end up with 3 root bridges.
Important notes:
- we have 3 root bridges
- all port states have been removed because it's a topology diagram, not a flow diagram
To draw the flow diagram we have to make 3 drawings, 1 for each VLAN; here is the traffic flow.
From the drawing you can see that each port has 3 states (one per VLAN), depending on the traffic: if the traffic belongs to VLAN 2, the link between Sw-3 and Sw-1 is blocked; if it belongs to Vlan, the link between Sw-2 and Sw-2 is blocked.
- when you choose the root bridge placement, consider the traffic flow.
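The per-VLAN idea can be sketched in a few lines of Python: PVST+ simply runs the root election once per VLAN, with the VLAN ID added to the priority (sys-id-ext). Assumptions: the VLAN numbers 1 to 4, the 24576 priorities that place one root per VLAN, and Sw-2's MAC are all hypothetical; only Sw-1's and Sw-3's MACs come from this article.

```python
MACS = {
    "Sw-1": "aabb.cc00.0100",
    "Sw-2": "aabb.cc00.0200",  # hypothetical
    "Sw-3": "aabb.cc00.0300",
}
DEFAULT_PRIORITY = 32768
# Hypothetical configuration: one switch per VLAN is given a better priority.
CONFIGURED = {("Sw-1", 1): 24576, ("Sw-2", 2): 24576, ("Sw-3", 3): 24576}


def bridge_id(switch, vlan):
    """Per-VLAN Bridge ID: configured priority + VLAN ID (sys-id-ext), then MAC."""
    priority = CONFIGURED.get((switch, vlan), DEFAULT_PRIORITY) + vlan
    return (priority, int(MACS[switch].replace(".", ""), 16))


def roots_per_vlan(vlans):
    """Run an independent root election for every VLAN, as PVST+ does."""
    return {vlan: min(MACS, key=lambda sw: bridge_id(sw, vlan)) for vlan in vlans}


print(roots_per_vlan([1, 2, 3, 4]))
```

VLANs 1, 2 and 3 each get their own root (Sw-1, Sw-2, Sw-3), while VLAN 4, left at the defaults, falls back to the lowest MAC address (Sw-1), which is exactly the "all VLANs on Sw-1" situation described above.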
PVST+ Topology Change Notifications
When a link fails, the switch can take up to 50 seconds to recover,
but mac-address table entries take 5 minutes (300 seconds, the default aging time) to age out.
So even after reachability to the root is restored, your switch may still send traffic in the wrong direction.
For this purpose there is a special message called "Topology Change Notification" (TCN).
TCN triggers
- a switch becomes the root switch
- a port goes down (very bad if the port is connected to a PC)
- a port moves from learning –> blocking
- a port moves from learning –> forwarding
- with at least one designated port
- PortFast ports don't trigger a topology change
When one of these events occurs, the switch where it happened sends a TCN message to notify the root bridge:
- the message doesn't include the sender's bridge ID
- it is sent toward the root hop by hop, and each switch acknowledges the TCN from the previous one, until it reaches the root bridge
- each switch keeps sending the TCN to the next switch until it receives an ACK
When the root bridge receives the TCN message, it does the following:
- the root floods the change for 35 seconds (Max-Age + Forward-Time) by setting a flag called the TC flag in its BPDUs
- during that time these BPDUs are called Topology-Change BPDUs (TC-BPDUs)
- every switch that receives a TC-BPDU reduces its CAM (mac-address) table aging time from 5 minutes to 15 seconds (the forward-delay time)
Sw-1#show spanning-tree detail
 VLAN0001 is executing the ieee compatible Spanning Tree protocol
  Bridge Identifier has priority 32768, sysid 1, address aabb.cc00.0100
  Configured hello time 2, max age 20, forward delay 15
  We are the root of the spanning tree
  Topology change flag set, detected flag set (detected flag is set on the root only)
  Number of topology changes 2 last change occurred 00:00:34 ago
          from Ethernet0/0 (you can use this to track the source)
  Times:  hold 1, topology change 35, notification 2
          hello 2, max age 20, forward delay 15
  Timers: hello 0, topology change 25, notification 0, aging 15 (instead of 300)
You can check this article from Cisco for more information.
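The effect of a TC-BPDU on the mac-address table can be summed up in a tiny Python model. Assumptions: default timers, a single switch's point of view, and the TC period measured from when the root starts setting the TC flag; the function name is illustrative. The numbers match the "topology change 35" and "aging 15" values in the show output above.

```python
DEFAULT_AGING = 300  # normal mac-address table aging time (5 minutes)
MAX_AGE = 20
FORWARD_DELAY = 15
TC_PERIOD = MAX_AGE + FORWARD_DELAY  # how long the root keeps the TC flag set: 35 s


def aging_time(seconds_since_tc=None):
    """Aging time a switch uses, given the seconds since the root started sending
    TC-BPDUs (None means no topology change is in progress)."""
    if seconds_since_tc is not None and 0 <= seconds_since_tc < TC_PERIOD:
        return FORWARD_DELAY  # short aging so stale entries are flushed quickly
    return DEFAULT_AGING


print(TC_PERIOD, aging_time(), aging_time(10), aging_time(40))  # 35 300 15 300
```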
BPDU Flow
- only the Root Bridge originates BPDUs
- each switch relays that BPDU to the other switches after some modifications:
- adding its root port cost to the path cost
- setting the source Bridge ID (its own)
- setting the source port ID
- incrementing the message age
- BPDUs are only sent and relayed out of Designated ports
- switches don't send BPDUs out of Root ports and blocked ports (it would be useless, since they would be inferior BPDUs)
- Designated ports store the BPDU they send
- Root and blocking ports store the best BPDU they received
- a received BPDU expires after (Max-Age – Message-Age)
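The relay rules above can be sketched as a pure function that copies a BPDU and changes only the fields a relaying switch touches. This is a simplified Python model, not the real BPDU encoding: Bridge IDs are (priority + VLAN, MAC-as-integer) tuples, port IDs are plain integers (0x8001 and 0x8002 are hypothetical), Sw-2's Bridge ID is hypothetical, and Message-Age is counted as +1 per hop as this article describes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Bpdu:
    root_bid: tuple        # Bridge ID of the root bridge
    root_path_cost: int    # accumulated cost from the sender to the root
    sender_bid: tuple      # Bridge ID of the switch that sent this copy
    sender_port_id: int    # port ID of the sending port
    message_age: int       # incremented by 1 at every relay
    max_age: int = 20      # default Max-Age


def relay(received: Bpdu, root_port_cost: int, my_bid: tuple, my_port_id: int) -> Bpdu:
    """Copy a BPDU received on the root port and apply the modifications listed
    above before sending it out a designated port."""
    return replace(
        received,
        root_path_cost=received.root_path_cost + root_port_cost,
        sender_bid=my_bid,
        sender_port_id=my_port_id,
        message_age=received.message_age + 1,
    )


def expires_in(stored: Bpdu) -> int:
    """Time left before a stored BPDU is aged out: Max-Age minus Message-Age."""
    return stored.max_age - stored.message_age


ROOT = (32769, 0xAABBCC000100)  # Sw-1: priority 32768 + VLAN 1
SW2 = (32769, 0xAABBCC000200)   # hypothetical Sw-2

from_root = Bpdu(ROOT, 0, ROOT, 0x8001, 0)
from_sw2 = relay(from_root, root_port_cost=100, my_bid=SW2, my_port_id=0x8002)
print(from_sw2.root_path_cost, from_sw2.message_age, expires_in(from_sw2))  # 100 1 19
```

The root ID never changes while the BPDU is relayed; only the cost, the sender fields and the age do.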
Failure Handling
- Direct Failure
- Indirect Failure
Spanning-tree enhancements
- Port fast
- BPDU guard
- BPDU Filter
- Root guard
- Loop guard
- Uplink fast (built into rapid spanning tree)
- Backbone fast (built into rapid spanning tree)
- UDLD
1: Port Fast
- instead of going through listening and learning, the port goes to forwarding immediately.
- doesn't trigger a topology change.
- this point is the most important: if your access ports cause a TCN whenever you connect a host, your entire network can be congested by flooded unicast frames that have no entry in the mac-address table, because the aging time drops to 15 seconds.
- it is better to use this feature wherever you can (on ports connected to hosts).
- can cause a loop if someone connects a switch to an unprotected port.
- activation methods
- Sw-1(config-if)#spanning-tree portfast
- Sw-1(config)#spanning-tree portfast default
- activates portfast on all access links
- Sw-1(config-if)#switchport host
- sets the mode to access
- enables portfast
- disables channel-group
Sw-1(config)#spanning-tree portfast default
(default enables it on all access ports)
%Warning: this command enables portfast by default on all interfaces. You should now disable portfast explicitly on switched ports leading to hubs, switches and bridges as they may create temporary bridging loops.
2: BPDU guard
- used with portfast to protect the network if someone connects a switch instead of a host
- if a BPDU is received on this port, the port goes into the err-disabled state
- activation methods
- Sw-1(config-if)#spanning-tree bpduguard enable
- Sw-1(config)#spanning-tree portfast bpduguard default (default applies it to all portfast-enabled ports)
3: BPDU filter
- the port doesn't send or receive BPDUs
- spanning-tree ignores received BPDUs
- same activation methods as BPDU guard (with bpdufilter instead of bpduguard)
4: Root guard
- protects the current root bridge from any superior BPDU received on the port
- puts the port in the root-inconsistent state until those superior BPDUs stop
- activation method
- Sw-1(config-if)#spanning-tree guard root
5: loop guard
- if a switch has a port in the blocking state and, for some reason, the BPDUs coming to that blocked port stop, the port will move to the forwarding state; if the BPDUs stopped because of a unidirectional link problem, a loop will occur.
- loop guard keeps the port in the blocking state instead.
6: Uplink fast
- works with direct (physical) link loss only
- requires at least 1 blocked port.
- activated for all VLANs; it cannot be enabled for an individual VLAN
- immediately changes the blocked port into the root port, saving the 30 seconds of listening and learning.
- creates dummy multicast frames on behalf of the hosts, using the learned mac-addresses, at a default rate of 150 pps
- if SW1 is connected to other switches that send frames to SW2, those frames will be sent to E0/1, which is down.
- thus SW2 floods those multicasts, so SW1 knows that those addresses are at E0/2, not E0/1.
- should be applied on access-layer switches because of the dummy multicast implementation, and that's why it:
- adds 3000 to the port cost (so that its ports are not likely to be elected as designated ports)
- sets the priority to 49152 (so that the switch is not likely to become the root bridge; the root has no root port, so UplinkFast would be useless there)
When E0/1 comes back up, it takes 2x forward-delay (30 seconds) + 5 seconds (cdp/ether…etc) for the port to reach the forwarding state.
7: BackBone fast
- works with indirect failures
- instead of waiting (Max-Age – Message-Age) before moving a port from blocking to listening, the switch saves that time (~20 seconds) and moves the port directly to listening.
- when a switch receives an inferior BPDU, instead of ignoring it, it checks whether the current root bridge is still active, using Root Link Query (RLQ) messages
- if the root bridge is still active, the blocking port is moved to listening, and the current root's BPDU is forwarded to the switch that is claiming to be the root
- must be enabled on all switches, or RLQ requests will not be answered.
Manipulating Root Bridge (Per Vlan)
- 1: change priority
- the switch with the lowest priority wins
- a switch with priority 0 becomes the root, unless another switch also has priority 0 and a lower mac-address.
- example:
sw-3(config)#spanning-tree vlan 1 priority 0
- 2: Root Primary command
- Usage
- to make a non-root switch the root bridge without choosing a priority value yourself.
- behavior
- when the root bridge priority is more than 24576:
- the switch sets its priority to 24576 and becomes the root bridge.
- when the root bridge priority is less than or equal to 24576:
- if the switch's mac-address is lower than the root's:
- the switch sets the same priority as the root, since it will win the tie with its lower mac-address
- if the switch's mac-address is higher than the root's:
- the switch lowers its priority to the next lower step (4096 below the root's priority) so it wins.
- limitation
- if the root bridge has a priority of 0 and you try to apply this command, you will get an error message.
- this command doesn't react to later topology changes: if you change the priority on another switch and that switch becomes the root bridge, you have to re-enter this command manually.
- example:
sw-3(config)#spanning-tree vlan 1 root primary
sw-3(config)#
*Sep 16 05:25:30.171: setting bridge id (which=1) prio 24577 prio cfg 24576 sysid 1 (on) id 6001.aabb.cc00.0300
*Sep 16 05:25:30.171: STP: VLAN0001 we are the spanning tree root
*Sep 16 05:25:30.171: STP: VLAN0001 Et0/2 -> listening
*Sep 16 05:25:30.531: STP: VLAN0001 Topology Change rcvd on Et0/2
Now we go to Sw-1 to make it take over again; notice that it chooses the same priority (24576), because Sw-1 has the lower mac-address.
sw-1(config)#spanning-tree vlan 1 root primary
sw-1(config)#
*Sep 16 05:29:31.316: setting bridge id (which=1) prio 24577 prio cfg 24576 sysid 1 (on) id 6001.aabb.cc00.0100
*Sep 16 05:29:31.316: STP: VLAN0001 we are the spanning tree root
*Sep 16 05:29:31.870: STP: VLAN0001 Topology Change rcvd on Et0/1
If we go back to Sw-3 and re-apply the command, its priority goes to the next lower step.
Example of the priority 0 error message
sw-3(config)#spanning-tree vlan 1 root primary
% Failed to make the bridge root for vlan 1
% It may be possible to make the bridge root by setting the priority
% for some (or all) of these instances to zero.
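The root primary behavior described above can be written down as a small Python function and replayed against the lab. This is a sketch of the behavior observed in this article, not Cisco's actual code; the function name is hypothetical, priorities are the configured values without sys-id-ext (all switches are in the same VLAN, so the offset cancels out), and the ValueError stands in for the "% Failed to make the bridge root" message.

```python
STEP = 4096  # priorities move in increments of 4096


def mac_int(mac: str) -> int:
    """Dotted Cisco MAC to integer, for comparison."""
    return int(mac.replace(".", ""), 16)


def root_primary_priority(root_priority: int, root_mac: str, my_mac: str) -> int:
    """Priority this switch picks for 'spanning-tree vlan X root primary'."""
    if root_priority > 24576:
        return 24576  # 24576 already beats the current root
    if mac_int(my_mac) < mac_int(root_mac):
        return root_priority  # equal priority, and the lower MAC wins the tie
    candidate = root_priority - STEP  # next lower step
    if candidate < 0:
        raise ValueError("no valid priority can beat the current root")
    return candidate


SW1, SW3 = "aabb.cc00.0100", "aabb.cc00.0300"
print(root_primary_priority(32768, SW1, SW3))  # 24576: Sw-3 takes over
print(root_primary_priority(24576, SW3, SW1))  # 24576: Sw-1 wins on MAC
print(root_primary_priority(24576, SW1, SW3))  # 20480: Sw-3 steps down once more
```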
- 3: Root secondary command ( be careful )
- usage
- this command sets a switch's priority so that, if the root bridge fails, this switch takes over.
- behavior
- this command is very tricky, because it always sets the priority to 28672.
- if the root bridge priority is 32768, the switch with the secondary command becomes the root bridge, and if you are running common spanning-tree you might cause a 30-second outage.
- the command assumes that all switches other than the root use the default priority 32768, so it chooses 28672 for the backup, assuming the root's priority is lower than that.
- example:
sw-3(config)#
*Sep 16 05:32:40.720: setting bridge id (which=1) prio 28673 prio cfg 28672 sysid 1 (on) id 7001.aabb.cc00.0300
*Sep 16 05:32:40.720: STP: VLAN0001 we are the spanning tree root
*Sep 16 05:32:40.720: STP: VLAN0001 Et0/2 -> listening
*Sep 16 05:32:40.726: STP: VLAN0001 Topology Change rcvd on Et0/2
*Sep 16 05:32:42.728: STP: VLAN0001 Topology Change rcvd on Et0/0
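The root secondary pitfall is just a Bridge ID comparison, so a few lines of Python show it. Assumptions: two switches in one VLAN, configured priorities only (sys-id-ext cancels out), and the helper name is hypothetical; the MACs are the ones from this lab.

```python
SECONDARY_PRIORITY = 28672  # what 'root secondary' sets


def elect_root(switches: dict) -> str:
    """switches maps name -> (configured priority, MAC); lowest Bridge ID wins."""
    return min(switches, key=lambda n: (switches[n][0], int(switches[n][1].replace(".", ""), 16)))


# Pitfall: the root is still at the default 32768, so the "backup" becomes the root.
default_root = {"Sw-1": (32768, "aabb.cc00.0100"), "Sw-3": (SECONDARY_PRIORITY, "aabb.cc00.0300")}
# Intended design: the root was lowered first (for example with 'root primary').
planned = {"Sw-1": (24576, "aabb.cc00.0100"), "Sw-3": (SECONDARY_PRIORITY, "aabb.cc00.0300")}
# If Sw-1 then fails, the secondary takes over as intended.
after_failure = {"Sw-3": (SECONDARY_PRIORITY, "aabb.cc00.0300")}

print(elect_root(default_root), elect_root(planned), elect_root(after_failure))  # Sw-3 Sw-1 Sw-3
```

So root secondary only behaves as a backup when the root's priority has already been lowered below 28672.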
802.1D + PVST + PVST+
This part explains what spanning-tree does on various types of links, how Cisco switches interoperate with other switches, and the reason for PVST+ (notice the + sign).
Before we start, you have to know that IEEE runs 1 STP instance for all VLANs, while Cisco runs a different instance for every VLAN.
Access Ports :
- Cisco switches use the IEEE BPDU version on access links, and these BPDUs are sent to the multicast address 0180.C200.0000
- I will talk more about this in the PVID inconsistency section.
ISL Trunks ( PVST)
- again, Cisco uses the IEEE BPDU, but encapsulated inside the ISL frame
- ISL has a 1-bit flag that indicates that the frame is a BPDU
- at that time 802.1Q didn't exist yet, and ISL was the only trunking protocol.
802.1Q trunk ( PVST+ )
- IEEE created 802.1Q, and now it's a problem for Cisco when 2 Cisco switches are connected through an IEEE switch.
- the IEEE switch runs 1 instance, while the Cisco switches want to send a BPDU for each instance.
- so the question is how to pass Cisco per-VLAN BPDUs across IEEE switches
- Cisco chose VLAN 1 to interoperate with IEEE STP across 802.1Q, and there are 2 situations:
- VLAN 1 is the native VLAN
- Cisco sends the IEEE BPDU to 0180.C200.0000 (untagged), and Cisco switches know that this IEEE BPDU belongs to VLAN 1.
- Cisco also sends the VLAN 1 BPDU to 0100.0ccc.cccd (untagged) to tell other Cisco switches that VLAN 1 is the native VLAN (the reason for the duplicate is to inform Cisco switches about the native VLAN)
- Cisco sends the other VLANs' BPDUs to 0100.0ccc.cccd (tagged). (IEEE switches don't process these; they just relay them)
- VLAN 1 is not the native VLAN
- Cisco sends the IEEE BPDU to 0180.C200.0000 (untagged), and Cisco switches know that this IEEE BPDU belongs to VLAN 1.
- Cisco sends the native VLAN's BPDU to 0100.0ccc.cccd (untagged) to inform Cisco switches about the native VLAN
- Cisco sends the other VLANs' BPDUs, including VLAN 1, to 0100.0ccc.cccd (tagged)
- now PVST+ frames are tunneled across the IEEE region, and from PVST+'s point of view that IEEE region is like a wire.
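The two 802.1Q situations above can be condensed into one Python function that lists which BPDUs a Cisco switch sends on a trunk. This is a sketch of the rules as described in this article; the function name and the (vlan, destination MAC, tagged) tuple format are hypothetical, and the VLAN numbers in the examples are made up.

```python
IEEE_STP = "0180.C200.0000"   # IEEE STP multicast address
PVST_PLUS = "0100.0ccc.cccd"  # Cisco PVST+ multicast address


def trunk_bpdus(vlans, native_vlan):
    """Return (vlan, destination MAC, tagged) for every BPDU sent on an 802.1Q trunk."""
    out = [(1, IEEE_STP, False)]                 # VLAN 1 always talks IEEE STP, untagged
    out.append((native_vlan, PVST_PLUS, False))  # untagged copy announces the native VLAN
    for vlan in sorted(vlans):
        if vlan != native_vlan:
            out.append((vlan, PVST_PLUS, True))  # every other VLAN, tagged
    return out


for entry in trunk_bpdus([1, 10, 20], native_vlan=1):
    print(entry)
for entry in trunk_bpdus([1, 10, 20], native_vlan=10):
    print(entry)
```

With native VLAN 1 the untagged PVST+ copy is VLAN 1 itself; with native VLAN 10, VLAN 1 shows up twice: once as the untagged IEEE BPDU and once as a tagged PVST+ BPDU.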
Spanning-Tree BPDU flow & convergence
Rapid Spanning-Tree
- Theory
- TCN trigger and bpdu
- edge port
- BPDU transmission
Basic Terms (small reference)
- BPDU (Bridge Protocol Data Unit)
- a frame that contains the root bridge information and the sending switch's information.
- it is sent to a multicast address (0180.C200.0000 for IEEE BPDUs)
- Port cost
- determined by the speed of the port.
- Path cost (accumulated path)
- the sum of all link costs to reach the root bridge.
- Bridge ID
- 8 bytes: 2 for priority and 6 for the mac-address
- comparing bridge IDs compares the priority first; if the priorities are the same, the mac-addresses are compared
- priority
- default is 32768 + sys-id-ext (the VLAN ID)
- priority 0 means this switch becomes the root, as long as no other priority-0 switch has a lower mac-address
- it can only be set in increments of 4096, from 0 to 61440
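The 2-byte priority field is shared by a 4-bit priority and the 12-bit sys-id-ext (VLAN ID), which is why priorities move in steps of 4096. Here is a small Python sketch (the function names are hypothetical) that builds the Bridge ID in the same form as the IOS debug lines earlier, such as "prio 24577 ... id 6001.aabb.cc00.0300".

```python
def format_bridge_id(priority: int, vlan: int, mac: str) -> str:
    """Return a Bridge ID as 'PPPP.mmmm.mmmm.mmmm', like the IOS debug output."""
    if priority % 4096 != 0 or not 0 <= priority <= 61440:
        raise ValueError("priority must be a multiple of 4096 from 0 to 61440")
    if not 0 <= vlan <= 4095:
        raise ValueError("sys-id-ext (VLAN ID) must fit in 12 bits")
    # 4-bit priority + 12-bit sys-id-ext share the 2 priority bytes
    return f"{priority + vlan:04x}.{mac.lower()}"


def bridge_id_value(bridge_id: str) -> int:
    """The whole 8-byte Bridge ID as one integer; the lower value is superior."""
    return int(bridge_id.replace(".", ""), 16)


print(format_bridge_id(24576, 1, "aabb.cc00.0300"))  # 6001.aabb.cc00.0300
print(format_bridge_id(28672, 1, "aabb.cc00.0300"))  # 7001.aabb.cc00.0300
print(format_bridge_id(32768, 1, "aabb.cc00.0100"))  # 8001.aabb.cc00.0100
```

Because the priority sits in the most significant bytes, comparing the whole 8-byte value as one number gives exactly "priority first, then mac-address".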