MPLS Traffic Engineering Design

As part of the overall MPC network design, TK conducted a detailed study to determine how it might provision sufficient network capacity to avoid congestion on any core link during steady state and under network element failure. Moreover, as discussed in the "Quality of Service Design" section, to bound the delay, jitter, and loss to the levels required by telephony transit traffic, TK wanted to strictly enforce that the load of telephony traffic always be kept below 40 percent on any link and under any circumstances (including failure). Consequently, TK decided to deploy MPLS TE so that PSTN voice traffic could be constraint-based-routed across the MPC network and be subject to a call admission control limit of 40 percent on any link.

MPLS TE is deployed to carry only PSTN traffic. Therefore, all other traffic (such as Internet, Layer 3 MPLS VPN, and so forth) is label-switched across the MPC using the labels allocated by the LDP process and consequently follows the OSPF shortest path.

A full mesh of TE LSPs is set up between all the PE-PSTN routers (which connect the VoIP gateways, as illustrated in Figures 4-3 and 4-4). There are two TE LSPs between any two PE-PSTN routers residing in Level 1 POPs. There is a single TE LSP between any two PE-PSTN routers when at least one of them resides in a Level 2 POP (detailed reasoning for this design is provided later). To differentiate between a Level 1 and Level 2 PE-PSTN, a naming convention for the routers was chosen in which the router's name begins with PE-PSTN1 for Level 1 and PE-PSTN2 for Level 2.

Setting the Maximum Reservable Bandwidth on Each MPC Link

To enable the TE design TK chose, each link in the MPC needed to be configured with a maximum reservable bandwidth value. This value indicates how much of the link bandwidth may be reserved for traffic engineering purposes. It can be configured to any value, regardless of the actual link speed.
For example, an STM-1 link with 155 Mbps of total bandwidth may be configured with 310 Mbps of maximum reservable bandwidth. Therefore, the router may signal TE LSPs for up to 310 Mbps, which provides a bandwidth overbooking factor of 2. Conversely, the operator may choose to advertise a smaller value than the actual link speed to limit the amount of traffic carried on the link. This is the design elected by TK. Each link is configured with a maximum reservable bandwidth equal to 40 percent of the link speed. This guarantees that the bandwidth of TE LSPs established through a link for PSTN traffic never exceeds 40 percent of that link's bandwidth. For instance, an OC-192 link between two Level 1 POPs is configured with a reservable bandwidth equal to 0.4 * 10 Gbps = 4 Gbps. This configuration is shown in Example 4-7. It is used as a template for all OC-192 interfaces. (Similar templates exist for all the different link speeds in the MPC.)

Example 4-7. OC-192 Configuration Template
interface pos3/0
 ! Maximum reservable bandwidth in kbps: 40 percent of 10 Gbps
 ip rsvp bandwidth 4000000
!
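The per-speed templates all follow the same arithmetic, so they could be generated rather than written by hand. The following Python sketch is illustrative only: the function names and the link-speed table are assumptions made for this example, not part of TK's tooling. It relies only on the fact that `ip rsvp bandwidth` takes its value in kbps, and it uses the rounded link speeds quoted in the text.

```python
# Link speeds in kbps. These are the rounded figures used in the text;
# a production template might use exact line rates instead (assumption).
LINK_SPEED_KBPS = {
    "OC-192": 10_000_000,  # 10 Gbps
    "STM-1": 155_000,      # 155 Mbps
}

PSTN_SHARE_PERCENT = 40  # TK's ceiling for telephony traffic on any link


def reservable_kbps(link_speed_kbps: int,
                    share_percent: int = PSTN_SHARE_PERCENT) -> int:
    """Return the maximum reservable bandwidth in kbps (integer arithmetic)."""
    return link_speed_kbps * share_percent // 100


def render_template(interface: str, link_type: str) -> str:
    """Render an interface stanza in the style of Example 4-7."""
    kbps = reservable_kbps(LINK_SPEED_KBPS[link_type])
    return f"interface {interface}\n ip rsvp bandwidth {kbps}\n!"


print(render_template("pos3/0", "OC-192"))
```

Passing a share above 100 reproduces the overbooking case from the text: `reservable_kbps(155_000, 200)` yields 310000 kbps for an STM-1 link.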
TE LSPs Bandwidth

One of the most challenging aspects of any MPLS TE design is obtaining a traffic matrix to appropriately configure the bandwidth of the TE LSPs. That said, in the case of the PSTN network, TK had very good knowledge of the existing public voice traffic matrix, which it acquired by means of various monitoring tools available on its telephony network during the past two decades. Because of this, several dimensioning rules have been applied to determine the initial size of the TE LSPs.

For inter-POP traffic, the traffic peak is multiplied by a factor of 0.9 to take into account the fact that the peaks do not occur simultaneously between each POP. Such dimensioning is considered conservative. TK observed that during the less-active periods the traffic could be as little as one-sixth to one-tenth of the peak and that each peak period rarely exceeded a few hours every day. Hence, the TE LSPs are sized based on 90 percent of the busiest hours. Furthermore, the voice traffic during the weekends is generally significantly less than during weekday hours. Thus, during the weekend the observed PSTN traffic load is significantly less than the reserved bandwidth.

Although the PSTN voice traffic is relatively stable, the mobile voice traffic increases at a nonnegligible rate. The required bandwidth for the PSTN traffic can easily be derived from the number of calls that can be accepted by the VoIP gateways and by applying the inter-PSTN-POP traffic dimensioning rule just specified. However, the IP traffic generated by the mobile voice traffic must also be considered. Thus, TK decided to resize each TE LSP bandwidth once every two months. For each TE LSP, an external script collects the related SNMP data (number of bytes transmitted on each TE LSP) every hour. This allows for the collection of a very accurate traffic matrix and tracking of the traffic growth.
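As a rough illustration of how hourly byte counts translate into an LSP size, the Python sketch below converts per-hour byte totals into average rates and applies the 0.9 factor to the busiest hour. It is a sketch under stated assumptions: the text does not describe the script's internals, so the function names, the idea that the script stores one byte delta per hour, and the reuse of the 0.9 factor on SNMP measurements are all hypothetical.

```python
SAMPLE_INTERVAL_S = 3600  # one SNMP sample per hour, as in the text
DIVERSITY_FACTOR = 0.9    # inter-POP peaks do not coincide


def rates_kbps(byte_deltas):
    """Convert bytes sent per interval into average rates in kbps."""
    return [delta * 8 / SAMPLE_INTERVAL_S / 1000 for delta in byte_deltas]


def lsp_bandwidth_kbps(byte_deltas):
    """Size a TE LSP at 90 percent of the busiest observed hour."""
    return DIVERSITY_FACTOR * max(rates_kbps(byte_deltas))


# Hypothetical samples: quiet hour, busy hour (1 Gbps average), mid hour.
samples = [45_000_000_000, 450_000_000_000, 75_000_000_000]
print(round(lsp_bandwidth_kbps(samples)))
```

Hourly averages smooth out short bursts, so a rate derived this way understates the instantaneous peak; that is one reason to treat the result as an input to the dimensioning decision rather than the decision itself.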
Once every two months, each TE LSP is resized up if the observed peak value exceeds the configured bandwidth value by 5 percent for more than 5 percent of the samples. Similarly, each TE LSP is resized down if the observed peak value is 90 percent or less of the configured bandwidth for more than 95 percent of the samples.

Path Computation

A dynamic CSPF algorithm is used to compute the shortest path for each TE LSP satisfying its constraints. (This is limited to the bandwidth constraint, except for TE LSPs between PE-PSTN1 routers where both the bandwidth and the affinity constraints must be satisfied, as discussed later.) Note that because the MPC network contains a limited number of TE nodes, the CSPF computation time is negligible (on the order of a few milliseconds). TK chose to run CSPF on the TE LSP headends (as opposed to an offline path computation approach) because this copes more rapidly with network element failures.

TE LSPs Between PE-PSTN1 Routers

The voice traffic between major cities in Kingland is significantly higher than between smaller cities. Therefore, TK decided to adopt a slightly different design for the TE LSPs between the PE-PSTN1 routers in Level 1 POPs than for the TE LSPs between PE-PSTN1 routers and PE-PSTN2 routers in Level 2 POPs. Because the TE LSPs between PE-PSTN1 routers are larger than the other TE LSPs, the design involves splitting the traffic over two TE LSPs. The rationale behind this is that as the ratio between LSP size and link maximum reservable bandwidth increases, the likelihood of not being able to find a path satisfying the bandwidth constraint also increases, especially in failure scenarios. Hence, to minimize that risk, TK decided to load-balance the traffic between each pair of PE-PSTN1 routers across multiple TE LSPs (two in this case).
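The effect of LSP size on path availability can be seen with a minimal CSPF sketch in Python: prune the links that cannot carry the requested bandwidth, then run an ordinary shortest-path computation on what remains. This is a simplified illustration, not the router implementation. The topology, node names, and function names are hypothetical, and each link is treated as one bidirectional bandwidth pool, whereas real reservations are tracked per direction.

```python
import heapq


def cspf(links, src, dst, bandwidth):
    """Shortest path by metric over links with enough unreserved bandwidth.

    links maps (node_a, node_b) -> {"metric": int, "free": float}.
    Returns a list of nodes, or None if no feasible path exists.
    """
    # Step 1: prune links that cannot satisfy the bandwidth constraint.
    adjacency = {}
    for (a, b), attrs in links.items():
        if attrs["free"] >= bandwidth:
            adjacency.setdefault(a, []).append((b, attrs["metric"]))
            adjacency.setdefault(b, []).append((a, attrs["metric"]))

    # Step 2: plain Dijkstra on the remaining topology.
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, metric in adjacency.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + metric, neighbor, path + [neighbor]))
    return None


def reserve(links, path, bandwidth):
    """Subtract the LSP bandwidth from every link along the path."""
    for a, b in zip(path, path[1:]):
        key = (a, b) if (a, b) in links else (b, a)
        links[key]["free"] -= bandwidth


def demo_links():
    """Two parallel paths from A to B, 2 Gbps reservable on every link."""
    return {
        ("A", "X"): {"metric": 10, "free": 2.0},
        ("X", "B"): {"metric": 10, "free": 2.0},
        ("A", "Y"): {"metric": 20, "free": 2.0},
        ("Y", "B"): {"metric": 20, "free": 2.0},
    }


links = demo_links()
print(cspf(links, "A", "B", 3.0))    # a single 3 Gbps LSP finds no path
first = cspf(links, "A", "B", 1.5)   # the first 1.5 Gbps half
reserve(links, first, 1.5)
second = cspf(links, "A", "B", 1.5)  # the second half takes the other path
print(first, second)
```

In this toy topology 4 Gbps is available in aggregate, yet a single 3 Gbps LSP cannot be placed, while two 1.5 Gbps LSPs can.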
Moreover, these TE LSPs are configured with a higher preemption priority than the TE LSPs between PE-PSTN1 and PE-PSTN2 routers as well as the TE LSPs between PE-PSTN2 routers, because (even after a split) they are still significantly larger. This circumvents the well-known issue of bandwidth fragmentation that can occur when using a distributed CSPF for the TE LSP path computation. Indeed, with distributed CSPF, there is no synchronization between routers. Each router computes the path for the set of TE LSPs for which it is the headend router. Consequently, in some cases, bandwidth fragmentation may occur whereby a larger TE LSP cannot be routed because of some other smaller TE LSPs that were previously routed.

RSVP-TE defines a multipriority scheme in which a TE LSP of priority X can preempt a TE LSP of priority Y if X < Y (a lower number reflects a higher priority). This preemption scheme can be used to help solve the bandwidth fragmentation problem. For the sake of illustration, consider the example shown in Figure 4-29 (where just a limited number of TE LSPs are represented for simplicity). The following characteristics can be observed:

Figure 4-29. Bandwidth Fragmentation Solved by a Multipriority Scheme
Given the situation shown in Figure 4-29, no path could be found for a TE LSP of 3 Gbps between PE-PSTN1-5 and PE-PSTN1-6. In this situation the bandwidth is said to be "fragmented" because although the necessary bandwidth is available collectively across the multiple possible paths, it is not available on any one path. The solution is to displace T1 (the tunnel between PE-PSTN2-1 and PE-PSTN2-4 in Figure 4-29) to free up some bandwidth for T3 (the tunnel between PE-PSTN1-5 and PE-PSTN1-6), which could in turn be routed. Hence, in situations such as the one just described, T3 would preempt T1 and would in this case follow the path PE-PSTN1-5cw2c1c2s2PE-PSTN1-6. After being preempted, the TE LSP T1 would in turn be rerouted onto a different path without any manual intervention.

This also illustrates why the PSTN traffic between two PE-PSTN1 routers is split onto two TE LSPs instead of one. Doing so limits their size and consequently increases the probability of finding a path for a TE LSP. (Indeed, smaller TE LSPs are less likely to provoke bandwidth fragmentation.) Because these LSPs are still significantly larger than the TE LSPs between PE-PSTN2 routers and the TE LSPs between PE-PSTN1 and PE-PSTN2 routers, they are configured with a higher preemption priority to benefit from the preemption mechanism just described. The resulting TE LSP placement is shown in Figure 4-30.

Figure 4-30. Situation After Preemption and Rerouting of a Lower-Priority TE LSP
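The preemption rule can be sketched for a single link as follows. This Python example is a simplified illustration and not the RSVP-TE procedure itself: it uses one priority value per LSP (RSVP-TE distinguishes setup and holding priorities, each in the range 0 to 7), and the bandwidth and priority values are hypothetical, since the figures' exact values are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    name: str
    priority: int     # 0 is the highest priority, 7 the lowest
    bandwidth: float  # Gbps


def admit(reservable, existing, new):
    """Decide whether `new` fits on a link, preempting if necessary.

    Returns (admitted, preempted). Only reservations whose priority number
    is strictly greater than that of `new` (lower priority) can be preempted.
    """
    free = reservable - sum(r.bandwidth for r in existing)
    if free >= new.bandwidth:
        return True, []

    # Candidate victims, lowest priority (largest number) first.
    candidates = sorted(
        (r for r in existing if new.priority < r.priority),
        key=lambda r: r.priority,
        reverse=True,
    )
    preempted = []
    for victim in candidates:
        preempted.append(victim)
        free += victim.bandwidth
        if free >= new.bandwidth:
            return True, preempted
    return False, []


# Hypothetical link with 4 Gbps reservable, already carrying a smaller LSP.
t1 = Reservation("T1", priority=5, bandwidth=1.5)
t3 = Reservation("T3", priority=3, bandwidth=3.0)
admitted, victims = admit(4.0, [t1], t3)
print(admitted, [v.name for v in victims])
```

The reverse case fails as expected: with T3 already in place, `admit(4.0, [t3], t1)` returns `(False, [])` because T1 has the lower priority and cannot displace T3.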
Of course, such a multipriority scheme does not provide an absolute guarantee that bandwidth fragmentation will never occur, but it limits the risk of its occurrence. TK ran several CSPF simulations with a random TE LSP placement. These simulations showed that combining the splitting of the large TE LSPs with a multipriority scheme leaves an extremely low risk of bandwidth fragmentation.

Establishing two TE LSPs between a pair of PE-PSTN routers has some other interesting properties. Provided that those LSPs are diversely routed, the impact of a single failure can be limited to a smaller proportion of the traffic between two POPs and consequently two cities. The second positive consequence is that establishing two TE LSPs can be used to achieve more even load distribution across links. In the TK design, MPLS TE ensures that no more than 40 percent of the link speed is used by the PSTN traffic on every link. In some circumstances, it is conceivable that some links carry 30 percent of the traffic whereas other links carry only 10 percent. Although such a situation meets the TK objectives, achieving more-optimal load balancing is always desirable. This can be done when traffic is split across multiple TE LSPs.

The only downside of such a strategy is the increase in the number of TE LSPs in the network. In the case of TK, such an increase is perfectly acceptable because it concerns only the TE LSPs between PE-PSTN1 routers. Thus, the number of TE LSPs is increased by 12 * 11 = 132 additional LSPs.

The solution to achieve such load balancing is to apply the concept of affinities defined by MPLS TE. In a nutshell, the idea is to use a 32-bit mask to indicate up to 32 link properties and use them as input constraints to be satisfied by a TE LSP so as to achieve a particular objective. In the example of the MPC network, the design between the VoIP gateways and the P routers residing in the Level 1 POPs is highly symmetric.
Each VoIP gateway is dual-attached to two PE-PSTN1 routers that are themselves dual-attached to two P routers in the Level 1 POP. Hence, the idea is to use a color scheme for the link between PE-PSTN1 and the P routers and for the link between the P routers in the Level 1 POPs. Doing so load-balances the TE LSPs between each pair of PE-PSTN1 routers. This concept is shown in Figure 4-31.

Figure 4-31. Three-Color Scheme for Load-Balancing TE LSP Between Level 1 POPs
Figure 4-31 shows that two TE LSPs (T1 and T2) are configured between PE-PSTN1-1 and PE-PSTN1-3. As just mentioned, the objective is to ensure that T1 and T2 are diversely routed when possible. Thus, three shades (light gray, medium gray, and dark gray) are used for the links between the PE-PSTN routers and the P routers and for the links between the P routers of the same Level 1 POP. This ensures that T1 and T2 traverse a different P router to exit the source POP and to enter the destination POP. The OSPF metric of the links between the P routers has been computed such that two TE LSPs between a disjoint pair of P routers are always diversely routed end-to-end in steady state. Note that the affinity constraint is relaxed if a PE-PSTN cannot find a feasible path satisfying those constraints, which could occur in case of failure.

TE LSPs Between PE-PSTN1 and PE-PSTN2 Routers or Between PE-PSTN2 Routers

The design of the TE LSPs between two PE-PSTNs that do not both reside in a Level 1 POP is quite straightforward. There is only one TE LSP between a pair of such PE-PSTNs (no load balancing is required), and the only constraint that must be satisfied is bandwidth (no coloring scheme).

Reoptimization of TE LSPs

Capacity planning rules for the MPC network are such that there is enough capacity so that all the TE LSPs very easily follow the IGP shortest path (or the shortest path satisfying the color constraints where those are used) in steady state. (In fact, in steady state the voice load is expected to remain below 20 percent on every link.) In other words, only in the case of link/SRLG/node failure might some TE LSPs be rerouted along a non-shortest path to guarantee that the maximum amount of PSTN traffic on any link does not exceed 40 percent of the actual link speed. Furthermore, TK decided to have TE LSPs of a fixed bandwidth size (as opposed to resizing TE LSPs frequently, an example of which appears in Chapter 5).
Thus, the only cases in which TE LSPs should be reoptimized are upon network element restoration, upon TE LSP resizing, or upon the addition of a link or node, none of which happens very frequently.

The MPC network is a national network with relatively short propagation delays (the propagation delay between two POPs never exceeds 15 ms, regardless of the path). Therefore, a TE LSP routed over a non-IGP shortest path does not experience significantly higher propagation delay compared to the OSPF shortest path. Thus, even when a TE LSP should be reoptimized (because a shorter path satisfying the constraints exists), the need for reoptimization is not critical, because the non-IGP shortest path offers a propagation delay (a critical parameter for the voice traffic) close to that of the IGP shortest path. Note that in some networks, the path followed by a TE LSP may experience significantly higher propagation delays than the IGP shortest path. However, this is not the case with the MPC national network.

Considering the various aspects mentioned here, TK decided to trigger a reoptimization once every 10 minutes. In this way, every headend router determines whether a more optimal (shorter) path can be found for each of its TE LSPs. If a more optimal path can be found, the TE LSP is reoptimized along the new path in a nondisruptive fashion using a make-before-break approach. Note that the CSPF computation for each TE LSP does not incur any CPU spikes considering the low number of TE LSPs per headend router. This also means that a TE LSP may follow a nonoptimal path for at most 10 minutes if a more optimal path exists because of the restoration or addition of a network element (such as a link or node).

MPLS Traffic Engineering Simulation

Before deploying MPLS Traffic Engineering, TK decided to conduct some CSPF simulations. Several objectives were set for these simulations:
The results of the CSPF simulations confirmed TK's expectations. During steady state, 100 percent of the TE LSPs follow the shortest path (or the shortest path satisfying the color constraints), and the maximum voice load on any link is below 20 percent. On the other hand, in the case of some SRLG failures, or node failure in a Level 1 POP, several TE LSPs are routed along a longer path. This meets the objective of keeping the PSTN traffic at or below 40 percent of the link speed on every link. The propagation delay along those longer paths still meets the voice delay requirements.

TE Scaling Aspects

When analyzing the scaling properties of MPLS TE, several important variables must be considered:
In conclusion, TK felt that the MPC MPLS TE design did not pose any scalability concerns.

Use of Refresh Reduction

TK chose not to activate refresh reduction in its network, considering that the number of RSVP-TE sessions per midpoint router was not substantial.

Provisioning the Mesh of TE LSPs

TK developed a set of scripts to automate the provisioning of the TE LSPs between the PE-PSTN routers.

Monitoring

The monitoring of the MPLS TE network is, of course, of the utmost importance so as to adjust the TE design if necessary. TK decided to gather the following set of information for each TE LSP in the network:
Last Resort Unconstrained Option

The MPC network is designed to survive any single failure. In other words, any TE LSP should be able to find an alternate path if a single network element fails. That said, a safe approach is to configure a last-resort option for each TE LSP whereby no constraint is specified, to cope with any unexpected event, in particular multiple-failure cases. On a Cisco router, this can be achieved by means of LSP attributes. For each TE LSP, an ordered list of constraints can be specified. The headend router tries to find a path satisfying the preferred set of constraints; if no path is found, the next preferred set of constraints is tried, and so on. Hence, a safe and recommended approach is to configure a last-resort option whereby the TE LSP is configured without any constraint (no affinity, 0 bandwidth, and so on). This guarantees that the headend router can always find a path to the destination, provided that there is still some connectivity to the destination. In this case the TE LSP path is no different from the OSPF path. On TE LSPs between two PE-PSTN1 routers that use color constraints, the last-resort unconstrained option is used after the backup option, which relaxes the color constraints but not the bandwidth constraints.
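The ordered fallback between constraint sets can be sketched in Python as shown below. This is an illustration of the selection logic only, not of the router configuration: the topology, node names, option names, and link flags are hypothetical, and the affinity check uses the common bitmask form in which a link matches when its flags and the requested affinity agree on every bit selected by the mask.

```python
import heapq


def constrained_path(links, src, dst, bandwidth=0.0, affinity=0x0, mask=0x0):
    """Dijkstra over links meeting the bandwidth and affinity constraints.

    A mask of 0 disables the affinity check; a bandwidth of 0 accepts any link.
    """
    adjacency = {}
    for (a, b), attrs in links.items():
        if attrs["free"] < bandwidth:
            continue
        if (attrs["flags"] & mask) != (affinity & mask):
            continue
        adjacency.setdefault(a, []).append((b, attrs["metric"]))
        adjacency.setdefault(b, []).append((a, attrs["metric"]))
    queue, visited = [(0, src, [src])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, metric in adjacency.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + metric, neighbor, path + [neighbor]))
    return None


def first_feasible(links, src, dst, options):
    """Try each constraint set in order; return (name, path) for the first hit."""
    for name, constraints in options:
        path = constrained_path(links, src, dst, **constraints)
        if path is not None:
            return name, path
    return None, None


def path_options(bandwidth):
    """Preferred (color + bandwidth), backup (bandwidth only), last resort."""
    return [
        ("preferred", {"bandwidth": bandwidth, "affinity": 0x1, "mask": 0x3}),
        ("backup", {"bandwidth": bandwidth}),
        ("last-resort", {}),
    ]


def demo_links():
    """Hypothetical topology; flags 0x1 and 0x2 stand for two link colors."""
    return {
        ("PE1", "P1"): {"metric": 10, "free": 1.0, "flags": 0x1},
        ("P1", "PE3"): {"metric": 10, "free": 1.0, "flags": 0x1},
        ("PE1", "P2"): {"metric": 20, "free": 4.0, "flags": 0x2},
        ("P2", "PE3"): {"metric": 20, "free": 0.5, "flags": 0x2},
    }


print(first_feasible(demo_links(), "PE1", "PE3", path_options(0.5)))
print(first_feasible(demo_links(), "PE1", "PE3", path_options(2.0)))
```

With 0.5 Gbps requested, the preferred option succeeds on the colored path. With 2 Gbps requested, neither the preferred nor the backup option finds a path, so the last-resort option returns the plain shortest path, which is why such an LSP carries no bandwidth guarantee.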