Network Working Group                                           T. Ionta
Internet-Draft                                              A. Cappadona
Intended status: Standard                                 Telecom Italia
Expires: Nov 15, 2016                                       May 16, 2016


        A Performance Layer Inboard Computing method to support
                active, passive and hybrid measurements
                        draft-ionta-plic-01.txt

Abstract

   This document describes a framework, called PLIC ("Performance
   Layer Inboard Computing"), that supports active, passive or hybrid
   performance measurements on any kind of unicast or multicast
   service passing through a network.  It is based on an algorithm
   which, through a distributed computation, floods the performance
   measurements of each node to the rest of the network, without
   needing any external topology database or operation support system.
   It was invented and engineered in the Telecom Italia Core Network.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 1, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.

Ionta, et al.
Expires May 16, 2016                                            [Page 1]

Internet-Draft  Performance Layer Inboard Computing method  October 2015

Table of Contents

   1.  Introduction
   2.  General Overview of the method
   3.  Detailed description of the method
       3.1  Discovery process (Process 1)
       3.2  PLIC-Neighborship building (Process 2)
       3.3  PLIC Token getting (Process 3)
       3.4  Local Performance parameters calculation (Process 4)
       3.5  PLIC Token building (Process 5)
            3.5.1  General info put in the PLIC Token
            3.5.2  PLIC Token detailed structure
       3.6  Processes sequencing during each single timeslot
   4.  Robustness of the method
   5.  Implementation and deployment (use case)
   6.  Security Considerations
   7.  IANA Considerations
   8.  Acknowledgments
   9.  References
       9.1  Normative References
       9.2  Informative References
   Authors' Addresses

1. Introduction

   This document provides a framework for performing measurements on
   data flows transmitted in a communication network.  The framework
   can be implemented in a distributed way by the nodes themselves,
   without requiring any external management server to collect
   measurements from the nodes.
   It also adapts automatically, without retrieving the topology from
   any external topology database, to changes in the network topology
   and in the paths followed by the data flow across the communication
   network, thus easing performance monitoring and failure detection
   and management.

   This framework is applicable to active, passive or hybrid
   performance measurements, since it supports active probing, passive
   probing, or a mix of the two.  It supports performance measurements
   on both multicast and unicast services.  It is applicable everywhere
   across the network (Core, Edge, Access), and yields measurement
   results from either an "end-to-end" or a "node-by-node" point of
   view.

2. General Overview of the method

   The main idea behind this method is to pass, node by node, the
   performance measurements (Packet Loss, Delay, CRC, Bandwidth,
   passive/active measurement parameters, or any other parameter)
   taken on each node, in a way that lets any other node of the
   network:

   -  identify, automatically and dynamically, the list of all the
      upstream nodes along the reverse path up to the source node of
      the service, thus avoiding the need for an external topology
      database.  The topology discovery is performed and refreshed
      frequently (with a period equal to or shorter than a single
      timeslot) using the method described in Section 3: each node
      discovers just its neighbors and passes this discovery
      information to the following node, so that each node builds, in
      a distributed way, the topology of the reverse path up to the
      source node of the service.

   -  know what happens on the upstream nodes in terms of performance
      measurements.

   The data structure devoted to passing this information is called
   the PLIC Token.
   It is generated by each node during a specific timeslot and passed
   (by either push or pull) to the following node during the same
   timeslot, thus allowing all the nodes to update themselves within
   that timeslot.

   The PLIC Token flows through the network using a multicast flow
   (belonging either to a probe or to real traffic).  The method is
   applicable, at the same time, to one or multiple multicast trees,
   regardless of the location of the sources and the receivers.

   Note that the multicast transmission scenario is merely exemplary,
   because the method is also applicable to unicast services, for
   instance point-to-point transmission (e.g. MPLS-TE tunnels, ATM/FR
   tunnels, etc.).  It is also applicable both to flow-based traffic
   (i.e. active probing) and to link-based traffic (i.e. passive
   probing).

   A notable consequence of the method is that no external Operation
   Support System is needed to manage performance and topology data,
   because, according to the method described in this document,
   management is distributed on board the nodes.

   The communication protocol among nodes can be a new ad-hoc one (such
   as the one proposed in this document) or an extension of an
   existing one (e.g. PIM).

3. Detailed description of the method

   To achieve the above goals, and given a generic multicast tree, each
   node N performs the same set of five processes, detailed below,
   during each single timeslot, and repeats them during each following
   timeslot.  The timeslot duration is freely settable by the operator,
   depending on the architecture of the network under measurement.

3.1 Discovery process (Process 1)

   This discovery process is local (no external off-line topology
   database is needed).  Its goal is to identify, for a specific
   multicast flow entering a specific interface, the chain of upstream
   nodes along the reverse path from the current node up to the source
   node.  To this end the current node, during the current timeslot "n"
   and before performing the remaining processes, performs two
   platform-dependent actions:

   -  identify the specific interface where the multicast flow enters
      the node;

   -  identify, based on the information in its IGP database, the set
      of all the upstream nodes along the reverse path from the current
      node up to the source node.

   This set of nodes is the input for Process 2.

   Note 1: since the topology must be refreshed at least once per
   timeslot, performing discovery in this way is much lighter than
   using an off-line topology database, for which polling the nodes
   would be too heavy.

   Note 2: with the proposed algorithm, similarly to the OSPF approach,
   the whole network topology is distributed (held on every single
   node) instead of centralized (held in an off-line topology
   database).

3.2 PLIC-Neighborship building (Process 2)

   Given the set (defined by Process 1) of the upstream nodes along the
   reverse path from the current node up to the source node, this
   process tries to build a "PLIC-Neighborship" with the closest
   reachable upstream node, following the steps defined below.  If
   PLIC-Neighborship building is unsuccessful with the closest upstream
   node, the process tries the next upstream node, and so on, up along
   the path toward the source.
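
   The fallback order described above can be sketched as follows.  This
   is a minimal illustration under stated assumptions, not part of the
   protocol definition: the function name find_predecessor and the
   probe callback (standing in for one neighborship attempt toward a
   single upstream node) are hypothetical names that do not appear in
   this document.

```python
from typing import Callable, Optional, Sequence


def find_predecessor(
    upstream: Sequence[str],
    probe: Callable[[str], bool],
) -> Optional[str]:
    """Return the closest upstream node that accepts a PLIC-Neighborship.

    `upstream` is the output of Process 1, ordered from the closest
    upstream node to the source. `probe` is a hypothetical callback that
    performs one neighborship attempt and returns True on success.
    """
    for node in upstream:
        if probe(node):
            return node      # neighborship built with this node
    return None              # no upstream node answered positively


# Example: R3 is the closest node, R1 is the source; only R2 and R1 respond.
responders = {"R2", "R1"}
predecessor = find_predecessor(["R3", "R2", "R1"], lambda n: n in responders)
```

   In this example R3 is skipped and R2 is selected, because the walk
   stops at the first upstream node for which the attempt succeeds.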

   As a first step the current node N(m) sends the following
   "PLIC_Token_Status_Request message" to its closest upstream node
   along the reverse path from the current node up to the source node.

   "PLIC_Token_Status_Request message" definition:

   -  Node ID (of N(m))
   -  Request flag

   The upstream node, regarding its own PLIC Token, can be in one of
   the following situations:

   1) No PLIC Token Tk(m-1): the upstream node has never generated a
      PLIC Token since it came up.

   2) PLIC Token Tk(m-1) exists but is "obsolete", that is, it was
      generated before the previous timeslot (n-2 or older).  Thus it
      is no longer useful for node N(m).

   3) PLIC Token Tk(m-1) was generated during the previous timeslot
      (n-1), thus the desired PLIC Token will likely be generated
      during the current timeslot (n).

   4) PLIC Token Tk(m-1) was generated during the current timeslot (n),
      thus it is exactly the PLIC Token node N(m) is looking for.

   To let node N(m) distinguish among these situations, it is enough
   for node N(m-1), when answering the "PLIC_Token_Status_Request
   message" with the "PLIC_Token_Status message", to include the
   timestamp of when the PLIC Token was successfully generated by node
   N(m-1).

   "PLIC_Token_Status message" definition:

   -  Node ID (of N(m-1))
   -  Ts(m-1)

   If no "PLIC_Token_Status message" is received by N(m) before a
   timeout expires, the PLIC-Neighborship between N(m) and N(m-1)
   ends.

   Depending on the timestamp contained in the "PLIC_Token_Status
   message", node N(m) distinguishes among the above situations and
   puts the upstream node under examination in one of the following
   two statuses:

   -  "PLIC-AWARE": if Tk(m-1) is in situation 3 or 4 above.

   -  "PLIC-UNAWARE": if Tk(m-1) is in situation 1 or 2 above.
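
   The four situations can be told apart from the Ts(m-1) timestamp
   alone.  The sketch below is illustrative and rests on assumptions
   not stated in this document: timestamps are expressed in seconds,
   timeslots are aligned to multiples of the timeslot length, and a
   missing token is signalled by an absent timestamp (None).  The
   function names are hypothetical.

```python
from typing import Optional


def timeslot_index(timestamp: float, slot_len: float) -> int:
    """Index of the timeslot containing `timestamp` (assumes aligned slots)."""
    return int(timestamp // slot_len)


def classify_token_status(
    ts_upstream: Optional[float], now: float, slot_len: float
) -> int:
    """Map the Ts(m-1) field of a PLIC_Token_Status message to situation 1-4.

    `ts_upstream` is None when the upstream node has never generated a
    token (an assumption about how that case is signalled).
    """
    if ts_upstream is None:
        return 1                       # no PLIC Token at all
    n = timeslot_index(now, slot_len)
    generated_in = timeslot_index(ts_upstream, slot_len)
    if generated_in >= n:
        return 4                       # generated during the current timeslot
    if generated_in == n - 1:
        return 3                       # generated during the previous timeslot
    return 2                           # obsolete: timeslot n-2 or older


SLOT = 300.0                           # a 5-minute timeslot, in seconds
NOW = 10 * SLOT + 42.0                 # some instant inside timeslot n = 10
situation = classify_token_status(9 * SLOT + 299.0, NOW, SLOT)
```

   Here the token was generated one second before the end of timeslot
   9, so the upstream node falls into situation 3.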

   If the upstream node under examination is put in "PLIC-AWARE"
   status, it is also called the "PLIC-AWARE PREDECESSOR node" N(m-1)
   of N(m), and a PLIC-Neighborship is built between N(m) and N(m-1).
   If instead it is put in "PLIC-UNAWARE" status, the next upstream
   node is examined: N(m) tries to build a PLIC-Neighborship with it by
   sending a new "PLIC_Token_Status_Request message" and analyzing the
   corresponding "PLIC_Token_Status message", as described above.

   This process continues iteratively until one of the following
   conditions occurs:

   -  A PLIC-AWARE node is found along the reverse path from the
      current node up to the source node.  In this case Process 2
      succeeded.

   -  No PLIC-AWARE node is found along the reverse path from the
      current node up to the source node.  In this case Process 2 has
      not succeeded and N(m) is named "Source Node" of the PLIC tree.
      Note that, following the above algorithm, the "Source Node" of
      the PLIC tree can differ from the multicast source node: the
      "Source Node" of the PLIC tree is the PLIC-AWARE node closest to
      the multicast source node.

3.3 PLIC Token getting (Process 3)

   As soon as Process 2 succeeds, thus designating the PLIC-AWARE
   PREDECESSOR node N(m-1) and building a PLIC-Neighborship with it,
   the current node N(m) asks it, for the first time, for the PLIC
   Token Tk(m-1) by sending the "Get_PLIC_Token message" with the
   Request Flag set.

   "Get_PLIC_Token message" definition:

   -  Node ID (of N(m))
   -  Request Flag

   If instead no PLIC-AWARE node is found along the reverse path from
   the current node up to the source node, node N(m) skips this
   process.

   After sending this first request, N(m) starts a "request time
   counter".  Two outcomes are then possible:

   1) Node N(m-1) doesn't answer immediately.

   2) Node N(m-1) answers immediately, sending its own PLIC Token
      Tk(m-1) (using the "Send_PLIC_Token message", composed of the
      fields detailed in Process 5).  In this situation there are two
      possible occurrences, which node N(m) distinguishes based on the
      Ts(m-1) contained in the "Send_PLIC_Token message" from N(m-1):

      2.a) PLIC Token Tk(m-1) was generated during the previous
           timeslot (n-1), thus the desired PLIC Token will likely be
           generated during the current timeslot (n) (cf. situation 3
           of Process 2).

      2.b) PLIC Token Tk(m-1) was generated during the current
           timeslot (n), thus it is exactly the PLIC Token node N(m)
           is looking for (cf. situation 4 of Process 2).

   In occurrence 2.b the current process succeeds and Process 5 can
   start immediately.  In occurrence 1 or 2.a, instead, node N(m),
   after a predefined incremental time quantum, repeats Process 3
   until one of the following occurs:

   -  occurrence 2.b occurs, so Process 5 can start, using PLIC Token
      Tk(m-1) to build Tk(m);

   -  the "request time counter" exceeds a predefined timeout, so
      Process 5 starts anyway, but without using PLIC Token Tk(m-1) to
      build Tk(m).

3.4 Local Performance parameters calculation (Process 4)

   This process is local and is devoted to calculating, on node N(m),
   the data of interest (e.g. PLR, CRC, Delay), based on measurements
   performed during the current or previous timeslot (depending on the
   kind of parameter measured).  These data are put in PLIC Token
   Tk(m) during Process 5.  Each performance parameter is calculated
   according to the chosen measurement method.

3.5 PLIC Token building (Process 5)

   This process is local and is devoted to generating a new PLIC Token
   Tk(m), made available to node N(m+1).  The general information it
   puts in Tk(m) and the corresponding detailed fields are described in
   the following subsections.
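
   Whether Tk(m-1) is available to Process 5 is decided by the retry
   loop of Process 3 (Section 3.3).  The sketch below illustrates that
   loop under several assumptions that are not part of this document:
   the time quantum is constant, elapsed time is tracked by summing
   quanta rather than by reading a clock, and a token is reduced to a
   (timeslot index, payload) pair.  The request and sleep callbacks and
   the function name are hypothetical.

```python
from typing import Callable, Optional, Tuple

# A token is represented here only by (timeslot index, payload); the
# actual field layout is the one listed in Section 3.5.2.
Token = Tuple[int, dict]


def get_upstream_token(
    request: Callable[[], Optional[Token]],
    current_slot: int,
    quantum: float,
    timeout: float,
    sleep: Callable[[float], None],
) -> Optional[Token]:
    """Poll the predecessor until a token of the current timeslot arrives.

    `request` is a hypothetical callback that sends one Get_PLIC_Token
    message and returns the answer, or None when there is no answer.
    Returns None once the request time counter would exceed `timeout`.
    """
    elapsed = 0.0
    while True:
        answer = request()
        if answer is not None and answer[0] == current_slot:
            return answer              # occurrence 2.b: token of timeslot n
        if elapsed + quantum > timeout:
            return None                # timeout: build Tk(m) without Tk(m-1)
        sleep(quantum)                 # occurrence 1 or 2.a: wait and retry
        elapsed += quantum


# Example: no answer, then a token of timeslot 9, then one of timeslot 10.
answers = iter([None, (9, {}), (10, {"hostname": "R2"})])
token = get_upstream_token(lambda: next(answers), 10, 1.0, 5.0, lambda s: None)
```

   In both exit paths Process 5 can start; the return value only
   determines whether upstream data is merged into Tk(m).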

3.5.1 General info put in the PLIC Token

   The following general information is put in the PLIC Token:

   -  Info about the local node N(m) (e.g. hostname of N(m)).

   -  Locally performed measurements (e.g. number of packets, CRC)
      calculated by Process 4.

   -  If, and only if, Process 3 succeeded (PLIC Token Tk(m-1) has been
      obtained):

      -  Data contained in Tk(m-1) (e.g. list of hostnames of the
         PLIC-AWARE nodes along the upstream reverse path from the
         local node N(m) up to the source).

      -  Results of calculations merging the data from Tk(m-1) with
         the locally performed measurements calculated by Process 4
         (e.g. Packet Loss calculated as the difference between the
         number of packets measured in N(m-1), and thus contained in
         Tk(m-1), and the number of packets measured locally during
         Process 4).

   As detailed in the PLIC-Neighborship building and PLIC Token
   getting processes above, PLIC Token Tk(m-1) is not obtained by N(m)
   if one of the following cases occurs:

   -  case 1: no PLIC-AWARE node has been found during the
      PLIC-Neighborship building process;

   -  case 2: the PLIC-AWARE PREDECESSOR node N(m-1) has not been able
      to generate PLIC Token Tk(m-1), or N(m) has not been able to get
      it, within the timeout.

   In these situations no info about the upstream path can be put in
   PLIC Token Tk(m).

3.5.2 PLIC Token detailed structure

   The fields of the PLIC Token are:

   1.  Hostname ID of N(m).

   2.  Interface ID where the PLIC multicast flow comes in.

   3.  Timestamp of the successful creation of the PLIC Token by N(m)
       during the current timeslot.

   4.  Flow ID 1: main PLIC multicast flow identifier, e.g. the
       Multicast Group under monitoring.  It can belong to a customer
       or to a monitoring probe.  There is a different PLIC multicast
       flow for each Multicast Group monitored by PLIC.

   5.  (optional) Flow ID 2: detailed multicast flow identifier.
       Used when Flow ID 1 alone is not enough to identify a specific
       multicast flow (e.g. different PLIC flows using the same
       Multicast Group need different DSCP values to be
       differentiated).

   6.  Number of performance parameters, measured by node N(m),
       contained in the following part of the current PLIC Token Tk(m)
       (TLV approach).

   7.  Performance parameter 1 (e.g. PLR): node-level performance
       measure (i.e. between the current node and its predecessor).

   8.  Performance parameter 1 (e.g. PLR): cumulative performance
       measure (i.e. between the current node and the source).

   9.  Performance parameter 1 (e.g. PLR): additive info containing
       the node-level measurements of the basic elements composing the
       performance parameter (e.g. number of incoming packets measured
       on the current interface).  These values are used by the
       successor node to calculate its own node-level performance
       measure.

   10. (optional) analogous to 7 for Performance parameter 2 (e.g.
       Delay).

   11. (optional) analogous to 8 for Performance parameter 2.

   12. (optional) analogous to 9 for Performance parameter 2.

   13. (optional) analogous to 7 for Performance parameter "n" (e.g.
       CRC).

   14. (optional) analogous to 8 for Performance parameter "n".

   15. (optional) analogous to 9 for Performance parameter "n".

   16. (if and only if Process 3 succeeded) Number of records, each
       containing the set of performance parameters (analogous to the
       set created by N(m) and put in fields 7-15 above) measured by
       one upstream PLIC-AWARE node, all taken from the PLIC Token
       Tk(m-1) obtained by N(m) during Process 3.

   17-n. Analogous to fields 6-16, one record for each upstream
       PLIC-AWARE node.
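
   As an illustration of how these fields fit together, the sketch
   below models a reduced PLIC Token carrying a single performance
   parameter (packet loss, expressed as a packet count) and shows how
   Process 5 could derive Tk(m) from Tk(m-1).  It is not a wire format.
   All class, field and function names are hypothetical, and computing
   the cumulative value as the running sum of node-level losses is an
   assumption; this document only states that the node-level loss is
   the difference between the upstream and the local packet counts.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PacketLoss:
    node_level: Optional[int]   # field 7: lost between predecessor and here
    cumulative: Optional[int]   # field 8: lost between the source and here
    packets_in: int             # field 9: packets counted on this node


@dataclass
class NodeRecord:
    hostname: str
    loss: PacketLoss


@dataclass
class PlicToken:
    hostname: str               # field 1
    interface_id: str           # field 2
    timestamp: float            # field 3
    flow_id: str                # field 4
    loss: PacketLoss
    upstream: List[NodeRecord] = field(default_factory=list)  # fields 16+


def build_token(hostname: str, interface_id: str, timestamp: float,
                flow_id: str, packets_in: int,
                prev: Optional[PlicToken] = None) -> PlicToken:
    """Process 5 sketch: build Tk(m) from local counters and optional Tk(m-1)."""
    if prev is None:
        # Tk(m-1) not available: no information about the upstream path.
        return PlicToken(hostname, interface_id, timestamp, flow_id,
                         PacketLoss(None, None, packets_in))
    node_level = prev.loss.packets_in - packets_in
    # Assumption: cumulative loss is the running sum of node-level losses.
    cumulative = (prev.loss.cumulative or 0) + node_level
    upstream = [NodeRecord(prev.hostname, prev.loss)] + prev.upstream
    return PlicToken(hostname, interface_id, timestamp, flow_id,
                     PacketLoss(node_level, cumulative, packets_in), upstream)


# Illustrative three-node chain; names and counters are made up.
t1 = build_token("R1", "if-1", 3000.0, "239.1.1.1", 1000)
t2 = build_token("R2", "if-2", 3001.0, "239.1.1.1", 990, prev=t1)
t3 = build_token("R3", "if-3", 3002.0, "239.1.1.1", 985, prev=t2)
```

   In this chain R3 reports a node-level loss of 5 packets, a
   cumulative loss of 15 packets, and upstream records for R2 and R1,
   closest node first.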

3.6 Processes sequencing during each single timeslot

   The following time diagram details one single timeslot:

   Process | Process description    |          Timeslot "N"
     ID    |                        |                       | PLIC Token
           |                        |                       | propagation
           |                        |                       | time
   --------+------------------------+-----+-----+-----+-----+-----------
      1    | Discovery              | XXX |     |     |     |
      2    | PLIC-Neighborship      |     | XXX |     |     |
           | building               |     |     |     |     |
      3    | Get_PLIC_Token         |     |     | XXX |     |
      4    | Local Performance      | XXX | XXX | XXX |     |
           | parameters calculation |     |     |     |     |
      5    | PLIC Token building    |     |     |     | XXX |

   Process 2 can start only after Process 1 has ended, because it needs
   as input the set of all the upstream nodes along the reverse path
   from the current node up to the source node.

   Process 3 can start only after Process 2 has ended, because the PLIC
   Token must be taken only from the upstream PLIC-AWARE node.

   Process 4 has no dependencies on Processes 1, 2 and 3 (thus it can
   run asynchronously with respect to them), because the local
   performance parameters calculation depends only on performance
   parameters sampled on the local node and does not need the PLIC
   Token from the upstream PLIC-AWARE node.

   Process 5 can start only after both Processes 3 and 4 have ended,
   because it depends on the output of both.  It must finish before the
   PLIC Token propagation time interval begins, to leave enough time
   for the token to propagate downstream.  During that interval in
   Timeslot N, the successor node N(m+1) must get Tk(m,n) and create
   its own Tk(m+1,n) (following the same sequence of processes as
   N(m)); within the same interval, node N(m+2) must in turn get
   Tk(m+1,n) and create its own Tk(m+2,n), and so on.

   The following diagram zooms in on the PLIC Token propagation time
   interval during a generic Timeslot "N":

  Node m+1 |Get Tk(m,n)|Create Tk(m+1,n)|             |                |
  Node m+2 |           |                |Get Tk(m+1,n)|Create Tk(m+2,n)|

4. Robustness of the method

   Based on the above statements, the proposed method is robust
   against:

   -  any loss of measurement samples: even if the PLIC Token is not
      generated by the upstream node N(m-1), node N(m) still generates
      its own PLIC Token and makes it available to downstream nodes;

   -  propagation drift of the PLIC Token during a single timeslot;

   -  network synchronization problems.

5. Implementation and deployment (use case)

   The methodology has been implemented and deployed in Telecom Italia
   by leveraging functions and tools available on IP routers, and it
   is currently being used to monitor packet loss, CRC and Bandwidth in
   some portions of Telecom Italia's network.  The timeslot length is 5
   minutes.

   In particular, the packet loss measurement is performed with the
   method described in [draft-tempia-ippm-p3m-01.txt]; the results are
   put inside the PLIC Token structure and delivered to the following
   nodes.

6. Security Considerations

   Implementations of this method must be mindful of security and
   privacy concerns.

   There are two types of security concerns: potential harm caused by
   the measurements and potential harm to the measurements.  Regarding
   the first, the measurements described in this document are passive,
   so no packets are injected into the network that could harm the
   network itself or the data traffic.  Nevertheless, the method
   implies on-the-fly modifications to the IP header of data packets:
   these must be performed in a way that doesn't alter the quality of
   service experienced by the packets subject to measurement and that
   preserves the stability and performance of the routers performing
   the measurements.  The measurements themselves could be harmed by
   routers altering the coloring of the packets, or by an attacker
   injecting artificial traffic.
   Authentication techniques, such as digital signatures, may be used
   where appropriate to guard against injected traffic attacks.

   The privacy concerns of network measurement are limited because the
   method relies only on information contained in the IP header,
   without any release of user data.

7. IANA Considerations

   There are no IANA actions required.

8. Acknowledgments

   The authors would like to thank their colleagues F. Moretti, A.
   Soldati and A. Barbetti for their helpful collaboration.  Thanks
   also to F. Benedetti, G. Giammona, G. Mazzola, S. Salamone and L.
   Tomasino for their support while implementing and deploying this
   method, according to the rules stated in the job agreement with
   Telecom Italia.  Special thanks to Daniela for her help in inspiring
   and translating this memo.

9. References

9.1 Normative References

   [RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
              Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
              RFC 5357, October 2008.

   [RFC5938]  Morton, A. and M. Chiba, "Individual Session Control
              Feature for the Two-Way Active Measurement Protocol
              (TWAMP)", RFC 5938, August 2010.

9.2 Informative References

   [I-D.tempia-opsawg-p3m]
              Capello, A., Cociglio, M., Castaldelli, L., and A. Bonda,
              "A packet based method for passive performance
              monitoring", draft-tempia-opsawg-p3m-04 (work in
              progress), February 2014.

   [draft-tempia-ippm-p3m-01.txt]
              Capello, A., Cociglio, M., Castaldelli, L., Fioccola, G.,
              and A. Bonda, "A packet based method for passive
              performance monitoring", September 2015.

Authors' Addresses

   Tiziano Ionta (editor)
   Telecom Italia Labs
   Via Valcannuta 250
   00167 Rome
   Italy

   Phone: +39 06 3688 5600
   Email: tiziano.ionta@telecomitalia.it


   Antonio Cappadona (editor)
   Telecom Italia Labs
   Via Valcannuta 250
   00167 Rome
   Italy

   Phone: +39 06 3688 7181
   Email: antonio.cappadona@telecomitalia.it