| rfc9768v1.txt | rfc9768.txt | |||
|---|---|---|---|---|
| Internet Engineering Task Force (IETF) B. Briscoe | Internet Engineering Task Force (IETF) B. Briscoe | |||
| Request for Comments: 9768 Independent | Request for Comments: 9768 Independent | |||
| Updates: 3168 M. Kühlewind | Updates: 3168 M. Kühlewind | |||
| Category: Standards Track Ericsson | Category: Standards Track Ericsson | |||
| ISSN: 2070-1721 R. Scheffenegger | ISSN: 2070-1721 R. Scheffenegger | |||
| NetApp | NetApp | |||
| August 2025 | November 2025 | |||
| More Accurate Explicit Congestion Notification (AccECN) Feedback in TCP | More Accurate Explicit Congestion Notification (AccECN) Feedback in TCP | |||
| Abstract | Abstract | |||
| Explicit Congestion Notification (ECN) is a mechanism by which | Explicit Congestion Notification (ECN) is a mechanism by which | |||
| network nodes can mark IP packets instead of dropping them to | network nodes can mark IP packets instead of dropping them to | |||
| indicate incipient congestion to the endpoints. Receivers with an | indicate incipient congestion to the endpoints. Receivers with an | |||
| ECN-capable transport protocol feed back this information to the | ECN-capable transport protocol feed back this information to the | |||
| sender. ECN was originally specified for TCP in such a way that only | sender. ECN was originally specified for TCP in such a way that only | |||
| one feedback signal can be transmitted per Round-Trip Time (RTT). | one feedback signal can be transmitted per Round-Trip Time (RTT). | |||
| Newer TCP mechanisms like Congestion Exposure (ConEx), Data Center | More recently defined mechanisms like Congestion Exposure (ConEx), | |||
| TCP (DCTCP), or Low Latency, Low Loss, and Scalable Throughput (L4S) | Data Center TCP (DCTCP), or Low Latency, Low Loss, and Scalable | |||
| need more Accurate ECN (AccECN) feedback information whenever more | Throughput (L4S) need more precise ECN feedback information whenever | |||
| than one marking is received in one RTT. This document updates the | more than one marking is received in one RTT. This document updates | |||
| original ECN specification defined in RFC 3168 by specifying a scheme | the original ECN specification defined in RFC 3168 by specifying a | |||
| that provides more than one feedback signal per RTT in the TCP | scheme that provides more than one feedback signal per RTT in the TCP | |||
| header. Given TCP header space is scarce, it allocates a reserved | header. Given TCP header space is scarce, it allocates a reserved | |||
| header bit previously assigned to the ECN-nonce. It also overloads | header bit previously assigned to the ECN-nonce. It also overloads | |||
| the two existing ECN flags in the TCP header. The resulting extra | the two existing ECN flags in the TCP header. The resulting extra | |||
| space is additionally exploited to feed back the IP-ECN field | space is additionally exploited to feed back the IP-ECN field | |||
| received during the TCP connection establishment. Supplementary | received during the TCP connection establishment. Supplementary | |||
| feedback information can optionally be provided in two new TCP option | feedback information can optionally be provided in two new TCP Option | |||
| alternatives, which are never used on the TCP SYN. The document also | alternatives, which are never used on the TCP SYN. The document also | |||
| specifies the treatment of this updated TCP wire protocol by | specifies the treatment of this updated TCP wire protocol by | |||
| middleboxes. | middleboxes. | |||
| Status of This Memo | Status of This Memo | |||
| This is an Internet Standards Track document. | This is an Internet Standards Track document. | |||
| This document is a product of the Internet Engineering Task Force | This document is a product of the Internet Engineering Task Force | |||
| (IETF). It represents the consensus of the IETF community. It has | (IETF). It represents the consensus of the IETF community. It has | |||
| skipping to change at line 135 | skipping to change at line 135 | |||
| 6. Summary: Protocol Properties | 6. Summary: Protocol Properties | |||
| 7. IANA Considerations | 7. IANA Considerations | |||
| 8. Security and Privacy Considerations | 8. Security and Privacy Considerations | |||
| 9. References | 9. References | |||
| 9.1. Normative References | 9.1. Normative References | |||
| 9.2. Informative References | 9.2. Informative References | |||
| Appendix A. Example Algorithms | Appendix A. Example Algorithms | |||
| A.1. Example Algorithm to Encode/Decode the AccECN Option | A.1. Example Algorithm to Encode/Decode the AccECN Option | |||
| A.2. Example Algorithm for Safety Against Long Sequences of ACK | A.2. Example Algorithm for Safety Against Long Sequences of ACK | |||
| Loss | Loss | |||
| A.2.1. Safety Algorithm Without the AccECN Option | A.2.1. Safety Algorithm without the AccECN Option | |||
| A.2.2. Safety Algorithm with the AccECN Option | A.2.2. Safety Algorithm with the AccECN Option | |||
| A.3. Example Algorithm to Estimate Marked Bytes from Marked | A.3. Example Algorithm to Estimate Marked Bytes from Marked | |||
| Packets | Packets | |||
| A.4. Example Algorithm to Count Not-ECT Bytes | A.4. Example Algorithm to Count Not-ECT Bytes | |||
| Appendix B. Rationale for Usage of TCP Header Flags | Appendix B. Rationale for Usage of TCP Header Flags | |||
| B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | |||
| B.2. Four Codepoints in the SYN/ACK | B.2. Four Codepoints in the SYN/ACK | |||
| B.3. Space for Future Evolution | B.3. Space for Future Evolution | |||
| Acknowledgements | Acknowledgements | |||
| Authors' Addresses | Authors' Addresses | |||
| skipping to change at line 158 | skipping to change at line 158 | |||
| Explicit Congestion Notification (ECN) [RFC3168] is a mechanism by | Explicit Congestion Notification (ECN) [RFC3168] is a mechanism by | |||
| which network nodes can mark IP packets instead of dropping them to | which network nodes can mark IP packets instead of dropping them to | |||
| indicate incipient congestion to the endpoints. Receivers with an | indicate incipient congestion to the endpoints. Receivers with an | |||
| ECN-capable transport protocol feed back this information to the | ECN-capable transport protocol feed back this information to the | |||
| sender. In RFC 3168, ECN was specified for TCP in such a way that | sender. In RFC 3168, ECN was specified for TCP in such a way that | |||
| only one feedback signal could be transmitted per Round-Trip Time | only one feedback signal could be transmitted per Round-Trip Time | |||
| (RTT). This is sufficient for congestion control schemes like Reno | (RTT). This is sufficient for congestion control schemes like Reno | |||
| [RFC6582] and CUBIC [RFC9438], as those schemes reduce their | [RFC6582] and CUBIC [RFC9438], as those schemes reduce their | |||
| congestion window by a fixed factor if congestion occurs within an | congestion window by a fixed factor if congestion occurs within an | |||
| RTT independent of the number of received congestion markings. | RTT independent of the number of received congestion markings. More | |||
| Recently, proposed mechanisms like Congestion Exposure (ConEx | recently defined mechanisms like Congestion Exposure (ConEx | |||
| [RFC7713]), DCTCP [RFC8257], and L4S [RFC9330] need to know when more | [RFC7713]), DCTCP [RFC8257], and L4S [RFC9330] need to know when more | |||
| than one marking is received in one RTT, which is information that | than one marking is received in one RTT, which is information that | |||
| cannot be provided by the feedback scheme as specified in [RFC3168]. | cannot be provided by the feedback scheme as specified in [RFC3168]. | |||
| This document specifies an update to the ECN feedback scheme of RFC | This document specifies an update to the ECN feedback scheme of RFC | |||
| 3168 that provides more accurate information and could be used by | 3168 that provides more accurate information and could be used by | |||
| these and potentially other future TCP extensions, while still also | these and potentially other future TCP extensions, while still also | |||
| supporting the pre-existing TCP congestion controllers that use just | supporting the pre-existing TCP congestion controllers that use just | |||
| one feedback signal per round. Congestion control is the term the | one feedback signal per round. Congestion control is the term the | |||
| IETF uses to describe data rate management. It is the algorithm that | IETF uses to describe data rate management. It is the algorithm that | |||
| a sender uses to optimize its sending rate so that it transmits data | a sender uses to optimize its sending rate so that it transmits data | |||
| as fast as the network can carry it, but no faster. A fuller | as fast as the network can carry it, but no faster. A fuller | |||
| description of the motivation for this specification is given in the | description of the motivation for this specification is given in the | |||
| associated requirements document [RFC7560]. | associated requirements document [RFC7560]. | |||
| This document specifies a Standards Track scheme for ECN feedback in | This document specifies a Standards Track scheme for ECN feedback in | |||
| the TCP header to provide more than one feedback signal per RTT. It | the TCP header to provide more than one feedback signal per RTT. It | |||
| is called the more "Accurate ECN" feedback scheme, or AccECN for | is called the "more Accurate ECN feedback" scheme, or AccECN for | |||
| short. This document updates RFC 3168 with respect to negotiation | short. This document updates RFC 3168 with respect to negotiation | |||
| and use of the feedback scheme for TCP. All aspects of RFC 3168 | and use of the feedback scheme for TCP. All aspects of RFC 3168 | |||
| other than the TCP feedback scheme and its negotiation remain | other than the TCP feedback scheme and its negotiation remain | |||
| unchanged by this specification. In particular, the definition of | unchanged by this specification. In particular, the definition of | |||
| ECN at the IP layer is unaffected. Section 4 details the aspects of | ECN at the IP layer is unaffected. Section 4 details the aspects of | |||
| RFC 3168 that are updated by this document. | RFC 3168 that are updated by this document. | |||
| This document uses the term "Classic ECN feedback" when it needs to | This document uses the term "Classic ECN feedback" when it needs to | |||
| distinguish the TCP/ECN feedback scheme defined in [RFC3168] from the | distinguish the TCP/ECN feedback scheme defined in [RFC3168] from the | |||
| AccECN TCP feedback scheme. AccECN is intended to offer a complete | AccECN TCP feedback scheme. AccECN is intended to offer a complete | |||
| skipping to change at line 224 | skipping to change at line 224 | |||
| CUBIC, AccECN can be used to respond to the extent of congestion | CUBIC, AccECN can be used to respond to the extent of congestion | |||
| notification over a round trip, as for example DCTCP does in | notification over a round trip, as for example DCTCP does in | |||
| controlled environments [RFC8257]. For congestion response, this | controlled environments [RFC8257]. For congestion response, this | |||
| specification refers to the original ECN specification adopted in | specification refers to the original ECN specification adopted in | |||
| 2001 [RFC3168], as updated by the more relaxed rules introduced in | 2001 [RFC3168], as updated by the more relaxed rules introduced in | |||
| 2018 to allow ECN experiments [RFC8311], namely: a TCP-based Low | 2018 to allow ECN experiments [RFC8311], namely: a TCP-based Low | |||
| Latency Low Loss Scalable (L4S) congestion control [RFC9330]; or | Latency Low Loss Scalable (L4S) congestion control [RFC9330]; or | |||
| Alternative Backoff with ECN (ABE) [RFC8511]. | Alternative Backoff with ECN (ABE) [RFC8511]. | |||
| Section 5.2 explains how AccECN is compatible with current commonly | Section 5.2 explains how AccECN is compatible with current commonly | |||
| used TCP options, and a number of current experimental modifications | used TCP Options, and a number of current experimental modifications | |||
| to TCP, as well as SYN cookies. | to TCP, as well as SYN cookies. | |||
| 1.1. Document Roadmap | 1.1. Document Roadmap | |||
| The following introductory section outlines the goals of AccECN | The following introductory section outlines the goals of AccECN | |||
| (Section 1.2). Then, terminology is defined (Section 1.3) and a | (Section 1.2). Then, terminology is defined (Section 1.3) and a | |||
| recap of existing prerequisite technology is given (Section 1.4). | recap of existing prerequisite technology is given (Section 1.4). | |||
| Section 2 gives an informative overview of the AccECN protocol. Then | Section 2 gives an informative overview of the AccECN protocol. Then | |||
| Section 3 gives the normative protocol specification, and Section 3.3 | Section 3 gives the normative protocol specification, and Section 3.3 | |||
| skipping to change at line 257 | skipping to change at line 257 | |||
| main TCP header and quantifies the space left for future use. | main TCP header and quantifies the space left for future use. | |||
| 1.2. Goals | 1.2. Goals | |||
| [RFC7560] enumerates requirements that a candidate feedback scheme | [RFC7560] enumerates requirements that a candidate feedback scheme | |||
| needs to satisfy, under the headings: resilience, timeliness, | needs to satisfy, under the headings: resilience, timeliness, | |||
| integrity, accuracy (including ordering and lack of bias), | integrity, accuracy (including ordering and lack of bias), | |||
| complexity, overhead, and compatibility (both backward and forward). | complexity, overhead, and compatibility (both backward and forward). | |||
| It recognizes that a perfect scheme that fully satisfies all the | It recognizes that a perfect scheme that fully satisfies all the | |||
| requirements is unlikely and trade-offs between requirements are | requirements is unlikely and trade-offs between requirements are | |||
| likely. Section 6 considers the properties of AccECN against these | likely. Section 6 assesses the properties of AccECN against these | |||
| requirements and discusses the trade-offs. | requirements and discusses the trade-offs. | |||
| The requirements document recognizes that a protocol as ubiquitous as | The requirements document recognizes that a protocol as ubiquitous as | |||
| TCP needs to be able to serve as-yet-unspecified requirements. | TCP needs to be able to serve as-yet-unspecified requirements. | |||
| Therefore, an AccECN receiver acts as a generic (mechanistic) | Therefore, an AccECN receiver acts as a generic (mechanistic) | |||
| reflector of congestion information with the aim that new sender | reflector of congestion information with the aim that new sender | |||
| behaviours can be deployed unilaterally (see Section 2.5) in the | behaviours can be deployed unilaterally in the future (see | |||
| future. | Section 2.5). | |||
| 1.3. Terminology | 1.3. Terminology | |||
| AccECN: The more Accurate ECN feedback scheme is called AccECN for | AccECN: The more Accurate ECN feedback scheme. | |||
| short. | ||||
| Classic ECN: The ECN protocol specified in [RFC3168]. | Classic ECN: The ECN protocol specified in [RFC3168]. | |||
| Classic ECN feedback: The feedback aspect of the ECN protocol | Classic ECN feedback: The feedback aspect of the ECN protocol | |||
| specified in [RFC3168], including generation, encoding, | specified in [RFC3168], including generation, encoding, | |||
| transmission and decoding of feedback, but not the Data Sender's | transmission and decoding of feedback, but not the Data Sender's | |||
| subsequent response to that feedback. | subsequent response to that feedback. | |||
| ACK: A TCP acknowledgement, with or without a data payload (ACK=1). | ACK: A TCP acknowledgement, with or without a data payload (ACK=1). | |||
| skipping to change at line 315 | skipping to change at line 314 | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| "OPTIONAL" in this document are to be interpreted as described in | "OPTIONAL" in this document are to be interpreted as described in | |||
| BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
| capitals, as shown here. | capitals, as shown here. | |||
| 1.4. Recap of Existing ECN Feedback in IP/TCP | 1.4. Recap of Existing ECN Feedback in IP/TCP | |||
| Explicit Congestion Notification (ECN) [RFC3168] can be split into | Explicit Congestion Notification (ECN) [RFC3168] can be split into | |||
| two parts conceptionally. In the forward direction, alongside the | two parts conceptually. In the forward direction, alongside the data | |||
| data stream, it uses a 2-bit field in the IP header. This is | stream, it uses a 2-bit field in the IP header. This is referred to | |||
| referred to as IP-ECN later on. This signal carried in the IP (Layer | as IP ECN later on. This signal carried in the IP (Layer 3) header | |||
| 3) header is exposed to network devices and may be modified when such | is exposed to network devices, which can modify it when they start to | |||
| a device starts to experience congestion (see Table 1). The second | experience congestion (see Table 1). The second part is the feedback | |||
| part is the feedback mechanism, by which the original data sender is | mechanism, by which the data receiver notifies the current congestion | |||
| notified of the current congestion state of the intermediate path. | state to the original data sender of the intermediate path. That | |||
| That returned signal is carried in a protocol-specific manner, and is | returned signal is carried in a transport-protocol-specific manner, | |||
| not to be modified by intermediate network devices. While ECN is in | and is not to be modified by intermediate network devices. While ECN | |||
| active use for protocols such as QUIC [RFC9000], SCTP [RFC9260], RTP | is in active use for protocols such as QUIC [RFC9000], SCTP | |||
| [RFC6679], and Remote Direct Memory Access over Converged Ethernet | [RFC9260], RTP [RFC6679], and Remote Direct Memory Access over | |||
| [RoCEv2], this document only concerns itself with the specific | Converged Ethernet [RoCEv2], this document only concerns itself with | |||
| implementation for the TCP protocol. | the specific implementation for the TCP protocol. | |||
| Once ECN has been negotiated for a transport layer connection, the | Once ECN has been negotiated for a transport layer connection, the | |||
| Data Sender for either half-connection can set two possible | Data Sender for either half-connection can set two possible | |||
| codepoints (ECT(0) or ECT(1)) in the IP header of a data packet to | codepoints (ECT(0) or ECT(1)) in the IP header of a data packet to | |||
| indicate an ECN-capable transport (ECT). If the ECN codepoint is | indicate an ECN-capable transport (ECT). If the ECN codepoint is | |||
| 0b00, the packet is considered to have been sent by a Not ECN-capable | 0b00, the packet is considered to have been sent by a Not ECN-capable | |||
| Transport (Not-ECT). When a network node experiences congestion, it | Transport (Not-ECT). When a network node experiences congestion, it | |||
| will occasionally either drop or mark a packet, with the choice | will occasionally either drop or mark a packet, with the choice | |||
| depending on the packet's ECN codepoint. If the codepoint is Not- | depending on the packet's ECN codepoint. If the codepoint is Not- | |||
| ECT, only drop is appropriate. If the codepoint is ECT(0) or ECT(1), | ECT, only drop is appropriate. If the codepoint is ECT(0) or ECT(1), | |||
| the node can mark the packet by setting the ECN codepoint to 0b11, | the node can mark the packet by setting the ECN codepoint to 0b11, | |||
| which is termed 'Congestion Experienced' (CE), or loosely a | which is termed 'Congestion Experienced' (CE), or loosely a | |||
| 'congestion mark'. Table 1 summarises these codepoints. | 'congestion mark'. Table 1 summarises these codepoints. | |||
| +==================+================+===========================+ | +==================+================+===========================+ | |||
| | IP-ECN codepoint | Codepoint name | Description | | | IP-ECN Codepoint | Codepoint Name | Description | | |||
| +==================+================+===========================+ | +==================+================+===========================+ | |||
| | 0b00 | Not-ECT | Not ECN-Capable Transport | | | 0b00 | Not-ECT | Not ECN-Capable Transport | | |||
| +------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| | 0b01 | ECT(1) | ECN-Capable Transport (1) | | | 0b01 | ECT(1) | ECN-Capable Transport (1) | | |||
| +------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| | 0b10 | ECT(0) | ECN-Capable Transport (0) | | | 0b10 | ECT(0) | ECN-Capable Transport (0) | | |||
| +------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| | 0b11 | CE | Congestion Experienced | | | 0b11 | CE | Congestion Experienced | | |||
| +------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| skipping to change at line 405 | skipping to change at line 404 | |||
| Like the general TCP approach, the Data Receiver of each TCP half- | Like the general TCP approach, the Data Receiver of each TCP half- | |||
| connection sends AccECN feedback to the Data Sender on TCP | connection sends AccECN feedback to the Data Sender on TCP | |||
| acknowledgements, reusing data packets of the other half-connection | acknowledgements, reusing data packets of the other half-connection | |||
| whenever possible. | whenever possible. | |||
| The AccECN protocol has had to be designed in two parts: | The AccECN protocol has had to be designed in two parts: | |||
| * an essential feedback part that reuses the TCP-ECN header bits for | * an essential feedback part that reuses the TCP-ECN header bits for | |||
| the Data Receiver to feed back the number of packets arriving with | the Data Receiver to feed back the number of packets arriving with | |||
| CE in the IP-ECN field. This provides more accuracy than Classic | CE in the IP-ECN field. This provides more accuracy than Classic | |||
| ECN feedback, but limited resilience against ACK loss; | ECN feedback, but limited resilience against ACK loss. | |||
| * a supplementary feedback part using one of two new alternative | * a supplementary feedback part using one of two new alternative | |||
| AccECN TCP options that provide additional feedback on the number | AccECN TCP Options that provide additional feedback on the number | |||
| of payload bytes that arrive marked with each of the three ECN | of payload bytes that arrive marked with each of the three ECN | |||
| codepoints in the IP-ECN field (not just CE marks). See the BCP | codepoints in the IP-ECN field (not just CE marks). See the BCP | |||
| on Byte and Packet Congestion Notification [RFC7141] for the | on Byte and Packet Congestion Notification [RFC7141] for the | |||
| rationale determining that conveying congested payload bytes | rationale determining that conveying congested payload bytes | |||
| should be preferred over just providing feedback about congested | should be preferred over just providing feedback about congested | |||
| packets. This also provides greater resilience against ACK loss | packets. This also provides greater resilience against ACK loss | |||
| than the essential feedback, but it is currently more likely to | than the essential feedback, but it is currently more likely to | |||
| suffer from middlebox interference. | suffer from middlebox interference. | |||
| The two part design was necessary, given limitations on the space | The two part design was necessary, given limitations on the space | |||
| available for TCP options and given the possibility that certain | available for TCP Options and given the possibility that certain | |||
| incorrectly designed middleboxes might prevent TCP from using any new | incorrectly designed middleboxes might prevent TCP from using any new | |||
| options. | options. | |||
| The essential feedback part overloads the previous definition of the | The essential feedback part overloads the previous definition of the | |||
| three flags in the TCP header that had been assigned for use by | three flags in the TCP header that had been assigned for use by | |||
| Classic ECN. This design choice deliberately allows AccECN peers to | Classic ECN. This design choice deliberately allows AccECN peers to | |||
| replace the Classic ECN feedback protocol, rather than leaving | replace the Classic ECN feedback protocol, rather than leaving | |||
| Classic ECN feedback intact and adding more accurate feedback | Classic ECN feedback intact and adding more accurate feedback | |||
| separately because: | separately because: | |||
| * this efficiently reuses scarce TCP header space, given TCP option | * this efficiently reuses scarce TCP header space, given TCP Option | |||
| space is approaching saturation; | space is approaching saturation; | |||
| * a single upgrade path for the TCP protocol is preferable to a fork | * a single upgrade path for the TCP protocol is preferable to a fork | |||
| in the design that modifies the TCP header to convey all ECN | in the design that modifies the TCP header to convey all ECN | |||
| feedback; | feedback; | |||
| * otherwise, Classic and Accurate ECN feedback could give | * otherwise, Classic and Accurate ECN feedback could give | |||
| conflicting feedback about the same segment, which could open up | conflicting feedback about the same segment, which could open up | |||
| new security concerns and make implementations unnecessarily | new security concerns and make implementations unnecessarily | |||
| complex; | complex; | |||
| * middleboxes are more likely to faithfully forward the TCP ECN | * middleboxes are more likely to faithfully forward the TCP-ECN | |||
| flags than newly defined areas of the TCP header. | flags than newly defined areas of the TCP header. | |||
| AccECN is designed to work even if the supplementary feedback part is | AccECN is designed to work even if the supplementary feedback part is | |||
| removed or zeroed out, as long as the essential feedback part gets | removed or zeroed out, as long as the essential feedback part gets | |||
| through. | through. | |||
| 2.1. Capability Negotiation | 2.1. Capability Negotiation | |||
| AccECN changes the wire protocol of the main TCP header; therefore, | AccECN changes the wire protocol of the main TCP header; therefore, | |||
| it can only be used if both endpoints have been upgraded to | it can only be used if both endpoints have been upgraded to | |||
| skipping to change at line 472 | skipping to change at line 471 | |||
| option space is limited. The TCP Server sends an AccECN Option on | option space is limited. The TCP Server sends an AccECN Option on | |||
| the SYN/ACK, and the TCP Client sends one on the first ACK to test | the SYN/ACK, and the TCP Client sends one on the first ACK to test | |||
| whether the network path forwards these options correctly. | whether the network path forwards these options correctly. | |||
| 2.2. Feedback Mechanism | 2.2. Feedback Mechanism | |||
| A Data Receiver maintains four counters initialized at the start of | A Data Receiver maintains four counters initialized at the start of | |||
| the half-connection. Three count the number of arriving payload | the half-connection. Three count the number of arriving payload | |||
| bytes marked CE, ECT(1), and ECT(0) in the IP-ECN field. These byte | bytes marked CE, ECT(1), and ECT(0) in the IP-ECN field. These byte | |||
| counters reflect only the TCP payload length, excluding the TCP | counters reflect only the TCP payload length, excluding the TCP | |||
| header and TCP options. The fourth counter counts the number of | header and TCP Options. The fourth counter counts the number of | |||
| packets arriving marked with a CE codepoint (including control | packets arriving marked with a CE codepoint (including control | |||
| packets without payload if they are CE-marked). | packets without payload if they are CE-marked). | |||
| The Data Sender maintains four equivalent counters for the half | The Data Sender maintains four equivalent counters for the half- | |||
| connection, and the AccECN protocol is designed to ensure they will | connection, and the AccECN protocol is designed to ensure they will | |||
| match the values in the Data Receiver's counters, albeit after a | match the values in the Data Receiver's counters, albeit after a | |||
| little delay. | little delay. | |||
| Each ACK carries the three least significant bits (LSBs) of the | Each ACK carries the three least significant bits (LSBs) of the | |||
| packet-based CE counter using the ECN bits in the TCP header, now | packet-based CE counter using the ECN bits in the TCP header, now | |||
| renamed the Accurate ECN (ACE) field (see Figure 3). The 24 LSBs of | renamed the Accurate ECN (ACE) field (see Figure 3). The 24 LSBs of | |||
| some or all of the byte counters can be optionally carried in an | some or all of the byte counters can be optionally carried in an | |||
| AccECN Option. For efficient use of limited option space, two | AccECN Option. For efficient use of limited option space, two | |||
| alternative forms of the AccECN Option are specified with the fields | alternative forms of the AccECN Option are specified with the fields | |||
| in the opposite order to each other. | in the opposite order to each other. | |||
| 2.3. Delayed ACKs and Resilience Against ACK Loss | 2.3. Delayed ACKs and Resilience Against ACK Loss | |||
| With both the ACE and the AccECN Option mechanisms, the Data Receiver | With both the ACE and the AccECN Option mechanisms, the Data Receiver | |||
| continually repeats the current LSBs of each of its respective | continually repeats the current LSBs of each of its respective | |||
| counters. There is no need to acknowledge these continually repeated | counters. There is no need to acknowledge these continually repeated | |||
| counters, so the Congestion Window Reduced (CWR) mechanism of | counters, so the CWR mechanism of [RFC3168] is no longer used. Even | |||
| [RFC3168] is no longer used. Even if some ACKs are lost, the Data | if some ACKs are lost, the Data Sender ought to be able to infer how | |||
| Sender ought to be able to infer how much to increment its own | much to increment its own counters, even if the protocol field has | |||
| counters, even if the protocol field has wrapped. | wrapped. | |||
| The 3-bit ACE field can wrap fairly frequently. Therefore, even if | The 3-bit ACE field can wrap fairly frequently. Therefore, even if | |||
| it appears to have incremented by one (say), the field might have | it appears to have incremented by one (say), the field might have | |||
| actually cycled completely and then incremented by one. The Data | actually cycled completely and then incremented by one. The Data | |||
| Receiver is not allowed to delay sending an ACK to such an extent | Receiver is not allowed to delay sending an ACK to such an extent | |||
| that the ACE field would cycle. However, ACKs received at the Data | that the ACE field would cycle. However, ACKs received at the Data | |||
| Sender could still cycle because a whole sequence of ACKs carrying | Sender could still cycle because a whole sequence of ACKs carrying | |||
| intervening values of the field might all be lost or delayed in | intervening values of the field might all be lost or delayed in | |||
| transit. | transit. | |||
| The fields in an AccECN Option are larger, but they will increment in | The fields in an AccECN Option are larger, but they will increment in | |||
| larger steps because they count bytes not packets. Nonetheless, | larger steps because they count bytes not packets. Nonetheless, | |||
| their size has been chosen such that a whole cycle of the field would | their size has been chosen such that a whole cycle of the field would | |||
| never occur between ACKs unless there has been an infeasibly long | never occur between ACKs unless there had been an infeasibly long | |||
| sequence of ACK losses. Therefore, provided that an AccECN Option is | sequence of ACK losses. Therefore, provided that an AccECN Option is | |||
| available, it can be treated as a dependable feedback channel. | available, it can be treated as a dependable feedback channel. | |||
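The wrap-around reasoning above can be illustrated with a small arithmetic sketch. It is not the inference algorithm of the specification, only the modular arithmetic behind it. The function names and the `max_wraps` parameter are hypothetical, chosen for illustration. The field widths (3 bits for ACE, 24 bits for each counter in an AccECN Option) are taken from the text.

```python
ACE_BITS = 3       # width of the ACE field in the TCP header
OPTION_BITS = 24   # LSBs of each byte counter carried in an AccECN Option


def min_increment(prev: int, cur: int, bits: int) -> int:
    """Smallest non-negative increase consistent with two successive field values."""
    return (cur - prev) % (1 << bits)


def candidate_increments(prev: int, cur: int, bits: int, max_wraps: int) -> list:
    """All increases consistent with the two values, allowing up to max_wraps full cycles.

    max_wraps is a hypothetical bound; how many cycles are plausible depends on
    how many ACKs might have been lost, which this sketch does not model.
    """
    base = min_increment(prev, cur, bits)
    return [base + k * (1 << bits) for k in range(max_wraps + 1)]
```

For example, an ACE field that moves from 5 to 6 is consistent with an increase of 1, 9, or 17 packets, which is why the small field alone is ambiguous after ACK loss, whereas a 24-bit field would need an infeasibly long run of losses to become ambiguous.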
| If an AccECN Option is not available, e.g., it is being stripped by a | If an AccECN Option is not available, e.g., it is being stripped by a | |||
| middlebox, the AccECN protocol will only feed back information on CE | middlebox, the AccECN protocol will only feed back information on CE | |||
| markings (using the ACE field). Although not ideal, this will be | markings (using the ACE field). Although not ideal, this will be | |||
| sufficient, because it is envisaged that neither ECT(0) nor ECT(1) | sufficient, because it is envisaged that neither ECT(0) nor ECT(1) | |||
| will ever indicate more severe congestion than CE, even though future | will ever indicate more severe congestion than CE, even though future | |||
| uses for ECT(0) or ECT(1) are still unclear [RFC8311]. Because the | uses for ECT(0) or ECT(1) are still unclear [RFC8311]. Because the | |||
| 3-bit ACE field is so small, when it is the only field available, the | 3-bit ACE field is so small, when it is the only field available, the | |||
| skipping to change at line 627 | skipping to change at line 626 | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| Figure 2: The New Definition of the TCP Header Flags During the | Figure 2: The New Definition of the TCP Header Flags During the | |||
| TCP Three-Way Handshake | TCP Three-Way Handshake | |||
| During the TCP three-way handshake at the start of a connection, to | During the TCP three-way handshake at the start of a connection, to | |||
| request more Accurate ECN feedback the TCP Client (host A) MUST set | request more Accurate ECN feedback the TCP Client (host A) MUST set | |||
| the TCP flags (AE,CWR,ECE) = (1,1,1) in the initial SYN segment. | the TCP flags (AE,CWR,ECE) = (1,1,1) in the initial SYN segment. | |||
| If a TCP Server (host B) that is AccECN-enabled receives a SYN with | If a TCP Server (host B) that is AccECN-enabled receives a SYN with | |||
| the above three flags set, it MUST set both its half connections into | the above three flags set, it MUST set both its half-connections into | |||
| AccECN mode. Then it MUST set the AE, CWR, and ECE TCP flags on the | AccECN mode. Then it MUST set the AE, CWR, and ECE TCP flags on the | |||
| SYN/ACK to the combination in the top block of Table 2 that feeds | SYN/ACK to the combination in the top block of Table 2 that feeds | |||
| back the IP-ECN field that arrived on the SYN. This applies whether | back the IP-ECN field that arrived on the SYN. This applies whether | |||
| or not the Server itself supports setting the IP-ECN field on a SYN | or not the Server itself supports setting the IP-ECN field on a SYN | |||
| or SYN/ACK (see Section 2.5 for rationale). | or SYN/ACK (see Section 2.5 for rationale). | |||
| When the TCP Server returns any of the four combinations in the top | When the TCP Server returns any of the four combinations in the top | |||
| block of Table 2, it confirms that it supports AccECN. The TCP | block of Table 2, it confirms that it supports AccECN. The TCP | |||
| Server MUST NOT set one of these four combinations of flags on the | Server MUST NOT set one of these four combinations of flags on the | |||
| SYN/ACK unless the preceding SYN requested support for AccECN as | SYN/ACK unless the preceding SYN requested support for AccECN as | |||
| above. | above. | |||
| Once a TCP Client (A) has sent the above SYN to declare that it | Once a TCP Client (A) has sent the above SYN to declare that it | |||
| supports AccECN, and once it has received the above SYN/ACK segment | supports AccECN, and once it has received the above SYN/ACK segment | |||
| that confirms that the TCP Server supports AccECN, the TCP Client | that confirms that the TCP Server supports AccECN, the TCP Client | |||
| MUST set both its half connections into AccECN mode. The TCP Client | MUST set both its half-connections into AccECN mode. The TCP Client | |||
| MUST NOT enter AccECN mode (or any feedback mode) before it has | MUST NOT enter AccECN mode (or any feedback mode) before it has | |||
| received the first SYN/ACK. | received the first SYN/ACK. | |||
| Once in AccECN mode, a TCP Client or Server has the rights and | Once in AccECN mode, a TCP Client or Server has the rights and | |||
| obligations to participate in the ECN protocol defined in | obligations to participate in the ECN protocol defined in | |||
| Section 3.1.5. | Section 3.1.5. | |||
| The procedures for retransmission of SYNs or SYN/ACKs are given in | The procedures for retransmission of SYNs or SYN/ACKs are given in | |||
| Section 3.1.4. | Section 3.1.4. | |||
| It is RECOMMENDED that the AccECN protocol be implemented alongside | It is RECOMMENDED that the AccECN protocol be implemented alongside | |||
| Selective Acknowledgement (SACK) [RFC2018]. If SACK is implemented | Selective Acknowledgement (SACK) [RFC2018]. If SACK is implemented | |||
| with AccECN, Duplicate Selective Acknowledgement (D-SACK) [RFC2883] | with AccECN, Duplicate Selective Acknowledgement (D-SACK) [RFC2883] | |||
| MUST also be implemented. | MUST also be implemented. | |||
| 3.1.2. Backward Compatibility | 3.1.2. Backward Compatibility | |||
| The three flags are set to 1 to indicate AccECN support on the SYN | The setting of all three flags to 1 in order to indicate AccECN | |||
| have been carefully chosen to enable natural fall-back to prior | support on the SYN was carefully chosen to enable natural fall-back | |||
| stages in the evolution of ECN. Table 2 tabulates all the | to prior stages in the evolution of ECN. Table 2 tabulates all the | |||
| negotiation possibilities for ECN-related capabilities that involve | negotiation possibilities for ECN-related capabilities that involve | |||
| at least one AccECN-capable host. The entries in the first two | at least one AccECN-capable host. The entries in the first two | |||
| columns have been abbreviated, as follows: | columns have been abbreviated, as follows: | |||
| AccECN: Supports more Accurate ECN feedback (the present | AccECN: Supports more Accurate ECN feedback (the present | |||
| specification) | specification). | |||
| Nonce: Supports ECN-nonce feedback [RFC3540] | Nonce: Supports ECN-nonce feedback [RFC3540]. | |||
| ECN: Supports 'Classic' ECN feedback [RFC3168] | ECN: Supports 'Classic' ECN feedback [RFC3168]. | |||
| No ECN: Not ECN-capable. Implicit congestion notification using | No ECN: Not ECN-capable. Implicit congestion notification using | |||
| packet drop. | packet drop. | |||
| +========+========+============+============+======================+ | +========+========+============+============+======================+ | |||
| | Host A | Host B | SYN | SYN/ACK | Feedback Mode | | | Host A | Host B | SYN | SYN/ACK | Feedback Mode | | |||
| | | | A->B | B->A | of Host A | | | | | A->B | B->A | of Host A | | |||
| | | | AE CWR ECE | AE CWR ECE | | | | | | AE CWR ECE | AE CWR ECE | | | |||
| +========+========+============+============+======================+ | +========+========+============+============+======================+ | |||
| | AccECN | AccECN | 1 1 1 | 0 1 0 | AccECN (Not-ECT SYN) | | | AccECN | AccECN | 1 1 1 | 0 1 0 | AccECN (Not-ECT SYN) | | |||
| skipping to change at line 716 | skipping to change at line 715 | |||
| row. | row. | |||
| 1. The top block shows the case already described in Section 3.1 | 1. The top block shows the case already described in Section 3.1 | |||
| where both endpoints support AccECN and how the TCP Server (B) | where both endpoints support AccECN and how the TCP Server (B) | |||
| indicates congestion feedback. | indicates congestion feedback. | |||
| 2. The second block shows the cases where the TCP Client (A) | 2. The second block shows the cases where the TCP Client (A) | |||
| supports AccECN but the TCP Server (B) supports some earlier | supports AccECN but the TCP Server (B) supports some earlier | |||
| variant of TCP feedback, as indicated in its SYN/ACK. Therefore, | variant of TCP feedback, as indicated in its SYN/ACK. Therefore, | |||
| as soon as an AccECN-capable TCP Client (A) receives the SYN/ACK | as soon as an AccECN-capable TCP Client (A) receives the SYN/ACK | |||
| shown, it MUST set both its half connections into the feedback | shown, it MUST set both its half-connections into the feedback | |||
| mode shown in the rightmost column. If the TCP Client has set | mode shown in the rightmost column. If the TCP Client has set | |||
| itself into Classic ECN feedback mode, it MUST comply with | itself into Classic ECN feedback mode, it MUST comply with | |||
| [RFC3168]. | [RFC3168]. | |||
| An AccECN implementation has no need to recognize or support the | An AccECN implementation has no need to recognize or support the | |||
| Server response labelled 'Nonce' or ECN-nonce feedback more | Server response labelled 'Nonce' or ECN-nonce feedback more | |||
| generally [RFC3540], as RFC 3540 has been reclassified as | generally [RFC3540], as RFC 3540 has been reclassified as | |||
| Historic [RFC8311]. AccECN is compatible with alternative ECN | Historic [RFC8311]. AccECN is compatible with alternative ECN | |||
| feedback integrity approaches to the nonce (see Section 5.3). | feedback integrity approaches to the nonce (see Section 5.3). | |||
| The SYN/ACK labelled 'Nonce' with (AE,CWR,ECE) = (1,0,1) is | The SYN/ACK labelled 'Nonce' with (AE,CWR,ECE) = (1,0,1) is | |||
| skipping to change at line 738 | skipping to change at line 737 | |||
| SYN/ACK follows the procedure for forward compatibility given in | SYN/ACK follows the procedure for forward compatibility given in | |||
| Section 3.1.3. | Section 3.1.3. | |||
| 3. The third block shows the cases where the TCP Server (B) supports | 3. The third block shows the cases where the TCP Server (B) supports | |||
| AccECN but the TCP Client (A) supports some earlier variant of | AccECN but the TCP Client (A) supports some earlier variant of | |||
| TCP feedback, as indicated in its SYN. | TCP feedback, as indicated in its SYN. | |||
| When an AccECN-enabled TCP Server (B) receives a SYN with | When an AccECN-enabled TCP Server (B) receives a SYN with | |||
| (AE,CWR,ECE) = (0,1,1), it MUST do one of the following: | (AE,CWR,ECE) = (0,1,1), it MUST do one of the following: | |||
| * set both its half connections into the Classic ECN feedback | * set both its half-connections into the Classic ECN feedback | |||
| mode and return a SYN/ACK with (AE,CWR,ECE) = (0,0,1) as | mode and return a SYN/ACK with (AE,CWR,ECE) = (0,0,1) as | |||
| shown. Then it MUST comply with [RFC3168]. | shown. Then it MUST comply with [RFC3168]. | |||
| * set both its half-connections into Not ECN mode and return a | * set both its half-connections into Not ECN mode and return a | |||
| SYN/ACK with (AE,CWR,ECE) = (0,0,0), then continue with ECN | SYN/ACK with (AE,CWR,ECE) = (0,0,0), then continue with ECN | |||
| disabled. This latter case is unlikely to be desirable, but | disabled. This latter case is unlikely to be desirable, but | |||
| it is allowed as a possibility, e.g., for minimal TCP | it is allowed as a possibility, e.g., for minimal TCP | |||
| implementations. | implementations. | |||
| When an AccECN-enabled TCP Server (B) receives a SYN with | When an AccECN-enabled TCP Server (B) receives a SYN with | |||
| (AE,CWR,ECE) = (0,0,0), it MUST set both its half connections | (AE,CWR,ECE) = (0,0,0), it MUST set both its half-connections | |||
| into the Not ECN feedback mode, return a SYN/ACK with | into the Not ECN feedback mode, return a SYN/ACK with | |||
| (AE,CWR,ECE) = (0,0,0) as shown, and continue with ECN disabled. | (AE,CWR,ECE) = (0,0,0) as shown, and continue with ECN disabled. | |||
| 4. The fourth block displays a combination labelled 'Broken'. Some | 4. The fourth block displays a combination labelled 'Broken'. Some | |||
| older TCP Server implementations incorrectly set the TCP-ECN | older TCP Server implementations incorrectly set the TCP-ECN | |||
| flags in the SYN/ACK by reflecting those in the SYN. Such broken | flags in the SYN/ACK by reflecting those in the SYN. Such broken | |||
| TCP Servers (B) cannot support ECN; so as soon as an AccECN- | TCP Servers (B) cannot support ECN; so as soon as an AccECN- | |||
| capable TCP Client (A) receives such a broken SYN/ACK, it MUST | capable TCP Client (A) receives such a broken SYN/ACK, it MUST | |||
| fall back to Not ECN mode for both its half connections and | fall back to Not ECN mode for both its half-connections and | |||
| continue with ECN disabled. | continue with ECN disabled. | |||
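The four blocks above can be summarized as a minimal sketch of the mode selection on each side. The function names, the mode strings, and the `allow_classic` parameter are hypothetical. The sketch assumes that the four top-block SYN/ACK combinations are exactly those not named in blocks 2 and 4, and that (1,0,1) is treated as AccECN under the forward-compatibility procedure of Section 3.1.3. The choice of SYN/ACK flags that feeds back the IP-ECN field of the SYN is not modelled.

```python
from typing import Optional, Tuple

Flags = Tuple[int, int, int]  # (AE, CWR, ECE)


def client_mode_from_synack(synack: Flags) -> str:
    """Feedback mode of an AccECN Client that sent a SYN with (1,1,1)."""
    if synack == (0, 0, 0):
        return "Not ECN"        # Server is not ECN-capable
    if synack == (0, 0, 1):
        return "Classic ECN"    # Server supports RFC 3168 feedback only
    if synack == (1, 1, 1):
        return "Not ECN"        # 'Broken': Server reflected the SYN flags
    # Assumption: every remaining combination confirms AccECN, including
    # (1,0,1), which is handled via forward compatibility (Section 3.1.3).
    return "AccECN"


def server_reply_to_syn(
    syn: Flags, allow_classic: bool = True
) -> Optional[Tuple[str, Optional[Flags]]]:
    """(mode, SYN/ACK flags) chosen by an AccECN-enabled Server for a SYN."""
    if syn == (1, 1, 1):
        # Flags come from the top block of Table 2 and depend on the
        # IP-ECN field of the SYN; not modelled in this sketch.
        return ("AccECN", None)
    if syn == (0, 1, 1):
        if allow_classic:
            return ("Classic ECN", (0, 0, 1))
        return ("Not ECN", (0, 0, 0))   # permitted, but unlikely to be desirable
    if syn == (0, 0, 0):
        return ("Not ECN", (0, 0, 0))
    return None  # other combinations: see Section 3.1.3, outside this sketch
```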
| The following additional rules do not fit the structure of the table, | The following additional rules do not fit the structure of the table, | |||
| but they complement it: | but they complement it: | |||
| Simultaneous Open: An originating AccECN Host (A), having sent a SYN | Simultaneous Open: An originating AccECN Host (A), having sent a SYN | |||
| with (AE,CWR,ECE) = (1,1,1), might receive another SYN from host | with (AE,CWR,ECE) = (1,1,1), might receive another SYN from host | |||
| B. Host A MUST then enter the same feedback mode as it would have | B. Host A MUST then enter the same feedback mode as it would have | |||
| entered had it been a responding host and received the same SYN. | entered had it been a responding host and received the same SYN. | |||
| Then host A MUST send the same SYN/ACK as it would have sent had | Then host A MUST send the same SYN/ACK as it would have sent had | |||
| skipping to change at line 802 | skipping to change at line 801 | |||
| mode as if the SYN/ACK confirmed that the Server supported AccECN and | mode as if the SYN/ACK confirmed that the Server supported AccECN and | |||
| as if it fed back that the IP-ECN field on the SYN had arrived | as if it fed back that the IP-ECN field on the SYN had arrived | |||
| unchanged. However, an AccECN Server implementation MUST NOT send a | unchanged. However, an AccECN Server implementation MUST NOT send a | |||
| SYN/ACK with this combination (AE,CWR,ECE) = (1,0,1). | SYN/ACK with this combination (AE,CWR,ECE) = (1,0,1). | |||
| | For the avoidance of doubt, the behaviour described in the | | For the avoidance of doubt, the behaviour described in the | |||
| | present specification applies whether or not the three | | present specification applies whether or not the three | |||
| | remaining reserved TCP header flags are zero. | | remaining reserved TCP header flags are zero. | |||
| All of these requirements ensure that future uses of all the Reserved | All of these requirements ensure that future uses of all the Reserved | |||
| combinations on a SYN or SYN/ACK can rely on consistent behaviour | combinations of all the TCP header bits on a SYN or SYN/ACK (see | |||
| from the installed base of AccECN implementations. See Appendix B.3 | Table 2) can rely on consistent behaviour from the installed base of | |||
| for related discussion. | AccECN implementations. See Appendix B.3 for related discussion. | |||
| 3.1.4. Multiple SYNs or SYN/ACKs | 3.1.4. Multiple SYNs or SYN/ACKs | |||
| 3.1.4.1. Retransmitted SYNs | 3.1.4.1. Retransmitted SYNs | |||
| If the sender of an AccECN SYN (the TCP Client) times out before | If the sender of an AccECN SYN (the TCP Client) times out before | |||
| receiving the SYN/ACK, it SHOULD attempt to negotiate the use of | receiving the SYN/ACK, it SHOULD attempt to negotiate the use of | |||
| AccECN at least one more time by continuing to set all three TCP ECN | AccECN at least one more time by continuing to set all three TCP-ECN | |||
| flags (AE,CWR,ECE) = (1,1,1) on the first retransmitted SYN (using | flags (AE,CWR,ECE) = (1,1,1) on the first retransmitted SYN (using | |||
| the usual retransmission timeouts). If this first retransmission | the usual retransmission timeouts). If this first retransmission | |||
| also fails to be acknowledged, in deployment scenarios where AccECN | also fails to be acknowledged, in deployment scenarios where AccECN | |||
| path traversal might be problematic, the TCP Client SHOULD send | path traversal might be problematic, the TCP Client SHOULD send | |||
| subsequent retransmissions of the SYN with the three TCP-ECN flags | subsequent retransmissions of the SYN with the three TCP-ECN flags | |||
| cleared (AE,CWR,ECE) = (0,0,0). Such a retransmitted SYN MUST use | cleared (AE,CWR,ECE) = (0,0,0). Such a retransmitted SYN MUST use | |||
| the same initial sequence number (ISN) as the original SYN. | the same initial sequence number (ISN) as the original SYN. | |||
| Retrying once before fall-back adds delay in the case where a | Retrying once before fall-back adds delay in the case where a | |||
| middlebox drops an AccECN (or ECN) SYN deliberately. However, recent | middlebox drops an AccECN (or ECN) SYN deliberately. However, recent | |||
| measurements [Mandalari18] imply that a drop is less likely to be due | measurements [Mandalari18] imply that a drop is less likely to be due | |||
| to middlebox interference than other intermittent causes of loss, | to middlebox interference than other intermittent causes of loss, | |||
| e.g., congestion, wireless transmission loss, etc. | e.g., congestion, wireless transmission loss, etc. | |||
| Implementers MAY use other fall-back strategies if they are found to | Implementers MAY use other fall-back strategies if they are found to | |||
| be more effective (e.g., attempting to negotiate AccECN on the SYN | be more effective, e.g., attempting to negotiate AccECN on the SYN | |||
| only once or more than twice (most appropriate during high levels of | only once or more than twice (most appropriate during high levels of | |||
| congestion). | congestion). | |||
| Further it might make sense to also remove any other new or | Further it might make sense to also remove any other new or | |||
| experimental fields or options on the SYN in case a middlebox might | experimental fields or options on the SYN in case a middlebox might | |||
| be blocking them, although the required behaviour will depend on the | be blocking them, although the required behaviour will depend on the | |||
| specification of the other option(s) and any attempt to coordinate | specification of the other option(s) and any attempt to coordinate | |||
| fall-back between different modules of the stack. For instance, even | fall-back between different modules of the stack. For instance, if | |||
| if taking part in an [RFC8311] experiment that allows ECT on a SYN, | taking part in an [RFC8311] experiment that allows ECT on a SYN, it | |||
| it would be advisable to try it without. | would be advisable to have a fall-back strategy that tries use of | |||
| | AccECN without setting ECT on the SYN. | |||
| Whichever fall-back strategy is used, the TCP initiator SHOULD cache | Whichever fall-back strategy is used, the TCP initiator SHOULD cache | |||
| failed connection attempts. If it does, it SHOULD NOT give up | failed connection attempts. If it does, it SHOULD NOT give up | |||
| attempting to negotiate AccECN on the SYN of subsequent connection | attempting to negotiate AccECN on the SYN of subsequent connection | |||
| attempts until it is clear that the blockage is persistently and | attempts until it is clear that the blockage is persistently and | |||
| specifically due to AccECN. The cache needs to be arranged to expire | specifically due to AccECN. The cache needs to be arranged to expire | |||
| so that the initiator will infrequently attempt to check whether the | so that the initiator will infrequently attempt to check whether the | |||
| problem has been resolved. | problem has been resolved. | |||
| All fall-back strategies will need to follow all the normative rules | All fall-back strategies will need to follow all the normative rules | |||
| in Section 3.1.5, which concern behaviour when SYNs or SYN/ACKs | in Section 3.1.5, which concern behaviour when SYNs or SYN/ACKs | |||
| negotiating different types of feedback have been sent within the | negotiating different types of feedback have been sent within the | |||
| same connection, including the possibility that they arrive out of | same connection, including the possibility that they arrive out of | |||
| order. As examples, the following non-normative bullets call out | order. As examples, the following non-normative bullets call out | |||
| those rules from Section 3.1.5 that apply to the above fall-back | those rules from Section 3.1.5 that apply to the above fall-back | |||
| strategies: | strategies: | |||
| * Once the TCP Client has sent SYNs with (AE,CWR,ECE) = (1,1,1) and | * Once the TCP Client has sent SYNs with (AE,CWR,ECE) = (1,1,1) and | |||
| with (AE,CWR,ECE) = (0,0,0), it might eventually receive a SYN/ACK | with (AE,CWR,ECE) = (0,0,0), it might eventually receive a SYN/ACK | |||
| from the Server in response to one, the other, or both, and | from the Server in response to one, the other, or both, and | |||
| possibly reordered; | possibly reordered. | |||
| * Such a TCP Client enters the feedback mode appropriate to the | * Such a TCP Client enters the feedback mode appropriate to the | |||
| first SYN/ACK it receives according to Table 2, and it does not | first SYN/ACK it receives according to Table 2, and it does not | |||
| switch to a different mode, whatever other SYN/ACKs it might | switch to a different mode, whatever other SYN/ACKs it might | |||
| receive or send; | receive or send. | |||
| * If a TCP Client has entered AccECN mode but then subsequently | * If a TCP Client has entered AccECN mode but then subsequently | |||
| sends a SYN or receives a SYN/ACK with (AE,CWR,ECE) = (0,0,0), it | sends a SYN or receives a SYN/ACK with (AE,CWR,ECE) = (0,0,0), it | |||
| is still allowed to set ECT on packets for the rest of the | is still allowed to set ECT on packets for the rest of the | |||
| connection. Note that this rule is different than that of a | connection. Note that this rule is different from that of a | |||
| Server in an equivalent position (Section 3.1.5 explains). | Server in an equivalent position (Section 3.1.5 explains). | |||
| * Having entered AccECN mode, in general a TCP Client commits to | * Having entered AccECN mode, in general a TCP Client commits to | |||
| respond to any incoming congestion feedback, whether or not it | respond to any incoming congestion feedback, whether or not it | |||
| sets ECT on outgoing packets (for rationale and some exceptions | sets ECT on outgoing packets (for rationale and some exceptions | |||
| see Section 3.2.2.3, Section 3.2.2.4); | see Section 3.2.2.3, Section 3.2.2.4). | |||
| * Having entered AccECN mode, a TCP Client commits to using AccECN | * Having entered AccECN mode, a TCP Client commits to using AccECN | |||
| to feed back the IP-ECN field in incoming packets for the rest of | to feed back the IP-ECN field in incoming packets for the rest of | |||
| the connection, as specified in Section 3.2, even if it is not | the connection, as specified in Section 3.2, even if it is not | |||
| itself setting ECT on outgoing packets. | itself setting ECT on outgoing packets. | |||
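The default fall-back described at the start of Section 3.1.4.1 can be sketched as a schedule of SYN flags per attempt. This is one possible reading under stated assumptions: the function names and the `accecn_tries` parameter are hypothetical, and the switch to cleared flags is only a SHOULD for deployments where AccECN path traversal might be problematic. Timers and the caching of failed attempts are not modelled.

```python
def syn_flags_for_attempt(attempt: int, accecn_tries: int = 2) -> tuple:
    """(AE, CWR, ECE) for the given SYN attempt; attempt 0 is the original SYN.

    With accecn_tries=2 the original SYN and the first retransmission
    request AccECN, and later retransmissions clear all three flags.
    """
    return (1, 1, 1) if attempt < accecn_tries else (0, 0, 0)


def syn_schedule(isn: int, attempts: int, accecn_tries: int = 2) -> list:
    """List of (ISN, flags) pairs; every retransmission reuses the original ISN."""
    return [(isn, syn_flags_for_attempt(i, accecn_tries)) for i in range(attempts)]
```

Setting `accecn_tries` to 1, or to a value above 2, corresponds to the alternative strategies that the text allows implementers to use.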
| 3.1.4.2. Retransmitted SYN/ACKs | 3.1.4.2. Retransmitted SYN/ACKs | |||
| A TCP Server might send multiple SYN/ACKs indicating different | A TCP Server might send multiple SYN/ACKs indicating different | |||
| feedback modes. For instance, when falling back to sending a SYN/ACK | feedback modes. For instance, when falling back to sending a SYN/ACK | |||
| skipping to change at line 900 | skipping to change at line 900 | |||
| All fall-back strategies will need to follow all the normative rules | All fall-back strategies will need to follow all the normative rules | |||
| in Section 3.1.5, which concern behaviour when SYNs or SYN/ACKs | in Section 3.1.5, which concern behaviour when SYNs or SYN/ACKs | |||
| negotiating different types of feedback are sent within the same | negotiating different types of feedback are sent within the same | |||
| connection, including the possibility that they arrive out of order. | connection, including the possibility that they arrive out of order. | |||
| As examples, the following non-normative bullets call out those rules | As examples, the following non-normative bullets call out those rules | |||
| from Section 3.1.5 that apply to the above fall-back strategies: | from Section 3.1.5 that apply to the above fall-back strategies: | |||
| * An AccECN-capable TCP Server enters the feedback mode appropriate | * An AccECN-capable TCP Server enters the feedback mode appropriate | |||
| to the first SYN it receives using Table 2, and it does not switch | to the first SYN it receives using Table 2, and it does not switch | |||
| to a different mode, whatever other SYNs it might receive and | to a different mode, whatever other SYNs it might receive and | |||
| whatever SYN/ACKs it might send; | whatever SYN/ACKs it might send. | |||
| * If a TCP Server in AccECN mode receives a SYN with (AE,CWR,ECE) = | * If a TCP Server in AccECN mode receives a SYN with (AE,CWR,ECE) = | |||
| (0,0,0), it preferably acknowledges it first using an AccECN SYN/ | (0,0,0), it preferably acknowledges it first using an AccECN SYN/ | |||
| ACK, but it can retry using a SYN/ACK with (AE,CWR,ECE) = (0,0,0); | ACK, but it can retry using a SYN/ACK with (AE,CWR,ECE) = (0,0,0). | |||
| * If a TCP Server in AccECN mode sends multiple AccECN SYN/ACKs, it | * If a TCP Server in AccECN mode sends multiple AccECN SYN/ACKs, it | |||
| uses the TCP-ECN flags in each SYN/ACK to feed back the IP-ECN | uses the TCP-ECN flags in each SYN/ACK to feed back the IP-ECN | |||
| field on the latest SYN to have arrived; | field on the latest SYN to have arrived. | |||
| * If a TCP Server enters AccECN mode and then subsequently sends a | * If a TCP Server enters AccECN mode and then subsequently sends a | |||
| SYN/ACK or receives a SYN with (AE,CWR,ECE) = (0,0,0), it is | SYN/ACK or receives a SYN with (AE,CWR,ECE) = (0,0,0), it is | |||
| prohibited from setting ECT on any packet for the rest of the | prohibited from setting ECT on any packet for the rest of the | |||
| connection; | connection. | |||
| * Having entered AccECN mode, in general a TCP Server commits to | * Having entered AccECN mode, in general a TCP Server commits to | |||
| respond to any incoming congestion feedback, whether or not it | respond to any incoming congestion feedback, whether or not it | |||
| sets ECT on outgoing packets (for rationale and some exceptions | sets ECT on outgoing packets (for rationale and some exceptions | |||
| see Sections 3.2.2.3, 3.2.2.4); | see Sections 3.2.2.3, 3.2.2.4). | |||
| * Having entered AccECN mode, a TCP Server commits to using AccECN | * Having entered AccECN mode, a TCP Server commits to using AccECN | |||
| to feed back the IP-ECN field in incoming packets for the rest of | to feed back the IP-ECN field in incoming packets for the rest of | |||
| the connection, as specified in Section 3.2, even if it is not | the connection, as specified in Section 3.2, even if it is not | |||
| itself setting ECT on outgoing packets. | itself setting ECT on outgoing packets. | |||
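Two of the rules called out in the bullets of Sections 3.1.4.1 and 3.1.4.2 lend themselves to a small state sketch: the feedback mode is fixed by the first handshake packet received, and only a Server in AccECN mode loses the right to set ECT after a (0,0,0) SYN or SYN/ACK. The class and method names are hypothetical, and the sketch covers only these two rules.

```python
class HandshakeState:
    """Tracks the feedback mode and ECT permission of one endpoint."""

    def __init__(self, role: str):
        if role not in ("client", "server"):
            raise ValueError("role must be 'client' or 'server'")
        self.role = role
        self.mode = None          # set once, by the first SYN or SYN/ACK received
        self.may_set_ect = True

    def enter_mode(self, mode: str) -> str:
        """Record the mode from the first handshake packet; later calls do not change it."""
        if self.mode is None:
            self.mode = mode
        return self.mode

    def saw_non_ecn_handshake_packet(self) -> None:
        """The host sent or received a SYN or SYN/ACK with (AE,CWR,ECE) = (0,0,0).

        A Server in AccECN mode may no longer set ECT for the rest of the
        connection; a Client in the equivalent position still may.
        """
        if self.mode == "AccECN" and self.role == "server":
            self.may_set_ect = False
```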
| 3.1.5. Implications of AccECN Mode | 3.1.5. Implications of AccECN Mode | |||
| Section 3.1.1 describes the only ways that a host can enter AccECN | Section 3.1.1 describes the only ways that a host can enter AccECN | |||
| mode, whether as a Client or as a Server. | mode, whether as a Client or as a Server. | |||
| An implementation that supports AccECN has the rights and obligations | An implementation that supports AccECN has the rights and obligations | |||
| concerning the use of ECN defined below, which update those in | concerning the use of ECN defined below, which update those in | |||
| Section 6.1.1 of [RFC3168]. This section uses the following | Section 6.1.1 of [RFC3168]. This section uses the following | |||
| definitions: | definitions: | |||
| 'During the handshake': The connection states prior to | 'During the handshake': The connection states prior to | |||
| synchronization; | synchronization. | |||
| 'Valid SYN': A SYN that has the same port numbers and the same ISN | 'Valid SYN': A SYN that has the same port numbers and the same ISN | |||
| as the SYN that first caused the Server to open the connection. | as the SYN that first caused the Server to open the connection. | |||
| An 'Acceptable' packet is defined in Section 1.3. | An 'Acceptable' packet is defined in Section 1.3. | |||
| Handling SYNs or SYN/ACKs of multiple types (e.g., fall-back): | Handling SYNs or SYN/ACKs of multiple types (e.g., fall-back): | |||
| * Any implementation that supports AccECN: | * Any implementation that supports AccECN: | |||
| - MUST NOT switch into a different feedback mode than the one it | - MUST NOT switch into a different feedback mode from the one it | |||
| first entered according to Table 2, no matter whether it | first entered according to Table 2, no matter whether it | |||
| subsequently receives valid SYNs or Acceptable SYN/ACKs of | subsequently receives valid SYNs or Acceptable SYN/ACKs of | |||
| different types. | different types; | |||
| - SHOULD ignore the TCP-ECN flags in SYNs or SYN/ACKs that are | - SHOULD ignore the TCP-ECN flags in SYNs or SYN/ACKs that are | |||
| received after the implementation reaches the Established | received after the implementation reaches the ESTABLISHED | |||
| state, in line with the general TCP approach [RFC9293]; | state, in line with the general TCP approach [RFC9293]; | |||
| Reason: Reaching established state implies that at least one | Reason: Reaching ESTABLISHED state implies that at least one | |||
| SYN and one SYN/ACK have successfully been delivered. And all | SYN and one SYN/ACK have successfully been delivered. And all | |||
| the rules for handshake fall-back are designed to work based on | the rules for handshake fall-back are designed to work based on | |||
| those packets that successfully traverse the path, whatever | those packets that successfully traverse the path, whatever | |||
| other handshake packets are lost or delayed. | other handshake packets are lost or delayed. | |||
| - MUST NOT send a 'Classic' ECN-setup SYN [RFC3168] with | - MUST NOT send a 'Classic' ECN-setup SYN [RFC3168] with | |||
| (AE,CWR,ECE) = (0,1,1) and a SYN with (AE,CWR,ECE) = (1,1,1) | (AE,CWR,ECE) = (0,1,1) and a SYN with (AE,CWR,ECE) = (1,1,1) | |||
| requesting AccECN feedback within the same connection; | requesting AccECN feedback within the same connection; | |||
| - MUST NOT send a 'Classic' ECN-setup SYN/ACK [RFC3168] with | - MUST NOT send a 'Classic' ECN-setup SYN/ACK [RFC3168] with | |||
| (AE,CWR,ECE) = (0,0,1) and a SYN/ACK agreeing to use AccECN | (AE,CWR,ECE) = (0,0,1) and a SYN/ACK agreeing to use AccECN | |||
| feedback within the same connection; | feedback within the same connection; | |||
| - MUST reset the connection with a RST packet, if it receives a | - MUST reset the connection with a RST packet, if it receives a | |||
| 'Classic' ECN-setup SYN with (AE,CWR,ECE) = (0,1,1) and a SYN | 'Classic' ECN-setup SYN with (AE,CWR,ECE) = (0,1,1) and a SYN | |||
| requesting AccECN feedback during the same handshake; | requesting AccECN feedback during the same handshake; | |||
| - MUST reset the connection with a RST packet, if it receives | - MUST reset the connection with a RST packet, if it receives | |||
| 'Classic' ECN-setup SYN/ACK with (AE,CWR,ECE) = (0,0,1) and a | 'Classic' ECN-setup SYN/ACK with (AE,CWR,ECE) = (0,0,1) and a | |||
| SYN/ACK agreeing to use AccECN feedback during the same | SYN/ACK agreeing to use AccECN feedback during the same | |||
| handshake; | handshake. | |||
| The last four rules are necessary because, if one peer were to | The last four rules are necessary because, if one peer were to | |||
| negotiate the feedback mode in two different types of handshake, | negotiate the feedback mode in two different types of handshake, | |||
| it would not be possible for the other peer to know for certain | it would not be possible for the other peer to know for certain | |||
| which handshake packet(s) the other end had eventually received or | which handshake packet(s) the other end had eventually received or | |||
| in which order it received them. So, in the absence of these | in which order it received them. So, in the absence of these | |||
| rules, the two peers could end up using different ECN feedback | rules, the two peers could end up using different ECN feedback | |||
| modes without knowing it. | modes without knowing it. | |||
| * A host in AccECN mode that is feeding back the IP-ECN field on a | * A host in AccECN mode that is feeding back the IP-ECN field on a | |||
| skipping to change at line 1000 ¶ | skipping to change at line 1000 ¶ | |||
| acceptable SYN/ACK to arrive. | acceptable SYN/ACK to arrive. | |||
| * A TCP Server already in AccECN mode: | * A TCP Server already in AccECN mode: | |||
| - SHOULD acknowledge a valid SYN arriving with (AE,CWR,ECE) = | - SHOULD acknowledge a valid SYN arriving with (AE,CWR,ECE) = | |||
| (0,0,0) by emitting an AccECN SYN/ACK (with the appropriate | (0,0,0) by emitting an AccECN SYN/ACK (with the appropriate | |||
| combination of TCP-ECN flags to feed back the IP-ECN field of | combination of TCP-ECN flags to feed back the IP-ECN field of | |||
| this latest SYN); | this latest SYN); | |||
| - MAY acknowledge a valid SYN arriving with (AE,CWR,ECE) = | - MAY acknowledge a valid SYN arriving with (AE,CWR,ECE) = | |||
| (0,0,0) by sending a SYN/ACK with (AE,CWR,ECE) = (0,0,0); | (0,0,0) by sending a SYN/ACK with (AE,CWR,ECE) = (0,0,0). | |||
| Rationale: When a SYN arrives with (AE,CWR,ECE) = (0,0,0) at a TCP | Rationale: When a SYN arrives with (AE,CWR,ECE) = (0,0,0) at a TCP | |||
| Server that is already in AccECN mode, it implies that the TCP | Server that is already in AccECN mode, it implies that the TCP | |||
| Client had probably not received the previous AccECN SYN/ACK | Client had probably not received the previous AccECN SYN/ACK | |||
| emitted by the TCP Server. Therefore, the first bullet recommends | emitted by the TCP Server. Therefore, the first bullet recommends | |||
| attempting at least one more AccECN SYN/ACK. Nonetheless, the | attempting at least one more AccECN SYN/ACK. Nonetheless, the | |||
| second bullet recognizes that the Server might eventually need to | second bullet recognizes that the Server might eventually need to | |||
| fall back to a non-ECN SYN/ACK. In either case, the TCP Server | fall back to a non-ECN SYN/ACK. In either case, the TCP Server | |||
| remains in AccECN feedback mode (according to the earlier | remains in AccECN feedback mode (according to the earlier | |||
| requirement not to switch modes). | requirement not to switch modes). | |||
| * An AccECN-capable TCP Server already in Not ECN mode: | * An AccECN-capable TCP Server already in Not ECN mode: | |||
| - SHOULD respond to any subsequent valid SYN using a SYN/ACK with | - SHOULD respond to any subsequent valid SYN using a SYN/ACK with | |||
| (AE,CWR,ECE) = (0,0,0), even if the SYN is offering to | (AE,CWR,ECE) = (0,0,0), even if the SYN is offering to | |||
| negotiate Classic ECN or AccECN feedback mode; | negotiate Classic ECN or AccECN feedback mode. | |||
| Rationale: There would be no point in the Server offering any | Rationale: There would be no point in the Server offering any | |||
| type of ECN feedback, because the Client will not be using ECN. | type of ECN feedback, because the Client will not be using ECN. | |||
| However, there is no interoperability reason to make this rule | However, there is no interoperability reason to make this rule | |||
| mandatory. | mandatory. | |||
| If for any reason a host is not willing to provide ECN feedback on a | If for any reason a host is not willing to provide ECN feedback on a | |||
| particular TCP connection, it SHOULD clear the AE, CWR, and ECE flags | particular TCP connection, it SHOULD clear the AE, CWR, and ECE flags | |||
| in all SYN and/or SYN/ACK packets that it sends. | in all SYN and/or SYN/ACK packets that it sends. | |||
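
The Server-side rules above for SYNs that arrive after the feedback mode has been chosen can be sketched as follows. This is a minimal illustration, not part of the specification: the names `ServerMode`, `synack_flags_for_later_syn`, and `fall_back` are hypothetical, and the flags of the AccECN SYN/ACK come from Table 2 (not reproduced in this excerpt), so they are passed in by the caller. Cases that the quoted rules do not cover are rejected rather than guessed.

```python
from enum import Enum


class ServerMode(Enum):
    """Feedback mode the Server first entered; it never changes afterwards."""
    ACCECN = "AccECN"
    NOT_ECN = "Not ECN"


NON_ECN_FLAGS = (0, 0, 0)  # (AE, CWR, ECE) all cleared


def synack_flags_for_later_syn(mode, syn_flags, accecn_synack_flags, fall_back=False):
    """Pick (AE, CWR, ECE) for the SYN/ACK answering a later valid SYN.

    accecn_synack_flags: flags of the AccECN SYN/ACK that feeds back the
        IP-ECN field of this latest SYN (taken from Table 2 by the caller).
    fall_back: hypothetical local policy knob for the MAY rule.
    The function only chooses flags; the Server's mode is left untouched.
    """
    if mode is ServerMode.NOT_ECN:
        # SHOULD: answer any later SYN with a non-ECN SYN/ACK.
        return NON_ECN_FLAGS
    if syn_flags != NON_ECN_FLAGS:
        # Other SYN types are governed by rules outside this sketch.
        raise NotImplementedError("SYN type not covered by this sketch")
    # AccECN mode, SYN with (0,0,0): SHOULD retry AccECN, MAY fall back.
    return NON_ECN_FLAGS if fall_back else accecn_synack_flags
```

For example, with a placeholder AccECN SYN/ACK flag triple `(0, 1, 0)`, a Server in AccECN mode would answer a (0,0,0) SYN with `(0, 1, 0)` by default and with `(0, 0, 0)` only once its local policy decides to fall back.
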
| skipping to change at line 1040 ¶ | skipping to change at line 1040 ¶ | |||
| - MUST NOT set ECT if it is in Not ECN feedback mode. | - MUST NOT set ECT if it is in Not ECN feedback mode. | |||
| A Data Sender in AccECN mode: | A Data Sender in AccECN mode: | |||
| - SHOULD set an ECT codepoint in the IP header of packets to | - SHOULD set an ECT codepoint in the IP header of packets to | |||
| indicate to the network that the transport is capable and | indicate to the network that the transport is capable and | |||
| willing to participate in ECN for this packet; | willing to participate in ECN for this packet; | |||
| - MAY not set ECT on any packet (for instance if it has reason to | - MAY not set ECT on any packet (for instance if it has reason to | |||
| believe such a packet would be blocked); | believe such a packet would be blocked). | |||
| A TCP Server in AccECN mode: | A TCP Server in AccECN mode: | |||
| - MUST NOT set ECT on any packet for the rest of the connection, | - MUST NOT set ECT on any packet for the rest of the connection, | |||
| if it has received or sent at least one valid SYN or Acceptable | if it has received or sent at least one valid SYN or Acceptable | |||
| SYN/ACK with (AE,CWR,ECE) = (0,0,0) during the handshake. | SYN/ACK with (AE,CWR,ECE) = (0,0,0) during the handshake. | |||
| This rule solely applies to a Server because, when a Server | This rule solely applies to a Server because, when a Server | |||
| enters AccECN mode, it doesn't know for sure whether the Client | enters AccECN mode, it doesn't know for sure whether the Client | |||
| will end up in AccECN mode. But when a Client enters AccECN | will end up in AccECN mode. But when a Client enters AccECN | |||
| skipping to change at line 1066 ¶ | skipping to change at line 1066 ¶ | |||
| * A host in AccECN mode: | * A host in AccECN mode: | |||
| - is obliged to respond appropriately to AccECN feedback that | - is obliged to respond appropriately to AccECN feedback that | |||
| indicates there were ECN marks on packets it had previously | indicates there were ECN marks on packets it had previously | |||
| sent, where 'appropriately' is defined in Section 6.1 of | sent, where 'appropriately' is defined in Section 6.1 of | |||
| [RFC3168] and updated by Sections 2.1 and 4.1 of [RFC8311]; | [RFC3168] and updated by Sections 2.1 and 4.1 of [RFC8311]; | |||
| - is still obliged to respond appropriately to congestion | - is still obliged to respond appropriately to congestion | |||
| feedback, even when it is solely sending non-ECN-capable | feedback, even when it is solely sending non-ECN-capable | |||
| packets (for rationale, some examples and some exceptions see | packets (for rationale, some examples and some exceptions see | |||
| Sections 3.2.2.3 and 3.2.2.4). | Sections 3.2.2.3 and 3.2.2.4); | |||
| - is still obliged to respond appropriately to congestion | - is still obliged to respond appropriately to congestion | |||
| feedback, even if it has sent or received a SYN or SYN/ACK | feedback, even if it has sent or received a SYN or SYN/ACK | |||
| packet with (AE,CWR,ECE) = (0,0,0) during the handshake; | packet with (AE,CWR,ECE) = (0,0,0) during the handshake; | |||
| - MUST NOT set CWR to indicate that it has received and responded | - MUST NOT set CWR to indicate that it has received and responded | |||
| to indications of congestion. | to indications of congestion. | |||
| For the avoidance of doubt, this is unlike an RFC 3168 data | For the avoidance of doubt, this is unlike an RFC 3168 data | |||
| sender and this does not preclude the Data Sender from setting | sender and this does not preclude the Data Sender from setting | |||
| skipping to change at line 1111 ¶ | skipping to change at line 1111 ¶ | |||
| - MUST NOT use reception of packets with ECT set in the IP-ECN | - MUST NOT use reception of packets with ECT set in the IP-ECN | |||
| field as an implicit signal that the peer is ECN-capable. | field as an implicit signal that the peer is ECN-capable. | |||
| Reason: ECT at the IP layer does not explicitly confirm the | Reason: ECT at the IP layer does not explicitly confirm the | |||
| peer has the correct ECN feedback logic, because the packets | peer has the correct ECN feedback logic, because the packets | |||
| could have been mangled at the IP layer. | could have been mangled at the IP layer. | |||
| 3.2. AccECN Feedback | 3.2. AccECN Feedback | |||
| Each Data Receiver of each half connection maintains four counters, | Each Data Receiver of each half-connection maintains four counters, | |||
| r.cep, r.ceb, r.e0b, and r.e1b: | r.cep, r.ceb, r.e0b, and r.e1b: | |||
| * The Data Receiver MUST increment the CE packet counter (r.cep), | * The Data Receiver MUST increment the CE packet counter (r.cep), | |||
| for every Acceptable packet that it receives with the CE code | for every Acceptable packet that it receives with the CE code | |||
| point in the IP-ECN field, including CE-marked control packets and | point in the IP-ECN field, including CE-marked control packets and | |||
| retransmissions but excluding CE on SYN packets (SYN=1; ACK=0). | retransmissions but excluding CE on SYN packets (SYN=1; ACK=0). | |||
| * A Data Receiver that supports sending of AccECN TCP Options MUST | * A Data Receiver that supports sending of AccECN TCP Options MUST | |||
| increment the r.ceb, r.e0b, or r.e1b byte counters by the number | increment the r.ceb, r.e0b, or r.e1b byte counters by the number | |||
| of TCP payload octets in Acceptable packets marked with the CE, | of TCP payload octets in Acceptable packets marked with the CE, | |||
| ECT(0), and ECT(1) codepoint in their IP-ECN field, including any | ECT(0), and ECT(1) codepoint in their IP-ECN field, including any | |||
| payload octets on control packets and retransmissions, but not | payload octets on control packets and retransmissions, but not | |||
| including any payload octets on SYN packets (SYN=1; ACK=0). | including any payload octets on SYN packets (SYN=1; ACK=0). | |||
| Each Data Sender of each half connection maintains four counters, | Each Data Sender of each half-connection maintains four counters, | |||
| s.cep, s.ceb, s.e0b, and s.e1b, intended to track the equivalent | s.cep, s.ceb, s.e0b, and s.e1b, intended to track the equivalent | |||
| counters at the Data Receiver. | counters at the Data Receiver. | |||
| A Data Receiver feeds back the CE packet counter using the Accurate | A Data Receiver feeds back the CE packet counter using the Accurate | |||
| ECN (ACE) field, as explained in Section 3.2.2. And it optionally | ECN (ACE) field, as explained in Section 3.2.2. And it optionally | |||
| feeds back all the byte counters using the AccECN TCP Option, as | feeds back all the byte counters using the AccECN TCP Option, as | |||
| specified in Section 3.2.3. | specified in Section 3.2.3. | |||
| Whenever a Data Receiver feeds back the value of any counter, it MUST | Whenever a Data Receiver feeds back the value of any counter, it MUST | |||
| report the most recent value, no matter whether it is in a pure ACK, | report the most recent value, no matter whether it is in a pure ACK, | |||
| or an ACK piggybacked on a packet used by the other half-connection, | or an ACK piggybacked on a packet used by the other half-connection, | |||
| whether a new payload data or a retransmission. Therefore, the | whether a new payload data or a retransmission. Therefore, the | |||
| feedback piggybacked on a retransmitted packet is unlikely to be the | feedback piggybacked on a retransmitted packet is unlikely to be the | |||
| same as the feedback on the original packet. | same as the feedback on the original packet. | |||
| 3.2.1. Initialization of Feedback Counters | 3.2.1. Initialization of Feedback Counters | |||
| When a host first enters AccECN mode, in its role as a Data Receiver, | When a host first enters AccECN mode, in its role as a Data Receiver, | |||
| it initializes its counters to r.cep = 5, r.e0b = r.e1b = 1, and | it initializes its counters to r.cep = 5, r.e0b = r.e1b = 1, and | |||
| r.ceb = 0, | r.ceb = 0. | |||
| Non-zero initial values are used to support a stateless handshake | Non-zero initial values are used to support a stateless handshake | |||
| (see Section 5.1) and to be distinct from cases where the fields are | (see Section 5.1) and to be distinct from cases where the fields are | |||
| incorrectly zeroed (e.g., by middleboxes -- see Section 3.2.3.2.4). | incorrectly zeroed (e.g., by middleboxes -- see Section 3.2.3.2.4). | |||
| When a host enters AccECN mode, in its role as a Data Sender, it | When a host enters AccECN mode, in its role as a Data Sender, it | |||
| initializes its counters to s.cep = 5, s.e0b = s.e1b = 1, and s.ceb = | initializes its counters to s.cep = 5, s.e0b = s.e1b = 1, and s.ceb = | |||
| 0. | 0. | |||
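
The counter rules in Sections 3.2 and 3.2.1 can be sketched as below. This is an illustrative sketch under stated assumptions: the class name, method name, and string labels for the IP-ECN codepoints are hypothetical, the counters are kept as unbounded Python integers (field widths and wrapping are not covered here), and the byte counters are updated unconditionally although the text only requires them of a Data Receiver that supports sending AccECN TCP Options.

```python
from dataclasses import dataclass


@dataclass
class ReceiverCounters:
    """Data Receiver counters with the initial values from Section 3.2.1."""
    cep: int = 5  # r.cep: CE packet counter
    ceb: int = 0  # r.ceb: CE byte counter
    e0b: int = 1  # r.e0b: ECT(0) byte counter
    e1b: int = 1  # r.e1b: ECT(1) byte counter

    def on_acceptable_packet(self, ip_ecn, payload_len, is_pure_syn=False):
        """Account for one Acceptable packet.

        ip_ecn: one of "Not-ECT", "ECT(0)", "ECT(1)", "CE" (labels assumed).
        is_pure_syn: True for SYN=1; ACK=0, which is excluded from counting.
        Control packets and retransmissions are counted like any other packet.
        """
        if is_pure_syn:
            return
        if ip_ecn == "CE":
            self.cep += 1
            self.ceb += payload_len
        elif ip_ecn == "ECT(0)":
            self.e0b += payload_len
        elif ip_ecn == "ECT(1)":
            self.e1b += payload_len
```

A Data Sender would hold a matching set (s.cep, s.ceb, s.e0b, s.e1b) with the same initial values, updated from the feedback it receives rather than from arriving IP headers.
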
| 3.2.2. The ACE Field | 3.2.2. The ACE Field | |||
| skipping to change at line 1203 ¶ | skipping to change at line 1203 ¶ | |||
| retransmission of an unacknowledged SYN/ACK, or when both ends send | retransmission of an unacknowledged SYN/ACK, or when both ends send | |||
| SYN/ACKs after AccECN support has been successfully negotiated during | SYN/ACKs after AccECN support has been successfully negotiated during | |||
| a simultaneous open). | a simultaneous open). | |||
| 3.2.2.1. ACE Field on the ACK of the SYN/ACK | 3.2.2.1. ACE Field on the ACK of the SYN/ACK | |||
| A TCP Client (A) in AccECN mode MUST feed back which of the 4 | A TCP Client (A) in AccECN mode MUST feed back which of the 4 | |||
| possible values of the IP-ECN field was on the SYN/ACK by writing it | possible values of the IP-ECN field was on the SYN/ACK by writing it | |||
| into the ACE field of a pure ACK with no SACK blocks using the binary | into the ACE field of a pure ACK with no SACK blocks using the binary | |||
| encoding in Table 3 (which is the same as that used on the SYN/ACK in | encoding in Table 3 (which is the same as that used on the SYN/ACK in | |||
| Table 2). This shall be called the handshake encoding of the ACE | Table 2). This shall be called the "handshake encoding" of the ACE | |||
| field, and it is the only exception to the rule that the ACE field | field, and it is the only exception to the rule that the ACE field | |||
| carries the 3 least significant bits of the r.cep counter on packets | carries the 3 least significant bits of the r.cep counter on packets | |||
| with SYN=0. | with SYN=0. | |||
| Normally, a TCP Client acknowledges a SYN/ACK with an ACK that | Normally, a TCP Client acknowledges a SYN/ACK with an ACK that | |||
| satisfies the above conditions anyway (SYN=0, no data, no SACK | satisfies the above conditions anyway (SYN=0, no data, no SACK | |||
| blocks). If an AccECN TCP Client intends to acknowledge the SYN/ACK | blocks). If an AccECN TCP Client intends to acknowledge the SYN/ACK | |||
| with a packet that does not satisfy these conditions (e.g., it has | with a packet that does not satisfy these conditions (e.g., it has | |||
| data to include on the ACK), it SHOULD first send a pure ACK that | data to include on the ACK), it SHOULD first send a pure ACK that | |||
| does satisfy these conditions (see Section 5.2), so that it can feed | does satisfy these conditions (see Section 5.2), so that it can feed | |||
| back which of the four values of the IP-ECN field arrived on the SYN/ | back which of the four values of the IP-ECN field arrived on the SYN/ | |||
| ACK. A valid exception to this "SHOULD" would be where the | ACK. A valid exception to this "SHOULD" would be where the | |||
| implementation will only be used in an environment where mangling of | implementation will only be used in an environment where mangling of | |||
| the ECN field is unlikely. | the ECN field is unlikely. | |||
| The TCP Client MUST also use the handshake encoding for the pure ACK | The TCP Client MUST also use the handshake encoding for the pure ACK | |||
| of any retransmitted SYN/ACK that confirms that the TCP Server | of any retransmitted SYN/ACK that confirms that the TCP Server | |||
| supports AccECN. If the final ACK of the handshake does not arrive | supports AccECN. If the TCP Server does not receive the final ACK of | |||
| before its retransmission timer expires, the TCP Server is follow the | the handshake before its retransmission timer expires, the procedure | |||
| procedure given in Section 3.1.4.2. | for it to follow is given in Section 3.1.4.2. | |||
| +==================+================+=====================+ | +==================+================+=====================+ | |||
| | IP-ECN codepoint | ACE on pure | r.cep of TCP Client | | | IP-ECN Codepoint | ACE on Pure | r.cep of TCP Client | | |||
| | on SYN/ACK | ACK of SYN/ACK | in AccECN mode | | | on SYN/ACK | ACK of SYN/ACK | in AccECN Mode | | |||
| +==================+================+=====================+ | +==================+================+=====================+ | |||
| | Not-ECT | 0b010 | 5 | | | Not-ECT | 0b010 | 5 | | |||
| +------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| | ECT(1) | 0b011 | 5 | | | ECT(1) | 0b011 | 5 | | |||
| +------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| | ECT(0) | 0b100 | 5 | | | ECT(0) | 0b100 | 5 | | |||
| +------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| | CE | 0b110 | 6 | | | CE | 0b110 | 6 | | |||
| +------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| Table 3: The Encoding of the ACE Field in the ACK of | Table 3: The Encoding of the ACE Field in the ACK of | |||
| the SYN-ACK to Reflect the SYN-ACK's IP-ECN Field | the SYN-ACK to Reflect the SYN-ACK's IP-ECN Field | |||
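
Table 3 can be read as a simple lookup on the Client side. The sketch below is only illustrative; the dictionary and function names are hypothetical, and the IP-ECN codepoints are represented by assumed string labels.

```python
# Handshake encoding of the ACE field, copied from Table 3.
HANDSHAKE_ACE = {
    "Not-ECT": 0b010,
    "ECT(1)": 0b011,
    "ECT(0)": 0b100,
    "CE": 0b110,
}


def client_ack_of_synack(ip_ecn_on_synack):
    """Return (ACE value for the pure ACK, the Client's resulting r.cep).

    Per Table 3, r.cep is 5 unless the SYN/ACK arrived CE-marked,
    in which case it is 6.
    """
    ace = HANDSHAKE_ACE[ip_ecn_on_synack]
    r_cep = 6 if ip_ecn_on_synack == "CE" else 5
    return ace, r_cep
```

Note that the handshake encoding is a direct codepoint-to-value mapping; unlike later ACKs, the ACE value is not derived from the low bits of r.cep.
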
| When an AccECN Server in SYN-RCVD state receives a pure ACK with | When an AccECN Server in SYN-RCVD state receives a pure ACK with | |||
| SYN=0 and no SACK blocks, instead of treating the ACE field as a | SYN=0 and no SACK blocks, it MUST infer the meaning of each possible | |||
| counter, it MUST infer the meaning of each possible value of the ACE | value of the ACE field from Table 4 instead of treating the ACE field | |||
| field from Table 4, which also shows the value that an AccECN Server | as a counter. As a result, an AccECN Server MUST set s.cep to the | |||
| MUST set s.cep to as a result. | respective value, also shown in Table 4. | |||
| Given this encoding of the ACE field on the ACK of a SYN/ACK is | Given this encoding of the ACE field on the ACK of a SYN/ACK is | |||
| exceptional, an AccECN Server using large receive offload (LRO) might | exceptional, an AccECN Server using large receive offload (LRO) might | |||
| prefer to disable LRO until such an ACK has transitioned it out of | prefer to disable LRO until it transitions out of SYN-RCVD state | |||
| SYN-RCVD state. | (when it first receives an ACK that covers the SYN/ACK). | |||
| +============+==========================+=====================+ | +============+==========================+=====================+ | |||
| | ACE on ACK | IP-ECN codepoint on SYN/ | s.cep of TCP Server | | | ACE on ACK | IP-ECN Codepoint on SYN/ | s.cep of TCP Server | | |||
| | of SYN/ACK | ACK inferred by Server | in AccECN mode | | | of SYN/ACK | ACK Inferred by Server | in AccECN Mode | | |||
| +============+==========================+=====================+ | +============+==========================+=====================+ | |||
| | 0b000 | {Notes 1, 3} | Disable s.cep | | | 0b000 | {Notes 1, 3} | Disable s.cep | | |||
| +------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| | 0b001 | {Notes 2, 3} | 5 | | | 0b001 | {Notes 2, 3} | 5 | | |||
| +------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| | 0b010 | Not-ECT | 5 | | | 0b010 | Not-ECT | 5 | | |||
| +------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| | 0b011 | ECT(1) | 5 | | | 0b011 | ECT(1) | 5 | | |||
| +------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| | 0b100 | ECT(0) | 5 | | | 0b100 | ECT(0) | 5 | | |||
| skipping to change at line 1291 ¶ | skipping to change at line 1291 ¶ | |||
| AccECN feedback. Nonetheless, as a Data Receiver, it MUST | AccECN feedback. Nonetheless, as a Data Receiver, it MUST | |||
| NOT disable AccECN feedback. | NOT disable AccECN feedback. | |||
| Any of the circumstances below could cause a value of zero | Any of the circumstances below could cause a value of zero | |||
| but, whatever the cause, the actions above would be the | but, whatever the cause, the actions above would be the | |||
| appropriate response: | appropriate response: | |||
| * The TCP Client has somehow entered No ECN feedback mode | * The TCP Client has somehow entered No ECN feedback mode | |||
| (most likely if the Server received a SYN or sent a SYN/ | (most likely if the Server received a SYN or sent a SYN/ | |||
| ACK with (AE,CWR,ECE) = (0,0,0) after entering AccECN | ACK with (AE,CWR,ECE) = (0,0,0) after entering AccECN | |||
| mode, but possible even if it didn't); | mode, but possible even if it didn't). | |||
| * The TCP Client genuinely might be in AccECN mode, but its | * The TCP Client genuinely might be in AccECN mode, but its | |||
| count of received CE marks might have caused the ACE | count of received CE marks might have caused the ACE | |||
| field to wrap to zero. This is highly unlikely, but not | field to wrap to zero. This is highly unlikely, but not | |||
| impossible because the Server might have already sent | impossible because the Server might have already sent | |||
| multiple packets while still in SYN-RCVD state, e.g., | multiple packets while still in SYN-RCVD state, e.g., | |||
| using TFO (see Section 5.2), and some might have been CE- | using TFO (see Section 5.2), and some might have been CE- | |||
| marked. Then ACE on the first ACK seen by the Server | marked. Then ACE on the first ACK seen by the Server | |||
| might be zero, due to previous ACKs experiencing an | might be zero, due to previous ACKs experiencing an | |||
| unfortunate pattern of loss or delay. | unfortunate pattern of loss or delay. | |||
| skipping to change at line 1322 ¶ | skipping to change at line 1322 ¶ | |||
| Note 3: In the case where a Server that implements AccECN is also | Note 3: In the case where a Server that implements AccECN is also | |||
| using a stateless handshake (termed a SYN cookie), it will | using a stateless handshake (termed a SYN cookie), it will | |||
| not remember whether it entered AccECN mode. The values | not remember whether it entered AccECN mode. The values | |||
| 0b000 or 0b001 will remind it that it did not enter AccECN | 0b000 or 0b001 will remind it that it did not enter AccECN | |||
| mode, because AccECN does not use them (see Section 5.1 for | mode, because AccECN does not use them (see Section 5.1 for | |||
| details). If a Server that uses a stateless handshake and | details). If a Server that uses a stateless handshake and | |||
| implements AccECN receives either of these two values in the | implements AccECN receives either of these two values in the | |||
| ACK, its action is implementation-dependent and outside the | ACK, its action is implementation-dependent and outside the | |||
| scope of this document. It will certainly not take the | scope of this document. It will certainly not take the | |||
| action in the third column because, after it receives either | action in the third column because, after it receives either | |||
| of these values, it is not in AccECN mode. For example, it | of these values, it is not in AccECN mode. That is, it will | |||
| will not disable ECN (at least not just because ACE is | not disable ECN (at least not just because ACE is 0b000) and | |||
| 0b000) and it will not set s.cep. | it will not set s.cep. | |||
| 3.2.2.2. Encoding and Decoding Feedback in the ACE Field | 3.2.2.2. Encoding and Decoding Feedback in the ACE Field | |||
| Whenever the Data Receiver sends an ACK with SYN=0 (with or without | Whenever the Data Receiver sends an ACK with SYN=0 (with or without | |||
| data), unless the handshake encoding in Section 3.2.2.1 applies, the | data), unless the handshake encoding in Section 3.2.2.1 applies, the | |||
| Data Receiver MUST encode the least significant 3 bits of its r.cep | Data Receiver MUST encode the least significant 3 bits of its r.cep | |||
| counter into the ACE field (see Appendix A.2). | counter into the ACE field (see Appendix A.2). | |||
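
As a small illustration of the encoding rule above (the function name is hypothetical, and r.cep is modelled as a plain integer), the ACE field outside the handshake carries only the 3 least significant bits of r.cep:

```python
def encode_ace(r_cep):
    """ACE field for an ACK with SYN=0 outside the handshake encoding:
    the 3 least significant bits of the r.cep counter."""
    return r_cep & 0b111


# Starting from the initial value r.cep = 5, the field reads 0b101.
# After three further CE-marked packets r.cep is 8, so the field wraps to 0b000.
initial_ace = encode_ace(5)
wrapped_ace = encode_ace(5 + 3)
```

Because only 3 bits are fed back, the field repeats every 8 CE-marked packets, which is why the Data Sender has to interpret each arriving value relative to its own s.cep.
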
| Whenever the Data Sender receives an ACK with SYN=0 (with or without | Whenever the Data Sender receives an ACK with SYN=0 (with or without | |||
| data), it first checks whether it has already been superseded | data), it first checks whether it has already been superseded | |||
| skipping to change at line 1469 ¶ | skipping to change at line 1469 ¶ | |||
| marking. If continuous CE marking is detected, for the remainder of | marking. If continuous CE marking is detected, for the remainder of | |||
| the half-connection, the Data Sender ought to send non-ECN-capable | the half-connection, the Data Sender ought to send non-ECN-capable | |||
| packets, and it is advised not to respond to any feedback of CE | packets, and it is advised not to respond to any feedback of CE | |||
| markings. The Data Sender might occasionally test whether it can | markings. The Data Sender might occasionally test whether it can | |||
| resume sending ECN-capable packets. | resume sending ECN-capable packets. | |||
| The above advice on switching to sending non-ECN-capable packets but | The above advice on switching to sending non-ECN-capable packets but | |||
| still responding to CE markings unless they become continuous is not | still responding to CE markings unless they become continuous is not | |||
| stated normatively (in capitals), because the best strategy might | stated normatively (in capitals), because the best strategy might | |||
| depend on experience of the most likely types of mangling, which can | depend on experience of the most likely types of mangling, which can | |||
| only be known at the time of deployment. The same is true for other | only be known at the time of deployment. For instance, later in a | |||
| forms of mangling (or resumption of expected marking) during later | connection, sender implementations might need to detect the onset (or | |||
| stages of a connection. | the end) of mangling and stop (or start) sending ECN-capable packets | |||
| accordingly. | ||||
| As always, once a host has entered AccECN mode, it follows the | As always, once a host has entered AccECN mode, it follows the | |||
| general mandatory requirements (Section 3.1.5) to remain in the same | general mandatory requirements (Section 3.1.5) to remain in the same | |||
| feedback mode and to continue feeding back any ECN markings on | feedback mode and to continue feeding back any ECN markings on | |||
| arriving packets using AccECN feedback. This follows the general | arriving packets using AccECN feedback. This follows the general | |||
| approach where an AccECN Data Receiver mechanistically reflects | approach where an AccECN Data Receiver mechanistically reflects | |||
| whatever it receives (Section 2.5). | whatever it receives (Section 2.5). | |||
| The ACK of the SYN/ACK is not reliably delivered (nonetheless, the | The ACK of the SYN/ACK is not reliably delivered (nonetheless, the | |||
| count of CE marks is still eventually delivered reliably). If this | count of CE marks is still eventually delivered reliably). If this | |||
| skipping to change at line 1539 ¶ | skipping to change at line 1540 ¶ | |||
| Reason: the symptoms imply any or all of the following: | Reason: the symptoms imply any or all of the following: | |||
| * the remote peer has somehow entered Not ECN feedback mode; | * the remote peer has somehow entered Not ECN feedback mode; | |||
| * a broken remote TCP implementation; | * a broken remote TCP implementation; | |||
| * potential mangling of the ECN fields in the TCP headers (although | * potential mangling of the ECN fields in the TCP headers (although | |||
| unlikely given they clearly survived during the handshake). | unlikely given they clearly survived during the handshake). | |||
| This advice is not stated normatively (in capitals), because the best | This advice is not stated normatively (in capitals), because the best | |||
| strategy might depend on experience of the most likely scenarios, | strategy might depend on the likelihood to experience these | |||
| which can only be known at the time of deployment. | scenarios, which can only be known at the time of deployment. | |||
| Note that a host in AccECN mode MUST continue to provide Accurate ECN | | Note that a host in AccECN mode MUST continue to provide | |||
| feedback to its peer, even if it is no longer sending ECT itself over | | Accurate ECN feedback to its peer, even if it is no longer | |||
| the other half connection. | | sending ECT itself over the other half-connection. | |||
| If reordering occurs, the first feedback packet that arrives will not | If reordering occurs, the first feedback packet that arrives will not | |||
| necessarily be the same as the first packet in sequence order. The | necessarily be the same as the first packet in sequence order. The | |||
| test has been specified loosely like this to simplify implementation, | test has been specified loosely like this to simplify implementation, | |||
| and because it would not have been any more precise to have specified | and because it would not have been any more precise to have specified | |||
| the first packet in sequence order, which would not necessarily be | the first packet in sequence order, which would not necessarily be | |||
| the first ACE counter that the Data Receiver fed back anyway, given | the first ACE counter that the Data Receiver fed back anyway, given | |||
| it might have been a retransmission. | it might have been a retransmission. | |||
| The possibility of reordering means that there is a small chance that | The possibility of reordering means that there is a small chance that | |||
| the ACE field on the first packet to arrive is genuinely zero | the ACE field on the first packet to arrive is genuinely zero | |||
| (without middlebox interference). This would cause a host to | (without middlebox interference). This would cause a host to | |||
| unnecessarily disable ECN for a half connection. Therefore, in | unnecessarily disable ECN for a half-connection. Therefore, in | |||
| environments where there is no evidence of the ACE field being | environments where there is no evidence of the ACE field being | |||
| zeroed, implementations MAY skip this test. | zeroed, implementations MAY skip this test. | |||
| Note that the Data Sender MUST NOT test whether the arriving counter | | Note that the Data Sender MUST NOT test whether the arriving | |||
| in the initial ACE field has been initialized to a specific valid | | counter in the initial ACE field has been initialized to a | |||
| value -- the above check solely tests whether the ACE fields have | | specific valid value -- the above check solely tests whether | |||
| been incorrectly zeroed. This allows hosts to use different initial | | the ACE fields have been incorrectly zeroed. This allows hosts | |||
| values as an additional signalling channel in the future. | | to use different initial values as an additional signalling | |||
| | channel in the future. | ||||
| 3.2.2.5. Safety Against Ambiguity of the ACE Field | 3.2.2.5. Safety Against Ambiguity of the ACE Field | |||
| If too many CE-marked segments are acknowledged at once, or if a long | If too many CE-marked segments are acknowledged at once, or if a long | |||
| run of ACKs is lost or thinned out, the 3-bit counter in the ACE | run of ACKs is lost or thinned out, the 3-bit counter in the ACE | |||
| field might have cycled between two ACKs arriving at the Data Sender. | field might have cycled between two ACKs arriving at the Data Sender. | |||
| The following safety procedures minimize this ambiguity. | The following safety procedures minimize this ambiguity. | |||
| 3.2.2.5.1. Packet Receiver Safety Procedures | 3.2.2.5.1. Packet Receiver Safety Procedures | |||
| The following rules define when the receiver of a packet in AccECN | The following rules define when the receiver of a packet in AccECN | |||
| mode emits an ACK: | mode emits an ACK: | |||
| Change-Triggered ACKs: An AccECN Data Receiver SHOULD emit an ACK | Change-Triggered ACKs: An AccECN Data Receiver SHOULD emit an ACK | |||
| whenever a data packet marked CE arrives after the previous packet | whenever a data packet marked CE arrives after the previous packet | |||
| was not CE. | was not CE. | |||
| Even though this rule is stated as a "SHOULD", it is important for | Even though this rule is stated as a "SHOULD", it is important for | |||
| a transition to trigger an ACK if at all possible. The only valid | a transition to trigger an ACK if at all possible. The only valid | |||
| exception to this rule is given below these bullets. | exception to this rule is due to Large Receive Offload (LRO) or | |||
| Generic Receive Offload (GRO) as further described below. | ||||
| For the avoidance of doubt, this rule is deliberately worded to | For the avoidance of doubt, this rule is deliberately worded to | |||
| apply solely when _data_ packets arrive, but the comparison with | apply solely when _data_ packets arrive, but the comparison with | |||
| the previous packet includes any packet, not just data packets. | the previous packet includes any packet, not just data packets. | |||
| Increment-Triggered ACKs: An AccECN receiver of a packet MUST emit | Increment-Triggered ACKs: An AccECN receiver of a packet MUST emit | |||
| an ACK if 'n' CE marks have arrived since the previous ACK. If | an ACK if 'n' CE marks have arrived since the previous ACK. If | |||
| there is unacknowledged data at the receiver, 'n' SHOULD be 2. If | there is unacknowledged data at the receiver, 'n' SHOULD be 2. If | |||
| there is no unacknowledged data at the receiver, 'n' SHOULD be 3 | there is no unacknowledged data at the receiver, 'n' SHOULD be 3 | |||
| and MUST be no less than 3. In either case, 'n' MUST be no | and MUST be no less than 3. In either case, 'n' MUST be no | |||
| skipping to change at line 1620 ¶ | skipping to change at line 1623 ¶ | |||
| Even if a number of data packets do not arrive as one event, the | Even if a number of data packets do not arrive as one event, the | |||
| 'Change-Triggered ACKs' rule could sometimes cause the ACK rate to be | 'Change-Triggered ACKs' rule could sometimes cause the ACK rate to be | |||
| problematic for high performance (although high performance protocols | problematic for high performance (although high performance protocols | |||
| such as DCTCP already successfully use change-triggered ACKs). The | such as DCTCP already successfully use change-triggered ACKs). The | |||
| rationale for change-triggered ACKs is so that the Data Sender can | rationale for change-triggered ACKs is so that the Data Sender can | |||
| rely on them to detect queue growth as soon as possible, particularly | rely on them to detect queue growth as soon as possible, particularly | |||
| at the start of a flow. The approach can lead to some additional | at the start of a flow. The approach can lead to some additional | |||
| ACKs but it feeds back the timing and the order in which ECN marks | ACKs but it feeds back the timing and the order in which ECN marks | |||
| are received with minimal additional complexity. If CE marks are | are received with minimal additional complexity. If CE marks are | |||
| infrequent, as is the case for most Active Queue Management (AQM) | infrequent, as is the case for most Active Queue Management (AQM) | |||
| packet schedulers at the time of writing, or there are multiple marks | algorithms at the time of writing, or there are multiple marks in a | |||
| in a row, the additional load will be low. However, marking patterns | row, the additional load will be low. However, marking patterns with | |||
| with numerous non-contiguous CE marks could increase the load | numerous non-contiguous CE marks could increase the load | |||
| significantly. One possible compromise would be for the receiver to | significantly. One possible compromise would be for the receiver to | |||
| heuristically detect whether the sender is in slow-start, then to | heuristically detect whether the sender is in slow-start, then to | |||
| implement change-triggered ACKs while the sender is in slow-start, | implement change-triggered ACKs while the sender is in slow-start, | |||
| and offload otherwise. | and offload otherwise. | |||
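As a rough illustration of the 'Change-Triggered ACKs' rule discussed above, the following Python sketch marks which arrivals would trigger an immediate ACK. The function name `change_triggered_acks` and the `(is_data, is_ce)` tuple representation are hypothetical, and treating the state before the first packet as "not CE" is an assumption rather than something this text specifies.

```python
from typing import Iterable, List, Tuple


def change_triggered_acks(packets: Iterable[Tuple[bool, bool]]) -> List[int]:
    """Return the indices of arrivals that trigger a change-triggered ACK.

    Each packet is an (is_data, is_ce) tuple. An ACK is triggered when a
    *data* packet marked CE arrives and the previous packet (of any kind,
    data or not) was not CE.
    """
    triggers = []
    prev_ce = False  # assumption: nothing before the first packet counts as CE
    for index, (is_data, is_ce) in enumerate(packets):
        if is_data and is_ce and not prev_ce:
            triggers.append(index)
        # The comparison uses any previous packet, not just data packets.
        prev_ce = is_ce
    return triggers
```

Note that a CE-marked pure ACK never triggers an ACK itself in this sketch, but it does update the "previous packet" state, which mirrors the wording of the rule.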
| In a scenario where both endpoints support AccECN, if host B has | In a scenario where both endpoints support AccECN, if host B has | |||
| chosen to use ECN-capable pure ACKs (as allowed in [RFC8311] | chosen to use ECN-capable pure ACKs (as allowed in [RFC8311] | |||
| experiments) and enough of these ACKs become CE marked, then the | experiments) and enough of these ACKs become CE marked, then the | |||
| 'Increment-Triggered ACKs' rule ensures that its peer (host A) gives | 'Increment-Triggered ACKs' rule ensures that its peer (host A) gives | |||
| B sufficient feedback about this congestion on the ACKs from B to A. | B sufficient feedback about this congestion on the ACKs from B to A. | |||
| skipping to change at line 1723 ¶ | skipping to change at line 1726 ¶ | |||
| Figure 4 shows two option field orders; order 0 and order 1. They | Figure 4 shows two option field orders; order 0 and order 1. They | |||
| both consist of three 24-bit fields. Order 0 provides the 24 least | both consist of three 24-bit fields. Order 0 provides the 24 least | |||
| significant bits of the r.e0b, r.ceb, and r.e1b counters, | significant bits of the r.e0b, r.ceb, and r.e1b counters, | |||
| respectively. Order 1 provides the same fields, but in the opposite | respectively. Order 1 provides the same fields, but in the opposite | |||
| order. On each packet, the Data Receiver can use whichever order is | order. On each packet, the Data Receiver can use whichever order is | |||
| more efficient. In either case, the bytes within the fields are in | more efficient. In either case, the bytes within the fields are in | |||
| network byte order (big-endian). | network byte order (big-endian). | |||
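The two field orders can be illustrated with a short Python sketch that decodes the 24-bit fields. The name `parse_accecn_fields` and its `order` argument are hypothetical; the payload is assumed to contain only the counter fields (the Kind and Length octets are not handled here), and a shortened option is assumed to drop fields from the end of the chosen order.

```python
from typing import Dict

# Field names per order: order 0 is e0b, ceb, e1b; order 1 carries the
# same fields in the opposite order.
FIELD_ORDER = {
    0: ("e0b", "ceb", "e1b"),
    1: ("e1b", "ceb", "e0b"),
}


def parse_accecn_fields(payload: bytes, order: int) -> Dict[str, int]:
    """Decode up to three 24-bit big-endian counters from an option payload.

    Fields missing from the end of a shortened option are simply absent
    from the returned dictionary.
    """
    if order not in FIELD_ORDER:
        raise ValueError("order must be 0 or 1")
    if len(payload) % 3 != 0 or len(payload) > 9:
        raise ValueError("payload must be 0, 3, 6 or 9 bytes long")
    fields = {}
    for i, name in enumerate(FIELD_ORDER[order]):
        chunk = payload[3 * i : 3 * i + 3]
        if len(chunk) < 3:
            break  # no more fields present
        fields[name] = int.from_bytes(chunk, "big")  # network byte order
    return fields
```

Because both orders place the CE byte counter in the middle, only the first field differs when a receiver picks one order over the other for a shortened option.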
| The choice to use three-byte (24-bit) fields in the options was | The choice to use three-byte (24-bit) fields in the options was | |||
| made to strike a balance between TCP option space usage, and the | made to strike a balance between TCP Option space usage, and the | |||
| required fidelity of the counters to accommodate typical scenarios | required fidelity of the counters to accommodate typical scenarios | |||
| such as hardware TCP Segmentation Offloading (TSO), and periods | such as hardware TCP Segmentation Offloading (TSO), and periods | |||
| during which no option may be transmitted (e.g., SACK loss recovery). | during which no option may be transmitted (e.g., SACK loss recovery). | |||
| Providing only 2 bytes (16 bits) for these counters could easily roll | Providing only 2 bytes (16 bits) for these counters could easily roll | |||
| over within a single TSO transmission or large/generic receive | over within a single TSO transmission or large/generic receive | |||
| offload (LRO/GRO) event. Having two distinct orderings further | offload (LRO/GRO) event. Having two distinct orderings further | |||
| allows the transmission of the most pertinent changes in an | allows the transmission of the most pertinent changes in an | |||
| abbreviated option (see below). | abbreviated option (see below). | |||
| When a Data Receiver sends an AccECN Option, it MUST set the Kind | When a Data Receiver sends an AccECN Option, it MUST set the Kind | |||
| skipping to change at line 1922 ¶ | skipping to change at line 1925 ¶ | |||
| packets carried an AccECN Option and disable the sending of AccECN | packets carried an AccECN Option and disable the sending of AccECN | |||
| Options if the loss probability of those packets is significantly | Options if the loss probability of those packets is significantly | |||
| higher than that of all other data packets in the same connection. | higher than that of all other data packets in the same connection. | |||
| 3.2.3.2.3. Testing for Absence of the AccECN Option | 3.2.3.2.3. Testing for Absence of the AccECN Option | |||
| If the TCP Client has successfully negotiated AccECN but does not | If the TCP Client has successfully negotiated AccECN but does not | |||
| receive an AccECN Option on the SYN/ACK (e.g., because it has been | receive an AccECN Option on the SYN/ACK (e.g., because it has been | |||
| stripped by a middlebox or not sent by the Server), the Client | stripped by a middlebox or not sent by the Server), the Client | |||
| switches into a mode that assumes that the AccECN Option is not | switches into a mode that assumes that the AccECN Option is not | |||
| available for this half connection. | available for this half-connection. | |||
| Similarly, if the TCP Server has successfully negotiated AccECN but | Similarly, if the TCP Server has successfully negotiated AccECN but | |||
| does not receive an AccECN Option on the first segment that | does not receive an AccECN Option on the first segment that | |||
| acknowledges sequence space at least covering the ISN, it switches | acknowledges sequence space at least covering the ISN, it switches | |||
| into a mode that assumes that the AccECN Option is not available for | into a mode that assumes that the AccECN Option is not available for | |||
| this half connection. | this half-connection. | |||
| While a host is in this mode that assumes incoming AccECN Options are | While a host is in this mode that assumes incoming AccECN Options are | |||
| not available, it MUST adopt the conservative interpretation of the | not available, it MUST adopt the conservative interpretation of the | |||
| ACE field discussed in Section 3.2.2.5. However, it cannot make any | ACE field discussed in Section 3.2.2.5. However, it cannot make any | |||
| assumption about support of outgoing AccECN Options on the other half | assumption about support of outgoing AccECN Options on the other | |||
| connection, so it SHOULD continue to send AccECN Options itself | half-connection, so it SHOULD continue to send AccECN Options itself | |||
| (unless it has established that sending AccECN Options is causing | (unless it has established that sending AccECN Options is causing | |||
| packets to be blocked as in Section 3.2.3.2.2). | packets to be blocked as in Section 3.2.3.2.2). | |||
| If a host is in the mode that assumes incoming AccECN Options are not | If a host is in the mode that assumes incoming AccECN Options are not | |||
| available, but it receives an AccECN Option at any later point during | available, but it receives an AccECN Option at any later point during | |||
| the connection, this clearly indicates that AccECN Options are no | the connection, this clearly indicates that AccECN Options are no | |||
| longer blocked on the respective path, and the AccECN endpoint MAY | longer blocked on the respective path, and the AccECN endpoint MAY | |||
| switch out of the mode that assumes AccECN Options are not available | switch out of the mode that assumes AccECN Options are not available | |||
| for this half connection. | for this half-connection. | |||
| 3.2.3.2.4. Test for Zeroing of the AccECN Option | 3.2.3.2.4. Test for Zeroing of the AccECN Option | |||
| For a related test for invalid initialization of the ACE field, see | For a related test for invalid initialization of the ACE field, see | |||
| Section 3.2.2.4 | Section 3.2.2.4. | |||
| Section 3.2.1 required the Data Receiver to initialize the r.e0b and | Section 3.2.1 required the Data Receiver to initialize the r.e0b and | |||
| r.e1b counters to a non-zero value. Therefore, in either direction | r.e1b counters to a non-zero value. Therefore, in either direction | |||
| the initial value of the EE0B field or EE1B field in an AccECN Option | the initial value of the EE0B field or EE1B field in an AccECN Option | |||
| (if one exists) ought to be non-zero. If AccECN has been negotiated: | (if one exists) ought to be non-zero. If AccECN has been negotiated: | |||
| * the TCP Server MAY check that the initial value of the EE0B field | * the TCP Server MAY check that the initial value of the EE0B field | |||
| or the EE1B field is non-zero in the first segment that | or the EE1B field is non-zero in the first segment that | |||
| acknowledges sequence space that at least covers the ISN plus 1. | acknowledges sequence space that at least covers the ISN plus 1. | |||
| If it runs a test and either initial value is zero, the Server | If it runs a test and either initial value is zero, the Server | |||
| will switch into a mode that ignores AccECN Options for this half | will switch into a mode that ignores AccECN Options for this half- | |||
| connection. | connection. | |||
| * the TCP Client MAY check that the initial value of the EE0B field | * the TCP Client MAY check that the initial value of the EE0B field | |||
| or the EE1B field is non-zero on the SYN/ACK. If it runs a test | or the EE1B field is non-zero on the SYN/ACK. If it runs a test | |||
| and either initial value is zero, the Client will switch into a | and either initial value is zero, the Client will switch into a | |||
| mode that ignores AccECN Options for this half connection. | mode that ignores AccECN Options for this half-connection. | |||
| While a host is in the mode that ignores AccECN Options, it MUST | While a host is in the mode that ignores AccECN Options, it MUST | |||
| adopt the conservative interpretation of the ACE field discussed in | adopt the conservative interpretation of the ACE field discussed in | |||
| Section 3.2.2.5. | Section 3.2.2.5. | |||
| Note that the Data Sender MUST NOT test whether the arriving byte | | Note that the Data Sender MUST NOT test whether the arriving | |||
| counters in an initial AccECN Option have been initialized to | | byte counters in an initial AccECN Option have been initialized | |||
| specific valid values -- the above checks solely test whether these | | to specific valid values -- the above checks solely test | |||
| fields have been incorrectly zeroed. This allows hosts to use | | whether these fields have been incorrectly zeroed. This allows | |||
| different initial values as an additional signalling channel in the | | hosts to use different initial values as an additional | |||
| future. Also note that the initial value of either field might be | | signalling channel in the future. Also note that the initial | |||
| greater than its expected initial value, because the counters might | | value of either field might be greater than its expected | |||
| already have been incremented. Nonetheless, the initial values of | | initial value, because the counters might already have been | |||
| the counters have been chosen so that they cannot wrap to zero on | | incremented. Nonetheless, the initial values of the counters | |||
| these initial segments. | | have been chosen so that they cannot wrap to zero on these | |||
| | initial segments. | ||||
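The zeroing test described in this section might be sketched as follows. The function name `initial_option_zeroed` is hypothetical, and representing an absent field as `None` is an assumption made for illustration. The check deliberately compares against zero only, since the text forbids testing for specific valid initial values.

```python
from typing import Optional


def initial_option_zeroed(ee0b: Optional[int], ee1b: Optional[int]) -> bool:
    """Return True if either field present in the initial option is zero.

    A value of None means the field was not present in the option.
    Any non-zero value passes, whatever it is, because the initial
    values are not to be compared against specific expected values.
    """
    return any(value == 0 for value in (ee0b, ee1b) if value is not None)
```

A host that runs this optional test and gets `True` would switch into the mode that ignores AccECN Options for that half-connection and fall back on the conservative interpretation of the ACE field.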
| 3.2.3.2.5. Consistency Between AccECN Feedback Fields | 3.2.3.2.5. Consistency Between AccECN Feedback Fields | |||
| When AccECN Options are available, they ought to provide more | When AccECN Options are available, they ought to provide more | |||
| unambiguous feedback. However, they supplement but do not replace | unambiguous feedback. However, they supplement but do not replace | |||
| the ACE field. An endpoint using AccECN feedback MUST always | the ACE field. An endpoint using AccECN feedback MUST always | |||
| reconcile the information provided in the ACE field with that in any | reconcile the information provided in the ACE field with that in any | |||
| AccECN Option, so that the state of the ACE-related packet counter | AccECN Option, so that the state of the ACE-related packet counter | |||
| can be relied on if future feedback does not carry an AccECN Option. | can be relied on if future feedback does not carry an AccECN Option. | |||
| skipping to change at line 2018 ¶ | skipping to change at line 2022 ¶ | |||
| 3.2.3.3. Usage of the AccECN TCP Option | 3.2.3.3. Usage of the AccECN TCP Option | |||
| If a Data Receiver in AccECN mode intends to use AccECN TCP Options | If a Data Receiver in AccECN mode intends to use AccECN TCP Options | |||
| to provide feedback, the rules below determine when to include an | to provide feedback, the rules below determine when to include an | |||
| AccECN TCP Option, and which fields to include, given other options | AccECN TCP Option, and which fields to include, given other options | |||
| might be competing for limited option space: | might be competing for limited option space: | |||
| Importance of Congestion Control: AccECN is for congestion control, | Importance of Congestion Control: AccECN is for congestion control, | |||
| which implementations SHOULD generally prioritize over other TCP | which implementations SHOULD generally prioritize over other TCP | |||
| options when there is insufficient space for all the options in | Options when there is insufficient space for all the options in | |||
| use. | use. | |||
| If SACK has been negotiated [RFC2018], and the smallest | If SACK has been negotiated [RFC2018], and the smallest | |||
| recommended AccECN Option would leave insufficient space for two | recommended AccECN Option would leave insufficient space for two | |||
| SACK blocks on a particular ACK, the Data Receiver MUST give | SACK blocks on a particular ACK, the Data Receiver MUST give | |||
| precedence to the SACK option (total 18 octets), because loss | precedence to the SACK option (total 18 octets), because loss | |||
| feedback is more critical. | feedback is more critical. | |||
| Recommended Simple Scheme: The Data Receiver SHOULD include an | Recommended Simple Scheme: The Data Receiver SHOULD include an | |||
| AccECN TCP Option on every scheduled ACK if any byte counter has | AccECN TCP Option on every scheduled ACK if any byte counter has | |||
| skipping to change at line 2040 ¶ | skipping to change at line 2044 ¶ | |||
| include a field for every byte counter that has changed at some | include a field for every byte counter that has changed at some | |||
| time during the connection (see examples later). | time during the connection (see examples later). | |||
| A scheduled ACK means an ACK that the Data Receiver would send by | A scheduled ACK means an ACK that the Data Receiver would send by | |||
| its regular delayed ACK rules. Recall that Section 1.3 defines an | its regular delayed ACK rules. Recall that Section 1.3 defines an | |||
| 'ACK' as either with data payload or without. But the above rule | 'ACK' as either with data payload or without. But the above rule | |||
| is worded so that, in the common case when most of the data is | is worded so that, in the common case when most of the data is | |||
| from a Server to a Client, the Server only includes an AccECN TCP | from a Server to a Client, the Server only includes an AccECN TCP | |||
| Option while it is acknowledging data from the Client. | Option while it is acknowledging data from the Client. | |||
| When available TCP option space is limited on particular packets, the | When available TCP Option space is limited on particular packets, the | |||
| recommended scheme will need to include compromises. To guide the | recommended scheme will need to include compromises. To guide the | |||
| implementer, the rules below are ranked in order of importance, but | implementer, the rules below are ranked in order of importance, but | |||
| the final decision has to be implementation-dependent, because | the final decision has to be implementation-dependent, because | |||
| tradeoffs will alter as new TCP options are defined and new use-cases | tradeoffs will alter as new TCP Options are defined and new use-cases | |||
| arise. | arise. | |||
| Necessary Option Length: When TCP option space is limited, an AccECN | Necessary Option Length: When TCP Option space is limited, an AccECN | |||
| TCP option MAY be truncated to omit one or two fields from the end | TCP Option MAY be truncated to omit one or two fields from the end | |||
| of the option, as indicated by the permitted variants listed in | of the option, as indicated by the permitted variants listed in | |||
| Table 5, provided that the counter(s) that have changed since the | Table 5, provided that the counter(s) that have changed since the | |||
| previous AccECN TCP option are not omitted. | previous AccECN TCP Option are not omitted. | |||
| If there is insufficient space to include an AccECN TCP option | If there is insufficient space to include an AccECN TCP Option | |||
| containing the counter(s) that have changed since the previous | containing the counter(s) that have changed since the previous | |||
| AccECN TCP option, then the entire AccECN TCP option MUST be | AccECN TCP Option, then the entire AccECN TCP Option MUST be | |||
| omitted (see Section 3.2.3). | omitted (see Section 3.2.3). | |||
| Change-Triggered AccECN TCP Options: If an arriving packet | Change-Triggered AccECN TCP Options: If an arriving packet | |||
| increments a different byte counter to that incremented by the | increments a different byte counter to that incremented by the | |||
| previous packet, the Data Receiver SHOULD feed it back in an | previous packet, the Data Receiver SHOULD feed it back in an | |||
| AccECN Option on the next scheduled ACK. | AccECN Option on the next scheduled ACK. | |||
| For the avoidance of doubt, this rule does not concern the arrival | For the avoidance of doubt, this rule does not concern the arrival | |||
| of control packets with no payload, because they cannot alter any | of control packets with no payload, because they cannot alter any | |||
| byte counters. | byte counters. | |||
| skipping to change at line 2078 ¶ | skipping to change at line 2082 ¶ | |||
| increment the same byte counter: | increment the same byte counter: | |||
| * the Data Receiver SHOULD include a counter that has continued | * the Data Receiver SHOULD include a counter that has continued | |||
| to increment on the next scheduled ACK following a change- | to increment on the next scheduled ACK following a change- | |||
| triggered AccECN TCP Option; | triggered AccECN TCP Option; | |||
| * while the same counter continues to increment, it SHOULD | * while the same counter continues to increment, it SHOULD | |||
| include the counter every n ACKs as consistently as possible, | include the counter every n ACKs as consistently as possible, | |||
| where n can be chosen by the implementer; | where n can be chosen by the implementer; | |||
| * It SHOULD always include an AccECN Option if the r.ceb counter | * it SHOULD always include an AccECN Option if the r.ceb counter | |||
| is incrementing and it MAY include an AccECN Option if r.e0b | is incrementing and it MAY include an AccECN Option if r.e0b | |||
| or r.e1b is incrementing | or r.e1b is incrementing; | |||
| * It SHOULD include each counter at least once for every 2^22 | * it SHOULD include each counter at least once for every 2^22 | |||
| bytes incremented to prevent overflow during continual | bytes incremented to prevent overflow during continual | |||
| repetition. | repetition. | |||
| The above rules complement those in Section 3.2.2.5, which determine | The above rules complement those in Section 3.2.2.5, which determine | |||
| when to generate an ACK irrespective of whether an AccECN TCP Option | when to generate an ACK irrespective of whether an AccECN TCP Option | |||
| is to be included. | is to be included. | |||
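The 2^22-byte guideline in the list above can be related to the 24-bit field width with a little arithmetic. The Python sketch below is only an illustration: the names `field_delta` and `refresh_due` are hypothetical, and computing the increase modulo 2^24 is an assumption about how a Data Sender could interpret successive field values, not a procedure quoted from this text.

```python
FIELD_BITS = 24                # each option field carries 24 bits
FIELD_MOD = 1 << FIELD_BITS    # a field wraps after 2^24 bytes
REFRESH_BYTES = 1 << 22        # suggested maximum gap between inclusions


def field_delta(previous: int, current: int) -> int:
    """Bytes added between two successive 24-bit field values.

    The result is only unambiguous if fewer than 2^24 bytes were
    counted between the two observations.
    """
    return (current - previous) % FIELD_MOD


def refresh_due(bytes_since_last_sent: int) -> bool:
    """True once a counter has grown by 2^22 bytes since it was last sent."""
    return bytes_since_last_sent >= REFRESH_BYTES
```

With these values, a counter that is included every 2^22 bytes would need three consecutive inclusions to go missing before the fourth could arrive after a full wrap of the 24-bit field.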
| The recommended scheme is intended as a simple way to ensure that all | The recommended scheme is intended as a simple way to ensure that all | |||
| the relevant byte counters will be carried on any ACK that reaches | the relevant byte counters will be carried on any ACK that reaches | |||
| the Data Sender, no matter how many pure ACKs are filtered or | the Data Sender, no matter how many pure ACKs are filtered or | |||
| skipping to change at line 2147 ¶ | skipping to change at line 2151 ¶ | |||
| on each side complied with the present AccECN specification and each | on each side complied with the present AccECN specification and each | |||
| side negotiated AccECN independently of the other side. | side negotiated AccECN independently of the other side. | |||
| 3.3.2. Requirements for Transparent Middleboxes and TCP Normalizers | 3.3.2. Requirements for Transparent Middleboxes and TCP Normalizers | |||
| Another large class of middleboxes intervenes to some degree at the | Another large class of middleboxes intervenes to some degree at the | |||
| transport layer, but attempts to be transparent (invisible) to the | transport layer, but attempts to be transparent (invisible) to the | |||
| end-to-end connection. A subset of this class of middleboxes | end-to-end connection. A subset of this class of middleboxes | |||
| attempts to 'normalize' the TCP wire protocol by checking that all | attempts to 'normalize' the TCP wire protocol by checking that all | |||
| values in header fields comply with a rather narrow interpretation of | values in header fields comply with a rather narrow interpretation of | |||
| the TCP specifications that is not always up to date. | the TCP specifications that is also not always kept up to date. | |||
| A middlebox that is not normalizing the TCP protocol and does not | A middlebox that is not normalizing the TCP protocol and does not | |||
| itself act as a back-to-back pair of TCP endpoints (i.e., a middlebox | itself act as a back-to-back pair of TCP endpoints (i.e., a middlebox | |||
| that intends to be transparent or invisible at the transport layer) | that intends to be transparent or invisible at the transport layer) | |||
| ought to forward AccECN TCP Options unaltered, whether or not the | ought to forward AccECN TCP Options unaltered, whether or not the | |||
| length value matches one of those specified in Section 3.2.3, and | length value matches one of those specified in Section 3.2.3, and | |||
| whether or not the initial values of the byte-counter fields match | whether or not the initial values of the byte-counter fields match | |||
| those in Section 3.2.1. This is because blocking apparently invalid | those in Section 3.2.1. This is because blocking apparently invalid | |||
| values prevents the standardized set of values from being extended in | values prevents the standardized set of values from being extended in | |||
| the future (such outdated normalizers would block updated hosts from | the future (such outdated normalizers would block updated hosts from | |||
| skipping to change at line 2170 ¶ | skipping to change at line 2174 ¶ | |||
| A TCP normalizer is likely to block or alter an AccECN TCP Option if | A TCP normalizer is likely to block or alter an AccECN TCP Option if | |||
| the length value or the initial values of its byte-counter fields do | the length value or the initial values of its byte-counter fields do | |||
| not match one of those specified in Sections 3.2.3 or 3.2.1. | not match one of those specified in Sections 3.2.3 or 3.2.1. | |||
| However, to comply with the present AccECN specification, a middlebox | However, to comply with the present AccECN specification, a middlebox | |||
| MUST NOT change the ACE field; or those fields of an AccECN Option | MUST NOT change the ACE field; or those fields of an AccECN Option | |||
| that are currently specified in Section 3.2.3; or any AccECN field | that are currently specified in Section 3.2.3; or any AccECN field | |||
| covered by integrity protection (e.g., [RFC5925]). | covered by integrity protection (e.g., [RFC5925]). | |||
| 3.3.3. Requirements for TCP ACK Filtering | 3.3.3. Requirements for TCP ACK Filtering | |||
| Section 5.2.1 of [RFC3449] gives best current practice on filtering | Section 5.2.1 of [RFC3449] gives best current practice on | |||
| (aka thinning or coalescing) of pure TCP ACKs. It advises that | filtering (aka thinning or coalescing) of pure TCP ACKs. It advises | |||
| filtering ACKs carrying ECN feedback ought to preserve the correct | that filtering ACKs carrying ECN feedback ought to preserve the | |||
| operation of ECN feedback. As the present specification updates the | correct operation of ECN feedback. As the present specification | |||
| operation of ECN feedback, this section discusses how an ACK filter | updates the operation of ECN feedback, this section discusses how an | |||
| might preserve correct operation of AccECN feedback as well. | ACK filter might preserve correct operation of AccECN feedback as | |||
| well. | ||||
| The problem divides into two parts: determining if an ACK is part of | The problem divides into two parts: determining if an ACK is part of | |||
| a connection that is using AccECN and then preserving the correct | a connection that is using AccECN and then preserving the correct | |||
| operation of AccECN feedback: | operation of AccECN feedback: | |||
| * To determine whether a pure TCP ACK is part of an AccECN | * To determine whether a pure TCP ACK is part of an AccECN | |||
| connection without resorting to connection tracking and per-flow | connection without resorting to connection tracking and per-flow | |||
| state, a useful heuristic would be to check for a non-zero ECN | state, a useful heuristic would be to check for a non-zero ECN | |||
| field at the IP layer (because the ECN++ experiment only allows | field at the IP layer (because the ECN++ experiment only allows | |||
| TCP pure ACKs to be ECN-capable if AccECN has been negotiated | TCP pure ACKs to be ECN-capable if AccECN has been negotiated | |||
| [ECN++]). This heuristic is simple and stateless. However, it | [ECN++]). This heuristic is simple and stateless. However, it | |||
| might omit some AccECN ACKs, because AccECN can be used without | might omit some AccECN ACKs because AccECN can be used without | |||
| ECN++ and even if it is, ECN++ does not have to make pure ACKs | ECN++. Even if a sender uses ECN++, it does not necessarily have | |||
| ECN-capable -- only deployment experience will tell. Also, TCP | to mark pure ACKs as ECN-capable -- only deployment experience | |||
| ACKs might be ECN-capable owing to some scheme other than AccECN, | will tell. Also, TCP ACKs might be ECN-capable owing to some | |||
| e.g., [RFC5690] or some future standards action. Again, only | scheme other than AccECN, e.g., [RFC5690] or some future standards | |||
| deployment experience will tell. | action. Again, only deployment experience will tell. | |||
| * The main concern with preserving correct AccECN operation involves | * The main concern with preserving correct AccECN operation involves | |||
| leaving enough ACKs for the Data Sender to work out whether the | leaving enough ACKs for the Data Sender to work out whether the | |||
| 3-bit ACE field has wrapped. In the worst case, in feedback about | 3-bit ACE field has wrapped. In the worst case, in feedback about | |||
| a run of received packets that were all ECN-marked, the ACE field | a run of received packets that were all ECN-marked, the ACE field | |||
| will wrap every 8 acknowledged packets. ACE field wrap might be | will wrap every 8 acknowledged packets. ACE field wrap might be | |||
| of less concern if packets also carry AccECN TCP Options. | of less concern if packets also carry AccECN TCP Options. | |||
| However, note that logic to read an AccECN TCP Option is optional | However, note that logic to read an AccECN TCP Option is optional | |||
| to implement (albeit recommended -- see Section 3.2.3). So one | to implement (albeit recommended -- see Section 3.2.3). So one | |||
| end writing an AccECN TCP Option into a packet does not | end writing an AccECN TCP Option into a packet does not | |||
| skipping to change at line 2240 | skipping to change at line 2245 | |||
| direction. Therefore, currently available TSO hardware with | direction. Therefore, currently available TSO hardware with | |||
| [RFC3168] support may need some minor driver changes, to adjust the | [RFC3168] support may need some minor driver changes, to adjust the | |||
| bitmask for the first, middle, and last segments processed with TSO. | bitmask for the first, middle, and last segments processed with TSO. | |||
| Initially, when Classic ECN [RFC3168] and Accurate ECN flows coexist | Initially, when Classic ECN [RFC3168] and Accurate ECN flows coexist | |||
| on the same offloading engine, the host software may need to work | on the same offloading engine, the host software may need to work | |||
| around incompatibilities (e.g., when only global configurable TSO TCP | around incompatibilities (e.g., when only global configurable TSO TCP | |||
| Flag bitmasks are available), otherwise this would cause some issues. | Flag bitmasks are available), otherwise this would cause some issues. | |||
| One way around this could be to only negotiate for Accurate ECN, but | One way around this could be to only negotiate for Accurate ECN, but | |||
| not offer a fall back to [RFC3168] ECN. Another way could be to | not offer a fall back to Classic ECN [RFC3168]. Another way could be | |||
| allow TSO only as long as the CWR flag in the TCP header is not set | to allow TSO only as long as the CWR flag in the TCP header is not | |||
| -- at the cost of more processing overhead while the ACE field has | set -- at the cost of more processing overhead while the ACE field | |||
| this bit set. | has this bit set. | |||
| For LRO in the receive direction, a different issue may get exposed | For LRO in the receive direction, a different issue may get exposed | |||
| with [RFC3168] ECN supporting hardware. | with hardware that supports Classic ECN [RFC3168]. | |||
| The ACE field changes with every received CE marking, so today's | The ACE field changes with every received CE marking, so today's | |||
| receive offloading could lead to many interrupts in high congestion | receive offloading could lead to many interrupts in high congestion | |||
| situations. Although that would be useful (because congestion | situations. Although that would be useful (because congestion | |||
| information is received sooner), it could also significantly increase | information is received sooner), it could also significantly increase | |||
| processor load, particularly in scenarios such as DCTCP or L4S where | processor load, particularly in scenarios such as DCTCP or L4S where | |||
| the marking rate is generally higher. | the marking rate is generally higher. | |||
| Current offload hardware ejects a segment from the coalescing process | Current offload hardware ejects a segment from the coalescing process | |||
| whenever the TCP ECN flags change. In data centres, it has been | whenever the TCP-ECN flags change. In data centres, it has been | |||
| fortunate for this offload hardware that DCTCP-style feedback changes | fortunate for this offload hardware that DCTCP-style feedback changes | |||
| less often when there are long sequences of CE marks, which is more | less often when there are long sequences of CE marks, which is more | |||
| common with a step marking threshold (but less likely the more short | common with a step marking threshold (but less likely the more short | |||
| flows are in the mix). The ACE counter approach has been designed so | flows are in the mix). The ACE counter approach has been designed so | |||
| that coalescing can continue over arbitrary patterns of marking and | that coalescing can continue over arbitrary patterns of marking and | |||
| only needs to stop when the counter wraps. Nonetheless, until the | only needs to stop when the counter wraps. Nonetheless, until the | |||
| particular offload hardware in use implements this more efficient | particular offload hardware in use implements this more efficient | |||
| approach, it is likely to be more efficient for AccECN connections to | approach, it is likely to be more efficient for AccECN connections to | |||
| implement this counter-style logic using software segmentation | implement this counter-style logic using software segmentation | |||
| offload. | offload. | |||
| ECN encodes a varying signal in the ACK stream, so it is inevitable | ECN encodes a varying signal in the ACK stream, so it is inevitable | |||
| that offload hardware will ultimately need to handle any form of ECN | that offload hardware will ultimately need to handle any form of ECN | |||
| feedback exceptionally. The ACE field has been designed as a counter | feedback exceptionally. The ACE field has been designed as a counter | |||
| so that it is straightforward for offload hardware to pass on the | so that it is straightforward for offload hardware to pass on the | |||
| highest counter, and to push a segment from its cache before the | highest counter, and to push a segment from its cache before the | |||
| counter wraps. The purpose of working towards standardized TCP ECN | counter wraps. The purpose of working towards standardized TCP-ECN | |||
| feedback is to reduce the risk for hardware developers, who would | feedback is to reduce the risk for hardware developers, who would | |||
| otherwise have to guess which scheme is likely to become dominant. | otherwise have to guess which scheme is likely to become dominant. | |||
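The counter-style coalescing described above can be illustrated with a minimal sketch. It is not taken from the specification: the function names `ace_delta` and `coalesce` are hypothetical, and the sketch assumes that each successive ACE value advances the counter by fewer than 8 relative to the previous one. Under those assumptions, coalescing continues over any pattern of marking, and a batch is pushed only when adding the next segment would let the 3-bit counter wrap within the batch.

```python
ACE_MODULUS = 8  # the ACE field is 3 bits wide, so it wraps every 8 increments


def ace_delta(prev_ace: int, new_ace: int) -> int:
    """Smallest non-negative increment taking prev_ace to new_ace, modulo 8."""
    return (new_ace - prev_ace) % ACE_MODULUS


def coalesce(ace_values, start_ace):
    """Group successive ACE values into batches (hypothetical helper).

    A batch is flushed just before the increments accumulated in it would
    reach 8, so the highest counter passed on never hides a full wrap.
    Returns a list of (last_ace_in_batch, segments_in_batch) tuples.
    """
    batches = []
    last = start_ace  # most recent ACE value seen
    pending = 0       # increments accumulated in the current batch
    count = 0         # segments held in the current batch
    for ace in ace_values:
        step = ace_delta(last, ace)
        if count and pending + step >= ACE_MODULUS:
            # Push the cached batch before the counter can wrap.
            batches.append((last, count))
            pending, count = 0, 0
        pending += step
        last = ace
        count += 1
    if count:
        batches.append((last, count))
    return batches
```

For a run in which every segment carries one more CE mark than the last, starting from an ACE value of 0, the sketch keeps coalescing for seven segments and then starts a new batch, whereas an unchanged counter never forces a flush.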
| The above process has been designed to enable a continuing | The above process has been designed to enable a continuing | |||
| incremental deployment path -- to more highly dynamic congestion | incremental deployment path -- to more highly dynamic congestion | |||
| control. Once offload hardware supports AccECN, it will be able to | control. Once offload hardware supports AccECN, it will be able to | |||
| coalesce efficiently for any sequence of marks, instead of relying on | coalesce efficiently for any sequence of marks, instead of relying on | |||
| the long marking sequences from step marking for efficiency. In the | the long marking sequences from step marking for efficiency. In the | |||
| next stage, marking can evolve from a step to a ramp function. That | next stage, marking can evolve from a step to a ramp function. That | |||
| in turn will allow host congestion control algorithms to respond | in turn will allow host congestion control algorithms to respond | |||
| faster to dynamics, while being backwards compatible with existing | faster to dynamics, while being backwards compatible with existing | |||
| host algorithms. | host algorithms. | |||
| 4. Updates to RFC 3168 | 4. Updates to RFC 3168 | |||
| This section clarifies which parts of RFC 3168 are updated and maps | This section clarifies which parts of RFC 3168 are updated and maps | |||
| them to the relevant updated sections of the present AccECN | them to the relevant updated sections of the present AccECN | |||
| specification. | specification. | |||
| * The whole of Section 6.1.1 of [RFC3168] is updated by Section 3.1 | * The whole of Section 6.1.1 (TCP Initialization) of [RFC3168] is | |||
| of the present specification. | updated by Section 3.1 of the present specification. | |||
| * In Section 6.1.2 of [RFC3168], all mentions of a congestion | * In Section 6.1.2 (The TCP Sender) of [RFC3168], all mentions of a | |||
| response to an ECN-Echo (ECE) ACK packet are updated by | congestion response to an ECN-Echo (ECE) ACK packet are updated by | |||
| Section 3.2 of the present specification to mean an increment to | Section 3.2 of the present specification to mean an increment to | |||
| the sender's count of CE-marked packets, s.cep. And the | the sender's count of CE-marked packets, s.cep. And the | |||
| requirements to set the CWR flag no longer apply, as specified in | requirements to set the CWR flag no longer apply, as specified in | |||
| Section 3.1.5 of the present specification. Otherwise, the | Section 3.1.5 of the present specification. Otherwise, the | |||
| remaining requirements in Section 6.1.2 of [RFC3168] still stand. | remaining requirements in Section 6.1.2 (The TCP Sender) of | |||
| [RFC3168] still stand. | ||||
| It will be noted that [RFC8311] already updates, or potentially | It will be noted that [RFC8311] already updates a number of the | |||
| updates, a number of the requirements in Section 6.1.2 of | requirements in Section 6.1.2 (The TCP Sender) of [RFC3168]. | |||
| [RFC3168]. Section 6.1.2 of RFC 3168 extended standard TCP | Section 6.1.2 of [RFC3168] extended standard TCP congestion | |||
| congestion control [RFC5681] to cover ECN marking as well as | control [RFC5681] to cover ECN marking as well as packet drop. | |||
| packet drop. Whereas, [RFC8311] enables experimentation with | Whereas, [RFC8311] enables experimentation with alternative | |||
| alternative responses to ECN marking, if specified for instance by | responses to ECN marking, if specified for instance by an | |||
| an Experimental RFC produced by the IETF Stream. [RFC8311] also | Experimental RFC produced by the IETF Stream. [RFC8311] also | |||
| strengthened the statement that "ECT(0) SHOULD be used" to a | strengthened the statement that "ECT(0) SHOULD be used" to a | |||
| "MUST" (see [RFC8311] for the details). | "MUST" (see [RFC8311] for the details). | |||
| * The whole of Section 6.1.3 of [RFC3168] is updated by Section 3.2 | * The whole of Section 6.1.3 (The TCP Receiver) of [RFC3168] is | |||
| of the present specification, with the exception of the last | updated by Section 3.2 of the present specification, with the | |||
| paragraph (about congestion response to drop and ECN in the same | exception of the last paragraph (about congestion response to drop | |||
| round trip), which still stands. Incidentally, this last | and ECN in the same round trip), which still stands. | |||
| paragraph is in the wrong section, because it relates to "TCP | Incidentally, this last paragraph is in the wrong section, because | |||
| Sender" behaviour. | it relates to "TCP Sender" behaviour. | |||
| * The following text within Section 6.1.5 of [RFC3168]: | * The following text within Section 6.1.5 (Retransmitted TCP | |||
| packets) of [RFC3168]: | ||||
| | the TCP data receiver SHOULD ignore the ECN field on arriving | | the TCP data receiver SHOULD ignore the ECN field on arriving | |||
| | data packets that are outside of the receiver's current window. | | data packets that are outside of the receiver's current window. | |||
| is updated by more stringent acceptability tests for any packet | is updated by more stringent acceptability tests for any packet | |||
| (not just data packets) in the present specification. | (not just data packets) in the present specification. | |||
| Specifically, in the normative specification of AccECN | Specifically, in the normative specification of AccECN | |||
| (Section 3), only 'Acceptable' packets contribute to the ECN | (Section 3), only 'Acceptable' packets contribute to the ECN | |||
| counters at the AccECN receiver and Section 1.3 defines an | counters at the AccECN receiver and Section 1.3 defines an | |||
| Acceptable packet as one that passes acceptability tests | Acceptable packet as one that passes acceptability tests | |||
| equivalent in strength to those in both [RFC9293] and [RFC5961]. | equivalent in strength to those in both [RFC9293] and [RFC5961]. | |||
| * Sections 5.2, 6.1.1, 6.1.4, 6.1.5, and 6.1.6 of [RFC3168] prohibit | * Sections 5.2 (Dropped or Corrupted Packets), 6.1.1 (TCP | |||
| use of ECN on TCP control packets and retransmissions. The | Initialization), 6.1.4 (Congestion on the ACK-path), 6.1.5 | |||
| present specification does not update that aspect of [RFC3168], | (Retransmitted TCP packets), and 6.1.6 (TCP Window Probes) of | |||
| but it does say what feedback an AccECN Data Receiver ought to | [RFC3168] prohibit use of ECN on TCP control packets and | |||
| provide if it receives an ECN-capable control packet or | retransmissions. The present specification does not update that | |||
| retransmission. This ensures AccECN is forward compatible with | aspect of [RFC3168], but it does say what feedback an AccECN Data | |||
| any future scheme that allows ECN on these packets, as provided | Receiver ought to provide if it receives an ECN-capable control | |||
| for in Section 4.3 of [RFC8311] and as proposed in [ECN++]. | packet or retransmission. This ensures AccECN is forward | |||
| compatible with any future scheme that allows ECN on these | ||||
| packets, as provided for in Section 4.3 of [RFC8311] and as | ||||
| proposed in [ECN++]. | ||||
| 5. Interaction with TCP Variants | 5. Interaction with TCP Variants | |||
| This section is informative, not normative. | This section is informative, not normative. | |||
| 5.1. Compatibility with SYN Cookies | 5.1. Compatibility with SYN Cookies | |||
| A TCP Server can use SYN Cookies (see Appendix A of [RFC4987]) to | A TCP Server can use SYN Cookies (see Appendix A of [RFC4987]) to | |||
| protect itself from SYN flooding attacks. It places minimal commonly | protect itself from SYN flooding attacks. It places minimal commonly | |||
| used connection state in the SYN/ACK, and deliberately does not hold | used connection state in the SYN/ACK, and deliberately does not hold | |||
| skipping to change at line 2384 | skipping to change at line 2394 | |||
| with the value 0b000 or 0b001, these values indicate that the TCP | with the value 0b000 or 0b001, these values indicate that the TCP | |||
| Client did not request support for AccECN; therefore, the Server does | Client did not request support for AccECN; therefore, the Server does | |||
| not enter AccECN mode for this connection. Further, 0b001 on the ACK | not enter AccECN mode for this connection. Further, 0b001 on the ACK | |||
| implies that the Server sent an ECN-capable SYN/ACK, which was marked | implies that the Server sent an ECN-capable SYN/ACK, which was marked | |||
| CE in the network, and the non-AccECN TCP Client fed this back by | CE in the network, and the non-AccECN TCP Client fed this back by | |||
| setting ECE on the ACK of the SYN/ACK. | setting ECE on the ACK of the SYN/ACK. | |||
| 5.2. Compatibility with TCP Experiments and Common TCP Options | 5.2. Compatibility with TCP Experiments and Common TCP Options | |||
| AccECN is compatible (at least on paper) with the most commonly used | AccECN is compatible (at least on paper) with the most commonly used | |||
| TCP options: MSS, time-stamp, window scaling, SACK, and TCP-AO. It | TCP Options: MSS, timestamp, window scaling, SACK, and TCP-AO. It is | |||
| is also compatible with Multipath TCP (MPTCP [RFC8684]) and the | also compatible with Multipath TCP (MPTCP [RFC8684]) and the | |||
| experimental TCP option TCP Fast Open (TFO [RFC7413]). AccECN is | experimental TCP Option TCP Fast Open (TFO [RFC7413]). AccECN is | |||
| friendly to all these protocols, because space for TCP options is | friendly to all these protocols, because space for TCP Options is | |||
| particularly scarce on the SYN, where AccECN consumes zero additional | particularly scarce on the SYN, where AccECN consumes zero additional | |||
| header space. | header space. | |||
| When option space is under pressure from other options, | Because option space is limited, Section 3.2.3.3 specifies which | |||
| Section 3.2.3.3 provides guidance on how important it is to send an | AccECN Option fields are more important to include and provides | |||
| AccECN Option relative to other options, and which fields are more | guidance on the relative importance of AccECN Options against other | |||
| important to include. | TCP Options. | |||
| Implementers of TFO need to take careful note of the recommendation | Implementers of TFO need to take careful note of the recommendation | |||
| in Section 3.2.2.1. That section recommends that, if the TCP Client | in Section 3.2.2.1. That section recommends that, if the TCP Client | |||
| has successfully negotiated AccECN, when acknowledging the SYN/ACK, | has successfully negotiated AccECN, when acknowledging the SYN/ACK, | |||
| even if it has data to send, it sends a pure ACK immediately before | even if it has data to send, it sends a pure ACK immediately before | |||
| the data. Then it can reflect the IP-ECN field of the SYN/ACK on | the data. Then it can reflect the IP-ECN field of the SYN/ACK on | |||
| this pure ACK, which allows the Server to detect ECN mangling. Note | this pure ACK, which allows the Server to detect ECN mangling. Note | |||
| that, as specified in Section 3.2, any data on the SYN (SYN=1, ACK=0) | that, as specified in Section 3.2, any data on the SYN (SYN=1, ACK=0) | |||
| is not included in any of the byte counters held locally for each ECN | is not included in any of the byte counters held locally for each ECN | |||
| marking, nor in the AccECN Option on the wire. | marking, nor in the AccECN Option on the wire. | |||
| skipping to change at line 2455 | skipping to change at line 2465 | |||
| ConEx is an experimental change to the Data Sender that would be | ConEx is an experimental change to the Data Sender that would be | |||
| most useful when combined with AccECN. Without AccECN, the ConEx | most useful when combined with AccECN. Without AccECN, the ConEx | |||
| behaviour of a Data Sender would have to be more conservative than | behaviour of a Data Sender would have to be more conservative than | |||
| would be necessary if it had the accurate feedback of AccECN. | would be necessary if it had the accurate feedback of AccECN. | |||
| * The Standards Track TCP authentication option (TCP-AO [RFC5925]) | * The Standards Track TCP authentication option (TCP-AO [RFC5925]) | |||
| can be used to detect any tampering with AccECN feedback between | can be used to detect any tampering with AccECN feedback between | |||
| the Data Receiver and the Data Sender (whether malicious or | the Data Receiver and the Data Sender (whether malicious or | |||
| accidental). The AccECN fields are immutable end to end, so they | accidental). The AccECN fields are immutable end to end, so they | |||
| are amenable to TCP-AO protection, which covers TCP options by | are amenable to TCP-AO protection, which covers TCP Options by | |||
| default. However, TCP-AO is often too brittle to use on many end- | default. However, TCP-AO is often too brittle to use on many end- | |||
| to-end paths, where middleboxes can make verification fail in | to-end paths, where middleboxes can make verification fail in | |||
| their attempts to improve performance or security, e.g., Network | their attempts to improve performance or security, e.g., Network | |||
| Address Translation (NAT) and Network Address Port Translation | Address Translation (NAT) and Network Address Port Translation | |||
| (NAPT), resegmentation, or shifting the sequence space. | (NAPT), resegmentation, or shifting the sequence space. | |||
| 6. Summary: Protocol Properties | 6. Summary: Protocol Properties | |||
| This section is informative, not normative. It describes how well | This section is informative, not normative. It describes how well | |||
| the protocol satisfies the agreed requirements for a more Accurate | the protocol satisfies the agreed requirements for a more Accurate | |||
| skipping to change at line 2477 | skipping to change at line 2487 | |||
| Accuracy: From each ACK, the Data Sender can infer the number of new | Accuracy: From each ACK, the Data Sender can infer the number of new | |||
| CE-marked segments since the previous ACK. This provides better | CE-marked segments since the previous ACK. This provides better | |||
| accuracy on CE feedback than Classic ECN. In addition, if an | accuracy on CE feedback than Classic ECN. In addition, if an | |||
| AccECN Option is present (not blocked by the network path), the | AccECN Option is present (not blocked by the network path), the | |||
| number of bytes marked with CE, ECT(1), and ECT(0) are provided. | number of bytes marked with CE, ECT(1), and ECT(0) are provided. | |||
| Overhead: The AccECN scheme is divided into two parts. The | Overhead: The AccECN scheme is divided into two parts. The | |||
| essential feedback part reuses the three flags already assigned to | essential feedback part reuses the three flags already assigned to | |||
| ECN in the TCP header. The supplementary feedback part adds an | ECN in the TCP header. The supplementary feedback part adds an | |||
| additional TCP option consuming up to 11 bytes. However, no TCP | additional TCP Option consuming up to 11 bytes. However, no TCP | |||
| option space is consumed in the SYN. | Option space is consumed in the SYN. | |||
| Ordering: The order in which marks arrive at the Data Receiver is | Ordering: The order in which marks arrive at the Data Receiver is | |||
| preserved in AccECN feedback, because the Data Receiver is | preserved in AccECN feedback, because the Data Receiver is | |||
| expected to send an ACK immediately whenever a different mark | expected to send an ACK immediately whenever a different mark | |||
| arrives. | arrives. | |||
| Timeliness: While the same ECN markings are arriving continually at | Timeliness: While the same ECN markings are arriving continually at | |||
| the Data Receiver, it can defer ACKs as TCP does normally, but it | the Data Receiver, it can defer ACKs as TCP does normally, but it | |||
| will immediately send an ACK as soon as a different ECN marking | will immediately send an ACK as soon as a different ECN marking | |||
| arrives. | arrives. | |||
| skipping to change at line 2546 | skipping to change at line 2556 | |||
| stripped, the resolution of the feedback is degraded, but the | stripped, the resolution of the feedback is degraded, but the | |||
| integrity of this degraded feedback can still be assured. | integrity of this degraded feedback can still be assured. | |||
| Backward Compatibility: If only one endpoint supports the AccECN | Backward Compatibility: If only one endpoint supports the AccECN | |||
| scheme, it will fall back to the most advanced ECN feedback scheme | scheme, it will fall back to the most advanced ECN feedback scheme | |||
| supported by the other end. | supported by the other end. | |||
| If AccECN Options are stripped by a middlebox, AccECN still | If AccECN Options are stripped by a middlebox, AccECN still | |||
| provides basic congestion feedback in the ACE field. Further, | provides basic congestion feedback in the ACE field. Further, | |||
| AccECN can be used to detect mangling of the IP-ECN field; | AccECN can be used to detect mangling of the IP-ECN field; | |||
| mangling of the TCP ECN flags; blocking of ECT-marked segments; | mangling of the TCP-ECN flags; blocking of ECT-marked segments; | |||
| and blocking of segments carrying an AccECN Option. It can detect | and blocking of segments carrying an AccECN Option. It can detect | |||
| these conditions during TCP's three-way handshake so that it can | these conditions during TCP's three-way handshake so that it can | |||
| fall back to operation without ECN and/or operation without AccECN | fall back to operation without ECN and/or operation without AccECN | |||
| Options. | Options. | |||
| Forward Compatibility: The behaviour of endpoints and middleboxes is | Forward Compatibility: The behaviour of endpoints and middleboxes is | |||
| carefully defined for all reserved or currently unused codepoints | carefully defined for all reserved or currently unused codepoints | |||
| in the scheme. Then, the designers of security devices can | in the scheme. Then, the designers of security devices can | |||
| understand which currently unused values might appear in the | understand which currently unused values might appear in the | |||
| future. So, even if they choose to treat such values as anomalous | future. So, even if they choose to treat such values as anomalous | |||
| while they are not widely used, any blocking will at least be | while they are not widely used, any blocking will at least be | |||
| under policy control and not hard-coded. Then, if previously | under policy control, not hard-coded. Then, if previously unused | |||
| unused values start to appear on the Internet (or in standards), | values start to appear on the Internet (or in standards), such | |||
| such policies could be quickly reversed. | policies could be quickly reversed. | |||
| 7. IANA Considerations | 7. IANA Considerations | |||
| This document reassigns the TCP header flag at bit offset 7 to the | This document reassigns the TCP header flag at bit offset 7 to the | |||
| AccECN protocol. This bit was previously called the Nonce Sum (NS) | AccECN protocol. This bit was previously called the Nonce Sum (NS) | |||
| flag [RFC3540], but RFC 3540 has been reclassified as Historic | flag [RFC3540], but RFC 3540 has been reclassified as Historic | |||
| [RFC8311]. The flag is now defined as the following in the "TCP | [RFC8311]. The flag is now defined as the following in the "TCP | |||
| Header Flags" registry in the "Transmission Control Protocol (TCP) | Header Flags" registry in the "Transmission Control Protocol (TCP) | |||
| Parameters" registry group: | Parameters" registry group: | |||
| +=====+==============+===========+==============================+ | +=====+==============+===========+==============================+ | |||
| | Bit | Name | Reference | Assignment Notes | | | Bit | Name | Reference | Assignment Notes | | |||
| +=====+==============+===========+==============================+ | +=====+==============+===========+==============================+ | |||
| | 7 | AE (Accurate | RFC 9768 | Previously used as NS (Nonce | | | 7 | AE (Accurate | RFC 9768 | Previously used as NS (Nonce | | |||
| | | ECN) | | Sum) by [RFC3540], which is | | | | ECN) | | Sum) by [RFC3540], which is | | |||
| | | | | now Historic [RFC8311] | | | | | | now Historic [RFC8311] | | |||
| +-----+--------------+-----------+------------------------------+ | +-----+--------------+-----------+------------------------------+ | |||
| Table 6: TCP Header Flag Reassignment | Table 6: TCP Header Flag Reassignment | |||
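As an illustration of where the reassigned flag sits on the wire, the following minimal sketch reads the three flags that AccECN uses (AE, CWR, and ECE) from a raw TCP header and combines them into a 3-bit value with AE as the most significant bit. The function name `ace_field` is hypothetical. The sketch assumes the standard TCP header layout, in which bit offset 7 is the least significant bit of byte 12, and CWR and ECE are the two most significant bits of byte 13.

```python
def ace_field(tcp_header: bytes) -> int:
    """Return the 3-bit value formed by the AE, CWR, and ECE flags.

    Hypothetical helper; assumes tcp_header starts at the first byte of
    the TCP header and holds at least the first 14 bytes.
    """
    if len(tcp_header) < 14:
        raise ValueError("need at least 14 bytes of TCP header")
    ae = tcp_header[12] & 0x01             # bit offset 7: AE (formerly NS)
    cwr_ece = (tcp_header[13] >> 6) & 0x03  # bit offsets 8 and 9: CWR, ECE
    return (ae << 2) | cwr_ece
```

For example, a header whose byte 12 is 0x51 (data offset 5, AE set) and whose byte 13 is 0x90 (CWR and ACK set, ECE clear) yields 0b110.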
| This document also defines two new TCP options for AccECN from the | This document also defines two new TCP Options for AccECN from the | |||
| TCP option space. These values are defined as the following in the | TCP Option space. These values are defined as the following in the | |||
| "TCP Option Kind Numbers" registry in the "Transmission Control | "TCP Option Kind Numbers" registry in the "Transmission Control | |||
| Protocol (TCP) Parameters" registry group: | Protocol (TCP) Parameters" registry group: | |||
| +======+========+================================+===========+ | +======+========+================================+===========+ | |||
| | Kind | Length | Meaning | Reference | | | Kind | Length | Meaning | Reference | | |||
| +======+========+================================+===========+ | +======+========+================================+===========+ | |||
| | 172 | N | Accurate ECN Order 0 (AccECN0) | RFC 9768 | | | 172 | N | Accurate ECN Order 0 (AccECN0) | RFC 9768 | | |||
| +------+--------+--------------------------------+-----------+ | +------+--------+--------------------------------+-----------+ | |||
| | 174 | N | Accurate ECN Order 1 (AccECN1) | RFC 9768 | | | 174 | N | Accurate ECN Order 1 (AccECN1) | RFC 9768 | | |||
| +------+--------+--------------------------------+-----------+ | +------+--------+--------------------------------+-----------+ | |||
| Table 7: New TCP Option assignments | Table 7: New TCP Option Assignments | |||
| Early experimental implementations of the two AccECN Options used | Early experimental implementations of the two AccECN Options used | |||
| experimental option 254 per [RFC6994] with the 16-bit magic numbers | experimental option 254 per [RFC6994] with the 16-bit magic numbers | |||
| 0xACC0 and 0xACC1, respectively, for Order 0 and 1, as allocated in | 0xACC0 and 0xACC1, respectively, for Order 0 and 1, as allocated in | |||
| the IANA "TCP/UDP Experimental Option Experiment Identifiers (TCP/UDP | the IANA "TCP/UDP Experimental Option Experiment Identifiers (TCP/UDP | |||
| ExIDs)" registry. Even earlier experimental implementations used the | ExIDs)" registry. Even earlier experimental implementations used the | |||
| single magic number 0xACCE (16 bits). Uses of these experimental | single magic number 0xACCE (16 bits). Uses of these experimental | |||
| options SHOULD migrate to use the new option kinds (172 and 174). | options SHOULD migrate to use the new option kinds (172 and 174). | |||
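The assignments above can be summarized in a short sketch that recognizes an AccECN Option under either the assigned kinds or the earlier experimental encodings. The function name `classify_option` and its return labels are hypothetical. The sketch assumes that it is given the bytes of a single TCP option (Kind, Length, then payload) and that, for experimental option 254, the 16-bit identifier follows the Length byte in network byte order.

```python
ASSIGNED_KINDS = {172: 0, 174: 1}             # AccECN0, AccECN1
EXPERIMENTAL_KIND = 254                       # shared experimental option kind
EXPERIMENTAL_EXIDS = {0xACC0: 0, 0xACC1: 1}   # early experimental magic numbers
LEGACY_EXID = 0xACCE                          # single magic number used even earlier


def classify_option(option: bytes):
    """Classify one TCP option as an AccECN Option, if it is one.

    Returns ("assigned", order), ("experimental", order),
    ("experimental", None) for the legacy magic number, or None when the
    option is not an AccECN Option. Labels are illustrative only.
    """
    if len(option) < 2:
        return None
    kind = option[0]
    if kind in ASSIGNED_KINDS:
        return ("assigned", ASSIGNED_KINDS[kind])
    if kind == EXPERIMENTAL_KIND and len(option) >= 4:
        exid = int.from_bytes(option[2:4], "big")
        if exid in EXPERIMENTAL_EXIDS:
            return ("experimental", EXPERIMENTAL_EXIDS[exid])
        if exid == LEGACY_EXID:
            return ("experimental", None)
    return None
```

An implementation that still emits the experimental encodings could use such a check during migration to accept both forms on receipt while sending only kinds 172 and 174.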
| 8. Security and Privacy Considerations | 8. Security and Privacy Considerations | |||
| If ever the supplementary feedback part of AccECN that is based on | If ever the supplementary feedback part of AccECN that is based on | |||
| one of the new AccECN TCP Options is unusable (due for example to | one of the new AccECN TCP Options is unusable (due for example to | |||
| middlebox interference), the essential feedback part of AccECN's | middlebox interference), the essential feedback part of AccECN's | |||
| congestion feedback offers only limited resilience to long runs of | congestion feedback offers only limited resilience to long runs of | |||
| ACK loss (see Section 3.2.2.5). These problems are unlikely to be | ACK loss (see Section 3.2.2.5). These problems are unlikely to be | |||
| due to malicious intervention (because if an attacker could strip a | due to malicious intervention (because if an attacker could strip a | |||
| TCP option or discard a long run of ACKs, it could wreak other | TCP Option or discard a long run of ACKs, it could wreak other | |||
| arbitrary havoc). However, it would be of concern if AccECN's | arbitrary havoc). However, it would be of concern if AccECN's | |||
| resilience could be indirectly compromised during a flooding attack. | resilience could be indirectly compromised during a flooding attack. | |||
| AccECN is still considered safe though, because if AccECN Options are | AccECN is still considered safe though, because if AccECN Options are | |||
| not present, the AccECN Data Sender is then required to switch to | not present, the AccECN Data Sender is then required to switch to | |||
| more conservative assumptions about wrap of congestion indication | more conservative assumptions about wrap of congestion indication | |||
| counters (see Section 3.2.2.5 and Appendix A.2). | counters (see Section 3.2.2.5 and Appendix A.2). | |||
| Section 5.1 describes how a TCP Server can negotiate AccECN and use | Section 5.1 describes how a TCP Server can negotiate AccECN and use | |||
| the SYN cookie method for mitigating SYN flooding attacks. | the SYN cookie method for mitigating SYN flooding attacks. | |||
| skipping to change at line 2639 | skipping to change at line 2649 | |||
| will be degraded, but the integrity of this degraded information can | will be degraded, but the integrity of this degraded information can | |||
| still be assured. Assuring that Data Senders respond appropriately | still be assured. Assuring that Data Senders respond appropriately | |||
| to ECN feedback is possible, but the scope of the present document is | to ECN feedback is possible, but the scope of the present document is | |||
| confined to the feedback protocol and excludes the response to this | confined to the feedback protocol and excludes the response to this | |||
| feedback. | feedback. | |||
| In Section 3.2.3, a Data Sender is allowed to ignore an unrecognized | In Section 3.2.3, a Data Sender is allowed to ignore an unrecognized | |||
| TCP AccECN Option length and read as many whole 3-octet fields from | TCP AccECN Option length and read as many whole 3-octet fields from | |||
| it as possible up to a maximum of 3, treating the remainder as | it as possible up to a maximum of 3, treating the remainder as | |||
| padding. This opens up a potential covert channel of up to 29B (40 - | padding. This opens up a potential covert channel of up to 29B (40 - | |||
| (2+3*3)) B. However, it is really an overt channel (not hidden) and | (2+3*3)). However, it is really an overt channel (not hidden) and it | |||
| it is no different than the use of unknown TCP options with unknown | is no different from the use of unknown TCP Options with unknown | |||
| option lengths in general. Therefore, where this is of concern, it | option lengths in general. Therefore, where this is of concern, it | |||
| can already be adequately mitigated by regular TCP normalizer | can already be adequately mitigated by regular TCP normalizer | |||
| technology (see Section 3.3.2). | technology (see Section 3.3.2). | |||
| The AccECN protocol is not believed to introduce any new privacy | ||||
| concerns, because it merely counts and feeds back signals at the | ||||
| transport layer that had already been visible at the IP layer. A | ||||
| covert channel can be used to compromise privacy. However, as | ||||
| explained above, undefined TCP options in general open up such | ||||
| channels, and common techniques are available to close them off. | ||||
| There is a potential concern that a Data Receiver could deliberately | There is a potential concern that a Data Receiver could deliberately | |||
| omit AccECN Options pretending that they had been stripped by a | omit AccECN Options pretending that they had been stripped by a | |||
| middlebox. No known way can yet be contrived for a receiver to take | middlebox. Currently, there is no known way for a receiver to take | |||
| advantage of this behaviour, which seems to always degrade its own | advantage of this behaviour, which seems to always degrade its own | |||
| performance. However, the concern is mentioned here for | performance. However, the concern is mentioned here for | |||
| completeness. | completeness. | |||
| The AccECN protocol is not believed to introduce any new privacy | ||||
| concerns, because it merely counts and feeds back signals at the | ||||
| transport layer that had already been visible at the IP layer. A | ||||
| covert channel can be used to compromise privacy. However, as | ||||
| explained above, undefined TCP Options in general open up such | ||||
| channels, and common techniques are available to close them off. | ||||
| A generic privacy concern of any new protocol is that for a while it | A generic privacy concern of any new protocol is that for a while it | |||
| will be used by a small population of hosts, and thus show up more | will be used by a small population of hosts, and thus those hosts | |||
| easily. However, it is expected that AccECN will become available in | could be more easily identified. However, it is expected that AccECN | |||
| operating systems over time and that it will eventually be turned on | will become available in more operating systems over time and that it | |||
| by default. Thus, an individual identification of a particular user | will eventually be turned on by default. Thus, an individual | |||
| is less of a concern than the fingerprinting of specific versions of | identification of a particular user is less of a concern than the | |||
| operating systems. However, the latter can be done using different | fingerprinting of specific versions of operating systems. However, | |||
| means independent of Accurate ECN. | the latter can be done using different means independent of Accurate | |||
| ECN. | ||||
| As Accurate ECN exposes more bits in the TCP header that could be | As Accurate ECN exposes more bits in the TCP header that could be | |||
| tampered with without interfering with the transport excessively, it | tampered with without interfering with the transport excessively, it | |||
| may allow an additional way to identify specific data streams across | may allow an additional way to identify specific data streams across | |||
| a virtual private network (VPN) to an attacker that has access to the | a virtual private network (VPN) to an attacker that has access to the | |||
| datastream before and after the VPN tunnel endpoints. This may be | datastream before and after the VPN tunnel endpoints. This may be | |||
| achieved by injecting or modifying the ACE field in specific patterns | achieved by injecting or modifying the ACE field in specific patterns | |||
| that can be recognized. | that can be recognized. | |||
| Overall, Accurate ECN does not change the risk profile on privacy to | Overall, Accurate ECN does not change the risk profile on privacy to | |||
| a user dramatically beyond what is already possible using classic | a user dramatically beyond what is already possible using classic | |||
| ECN. However, in order to prevent such attacks and means of easier | ECN. However, in order to prevent such attacks and means of easier | |||
| identification of flows, it is advisable for privacy-conscious users | identification of flows, it is advisable for privacy-conscious users | |||
| behind VPNs to not enable the Accurate ECN, or Classic ECN for that | behind VPNs to not enable Accurate ECN, or Classic ECN for that | |||
| matter. | matter. | |||
| 9. References | 9. References | |||
| 9.1. Normative References | 9.1. Normative References | |||
| [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP | [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP | |||
| Selective Acknowledgment Options", RFC 2018, | Selective Acknowledgment Options", RFC 2018, | |||
| DOI 10.17487/RFC2018, October 1996, | DOI 10.17487/RFC2018, October 1996, | |||
| <https://www.rfc-editor.org/info/rfc2018>. | <https://www.rfc-editor.org/info/rfc2018>. | |||
| skipping to change at line 2860 | skipping to change at line 2871 | |||
| [RoCEv2] InfiniBand Trade Association, "InfiniBand Architecture | [RoCEv2] InfiniBand Trade Association, "InfiniBand Architecture | |||
| Specification", Volume 1, Release 1.4, 2020, | Specification", Volume 1, Release 1.4, 2020, | |||
| <https://www.infinibandta.org/ibta-specification/>. | <https://www.infinibandta.org/ibta-specification/>. | |||
| Appendix A. Example Algorithms | Appendix A. Example Algorithms | |||
| This appendix is informative, not normative. It gives example | This appendix is informative, not normative. It gives example | |||
| algorithms that would satisfy the normative requirements of the | algorithms that would satisfy the normative requirements of the | |||
| AccECN protocol. However, implementers are free to choose other ways | AccECN protocol. However, implementers are free to choose other ways | |||
| to implement the requirements. | to satisfy the requirements. | |||
| A.1. Example Algorithm to Encode/Decode the AccECN Option | A.1. Example Algorithm to Encode/Decode the AccECN Option | |||
| The example algorithms below show how a Data Receiver in AccECN mode | The example algorithms below show how a Data Receiver in AccECN mode | |||
| could encode its CE byte counter r.ceb into the ECEB field within an | could encode its CE byte counter r.ceb into the ECEB field within an | |||
| AccECN TCP Option, and how a Data Sender in AccECN mode could decode | AccECN TCP Option, and how a Data Sender in AccECN mode could decode | |||
| the ECEB field into its byte counter s.ceb. The other counters for | the ECEB field into its byte counter s.ceb. The other counters for | |||
| bytes marked ECT(0) and ECT(1) in an AccECN Option would be similarly | bytes marked ECT(0) and ECT(1) in an AccECN Option would be similarly | |||
| encoded and decoded. | encoded and decoded. | |||
| skipping to change at line 2893 | skipping to change at line 2904 | |||
| where '%' is the remainder operator. | where '%' is the remainder operator. | |||
| On the arrival of an AccECN Option, the Data Sender first makes sure | On the arrival of an AccECN Option, the Data Sender first makes sure | |||
| the ACK has not been superseded in order to avoid winding the s.ceb | the ACK has not been superseded in order to avoid winding the s.ceb | |||
| counter backwards. It uses the TCP acknowledgement number and any | counter backwards. It uses the TCP acknowledgement number and any | |||
| SACK options [RFC2018] to calculate newlyAckedB, the amount of new | SACK options [RFC2018] to calculate newlyAckedB, the amount of new | |||
| data that the ACK acknowledges in bytes (newlyAckedB can be zero but | data that the ACK acknowledges in bytes (newlyAckedB can be zero but | |||
| not negative). If newlyAckedB is zero, either the ACK has been | not negative). If newlyAckedB is zero, either the ACK has been | |||
| superseded or CE-marked packet(s) without data could have arrived. | superseded or CE-marked packet(s) without data could have arrived. | |||
| To break the tie for the latter case, the Data Sender could use time- | To break the tie for the latter case, the Data Sender could use | |||
| stamps [RFC7323] (if present) to work out newlyAckedT, the amount of | timestamps [RFC7323] (if present) to work out newlyAckedT, the amount | |||
| new time that the ACK acknowledges. If the Data Sender determines | of new time that the ACK acknowledges. If the Data Sender determines | |||
| that the ACK has been superseded, it ignores the AccECN Option. | that the ACK has been superseded, it ignores the AccECN Option. | |||
| Otherwise, the Data Sender calculates the minimum non-negative | Otherwise, the Data Sender calculates the minimum non-negative | |||
| difference d.ceb between the ECEB field and its local s.ceb counter, | difference d.ceb between the ECEB field and its local s.ceb counter, | |||
| using modulo arithmetic as follows: | using modulo arithmetic as follows: | |||
| if ((newlyAckedB > 0) || (newlyAckedT > 0)) { | if ((newlyAckedB > 0) || (newlyAckedT > 0)) { | |||
| d.ceb = (ECEB + DIVOPT - (s.ceb % DIVOPT)) % DIVOPT | d.ceb = (ECEB + DIVOPT - (s.ceb % DIVOPT)) % DIVOPT | |||
| s.ceb += d.ceb | s.ceb += d.ceb | |||
| } | } | |||
| skipping to change at line 2937 | skipping to change at line 2948 | |||
| heuristically detect a long enough unbroken string of ACK losses that | heuristically detect a long enough unbroken string of ACK losses that | |||
| could have concealed a cycle of the congestion counter in the ACE | could have concealed a cycle of the congestion counter in the ACE | |||
| field of the next ACK to arrive. | field of the next ACK to arrive. | |||
| Two variants of the algorithm are given: i) a more conservative | Two variants of the algorithm are given: i) a more conservative | |||
| variant for a Data Sender to use if it detects that AccECN Options | variant for a Data Sender to use if it detects that AccECN Options | |||
| are not available (see Section 3.2.2.5 and Section 3.2.3.2); and ii) | are not available (see Section 3.2.2.5 and Section 3.2.3.2); and ii) | |||
| a less conservative variant that is feasible when complementary | a less conservative variant that is feasible when complementary | |||
| information is available from AccECN Options. | information is available from AccECN Options. | |||
| A.2.1. Safety Algorithm Without the AccECN Option | A.2.1. Safety Algorithm without the AccECN Option | |||
| It is assumed that each local packet counter is a sufficiently sized | It is assumed that each local packet counter is a sufficiently sized | |||
| unsigned integer (probably 32b) and that the following constant has | unsigned integer (probably 32b) and that the following constant has | |||
| been assigned: | been assigned: | |||
| DIVACE = 2^3 | DIVACE = 2^3 | |||
| Every time an Acceptable CE-marked packet arrives (Section 3.2.2.2), | Every time an Acceptable CE-marked packet arrives (Section 3.2.2.2), | |||
| the Data Receiver increments its local value of r.cep by 1. It | the Data Receiver increments its local value of r.cep by 1. It | |||
| repeats the same value of ACE in every subsequent ACK until the next | repeats the same value of ACE in every subsequent ACK until the next | |||
| skipping to change at line 2982 | skipping to change at line 2993 | |||
| of the missing ACKs were piggy-backed on data (i.e., not pure ACKs), | of the missing ACKs were piggy-backed on data (i.e., not pure ACKs), | |||
| retransmissions will not repair the lost AccECN information, because | retransmissions will not repair the lost AccECN information, because | |||
| AccECN requires retransmissions to carry the latest AccECN counters, | AccECN requires retransmissions to carry the latest AccECN counters, | |||
| not the original ones. | not the original ones. | |||
| The phrase 'under prevailing conditions' allows for implementation- | The phrase 'under prevailing conditions' allows for implementation- | |||
| dependent interpretation. A Data Sender might take account of the | dependent interpretation. A Data Sender might take account of the | |||
| prevailing size of data segments and the prevailing CE marking rate | prevailing size of data segments and the prevailing CE marking rate | |||
| just before the sequence of missing ACKs. However, we shall start | just before the sequence of missing ACKs. However, we shall start | |||
| with the simplest algorithm, which assumes segments are all full- | with the simplest algorithm, which assumes segments are all full- | |||
| sized and ultra-conservatively it assumes that ECN marking was 100% | sized, and ultra-conservatively it assumes that ECN marking was 100% | |||
| on the forward path when ACKs on the reverse path started to all be | on the forward path when ACKs on the reverse path started to all be | |||
| dropped. Specifically, if newlyAckedB is the amount of data that an | dropped. Specifically, if newlyAckedB is the amount of data that an | |||
| ACK acknowledges since the previous ACK, then the Data Sender could | ACK acknowledges since the previous ACK, then the Data Sender could | |||
| assume that this acknowledges newlyAckedPkt full-sized segments, | assume that this acknowledges newlyAckedPkt full-sized segments, | |||
| where newlyAckedPkt = newlyAckedB/MSS. Then it could assume that the | where newlyAckedPkt = newlyAckedB/MSS. Then it could assume that the | |||
| ACE field incremented by | ACE field incremented by | |||
| dSafer.cep = newlyAckedPkt - ((newlyAckedPkt - d.cep) % DIVACE) | dSafer.cep = newlyAckedPkt - ((newlyAckedPkt - d.cep) % DIVACE) | |||
| For example, imagine an ACK acknowledges newlyAckedPkt=9 more full- | For example, imagine an ACK acknowledges newlyAckedPkt=9 more full- | |||
| sized segments than any previous ACK, and that ACE increments by a | sized segments than any previous ACK, and that ACE increments by a | |||
| minimum of 2 CE marks (d.cep=2). The above formula works out that it | minimum of 2 CE marks (d.cep=2). The above formula indicates that it | |||
| would still be safe to assume 2 CE marks (because 9 - ((9-2) % 8) = | would still be safe to assume 2 CE marks (because 9 - ((9-2) % 8) = | |||
| 2). However, if ACE increases by a minimum of 2 but acknowledges 10 | 2). However, if ACE increases by a minimum of 2 but acknowledges 10 | |||
| full-sized segments, then it would be necessary to assume that there | full-sized segments, then it would be necessary to assume that there | |||
| could have been 10 CE marks (because 10 - ((10-2) % 8) = 10). | could have been 10 CE marks (because 10 - ((10-2) % 8) = 10). | |||
| Note that checks would need to be added to the above pseudocode for | Note that checks would need to be added to the above pseudocode for | |||
| (d.cep > newlyAckedPkt), which could occur if newlyAckedPkt had been | (d.cep > newlyAckedPkt), which could occur if newlyAckedPkt had been | |||
| wrongly estimated using an inappropriate packet size. | wrongly estimated using an inappropriate packet size. | |||
| ACKs that acknowledge a large stretch of packets might be common in | ACKs that acknowledge a large stretch of packets might be common in | |||
| skipping to change at line 3024 | skipping to change at line 3035 | |||
| average segment size and prevailing ECN marking. For instance, | average segment size and prevailing ECN marking. For instance, | |||
| newlyAckedPkt in the above formula could be replaced with | newlyAckedPkt in the above formula could be replaced with | |||
| newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing | newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing | |||
| segment size and p is the prevailing ECN marking probability. | segment size and p is the prevailing ECN marking probability. | |||
| However, ultimately, if TCP's ECN feedback becomes inaccurate, it | However, ultimately, if TCP's ECN feedback becomes inaccurate, it | |||
| still has loss detection to fall back on. Therefore, it would seem | still has loss detection to fall back on. Therefore, it would seem | |||
| safe to implement a simple algorithm, rather than a perfect one. | safe to implement a simple algorithm, rather than a perfect one. | |||
| The simple algorithm for dSafer.cep above requires no monitoring of | The simple algorithm for dSafer.cep above requires no monitoring of | |||
| prevailing conditions and it would still be safe if, for example, | prevailing conditions and it would still be safe if, for example, | |||
| segments were on average at least 5% of full-sized as long as ECN | segments were on average at least 5% of a full-sized segment as long | |||
| marking was 5% or less. Assuming it was used, the Data Sender would | as ECN marking was 5% or less. Assuming it was used, the Data Sender | |||
| increment its packet counter as follows: | would increment its packet counter as follows: | |||
| s.cep += dSafer.cep | s.cep += dSafer.cep | |||
| If missing acknowledgement numbers arrive later (due to reordering), | If missing acknowledgement numbers arrive later (due to reordering), | |||
| Section 3.2.2.5.2 says "the Data Sender MAY attempt to neutralize the | Section 3.2.2.5.2 says "the Data Sender MAY attempt to neutralize the | |||
| effect of any action it took based on a conservative assumption that | effect of any action it took based on a conservative assumption that | |||
| it later found to be incorrect". To do this, the Data Sender would | it later found to be incorrect". To do this, the Data Sender would | |||
| have to store the values of all the relevant variables whenever it | have to store the values of all the relevant variables whenever it | |||
| made assumptions, so that it could re-evaluate them later. Given | made assumptions, so that it could re-evaluate them later. Given | |||
| this could become complex and it is not required, we do not attempt | this could become complex and it is not required, we do not attempt | |||
| skipping to change at line 3063 | skipping to change at line 3074 | |||
| if (dSafer.cep > d.cep) { | if (dSafer.cep > d.cep) { | |||
| if (d.ceb <= MSS * d.cep) { % Same as (s <= MSS), but no DBZ | if (d.ceb <= MSS * d.cep) { % Same as (s <= MSS), but no DBZ | |||
| sSafer = d.ceb/dSafer.cep | sSafer = d.ceb/dSafer.cep | |||
| if (sSafer < MSS/SAFETY_FACTOR) | if (sSafer < MSS/SAFETY_FACTOR) | |||
| dSafer.cep = d.cep % d.cep is a safe enough estimate | dSafer.cep = d.cep % d.cep is a safe enough estimate | |||
| } % else | } % else | |||
| % No need for else; dSafer.cep is already correct, | % No need for else; dSafer.cep is already correct, | |||
| % because d.cep must have been too small | % because d.cep must have been too small | |||
| } | } | |||
| The chart below shows when the above algorithm will consider d.cep | The chart below shows when the above algorithm will replace | |||
| can replace dSafer.cep as a safe enough estimate of the number of CE- | dSafer.cep with d.cep as a safe enough estimate of the number of CE- | |||
| marked packets: | marked packets: | |||
| ^ | ^ | |||
| sSafer| | sSafer| | |||
| | | | | |||
| MSS+ | MSS+ | |||
| | | | | |||
| | dSafer.cep | | dSafer.cep | |||
| | is | | is | |||
| MSS/SAFETY_FACTOR+--------------+ safest | MSS/SAFETY_FACTOR+--------------+ safest | |||
| skipping to change at line 3112 | skipping to change at line 3123 | |||
| size is more likely to have been just less than one MSS, rather | size is more likely to have been just less than one MSS, rather | |||
| than below MSS/2. | than below MSS/2. | |||
| If pure ACKs were allowed to be ECN-capable, missing ACKs would be | If pure ACKs were allowed to be ECN-capable, missing ACKs would be | |||
| far less likely. However, because [RFC3168] currently precludes | far less likely. However, because [RFC3168] currently precludes | |||
| this, the above algorithm assumes that pure ACKs are not ECN-capable. | this, the above algorithm assumes that pure ACKs are not ECN-capable. | |||
| A.3. Example Algorithm to Estimate Marked Bytes from Marked Packets | A.3. Example Algorithm to Estimate Marked Bytes from Marked Packets | |||
| If AccECN Options are not available, the Data Sender can only decode | If AccECN Options are not available, the Data Sender can only decode | |||
| a CE marking from the ACE field in packets. Every time an ACK | the ACE field as a number of marked packets. Every time an ACK | |||
| arrives, to convert this into an estimate of CE-marked bytes, it | arrives, to convert the number of CE markings into an estimate of CE- | |||
| needs an average of the segment size, s_ave. Then it can add or | marked bytes, it needs an average of the segment size, s_ave. Then | |||
| subtract s_ave from the value of d.ceb as the value of d.cep | it can add or subtract s_ave from the value of d.ceb as the value of | |||
| increments or decrements. Some possible ways to calculate s_ave are | d.cep increments or decrements. Some possible ways to calculate | |||
| outlined below. The precise details will depend on why an estimate | s_ave are outlined below. The precise details will depend on why an | |||
| of marked bytes is needed. | estimate of marked bytes is needed. | |||
| The implementation could keep a record of the byte numbers of all the | The implementation could keep a record of the byte numbers of all the | |||
| boundaries between packets in flight (including control packets), and | boundaries between packets in flight (including control packets), and | |||
| recalculate s_ave on every ACK. However, it would be simpler to | recalculate s_ave on every ACK. However, it would be simpler to | |||
| merely maintain a counter packets_in_flight for the number of packets | merely maintain a counter packets_in_flight for the number of packets | |||
| in flight (including control packets), which is reset once per RTT. | in flight (including control packets), which is reset once per RTT. | |||
| Either way, it would estimate s_ave as: | Either way, it would estimate s_ave as: | |||
| s_ave ~= flightsize / packets_in_flight, | s_ave ~= flightsize / packets_in_flight, | |||
| skipping to change at line 3172 | skipping to change at line 3183 | |||
| IPv6 Traffic Class field). To detect bleaching, it will be | IPv6 Traffic Class field). To detect bleaching, it will be | |||
| sufficient to detect whether nearly all bytes arrive marked as Not- | sufficient to detect whether nearly all bytes arrive marked as Not- | |||
| ECT. Therefore, there ought to be no need to keep track of the | ECT. Therefore, there ought to be no need to keep track of the | |||
| details of retransmissions. | details of retransmissions. | |||
| Appendix B. Rationale for Usage of TCP Header Flags | Appendix B. Rationale for Usage of TCP Header Flags | |||
| B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | |||
| AccECN uses a rather unorthodox approach to negotiate the highest | AccECN uses a rather unorthodox approach to negotiate the highest | |||
| version TCP ECN feedback scheme that both ends support, as justified | version TCP-ECN feedback scheme that both ends support, as justified | |||
| below. It follows from the original TCP ECN capability negotiation | below. It follows from the original TCP-ECN capability negotiation | |||
| [RFC3168], in which the Client set the 2 least significant of the | [RFC3168], in which the Client set the 2 least significant of the | |||
| original reserved flags in the TCP header, and fell back to No ECN | original reserved flags in the TCP header, and fell back to no | |||
| support if the Server responded with the 2 flags cleared, which had | support for ECN if the Server responded with the 2 flags cleared, | |||
| previously been the default. | which had previously been the default. | |||
| Classic ECN used header flags rather than a TCP option because it was | Classic ECN used header flags rather than a TCP Option because it was | |||
| considered more efficient to use a header flag for 1 bit of feedback | considered more efficient to use a header flag for 1 bit of feedback | |||
| per ACK, and this bit could be overloaded to indicate support for | per ACK, and this bit could be overloaded to indicate support for | |||
| Classic ECN during the handshake. During the development of ECN, 1 | Classic ECN during the handshake. During the development of ECN, 1 | |||
| bit crept up to 2, in order to deliver the feedback reliably and to | bit crept up to 2, in order to deliver the feedback reliably and to | |||
| work round some broken hosts that reflected the reserved flags during | work round some broken hosts that reflected the reserved flags during | |||
| the handshake. | the handshake. | |||
| In order to be backward compatible with RFC 3168, AccECN continues | In order to be backward compatible with RFC 3168, AccECN continues | |||
| this approach, using the 3rd least significant TCP header flag that | this approach, using the 3rd least significant TCP header flag that | |||
| had previously been allocated for the ECN-nonce (now historic). | had previously been allocated for the ECN-nonce (now historic). | |||
| Then, whatever form of Server an AccECN Client encounters, the | Then, whatever form of Server an AccECN Client encounters, the | |||
| connection can fall back to the highest version of feedback protocol | connection can fall back to the highest version of feedback protocol | |||
| that both ends support, as explained in Section 3.1. | that both ends support, as explained in Section 3.1. | |||
| If AccECN capability negotiation had used the more orthodox approach | If AccECN capability negotiation had used the more orthodox approach | |||
| of a TCP option, it would still have had to set the two ECN flags in | of a TCP Option, it would still have had to set the two ECN flags in | |||
| the main TCP header, in order to be able to fall back to Classic ECN | the main TCP header, in order to be able to fall back to Classic ECN | |||
| [RFC3168], or to disable ECN support, without another round of | [RFC3168], or to disable ECN support, without another round of | |||
| negotiation. Then AccECN would also have had to handle all the | negotiation. Then AccECN would also have had to handle all the | |||
| different ways that Servers currently respond to settings of the ECN | different ways that Servers currently respond to settings of the ECN | |||
| flags in the main TCP header, including all of the conflicting cases | flags in the main TCP header, including all of the conflicting cases | |||
| where a Server might have said it supported one approach in the flags | where a Server might have said it supported one approach in the flags | |||
| and another approach in a new TCP option. And AccECN would have had | and another approach in a new TCP Option. And AccECN would have had | |||
| to deal with all of the additional possibilities where a middlebox | to deal with all of the additional possibilities where a middlebox | |||
| might have mangled the ECN flags, or removed TCP options. Thus, | might have mangled the ECN flags, or removed TCP Options. Thus, | |||
| usage of the 3rd reserved TCP header flag simplified the protocol. | usage of the 3rd reserved TCP header flag simplified the protocol. | |||
| The third flag was used in a way that could be distinguished from the | The third flag was used in a way that could be distinguished from the | |||
| ECN-nonce, in case any nonce deployment was encountered. Previous | ECN-nonce, in case any nonce deployment was encountered. Previous | |||
| usage of this flag for the ECN-nonce was integrated into the original | usage of this flag for the ECN-nonce was integrated into the original | |||
| ECN negotiation. This further justified the third flag's use for | ECN negotiation. This further justified the third flag's use for | |||
| AccECN, because a non-ECN usage of this flag would have had to use it | AccECN, because a non-ECN usage of this flag would have had to use it | |||
| as a separate single bit, rather than in combination with the other 2 | as a separate single bit, rather than in combination with the other 2 | |||
| ECN flags. | ECN flags. | |||
| skipping to change at line 3240 | skipping to change at line 3251 | |||
| in Section 2.5). | in Section 2.5). | |||
| During traversal testing, it was discovered that the IP-ECN field in | During traversal testing, it was discovered that the IP-ECN field in | |||
| the SYN was mangled on a non-negligible proportion of paths. | the SYN was mangled on a non-negligible proportion of paths. | |||
| Therefore, it was necessary to allow the SYN/ACK to feed all four IP- | Therefore, it was necessary to allow the SYN/ACK to feed all four IP- | |||
| ECN codepoints that the SYN could arrive with back to the Client. | ECN codepoints that the SYN could arrive with back to the Client. | |||
| Without this, the Client could not know whether to disable ECN for | Without this, the Client could not know whether to disable ECN for | |||
| the connection due to mangling of the IP-ECN field (also explained in | the connection due to mangling of the IP-ECN field (also explained in | |||
| Section 2.5). This development consumed the remaining two codepoints | Section 2.5). This development consumed the remaining two codepoints | |||
| on the SYN/ACK that had been reserved for future use by AccECN in | on the SYN/ACK that had been reserved for future use by AccECN in | |||
| earlier versions. | earlier draft versions of this document. | |||
| B.3. Space for Future Evolution | B.3. Space for Future Evolution | |||
| Despite availability of usable TCP header space being extremely | Despite availability of usable TCP header space being extremely | |||
| scarce, the AccECN protocol has taken all possible steps to ensure | scarce, the AccECN protocol has taken all possible steps to ensure | |||
| that there is space to negotiate possible future variants of the | that there is space to negotiate possible future variants of the | |||
| protocol, either if a variant of AccECN is required, or if a | protocol, either if a variant of AccECN is required, or if a | |||
| completely different ECN feedback approach is needed. | completely different ECN feedback approach is needed. | |||
| Future AccECN variants: When the AccECN capability is negotiated | Future AccECN variants: When the AccECN capability is negotiated | |||
| skipping to change at line 3300 ¶ | skipping to change at line 3311 ¶ | |||
| equivalent to AccECN negotiation with (1,1,1) on the SYN. These | equivalent to AccECN negotiation with (1,1,1) on the SYN. These | |||
| codepoints would not allow fall-back to Classic ECN support for a | codepoints would not allow fall-back to Classic ECN support for a | |||
| Server that did not understand them, but this approach ensures | Server that did not understand them, but this approach ensures | |||
| they are available in the future, perhaps for uses other than ECN | they are available in the future, perhaps for uses other than ECN | |||
| alongside the AccECN scheme. All possible combinations of SYN/ACK | alongside the AccECN scheme. All possible combinations of SYN/ACK | |||
| could be used in response except either (0,0,0) or reflection of | could be used in response except either (0,0,0) or reflection of | |||
| the same values sent on the SYN. | the same values sent on the SYN. | |||
| In order to extend AccECN or ECN in the future, other ways could | In order to extend AccECN or ECN in the future, other ways could | |||
| be resorted to, although their traversal properties are likely to | be resorted to, although their traversal properties are likely to | |||
| be inferior. They include a new TCP option; using the remaining | be inferior. They include a new TCP Option; using the remaining | |||
| reserved flags in the main TCP header (preferably extending the | reserved flags in the main TCP header (preferably extending the | |||
| 3-bit combinations used by AccECN to 4-bit combinations, rather | 3-bit combinations used by AccECN to 4-bit combinations, rather | |||
| than burning one bit for just one state); a non-zero urgent | than burning one bit for just one state); a non-zero urgent | |||
| pointer in combination with the URG flag cleared; or some other | pointer in combination with the URG flag cleared; or some other | |||
| unexpected combination of fields yet to be invented. | unexpected combination of fields yet to be invented. | |||
| Acknowledgements | Acknowledgements | |||
| We want to thank Koen De Schepper, Praveen Balasubramanian, Michael | We want to thank Koen De Schepper, Praveen Balasubramanian, Michael | |||
| Welzl, Gorry Fairhurst, David Black, Spencer Dawkins, Michael Scharf, | Welzl, Gorry Fairhurst, David Black, Spencer Dawkins, Michael Scharf, | |||
| End of changes. 136 change blocks. | ||||
| 266 lines changed or deleted | 277 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. | ||||
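
The rule stated in B.3 above, that a SYN/ACK answering a future SYN codepoint could carry any combination "except either (0,0,0) or reflection of the same values sent on the SYN", can be checked by simple enumeration. The following is a minimal sketch, not part of the RFC. The function name `allowed_synack_responses` and the example SYN value (1,0,1) are hypothetical, chosen only for illustration, and the tuples are assumed to be ordered (AE, CWR, ECE).

```python
from itertools import product


def allowed_synack_responses(syn_flags):
    """List the 3-bit SYN/ACK flag combinations left open by the B.3 rule.

    Hypothetical helper: `syn_flags` is assumed to be an (AE, CWR, ECE)
    tuple sent on the SYN. The rule excludes (0,0,0) and a plain
    reflection of the SYN's own values.
    """
    excluded = {(0, 0, 0), tuple(syn_flags)}
    return [combo for combo in product((0, 1), repeat=3) if combo not in excluded]


# Illustrative SYN value only; the text above does not single it out.
responses = allowed_synack_responses((1, 0, 1))
for combo in responses:
    print(combo)
```

For any non-zero SYN value, this leaves six of the eight possible combinations. A plain reflection is excluded presumably because it cannot be told apart from a device that merely echoes the flags back, although the excerpt above does not state that reasoning.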