(D)CUBIC, NewReno and Prague refactoring #1818

Draft
wants to merge 14 commits into base: master

Conversation


@hfstco hfstco commented Jan 14, 2025

WORK IN PROGRESS

Refactor of the CC code. I moved many common parts to cc_common, but I would like to keep the code readable and understandable, so that what a CC algorithm is doing is clear just from looking at its own code. Therefore, some parts stay in the CC code file even if they duplicate code in another CC algorithm (avoiding generic picoquic_cc_do_congestion_avoidance(cc_algo, cc_param) functions).

Additionally, I try to change the CWIN (and other path variables) only inside the CC algorithm's own code file. Most of the common CC functions just return new values, or the number of bytes by which to increase them.
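
For illustration, a minimal sketch of that pattern (the helper name picoquic_cc_slow_start_increase appears in a snippet later in this PR; the surrounding notification function and its exact signature are assumptions):

/* In the algorithm-specific file: the shared cc_common helper only computes
 * the number of bytes to add; the algorithm's own code is the only place
 * where path_x->cwin is actually modified, so reading this file alone shows
 * what the algorithm does to the window. */
static void sample_cc_notify_acknowledgement(picoquic_path_t* path_x, uint64_t nb_delivered)
{
    uint64_t increase = picoquic_cc_slow_start_increase(path_x, nb_delivered);
    path_x->cwin += increase;
}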

While doing some coverage runs, I noticed that CUBIC never actually enters the recovery phase (alg_state is never set to picoquic_cubic_alg_recovery). Even if we call enter_recovery(), we immediately transition to SS or CA. Of course, recovery_sequence is set. Is this intended, or can we remove the recovery state from the enum? Or should alg_state be set to recovery while recovery_sequence is set?

Changes:

  • cubic/dcubic notification callback refactor
  • dcubic calls cubic notification callback except for delay variations
  • CWND increase functions (building on each other) for all CC algorithms ((D)CUBIC, NewReno and Prague) in cc_common
  • app limited for all CC algorithms (as part of CWND increase functions, except NewReno currently)
  • moved some CC common functions (e.g. update_bandwidth()) to cc_common
  • moved common parts of congestion avoidance to cc_common
  • loss filter for Prague
  • introduced HyStart++ constants
  • added some test cases to improve code coverage

TODO:

  • changes that require many adjustments to the test suite (memlog, keylog_test, packet_trace, ready_to_send, ready_to_skip, ready_to_zfin, ready_to_zero, pacing_update, quality_update, multipath_callback, multipath_quality, multipath_stream_af)
       - loss filter in NewReno
       - app limited for NewReno 
  • different betas for (D)CUBIC/calc of beta like in Prague
  • (more test cases to improve coverage)

Any feedback, changes, and help would be appreciated.

@hfstco hfstco linked an issue Jan 14, 2025 that may be closed by this pull request
@huitema huitema commented Jan 14, 2025

Wow! That's a lot of work. I will review later, but this is impressive.

Regarding recovery: yes, that's a bug. We should make sure that recovery is properly entered for cubic, etc. But I share your concern about trying to change too much in a single PR. It is probably better to first complete the code cleanup that you are doing while keeping test results unchanged, then do a series of smaller PRs, each fixing an individual issue. That way, we can separate the refactoring, which should not change test results, from individual improvements, for which we can verify that the test results change in the expected ways.

@hfstco hfstco changed the title (D)CUBIC, NewReno and Prague refactor (D)CUBIC, NewReno and Prague refactoring Jan 14, 2025
@hfstco hfstco requested a review from huitema January 14, 2025 17:44
@huitema huitema left a comment

I left a series of comments, mostly asking for a few comments in the code in places where further action will be needed, plus a suggested name change for one of the cc functions.

I see that all tests are passing, with just a few requiring minor adjustment of the completion time target. For me, that's the key. If you address the comments below, we can probably merge that now. I would not try to solve all problems in this one large PR, because I am concerned about potential merge conflicts if we wait much longer.

We should then open specialized PRs. First, one PR to add support for recovery. Probably another to verify the "app limited" detection and reaction. And another for DCubic, with additional tests to simulate competition on the same link between Cubic (plain) and DCubic, and to investigate potential issues.

/* Otherwise, return current cwin. */
return path_x->cwin;
}

Collaborator

This function is not estimating bandwidth. It is estimating a value of cwin, similar to the "target cwin" used in BBR. Maybe rename to picoquic_cc_target_cwin_estimation or some such?

Collaborator Author

That's true.
I have used the word "increase" for functions that only return the number of bytes by which to increase, and "update" for functions that return the new CWIN.
Is picoquic_cc_update_target_cwin_estimation an option?
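
For instance, the convention could read like this pair of prototypes (only the names follow the discussion above; the exact signatures are assumptions):

/* "increase": returns only the number of bytes to add to the window. */
uint64_t picoquic_cc_slow_start_increase(picoquic_path_t* path_x, uint64_t nb_delivered);
/* "update": returns the new CWIN value itself. */
uint64_t picoquic_cc_update_target_cwin_estimation(picoquic_path_t* path_x);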

/* In consecutive Slow Start. */
return picoquic_cc_slow_start_increase(path_x, nb_delivered / PICOQUIC_HYSTART_PP_CSS_GROWTH_DIVISOR);
}

Collaborator

Is this meant for hystart++ support?

Collaborator Author

Yes, it is.
I copied part of my HyStart++ macros/constants into the code for the CSS version of the increase function.
So the current code defines the recommended growth divisor (= 4, as specified by the HyStart++ RFC) as a macro.
Another option is to change the in_css parameter to a double so that the divisor can be specified from outside the function (more flexible).
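
As a sketch of the macro-based option (assuming the constant name used in this PR; RFC 9406 recommends a CSS growth divisor of 4):

/* HyStart++ (RFC 9406): during Conservative Slow Start the window grows at
 * 1/CSS_GROWTH_DIVISOR of the normal slow-start rate. */
#define PICOQUIC_HYSTART_PP_CSS_GROWTH_DIVISOR 4

/* CSS variant: divide the delivered bytes before handing them to the
 * regular slow-start increase helper. */
uint64_t css_increase = picoquic_cc_slow_start_increase(path_x,
    nb_delivered / PICOQUIC_HYSTART_PP_CSS_GROWTH_DIVISOR);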

Collaborator Author

My next paper will be about HyStart++. I will share my results here, but I can't give a date yet, because I'm currently very busy with careful resume.

@@ -203,7 +281,7 @@ uint64_t picoquic_cc_increased_window(picoquic_cnx_t* cnx, uint64_t previous_win
else {
double w = (double)previous_window;
w /= (double)PICOQUIC_TARGET_RENO_RTT;
w *= (cnx->path[0]->rtt_min > PICOQUIC_TARGET_SATELLITE_RTT)? PICOQUIC_TARGET_SATELLITE_RTT: cnx->path[0]->rtt_min;
w *= (cnx->path[0]->rtt_min > PICOQUIC_TARGET_SATELLITE_RTT) ? PICOQUIC_TARGET_SATELLITE_RTT : (double)cnx->path[0]->rtt_min;
new_window = (uint64_t)w;
Collaborator

You copied the existing code, and that's fine. But it is a bit funky. It is meant to open the window faster when the RTT is long, but since then we came up with "careful resume", which solves a large part of the problem when a previous bandwidth estimate is available, and with increases based on the peak bandwidth estimate, which also provide rapid growth of cwin on these high-delay links. So maybe we should add a note that this code should be revisited and perhaps removed. If we keep it, it would be nice to use integer arithmetic instead of (double).

But let's keep the code as is for now. Maybe add a comment.
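
(For reference, a hedged sketch of the integer-arithmetic variant, using the constants already present in the code; multiplying before dividing keeps precision, and cwin * rtt_us fits comfortably in 64 bits here:)

/* Same scaling as the existing (double) code, in integer arithmetic. */
uint64_t rtt_used = (cnx->path[0]->rtt_min > PICOQUIC_TARGET_SATELLITE_RTT) ?
    PICOQUIC_TARGET_SATELLITE_RTT : cnx->path[0]->rtt_min;
new_window = (previous_window * rtt_used) / PICOQUIC_TARGET_RENO_RTT;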

Collaborator Author

In our careful resume (CR) implementation this part of the code is already removed. :)
https://github.com/hfstco/picoquic/tree/cr

We submitted a paper about CR and I think it will be published in March. It focuses on CR and first measurements on satellite connections (GEO, LEO, ...). I want to share it, but I think the IEEE copyright forbids me to. Maybe I can share a pre-submission version of it. I have to take a look.

But yes, this part of the code can be removed; it just doesn't have to be right now.

*
* Isn't win_cubic always larger than cubic_state->W_reno?
* If clause is unnecessary here.
*/
/* Pick the largest */
Collaborator

Let's review that closely. The Cubic specification assumes a set of "dimensions", such as times in seconds and windows expressed in packets. The picoquic code expects CWIN to be expressed in bytes, with a typical target being "cwin = bandwidth * RTT", with bandwidth in bytes per second and RTT in seconds. In practice it uses "cwin = bandwidth * RTTus / 1000000", since times are measured in microseconds. I think that w_reno and w_cubic are both computed in bytes, and that the test here just follows the spec. I would expect w_reno to be larger than w_cubic when the RTT is small.
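
(A tiny worked example of that dimensioning, not PR code: bandwidth in bytes per second, RTT measured in microseconds:)

uint64_t bandwidth_bytes_per_sec = 1250000; /* 10 Mbps expressed in bytes per second */
uint64_t rtt_us = 100000;                   /* 100 ms RTT, in microseconds */
uint64_t target_cwin = (bandwidth_bytes_per_sec * rtt_us) / 1000000; /* = 125000 bytes, one BDP */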

Collaborator Author

Ok, I wrote this comment early in December. I can't clearly remember anymore and have to check that again.

cubic_state->W_reno = W_cubic * (double)path_x->send_mtu;
cubic_state->ssthresh = (uint64_t)(cubic_state->W_max * cubic_state->beta * (double)path_x->send_mtu);
cubic_state->ssthresh = (uint64_t)(cubic_state->W_max * PICOQUIC_CUBIC_BETA * (double)path_x->send_mtu);
path_x->cwin = (uint64_t)cubic_state->W_reno;
}
}

Collaborator

Not a request to change the code, but a comment. If we implement "recovery", the spurious detection could be tied to exiting recovery: either go back to avoidance through "correct spurious", or switch to avoidance with the current parameters if a packet sent after entering recovery is acked.

Collaborator Author

Ok, in that case it would be good to save the beta used for the CWIN reduction into the CC state, so that the correct values can be recovered.
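
(A hypothetical sketch of that idea; the struct and field names are illustrative, not from this PR:)

/* Remember what was used when entering recovery, so a spurious-loss
 * notification can undo the reduction exactly. */
typedef struct st_sample_recovery_context_t {
    uint64_t cwin_before_reduction; /* window value before the cut */
    double beta_used;               /* beta applied on entering recovery */
} sample_recovery_context_t;

static void sample_correct_spurious(picoquic_path_t* path_x, sample_recovery_context_t* rec)
{
    /* Restore the pre-recovery window when the loss turns out to be spurious. */
    path_x->cwin = rec->cwin_before_reduction;
}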

@@ -1502,6 +1502,8 @@ int initial_pto_srv_test()
int has_initial;
int has_handshake;
uint64_t simulated_time = 0;
//uint64_t simulated_rtt = 20000;
//uint64_t simulated_pto = 4 * simulated_rtt;
Collaborator

I am curious. These two lines are not present in my copy of the code. Is this a code synchronization issue? We could have a merge conflict here, but the fix of the conflict is obvious.

Collaborator Author

Maybe. My IDE told me that these values are not used. That's why I commented them out. I will remove them completely.

Collaborator

I suggest you just remove them. Otherwise, this will be flagged as a merge conflict.

@@ -130,7 +130,8 @@ int l4s_prague_test()
{
picoquic_congestion_algorithm_t* ccalgo = picoquic_prague_algorithm;

int ret = l4s_congestion_test(ccalgo, 1, 3500000, 7, 1500, 0, NULL);
/* TODO increased max_completion_time for 100ms, because of "app limited" changes */
int ret = l4s_congestion_test(ccalgo, 1, 3600000, 7, 1500, 0, NULL);

Collaborator

That increase is probably OK.

Collaborator Author

Yes, app limited slightly increases the max_completion_time of some test cases. To be honest, I haven't checked in which CC era app limited hits.

Collaborator

Let's open an issue about testing of app limited. We should do a set of connection trials, some app limited and some not. Then monitor the "app limited" signals, to verify that they match the state of the application. The issue will capture that, and then we can do a separate PR to develop the tests and correct whatever bugs surface.

return demo_server_test(PICOHTTP_ALPN_HQ_LATEST, picoquic_h09_server_callback, NULL, satellite_test_scenario, nb_satellite_test_scenario,
demo_test_stream_length, 1, 0, 10750000, 0, NULL, NULL, NULL, 0);
demo_test_stream_length, 1, 0, 11000000, 0, NULL, NULL, NULL, 0);
}
Collaborator

That's not a dramatic increase, but it would be nice to understand what triggered it. Just like for the h3zero test.

Collaborator Author

Yes, as mentioned above, I will add a simple printf that tells me the timestamp at which app limited hits. Then we can find out whether app limited hits at the beginning of the connection/SS and prevents the CWIN from growing more quickly, or whether it is spread over the whole connection. I will investigate.
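
(A minimal sketch of such a trace line, illustrative only; it assumes <stdio.h> and <inttypes.h> are included, and that is_app_limited and current_time are available at the call site:)

if (is_app_limited) {
    /* Log when the connection becomes app limited, and the window at that point. */
    printf("app limited at t=%" PRIu64 " us, cwin=%" PRIu64 " bytes\n",
        current_time, path_x->cwin);
}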

Collaborator Author
@hfstco hfstco Jan 15, 2025

That's why I'm careful with changes to the test suite and didn't include app limited for NewReno. I think if we understand what app limited changes in these test cases, we can fix the other failing test cases too.

Collaborator

If you can do it simply, yes do that. But see previous comment about opening an issue for improved testing of app limited, which we will need to do anyhow.

return satellite_test_one(picoquic_bbr_algorithm, 100000000, 5300000, 250, 3, 0, 0, 0, 0, 0, 0);
/* TODO test changed, app limited, verify. */
/* return satellite_test_one(picoquic_bbr_algorithm, 100000000, 5300000, 250, 3, 0, 0, 0, 0, 0, 0); */
return satellite_test_one(picoquic_bbr_algorithm, 100000000, 5500000, 250, 3, 0, 0, 0, 0, 0, 0);
}
Collaborator

That increase is probably OK. The 5.5 second time largely passes the "less than 7 seconds" bar. See previous comment about funky code for increasing CWIN faster with long delays. But the "app limited" comment is surprising. The satellite tests are not supposed to be app limited, ever. We probably need to open an issue to revisit the whole "app limited" detection.

Collaborator Author

See above.

/* TODO check max_completion_time */
return satellite_test_one(picoquic_prague_algorithm, 100000000, 5300000, 250, 3, 0, 0, 0, 1, 0, 0);
}

Collaborator

Wow. Both the dcubic and prague values appear better than BBR! And yes, adding these tests is a good idea.

Collaborator Author

I have added them to cover the seed_cwin parts of prague and dcubic. Maybe there are other test cases that are better suited to cover these parts of the code.
Generally, my goal is to reach more than 90% code coverage.
(coverage screenshot)
Most of the uncovered lines in (D)CUBIC are part of the "recovery alg_state not really set/entered" problem.

Collaborator

First, we do not need to worry too much about test coverage of the tests themselves. By definition, the test code will include branches for potential errors in the main code; if the main code works well, these branches will not be used.

Second, my goal in the previous coverage PR was to get to at least 90% lines and 75% branches. I am not sure that the version on which your PR is based includes the results of that coverage effort. That's one of the reasons why I would like to encourage you to merge this PR sooner rather than later, and open separate issues for the areas that need further work.

Successfully merging this pull request may close these issues:

Refactor Cubic, D-Cubic and Prague code