Performance testing with picoquicdemo

The picoquicdemo program supports multiple applications, one of which is "quic perf", defined by Nick Banks in this draft. To quote from the draft: "The QUIC performance protocol provides a simple, general-purpose protocol for testing the performance characteristics of a QUIC implementation."

The original QUIC Perf protocol was very simple. The client opens a QUIC connection with the ALPN set to "perf", and then opens bidirectional streams. The first 8 bytes sent by the client on each stream encode the size of the data that the server will send on the return stream. This can be used to measure batch performance, by simply requesting a large amount of data and measuring how long it takes to receive it. It can also be used to measure transactional applications: open a large number of streams, request a small amount of data on each, and measure how long it takes to process that many query-response exchanges.
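
As a minimal sketch of the classic request described above, the following Python snippet builds the 8-byte prefix that a client would write at the start of a stream. The big-endian (network) byte order and the function name `encode_batch_request` are assumptions made for illustration; the draft remains the authoritative reference for the encoding.

```python
import struct


def encode_batch_request(response_size: int) -> bytes:
    """Return the 8-byte prefix asking the server for `response_size` bytes.

    Assumption: the size is an unsigned 64-bit integer in network byte order.
    """
    if not 0 <= response_size < 2**64:
        raise ValueError("response size must fit in 64 bits")
    return struct.pack(">Q", response_size)


prefix = encode_batch_request(1_000_000)
print(len(prefix), prefix.hex())
```

Any bytes that the client sends after this prefix would count as the "post" part of the exchange.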

Recently, we extended this simple protocol to also test "real time" workloads, such as those generated by "media over QUIC" (MoQ). In the extension, we reserve two specific 4-byte values:

  • if the first 4 bytes are set to 0xFFFFFFFE, the next 12 bytes instruct the server to respond by sending on the return stream a series of "media frames" of specified length and frequency.
  • if the first 4 bytes are set to 0xFFFFFFFD, the next 12 bytes instruct the server to respond by sending a series of datagrams with a specified length and frequency.

By opening multiple bidirectional streams, the client can create load patterns that emulate the traffic of an audio/video server.

Support in picoquicdemo server

By default, the picoquic demo server supports the QUIC "perf" protocol. Simply set up a connection with ALPN "perf", and have the client open bidirectional streams for classic batch requests, for media streams, or for datagrams.

Support in picoquicdemo client

The picoquic demo client can be directed to use the QUIC perf protocol and request batch streams, media streams or datagram streams. The "performance" scenario needs to be specified on the command line, as in:

.\picoquicdemo -a perf test.privateoctopus.com 4433 <scenario description>

Where -a perf means set ALPN to "perf", and use the quicperf protocol.

The scenario description is composed of a series of stream descriptions, separated by semicolons:

scenario = stream_description |  stream_description ';' *scenario

Each stream description contains an ordered set of parameters, specifying the details of what is expected:

  • Alphanumerical identifier of the stream,
  • identifier of the "previous stream",
  • repeat count,
  • type of stream and frequency,
  • for batch streams:
    • post size (bytes sent by the client) and
    • response size (bytes sent by the server)
  • For media or datagram streams:
    • frequency, i.e., number of frames per second
    • priority,
    • number of frames,
    • frame size
    • number of frames per group (not used in datagram streams)
    • size of first frame in the group (not used for datagrams)
    • reset delay in milliseconds

Many of these fields are optional:

  • if there is no stream identifier, the reports use one constructed from the rank of the stream description in the scenario, e.g., "#3"
  • if a previous stream is specified, the stream will start after the completion of that previous stream. If none is specified, the stream will start immediately.
  • if a repeat count is specified, the client will try to initiate that many copies of the stream in parallel. If not, it opens just one stream.
  • if the priority is not specified, the default value for picoquic will be used.
  • if the number of frames is not specified, there will be just one frame.
  • if the number of frames per group is not specified, there will be just one group.
  • if the size of the first frame is not specified, all frames will have the same size.
  • if the reset delay is specified, a group will be abandoned if it falls behind by that delay.

The formal syntax is:

scenario = stream_description |  stream_description ';' *stream_description

stream_description =
     [ '=' id [':' previous-stream-id ':' ]]['*' repeat_count ':']
     { batch_stream | media_stream | datagram_stream }

id = alphanumeric-string | '-'

previous-stream-id = alphanumeric-string

batch_stream = post_size ':' response_size

media_stream = 'm' media_description

datagram_stream = 'd' media_description

media_description = frequency  ':' [ 'n' nb_frames ':' ] frame_size ':' 
              [ group_description ':' ] [ first_frame ':'] [ reset_delay ':' ]

group_description = 'G' frames_per_group

first_frame = ['I' first_frame_size ]

reset_delay = ['D' reset_delay_in_ms ]

Examples of scenarios could be:

batch_scenario = "=b1:*1:397:1000000;"
datagram_scenario = "=a1:d50:n250:100;"
media_scenario = "=v1:s30:n150:2000:G30:I20000;"
multimedia_scenario = "=a1:d50:p2:S:n250:80; \
     = vlow: s30 :p4:S:n150 : 3750 : G30 : I37500; \
     = vmid: s30 :p6:S:n150 : 6250 : G30 : I62500 : D250000; \
     = vhi: s30 :p8:S: n150 : 12500 : G150 : I125000 : D250000;"
parallel_multimedia_scenario= "=a1:d50:p2:S:n250:80; \
     = vlow:*3:s30 :p4:S:n150 : 3750 : G30 : I37500; \
     = vmid:*3:s30 :p6:S:n150 : 6250 : G30 : I62500 : D300000; \
     = vhi:*3 : s30 :p8:S: n150 : 12500 : G150 : I125000 : D250000;"
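
To make the field layout of these examples more concrete, here is a small sketch that splits a scenario string into stream descriptions and colon-separated fields. It is not the parser used by picoquicdemo: the helper names `split_scenario` and `stream_id` are hypothetical, and the sketch only tokenizes the text without validating it against the formal syntax.

```python
def split_scenario(scenario: str) -> list[list[str]]:
    """Split a scenario into stream descriptions, each a list of fields.

    Illustrative only: whitespace and line-continuation backslashes are
    dropped, and empty descriptions (e.g. after a trailing ';') are skipped.
    """
    streams = []
    for description in scenario.replace("\\", "").split(";"):
        fields = [field.strip() for field in description.split(":")]
        fields = [field for field in fields if field]
        if fields:
            streams.append(fields)
    return streams


def stream_id(fields: list[str], rank: int) -> str:
    """Return the explicit '=id' if present, else the rank-based default."""
    if fields[0].startswith("="):
        return fields[0][1:].strip()
    return f"#{rank}"


for rank, fields in enumerate(split_scenario("=a1:d50:p2:S:n250:80; *1:397:1000000;")):
    print(stream_id(fields, rank), fields)
```

The rank passed to `stream_id` is chosen by the caller, since the text does not state whether the default identifiers start at "#0" or "#1".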

To run the "perf" protocol and run a basic test scenario, do:

.\picoquicdemo -a perf test.privateoctopus.com 4433 "*1:397:5000000;"

When used as a client, the program will display statistics, e.g.:

Connection_duration_sec: 4.425348
Nb_transactions: 10000
Upload_bytes: 1000000
Download_bytes: 1000000
TPS: 2259.709293
Upload_Mbps: 1.807767
Download_Mbps: 1.807767
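
The summary values appear to be simple ratios of the counters over the connection duration. The sketch below reproduces the sample numbers under that assumption, taking 1 Mbps as 10^6 bits per second. The function name `perf_summary` is hypothetical, and the formulas are inferred from the sample output rather than taken from the picoquic source.

```python
def perf_summary(duration_sec: float, nb_transactions: int,
                 upload_bytes: int, download_bytes: int) -> dict[str, float]:
    """Derive rate statistics from the raw counters (assumed formulas)."""
    if duration_sec <= 0:
        raise ValueError("duration must be positive")
    return {
        "TPS": nb_transactions / duration_sec,
        "Upload_Mbps": upload_bytes * 8 / duration_sec / 1e6,
        "Download_Mbps": download_bytes * 8 / duration_sec / 1e6,
    }


summary = perf_summary(4.425348, 10000, 1_000_000, 1_000_000)
for name, value in summary.items():
    print(f"{name}: {value:.6f}")
```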

For more detailed statistics, or for gathering statistics on servers, picoquicdemo can provide performance logs; see the "Performance logs" section.

picoquicdemo accepts many other arguments, most of which are not needed for running quicperf. When debugging, consider collecting QUIC logs with the -q option. The "-h" option prints the list of command line arguments:

    .\picoquicdemo -h

Performance logs

When doing performance measurements, the natural instinct is to turn off all logging, because writing logs slows down program execution. On the other hand, it is very useful to have at least some logging, in order to understand what changes from run to run, and what might have affected performance. The performance logs are designed to minimize the interference. The data is written to disk at the end of the connection. If the performance test involves multiple simultaneous connections, the server will keep the data in memory and write it to disk when all connections are complete.

To produce the performance logs with picoquicdemo, use the argument -F as in:

.\picoquicdemo -k key.pem -c cert.pem -p 4433 -F server_log.csv
.\picoquicdemo -F client_log.csv -a perf test.privateoctopus.com 4433 "*1:397:5000000;"

The performance logs are formatted as a CSV file, with the following columns:

  • Log_v: Performance log version
  • PQ_v: Picoquic version
  • Duration: Time from start to finish, seconds
  • Sent: Number of bytes sent
  • Received: Number of bytes received
  • Mpbs_S: Sending rate, Mbps
  • Mbps_R: Receive rate, Mbps
  • QUIC_v: QUIC version
  • ALPN: ALPN
  • CNX_ID: Initial connection ID (same for client and server)
  • T64: Start time, in 64 bit format, in microseconds
  • is_client: 1 if client, 0 if server
  • pkt_recv: Number of packets received
  • trains_s: Number of packet trains sent
  • t_short: Number of packet trains shorter than target
  • tb_cwin: Number of packet trains shorter because of CWIN
  • tb_pacing: Number of packet trains shorter because of pacing
  • tb_others: Number of packet trains shorter for other reasons
  • pkt_sent: Number of packets sent
  • retrans.: Number of packets retransmitted
  • spurious: Number of spurious retransmissions
  • delayed_ack_option: 1 if delayed ack negotiated, 0 otherwise
  • min_ack_delay_remote: Minimum ack delay set by peer (microsecond)
  • max_ack_delay_remote: Maximum ack delay set by peer (microsecond)
  • max_ack_gap_remote: Maximum ack gap set by peer
  • min_ack_delay_local: Minimum ack delay required from peer (microsecond)
  • max_ack_delay_local: Maximum ack delay required from peer (microsecond)
  • max_ack_gap_local: Maximum ack gap required from peer
  • max_mtu_sent: Maximum sender MTU
  • max_mtu_received: Largest packet received
  • zero_rtt: 1 if zero rtt was negotiated
  • srtt: Smoothed RTT at end of connection
  • minrtt: Min RTT at end of connection
  • cwin: Largest CWIN during connection
  • ccalgo: Congestion control algorithm
  • bwe_max: Largest bandwidth estimate (bytes per second)
  • p_quantum: Largest pacing quantum
  • p_rate: Largest pacing rate (bytes per second)

Extended PERF protocol

The standard Perf protocol uses bidirectional streams in a very simple way: the client opens a stream and starts sending data; the server reads the number of required bytes in the first 8 bytes of the client stream, and sends that many bytes to the client. We extend this protocol by using unidirectional streams and datagrams.

The extended Perf protocol also uses bidirectional streams. The first 16 bytes sent by the client encode the type of response expected by the client. The first 8 bytes use reserved values to differentiate these streams from the standard "batch" stream:

  • The most significant 32 bits contain the value 0xFFFFFFFD to indicate a "media" request, or 0xFFFFFFFE to indicate a datagram request.
  • The lower 32 bits contain the size of the frames.

The complete set of 16 bytes is defined as:

media request header {
     media or datagram mark (32),
     frame size (32),
     priority (8),
     frequency (8),
     number of frames (24),
     first frame size (24)
}
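
The field widths above add up to 128 bits, i.e., the 16 bytes of the request. As an illustration, the sketch below packs the six fields in the listed order. Big-endian byte order is an assumption, the function name `encode_media_request` is hypothetical, and the mark is passed in by the caller, who should pick the reserved value matching the kind of request (this document lists 0xFFFFFFFD and 0xFFFFFFFE).

```python
def encode_media_request(mark: int, frame_size: int, priority: int,
                         frequency: int, nb_frames: int,
                         first_frame_size: int) -> bytes:
    """Pack the 16-byte request header (assumed big-endian field encoding)."""
    # (value, width in bytes), in the order given by the header definition.
    fields = [
        (mark, 4),
        (frame_size, 4),
        (priority, 1),
        (frequency, 1),
        (nb_frames, 3),
        (first_frame_size, 3),
    ]
    header = b""
    for value, width in fields:
        if not 0 <= value < 1 << (8 * width):
            raise ValueError(f"value {value} does not fit in {width} bytes")
        header += value.to_bytes(width, "big")
    return header


header = encode_media_request(0xFFFFFFFD, 2000, 4, 30, 150, 20000)
print(len(header), header.hex())
```

Note that the 8-bit frequency field limits requests to 255 frames per second, and the 24-bit fields limit the frame count and the first frame size to 16,777,215.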

Upon receiving a request header, the server will start sending frames as specified by the frequency. If the client requested datagrams, the server will send datagrams as specified by the frequency. The first datagram (frame number 0) will be sent immediately. The other datagrams will be sent at:

datagram_send_time = first_datagram_send_time + frame_number*1_second/frequency
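
As a worked example of this schedule, the sketch below computes the send times in microseconds. The function name `send_times_us` is hypothetical, and the use of integer division (rounding down to the microsecond) is an assumption; the formula itself does not say how fractions are handled.

```python
def send_times_us(first_send_time_us: int, frequency: int,
                  nb_frames: int) -> list[int]:
    """Return the scheduled send time of each frame, in microseconds."""
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    # Computing from the frame number avoids accumulating rounding errors.
    return [first_send_time_us + (frame_number * 1_000_000) // frequency
            for frame_number in range(nb_frames)]


print(send_times_us(0, 50, 4))
print(send_times_us(0, 30, 4))
```

At 50 frames per second the interval is exactly 20 ms; at 30 frames per second the interval is not a whole number of microseconds, which is why each time is derived from the frame number instead of by repeatedly adding a rounded interval.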

Each datagram will carry a header and a payload, with a combined size set to the requested frame size. (The first frame size parameter is ignored for datagrams.) The first bytes of the datagram contain a header encoded as:

datagram header {
    request stream ID (i),
    frame number (i),
    datagram send time (64)
}

The datagram send time is the local time at the server, encoded in microseconds. When all datagrams have been sent, the server closes the media request stream.
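
The following sketch shows one way to build such a datagram. It assumes that the "(i)" notation denotes the variable-length integer encoding of RFC 9000, that the 64-bit send time is big-endian, and that the payload is zero-filled padding; the helper names `encode_varint` and `build_datagram` are hypothetical.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Encode an integer as a QUIC variable-length integer (RFC 9000)."""
    if value < 0:
        raise ValueError("varint must be non-negative")
    if value < 1 << 6:
        return value.to_bytes(1, "big")
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")
    if value < 1 << 30:
        return (value | 0x80000000).to_bytes(4, "big")
    if value < 1 << 62:
        return (value | 0xC000000000000000).to_bytes(8, "big")
    raise ValueError("varint must be below 2^62")


def build_datagram(stream_id: int, frame_number: int,
                   send_time_us: int, frame_size: int) -> bytes:
    """Build header plus padding so that the total length is frame_size."""
    header = (encode_varint(stream_id) + encode_varint(frame_number)
              + struct.pack(">Q", send_time_us))
    if len(header) > frame_size:
        raise ValueError("frame size too small for the datagram header")
    return header + bytes(frame_size - len(header))


datagram = build_datagram(stream_id=4, frame_number=1,
                          send_time_us=20000, frame_size=100)
print(len(datagram), datagram[:10].hex())
```

Because the two leading fields are variable-length, the header takes between 10 and 24 bytes, so the requested frame size must leave room for it.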

If the client requested a "media" stream, the server will send the requested number of frames on the return side of the bidirectional stream that carried the client request. The first frame contains "first frame size" bytes, while the other frames contain "frame size" bytes. The first frame is queued on the stream immediately. The next frames will be queued at:

frame_send_time = first_frame_send_time + frame_number*1_second/frequency

The first 8 bytes of each frame carry the frame_send_time, set at the local time at which the server queued the frame, expressed in microseconds and encoded on 64 bits.

The client may issue a stop sending request for a specific media request stream. Upon receiving the request, the server will reset the stream, without sending any additional frame.