What Is RTCP? Functions, Packets, and Audio Sync

RTCP (Real-time Transport Control Protocol) is a network protocol that monitors the quality of audio and video streams during real-time communication. It works alongside its companion protocol, RTP (Real-time Transport Protocol), which carries the actual media data. While RTP handles delivering your voice or video, RTCP acts as the feedback channel, reporting on how well that delivery is going. You encounter RTCP every time you join a video call, stream a live broadcast, or use VoIP.

How RTCP Relates to RTP

RTP and RTCP are two halves of the same system, defined together in the same networking standard (RFC 3550). RTP carries data with real-time properties, like the audio from your microphone or the video from your camera. RTCP sits beside it, providing quality monitoring and minimal control functions. Neither protocol works well without the other: RTP delivers the media, and RTCP tells everyone involved how that media is arriving.

Both protocols run over UDP, the lightweight transport layer that prioritizes speed over guaranteed delivery. In a typical session, RTP and RTCP each use their own UDP port: by convention, RTP takes an even-numbered port and RTCP the next higher odd-numbered one. For bidirectional streams like video calls, the recommended setup is “symmetric” mode, where a device sends and receives on the same port and IP address. This simplifies communication, especially when firewalls or network address translation are involved.

The Four Functions of RTCP

RTCP serves four distinct purposes in a media session:

  • Quality feedback. This is the primary function. RTCP packets report statistics like packet loss, delay, and jitter back to the sender. This lets the sender adapt in real time, for example by lowering video resolution when the network is congested.
  • Source identification. RTCP carries a persistent identifier called a CNAME (canonical name) for each media source. This allows participants and applications to track who is sending what, even if the source’s numeric stream identifier (its SSRC) changes mid-session.
  • Scaling control. Every participant in a session sends RTCP packets, which means the volume of control traffic could explode in large groups. To prevent this, each participant estimates the total number of participants and automatically reduces how often it sends reports. The more people in the session, the less frequently each one reports.
  • Session control (optional). RTCP can carry minimal session information, like a participant’s display name for the user interface. This function is optional and kept lightweight by design.
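
To make the quality feedback function concrete, here is a minimal sketch of two statistics a receiver computes for its reports, following the formulas in RFC 3550: the 8-bit loss fraction and the running interarrival jitter estimate. The function names and the sample arrival times are illustrative assumptions, not part of any library.

```python
def fraction_lost(expected: int, received: int) -> int:
    """Loss since the last report as an 8-bit fixed-point fraction (n / 256)."""
    lost = expected - received
    if expected <= 0 or lost <= 0:
        return 0
    # Clamp to 255 so the value always fits in one byte (a choice made here).
    return min(255, (lost << 8) // expected)


def update_jitter(jitter, prev_transit, arrival, rtp_timestamp):
    """One step of the RFC 3550 jitter estimator.

    arrival and rtp_timestamp must both be in RTP timestamp units.
    Returns the new jitter and the transit value to pass to the next call.
    """
    transit = arrival - rtp_timestamp
    if prev_transit is None:          # first packet: nothing to compare against
        return jitter, transit
    d = abs(transit - prev_transit)   # change in relative transit time
    jitter += (d - jitter) / 16.0     # smoothing gain of 1/16
    return jitter, transit


# Hypothetical 8 kHz audio stream, one packet every 160 ticks (20 ms).
# The third packet arrives 10 ticks late.
jitter, transit = 0.0, None
for arrival, ts in [(0, 0), (160, 160), (330, 320), (480, 480)]:
    jitter, transit = update_jitter(jitter, transit, arrival, ts)

print(jitter)                  # 1.2109375
print(fraction_lost(100, 95))  # 12, i.e. 12/256, about 4.7%
```

The 1/16 gain smooths the estimate, so a single late packet nudges the reported jitter rather than spiking it.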

RTCP Packet Types

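RTCP communicates through several packet types, each carrying different information. RFC 3550 defines five: Sender Reports, Receiver Reports, Source Description packets, BYE packets, and application-defined (APP) packets. These are typically bundled together into a “compound packet” before being sent.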
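
As an illustration of how a compound packet is laid out, the sketch below walks a received UDP payload and splits it into its individual RTCP packets. It relies only on the common 4-byte header from RFC 3550 (version, count, packet type, and a length given in 32-bit words minus one). The function name and the sample bytes are assumptions made for this example.

```python
import struct

# Packet type codes assigned in RFC 3550.
PACKET_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP"}


def split_compound(datagram: bytes):
    """Return a list of (type_name, count, raw_bytes) for each RTCP packet."""
    packets = []
    offset = 0
    while offset + 4 <= len(datagram):
        first, pt, length_words = struct.unpack_from("!BBH", datagram, offset)
        if first >> 6 != 2:
            raise ValueError("not an RTCP version 2 packet")
        size = (length_words + 1) * 4      # length field counts words minus one
        if offset + size > len(datagram):
            raise ValueError("packet length runs past the end of the datagram")
        count = first & 0x1F               # report count or source count
        name = PACKET_TYPES.get(pt, str(pt))
        packets.append((name, count, datagram[offset:offset + size]))
        offset += size
    return packets


# Hand-built example: an empty RR followed by an SDES carrying CNAME "a@b".
rr = struct.pack("!BBHI", 0x80, 201, 1, 0x1234)
sdes = struct.pack("!BBHI", 0x81, 202, 3, 0x1234) + b"\x01\x03a@b" + b"\x00\x00\x00"
for name, count, raw in split_compound(rr + sdes):
    print(name, count, len(raw))
# RR 0 8
# SDES 1 16
```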

Sender Reports and Receiver Reports

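Sender Reports (SR) come from participants who are actively transmitting media. They include sending statistics like how many packets and bytes have been sent, along with timestamps that help synchronize streams. Receiver Reports (RR) come from participants who are only receiving. Both report types carry reception report blocks, one per source being received, with reception quality data: the fraction of packets lost since the last report, the cumulative number lost, the highest sequence number received, and the interarrival jitter. Round-trip delay is not reported directly; each block instead echoes the timestamp of the last SR received from that source and the delay since it arrived, which lets the original sender compute the round-trip time itself. Every compound RTCP packet must start with either an SR or RR, even if no data has been sent or received yet (in which case an empty RR is used). A single SR or RR holds at most 31 report blocks, so a participant receiving from more than 31 sources appends additional RR packets to cover the overflow.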
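
A rough sketch of what one reception report block looks like on the wire may help here. It assumes the 24-byte layout from RFC 3550 (source SSRC, loss fields, highest sequence number, jitter, and the LSR and DLSR fields). Round-trip time is not a field in the block: the sender derives it from LSR (the middle 32 bits of the NTP timestamp of the last SR it sent) and DLSR (the receiver's delay since that SR arrived), both in units of 1/65536 second. The class and function names are illustrative.

```python
import struct
from typing import NamedTuple, Optional


class ReportBlock(NamedTuple):
    ssrc: int               # source this block reports on
    fraction_lost: float    # loss since the previous report, 0.0 to 1.0
    cumulative_lost: int    # signed 24-bit total
    highest_seq: int        # extended highest sequence number received
    jitter: int             # interarrival jitter in RTP timestamp units
    lsr: int                # last SR timestamp (middle 32 bits of NTP time)
    dlsr: int               # delay since last SR, in 1/65536 s


def parse_report_block(data: bytes) -> ReportBlock:
    ssrc, loss, highest_seq, jitter, lsr, dlsr = struct.unpack("!6I", data[:24])
    cumulative = loss & 0xFFFFFF
    if cumulative & 0x800000:            # sign-extend the 24-bit value
        cumulative -= 0x1000000
    return ReportBlock(ssrc, (loss >> 24) / 256.0, cumulative,
                       highest_seq, jitter, lsr, dlsr)


def round_trip_seconds(arrival: int, block: ReportBlock) -> Optional[float]:
    """arrival is the middle 32 bits of the NTP time the report was received."""
    if block.lsr == 0:                   # receiver has not seen an SR yet
        return None
    units = (arrival - block.lsr - block.dlsr) & 0xFFFFFFFF
    return units / 65536.0


# Hypothetical block: 25% recent loss, 5 packets lost in total, SR sent at
# t = 1.0 s and held by the receiver for 0.5 s before it reported back.
raw = struct.pack("!6I", 0xAABBCCDD, (64 << 24) | 5, 1000, 12, 0x10000, 0x8000)
block = parse_report_block(raw)
print(block.fraction_lost, block.cumulative_lost)   # 0.25 5
print(round_trip_seconds(0x20000, block))           # 0.5
```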

Source Description Packets

Source Description (SDES) packets carry identifying information about each media source. Every compound RTCP packet must include an SDES packet containing at least the CNAME. Other details, like a human-readable name or email address, can be included optionally depending on what the application needs and how much bandwidth is available.
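
For illustration, a minimal SDES packet with a single CNAME item could be assembled as follows. This sketch assumes the RFC 3550 layout: each chunk starts with the SSRC, items are encoded as type, length, text (CNAME is item type 1), and the item list ends with a null byte and is padded to a 32-bit boundary. The function name and the sample CNAME are made up for the example.

```python
import struct

SDES_PACKET_TYPE = 202
CNAME_ITEM = 1


def build_sdes_cname(ssrc: int, cname: str) -> bytes:
    text = cname.encode("utf-8")
    if len(text) > 255:
        raise ValueError("SDES item text is limited to 255 bytes")
    chunk = struct.pack("!I", ssrc) + bytes([CNAME_ITEM, len(text)]) + text
    chunk += b"\x00"                       # null item terminates the list
    chunk += b"\x00" * (-len(chunk) % 4)   # pad the chunk to a 32-bit boundary
    # Header: version 2, no padding, source count 1. The length field is the
    # packet size in 32-bit words minus one, which equals the chunk's words.
    header = struct.pack("!BBH", 0x80 | 1, SDES_PACKET_TYPE, len(chunk) // 4)
    return header + chunk


packet = build_sdes_cname(0x1234, "alice@example.com")
print(len(packet))   # 28
```

Optional items such as a display name would be appended to the same chunk before the terminating null byte.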

BYE Packets

A BYE packet signals that a participant is leaving the session. This allows other participants to update their internal tracking and adjust their RTCP reporting intervals accordingly.

How RTCP Enables Audio-Video Sync

One of RTCP’s most practical roles is keeping audio and video in sync during a video call or stream. Audio and video travel as separate RTP streams, each with its own timestamps, which start at a random offset and advance at the stream’s own clock rate. RTP timestamps alone therefore give a receiver no way to align the two streams. RTCP Sender Reports solve this by including two timestamps for each stream: the stream’s own RTP timestamp and the corresponding time on a common wall clock (expressed in the timestamp format of NTP, the standard internet time protocol). By comparing these paired timestamps from the audio and video streams, the receiver can map both streams onto the same timeline and play them back in sync. Without this mechanism, you would see lips moving out of time with the words you hear.
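
The mapping can be sketched in a few lines. Given the timestamp pair from a stream's most recent Sender Report and that stream's clock rate, any RTP timestamp can be converted to wall-clock time; packets from different streams that land on the same wall-clock time belong together. The clock rates (48 kHz audio, 90 kHz video) and all timestamp values below are assumptions chosen for the example.

```python
from typing import NamedTuple


class SenderReportClock(NamedTuple):
    ntp_time: float      # wall-clock seconds taken from the SR's NTP timestamp
    rtp_timestamp: int   # RTP timestamp the SR pairs with that instant
    clock_rate: int      # RTP ticks per second for this stream


def ntp_to_seconds(ntp_seconds: int, ntp_fraction: int) -> float:
    """Convert the SR's two 32-bit NTP words to seconds."""
    return ntp_seconds + ntp_fraction / 2**32


def wallclock_for(rtp_timestamp: int, sr: SenderReportClock) -> float:
    # RTP timestamps are 32-bit and wrap, so take the difference modulo 2**32
    # and read it as signed; this also handles packets slightly older than the SR.
    delta = (rtp_timestamp - sr.rtp_timestamp) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr.ntp_time + delta / sr.clock_rate


audio_sr = SenderReportClock(ntp_to_seconds(1000, 0), 160_000, 48_000)
video_sr = SenderReportClock(ntp_to_seconds(1000, 2**31), 4_500_000, 90_000)

audio_time = wallclock_for(160_000 + 48_000, audio_sr)   # 1 s after the audio SR
video_time = wallclock_for(4_500_000 + 45_000, video_sr) # 0.5 s after the video SR
print(audio_time, video_time)   # 1001.0 1001.0
```

Here the audio packet and the video frame resolve to the same instant, so the receiver would schedule them for playback together even though their raw RTP timestamps share nothing.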

How RTCP Scales in Large Sessions

A video call between two people generates very little RTCP traffic. A live broadcast with thousands of viewers is a different story. If every participant sent reports at the same fixed rate, RTCP traffic would consume an ever-growing share of bandwidth as the group expanded.

RTCP handles this with a self-regulating algorithm. Each participant monitors incoming RTCP packets from others and uses that information to estimate the total number of participants. As that number grows, each participant spaces its own reports further apart. The result is that total RTCP bandwidth stays roughly constant regardless of group size; RFC 3550 recommends fixing it at 5% of the session’s bandwidth. This is why RTCP works just as well in a two-person call as it does in a webinar with thousands of attendees.
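
The core of that calculation can be sketched as below. This is a simplified version of the interval computation in RFC 3550: it keeps the 5% bandwidth share, the split that reserves a quarter of it for senders when they are a small minority, and the 5-second minimum, but it deliberately omits the randomization and timer reconsideration steps a real implementation needs. The function name and the example figures are assumptions.

```python
def rtcp_interval(session_bw_bps: float, members: int, senders: int,
                  we_sent: bool, avg_rtcp_size: float,
                  rtcp_fraction: float = 0.05,
                  min_interval: float = 5.0) -> float:
    """Deterministic reporting interval in seconds (no randomization)."""
    rtcp_bw = session_bw_bps * rtcp_fraction / 8.0   # bytes per second
    n = members
    if senders <= members * 0.25:
        if we_sent:
            rtcp_bw *= 0.25          # senders share a quarter of the budget
            n = senders
        else:
            rtcp_bw *= 0.75          # receivers share the rest
            n = members - senders
    interval = avg_rtcp_size * n / rtcp_bw
    return max(interval, min_interval)


# 1 Mbit/s session, 100-byte average compound packet.
print(rtcp_interval(1_000_000, 2, 2, True, 100))          # 5.0
big = rtcp_interval(1_000_000, 10_000, 1, False, 100)
print(round(big, 3))                                      # 213.312
print(9_999 * 100 / big)                                  # 4687.5
```

In the large session each viewer reports only about once every three and a half minutes, yet the combined receiver traffic works out to 4687.5 bytes per second, exactly the receivers' 75% share of the 5% budget.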

Secure RTCP (SRTCP)

Standard RTCP packets travel unencrypted, which means anyone intercepting them could read the session metadata or tamper with the reports. Secure RTCP, or SRTCP, is the encrypted version defined alongside SRTP (Secure Real-time Transport Protocol) in RFC 3711. It adds three layers of protection: confidentiality (encrypting the packet contents), message authentication (verifying packets haven’t been altered), and replay protection (preventing old packets from being reinjected into a session).

SRTCP appends several fields to each packet, including an authentication tag and an index number used for replay detection. A single-bit flag (the E-flag) indicates whether a given packet’s payload is encrypted or sent in the clear, since encryption is optional on a per-packet basis. Message authentication, however, is mandatory. This ensures that even if you choose not to encrypt your RTCP data, no one can forge or modify the control reports that keep the session running correctly.
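
To show where these fields sit, the sketch below splits a received SRTCP packet into its parts without performing any cryptography. It assumes the RFC 3711 layout in which the trailer is a 32-bit word (the E-flag in the top bit, the index in the remaining 31 bits), an optional master key identifier (MKI), and the authentication tag. The 10-byte tag length and the absent MKI are assumptions matching a common configuration; real sessions negotiate both, and a real receiver would verify the tag before trusting anything else.

```python
import struct


def split_srtcp(packet: bytes, tag_len: int = 10, mki_len: int = 0):
    """Return (rtcp_part, encrypted, index, mki, auth_tag)."""
    trailer_len = 4 + mki_len + tag_len
    if len(packet) < 8 + trailer_len:       # 8 = RTCP header plus sender SSRC
        raise ValueError("packet too short to be SRTCP")
    start = len(packet) - trailer_len
    (word,) = struct.unpack_from("!I", packet, start)
    encrypted = bool(word >> 31)            # E-flag
    index = word & 0x7FFFFFFF               # SRTCP index used for replay checks
    mki = packet[start + 4:start + 4 + mki_len]
    auth_tag = packet[len(packet) - tag_len:]
    return packet[:start], encrypted, index, mki, auth_tag


# Hand-built example: an 8-byte RTCP part, E-flag set, index 5, dummy tag.
sample = (b"\x80\xc9\x00\x01\x00\x00\x12\x34"
          + struct.pack("!I", 0x80000005)
          + b"T" * 10)
rtcp_part, encrypted, index, mki, tag = split_srtcp(sample)
print(encrypted, index, len(tag))   # True 5 10
```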

Where RTCP Is Used

RTCP operates behind the scenes in most real-time communication technologies. Video conferencing platforms like Zoom, Microsoft Teams, and Google Meet all rely on RTP/RTCP. VoIP phone systems use it to monitor call quality and detect degradation before it becomes noticeable. Live streaming setups, IP security cameras, and internet radio broadcasts all use RTCP to maintain quality and synchronization. WebRTC, the technology that powers browser-based video and voice communication, includes RTCP as a core component of its media stack. Whenever you experience a video call that automatically adjusts its quality based on your network conditions, RTCP feedback is driving that adaptation.