VoIP Fundamentals • Updated May 17, 2026

RTP explained: Real-time Transport Protocol for VoIP calls

RTP (Real-time Transport Protocol) carries the actual voice audio between VoIP endpoints after SIP establishes the session. RTP packets contain digitised voice samples encoded with a codec and include sequence numbers and timestamps that allow the receiver to reconstruct audio in the correct order.

Audience: Network engineers and VoIP administrators troubleshooting call quality. This guide focuses on operational setup inside the CallOrbit platform.

Understand how VoIP calling works — SIP, PBX, codecs, trunking, DID numbers, STIR/SHAKEN, and the protocols behind business phone systems.

  • Separate signalling from media: SIP handles call setup and teardown. Once the session is negotiated, RTP takes over to transport the audio stream. This means SIP and RTP often traverse different network paths and ports, which affects firewall configuration.
  • Understand RTP packet structure: each packet has a 12-byte fixed header carrying the payload type (which identifies the codec), sequence number (packet ordering), timestamp (playback timing), and SSRC identifier (synchronisation source), followed by the encoded audio payload. A jitter buffer on the receiving end briefly holds incoming packets, puts them back in sequence order, and smooths out variation in arrival time before playback.
  • Troubleshoot RTP issues by symptom: one-way audio (RTP flowing in only one direction), robotic or distorted voices (packet loss exceeding 3-5 percent), echo (improper echo cancellation or acoustic coupling), and audio that cuts in and out (jitter buffer underrun or overrun).
  • Secure RTP with SRTP (Secure RTP): standard RTP sends audio unencrypted, which means anyone on the network path can capture and replay call audio. SRTP adds AES encryption, message authentication, and replay protection without changing the underlying transport behaviour.
  • Configure firewall rules for RTP: SIP signalling typically uses port 5060 (UDP/TCP), or 5061 when carried over TLS, while RTP media uses a dynamic range of UDP ports, commonly 10000-20000 or a narrower range specified by your provider. Firewalls must allow both SIP and RTP traffic for calls to complete.
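
To make the header fields and the packet-loss threshold in the list above concrete, here is a minimal Python sketch. It decodes the 12-byte fixed RTP header and estimates loss from the sequence numbers that actually arrived. The helper names (parse_rtp_header, estimate_loss_percent, build_packet), the sample SSRC, and the hand-built G.711 packets are illustrative assumptions, not part of CallOrbit; real input would come from a packet capture.

```python
import struct
from typing import Iterable, NamedTuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    marker: bool
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header (layout per RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the 12-byte fixed RTP header")
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=first >> 6,          # top two bits; 2 for current RTP
        payload_type=second & 0x7F,  # low seven bits identify the codec
        marker=bool(second & 0x80),  # top bit of the second byte
        sequence=sequence,
        timestamp=timestamp,
        ssrc=ssrc,
    )


def estimate_loss_percent(sequences: Iterable[int]) -> float:
    """Estimate loss from received 16-bit sequence numbers.

    Tolerates reordering and wraparound from 65535 back to 0 by tracking
    an "extended" (unwrapped) sequence number for every packet.
    """
    seqs = list(sequences)
    if not seqs:
        return 0.0
    highest = base = seqs[0]
    seen = {highest}
    for seq in seqs[1:]:
        delta = (seq - highest) & 0xFFFF
        if delta < 0x8000:
            # Packet is ahead of the highest seen so far: move forward.
            extended = highest + delta
            highest = extended
        else:
            # Packet is behind the highest seen: a late, reordered arrival.
            extended = highest - (0x10000 - delta)
        seen.add(extended)
        base = min(base, extended)
    expected = highest - base + 1
    return 100.0 * (expected - len(seen)) / expected


def build_packet(sequence: int) -> bytes:
    """Hypothetical helper: a payload-type-0 (G.711 mu-law) packet, 20 ms of audio."""
    timestamp = (sequence * 160) & 0xFFFFFFFF
    header = struct.pack("!BBHII", 0x80, 0, sequence, timestamp, 0x1234ABCD)
    return header + bytes(160)  # placeholder payload


# Sequence 3 never arrives; 1 and 2 arrive out of order across a wraparound.
packets = [build_packet(seq) for seq in (65534, 65535, 0, 2, 1, 4)]
headers = [parse_rtp_header(p) for p in packets]
loss = estimate_loss_percent(h.sequence for h in headers)
print(headers[0])
print(f"estimated loss: {loss:.1f}%")
```

In this sample one of seven expected packets is missing, so the estimate is about 14 percent, well above the 3-5 percent level at which voices start to sound robotic.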

Who this guide is for

Use this guide when you want the setup to be correct the first time and easy for another admin, manager, or supervisor to verify later.

What this workflow helps you accomplish

This workflow matters because numbers, routing, access, and reporting in CallOrbit are connected. Skipping one setup detail usually creates avoidable support work later.

Operational follow-up

After you complete this flow, confirm the live experience from both the agent and customer side so ownership, routing, permissions, and reporting all match what the workspace expects.

If your team is rolling this out across multiple users, queues, or phone numbers, pair this article with the broader knowledge base and the relevant routing or numbers guides to keep deployment consistent.

  • What is the CallOrbit Knowledge Base for? — It is the public help hub for how CallOrbit works, covering numbers, webphone setup, SIP, extensions, routing, users, roles, and billing basics.
  • Can customers read this without signing in? — Yes. The Knowledge Base now lives on a public route so customers can read setup guidance before or after they enter the portal.
  • Does the portal still have its own Knowledge Base page? — No. The signed-in portal navigation no longer carries a separate Knowledge Base page, and the old portal path now redirects to this public version.
  • What is VoIP and how does it work? — VoIP (Voice over Internet Protocol) converts analogue voice signals into digital packets and transmits them over IP networks. Unlike traditional PSTN phone lines that require dedicated copper wiring per line, VoIP calls use your existing internet connection, which makes them cheaper, more flexible, and easier to scale.
  • What is SIP trunking? — SIP trunking is a virtual connection that replaces traditional analogue phone lines or PRI circuits. A SIP trunk carries multiple concurrent voice channels over a single IP connection to your PBX or phone system, eliminating per-line hardware costs and monthly line rental fees.
  • What is the difference between hosted PBX and cloud PBX? — Hosted PBX runs on dedicated virtual infrastructure managed by a provider, while cloud PBX uses shared multi-tenant cloud infrastructure. Hosted PBX suits organisations needing custom configuration and predictable pricing. Cloud PBX is better for instant scalability and per-user monthly billing.
  • What is a DID number? — A DID (Direct Inward Dialling) number is a virtual phone number that routes directly to a specific extension, IVR menu, queue, or user within a phone system without an operator. DIDs decouple the phone number from the physical phone line, so you can have hundreds of numbers routed through a single SIP trunk.
  • What are G.711, Opus, and G.729 codecs used for? — These are VoIP codecs that convert voice into digital data. G.711 uses 64 Kbps for toll-grade quality and is the PSTN standard. Opus uses 6-510 Kbps and adjusts to network conditions. G.729 uses 8 Kbps for bandwidth-constrained links. The right codec depends on your available bandwidth and call quality requirements.
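
The codec bitrates quoted above describe the audio payload only. On the wire, every packet also carries RTP, UDP, and IP headers, so the bandwidth to budget per call is higher. The sketch below is a rough Python estimate, assuming 20 ms of audio per packet, IPv4, and no layer-2 (Ethernet or VPN) overhead; the function name and defaults are illustrative rather than CallOrbit settings.

```python
# Per-packet header overhead in bytes: RTP (12) + UDP (8) + IPv4 (20).
HEADER_BYTES = 12 + 8 + 20


def wire_bandwidth_kbps(codec_kbps: float, ptime_ms: float = 20.0) -> float:
    """Estimate one-way IP bandwidth for a single call, in kbps.

    codec_kbps is the codec's payload bitrate; ptime_ms is the assumed
    amount of audio carried in each packet.
    """
    if codec_kbps <= 0 or ptime_ms <= 0:
        raise ValueError("codec_kbps and ptime_ms must be positive")
    payload_bytes = codec_kbps * 1000 / 8 * ptime_ms / 1000
    packets_per_second = 1000 / ptime_ms
    return (payload_bytes + HEADER_BYTES) * 8 * packets_per_second / 1000


for name, kbps in (("G.711", 64), ("G.729", 8)):
    print(f"{name}: {kbps} kbps payload, {wire_bandwidth_kbps(kbps):.1f} kbps on the wire")
```

Under these assumptions G.711 needs about 80 kbps in each direction and G.729 about 24 kbps. Opus is left out because its bitrate varies with network conditions.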

Related CallOrbit guides