Basic VoIP Protocols


Signaling Protocols

SIP (Session Initiation Protocol)

SIP is the industry standard. For more information, see the dedicated page:

SIP (Session Initiation Protocol)

MGCP (Media Gateway Control Protocol)

MGCP (Media Gateway Control Protocol) is a signaling and call control protocol outlined in RFC 3435. It operates in a centralized architecture, which consists of three main components:

  1. Call Agent or Media Gateway Controller (MGC): The central controller in the MGCP architecture, responsible for managing and controlling the media gateways. It handles call setup, modification, and termination, and communicates with the media gateways using MGCP.

  2. Media Gateways (MGs) or Slave Gateways: These devices convert digital media streams between different networks, such as traditional circuit-switched telephony and packet-switched IP networks. They are managed by the MGC and execute commands received from it. Media gateways may include functions like transcoding, packetization, and echo cancellation.

  3. Signaling Gateways (SGs): These gateways are responsible for converting signaling messages between different networks, enabling seamless communication between traditional telephony systems (e.g., SS7) and IP-based networks (e.g., SIP or H.323). Signaling gateways are crucial for interoperability and ensuring that call control information is properly communicated between the different networks.

In summary, MGCP centralizes the call control logic in the call agent, which simplifies the management of media and signaling gateways, providing better scalability, reliability, and efficiency in telecommunication networks.
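To make the call agent / gateway relationship more concrete, the sketch below parses the first line of an MGCP command as laid out in RFC 3435 (verb, transaction ID, endpoint name, protocol version). The sample line, the endpoint name aaln/1@gw.example.net, and the helper names are illustrative assumptions, not taken from a real capture.

```python
from dataclasses import dataclass

# Command verbs defined in RFC 3435 (extension verbs are not handled here).
MGCP_VERBS = {
    "EPCF": "EndpointConfiguration",
    "CRCX": "CreateConnection",
    "MDCX": "ModifyConnection",
    "DLCX": "DeleteConnection",
    "RQNT": "NotificationRequest",
    "NTFY": "Notify",
    "AUEP": "AuditEndpoint",
    "AUCX": "AuditConnection",
    "RSIP": "RestartInProgress",
}


@dataclass
class MgcpCommand:
    verb: str
    transaction_id: int
    endpoint: str
    version: str


def parse_mgcp_command_line(line: str) -> MgcpCommand:
    """Parse 'VERB TXID ENDPOINT MGCP VERSION' (hypothetical helper)."""
    parts = line.strip().split()
    if len(parts) < 5 or parts[3].upper() != "MGCP":
        raise ValueError(f"not an MGCP command line: {line!r}")
    verb = parts[0].upper()
    if verb not in MGCP_VERBS:
        raise ValueError(f"unknown MGCP verb: {verb}")
    return MgcpCommand(verb, int(parts[1]), parts[2], parts[4])


# Illustrative command a call agent might send to a media gateway.
cmd = parse_mgcp_command_line("CRCX 1204 aaln/1@gw.example.net MGCP 1.0")
print(MGCP_VERBS[cmd.verb], cmd.transaction_id, cmd.endpoint)
```

Because every command originates from the call agent and names the gateway endpoint it targets, watching these command lines is usually enough to map which controller drives which gateway.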

SCCP (Skinny Client Control Protocol)

Skinny Client Control Protocol (SCCP) is a proprietary signaling and call control protocol owned by Cisco Systems. It is primarily used for communication between Cisco Unified Communications Manager (formerly known as CallManager) and Cisco IP phones or other Cisco voice and video endpoints.

SCCP is a lightweight protocol that simplifies the communication between the call control server and the endpoint devices. It is referred to as "Skinny" because of its minimalistic design and reduced bandwidth requirements compared to other VoIP protocols like H.323 or SIP.

The main components of an SCCP-based system are:

  1. Call Control Server: This server, typically a Cisco Unified Communications Manager, manages the call setup, modification, and termination processes, as well as other telephony features such as call forwarding, call transfer, and call hold.

  2. SCCP Endpoints: These are devices such as IP phones, video conferencing units, or other Cisco voice and video endpoints that use SCCP to communicate with the call control server. They register with the server, send and receive signaling messages, and follow the instructions provided by the call control server for call handling.

  3. Gateways: These devices, such as voice gateways or media gateways, are responsible for converting media streams between different networks, like traditional circuit-switched telephony and packet-switched IP networks. They may also include additional functionality, such as transcoding or echo cancellation.

SCCP offers a simple and efficient communication method between Cisco call control servers and endpoint devices. However, it is worth noting that SCCP is a proprietary protocol, which can limit interoperability with non-Cisco systems. In such cases, other standard VoIP protocols like SIP may be more suitable.
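As a rough illustration of how lightweight SCCP messages are, the sketch below builds and parses the 12-byte header that protocol dissectors commonly document: a little-endian length, a reserved/version field, and a message ID. SCCP is proprietary, so this layout, the KeepAlive message ID 0x0000, and the helper names are assumptions to verify against a real capture rather than an authoritative description.

```python
import struct

# Assumed header layout: length, reserved/version, message ID (little-endian).
SCCP_HEADER = struct.Struct("<III")

KEEPALIVE = 0x0000  # assumed message ID for a station KeepAlive


def build_sccp_message(message_id: int, payload: bytes = b"") -> bytes:
    """Build one SCCP message; length counts the message ID plus payload."""
    length = 4 + len(payload)
    return SCCP_HEADER.pack(length, 0, message_id) + payload


def parse_sccp_message(data: bytes):
    """Return (message_id, payload) for the first message in data."""
    if len(data) < SCCP_HEADER.size:
        raise ValueError("truncated SCCP header")
    length, _reserved, message_id = SCCP_HEADER.unpack_from(data)
    # The payload follows the 12-byte header and ends at offset 8 + length.
    payload = data[SCCP_HEADER.size:8 + length]
    return message_id, payload


keepalive = build_sccp_message(KEEPALIVE)
print(keepalive.hex(), parse_sccp_message(keepalive))
```

An empty KeepAlive is only 12 bytes on the wire, which is the kind of minimal framing the "Skinny" name refers to.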

H.323

H.323 is a suite of protocols for multimedia communication, including voice, video, and data conferencing over packet-switched networks, such as IP-based networks. It was developed by the ITU Telecommunication Standardization Sector (ITU-T) and provides a comprehensive framework for managing multimedia communication sessions.

Some key components of the H.323 suite include:

  1. Terminals: These are endpoint devices, such as IP phones, video conferencing systems, or software applications, that support H.323 and can participate in multimedia communication sessions.

  2. Gateways: These devices convert media streams between different networks, like traditional circuit-switched telephony and packet-switched IP networks, enabling interoperability between H.323 and other communication systems. They may also include additional functionality, such as transcoding or echo cancellation.

  3. Gatekeepers: These are optional components that provide call control and management services in an H.323 network. They perform functions such as address translation, bandwidth management, and admission control, helping to manage and optimize network resources.

  4. Multipoint Control Units (MCUs): These devices facilitate multipoint conferences by managing and mixing media streams from multiple endpoints. MCUs enable features such as video layout control, voice-activated switching, and continuous presence, making it possible to host large-scale conferences with multiple participants.

H.323 supports a range of audio and video codecs, as well as other supplementary services like call forwarding, call transfer, call hold, and call waiting. Despite its widespread adoption in the early days of VoIP, H.323 has been gradually replaced by more modern and flexible protocols like the Session Initiation Protocol (SIP), which offers better interoperability and easier implementation. However, H.323 remains in use in many legacy systems and continues to be supported by various equipment vendors.

IAX (Inter-Asterisk eXchange)

IAX (Inter-Asterisk eXchange) is a signaling and call control protocol primarily used for communication between Asterisk PBX (Private Branch Exchange) servers and other VoIP devices. It was developed by Mark Spencer, the creator of the Asterisk open-source PBX software, as an alternative to other VoIP protocols like SIP and H.323.

IAX is known for its simplicity, efficiency, and ease of implementation. Some key features of IAX include:

  1. Single UDP Port: IAX2, the version in use today (RFC 5456), uses a single UDP port (4569 by default) for both signaling and media traffic, which simplifies firewall and NAT traversal and makes it easier to deploy in various network environments.

  2. Binary Protocol: Unlike text-based protocols like SIP, IAX is a binary protocol, which reduces its bandwidth consumption and makes it more efficient for transmitting signaling and media data.

  3. Trunking: IAX supports trunking, which allows multiple calls to be combined into a single network connection, reducing overhead and improving bandwidth utilization.

  4. Native Security: IAX2 has built-in support for authentication (MD5 challenge-response or RSA key pairs) and for optional AES-128 encryption of the call, so signaling and media can be protected without a separate protocol.

  5. Peer-to-Peer Communication: IAX can be used for direct communication between endpoints without the need for a central server, enabling simpler and more efficient call routing.

Despite its benefits, IAX has some limitations, such as its primary focus on the Asterisk ecosystem and less widespread adoption compared to more established protocols like SIP. As a result, IAX might not be the best choice for interoperability with non-Asterisk systems or devices. However, for those working within the Asterisk environment, IAX offers a robust and efficient solution for VoIP communication.
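The binary framing mentioned above can be sketched by decoding the 12-byte header of an IAX2 "full frame" as described in RFC 5456. The sample bytes are built by hand for illustration (a NEW request from call number 1), and the function and dictionary names are hypothetical.

```python
import struct

# Full frame header: F + source call number, R + destination call number,
# timestamp, OSeqno, ISeqno, frame type, C + subclass (network byte order).
FULL_FRAME = struct.Struct("!HHIBBBB")

FRAME_TYPES = {
    0x01: "DTMF",
    0x02: "Voice",
    0x03: "Video",
    0x04: "Control",
    0x05: "Null",
    0x06: "IAX",
}


def parse_iax2_full_frame(data: bytes) -> dict:
    """Decode the fixed header of an IAX2 full frame (hypothetical helper)."""
    if len(data) < FULL_FRAME.size:
        raise ValueError("truncated IAX2 frame")
    src, dst, timestamp, oseq, iseq, ftype, subclass = FULL_FRAME.unpack_from(data)
    if not src & 0x8000:
        raise ValueError("F bit clear: this is a mini frame, not a full frame")
    return {
        "source_call": src & 0x7FFF,
        "retransmitted": bool(dst & 0x8000),
        "dest_call": dst & 0x7FFF,
        "timestamp": timestamp,
        "oseqno": oseq,
        "iseqno": iseq,
        "frame_type": FRAME_TYPES.get(ftype, hex(ftype)),
        "subclass": subclass & 0x7F,
    }


# Hand-built sample: IAX control frame (type 0x06), subclass 0x01 (NEW).
sample = FULL_FRAME.pack(0x8000 | 1, 0, 3, 0, 0, 0x06, 0x01)
frame = parse_iax2_full_frame(sample)
print(frame)
```

Since signaling and media share one port, a single filter on UDP 4569 plus this header is usually enough to tell call setup frames from voice frames in a capture.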

Transmission & Transport Protocols

SDP (Session Description Protocol)

SDP (Session Description Protocol) is a text-based format used to describe the characteristics of multimedia sessions, such as voice, video, or data conferencing, over IP networks. It was developed by the Internet Engineering Task Force (IETF) and is defined in RFC 4566 (since obsoleted by RFC 8866). SDP does not handle the actual media transmission or session establishment; it is used in conjunction with signaling protocols such as SIP (Session Initiation Protocol) to negotiate and exchange information about the media streams and their attributes.

Some key elements of SDP include:

  1. Session Information: SDP describes the details of a multimedia session, including session name, session description, start time, and end time.

  2. Media Streams: SDP defines the characteristics of media streams, such as the media type (audio, video, or text), transport protocol (e.g., RTP or SRTP), and the media format (e.g., codec information).

  3. Connection Information: SDP provides information about the network address (IP address) and port number where the media should be sent or received.

  4. Attributes: SDP supports the use of attributes to provide additional, optional information about a session or media stream. Attributes can be used for specifying various features like encryption keys, bandwidth requirements, or media control mechanisms.
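The elements listed above map directly onto the "type=value" lines of an SDP body. The following minimal sketch parses a small, made-up description into session-level fields and a list of media streams. The sample values (user alice, address 192.0.2.10, port 49170) and the parse_sdp helper are illustrative assumptions, and the parser ignores many details of the full grammar.

```python
SAMPLE_SDP = """\
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=VoIP call
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""


def parse_sdp(text: str) -> dict:
    """Split an SDP body into session fields and media sections."""
    session = {"attributes": [], "media": []}
    current = session  # lines before the first m= belong to the session
    for raw in text.splitlines():
        line = raw.strip()
        if len(line) < 2 or line[1] != "=":
            continue  # skip anything that is not type=value
        key, value = line[0], line[2:]
        if key == "m":
            media_type, port, proto, *formats = value.split()
            current = {
                "type": media_type,
                "port": int(port.split("/")[0]),  # tolerate "port/count"
                "proto": proto,
                "formats": formats,
                "attributes": [],
            }
            session["media"].append(current)
        elif key == "a":
            current["attributes"].append(value)
        elif key == "c":
            current["connection"] = value.split()[-1]
        else:
            current.setdefault(key, value)
    return session


sdp = parse_sdp(SAMPLE_SDP)
audio = sdp["media"][0]
print(sdp["s"], sdp["connection"], audio["port"], audio["formats"])
```

For an assessment, the connection address and the media port are the interesting fields, since together they say where the RTP stream will flow.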

SDP is typically used in the following process:

  1. An initiating party creates an SDP description of the proposed multimedia session, including the details of the media streams and their attributes.

  2. The SDP description is sent to the receiving party, usually embedded within a signaling protocol message like SIP or RTSP.

  3. The receiving party processes the SDP description, and based on its capabilities, it may accept, reject, or modify the proposed session.

  4. The final SDP description is sent back to the initiating party as part of the signaling protocol message, completing the negotiation process.
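The accept, reject, or modify decision in step 3 can be sketched as a simple intersection of the offered payload formats with what the receiver supports. The helper below is a simplified assumption about how an answerer might build its m= line; it follows the offer/answer convention (RFC 3264) of setting the port to 0 to reject a stream, and the port numbers and format lists are made up.

```python
def answer_media_line(offer_line: str, local_port: int, supported: set) -> str:
    """Build the answer's m= line for one offered m= line (hypothetical helper)."""
    media_type, _offered_port, proto, *offered = offer_line[2:].split()
    # Keep the offerer's preference order, dropping formats we cannot handle.
    accepted = [fmt for fmt in offered if fmt in supported]
    if not accepted:
        # Port 0 signals that the stream is rejected.
        return f"m={media_type} 0 {proto} {' '.join(offered)}"
    return f"m={media_type} {local_port} {proto} {' '.join(accepted)}"


offer = "m=audio 49170 RTP/AVP 0 8 97"

# Receiver supports PCMU (0) and PCMA (8) but not dynamic type 97.
accepted_answer = answer_media_line(offer, 30000, {"0", "8"})

# Receiver only supports G.729 (18), so nothing overlaps.
rejected_answer = answer_media_line(offer, 30000, {"18"})

print(accepted_answer)
print(rejected_answer)
```

A real answerer would also copy or adjust the matching a= attributes (for example rtpmap lines) for the formats it keeps.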

SDP's simplicity and flexibility make it a widely adopted standard for describing multimedia sessions in various communication systems, playing a crucial role in establishing and managing real-time multimedia sessions over IP networks.

RTP / RTCP / SRTP / ZRTP

  1. RTP (Real-time Transport Protocol): RTP is a network protocol designed for the delivery of audio and video data, or other real-time media, over IP networks. Developed by the IETF and defined in RFC 3550, RTP is commonly used with signaling protocols like SIP and H.323 to enable multimedia communication. RTP provides mechanisms for synchronization, sequencing, and timestamping of media streams, helping to ensure smooth and timely media playback.

  2. RTCP (Real-time Transport Control Protocol): RTCP is a companion protocol to RTP, used for monitoring the quality of service (QoS) and providing feedback on the transmission of media streams. Defined in the same RFC 3550 as RTP, RTCP periodically exchanges control packets between participants in an RTP session. It shares information such as packet loss, jitter, and round-trip time, which helps in diagnosing and adapting to network conditions, improving overall media quality.

  3. SRTP (Secure Real-time Transport Protocol): SRTP is a profile of RTP that provides encryption, message authentication, and replay protection for media streams, ensuring secure transmission of sensitive audio and video data. Defined in RFC 3711, SRTP uses cryptographic algorithms like AES for encryption and HMAC-SHA1 for message authentication. SRTP does not define its own key exchange, so it is often combined with secure signaling such as SIP over TLS, which protects the keys carried in the signaling messages.

  4. ZRTP (Zimmermann Real-time Transport Protocol): ZRTP is a cryptographic key-agreement protocol that negotiates the keys for SRTP sessions directly between the two endpoints. Developed by Phil Zimmermann, the creator of PGP, ZRTP is described in RFC 6189. Whereas SRTP keys are typically delivered through the signaling channel, ZRTP performs the key agreement in the media path, independently of the signaling protocol. It uses Diffie-Hellman key exchange to establish a shared secret between the communicating parties, without requiring prior trust or a public key infrastructure (PKI). ZRTP also includes features like Short Authentication Strings (SAS) to protect against man-in-the-middle attacks.

These protocols play essential roles in delivering and securing real-time multimedia communication over IP networks. While RTP and RTCP handle the actual media transmission and quality monitoring, SRTP and ZRTP ensure that the transmitted media is protected against eavesdropping, tampering, and replay attacks.
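The sequencing and timestamping fields that RTP provides live in a fixed 12-byte header defined in RFC 3550. The sketch below decodes that header from raw bytes. The sample packet is constructed by hand (payload type 0, i.e. PCMU, with 160 bytes of filler audio) and the function name is an illustrative assumption.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header (CSRC list and extensions ignored)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hand-built sample: version 2, payload type 0, sequence 1, timestamp 160.
sample_packet = struct.pack("!BBHII", 0x80, 0x00, 1, 160, 0x12345678) + b"\xff" * 160
header = parse_rtp_header(sample_packet)
print(header)
```

With plain RTP, everything after this header is the unencrypted codec payload, which is why captured streams can be replayed as audio; with SRTP the header stays readable but the payload is encrypted.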
