Every telemedicine video platform eventually confronts the same architectural fork: should media flow directly between participants, or should it route through a server?
It sounds like a purely technical question. It isn’t. The answer affects your compliance posture, your infrastructure costs, your call quality under degraded network conditions, and your ability to add features like recording, AI transcription, and multi-party consultations. Get it wrong, and you’ll be re-architecting under pressure, probably when you’re scaling fast and can least afford the distraction.
This article cuts through the noise. We’ll explain how P2P and SFU actually work, where each genuinely excels in telemedicine contexts, and how to make the call for your specific situation.
The Basics: What P2P and SFU Actually Mean
Peer-to-Peer (P2P)
In a P2P connection, media flows directly between the patient’s device and the practitioner’s device. WebRTC establishes this using ICE — the protocol that negotiates a network path, tries a direct connection first, and falls back to a TURN relay server if NAT traversal fails.
The server’s role in P2P is limited to signaling: exchanging the session description and ICE candidates that bootstrap the connection. Once the call is established, the server is out of the media path entirely.
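To make the "direct first, relay as a fallback" behaviour concrete, here is a minimal sketch that reads the `typ` field of an ICE candidate line and reports whether a selected candidate pair goes through a TURN relay. The function names and the sample candidate strings are illustrative assumptions; only the `typ host|srflx|prflx|relay` attribute format is taken from the ICE candidate syntax.

```typescript
// Path types that can appear in the "typ" field of an ICE candidate line.
// host = local address, srflx/prflx = reflexive (NAT-mapped), relay = TURN.
type CandidateType = "host" | "srflx" | "prflx" | "relay";

// Extract the candidate type from a raw candidate string, or null if absent.
function candidateType(candidate: string): CandidateType | null {
  const match = /\btyp (host|srflx|prflx|relay)\b/.exec(candidate);
  return match ? (match[1] as CandidateType) : null;
}

// A call is relayed when either side of the selected pair is a TURN candidate.
function usesTurnRelay(localCandidate: string, remoteCandidate: string): boolean {
  return (
    candidateType(localCandidate) === "relay" ||
    candidateType(remoteCandidate) === "relay"
  );
}
```

Logging this per call is one plausible way to measure how often your own patient population actually needs the relay.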
This is how one-to-one WhatsApp and FaceTime calls, and most early WebRTC demos, work when a direct path is available. It’s the simplest possible architecture.
Selective Forwarding Unit (SFU)
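An SFU is a media server that sits in the middle of every call. Each participant sends their media stream to the SFU, and the SFU forwards the relevant streams to each participant. The keyword is forwarding: a well-implemented SFU relays media packets without decoding or re-encoding the audio and video. Note that standard WebRTC transport encryption (DTLS-SRTP) is hop-by-hop, so the SFU does decrypt and re-encrypt packets in transit. Preserving true end-to-end encryption through an SFU requires an additional frame-level encryption layer (for example, SFrame or WebRTC encoded transforms) whose keys the server never sees.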
The SFU can also selectively forward: if a participant’s connection is slow, the SFU can forward a lower-resolution stream. If someone isn’t actively speaking, the SFU can deprioritize their stream. This intelligence is what makes SFUs worth their complexity for multi-party calls.
Popular open-source SFUs include mediasoup, Janus, Pion, and LiveKit. Managed SFU infrastructure is offered by daily.co, Vonage, Agora, and others.
MCU: The Third Option (Usually Wrong for Telemedicine)
A Multipoint Control Unit goes further than an SFU — it decodes all incoming streams and composes them into a single outgoing stream per participant. This breaks end-to-end encryption by design, makes server costs scale with call complexity rather than participant count, and adds encoding latency. For telemedicine, MCU architecture is almost always the wrong choice, so we won’t dwell on it.
Where P2P Genuinely Wins
P2P isn’t an inferior fallback. For a well-defined subset of telemedicine use cases, it’s the better architecture — simpler, cheaper, and more private.
One-to-One Consultations With Standard Network Conditions
The core telemedicine model — one practitioner, one patient — is P2P’s natural habitat. Two participants mean no media routing complexity. The call is direct, latency is minimized, and there’s no media server infrastructure to provision, monitor, or pay for.
If your platform’s primary use case is one-to-one consultations and your patient population is predominantly on reliable broadband or strong mobile connections, P2P handles the load cleanly.
Maximum Privacy With Minimal Infrastructure Exposure
In P2P, the media never touches your servers. This is a meaningful security property. There’s no media server that could be misconfigured, breached, or subpoenaed. The attack surface is smaller by architecture, not by policy.
For practitioners dealing with particularly sensitive cases — psychiatric consultations, addiction treatment, sexual health — the argument that the media is physically absent from any server has genuine weight. Some practitioners and patients care about this distinction even when they can’t articulate the technical reason.
Compliance Simplicity in Regulated Markets
In markets where E2E encryption requirements are strict (KBV in Germany, certain HIPAA interpretations in the US), P2P sidesteps the need to document and certify that your media server doesn’t access decrypted content — because there is no media server. The compliance evidence is simpler to produce.
This advantage is conditional: you still need TURN server infrastructure for NAT traversal, and that TURN infrastructure needs to be in compliant regions. But the overall compliance surface is meaningfully smaller.
Cost Profile at Modest Scale
P2P infrastructure costs are dominated by TURN server usage — the relay fallback for connections that can’t go directly. In practice, roughly 15–20% of WebRTC connections require a TURN relay. TURN bandwidth is cheap.
At modest consultation volumes (under ~50,000 minutes per month), P2P infrastructure costs are negligible compared to managed SFU pricing. There’s no per-minute charge, no media processing cost, just signaling and TURN relay.
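As a rough way to sanity-check the relay volume behind that claim, the sketch below estimates how much data passes through TURN per month. Every input is an assumption you should replace with your own measurements: the relay fraction comes from the range quoted above, and the per-direction bitrate is a hypothetical average, not a measured figure.

```typescript
// Inputs for a back-of-the-envelope TURN relay estimate (all assumed values).
interface TurnLoadInputs {
  minutesPerMonth: number;          // total one-to-one consultation minutes
  relayFraction: number;            // share of call time that falls back to TURN (0..1)
  bitrateKbpsPerDirection: number;  // average audio+video bitrate each way
}

// Estimate monthly data relayed through TURN, in gigabytes (1 GB = 1e9 bytes).
function relayedGigabytes(inputs: TurnLoadInputs): number {
  const relayedSeconds = inputs.minutesPerMonth * 60 * inputs.relayFraction;
  // Both directions of a relayed call pass through the TURN server.
  const bitsPerSecond = inputs.bitrateKbpsPerDirection * 1000 * 2;
  return (relayedSeconds * bitsPerSecond) / 8 / 1e9;
}
```

With 50,000 minutes per month, a 20% relay fraction, and an assumed 1,000 kbps each way, this works out to about 150 GB of relayed traffic per month. How that translates into cost depends on whether your provider bills ingress, egress, or both.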

Where SFU Wins
Multi-Party Consultations
This is the clearest SFU advantage, and it’s decisive. In P2P (a full mesh), every participant sends a separate copy of their media stream to every other participant. Three participants means two upload streams per person. Four participants means everyone uploads three streams. In general, N participants means N - 1 uploads each, and the math breaks quickly.
With five or more participants, P2P is functionally unusable for most end-user devices and consumer network connections. An SFU solves this: every participant uploads one stream (to the SFU), and receives one or more streams back. Upload bandwidth requirements stay flat regardless of group size.
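The upload arithmetic can be written down directly. This is a minimal sketch with hypothetical function names; it ignores simulcast, where the single SFU upload carries several quality layers and therefore costs somewhat more than one plain stream.

```typescript
type Topology = "mesh" | "sfu";

// Number of outgoing media streams each participant must upload.
// Mesh: one copy per remote peer. SFU: a single stream to the server.
function uploadStreamsPerParticipant(participants: number, topology: Topology): number {
  if (participants < 2) return 0; // nobody to send to
  return topology === "mesh" ? participants - 1 : 1;
}

// Total upload bitrate per participant for a given per-stream bitrate.
function uploadKbpsPerParticipant(
  participants: number,
  topology: Topology,
  perStreamKbps: number
): number {
  return uploadStreamsPerParticipant(participants, topology) * perStreamKbps;
}
```

For a five-person session at an assumed 1,000 kbps per stream, each participant uploads 4,000 kbps in a mesh versus 1,000 kbps through an SFU.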
If your platform supports group consultations, multidisciplinary team reviews, family therapy sessions, or any format beyond two participants, you need SFU architecture for those sessions.
Degraded or Asymmetric Network Conditions
SFUs enable simulcast: a sender transmits multiple quality layers of the same stream (e.g., 720p, 360p, and 180p simultaneously), and the SFU delivers the appropriate layer to each receiver based on their available bandwidth.
This makes a material difference in telemedicine. Your patient in a rural area on a congested mobile connection gets a 180p stream that stays connected, while your practitioner on a stable broadband connection gets 720p. In P2P, the quality negotiation is bilateral and less granular — you’re adapting to the worst connection in the pair, not optimizing for each side independently.
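The per-receiver selection can be sketched as a small pure function. This is an illustration under stated assumptions, not how any particular SFU implements it: the layer identifiers, bitrates, and the 80% headroom factor are hypothetical, and real SFUs derive the available bandwidth from their own congestion-control estimates.

```typescript
// One simulcast encoding offered by the sender (values below are assumed).
interface SimulcastLayer {
  rid: string;    // layer identifier
  height: number; // vertical resolution in pixels
  kbps: number;   // approximate bitrate of this layer
}

const LAYERS: SimulcastLayer[] = [
  { rid: "h", height: 720, kbps: 1500 },
  { rid: "m", height: 360, kbps: 500 },
  { rid: "l", height: 180, kbps: 150 },
];

// Pick the highest-bitrate layer that fits within the receiver's estimated
// bandwidth (after headroom). If nothing fits, fall back to the lowest layer
// so the call stays connected.
function selectLayer(
  layers: SimulcastLayer[],
  availableKbps: number,
  headroom = 0.8
): SimulcastLayer {
  if (layers.length === 0) throw new Error("at least one layer is required");
  const budget = availableKbps * headroom;
  const fitting = layers.filter((l) => l.kbps <= budget);
  if (fitting.length > 0) {
    return fitting.reduce((best, l) => (l.kbps > best.kbps ? l : best));
  }
  return layers.reduce((lowest, l) => (l.kbps < lowest.kbps ? l : lowest));
}
```

Under these assumed numbers, a practitioner with 3,000 kbps available receives the 720p layer, while a patient with 100 kbps still receives the 180p layer rather than dropping the call.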
For platforms serving patients in areas with variable connectivity — rural healthcare, developing markets, home-based care — SFU’s adaptive bitrate capabilities translate directly into consultation completion rates.
Recording, Transcription, and AI Features
This is where many telemedicine platforms hit an architectural ceiling they didn’t anticipate during initial build.
Recording in P2P requires client-side recording — one participant’s device captures the call. This is fragile (the recording stops if that participant’s connection drops), quality-limited (you’re recording a rendered stream, not the raw media), and legally complex (who controls the recording file?).
An SFU can integrate a recording bot that joins as a participant, or — in more sophisticated implementations — tap the media pipeline server-side without decoding. This gives you reliable, centrally managed recordings with clear audit trails.
The same logic applies to AI features: real-time transcription for clinical notes, automated appointment summaries, live interpretation, and speech analytics for mental health platforms. All of these require server-side access to media streams. P2P architecture makes them impossible or impractical.
If your roadmap includes any AI-powered clinical features in the next 18 months, SFU is the right foundation even if you don’t need it today.
Reliable Connection at Scale
In P2P, connection quality depends entirely on the network path between two specific endpoints. If a patient is behind a particularly restrictive corporate firewall or a symmetric NAT that even TURN struggles with, the call fails.
SFU connections are client-to-server rather than client-to-client. Your server infrastructure can be placed strategically — in data centers with high-quality network connectivity — and the connection quality from each participant’s side is independent. One participant’s bad network degrades their own stream; it doesn’t degrade the connection for everyone else.
The Hybrid Architecture: What Production Platforms Actually Use
Choosing P2P or SFU for your entire platform is a false choice. Production telemedicine systems typically use both, routing each session to the appropriate architecture based on session type and runtime conditions.
A practical hybrid approach:
- Default one-to-one consultations to P2P, and fall back to the SFU if connection quality drops below a defined threshold.
- Route all multi-party sessions directly to SFU.
- Use SFU for any session where recording or AI features are active.
- Monitor RTCPeerConnection.getStats() in real time and trigger architecture switching when packet loss or jitter exceeds clinical quality thresholds.
This requires more sophisticated session management logic but gives you the cost and privacy advantages of P2P where appropriate, and the reliability and feature advantages of SFU where necessary.
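The routing rules above can be sketched as two small decision functions. All names and threshold values here are hypothetical; the quality samples are assumed to be derived from `RTCPeerConnection.getStats()` (for example, from `packetsLost`, `packetsReceived`, and `jitter` on inbound RTP stats, where `jitter` is reported in seconds and needs converting to milliseconds).

```typescript
type MediaPath = "p2p" | "sfu";

interface SessionPlan {
  participantCount: number;
  recordingEnabled: boolean;
  aiFeaturesEnabled: boolean;
}

// One periodic measurement, assumed to be computed from getStats() output.
interface QualitySample {
  packetLossPct: number;
  jitterMs: number;
}

interface QualityThresholds {
  maxPacketLossPct: number;
  maxJitterMs: number;
}

// Placeholder thresholds; real values should come from clinical quality testing.
const DEFAULT_THRESHOLDS: QualityThresholds = { maxPacketLossPct: 5, maxJitterMs: 50 };

// Choose the media path when the session starts.
function initialPath(plan: SessionPlan): MediaPath {
  if (plan.participantCount > 2) return "sfu";
  if (plan.recordingEnabled || plan.aiFeaturesEnabled) return "sfu";
  return "p2p";
}

// Switch a P2P call to the SFU only after several consecutive bad samples,
// so a single network blip does not cause the session to flap.
function shouldSwitchToSfu(
  samples: QualitySample[],
  thresholds: QualityThresholds = DEFAULT_THRESHOLDS,
  consecutive = 3
): boolean {
  if (samples.length < consecutive) return false;
  return samples
    .slice(-consecutive)
    .every(
      (s) =>
        s.packetLossPct > thresholds.maxPacketLossPct ||
        s.jitterMs > thresholds.maxJitterMs
    );
}
```

Requiring consecutive bad samples is a design choice: switching paths mid-call means renegotiating the connection, so it is worth avoiding for transient dips.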

Decision Framework
| Factor | Choose P2P | Choose SFU |
| --- | --- | --- |
| Participant count | 1-to-1 only | 3+ participants in any session |
| Network reliability | Mostly broadband/strong mobile | Mixed or variable connectivity |
| Recording needed | No / client-side acceptable | Yes, centrally managed |
| AI/transcription roadmap | No plans | Yes, within 18 months |
| Compliance focus | Minimal server footprint valued | Adaptive quality more critical |
| Infrastructure cost priority | Minimize at modest scale | Predictable at high scale |
| Privacy sensitivity | Maximum (media never on servers) | Standard E2E encryption sufficient |
The Engineering Reality
Choosing P2P doesn’t mean the implementation is simple. Robust TURN fallback, ICE state handling, connection quality monitoring, and graceful degradation all require real engineering work. The architecture is simpler; the implementation still has edge cases.
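As one example of those edge cases, ICE state handling can be sketched as a mapping from connection state to a recovery action. The action names and the five-second grace period are assumptions for illustration; the state values mirror those of `RTCPeerConnection.iceConnectionState`, and in a browser the "restart-ice" action would typically map to calling `restartIce()` and renegotiating.

```typescript
// Mirrors the values of RTCPeerConnection.iceConnectionState.
type IceConnectionState =
  | "new"
  | "checking"
  | "connected"
  | "completed"
  | "disconnected"
  | "failed"
  | "closed";

// Hypothetical recovery actions for the application layer.
type RecoveryAction = "none" | "wait-and-watch" | "restart-ice" | "teardown";

// Decide what to do given the current ICE state and how long we have been in it.
function recoveryAction(
  state: IceConnectionState,
  msInState: number,
  graceMs = 5000
): RecoveryAction {
  switch (state) {
    case "disconnected":
      // Often transient (e.g. a Wi-Fi to mobile handover), so wait briefly first.
      return msInState >= graceMs ? "restart-ice" : "wait-and-watch";
    case "failed":
      return "restart-ice";
    case "closed":
      return "teardown";
    default:
      return "none";
  }
}
```

Keeping this decision in a pure function makes it straightforward to unit-test without a live peer connection.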
Choosing SFU doesn’t mean outsourcing the complexity. SFU selection, configuration, scaling, and compliance documentation are non-trivial. Managed SFU providers reduce operational burden but introduce vendor dependency and per-minute costs that compound at scale.
The platforms we’ve seen make this decision well share one habit: they map their actual use cases and their 18-month product roadmap before choosing an architecture, rather than choosing based on what the tutorial used or what the previous engineer was familiar with.
How Trembit Approaches This Decision
When Trembit designs telemedicine video infrastructure, P2P vs. SFU is one of the first architectural conversations we have — and we rarely recommend one to the exclusion of the other. The right answer is almost always context-specific and often involves a hybrid approach that evolves with the platform.
We’ve built and rescued both architectures across telemedicine platforms operating under KBV, HIPAA, and GDPR requirements. If you’re making this decision now — for a new build, a migration, or a system that’s struggling at scale — we’re happy to think through it with you.
The conversation starts with your use cases, not our preferences.