Fexyn

What is deep packet inspection and how VPNs defeat it

Fexyn Team · 13 min read

Your ISP sees every packet you send. Most of the time, it only reads the header: source IP, destination IP, port number. That's enough for routing. Deep packet inspection goes further. It opens the packet and examines the payload itself.
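To make the header/payload distinction concrete, here is a minimal Python sketch that splits a raw IPv4 packet into the fields a router needs and the payload that DPI goes on to inspect. The function name `split_packet` and the returned dictionary layout are illustrative assumptions, and the sketch handles only IPv4 carrying TCP or UDP.

```python
import socket
import struct

def split_packet(packet: bytes):
    """Split a raw IPv4 packet (TCP or UDP) into routing fields and payload."""
    # Low nibble of byte 0 is the IP header length in 32-bit words.
    ihl = (packet[0] & 0x0F) * 4
    protocol = packet[9]  # 6 = TCP, 17 = UDP
    if protocol == 6:
        # TCP data offset: high nibble of byte 12 of the TCP header, in 32-bit words.
        l4_len = (packet[ihl + 12] >> 4) * 4
    elif protocol == 17:
        l4_len = 8  # UDP headers are always 8 bytes
    else:
        raise ValueError("this sketch only handles TCP and UDP")
    # TCP and UDP both begin with source port, then destination port.
    src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    header = {
        "src": socket.inet_ntoa(packet[12:16]),
        "dst": socket.inet_ntoa(packet[16:20]),
        "sport": src_port,
        "dport": dst_port,
        "proto": protocol,
    }
    # Everything after the transport header is what "deep" inspection reads.
    return header, packet[ihl + l4_len:]
```

Ordinary routing stops at `header`. A DPI engine keeps going and feeds the second return value into its classifiers.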

The analogy people reach for is mail. A postal worker reads the address on an envelope to sort it. DPI is the postal worker opening the envelope, reading the letter, and deciding whether to deliver it based on what's inside. Except DPI operates at line speed on millions of packets per second, and the decision happens before the packet reaches its destination.

This is how governments block VPNs. Not by IP address (too easy to change), not by port number (any protocol can run on any port), but by looking at the traffic itself and identifying what protocol generated it.

What DPI systems actually do

A DPI appliance sits inline on the network path between you and the internet. Every packet passes through it. The appliance maintains flow state, reassembles TCP streams, and runs classification engines against the reassembled data.

Classification happens in layers. First, the system checks port numbers and basic protocol headers. Then it runs signature matching against known protocol fingerprints. Then, if the first two layers didn't produce a confident match, it feeds traffic features into statistical classifiers. Some systems also perform active probing: when they see suspicious traffic, they connect to the destination server themselves and try to provoke it into revealing what it's running.
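The layered approach can be sketched as a chain of classifiers where each layer either returns a label or passes the flow on. This is a simplified illustration, not any vendor's engine: the port table, the two payload prefixes, the `entropy_bits` field, and the 7.5 threshold are all assumptions chosen for the example. Active probing is left out because it happens outside the packet path.

```python
from typing import Callable, Optional

Flow = dict  # hypothetical flow record: dport, first_payload, entropy_bits
Layer = Callable[[Flow], Optional[str]]

def by_port(flow: Flow) -> Optional[str]:
    # Layer 1: cheap hint from commonly used default ports.
    return {1194: "openvpn", 51820: "wireguard"}.get(flow["dport"])

def by_signature(flow: Flow) -> Optional[str]:
    # Layer 2: match the first payload bytes against known prefixes.
    signatures = {
        b"\x16\x03": "tls",  # TLS handshake record, major version 3
        b"SSH-": "ssh",      # SSH identification string
    }
    for prefix, label in signatures.items():
        if flow["first_payload"].startswith(prefix):
            return label
    return None

def by_statistics(flow: Flow) -> Optional[str]:
    # Layer 3: stand-in for a statistical model; the threshold is illustrative.
    return "suspected-proxy" if flow.get("entropy_bits", 0.0) > 7.5 else None

def classify(flow: Flow, layers=(by_port, by_signature, by_statistics)) -> str:
    # Run layers in order and stop at the first confident answer.
    for layer in layers:
        label = layer(flow)
        if label is not None:
            return label
    return "unclassified"
```

Ordering the layers from cheapest to most expensive is the design point: most flows are settled by the first two checks, so the costly statistical model only runs on what is left.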

This isn't theoretical. Cisco, Huawei, Sandvine, and Allot sell DPI boxes to ISPs and governments worldwide. Sandvine's PacketLogic was found in Egypt, Turkey, Libya, and Syria. Huawei's technology runs inside the Great Firewall of China. These are real products with real sales teams and real deployments.

Where DPI is deployed for censorship

The countries with the most aggressive DPI programs aren't hiding it. Some of them publish procurement documents.

China's Great Firewall is the oldest and most sophisticated system. It combines passive DPI (signature matching, statistical classification, machine learning) with active probing. When the GFW sees a connection it suspects might be a proxy, it launches its own connection to the same server, sending crafted payloads designed to elicit protocol-specific responses. Researchers have documented probes arriving from over 12,000 distinct IP addresses. In a four-month study, more than 50,000 active probes were recorded against a single server, with 25% of replay-based probes arriving within one second of the original connection.

Russia's TSPU (Technical Systems for Countering Threats) is newer but well-funded. Roskomnadzor, the agency responsible for internet regulation, has a budget of 2.27 billion rubles allocated through 2027 for AI-powered traffic analysis. TSPU boxes sit inside every licensed ISP in Russia. They're mandatory. The system can throttle or block specific protocols at the national level, and it does. Russia throttled Twitter in 2021 using TSPU. By mid-2024, TSPU was detecting and blocking WireGuard connections with near-100% accuracy.

Iran imports Chinese DPI technology and operates its own filtering infrastructure. During the June 2025 "stealth blackout," authorities didn't shut down the internet entirely. Instead, they used DPI to selectively block VPN protocols while leaving regular HTTPS browsing functional. This is the new playbook: targeted protocol blocking is cheaper and less politically costly than full shutdowns.

Indonesia takes a different approach. The government maps internet traffic to citizen identity numbers, linking DPI-classified flows to specific people. This isn't just censorship. It's surveillance with attribution.

Freedom House's 2025 report documented the 15th consecutive year of declining internet freedom globally. 57 of 72 countries studied had arrested people for online expression. DPI is the enabling technology behind most of these enforcement actions.

How DPI identifies VPN traffic

DPI uses three distinct methods to catch VPN connections. Each one targets a different weakness.

Method 1: Protocol fingerprinting

Every protocol has structural patterns in its packets. DPI systems maintain signature databases that match these patterns.

WireGuard is the most obvious example. Its handshake initiation message is always exactly 148 bytes. The first byte is 0x01 (message type), followed by three zero bytes, a 4-byte sender index, and a 32-byte unencrypted ephemeral public key. This structure never varies. A DPI rule that matches on packet size plus the first four bytes catches every WireGuard connection on the first packet.
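A rule of this kind takes only a few lines. The following Python sketch checks a UDP payload against the size and leading bytes described above, then pulls out the two cleartext fields that follow. The function names are illustrative, and a production DPI rule would normally be expressed in the appliance's own rule language rather than Python.

```python
import struct

WG_INITIATION_LEN = 148  # fixed size of a handshake initiation message

def looks_like_wireguard_initiation(udp_payload: bytes) -> bool:
    """Match on packet size plus the first four bytes (type 0x01, three zero bytes)."""
    if len(udp_payload) != WG_INITIATION_LEN:
        return False
    return udp_payload[:4] == b"\x01\x00\x00\x00"

def parse_initiation(udp_payload: bytes):
    """Return the cleartext sender index and ephemeral public key."""
    # WireGuard encodes integers little-endian; the index occupies bytes 4..7.
    sender_index = struct.unpack_from("<I", udp_payload, 4)[0]
    ephemeral_public_key = udp_payload[8:40]  # 32 bytes, unencrypted
    return sender_index, ephemeral_public_key
```

Nothing here requires keys or decryption, which is why the match can happen on the very first packet of a connection.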

OpenVPN is only slightly harder. A 2022 paper published at USENIX Security ("OpenVPN is Open to VPN Fingerprinting") demonstrated that researchers could identify over 85% of OpenVPN flows with only negligible false positives. The method worked against a real ISP's traffic, not lab data. They tested 41 different obfuscation configurations and still identified 34 of them. OpenVPN's control channel framing, its opcode byte structure, and the timing of its TLS handshake all produce recognizable signatures.
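As a rough illustration of the opcode byte structure, the sketch below decodes the first byte of an OpenVPN packet in UDP mode, where the upper five bits carry the opcode and the lower three carry the key ID. The opcode values are taken from OpenVPN's protocol definitions rather than from this article, and the check is far simpler than the classifier in the paper; treat it as a sketch of the idea only. In TCP mode a two-byte length prefix precedes this byte, which the example does not handle.

```python
# Opcode values as defined by the OpenVPN protocol (subset).
OPCODES = {
    4: "P_CONTROL_V1",
    5: "P_ACK_V1",
    6: "P_DATA_V1",
    7: "P_CONTROL_HARD_RESET_CLIENT_V2",
    8: "P_CONTROL_HARD_RESET_SERVER_V2",
    9: "P_DATA_V2",
}

def parse_opcode(first_byte: int):
    """Split the leading byte into (opcode, key_id)."""
    return first_byte >> 3, first_byte & 0x07

def looks_like_openvpn_handshake(client_first: bytes, server_first: bytes) -> bool:
    """True when the first packet in each direction is a client/server hard reset."""
    if not client_first or not server_first:
        return False
    client_opcode, _ = parse_opcode(client_first[0])
    server_opcode, _ = parse_opcode(server_first[0])
    return client_opcode == 7 and server_opcode == 8
```

A first client byte of 0x38 decodes to opcode 7 with key ID 0, the usual start of a session. Looking at both directions rather than a single byte helps keep false positives low.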

TLS fingerprinting adds another layer. JA3 hashing takes the parameters from a TLS ClientHello (TLS version, cipher suites, extensions, supported groups, elliptic curve point formats) and hashes them into a fingerprint; JA3S does the same for the server's ServerHello. The SNI hostname is not part of the hash, but it is visible in plaintext in the same ClientHello. If your VPN client generates a ClientHello that doesn't match any known browser, the connection is flagged. The JA3 hash alone can distinguish between Chrome, Firefox, curl, and most proxy clients. China's GFW has been using TLS fingerprinting since at least 2019.
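For a sense of how little is involved, here is a minimal sketch of the JA3 computation, assuming the ClientHello has already been parsed into lists of integers. The function signature is an assumption for this example; the string layout (five comma-separated fields, values joined with dashes, GREASE values dropped, MD5 over the result) follows the published JA3 method.

```python
import hashlib

# GREASE placeholder values (0x0A0A, 0x1A1A, ... 0xFAFA) are ignored by JA3.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}

def ja3(version, ciphers, extensions, groups, point_formats):
    """Return (ja3_string, md5_hex) for already-parsed ClientHello fields."""
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    ja3_string = ",".join([
        str(version),
        join(ciphers),
        join(extensions),
        join(groups),
        join(point_formats),
    ])
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()

# Illustrative values only, not a real browser's ClientHello.
fingerprint_string, fingerprint_hash = ja3(
    771, [0x0A0A, 4865, 4866], [0, 23, 65281], [29, 23], [0]
)
```

Because the order of cipher suites and extensions feeds straight into the string, two clients offering the same options in a different order produce different hashes. That sensitivity is what lets a censor tell a proxy client's TLS library apart from a browser's.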

Method 2: Statistical analysis

When fingerprinting fails, statistics pick up the slack. Machine learning classifiers trained on labeled traffic datasets can detect VPN connections based on packet size distributions, inter-arrival timing, flow duration, and byte entropy.

Published research shows these classifiers achieve 80-95% accuracy on encrypted VPN traffic. Random Forest models hit 85%+ accuracy on Shadowsocks specifically. The classifier doesn't need to decrypt anything. It looks at the shape of the traffic: how big the packets are, how frequently they arrive, how much randomness the byte stream contains.
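The "shape" of a flow reduces to a handful of numbers. The sketch below computes a few of the features named above from a list of packet timestamps and sizes; the input layout and feature names are assumptions for illustration, and the output would be fed to a trained model (a Random Forest, for example) that is not shown here.

```python
from statistics import mean, pstdev

def flow_features(packets):
    """packets: non-empty list of (timestamp_seconds, size_bytes) in arrival order."""
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    # Inter-arrival gaps between consecutive packets.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "duration": times[-1] - times[0],
        "mean_size": mean(sizes),
        "size_stdev": pstdev(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
    }
```

None of these features depends on the packet contents, which is why encryption alone does not defeat this kind of classifier.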

High entropy is actually a red flag. Normal HTTPS traffic has structured elements: a TLS handshake with certificate chains, HTTP/2 frame headers, content-length fields. Shadowsocks and other fully encrypted proxies produce streams of nearly uniform randomness from the first byte. That level of entropy is unusual in normal web browsing, and classifiers pick up on it quickly.
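Byte entropy is straightforward to measure. The following sketch computes Shannon entropy in bits per byte, where 8.0 is the maximum; it is a generic textbook calculation rather than any particular censor's classifier, and the sample inputs are made up for the example.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A plaintext HTTP request is structured and scores well below the maximum.
structured = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 50
# Random bytes stand in for a fully encrypted first packet.
random_like = os.urandom(4096)

structured_score = shannon_entropy(structured)
random_score = shannon_entropy(random_like)
```

The random sample lands close to 8 bits per byte while the structured one sits far lower. A classifier looking at the first packet of a connection is exploiting that gap.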

Packet size distribution matters too. Web browsing produces a characteristic pattern of small request packets and larger response packets with specific clustering around common content sizes. A VPN tunnel carrying the same web traffic adds encapsulation overhead, shifts the size distribution, and creates patterns that differ from direct HTTPS connections.

Method 3: Active probing

This is China's specialty, though Russia and Iran are adopting it.

Active probing works like this: the DPI system sees a connection it can't classify with confidence. Instead of blocking it immediately, it records the destination IP and port. Then, from a separate machine, it opens its own connection to that server and sends carefully constructed payloads.

If the server is running Shadowsocks, the probe sends a random payload. A Shadowsocks server will try to decrypt it, fail, and either close the connection or produce a distinctive error response. A legitimate HTTPS server would return a proper TLS alert.
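The prober's side of this can be sketched as a function that labels how a server reacted to a random payload. The label names and the two-argument interface are hypothetical, and real probing systems are reported to combine reactions across many probes of different lengths rather than judging one response. The only protocol fact used is that a TLS alert record starts with content type 0x15 followed by major version 0x03.

```python
def classify_probe_reaction(response: bytes, closed_by_server: bool) -> str:
    """Label a server's reaction to a random probe payload (illustrative labels)."""
    # TLS alert record: content type 21 (0x15), then protocol major version 3.
    if len(response) >= 2 and response[0] == 0x15 and response[1] == 0x03:
        return "tls-alert"
    # Plain web servers typically answer garbage with an HTTP error.
    if response.startswith(b"HTTP/"):
        return "http-error"
    if not response:
        # No bytes back at all: either the server hung up or it just waited.
        return "silent-close" if closed_by_server else "silent-timeout"
    return "unrecognised-response"
```

The first two labels are what ordinary servers produce. A server that consistently falls into the other categories is the kind that attracts follow-up probes.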

If the server is running Trojan, the probe sends an HTTP request. A properly configured Trojan server will serve a real website. But the probe then checks whether the TLS certificate matches what's in certificate transparency logs, whether the server's timing matches a real web server, whether the HTTP response headers are consistent with the claimed server software.

The scale of active probing is enormous. Researchers running honeypots have seen probes arrive from thousands of distinct Chinese IP addresses. The probes are distributed across multiple ASNs to avoid blacklisting. They test HTTP, TLS, and custom protocol-specific payloads. They replay captured traffic to see if the server responds differently to replayed versus fresh connections.

Protocol-by-protocol detection rates

Data aggregated from Russian TSPU testing, Chinese GFW observations, and researcher reports (Habr, February 2026):

| Protocol | Detection rate | Time to detect | Method |
|---|---|---|---|
| OpenVPN | ~100% | Under 30 seconds | Signature matching on opcode structure |
| WireGuard | ~100% | First packet | 148-byte handshake fingerprint |
| Shadowsocks (AEAD) | ~95% | Minutes | Entropy analysis + replay detection |
| Trojan | ~90% | Hours | Active probing + cert verification |
| VMess | ~80% | Minutes | Padding/timing statistical analysis |
| VLESS Reality | <5% | N/A | No known reliable detection method |

These numbers shift. A protocol at 80% today might be at 99% in six months. But the hierarchy has been stable since mid-2024: traditional VPN protocols are trivially detectable, pre-Reality circumvention tools are detectable with effort, and Reality-based connections remain hard to distinguish from legitimate HTTPS.

How protocols try to evade DPI

The evasion strategies fall into three generations.

Generation 1: Obfuscation wrappers

The earliest approach was wrapping VPN traffic in another layer. OpenVPN users ran their connections through obfs4, Stunnel, or similar tools that either randomized the traffic or wrapped it in TLS. obfs4 aimed to produce traffic with no identifiable structure at all.

This worked against simple signature matching. It failed against statistical analysis. Random noise stands out when everything else on the network is structured HTTP/S, DNS, or QUIC. Once China deployed entropy classifiers, obfs4-wrapped connections became easy to flag.

Shadowsocks represented a refinement. AEAD ciphers added authentication, and later versions tried to make replay attacks harder. But the fundamental problem remained: Shadowsocks traffic doesn't look like any legitimate protocol. It's too random, too uniform, too clean. By September 2024, Russian TSPU was catching 95% of Shadowsocks flows.

Generation 2: Protocol mimicry

Trojan was the first widely deployed tool that tried to look like a specific legitimate protocol (HTTPS) rather than like nothing. A Trojan server presents a valid TLS certificate and serves a real website to unauthenticated connections. Only connections with the correct password get proxied.
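The server-side decision can be sketched as follows. This assumes the layout given in the public Trojan protocol description, where the first decrypted bytes of an authenticated connection are the 56-character hex SHA-224 digest of the password followed by CRLF; that detail comes from the protocol documentation, not from this article. The function names and return labels are illustrative.

```python
import hashlib
import hmac

def trojan_token(password: str) -> bytes:
    """Hex-encoded SHA-224 of the password: 56 ASCII characters."""
    return hashlib.sha224(password.encode("utf-8")).hexdigest().encode("ascii")

def route(first_bytes: bytes, password: str) -> str:
    """Decide what to do with a connection after the TLS handshake completes."""
    expected = trojan_token(password)
    # Constant-time comparison so response timing does not leak token prefixes.
    authenticated = hmac.compare_digest(first_bytes[:56], expected)
    if authenticated and first_bytes[56:58] == b"\r\n":
        return "proxy"
    # Anything else, including a censor's probe, is handed to the real website.
    return "fallback-web-server"
```

A probe that sends an ordinary `GET / HTTP/1.1` request falls through to the fallback branch and receives a normal web page, which is what makes passive classification so difficult.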

This was a significant improvement. Passive DPI couldn't distinguish Trojan from a real HTTPS server based on traffic alone. The problem was active probing. China's GFW began connecting to suspected Trojan servers, checking whether the served website was actually hosted there, comparing certificate details, and measuring response timing. By August 2025, Trojan's detection rate had climbed to roughly 90%.

Generation 3: Indistinguishable traffic

VLESS Reality solved the certificate problem. Instead of generating its own TLS certificate, a Reality server contacts the real camouflage target (like microsoft.com) and forwards that server's genuine certificate. The certificate chain is real. The OCSP stapling is real. The TLS fingerprint matches a real browser because the client uses uTLS to reproduce browser-identical ClientHello messages.

Active probing hits a wall. When a probe connects to a Reality server without valid authentication, the server proxies the connection straight through to the real camouflage target. The probe talks to actual microsoft.com. There's nothing fake to detect.

The <5% detection rate for VLESS Reality in the Habr data isn't from protocol analysis. It's from operational mistakes: people using unusual ports, misconfiguring the camouflage target, or running the server on IP ranges that don't host legitimate websites.

Proprietary approaches

NordVPN ships NordWhisper, described as a protocol that "disguises VPN traffic as regular web browsing." Technical details are sparse. ProtonVPN offers Stealth mode, which appears to use TLS-based obfuscation. Neither company publishes the source code or detailed protocol specifications, which makes independent verification of their censorship resistance claims impossible.

The open-source censorship circumvention community has generally been skeptical of proprietary approaches. Closed protocols can't be audited by researchers, can't be tested against known DPI systems in controlled environments, and rely entirely on the vendor's claims about their effectiveness.

The arms race doesn't stop

Every detection technique described here was once considered impossible. Ten years ago, most internet censorship relied on DNS poisoning and IP blacklisting. DPI was expensive and slow. That changed. Hardware got faster. ML models got better. Governments allocated real budgets.

Russia's 2.27 billion ruble investment in AI-powered traffic analysis isn't a one-time expense. It's ongoing development. China's GFW team publishes academic papers on traffic classification at top security conferences. These aren't amateurs.

The response from the circumvention side has been to raise the bar for detection. Each generation of tools made DPI harder, not impossible. Obfuscation bought a few years. Protocol mimicry bought a few more. Reality made detection require distinguishing a VPN connection from an identical-looking legitimate HTTPS connection, which is a fundamentally harder problem than anything before it.

But "fundamentally harder" doesn't mean "permanently impossible." Potential future attacks against Reality include:

  • Timing correlation: matching the timing of your local traffic against traffic observed at the camouflage target's end
  • Server reputation: building databases of known Reality server IPs through network scanning
  • Behavioral analysis: flagging IP addresses that maintain unusually long HTTPS connections to microsoft.com with high bandwidth
  • ECH adoption: if Encrypted Client Hello becomes universal, servers that don't support it while claiming to be microsoft.com would stand out
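Of the vectors listed above, behavioral analysis is the simplest to picture in code. The sketch below flags clients whose connections to a given destination are both long-lived and heavy. The record layout and both thresholds are hypothetical values picked for the example, not figures from any deployed system.

```python
def flag_long_heavy_flows(flows, min_seconds=3600, min_bytes=1_000_000_000):
    """Return client addresses whose flows exceed both (hypothetical) thresholds.

    flows: iterable of dicts with keys "client", "seconds", and "bytes".
    """
    return [
        f["client"]
        for f in flows
        if f["seconds"] >= min_seconds and f["bytes"] >= min_bytes
    ]

observed = [
    {"client": "198.51.100.7", "seconds": 14, "bytes": 250_000},           # page load
    {"client": "198.51.100.9", "seconds": 21_600, "bytes": 4_000_000_000}, # six-hour tunnel
]
suspects = flag_long_heavy_flows(observed)
```

A heuristic like this produces false positives (large downloads, video calls), so it would more plausibly feed a scoring system than trigger blocking by itself.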

RPRX and the Xray development team are aware of these vectors. Xray core updates have addressed several of them already. XHTTP transport, added in v24.12.18, splits traffic across multiple short-lived HTTP sessions that look like normal web API calls rather than long-lived tunnels.

What this means for choosing a VPN

If you're in a country that doesn't run DPI for censorship (most of Europe, North America, parts of South America and Africa), protocol choice doesn't matter much for evading detection. WireGuard's speed and simplicity make it the best option. Use it.

If you're in China, Russia, Iran, or any country that actively blocks VPN protocols, protocol choice is the single most important factor in whether your VPN works at all. WireGuard will get blocked. OpenVPN will get blocked faster. You need a protocol that produces traffic indistinguishable from regular HTTPS browsing.

That's why Fexyn runs VLESS Reality alongside WireGuard and OpenVPN. When your connection to a WireGuard server gets blocked, Fexyn's protocol rotation engine automatically switches to VLESS Reality over TCP or XHTTP. Your traffic starts looking like a normal HTTPS connection to microsoft.com. The DPI system that just blocked WireGuard has no reliable way to distinguish your VPN traffic from the millions of legitimate HTTPS connections flowing through the same network.
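The general shape of such a fallback can be sketched in a few lines. This is not Fexyn's implementation, whose internals are not described here; the protocol labels and the `try_connect` callback are hypothetical stand-ins for whatever a real client uses to attempt a connection.

```python
from typing import Callable, Iterable, Optional

def connect_with_fallback(
    protocols: Iterable[str],
    try_connect: Callable[[str], bool],
) -> Optional[str]:
    """Try each protocol in preference order; return the first that connects."""
    for name in protocols:
        if try_connect(name):
            return name
    return None  # every protocol was blocked or failed

# Simulated network where WireGuard is blocked by DPI.
blocked = {"wireguard"}
chosen = connect_with_fallback(
    ["wireguard", "vless-reality-tcp", "vless-reality-xhttp"],
    lambda protocol: protocol not in blocked,
)
```

Putting the fastest protocol first and the hardest-to-detect ones after it means users on unfiltered networks keep WireGuard's performance, while users behind DPI fall through to Reality.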

DPI is real, it's deployed at scale, and it's getting better. The protocols that survive are the ones that don't try to hide. They just look exactly like everything else. For users whose threat model centres on hostile networks, journalists especially, protocol choice is the single most consequential setting in any VPN.

The most concrete daily example of DPI at work is VoIP blocking in the Gulf and Egypt: Etisalat, du, STC, Mobily, Zain and the major Egyptian carriers use DPI to drop WhatsApp, FaceTime and Skype call signalling at the packet level while letting text messaging through. The protection of carrier call revenue is the motive; the technical fingerprint of those VoIP signals is the mechanism. The country-and-app guides (VPN for WhatsApp, VPN for FaceTime, VPN for Skype) walk through which Fexyn protocol gets each of those working again.

For the cross-country picture, the global censorship map has a "DPI deployed" filter that surfaces all 18+ countries with documented DPI infrastructure on a single screen.
