- Date: 2026-05-07 SAST
- Author: Luci (auto-diagnosed during session 2b2b56cb)
- Status: Degraded but functional: DATA PLANE WORKS via DERP relay; UDP NAT traversal broken
- Affected: Phone PWA (MC, audio streaming, anything chatty over Tailscale Funnel)
- Not affected: SSH from laptop ↔ Luci ↔ Larry (still works; this lower-bandwidth use case tolerates DERP)
Server-side tailscaled cannot establish direct WireGuard UDP tunnels because outbound STUN/IPv4 binding requests get no response. All phone traffic relays through Tailscale's DERP-jnb proxy. RTT phone↔server is 241ms to 2.1s with massive variance, which makes Mission Control on mobile feel broken (slow, stuttering audio, page loads >30s).
Workaround in place: browsing to http://204.168.188.33:3001 (public IP, raw HTTP) bypasses Tailscale entirely. Confirmed fast on Elmar's phone 2026-05-07.
Permanent fix is a TLS-terminated direct path that doesn't depend on Tailscale (Caddy/Let's Encrypt or Cloudflare Tunnel). Deferred pending the monitoring task below: if degradation persists for 7+ days, we ship one of the solutions.
- tailscale netcheck reports UDP: false, IPv4: (no addr found), Nearest DERP: unknown (no response to latency probes).
- tailscale ping elmars-s26-ultra: every reply via DERP(jnb), no direct connection. RTT samples: 241ms / 418ms / 2128ms.
- Funnel (openclaw.tailb2ba18.ts.net): same issue, because Funnel routes traffic through Tailscale's edge using the same broken UDP discovery.
- nc -u -z derp1.tailscale.com 3478 succeeds, but that only shows the datagram left the host (UDP has no handshake, so there is no "SYN level"). Real STUN binding requests get no reply: the kernel sends them and nothing comes back.

| Hypothesis | Result |
|---|---|
| Server load / RAM / swap pressure | RAM 854Mi free, load 0.94, swap 7.1Gi free. Fine. |
| MC backend slow | localhost timings: /board 9ms, /api/board 1ms, / 50ms. Fine. |
| Stale service worker / PWA cache | Bumped CACHE_NAME → v33 in an earlier session. Hard reload did not fix it. |
| ext4 / disk error | No kernel ext4 messages, write probes succeed. |
| ufw / iptables blocking | UFW inactive; outbound UDP socket-open succeeds. |
| Tailscale software bug | tailscaled restarted 2026-05-05 + 2026-05-07 — no change. |
| Phone client misconfig | Phone IS on tailnet (elmars-s26-ultra active). Issue is at server's NAT-traversal layer. |
Suspected cause: Hetzner's cloud network (default gateway 172.31.1.1) silently filters or rate-limits the UDP replies to the server's high-frequency outbound STUN traffic, OR a recent Hetzner network change broke return-path symmetry for Tailscale's discovery. Outbound packets leave the server; replies don't return. The WireGuard data plane survives because Tailscale can fall back to DERP (a TCP relay), but performance suffers.
We have no kernel-level evidence for what's blocking — no dmesg/iptables/route artefacts. Best guess is upstream Hetzner. A support ticket to Hetzner asking "is outbound UDP egress healthy from cloud server 204.168.188.33?" would help, but is a separate workstream.
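To gather evidence for that ticket, the STUN symptom can be reproduced without tailscaled. A minimal Python sketch, assuming derp1.tailscale.com answers standard RFC 5389 Binding Requests on UDP 3478 (the port probed with nc above); stun_probe and the helper names are hypothetical, not part of any existing tooling.

```python
import os
import socket
import struct

# RFC 5389 constants
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
MAGIC_COOKIE = 0x2112A442


def build_binding_request(txn_id: bytes) -> bytes:
    """20-byte STUN header with no attributes: type, length=0, magic cookie, transaction ID."""
    if len(txn_id) != 12:
        raise ValueError("STUN transaction ID must be 12 bytes")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txn_id


def is_binding_success(data: bytes, txn_id: bytes) -> bool:
    """True if data is a Binding Success Response for our transaction."""
    if len(data) < 20:
        return False
    msg_type, _length, cookie = struct.unpack("!HHI", data[:8])
    return msg_type == BINDING_SUCCESS and cookie == MAGIC_COOKIE and data[8:20] == txn_id


def stun_probe(host: str = "derp1.tailscale.com", port: int = 3478, timeout: float = 3.0) -> bool:
    """Send one IPv4 Binding Request and wait for a matching reply.

    Returns False on timeout. DNS or socket errors also return False, so a
    False result alone does not prove the reply path is what's broken.
    """
    txn_id = os.urandom(12)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_binding_request(txn_id), (host, port))
            data, _addr = sock.recvfrom(2048)
        except OSError:  # socket.timeout and socket.gaierror are both OSError subclasses
            return False
    return is_binding_success(data, txn_id)
```

If stun_probe() returns False on the server while the same call succeeds from another network, that supports the return-path theory; running a packet capture on UDP port 3478 alongside it would show whether any reply reaches the interface at all.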
The phone, on cellular, is behind CGNAT. Tailscale needs UDP hole-punching from BOTH ends to establish a direct path. With the server's UDP discovery broken, both ends fall back to DERP. DERP is TCP-based and routed through Tailscale's relay infrastructure (closest is Johannesburg). Each round trip adds ~30-200ms baseline plus jitter. PWAs make many small HTTP requests per page → latency stacks → unusable.
Laptop (when on home Wi-Fi, fixed IP, friendly NAT) tolerates the same DERP relay because SSH and similar low-frequency traffic doesn't notice the latency. Phone PWA does.
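To make "latency stacks" concrete: when requests depend on each other (HTML, then scripts, then API calls), each one waits a full round trip, so load time grows roughly as round trips × RTT. A back-of-envelope sketch using the measured RTT samples; the figure of 15 sequential round trips per page is a hypothetical assumption, not a measurement of MC, and server processing time is ignored.

```python
# RTT samples measured with tailscale ping over DERP(jnb), in seconds
RTT_SAMPLES_S = [0.241, 0.418, 2.128]

# Hypothetical: number of dependent (non-overlapping) round trips per page load
SEQUENTIAL_ROUND_TRIPS = 15


def serial_load_time(round_trips: int, rtt_s: float) -> float:
    """Lower bound on load time when each request waits for the previous one."""
    return round_trips * rtt_s


for rtt in RTT_SAMPLES_S:
    total = serial_load_time(SEQUENTIAL_ROUND_TRIPS, rtt)
    print(f"RTT {rtt * 1000:.0f} ms -> page load >= {total:.1f} s")
```

At the worst sample, 15 round trips alone come to about 32 s, the same order as the >30 s page loads observed, while the localhost timings (1 to 50 ms) show server time is negligible by comparison.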
Use http://204.168.188.33:3001 directly from phone. MC binds 0.0.0.0:3001, public IP is reachable, no Tailscale dependency. Confirmed fast.
Caveats:
- No TLS: credentials/tokens travel in the clear (token-auth via cookie still works once set).
- No PWA install (PWA/Service Worker requires HTTPS). Add to home screen still works in degraded form.
- Exposes MC publicly: anyone with the IP can probe it. MC currently has token-auth on all mutating endpoints; read endpoints are open.
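The "confirmed fast" result can be quantified by timing the same MC endpoint over each path. A minimal sketch using only the standard library; the choice of /api/board as the probe endpoint and the https:// scheme on the Funnel hostname are assumptions, and TARGETS is a hypothetical name.

```python
import statistics
import time
import urllib.request

# Paths to compare (endpoint and Funnel scheme are assumptions)
TARGETS = {
    "localhost": "http://127.0.0.1:3001/api/board",
    "public IP": "http://204.168.188.33:3001/api/board",
    "Funnel": "https://openclaw.tailb2ba18.ts.net/api/board",
}


def time_get(url: str, timeout: float = 30.0) -> float:
    """Wall-clock seconds for one GET, including reading the full body."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
    return time.perf_counter() - start


def median_time(url: str, runs: int = 5) -> float:
    """Median of several GETs, which damps the heavy jitter seen over DERP."""
    return statistics.median(time_get(url) for _ in range(runs))
```

Run from the phone's network (or any host outside the tailnet path), e.g. printing median_time(url) for each entry in TARGETS; the median is used rather than the mean because a single 2 s DERP spike would otherwise dominate.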
| Option | Effort | TLS | DNS | Cost | Notes |
|---|---|---|---|---|---|
| A) Caddy + Let's Encrypt + own domain | 15 min | ✅ | needs A-record → 204.168.188.33 | free | Clean, standard. Picks up auto-renewing cert. Need a domain. |
| B) Caddy + duckdns/no-ip free subdomain | 10 min | ✅ | none (free dynamic DNS) | free | No domain needed. Subdomain looks ugly. |
| C) Cloudflare Tunnel | 10 min | ✅ at edge | none (cf hostname) | free | No inbound port opened. CF anycast = fast worldwide. Hides server IP. Recommended. |
All three coexist with Tailscale (additive: Tailscale stays for SSH/admin). Recommend C for the permanent fix because it requires no inbound port and survives if Hetzner's IP routing flakes.
A scheduled task tailscale-watch will run every 30 min and:
- tailscale netcheck → parse UDP, IPv4, DERP fields
- tailscale ping -c 1 elmars-s26-ultra → check direct vs relayed
- Record results in ~/workspace/state/tailscale_health.json with rolling history

If the Hetzner network heals or tailscaled rediscovers a working path, the problem fixes itself and the history will show it. Otherwise we have evidence for escalation and ship the permanent fix.
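A minimal Python sketch of what tailscale-watch could look like. It assumes the plain-text netcheck output uses lines such as "* UDP: false" and that tailscale ping prints "via DERP(jnb)" when relayed and "via IP:port" when direct. MAX_HISTORY, ESCALATE_AFTER_S, degraded_since and the escalate flag are hypothetical additions for the 7-day rule, not part of the existing task.

```python
import json
import re
import subprocess
import time
from pathlib import Path

# Hypothetical constants; adjust to match the real task.
STATE_FILE = Path.home() / "workspace" / "state" / "tailscale_health.json"
PEER = "elmars-s26-ultra"
MAX_HISTORY = 8 * 48  # ~8 days at one sample per 30 min, a little over the escalation window
ESCALATE_AFTER_S = 7 * 24 * 3600


def parse_netcheck(text: str) -> dict:
    """Pull the UDP, IPv4 and Nearest DERP values out of tailscale netcheck text output."""
    fields = {}
    for key in ("UDP", "IPv4", "Nearest DERP"):
        # Lines look like "\t* UDP: false"; the "* " bullet is optional here.
        match = re.search(rf"^\s*\*?\s*{re.escape(key)}:\s*(.*)$", text, re.MULTILINE)
        fields[key] = match.group(1).strip() if match else None
    return fields


def parse_ping(text: str) -> str:
    """Classify tailscale ping output as relayed, direct, or no-reply."""
    if "via DERP(" in text:
        return "relayed"
    if re.search(r"pong from .* via \S+:\d+", text):
        return "direct"
    return "no-reply"


def run(cmd: list) -> str:
    # No check=True: tailscale ping exits non-zero when no pong arrives.
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    return proc.stdout + proc.stderr


def degraded_since(history: list):
    """Timestamp where the current unbroken run of non-direct samples began, or None."""
    start = None
    for sample in history:
        if sample["ping"] == "direct":
            start = None
        elif start is None:
            start = sample["ts"]
    return start


def record_sample(state_file: Path = STATE_FILE) -> dict:
    """Take one sample, append it to the rolling history, and flag escalation."""
    sample = {
        "ts": int(time.time()),
        "netcheck": parse_netcheck(run(["tailscale", "netcheck"])),
        "ping": parse_ping(run(["tailscale", "ping", "-c", "1", PEER])),
    }
    history = json.loads(state_file.read_text()) if state_file.exists() else []
    history = (history + [sample])[-MAX_HISTORY:]  # keep a rolling window
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(json.dumps(history, indent=2))
    start = degraded_since(history)
    # Computed after saving, so the flag is returned but not stored in the file.
    sample["escalate"] = start is not None and sample["ts"] - start >= ESCALATE_AFTER_S
    return sample
```

Calling record_sample() every 30 minutes (from cron or a systemd timer) appends one sample; an escalate value of True means every sample for at least 7 days has been relayed or unanswered, which is the trigger for shipping one of the options above.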