Networking¶
Networking failures are among the most common issues in cloud-native systems.
Before debugging Kubernetes Services, Ingress, or service mesh traffic, you need strong Linux networking fundamentals.
This page gives you a practical command set for diagnosing reachability, DNS, ports, routing, and packet flow.
What It Is¶
Linux networking diagnostics are the tools and checks you use to answer:
- Does this host or container have an IP?
- Is the route to the destination valid?
- Is DNS returning the expected result?
- Is the target port listening?
- Are packets reaching the destination and returning?
In practice, networking debugging is layer-by-layer elimination:
- Link/interface layer (is the interface up?)
- IP/routing layer (can packets be routed?)
- Name resolution layer (does DNS resolve correctly?)
- Transport layer (is TCP/UDP port open?)
- Application layer (does HTTP/gRPC/etc. return expected responses?)
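The layered elimination above can be sketched as a small script. This is a minimal sketch covering only the name-resolution and transport layers; the host and port are placeholders, and the transport probe uses bash's /dev/tcp pseudo-device so it works even in minimal containers where nc is absent:

```shell
#!/usr/bin/env bash
# Minimal layer-by-layer triage sketch; substitute your failing host/port pair.

triage() {
  local host="$1" port="$2"

  # Name resolution layer: ask the system resolver (nsswitch-aware)
  if getent hosts "$host" >/dev/null 2>&1; then
    echo "dns: ok"
  else
    echo "dns: FAIL (resolver or record problem)"
  fi

  # Transport layer: try to open a TCP connection via bash's /dev/tcp,
  # bounded by timeout so a filtered port does not hang the script
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "tcp: ok"
  else
    echo "tcp: FAIL (refused, filtered, or unreachable)"
  fi
}

triage 127.0.0.1 8080
```

Stop at the first failing layer and debug there before moving up the stack.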
When to Use It¶
Use these commands when:
- A service is unreachable
- A pod/container cannot connect to a dependency
- DNS resolution fails or returns unexpected addresses
- Traffic times out intermittently
- Latency spikes or packet loss appear
- A port is reported as "closed" or "connection refused"
Core Commands¶
ip (interfaces, addresses, routes)¶
Inspect interface state and IP assignment:
ip addr
ip -br addr
ip link
Inspect route selection and default gateway:
ip route
ip route get 10.96.0.10
Why it matters:
- state DOWN means an interface-level issue
- Missing default route breaks external access
- Wrong route metric can send traffic through the wrong path
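The brief (-br) output is easy to scan mechanically. The sample below is hypothetical captured output; awk flags anything marked DOWN (loopback reports UNKNOWN, which is normal):

```shell
# Hypothetical `ip -br addr` output captured from a host
sample='lo               UNKNOWN        127.0.0.1/8 ::1/128
eth0             UP             10.10.2.15/24
eth1             DOWN           10.10.3.7/24'

# Flag interfaces that are administratively or physically down
printf '%s\n' "$sample" | awk '$2 == "DOWN" { print $1 " is down" }'
```

In real use, replace the sample with a live pipe: ip -br addr | awk '$2 == "DOWN"'.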
ss (socket and listening port inspection)¶
Check listening sockets:
ss -lntp
ss -lunp
Check active outbound/inbound connections:
ss -tnp
ss -s
Why it matters:
- Confirms whether a process is actually listening on the expected port
- Helps separate "app not listening" from "network blocked"
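A sketch of turning that check into a yes/no answer. The captured ss output below is hypothetical; in real use, pipe ss -lntp itself into the filter:

```shell
# Hypothetical `ss -lntp` output (Process column trimmed for brevity)
listening='State  Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0      4096   0.0.0.0:22        0.0.0.0:*
LISTEN 0      511    127.0.0.1:8080    0.0.0.0:*'

port_is_listening() {
  # $1: port number; exit 0 if any socket listens on that port
  printf '%s\n' "$listening" | \
    awk -v p=":$1" '$4 ~ p"$" { found = 1 } END { exit !found }'
}

port_is_listening 8080 && echo "8080: listening"
port_is_listening 5432 || echo "5432: not listening"
```

Note the bind address too: a socket on 127.0.0.1:8080 is "listening" but unreachable from other hosts, which is a third failure mode distinct from "not listening" and "network blocked".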
ping (basic reachability and packet loss)¶
ping -c 4 8.8.8.8
ping -c 4 example.com
Why it matters:
- Ping by IP succeeding while ping by hostname fails suggests a name resolution issue
- Loss/latency variance can indicate congestion or unstable links
Note:
- Some environments block ICMP; failed ping does not always mean service is down
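For scripted health checks, the loss percentage can be extracted from ping's summary line. The summary string here is a hypothetical sample of the final line ping -c 4 prints:

```shell
# Hypothetical final summary line from `ping -c 4`
summary='4 packets transmitted, 3 received, 25% packet loss, time 3004ms'

# Pull out the numeric loss percentage for use in alerts or retry logic
loss="$(printf '%s\n' "$summary" | grep -o '[0-9]*% packet loss' | cut -d'%' -f1)"
echo "packet loss: ${loss}%"
```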
traceroute or tracepath (path visibility)¶
traceroute 1.1.1.1
tracepath 1.1.1.1
Why it matters:
- Shows where packets stop across hops
- Useful for identifying routing boundaries or upstream blocks
dig / nslookup / getent (DNS checks)¶
Query DNS directly:
dig example.com
dig +short api.internal.example.com
dig @8.8.8.8 example.com
Resolve using system resolver path:
getent hosts example.com
cat /etc/resolv.conf
Why it matters:
- Distinguishes DNS server issues from local resolver configuration issues
- Makes split-horizon DNS mistakes obvious
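The key distinction: getent follows the nsswitch path (so it sees /etc/hosts and other local sources), while dig queries DNS servers directly. If getent resolves a name that dig cannot, the answer is coming from a non-DNS source. A minimal illustration using localhost, which resolves via /etc/hosts on virtually every system:

```shell
# System resolver path (nsswitch-aware): sees /etc/hosts entries
addr="$(getent hosts localhost | awk '{ print $1; exit }')"
echo "system resolver: localhost -> $addr"

# By contrast, a direct DNS query bypasses /etc/hosts entirely:
#   dig +short localhost
# A mismatch between the two points at local resolver configuration,
# not at the DNS server.
```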
curl (application-layer validation)¶
Validate HTTP reachability with timing breakdown:
curl -v http://service:8080/health
curl -sS -o /dev/null -w "dns=%{time_namelookup} connect=%{time_connect} ttfb=%{time_starttransfer} total=%{time_total}\n" http://service:8080/health
Why it matters:
- Confirms whether the app responds correctly, not just whether the port is open
- Timing fields quickly identify whether delay is DNS, TCP connect, or server response
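Once you have the timing line, you can classify which phase dominates. The timing string below is a hypothetical sample of what the -w format above produces:

```shell
# Hypothetical timing line produced by the curl -w format above
t='dns=0.004 connect=0.010 ttfb=1.920 total=1.923'

# Report the dominant phase: dns -> resolver, connect -> network path,
# ttfb -> server-side processing
slow_phase() {
  printf '%s\n' "$1" | tr ' ' '\n' | awk -F= '
    $1 != "total" && $2 > max { max = $2; phase = $1 }
    END { print phase }'
}
slow_phase "$t"
```

In this sample, ttfb dominates: the delay is server-side, not DNS or TCP connect.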
nc (raw TCP/UDP connectivity tests)¶
nc -vz db.internal 5432
nc -vz redis.internal 6379
Why it matters:
- Fast way to validate transport connectivity without full client tooling
- Useful for smoke-checking security group/firewall behavior
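A sketch of looping nc over a dependency list, which is handy as a startup or deploy-time smoke test. The hostnames are the same placeholders used above, and -w bounds each connection attempt:

```shell
# Smoke-check a list of host:port dependencies (hostnames are placeholders)
check_deps() {
  local dep host port
  for dep in "$@"; do
    host="${dep%:*}" port="${dep#*:}"
    if nc -vz -w 3 "$host" "$port" >/dev/null 2>&1; then
      echo "$dep reachable"
    else
      echo "$dep NOT reachable"
    fi
  done
}

check_deps db.internal:5432 redis.internal:6379
```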
tcpdump (packet-level inspection)¶
Capture traffic on interface:
sudo tcpdump -i eth0 host 10.10.2.15
sudo tcpdump -i any port 53
sudo tcpdump -i any tcp port 443
Why it matters:
- Definitive evidence of whether packets arrive/leave
- Essential for diagnosing NAT, DNS, and handshake failures
Tip:
- Save capture for deeper analysis:
sudo tcpdump -i any -w /tmp/capture.pcap host 10.10.2.15
iptables / nft (firewall policy visibility)¶
On systems using iptables:
sudo iptables -L -n -v
sudo iptables -t nat -L -n -v
On systems using nftables:
sudo nft list ruleset
Why it matters:
- Identifies dropped traffic and NAT rules influencing packet flow
- Prevents misattributing policy blocks to application bugs
Kubernetes-adjacent checks from foundations context¶
When debugging from a node or jump host, these commands are still useful:
kubectl get svc -A
kubectl get endpoints -A
kubectl exec -it <pod> -- sh
kubectl exec -it <pod> -- nslookup kubernetes.default.svc
kubectl exec -it <pod> -- nc -vz my-service 8080
Why it matters:
- Confirms whether issue is cluster DNS/service plumbing or app-level behavior
Real-World Example¶
Scenario: payments-api cannot connect to postgres in a cluster-backed environment.
Step-by-step workflow:
- Verify local DNS resolution from workload context:
kubectl exec -it payments-api-abc -- nslookup postgres.default.svc.cluster.local
- Validate transport connectivity from same pod:
kubectl exec -it payments-api-abc -- nc -vz postgres.default.svc.cluster.local 5432
- Check whether target process is listening on expected address/port:
ss -lntp | grep 5432
- Confirm route to destination subnet:
ip route
ip route get <postgres-pod-ip>
- If still failing, capture packets at destination node:
sudo tcpdump -i any host <postgres-pod-ip> and port 5432
Likely outcomes:
- DNS fails: CoreDNS or resolver config issue
- DNS works, nc fails: policy/firewall/routing issue
- TCP connects, app still fails: authentication/TLS/app configuration issue
Debugging Pattern¶
Use this repeatable sequence:
- Identify source and destination precisely (IP, hostname, port, protocol)
- Test DNS resolution (dig, getent, nslookup)
- Test transport reachability (nc, ss, curl)
- Validate route and path (ip route, traceroute)
- Inspect packet flow (tcpdump)
- Inspect policy controls (iptables/nft, network policies, security groups)
- Confirm application-level behavior (HTTP status, TLS handshake, auth)
Decision shortcuts:
- "Connection refused" usually means host reachable but nothing listening on port
- "No route to host" usually means routing or network segmentation issue
- "Name or service not known" usually means resolver or DNS record problem
- "Timed out" often points to drops by firewall/policy or asymmetric routing
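The shortcuts above can be encoded directly when triaging logs. The error strings follow common client wording (glibc/OpenSSH-style messages); treat the mapping as a heuristic first guess, not proof:

```shell
# Map common client error strings to a likely cause (heuristic, not proof)
classify_error() {
  case "$1" in
    *"Connection refused"*)        echo "host reachable, nothing listening on that port" ;;
    *"No route to host"*)          echo "routing or network segmentation issue" ;;
    *"Name or service not known"*) echo "resolver or DNS record problem" ;;
    *"timed out"*|*"Timed out"*)   echo "possible firewall/policy drop or asymmetric routing" ;;
    *)                             echo "unclassified: walk the layers" ;;
  esac
}

classify_error "connect to 10.0.0.5 port 5432 failed: Connection refused"
```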
Common Pitfalls¶
- Debugging only from your laptop instead of from the failing workload context
- Assuming DNS success means service health; DNS only returns an address
- Ignoring IPv6 vs IPv4 mismatch in dual-stack environments
- Forgetting that ICMP may be blocked while TCP is allowed
- Confusing a closed port with a filtered port (refused vs timed out)
- Checking the Service object but not the Endpoints/backing pods
- Relying on one tool; always validate across layers