Networking¶
Networking failures are among the most common issues in cloud-native systems.
Before debugging Kubernetes Services, Ingress, or service mesh traffic, you need strong Linux networking fundamentals.
This page gives you a practical command set for diagnosing reachability, DNS, ports, routing, and packet flow.
What It Is¶
Linux networking diagnostics are the tools and checks you use to answer:
- Does this host or container have an IP?
- Is the route to the destination valid?
- Is DNS returning the expected result?
- Is the target port listening?
- Are packets reaching the destination and returning?
In practice, networking debugging is layer-by-layer elimination:
- Link/interface layer (is the interface up?)
- IP/routing layer (can packets be routed?)
- Name resolution layer (does DNS resolve correctly?)
- Transport layer (is TCP/UDP port open?)
- Application layer (does HTTP/gRPC/etc. return expected responses?)
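The layered elimination above can be sketched as a small script. This is a minimal sketch covering only the name-resolution and transport layers; the host and port are placeholders, and the transport probe uses bash's /dev/tcp pseudo-device so it works even in minimal containers where nc is absent:

```shell
#!/usr/bin/env bash
# Minimal layer-by-layer triage sketch; substitute your failing host/port pair.

triage() {
  local host="$1" port="$2"

  # Name resolution layer: ask the system resolver (nsswitch-aware)
  if getent hosts "$host" >/dev/null 2>&1; then
    echo "dns: ok"
  else
    echo "dns: FAIL (resolver or record problem)"
  fi

  # Transport layer: try to open a TCP connection via bash's /dev/tcp,
  # bounded by timeout so a filtered port does not hang the script
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "tcp: ok"
  else
    echo "tcp: FAIL (refused, filtered, or unreachable)"
  fi
}

triage 127.0.0.1 8080
```

Stop at the first failing layer and debug there before moving up the stack.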
When to Use It¶
Use these commands when:
- A service is unreachable
- A pod/container cannot connect to a dependency
- DNS resolution fails or returns unexpected addresses
- Traffic times out intermittently
- Latency spikes or packet loss appear
- A port is reported as "closed" or "connection refused"
Core Commands¶
ip (interfaces, addresses, routes)¶
Inspect interface state and IP assignment:
ip addr
ip -br addr
ip link
Inspect route selection and default gateway:
ip route
ip route get 10.96.0.10
Why it matters:
- state DOWN means an interface-level issue
- Missing default route breaks external access
- Wrong route metric can send traffic through the wrong path
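The brief (-br) output is easy to scan mechanically. The sample below is hypothetical captured output; awk flags anything marked DOWN (loopback reports UNKNOWN, which is normal):

```shell
# Hypothetical `ip -br addr` output captured from a host
sample='lo               UNKNOWN        127.0.0.1/8 ::1/128
eth0             UP             10.10.2.15/24
eth1             DOWN           10.10.3.7/24'

# Flag interfaces that are administratively or physically down
printf '%s\n' "$sample" | awk '$2 == "DOWN" { print $1 " is down" }'
```

In real use, replace the sample with a live pipe: ip -br addr | awk '$2 == "DOWN"'.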
ss (socket and listening port inspection)¶
Check listening sockets:
ss -lntp
ss -lunp
Check active outbound/inbound connections:
ss -tnp
ss -s
Why it matters:
- Confirms whether a process is actually listening on the expected port
- Helps separate "app not listening" from "network blocked"
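A sketch of turning that check into a yes/no answer. The captured ss output below is hypothetical; in real use, pipe ss -lntp itself into the filter:

```shell
# Hypothetical `ss -lntp` output (Process column trimmed for brevity)
listening='State  Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0      4096   0.0.0.0:22        0.0.0.0:*
LISTEN 0      511    127.0.0.1:8080    0.0.0.0:*'

port_is_listening() {
  # $1: port number; exit 0 if any socket listens on that port
  printf '%s\n' "$listening" | \
    awk -v p=":$1" '$4 ~ p"$" { found = 1 } END { exit !found }'
}

port_is_listening 8080 && echo "8080: listening"
port_is_listening 5432 || echo "5432: not listening"
```

Note the bind address too: a socket on 127.0.0.1:8080 is "listening" but unreachable from other hosts, which is a third failure mode distinct from "not listening" and "network blocked".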
ping (basic reachability and packet loss)¶
ping -c 4 8.8.8.8
ping -c 4 example.com
Why it matters:
- Ping by IP succeeding while ping by hostname fails suggests a name resolution issue
- Loss/latency variance can indicate congestion or unstable links
Note:
- Some environments block ICMP; failed ping does not always mean service is down
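For scripted health checks, the loss percentage can be extracted from ping's summary line. The summary string here is a hypothetical sample of the final line ping -c 4 prints:

```shell
# Hypothetical final summary line from `ping -c 4`
summary='4 packets transmitted, 3 received, 25% packet loss, time 3004ms'

# Pull out the numeric loss percentage for use in alerts or retry logic
loss="$(printf '%s\n' "$summary" | grep -o '[0-9]*% packet loss' | cut -d'%' -f1)"
echo "packet loss: ${loss}%"
```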
traceroute or tracepath (path visibility)¶
traceroute 1.1.1.1
tracepath 1.1.1.1
Why it matters:
- Shows where packets stop across hops
- Useful for identifying routing boundaries or upstream blocks
dig / nslookup / getent (DNS checks)¶
Query DNS directly:
dig example.com
dig +short api.internal.example.com
dig @8.8.8.8 example.com
Resolve using system resolver path:
getent hosts example.com
cat /etc/resolv.conf
Why it matters:
- Distinguishes DNS server issues from local resolver configuration issues
- Makes split-horizon DNS mistakes obvious
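The key distinction: getent follows the nsswitch path (so it sees /etc/hosts and other local sources), while dig queries DNS servers directly. If getent resolves a name that dig cannot, the answer is coming from a non-DNS source. A minimal illustration using localhost, which resolves via /etc/hosts on virtually every system:

```shell
# System resolver path (nsswitch-aware): sees /etc/hosts entries
addr="$(getent hosts localhost | awk '{ print $1; exit }')"
echo "system resolver: localhost -> $addr"

# By contrast, a direct DNS query bypasses /etc/hosts entirely:
#   dig +short localhost
# A mismatch between the two points at local resolver configuration,
# not at the DNS server.
```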
curl (application-layer validation)¶
Validate HTTP reachability with timing breakdown:
curl -v http://service:8080/health
curl -sS -o /dev/null -w "dns=%{time_namelookup} connect=%{time_connect} ttfb=%{time_starttransfer} total=%{time_total}\n" http://service:8080/health
Why it matters:
- Confirms whether the app responds correctly, not just whether the port is open
- Timing fields quickly identify whether delay is DNS, TCP connect, or server response
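Once you have the timing line, you can classify which phase dominates. The timing string below is a hypothetical sample of what the -w format above produces:

```shell
# Hypothetical timing line produced by the curl -w format above
t='dns=0.004 connect=0.010 ttfb=1.920 total=1.923'

# Report the dominant phase: dns -> resolver, connect -> network path,
# ttfb -> server-side processing
slow_phase() {
  printf '%s\n' "$1" | tr ' ' '\n' | awk -F= '
    $1 != "total" && $2 > max { max = $2; phase = $1 }
    END { print phase }'
}
slow_phase "$t"
```

In this sample, ttfb dominates: the delay is server-side, not DNS or TCP connect.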
nc (raw TCP/UDP connectivity tests)¶
nc -vz db.internal 5432
nc -vz redis.internal 6379
Why it matters:
- Fast way to validate transport connectivity without full client tooling
- Useful for smoke-checking security group/firewall behavior
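A sketch of looping nc over a dependency list, which is handy as a startup or deploy-time smoke test. The hostnames are the same placeholders used above, and -w bounds each connection attempt:

```shell
# Smoke-check a list of host:port dependencies (hostnames are placeholders)
check_deps() {
  local dep host port
  for dep in "$@"; do
    host="${dep%:*}" port="${dep#*:}"
    if nc -vz -w 3 "$host" "$port" >/dev/null 2>&1; then
      echo "$dep reachable"
    else
      echo "$dep NOT reachable"
    fi
  done
}

check_deps db.internal:5432 redis.internal:6379
```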
tcpdump (packet-level inspection)¶
Capture traffic on interface:
sudo tcpdump -i eth0 host 10.10.2.15
sudo tcpdump -i any port 53
sudo tcpdump -i any tcp port 443
Why it matters:
- Definitive evidence of whether packets arrive/leave
- Essential for diagnosing NAT, DNS, and handshake failures
Tip:
- Save capture for deeper analysis:
sudo tcpdump -i any -w /tmp/capture.pcap host 10.10.2.15
iptables / nft (firewall policy visibility)¶
On systems using iptables:
sudo iptables -L -n -v
sudo iptables -t nat -L -n -v
On systems using nftables:
sudo nft list ruleset
Why it matters:
- Identifies dropped traffic and NAT rules influencing packet flow
- Prevents misattributing policy blocks to application bugs
Kubernetes-adjacent checks from foundations context¶
When debugging from a node or jump host, these commands are still useful:
kubectl get svc -A
kubectl get endpoints -A
kubectl exec -it <pod> -- sh
kubectl exec -it <pod> -- nslookup kubernetes.default.svc
kubectl exec -it <pod> -- nc -vz my-service 8080
Why it matters:
- Confirms whether issue is cluster DNS/service plumbing or app-level behavior
Real-World Example¶
Scenario: payments-api cannot connect to postgres in a cluster-backed environment.
Step-by-step workflow:
- Verify local DNS resolution from workload context:
kubectl exec -it payments-api-abc -- nslookup postgres.default.svc.cluster.local
- Validate transport connectivity from same pod:
kubectl exec -it payments-api-abc -- nc -vz postgres.default.svc.cluster.local 5432
- Check whether target process is listening on expected address/port:
ss -lntp | grep 5432
- Confirm route to destination subnet:
ip route
ip route get <postgres-pod-ip>
- If still failing, capture packets at destination node:
sudo tcpdump -i any host <postgres-pod-ip> and port 5432
Likely outcomes:
- DNS fails: CoreDNS or resolver config issue
- DNS works, nc fails: policy/firewall/routing issue
- TCP connects, app still fails: authentication/TLS/app configuration issue
Debugging Pattern¶
Use this repeatable sequence:
- Identify source and destination precisely (IP, hostname, port, protocol)
- Test DNS resolution (dig, getent, nslookup)
- Test transport reachability (nc, ss, curl)
- Validate route and path (ip route, traceroute)
- Inspect packet flow (tcpdump)
- Inspect policy controls (iptables/nft, network policies, security groups)
- Confirm application-level behavior (HTTP status, TLS handshake, auth)
Decision shortcuts:
- "Connection refused" usually means host reachable but nothing listening on port
- "No route to host" usually means routing or network segmentation issue
- "Name or service not known" usually means resolver or DNS record problem
- "Timed out" often points to drops by firewall/policy or asymmetric routing
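The shortcuts above can be encoded directly when triaging logs. The error strings follow common client wording (glibc/OpenSSH-style messages); treat the mapping as a heuristic first guess, not proof:

```shell
# Map common client error strings to a likely cause (heuristic, not proof)
classify_error() {
  case "$1" in
    *"Connection refused"*)        echo "host reachable, nothing listening on that port" ;;
    *"No route to host"*)          echo "routing or network segmentation issue" ;;
    *"Name or service not known"*) echo "resolver or DNS record problem" ;;
    *"timed out"*|*"Timed out"*)   echo "possible firewall/policy drop or asymmetric routing" ;;
    *)                             echo "unclassified: walk the layers" ;;
  esac
}

classify_error "connect to 10.0.0.5 port 5432 failed: Connection refused"
```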
Common Pitfalls¶
- Debugging only from your laptop instead of from the failing workload context
- Assuming DNS success means service health; DNS only returns an address
- Ignoring IPv6 vs IPv4 mismatch in dual-stack environments
- Forgetting that ICMP may be blocked while TCP is allowed
- Confusing a closed port with a filtered port (refused vs timed out)
- Checking the Service object but not the Endpoints/backing pods
- Relying on one tool; always validate across layers