DNS resolution is the step where a client (or recursive resolver) converts a domain name into an IP address by querying DNS servers. DNS troubleshooting is verifying each step of that lookup chain (client → resolver → authoritative DNS) to isolate failures, delays, or wrong answers.
When to use DNS troubleshooting
- Users see “DNS_PROBE_FINISHED_NXDOMAIN,” “Server IP address could not be found,” or “DNS server not responding.”
- A domain resolves to the wrong IP, wrong region, or wrong provider.
- DNS changes “don’t take effect” after a deployment or migration.
- You suspect cache/TTL issues, propagation delays, or stale records.
- You need to validate DNSSEC or delegation (NS records).
When not to use DNS troubleshooting
- The hostname resolves correctly, but the site still fails due to HTTP/TLS/app errors (check web server, TLS, WAF, origin health).
- Packet loss or routing issues are evident (start with network diagnostics like ping/traceroute).
- The issue is limited to a single browser tab and clears with a refresh (likely client/browser state).
- You already confirmed authoritative DNS answers are correct and fast; focus on CDN/origin latency instead.
Signals you need this (common symptoms)
- Intermittent “works on mobile, fails on Wi‑Fi” behavior.
- Different users resolve different IPs unexpectedly.
- Very slow first-byte only on the first request (DNS delay before TCP/TLS).
- Email delivery failures related to MX records.
- DNS changes appear inconsistent across regions.
Common DNS Problems
Common mistakes (and fixes)
- Mistake: Testing only with one resolver (e.g., your ISP).
Fix: Query multiple resolvers and authoritative servers directly.
- Mistake: Ignoring the difference between authoritative vs recursive answers.
Fix: Use dig +trace or query authoritative nameservers with @ns.example.
- Mistake: Changing records repeatedly during an incident.
Fix: Stabilize, lower TTL before planned migrations, then change once.
- Mistake: Forgetting AAAA records (IPv6) or broken dual-stack behavior.
Fix: Check both A and AAAA, and validate IPv6 reachability.
- Mistake: Misconfigured CNAME at the zone apex.
Fix: Use ALIAS/ANAME (provider feature) or restructure records.
- Mistake: Missing/incorrect NS delegation at the registrar.
Fix: Verify parent zone NS and glue records when required.
DNS server failures can render websites inaccessible, often due to network connectivity issues or server unavailability. Common symptoms include error messages like “DNS server not responding” or “Server IP address could not be found.”
Common DNS problems (what they look like + typical causes)
1) DNS server not responding / timeouts
What you see: query hangs, then times out; browsers show DNS server errors.
Typical causes:
- Network disruption between client and resolver, or resolver and authoritative
- Authoritative server overload/outage (including DDoS attacks)
- Firewall blocks UDP/TCP 53 or EDNS behavior breaks on the path
2) NXDOMAIN (domain/record “does not exist”)
What you see: status: NXDOMAIN in dig output.
Typical causes:
- Typo in hostname or wrong zone
- Record not created in authoritative DNS
- Querying a resolver with stale negative caching
3) SERVFAIL
What you see: status: SERVFAIL.
Typical causes:
- DNSSEC validation failure
- Broken delegation chain (bad NS)
- Authoritative server returns malformed responses
4) Wrong IP / wrong destination
What you see: resolves, but to an unexpected IP/CNAME.
Typical causes:
- Split-horizon DNS confusion (internal vs external views)
- Cached old record (TTL not expired)
- Misconfigured CNAME chain or multiple conflicting records
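The status codes above can be turned into a first triage step. The following is a minimal sketch, not a definitive tool: the function names dig_status and next_check are hypothetical, and the hint wording is an illustrative condensation of the causes listed above. It reads dig output from standard input, so it can be tried on saved output without network access.

```shell
#!/bin/sh
# Extract the response status (NOERROR, NXDOMAIN, SERVFAIL, ...) from
# dig output on stdin. Prints nothing if no header line is present,
# which is what happens when the query timed out.
dig_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Map a status to a first check, following the list of typical causes.
# The hint texts are illustrative, not exhaustive.
next_check() {
    case "$1" in
        NOERROR)  echo "answer returned: compare the IP/CNAME with what you expect" ;;
        NXDOMAIN) echo "check spelling, zone, and whether the record exists on the authoritative server" ;;
        SERVFAIL) echo "check DNSSEC validation and the NS delegation chain" ;;
        "")       echo "no response: check connectivity and port 53 to the resolver" ;;
        *)        echo "unhandled status: $1" ;;
    esac
}

# Usage (requires dig and network access):
#   next_check "$(dig example.com | dig_status)"
```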
Resolution Failures
Failures in DNS resolution disrupt the conversion of domain names to IP addresses, leading to broken communication between users and websites.
Types of Resolution Failures:
- Query Failures: timeouts, recursive resolver issues, or root server communication errors.
- Record-Related Issues: missing or corrupted A, AAAA, or CNAME records, expired domains, or improper zone file propagation.
Technical Impacts
- Connection problems: browsers fail to establish server connections.
- Performance degradation: increased latency as systems retry queries.
- Resource waste: bandwidth is consumed by repeated failed resolution attempts.
- Communication disruptions: failed email deliveries due to unresolved hostnames.
Diagnostic Approaches
Client-Side Analysis
- Use DNS lookup tools to verify record availability.
- Run dig +trace to identify breaks in the resolution chain; use traceroute separately to check the network path to a DNS server.
- Inspect the local DNS cache for outdated or corrupted entries.
Server-Side Verification:
- Check the integrity of zone files on authoritative name servers.
- Validate DNSSEC settings.
- Ensure proper delegation and configuration of NS records.
Mitigation Strategies
Mitigating DNS resolution failures requires a combination of robust infrastructure and adherence to best practices.
Infrastructure Improvements
- Redundant DNS Servers: deploy servers across multiple locations for failover support.
- Anycast DNS Services: enhance reliability by routing queries to the nearest server.
- Health Checks: regularly monitor server health and automate failover responses.
Best Practices
- Conduct regular DNS record audits to remove outdated or conflicting entries.
- Optimize TTL settings for balanced performance and data freshness.
- Implement monitoring and alerting systems to detect issues early.
- Keep DNS server software updated to address vulnerabilities and maintain compliance.
Using the Dig Command for DNS Analysis
The dig command is a powerful tool for DNS troubleshooting. For installation guidance, see: How to install dig.
Common Commands
dig example.com
Performs a standard DNS lookup, displaying answer, authority, and additional sections.
dig @8.8.8.8 example.com
Queries Google’s DNS server (8.8.8.8) for the domain.
dig example.com A
Fetches the A record (IPv4 address) for the domain.
Advanced Commands
dig +trace example.com
Traces the entire DNS resolution process.
dig +short example.com
Provides a concise output of the resolution result.
dig +stats example.com
Displays detailed timing and server statistics.
DNS Cache Management
DNS caching stores recently accessed domain resolutions locally to reduce lookup times and network traffic. However, outdated or corrupted cache entries can lead to connectivity problems.
Flushing DNS Cache
Flushing the DNS cache forces your device to discard stored lookups and fetch fresh DNS data on the next query. This helps when outdated or corrupted entries cause errors, such as a site that stays inaccessible after a record change. The procedure is a single command that varies by operating system. Flushing only clears the local cache: it does not change what recursive resolvers or authoritative servers return, and routine flushing brings no performance benefit because the next lookups have to be resolved again.
By Operating System
Windows:
ipconfig /flushdns
macOS:
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
Linux:
sudo systemd-resolve --flush-caches
On newer systemd releases, the equivalent command is sudo resolvectl flush-caches.
After flushing, DNS queries will fetch fresh records, potentially resolving connectivity issues.
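If you support several platforms, the per-OS commands above can be selected from the kernel name reported by uname -s. This is a minimal sketch under stated assumptions: flush_command is a hypothetical helper, the Linux branch assumes systemd-resolved is in use, and the Windows branch assumes a Unix-like shell on Windows (Cygwin, MSYS, or Git Bash). It only prints the command and does not run it.

```shell
#!/bin/sh
# Print the flush command for a given kernel name (as reported by
# `uname -s`). Nothing is executed; the caller decides whether to run it.
flush_command() {
    case "$1" in
        Darwin)
            echo "sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder" ;;
        Linux)
            # Assumption: the host uses systemd-resolved.
            echo "sudo systemd-resolve --flush-caches" ;;
        CYGWIN*|MINGW*|MSYS*)
            # Assumption: a Unix-like shell running on Windows.
            echo "ipconfig /flushdns" ;;
        *)
            echo "unknown system: $1" >&2
            return 1 ;;
    esac
}

# Usage:
#   flush_command "$(uname -s)"
```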
Understanding DNS Failures
DNS failures can occur at multiple stages, such as query formation, server contact, or record retrieval. Identifying the root cause is essential for effective troubleshooting.
Common Stages and Issues:
| Stage | Issues | Impact |
|---|---|---|
| Query Formation | Syntax errors | Failed initiation |
| Server Contact | Network issues, timeouts | No response |
| Record Retrieval | Missing/incorrect records | Incomplete resolution |
Cache-Related Problems:
| Cache Type | Purpose | Issues |
|---|---|---|
| Browser | Quick access | Outdated records |
| OS | System-wide caching | Stale data, corruption |
| Resolver | ISP-level caching | Propagation delays |
Understanding and effectively troubleshooting DNS issues are critical skills for maintaining network reliability. By leveraging tools like dig, adhering to best practices, and proactively monitoring DNS configurations, you can minimize disruptions and ensure seamless internet connectivity.
DNS diagnostic workflow (fast, repeatable)
Step 1: Confirm what the client is seeing
Use a basic lookup:
dig example.com
Check:
- status (NOERROR/NXDOMAIN/SERVFAIL)
- ANSWER SECTION (records and TTL)
- Query time and which SERVER responded
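To record the Step 1 fields consistently across runs, the relevant lines of dig output can be condensed into one line. The sketch below assumes the usual dig layout (a header line containing "status:", and footer lines starting with ";; Query time:" and ";; SERVER:"); dig_summary is a hypothetical helper name, and the sample addresses in the usage comment are documentation placeholders.

```shell
#!/bin/sh
# Condense dig output (read from stdin) into one line with the response
# status, the query time in milliseconds, and the server that answered.
dig_summary() {
    awk '
        /status:/ && status == "" {
            sub(/.*status: /, ""); sub(/,.*/, ""); status = $0
        }
        /^;; Query time:/ { time_ms = $4 }
        /^;; SERVER:/     { server = $3 }
        END { printf "status=%s time_ms=%s server=%s\n", status, time_ms, server }
    '
}

# Usage (requires dig and network access):
#   dig example.com | dig_summary
```

Empty fields in the summary usually mean the corresponding line was missing from the output, for example because the query timed out before any response arrived.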
Step 2: Query a known public resolver (compare answers)
dig @8.8.8.8 example.com
dig @1.1.1.1 example.com
If results differ, you may have caching, propagation, or split-horizon issues.
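Comparing answers by eye gets error-prone when a name has several records, because resolvers may return them in a different order. A minimal sketch of an order-insensitive comparison follows; same_answers is a hypothetical helper that takes two answer lists (for example, two dig +short outputs) as arguments.

```shell
#!/bin/sh
# Succeed (exit status 0) when two answer lists contain the same
# records, ignoring the order in which they were returned.
same_answers() {
    a=$(printf '%s\n' "$1" | sort)
    b=$(printf '%s\n' "$2" | sort)
    [ "$a" = "$b" ]
}

# Usage (requires dig and network access):
#   same_answers "$(dig +short @8.8.8.8 example.com A)" \
#                "$(dig +short @1.1.1.1 example.com A)" \
#       || echo "resolvers disagree"
```

A mismatch is not always a fault: geo-aware or load-balanced DNS can legitimately return different addresses to different resolvers.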
Step 3: Ask for specific record types (A/AAAA/CNAME/NS/MX)
dig example.com A
dig example.com AAAA
dig example.com CNAME
dig example.com NS
dig example.com MX
Step 4: Trace the delegation chain (find where it breaks)
dig +trace example.com
This helps you identify whether the failure is at:
- parent delegation (TLD/registrar)
- authoritative nameserver reachability
- zone content (missing records)
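Trace output can be long, so it helps to reduce it to the servers that actually answered at each step. The sketch below assumes the ";; Received N bytes from ADDRESS#PORT(NAME) in T ms" lines that dig +trace prints after each response; trace_hops is a hypothetical helper name.

```shell
#!/bin/sh
# List the server that answered at each step of a `dig +trace` run
# (read from stdin), one per line, with the response time.
trace_hops() {
    sed -n 's/^;; Received [0-9]* bytes from \([^ ]*\) in \([0-9]*\) ms.*/\1 \2ms/p'
}

# Usage (requires dig and network access):
#   dig +trace example.com | trace_hops
```

The last line printed is the last server that responded. If it is a root or TLD server rather than one of the domain's own nameservers, the trace likely stopped at the delegation or at authoritative reachability.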
Step 5: Reduce noise when you just want the answer
dig +short example.com
Step 6: Capture timing and stats for evidence
dig +stats example.com
Tip: If you need installation guidance, see: How to install dig.
DNS cache management (when flushing helps vs when it doesn’t)
DNS caches exist at multiple layers: browser, OS, recursive resolver, and sometimes enterprise gateways. Flushing helps when the local machine is holding stale data; it won’t fix wrong authoritative records.
Flush local DNS cache (by OS)
Windows
ipconfig /flushdns
macOS
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
Linux (systemd-resolved)
sudo systemd-resolve --flush-caches
Cache types and typical issues
| Cache layer | Purpose | Common issue | What you can do |
|---|---|---|---|
| Browser | Speed up repeat visits | Stale host mapping | Clear browser DNS cache / restart browser |
| OS | System-wide caching | Corruption/stale entries | Flush OS DNS cache |
| Recursive resolver | Shared caching for many users | Propagation delays, negative caching | Test with another resolver; wait TTL; fix authoritative records |
Learn the mechanics here: How DNS cache works.
Understanding failures by stage (quick mapping table)
| Stage | What breaks | Typical sign | Likely root cause |
|---|---|---|---|
| Query formation | Invalid/incorrect name | NXDOMAIN | Typos, wrong zone, wrong search suffix |
| Resolver contact | Resolver unreachable | timeouts | Local network/ISP issues |
| Resolver → authoritative | Authoritative unreachable | timeouts/SERVFAIL | DDoS, firewall, bad NS |
| Record retrieval | Missing/wrong record | NXDOMAIN/wrong IP | Misconfigured A/AAAA/CNAME/MX |
| Validation | DNSSEC problems | SERVFAIL | DNSSEC misconfig, broken chain |
Mini FAQ
“Why does DNS work for me but not for others?”
Common causes are resolver caching differences, regional resolvers, split-horizon DNS, or propagation/TTL behavior. Compare answers from multiple resolvers and check authoritative responses.
“How do I know if the problem is my resolver or authoritative DNS?”
Query the authoritative nameserver directly (or use dig +trace). If authoritative answers are correct but a resolver returns wrong/old data, it’s a caching or resolver issue.
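That decision rule can be written down as a small function. This is a sketch under simplifying assumptions: classify_fault is a hypothetical helper, each answer is a single value (such as one IP address), and the address and nameserver in the usage comment are placeholders.

```shell
#!/bin/sh
# classify_fault EXPECTED AUTHORITATIVE_ANSWER RESOLVER_ANSWER
# Decide where the wrong answer originates by comparing both answers
# with the value you expect.
classify_fault() {
    expected=$1
    auth=$2
    resolver=$3
    if [ "$auth" != "$expected" ]; then
        echo "authoritative: fix the record in the zone"
    elif [ "$resolver" != "$expected" ]; then
        echo "resolver: cached or resolver-side issue; wait for TTL or test another resolver"
    else
        echo "consistent: both return the expected answer"
    fi
}

# Usage (requires dig and network access; ns.example is a placeholder
# for one of the domain's authoritative nameservers):
#   classify_fault "192.0.2.10" \
#       "$(dig +short @ns.example example.com A)" \
#       "$(dig +short @8.8.8.8 example.com A)"
```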
“Why do my DNS changes not apply immediately?”
Resolvers cache responses for the TTL. Some caches also keep negative answers (NXDOMAIN) for a period. Plan migrations by lowering TTL ahead of time.
“What’s the fastest way to see the full resolution path?”
Use:
dig +trace example.com
“What should I check first during a DNS outage?”
- dig status and timing
- Compare multiple resolvers
- dig +trace to find the failing hop
- Verify NS delegation and authoritative health
How this applies in practice
- Before a migration: reduce TTL 24–48 hours in advance; validate A/AAAA and CNAME chains; confirm NS delegation at registrar.
- During an incident: avoid repeated record edits; measure query timeouts and SERVFAIL; test authoritative servers directly.
- After changes: verify from multiple regions/resolvers; track DNS lookup latency and error rates; restore TTL to normal values.
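The advice to lower the TTL ahead of a migration comes down to simple arithmetic: a resolver that cached the record just before you lowered the TTL may keep it for the full old TTL. A minimal sketch follows, assuming resolvers honor TTLs; earliest_cutover is a hypothetical helper and the timestamps are example values.

```shell
#!/bin/sh
# earliest_cutover LOWERED_AT_EPOCH OLD_TTL_SECONDS
# Print the epoch time after which caches that stored the record with
# the old TTL should have expired, so the lowered TTL is in effect.
earliest_cutover() {
    echo $(( $1 + $2 ))
}

# Example: the TTL was lowered at epoch 1700000000 and the old TTL was
# 86400 seconds (24 hours).
#   earliest_cutover 1700000000 86400
```

With an old TTL of 24 hours, lowering it only a few hours before the change leaves some caches holding the old record for most of a day after the cutover.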
How to implement on Azion
- If you need an operational workflow for running and interpreting dig, use: Run the dig command
- If DNS issues are part of a broader availability or security investigation, correlate them with latency and protection signals in your edge stack (e.g., WAF/WAAP events and performance telemetry). See: Web application and API protection (WAAP)