One thing I would add: wherever query logs are written, even temporarily for debugging, make sure the location is a tmpfs mount. BIND and a handful of other daemons will block when their log buffers back up, and that slows down the DNS server. Tmpfs is not a perfect fix, but it is an improvement. In this example /var/log/named/ should be tmpfs, set up through an entry in /etc/fstab, though I would pick a unique place for tmpfs logs that need not be preserved on reboot, such as /var/log/named/tmpfs.
It can also be useful to specify a uid and gid that own the tmpfs mount, so that statistical tools can read the logs without world permissions. DNS load-testing tools can show the difference between tmpfs and disk, even if that disk is NVMe. We need not wear down our storage.
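As a rough sketch of the kind of /etc/fstab entry meant here, this small Python helper assembles one. The size (512m), the numeric uid/gid (25) and the mode (0750) are placeholder assumptions for illustration, not recommendations, and the helper name is made up.

```python
def tmpfs_fstab_entry(mount_point, size, uid, gid, mode="0750"):
    """Build one /etc/fstab line for a tmpfs mount (illustrative helper)."""
    options = ",".join([
        "rw", "nosuid", "nodev", "noexec",
        f"size={size}",   # upper bound on RAM/swap the mount may consume
        f"uid={uid}",     # owner of the mount root (placeholder id)
        f"gid={gid}",     # group of the mount root (placeholder id)
        f"mode={mode}",   # no world access; group members can read
    ])
    # fstab fields: device, mount point, fs type, options, dump, fsck pass
    return f"tmpfs {mount_point} tmpfs {options} 0 0"


entry = tmpfs_fstab_entry("/var/log/named/tmpfs", "512m", 25, 25)
print(entry)
# tmpfs /var/log/named/tmpfs tmpfs rw,nosuid,nodev,noexec,size=512m,uid=25,gid=25,mode=0750 0 0
```

Swap in the uid and gid that the DNS daemon and your stats tooling actually share on your system.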
Include this debug location in log rotation, and set the tmpfs size and the rotation frequency for the DNS server's highest theoretical load.
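A back-of-the-envelope way to pick the tmpfs size is peak queries per second times bytes per log line times the rotation interval, multiplied by the number of rotated files kept and some headroom. Every number below (5000 qps, 150 bytes per line, 5 minute rotation, 2 files kept, 1.5x headroom) is a made-up assumption; measure your own.

```python
import math


def tmpfs_size_bytes(peak_qps, bytes_per_line, rotate_seconds,
                     kept_files=2, headroom=1.5):
    """Estimate bytes of tmpfs needed for query logs at peak load."""
    # Bytes written to one log file between two rotations at peak load.
    per_file = peak_qps * bytes_per_line * rotate_seconds
    # Space for all kept files plus a safety margin.
    return int(per_file * kept_files * headroom)


size = tmpfs_size_bytes(peak_qps=5000, bytes_per_line=150, rotate_seconds=300)
size_mib = math.ceil(size / (1024 * 1024))
print(size, size_mib)  # 675000000 644
```

If the result is more RAM than you want to give up, rotate more often or keep fewer files rather than shrinking the headroom.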
Another thing I do is keep scripts that display counts of NOERROR and NXDOMAIN by domain, so I can quickly see where I am having a problem, though I am sure HN can come up with better ideas.
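A minimal sketch of such a counting script, not my actual one. It assumes a hypothetical, already-extracted input of one "qname rcode" pair per line; how you get the rcode next to the query name depends on your logging setup and is outside this sketch. Grouping by the last two labels is a crude stand-in for "domain" and is not public-suffix aware.

```python
from collections import Counter, defaultdict


def domain_of(qname, labels=2):
    """Crude grouping key: the last `labels` labels of the query name."""
    parts = qname.rstrip(".").lower().split(".")
    return ".".join(parts[-labels:])


def count_rcodes(lines, wanted=("NOERROR", "NXDOMAIN")):
    """Count wanted response codes per domain from 'qname rcode' lines."""
    counts = defaultdict(Counter)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        qname, rcode = fields[0], fields[1].upper()
        if rcode in wanted:
            counts[domain_of(qname)][rcode] += 1
    return counts


# Hypothetical sample input in the assumed format.
sample = [
    "www.example.com. NOERROR",
    "mail.example.com. NOERROR",
    "nope.example.com. NXDOMAIN",
    "a.example.org. NXDOMAIN",
    "b.example.org. SERVFAIL",
]
result = count_rcodes(sample)
# Show the domains with the most NXDOMAIN answers first.
for domain in sorted(result, key=lambda d: -result[d]["NXDOMAIN"]):
    print(domain, result[domain]["NOERROR"], result[domain]["NXDOMAIN"])
```

In practice you would feed it a file object or stdin instead of the sample list.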
How complexity of the DNS, DNSSEC and IPv6 + recently enabled DDoS protection made recursive DNS broken in the default setup