30 Essential Commands in Linux: Quick Guide for SREs

1. System Observability & Performance

| Command | Purpose | Real-World SRE Example |
| --- | --- | --- |
| top / htop | Process monitoring. | htop (visually identify which CPU core is pinned at 100%). |
| uptime | System load average. | uptime (check whether the load average exceeds the number of CPU cores). |
| vmstat | Memory & CPU stats. | vmstat 1 5 (five reports at one-second intervals to spot heavy context switching or swapping). |
| iostat | Disk I/O utilization. | iostat -xz 1 (extended per-device statistics every second, skipping idle devices, to watch disk latency in real time). |
| free -h | Available RAM & swap. | free -h (quick check before deploying a high-memory container). |
| sar | Historical performance. | sar -u -f /var/log/sa/sa10 (analyze CPU usage from the 10th of the month; the log directory varies by distribution). |
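
The load-versus-cores check from the uptime row can be scripted. What follows is a minimal sketch, assuming a Linux host that exposes /proc/loadavg and has the nproc utility from GNU coreutils; the function name load_exceeds_cores is hypothetical and used only for illustration.

```shell
#!/bin/sh
# load_exceeds_cores LOAD CORES (hypothetical helper)
# Succeeds (exit status 0) only when LOAD is numerically greater than CORES.
load_exceeds_cores() {
    awk -v load="$1" -v cores="$2" 'BEGIN { exit !(load > cores) }'
}

# Assumption: Linux with /proc/loadavg and nproc available.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    load=$(cut -d' ' -f1 /proc/loadavg)   # 1-minute load average
    cores=$(nproc)                        # CPUs available to this process
    if load_exceeds_cores "$load" "$cores"; then
        echo "WARN: 1-minute load $load exceeds $cores cores"
    else
        echo "OK: 1-minute load $load is within $cores cores"
    fi
fi
```

awk does the comparison because POSIX shell arithmetic handles integers only, while load averages are fractional.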

2. Networking & Connectivity

| Command | Purpose | Real-World SRE Example |
| --- | --- | --- |
| ip addr | IP configuration. | ip addr show eth0 (validate the IP assigned to a specific interface). |
| ss -tulpn | Open ports & sockets. | ss -tulpn (list listening TCP and UDP sockets with numeric ports and owning processes). |
| dig | DNS lookups. | dig +short blog.uptodeploy.com (print only the answer, typically the A record IPs for the domain). |
| curl -Iv | HTTP(S) debugging. | curl -Iv https://google.com (send a HEAD request and inspect the TLS handshake and response headers). |
| traceroute | Network path routing. | traceroute -I 8.8.8.8 (probe with ICMP echo to find the hop where packets are dropped). |
| tcpdump | Packet capturing. | tcpdump -i eth0 port 443 (capture HTTPS traffic on eth0 for deep debugging; payloads stay encrypted). |
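
The output of ss -tulpn is often easier to audit once reduced to protocol/port pairs. The sketch below assumes the default iproute2 column layout, where the fifth column is Local Address:Port and the first line is a header; listening_ports is a hypothetical helper name.

```shell
#!/bin/sh
# listening_ports (hypothetical helper)
# Reads `ss -tulpn` output on stdin and prints unique "proto/port" pairs.
# Assumption: column 1 is the protocol, column 5 is Local Address:Port,
# and line 1 is the header row.
listening_ports() {
    awk 'NR > 1 {
        n = split($5, parts, ":")   # the port is the last ":"-separated piece
        print $1 "/" parts[n]
    }' | sort -u
}

# Usage (needs iproute2; run with sudo to see processes owned by other users):
#   ss -tulpn | listening_ports
```

Splitting on the last colon keeps the sketch working for IPv6 listeners such as [::]:80 as well as IPv4 ones.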

3. File System & Logs

| Command | Purpose | Real-World SRE Example |
| --- | --- | --- |
| df -h | Disk space usage. | df -h / (check whether the root filesystem is at 100% capacity). |
| du -sh * | Directory size. | du -sh /var/log/* (find which log file or directory is consuming the most space). |
| lsof | List open files. | lsof -i :22 (list processes with sockets on port 22, such as active SSH sessions). |
| tail -f | Real-time log follow. | tail -f /var/log/syslog (monitor system events as they happen). |
| grep -r | Pattern searching. | grep -r "error" /var/log/nginx/ (search for errors across all Nginx logs). |
| find | Locate files. | find /etc -name "*.conf" (locate all configuration files in /etc). |
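
The disk-capacity check in the df row can be turned into a simple alert filter. This is a minimal sketch that assumes the POSIX output format of df -P (capacity in the fifth column, mount point in the sixth) and mount points without spaces; over_threshold is a hypothetical helper name.

```shell
#!/bin/sh
# over_threshold LIMIT (hypothetical helper)
# Reads `df -P` output on stdin and prints "mountpoint use%" for every
# filesystem whose usage is at or above LIMIT percent.
# Assumption: POSIX df -P layout, and mount points that contain no spaces.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                 # "95%" -> "95"
        if (use + 0 >= limit + 0) print $6 " " use "%"
    }'
}

# Usage:
#   df -P | over_threshold 90
```

Using df -P rather than df -h keeps each filesystem on a single line, which makes the column positions predictable.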

4. Process Management & Security

| Command | Purpose | Real-World SRE Example |
| --- | --- | --- |
| ps aux | List active processes. | ps aux --sort=-%mem (list processes sorted by highest RAM consumption). |
| kill -9 | Force termination. | kill -9 1234 (send SIGKILL to a hung process with PID 1234; a zombie is already dead and disappears only when its parent reaps it). |
| systemctl | Manage services. | systemctl restart docker (restart the Docker daemon). |
| journalctl -xe | Systemd logs. | journalctl -u nginx.service -f (follow logs for a specific service). |
| sudo | Superuser privileges. | sudo visudo (safely edit the sudoers file to manage permissions). |
| chmod / chown | Permissions & ownership. | chown -R www-data:www-data /var/www/html (fix web server file ownership). |
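
Because kill -9 gives a process no chance to clean up, a common pattern is to send SIGTERM first and escalate only if the process is still alive. The sketch below illustrates that pattern; stop_gracefully and its default timeout of 10 seconds are hypothetical choices, not something the commands above prescribe.

```shell
#!/bin/sh
# stop_gracefully PID [TIMEOUT_SECONDS] (hypothetical helper)
# Sends SIGTERM, polls once per second, and falls back to SIGKILL if the
# process is still present after the timeout (default: 10 seconds).
stop_gracefully() {
    sg_target=$1
    sg_timeout=${2:-10}
    sg_waited=0

    # SIGTERM lets the process flush buffers and release locks.
    if ! kill -TERM "$sg_target" 2>/dev/null; then
        echo "cannot signal PID $sg_target (gone or not permitted)" >&2
        return 1
    fi

    while [ "$sg_waited" -lt "$sg_timeout" ]; do
        # kill -0 sends no signal; it only tests whether the PID can be signaled.
        kill -0 "$sg_target" 2>/dev/null || return 0
        sleep 1
        sg_waited=$((sg_waited + 1))
    done

    # Last resort: SIGKILL cannot be caught or ignored.
    kill -KILL "$sg_target" 2>/dev/null
}

# Usage:
#   stop_gracefully 1234 15
```

The prefixed variable names stand in for local variables, which POSIX sh does not provide.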

5. SRE Power Tools

| Command | Purpose | Real-World SRE Example |
| --- | --- | --- |
| strace | Trace system calls. | strace -p 1234 (attach to a running process to debug why it is stuck or failing). |
| dmesg -T | Kernel ring buffer. | dmesg -T (print kernel messages with human-readable timestamps). |
| awk | Text processing. | awk '{print $1}' access.log (extract the first field, the client IP in common access log formats). |
| rsync | Efficient file sync. | rsync -avz ./data/ remote:/backup/ (archive-mode sync with compression; only changed data is transferred). |
| openssl | SSL/TLS management. | openssl x509 -in cert.crt -text -noout (check certificate expiration and details). |
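
The awk example extends naturally to ranking clients by request count. This is a minimal sketch, assuming an access log whose first whitespace-separated field is the client IP; top_clients is a hypothetical helper name and the default of 10 results is arbitrary.

```shell
#!/bin/sh
# top_clients LOGFILE [N] (hypothetical helper)
# Prints the N most frequent values of the first field as "count ip",
# highest count first (default N: 10).
# Assumption: the first field of each log line is the client IP.
top_clients() {
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$1" |
        sort -rn |
        head -n "${2:-10}"
}

# Usage:
#   top_clients access.log 5
```

Counting inside awk avoids the padded output of uniq -c, so each line is a plain "count ip" pair that is easy to pipe into other tools.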





