<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[UpToDeploy | SRE, Cloud Architecture & Security]]></title><description><![CDATA[UpToDeploy | Jose Alvarez: Cloud Specialist & SRE Blog. Technical articles on Cloud Architecture, Security, Containers, and AI. Documenting real-world projects and infrastructure labs.]]></description><link>https://blog.uptodeploy.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1769747265148/d7e42b95-cb0d-4550-95f6-08dea3bcaa28.png</url><title>UpToDeploy | SRE, Cloud Architecture &amp; Security</title><link>https://blog.uptodeploy.com</link></image><generator>RSS for Node</generator><lastBuildDate>Sat, 11 Apr 2026 13:28:59 GMT</lastBuildDate><atom:link href="https://blog.uptodeploy.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[30 Essential Docker Commands: A Practical Guide for SREs]]></title><description><![CDATA[1. Container Management (Lifecycle)




CommandPurposeExample



docker psList running containers.docker ps -a (to see all containers, including those that exited with errors).

docker runCreate and start a container.docker run -d --name web nginx (r...]]></description><link>https://blog.uptodeploy.com/docker-commands-practical</link><guid isPermaLink="true">https://blog.uptodeploy.com/docker-commands-practical</guid><category><![CDATA[Docker]]></category><category><![CDATA[Linux]]></category><category><![CDATA[SRE]]></category><category><![CDATA[Devops]]></category><dc:creator><![CDATA[Jose Alvarez R.]]></dc:creator><pubDate>Wed, 11 Feb 2026 06:00:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/HSACbYjZsqQ/upload/3f15b18a86a7608e2acf68931828490b.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-1-container-management-lifecycle">1. Container Management (Lifecycle)</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Command</strong></td><td><strong>Purpose</strong></td><td><strong>Example</strong></td></tr>
</thead>
<tbody>
<tr>
<td><code>docker ps</code></td><td>List running containers.</td><td><code>docker ps -a</code> (to see all containers, including those that exited with errors).</td></tr>
<tr>
<td><code>docker run</code></td><td>Create and start a container.</td><td><code>docker run -d --name web nginx</code> (run a web server in the background/detached).</td></tr>
<tr>
<td><code>docker stop</code></td><td>Stop a running container.</td><td><code>docker stop $(docker ps -q)</code> (stop all running containers at once).</td></tr>
<tr>
<td><code>docker rm</code></td><td>Remove a container.</td><td><code>docker rm -f web</code> (force removal of a running container).</td></tr>
<tr>
<td><code>docker exec</code></td><td>Run a command in a container.</td><td><code>docker exec -it web bash</code> (get an interactive shell inside a container).</td></tr>
<tr>
<td><code>docker logs</code></td><td>Fetch container logs.</td><td><code>docker logs -f --tail 100 web</code> (follow the last 100 log lines in real-time).</td></tr>
</tbody>
</table>
</div><h3 id="heading-2-image-management">2. Image Management</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Command</strong></td><td><strong>Purpose</strong></td><td><strong>Example</strong></td></tr>
</thead>
<tbody>
<tr>
<td><code>docker images</code></td><td>List available images.</td><td><code>docker images -q</code> (get only the IDs of all local images).</td></tr>
<tr>
<td><code>docker pull</code></td><td>Download an image.</td><td><code>docker pull redis:latest</code> (ensure you have the latest version before deployment).</td></tr>
<tr>
<td><code>docker build</code></td><td>Build an image from a Dockerfile.</td><td><code>docker build -t my-app:v1 .</code> (tag a new version of your application).</td></tr>
<tr>
<td><code>docker rmi</code></td><td>Remove an image.</td><td><code>docker rmi $(docker images -f "dangling=true" -q)</code> (clean up dangling, i.e. untagged, images).</td></tr>
<tr>
<td><code>docker tag</code></td><td>Create a tag for an image.</td><td><code>docker tag my-app:v1 myrepo/my-app:v1</code> (prepare an image for a registry).</td></tr>
<tr>
<td><code>docker push</code></td><td>Upload an image to a registry.</td><td><code>docker push myrepo/my-app:v1</code> (publish the image to Docker Hub, ACR, or ECR).</td></tr>
</tbody>
</table>
</div><h3 id="heading-3-observability-amp-troubleshooting">3. Observability &amp; Troubleshooting</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Command</strong></td><td><strong>Purpose</strong></td><td><strong>Example</strong></td></tr>
</thead>
<tbody>
<tr>
<td><code>docker stats</code></td><td>Live resource usage.</td><td><code>docker stats --no-stream</code> (get a snapshot of CPU/RAM usage per container).</td></tr>
<tr>
<td><code>docker inspect</code></td><td>Low-level info on objects.</td><td><code>docker inspect web</code> (dump the container's full configuration and state as JSON).</td></tr>
<tr>
<td><code>docker top</code></td><td>Display running processes.</td><td><code>docker top web</code> (see the container's processes as they appear from the host).</td></tr>
<tr>
<td><code>docker port</code></td><td>List port mappings.</td><td><code>docker port web</code> (verify which host port is mapped to the container's port).</td></tr>
<tr>
<td><code>docker events</code></td><td>Real-time events from server.</td><td><code>docker events --since 1h</code> (audit what happened in the Docker engine recently).</td></tr>
<tr>
<td><code>docker diff</code></td><td>Changes in the FS.</td><td><code>docker diff web</code> (see what files were modified/added in the container layer).</td></tr>
</tbody>
</table>
</div><h3 id="heading-4-network-amp-volumes">4. Network &amp; Volumes</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Command</strong></td><td><strong>Purpose</strong></td><td><strong>Example</strong></td></tr>
</thead>
<tbody>
<tr>
<td><code>docker network ls</code></td><td>List networks.</td><td><code>docker network ls</code> (check available drivers like bridge, host, or overlay).</td></tr>
<tr>
<td><code>docker network inspect</code></td><td>Detailed network info.</td><td><code>docker network inspect bridge</code> (see which containers are attached to a network).</td></tr>
<tr>
<td><code>docker volume ls</code></td><td>List volumes.</td><td><code>docker volume ls</code> (identify persistent data volumes).</td></tr>
<tr>
<td><code>docker volume prune</code></td><td>Remove unused volumes.</td><td><code>docker volume prune</code> (recover disk space from orphaned volumes).</td></tr>
<tr>
<td><code>docker cp</code></td><td>Copy files to/from container.</td><td><code>docker cp web:/etc/nginx/nginx.conf .</code> (extract a config file for local review).</td></tr>
<tr>
<td><code>docker network connect</code></td><td>Connect container to network.</td><td><code>docker network connect my-net web</code> (dynamically attach a container to a new network).</td></tr>
</tbody>
</table>
</div><h3 id="heading-5-advanced-amp-cleanup">5. Advanced &amp; Cleanup</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Command</strong></td><td><strong>Purpose</strong></td><td><strong>Example</strong></td></tr>
</thead>
<tbody>
<tr>
<td><code>docker system prune</code></td><td>Total cleanup.</td><td><code>docker system prune -a --volumes</code> (wipe everything unused to free up disk).</td></tr>
<tr>
<td><code>docker system df</code></td><td>Show docker disk usage.</td><td><code>docker system df</code> (diagnose why the Docker partition is full).</td></tr>
<tr>
<td><code>docker-compose up</code></td><td>Orchestrate multiple containers.</td><td><code>docker-compose up -d</code> (deploy an entire stack defined in a YAML).</td></tr>
<tr>
<td><code>docker-compose logs</code></td><td>View logs from a stack.</td><td><code>docker-compose logs -f app</code> (follow logs for a specific service in the stack).</td></tr>
<tr>
<td><code>docker commit</code></td><td>Create image from container.</td><td><code>docker commit web my-emergency-image</code> (save a container's state for forensics).</td></tr>
<tr>
<td><code>docker wait</code></td><td>Wait for a container to stop.</td><td><code>docker wait web</code> (useful in scripts to take action after a container finishes).</td></tr>
</tbody>
</table>
</div>]]></content:encoded></item><item><title><![CDATA[30 Essential Commands in Linux: Quick Guide for SREs]]></title><description><![CDATA[1. System Observability & Performance




CommandPurposeReal-World SRE Example



top / htopProcess monitoring.htop (to visually identify which CPU core is pinned at 100%).

uptimeSystem load average.uptime (to check if the load average exceeds the n...]]></description><link>https://blog.uptodeploy.com/quick-linux-commands</link><guid isPermaLink="true">https://blog.uptodeploy.com/quick-linux-commands</guid><category><![CDATA[Linux]]></category><category><![CDATA[linux for beginners]]></category><category><![CDATA[linux-basics]]></category><category><![CDATA[SRE]]></category><dc:creator><![CDATA[Jose Alvarez R.]]></dc:creator><pubDate>Sun, 08 Feb 2026 02:34:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/FXFz-sW0uwo/upload/8e77cbc16954e3644058eb4666b83d96.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-1-system-observability-amp-performance">1. System Observability &amp; Performance</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Command</strong></td><td><strong>Purpose</strong></td><td><strong>Real-World SRE Example</strong></td></tr>
</thead>
<tbody>
<tr>
<td><code>top</code> / <code>htop</code></td><td>Process monitoring.</td><td><code>htop</code> (to visually identify which CPU core is pinned at 100%).</td></tr>
<tr>
<td><code>uptime</code></td><td>System load average.</td><td><code>uptime</code> (to check if the load average exceeds the number of CPU cores).</td></tr>
<tr>
<td><code>vmstat</code></td><td>Memory &amp; CPU stats.</td><td><code>vmstat 1 5</code> (report every second, five times, to detect heavy context switching or swapping).</td></tr>
<tr>
<td><code>iostat</code></td><td>Disk I/O utilization.</td><td><code>iostat -xz 1</code> (monitor extended disk latency in real-time).</td></tr>
<tr>
<td><code>free -h</code></td><td>Available RAM &amp; Swap.</td><td><code>free -h</code> (quick check before deploying a high-memory container).</td></tr>
<tr>
<td><code>sar</code></td><td>Historical performance.</td><td><code>sar -u -f /var/log/sa/sa10</code> (analyze CPU usage from the 10th of the month).</td></tr>
</tbody>
</table>
</div><h3 id="heading-2-networking-amp-connectivity">2. Networking &amp; Connectivity</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Command</strong></td><td><strong>Purpose</strong></td><td><strong>Real-World SRE Example</strong></td></tr>
</thead>
<tbody>
<tr>
<td><code>ip addr</code></td><td>IP configuration.</td><td><code>ip addr show eth0</code> (validate the IP assigned to a specific interface).</td></tr>
<tr>
<td><code>ss -tulpn</code></td><td>Open ports &amp; sockets.</td><td><code>ss -tulpn</code> (list listening TCP and UDP sockets with the process that owns each one).</td></tr>
<tr>
<td><code>dig</code></td><td>DNS lookups.</td><td><code>dig +short</code> <a target="_blank" href="http://blog.uptodeploy.com"><code>blog.uptodeploy.com</code></a> (get only the A record IP for a domain).</td></tr>
<tr>
<td><code>curl -Iv</code></td><td>HTTP(S) debugging.</td><td><code>curl -Iv</code> <a target="_blank" href="https://google.com"><code>https://google.com</code></a> (inspect TLS handshake and headers).</td></tr>
<tr>
<td><code>traceroute</code></td><td>Network path routing.</td><td><code>traceroute -I 8.8.8.8</code> (use ICMP to find where the packet drop occurs).</td></tr>
<tr>
<td><code>tcpdump</code></td><td>Packet capturing.</td><td><code>tcpdump -i eth0 port 443</code> (sniffing HTTPS traffic for deep debugging).</td></tr>
</tbody>
</table>
</div><h3 id="heading-3-file-system-amp-logs">3. File System &amp; Logs</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Command</strong></td><td><strong>Purpose</strong></td><td><strong>Real-World SRE Example</strong></td></tr>
</thead>
<tbody>
<tr>
<td><code>df -h</code></td><td>Disk space usage.</td><td><code>df -h /</code> (check if the root partition is at 100% capacity).</td></tr>
<tr>
<td><code>du -sh *</code></td><td>Directory size.</td><td><code>du -sh /var/log/*</code> (find which log file is consuming the most space).</td></tr>
<tr>
<td><code>lsof</code></td><td>List open files.</td><td><code>lsof -i :22</code> (see which processes hold connections on the SSH port, i.e. active SSH sessions).</td></tr>
<tr>
<td><code>tail -f</code></td><td>Real-time log follow.</td><td><code>tail -f /var/log/syslog</code> (monitor system events as they happen).</td></tr>
<tr>
<td><code>grep -r</code></td><td>Pattern searching.</td><td><code>grep -r "error" /var/log/nginx/</code> (search for errors across all Nginx logs).</td></tr>
<tr>
<td><code>find</code></td><td>Locate files.</td><td><code>find /etc -name "*.conf"</code> (locate all configuration files in /etc).</td></tr>
</tbody>
</table>
</div><h3 id="heading-4-process-management-amp-security">4. Process Management &amp; Security</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Command</strong></td><td><strong>Purpose</strong></td><td><strong>Real-World SRE Example</strong></td></tr>
</thead>
<tbody>
<tr>
<td><code>ps aux</code></td><td>List active processes.</td><td><code>ps aux --sort=-%mem</code> (list processes by highest RAM consumption).</td></tr>
<tr>
<td><code>kill -9</code></td><td>Force termination.</td><td><code>kill -9 1234</code> (force-kill a hung process with PID 1234; a zombie is already dead and only disappears once its parent reaps it).</td></tr>
<tr>
<td><code>systemctl</code></td><td>Manage services.</td><td><code>systemctl restart docker</code> (restart the Docker daemon).</td></tr>
<tr>
<td><code>journalctl -xe</code></td><td>Systemd logs.</td><td><code>journalctl -u nginx.service -f</code> (follow logs for a specific service).</td></tr>
<tr>
<td><code>sudo</code></td><td>Superuser privileges.</td><td><code>sudo visudo</code> (safely edit the sudoers file to manage permissions).</td></tr>
<tr>
<td><code>chmod</code> / <code>chown</code></td><td>Permissions &amp; Ownership.</td><td><code>chown -R www-data:www-data /var/www/html</code> (fix web server permissions).</td></tr>
</tbody>
</table>
</div><h3 id="heading-5-sre-power-tools">5. SRE Power Tools</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Command</strong></td><td><strong>Purpose</strong></td><td><strong>Real-World SRE Example</strong></td></tr>
</thead>
<tbody>
<tr>
<td><code>strace</code></td><td>Trace system calls.</td><td><code>strace -p 1234</code> (debug why a process is stuck or failing).</td></tr>
<tr>
<td><code>dmesg -T</code></td><td>Kernel ring buffer.</td><td><code>dmesg -T</code> (read kernel messages with human-readable timestamps, e.g. to spot OOM-killer events).</td></tr>
<tr>
<td><code>awk</code></td><td>Text processing.</td><td><code>awk '{print $1}' access.log</code> (extract only the IPs from an access log).</td></tr>
<tr>
<td><code>rsync</code></td><td>Efficient file sync.</td><td><code>rsync -avz ./data/ remote:/backup/</code> (sync data with compression, transferring only the differences).</td></tr>
<tr>
<td><code>openssl</code></td><td>SSL/TLS management.</td><td><code>openssl x509 -in cert.crt -text -noout</code> (check certificate expiration/details).</td></tr>
</tbody>
</table>
</div>]]></content:encoded></item><item><title><![CDATA[SRE Guide: Eliminating "Toil" – The Art of Scaling Without Burning Out]]></title><description><![CDATA[Automate to scale, not just to survive
In the world of Site Reliability Engineering (SRE), not all automation is created equal. There is a silent enemy that consumes engineers’ time, stalls innovation, and drains operational budgets: Toil.
If your da...]]></description><link>https://blog.uptodeploy.com/sre-guide-eliminating-toil</link><guid isPermaLink="true">https://blog.uptodeploy.com/sre-guide-eliminating-toil</guid><category><![CDATA[SRE]]></category><category><![CDATA[automation]]></category><category><![CDATA[Toil Reduction]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Cloud Computing]]></category><dc:creator><![CDATA[Jose Alvarez R.]]></dc:creator><pubDate>Tue, 03 Feb 2026 06:00:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/FHnnjk1Yj7Y/upload/ce2357be328266337a8cc55d343d971c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-automate-to-scale-not-just-to-survive">Automate to scale, not just to survive</h3>
<p>In the world of <strong>Site Reliability Engineering (SRE)</strong>, not all automation is created equal. There is a silent enemy that consumes engineers’ time, stalls innovation, and drains operational budgets: <strong>Toil</strong>.</p>
<p>If your daily routine revolves around manually putting out fires, you aren't doing SRE; you are doing traditional operations with a modern title.</p>
<hr />
<h2 id="heading-1-what-is-toil-really-and-what-it-isnt">1. What is Toil, really? (And what it isn't)</h2>
<p>We often mistake "boring work" for Toil. However, for a task to be technically classified as Toil, it must meet all four of the following criteria:</p>
<ul>
<li><p><strong>Manual:</strong> It is performed by a human (e.g., connecting to a server over SSH to restart a pod or clear logs).</p>
</li>
<li><p><strong>Repetitive:</strong> You do it over and over again, week after week.</p>
</li>
<li><p><strong>Automatable:</strong> If a Bash script or a Python workflow could handle it, it’s Toil.</p>
</li>
<li><p><strong>No Enduring Value:</strong> Once you are done, the system hasn't structurally improved. The state simply returned to "point zero."</p>
</li>
</ul>
<blockquote>
<p><strong>Note:</strong> If you are designing a new architecture in Azure or hardening a Linux server, that is <strong>not</strong> Toil; that is engineering. You are leaving the system better than you found it.</p>
</blockquote>
<hr />
<h2 id="heading-2-the-50-rule">2. The 50% Rule</h2>
<p>In big organizations, we follow a strict mandate: <strong>An SRE should spend no more than 50% of their time on Toil.</strong></p>
<ul>
<li><p><strong>The other 50%:</strong> Must be dedicated exclusively to engineering projects. This includes developing new tools, optimizing Infrastructure as Code (IaC), or implementing advanced security policies.</p>
</li>
<li><p><strong>Why it's vital:</strong> If Toil grows at the same rate as your systems, you will eventually need an army of operators just to "keep the lights on." Toil doesn't scale; engineering does.</p>
</li>
</ul>
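<p>As a rough illustration of the rule, the sketch below computes the share of logged hours that counts as Toil and checks it against the 50% ceiling. The function name, the task categories, and the sample hours are all hypothetical; in practice the numbers would come from your own ticketing or time-tracking data.</p>

```python
def toil_ratio(hours_by_category, toil_categories):
    """Return the fraction of total logged hours spent on toil."""
    total = sum(hours_by_category.values())
    if total == 0:
        return 0.0
    toil = sum(h for cat, h in hours_by_category.items() if cat in toil_categories)
    return toil / total


# Hypothetical week for one engineer (hours per category).
week = {"manual_restarts": 10, "ticket_triage": 8, "iac_refactor": 14, "tooling": 8}
ratio = toil_ratio(week, toil_categories={"manual_restarts", "ticket_triage"})
over_budget = ratio > 0.5  # the 50% rule from this section
print(f"Toil: {ratio:.0%}, over budget: {over_budget}")
```

<p>Tracking this ratio per engineer over several weeks makes it easier to see whether Toil is growing along with the systems.</p>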
<hr />
<h2 id="heading-3-strategies-to-eliminate-toil">3. Strategies to Eliminate Toil</h2>
<p>To tackle this situation, we need clear solutions. Here is how we implement them in real-world scenarios:</p>
<ul>
<li><p><strong>Self-Healing Systems:</strong> Instead of relying on manual intervention during a failure, we design the system to run its own health checks and repair itself (e.g., restarting services or replacing unhealthy instances) without human input.</p>
</li>
<li><p><strong>Declarative Infrastructure:</strong> We eliminate the human error that manual configuration introduces. Every change to networks, firewalls, or servers must be defined in code, ensuring auditability, repeatability, and consistent deployments.</p>
</li>
<li><p><strong>Event-Driven Operations:</strong> We set up automated triggers. If the system predicts that a resource will hit its limit within a given timeframe, an automated routine should expand capacity before it turns into an incident.</p>
</li>
<li><p><strong>GitOps &amp; Continuous Delivery:</strong> The "truth" for the infrastructure resides in a controlled repository. Any drift between the live environment and the code is automatically reconciled, eliminating manual configuration drift.</p>
</li>
</ul>
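<p>The event-driven idea above can be sketched as a small forecast: estimate how long until a resource reaches its limit, and trigger expansion when that falls inside a chosen horizon. This is a minimal sketch under stated assumptions, not a production forecaster: it fits a straight line through the first and last sample, and the function names, the sample data, and the 24-hour horizon are hypothetical.</p>

```python
def hours_until_full(samples, capacity):
    """Estimate hours until `capacity` is reached from (hour, used) samples.

    Uses a straight line through the first and last sample; returns None
    when usage is flat or shrinking, or when there is only one sample.
    """
    (t0, u0), (t1, u1) = samples[0], samples[-1]
    if t1 == t0:
        return None
    rate = (u1 - u0) / (t1 - t0)  # units consumed per hour
    if rate <= 0:
        return None
    return (capacity - u1) / rate


def should_expand(samples, capacity, horizon_hours):
    """True when the forecast says the limit is hit within the horizon."""
    eta = hours_until_full(samples, capacity)
    return eta is not None and eta <= horizon_hours


# Hypothetical disk usage in GB, sampled at hours 0, 6 and 12.
disk = [(0, 70.0), (6, 76.0), (12, 82.0)]
eta = hours_until_full(disk, capacity=100.0)
print(eta, should_expand(disk, capacity=100.0, horizon_hours=24))
```

<p>In a real setup, the <code>should_expand</code> result would feed whatever automation resizes the resource, rather than paging a human.</p>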
<hr />
<h2 id="heading-4-the-role-of-ai-against-toil">4. The Role of AI Against Toil</h2>
<p>AI is a powerful weapon against Toil because, unlike a static Bash script, it is <strong>adaptive</strong>. As SREs, we must leverage this:</p>
<ol>
<li><p><strong>Smart Alert Classification:</strong> AI can filter out monitoring "noise," discarding false positives and automatically handling low-impact alerts that previously woke someone up at 3 AM.</p>
</li>
<li><p><strong>Code &amp; Config Generation:</strong> Using AI to generate Ansible playbooks or Azure Policies drastically reduces manual writing time, letting you focus on the logic and security of the design.</p>
</li>
<li><p><strong>Anomaly Analysis:</strong> Moving from threshold-based alerts (e.g., CPU &gt; 80%) to behavior-based alerts (e.g., "this traffic pattern is unusual for a Monday morning").</p>
</li>
</ol>
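<p>To make the difference between threshold-based and behavior-based alerts concrete, here is a minimal sketch that flags a reading only when it deviates strongly from what is normal for that time slot. It uses a simple z-score as a stand-in for a real anomaly model; the function name, the limit of 3 standard deviations, and the sample traffic figures are assumptions for illustration.</p>

```python
from statistics import mean, stdev


def is_anomalous(value, baseline, z_limit=3.0):
    """Flag `value` when it sits more than `z_limit` standard deviations
    from the mean of `baseline` (past readings for the same time slot)."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # needs at least two baseline readings
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_limit


# Hypothetical requests/second seen on previous Monday mornings.
monday_baseline = [120, 130, 125, 118, 127, 122]
print(is_anomalous(124, monday_baseline))  # within the usual range
print(is_anomalous(300, monday_baseline))  # far outside the usual range
```

<p>Because the baseline is specific to the time slot, the same value can be normal on a Monday morning and anomalous on a Sunday night, which a fixed threshold cannot express.</p>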
<hr />
<h3 id="heading-conclusion-automate-yourself-out-of-your-current-job">Conclusion: "Automate yourself out of your current job"</h3>
<p>The goal of an SRE is to automate themselves out of their daily operational tasks. This doesn't mean losing your job; it means freeing up your time to focus on higher-value engineering work.</p>
]]></content:encoded></item><item><title><![CDATA[Essential Protocols: Quick Guide for SREs & SysAdmins]]></title><description><![CDATA[In infrastructure, you don't guess—you verify. Whether you're debugging a Kubernetes CNI or a cloud-native database, knowing the underlying protocol is the key to faster troubleshooting.
Here is a cheat sheet of the 25 most important protocols and th...]]></description><link>https://blog.uptodeploy.com/protocols-for-sres-sysadmins</link><guid isPermaLink="true">https://blog.uptodeploy.com/protocols-for-sres-sysadmins</guid><category><![CDATA[SRE]]></category><category><![CDATA[Devops]]></category><category><![CDATA[internet]]></category><category><![CDATA[protocols]]></category><category><![CDATA[troubleshooting]]></category><dc:creator><![CDATA[Jose Alvarez R.]]></dc:creator><pubDate>Mon, 02 Feb 2026 05:53:23 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/tiSE_paTt0A/upload/1e0730ace06058b5610c6b0781f08130.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In infrastructure, you don't guess—you verify. Whether you're debugging a Kubernetes CNI or a cloud-native database, knowing the underlying protocol is the key to faster troubleshooting.</p>
<p>Here is a cheat sheet of the 25 most important protocols and the commands to test them directly from your terminal.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Protocol</strong></td><td><strong>Role in Infrastructure</strong></td><td><strong>Command to Test</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>HTTPS</strong></td><td>Secure web traffic &amp; API entry points.</td><td><code>curl -I</code> <a target="_blank" href="https://google.com"><code>https://google.com</code></a></td></tr>
<tr>
<td><strong>SSH</strong></td><td>Secure remote server management.</td><td><code>ssh -v user@host</code></td></tr>
<tr>
<td><strong>DNS</strong></td><td>The backbone of service discovery.</td><td><code>dig +short</code> <a target="_blank" href="http://google.com"><code>google.com</code></a></td></tr>
<tr>
<td><strong>TCP</strong></td><td>Connection-oriented, reliable delivery.</td><td><code>nc -zv host 443</code></td></tr>
<tr>
<td><strong>UDP</strong></td><td>Fast delivery for VoIP, VPN, &amp; DNS.</td><td><code>nc -zuv host 1194</code></td></tr>
<tr>
<td><strong>ICMP</strong></td><td>Network reachability &amp; latency testing.</td><td><code>ping -c 4 8.8.8.8</code></td></tr>
<tr>
<td><strong>BGP</strong></td><td>Internet-scale routing (AS path).</td><td><code>mtr --aslookup 8.8.8.8</code></td></tr>
<tr>
<td><strong>NTP</strong></td><td>Log &amp; certificate synchronization.</td><td><code>chronyc sources</code></td></tr>
<tr>
<td><strong>TLS/SSL</strong></td><td>Encryption layer for all secure traffic.</td><td><code>openssl s_client -connect host:443</code></td></tr>
<tr>
<td><strong>DHCP</strong></td><td>Dynamic IP allocation &amp; management.</td><td><code>ip addr show</code></td></tr>
<tr>
<td><strong>ARP</strong></td><td>IP to MAC resolution (Layer 2).</td><td><code>ip neighbor</code></td></tr>
<tr>
<td><strong>NFS</strong></td><td>Network file sharing (Linux/Unix).</td><td><code>showmount -e host</code></td></tr>
<tr>
<td><strong>SMB</strong></td><td>Network file sharing (Windows/Azure Files).</td><td><code>smbclient -L //host</code></td></tr>
<tr>
<td><strong>SFTP</strong></td><td>Secure file transfer over SSH.</td><td><code>sftp user@host</code></td></tr>
<tr>
<td><strong>SMTP</strong></td><td>Outbound mail &amp; alert notifications.</td><td><code>swaks --to</code> <a target="_blank" href="mailto:user@domain.com"><code>user@domain.com</code></a></td></tr>
<tr>
<td><strong>SNMP</strong></td><td>Monitoring hardware &amp; network devices.</td><td><code>snmpwalk -v2c -c public host</code></td></tr>
<tr>
<td><strong>LDAP</strong></td><td>Centralized user authentication.</td><td><code>ldapsearch -x -H ldap://host</code></td></tr>
<tr>
<td><strong>RDP</strong></td><td>Remote desktop access (Windows/VDI).</td><td><code>telnet host 3389</code></td></tr>
<tr>
<td><strong>iSCSI</strong></td><td>Storage over IP (SAN connectivity).</td><td><code>iscsiadm -m discovery -t sendtargets -p host</code></td></tr>
<tr>
<td><strong>gRPC</strong></td><td>High-performance microservices.</td><td><code>grpcurl host:port list</code></td></tr>
<tr>
<td><strong>Redis</strong></td><td>Fast caching &amp; session management.</td><td><code>redis-cli -h host ping</code></td></tr>
<tr>
<td><strong>SQL</strong></td><td>Database connectivity (MySQL/PostgreSQL).</td><td><code>mysql -h host -P 3306</code></td></tr>
<tr>
<td><strong>MQTT</strong></td><td>IoT &amp; event-driven messaging.</td><td><code>mosquitto_pub -h host -t test -m "ping"</code></td></tr>
<tr>
<td><strong>VXLAN</strong></td><td>Overlay networks (CNIs like Flannel/Calico).</td><td><code>ip -d link show type vxlan</code></td></tr>
<tr>
<td><strong>HTTP/2</strong></td><td>Modern web performance (Multiplexing).</td><td><code>curl -I --http2</code> <a target="_blank" href="https://google.com"><code>https://google.com</code></a></td></tr>
</tbody>
</table>
</div>]]></content:encoded></item><item><title><![CDATA[SRE Guide: Blame-Free Post-mortems – From Chaos to Systemic Resilience]]></title><description><![CDATA[The Incident Doesn't End at the "Fix"
In the daily life of an SRE, the first reaction to a downtime is the "Quick Fix": restarting a pod, scaling a node, or triggering a rollback. However, an incident isn’t truly closed when the service returns to no...]]></description><link>https://blog.uptodeploy.com/sre-guide-blame-free-post-mortems</link><guid isPermaLink="true">https://blog.uptodeploy.com/sre-guide-blame-free-post-mortems</guid><category><![CDATA[SRE devops]]></category><category><![CDATA[SRE]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Cloud Computing]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[computing]]></category><dc:creator><![CDATA[Jose Alvarez R.]]></dc:creator><pubDate>Wed, 28 Jan 2026 14:00:52 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/UgA3Xvi3SkA/upload/4e1a22c81468304d7fa09128b2b7ee86.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-the-incident-doesnt-end-at-the-fix">The Incident Doesn't End at the "Fix"</h3>
<p>In the daily life of an SRE, the first reaction to downtime is the "Quick Fix": restarting a pod, scaling a node, or triggering a rollback. However, an incident isn’t truly closed when the service returns to normal (<a target="_blank" href="https://blog.uptodeploy.com/sre-metrics-guide-measuring">T4</a>). In my experience, it only ends when the team fully understands the root cause and takes concrete steps to ensure it never happens again.</p>
<p>This is where the <strong>Post-mortem</strong> becomes our most powerful tool for building resilient infrastructures.</p>
<hr />
<h2 id="heading-1-the-blame-free-philosophy-why-its-non-negotiable">1. The "Blame-Free" Philosophy: Why It’s Non-Negotiable</h2>
<p>Human error is a symptom, not the cause. If an engineer accidentally executes a destructive command in production, the question shouldn't be "Who did it?" but rather "Why did the system allow a single command to compromise our availability?"</p>
<ul>
<li><p><strong>The Psychology of Reliability:</strong> If the team fears retaliation, they will hide mistakes. In SRE, a hidden error is a ticking time bomb.</p>
</li>
<li><p><strong>Systemic Focus:</strong> We look for flaws in design, architecture, or CI/CD processes, not individuals.</p>
</li>
<li><p><strong>Learning Culture:</strong> A Blame-Free Post-mortem encourages everyone to share their findings, preventing the rest of the team from making the same mistake.</p>
</li>
</ul>
<h2 id="heading-2-anatomy-of-a-high-level-post-mortem">2. Anatomy of a High-Level Post-mortem</h2>
<p>A technical document should be a clear roadmap. To make it effective for your workflow, ensure it includes:</p>
<h3 id="heading-a-executive-summary-amp-impact">A. Executive Summary &amp; Impact</h3>
<p>State what happened directly: "The payment API was down for 45 minutes, affecting 30% of transactions." It is vital to include which <a target="_blank" href="https://blog.uptodeploy.com/sre-its-not-just-automation"><strong>SLO/SLA</strong></a> metrics were compromised.</p>
<h3 id="heading-b-detailed-timeline-do-you-remember-ithttpsbloguptodeploycomsre-metrics-guide-measuring">B. Detailed Timeline (<a target="_blank" href="https://blog.uptodeploy.com/sre-metrics-guide-measuring">Do you remember it?</a>)</h3>
<p>This is the "log" of the crisis. It’s fundamental for understanding our <strong>MTTD</strong> (Detection) and <strong>MTTR</strong> (Recovery).</p>
<ul>
<li><p><strong>T0:</strong> Incident start (via metrics or logs).</p>
</li>
<li><p><strong>T1:</strong> Alert triggered.</p>
</li>
<li><p><strong>T2:</strong> Investigation begins.</p>
</li>
<li><p><strong>T3:</strong> Mitigation applied.</p>
</li>
<li><p><strong>T4:</strong> Service restored and stable.</p>
</li>
</ul>
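<p>Given those timestamps, the two metrics reduce to simple arithmetic. The sketch below assumes time to detect is measured from T0 to T1 and time to recover from T0 to T4 (some teams start the recovery clock at T1 instead); the function names and the two sample incidents are hypothetical.</p>

```python
from datetime import datetime
from statistics import mean


def incident_durations(t0, t1, t4):
    """Return (time_to_detect, time_to_recover) for one incident."""
    return t1 - t0, t4 - t0


def mean_minutes(durations):
    """Average a list of timedeltas and express the result in minutes."""
    return mean(d.total_seconds() for d in durations) / 60


# Hypothetical incidents: (T0 start, T1 alert, T4 restored).
incidents = [
    (datetime(2026, 1, 5, 10, 0), datetime(2026, 1, 5, 10, 4), datetime(2026, 1, 5, 10, 45)),
    (datetime(2026, 1, 19, 2, 30), datetime(2026, 1, 19, 2, 36), datetime(2026, 1, 19, 3, 5)),
]
detect, recover = zip(*(incident_durations(*i) for i in incidents))
mttd = mean_minutes(detect)
mttr = mean_minutes(recover)
print(f"MTTD: {mttd:.1f} min, MTTR: {mttr:.1f} min")
```

<p>Recording T0 through T4 consistently in every post-mortem is what makes these averages comparable from one incident to the next.</p>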
<h3 id="heading-c-root-cause-analysis-rca">C. Root Cause Analysis (RCA)</h3>
<p>This is where we dive into the "nuts and bolts": Was it a memory leak in a microservice? A database deadlock? A misconfigured firewall rule in the cloud?</p>
<hr />
<h2 id="heading-3-workshop-applying-the-5-whys">3. Workshop: Applying the "5 Whys"</h2>
<p>To get to the bottom of the issue, don't stop at the first logical answer. Look at this real-world example:</p>
<p><strong>Scenario:</strong> The authentication service failed.</p>
<ol>
<li><p><strong>Why did the service fail?</strong> Because the container entered a crash loop (<em>CrashLoopBackOff</em>).</p>
</li>
<li><p><strong>Why was it crashing?</strong> Because it couldn't connect to the Redis cluster.</p>
</li>
<li><p><strong>Why couldn't it connect to Redis?</strong> Because the credentials in the Kubernetes <em>Secret</em> were incorrect.</p>
</li>
<li><p><strong>Why were the credentials incorrect?</strong> Because they were rotated manually and not updated in the deployment.</p>
</li>
<li><p><strong>Why were they rotated manually?</strong> (<strong>Root Cause</strong>): We lack a secrets management system (like HashiCorp Vault or Azure Key Vault) to automate rotation and syncing.</p>
</li>
</ol>
<hr />
<h2 id="heading-4-powering-the-process-with-aiops">4. Powering the Process with AIOps</h2>
<p>AI doesn't replace our technical judgment, but it accelerates administrative tasks so we can focus on strategy:</p>
<ul>
<li><p><strong>Timeline Reconstruction:</strong> AI can analyze thousands of logs and messages across communication channels to build a timeline in seconds.</p>
</li>
<li><p><strong>Anomaly Detection:</strong> It identifies unusual traffic patterns that occurred before the incident which might have gone unnoticed.</p>
</li>
<li><p><strong>Intelligent Drafting:</strong> Generating a first draft based on raw data allows engineers to focus on adding high-value context and definitive fixes.</p>
</li>
</ul>
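<p>To make the timeline-reconstruction idea concrete, here is a minimal Python sketch that merges events from several sources into one chronological list. The source names, timestamps, and messages are invented for the example; a real AIOps tool would ingest far more data and add correlation on top of this simple sort.</p>
<pre><code class="lang-python">from datetime import datetime

# Hypothetical events from two sources: (ISO timestamp, source, message).
alert_events = [
    ("2026-01-10T10:02:00", "alerting", "High error rate on auth service"),
    ("2026-01-10T10:31:00", "alerting", "Error rate back to normal"),
]
chat_events = [
    ("2026-01-10T10:05:00", "chat", "On-call engineer acknowledges the page"),
    ("2026-01-10T10:20:00", "chat", "Rolling back the last deployment"),
]


def build_timeline(*sources):
    """Merge events from all sources and sort them chronologically."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda event: datetime.fromisoformat(event[0]))


timeline = build_timeline(alert_events, chat_events)
for timestamp, source, message in timeline:
    print(f"{timestamp} [{source}] {message}")
</code></pre>
<p>Parsing the timestamps before sorting, instead of comparing raw strings, keeps the ordering correct even when sources format their times slightly differently.</p>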
<h2 id="heading-5-the-action-plan-no-tasks-no-improvement">5. The "Action Plan": No Tasks, No Improvement</h2>
<p>The final output must be a list of tasks in your backlog. Each task must be:</p>
<ol>
<li><p><strong>Specific:</strong> Instead of "Improve monitoring," use "Configure a 99th-percentile (p99) latency alert for the <code>/auth</code> endpoint."</p>
</li>
<li><p><strong>Prioritized:</strong> Distinguish between immediate actions (preventing a recurrence tomorrow) and structural improvements.</p>
</li>
</ol>
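<p>For readers who have not worked with percentile alerts, the following Python sketch shows what a 99th-percentile check on the <code>/auth</code> endpoint could look like. It uses the nearest-rank method; the latency samples and the 500 ms threshold are assumptions made up for the illustration, and a monitoring system would normally compute this for you.</p>
<pre><code class="lang-python">import math


def percentile(values, pct):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical latency samples in milliseconds for the /auth endpoint.
latencies_ms = [120, 135, 140, 150, 160, 170, 180, 210, 250, 900]
THRESHOLD_MS = 500  # assumed alert threshold, not a recommendation

p99 = percentile(latencies_ms, 99)
should_alert = p99 > THRESHOLD_MS
print(f"p99 = {p99} ms, alert = {should_alert}")
</code></pre>
<p>Notice that the median of these samples looks healthy while the p99 does not. That gap is the reason the task names a percentile rather than an average.</p>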
<hr />
<h3 id="heading-conclusion">Conclusion</h3>
<p>Failure is an investment you’ve already paid for. You’ve already spent time, money, and "points" from your Error Budget. Don't waste that investment: document it, learn from it, and above all, automate the solution.</p>
]]></content:encoded></item><item><title><![CDATA[SRE Metrics Guide: Measuring the Incident Lifecycle]]></title><description><![CDATA[In SRE (Site Reliability Engineering), time is not just a number; it is the core resource that determines whether we meet or breach our SLO (Service Level Objective). To manage incidents professionally, we must deconstruct the timeline into specific ...]]></description><link>https://blog.uptodeploy.com/sre-metrics-guide-measuring</link><guid isPermaLink="true">https://blog.uptodeploy.com/sre-metrics-guide-measuring</guid><category><![CDATA[Incident Lifecycle]]></category><category><![CDATA[SRE]]></category><category><![CDATA[technology]]></category><category><![CDATA[Cloud Computing]]></category><category><![CDATA[IA]]></category><dc:creator><![CDATA[Jose Alvarez R.]]></dc:creator><pubDate>Mon, 26 Jan 2026 06:00:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/hpjSkU2UYSU/upload/1b57b36a97ff4373af763b9c6c8eae55.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In <strong>SRE (Site Reliability Engineering)</strong>, time is not just a number; it is the core resource that determines whether we meet or breach our <a target="_blank" href="https://blog.uptodeploy.com/sre-its-not-just-automation"><strong>SLO</strong></a> <strong>(Service Level Objective)</strong>. To manage incidents professionally, we must deconstruct the timeline into specific metrics that reveal exactly where we can optimize our systems and processes.</p>
<h2 id="heading-1-the-incident-lifecycle-from-t0-to-t4">1. The Incident Lifecycle: From T0 to T4</h2>
<p>An incident is not an isolated event but a sequence of stages. Whether it is a failing Kubernetes pod or a misconfigured security rule, every event follows this chronology:</p>
<ul>
<li><p><strong>T0:</strong> Incident Start (The actual moment the failure occurs).</p>
</li>
<li><p><strong>T1:</strong> Detection. The monitoring system identifies the failure and triggers an alert.</p>
</li>
<li><p><strong>T2:</strong> Acknowledgment. An engineer acknowledges the alert and begins the investigation.</p>
</li>
<li><p><strong>T3:</strong> Mitigation. A fix is applied (Hotfix, Rollback, Restart).</p>
</li>
<li><p><strong>T4:</strong> Full Recovery. The service is 100% operational for the user again.</p>
</li>
</ul>
<hr />
<h2 id="heading-2-key-metrics-mttx">2. Key Metrics (MTTx)</h2>
<p>Understanding these intervals allows us to move from "guessing" to "managing with data."</p>
<h3 id="heading-mttd-mean-time-to-detect-t0-t1">MTTD: Mean Time to Detect (T0 - T1)</h3>
<ul>
<li><p><strong>What it measures:</strong> The effectiveness of our observability stack.</p>
</li>
<li><p><strong>The Goal:</strong> We aim for seconds. If a user notifies you before your tools do, your monitoring needs adjustment.</p>
</li>
</ul>
<h3 id="heading-mtta-mean-time-to-acknowledge-t1-t2">MTTA: Mean Time to Acknowledge (T1 - T2)</h3>
<ul>
<li><p><strong>What it measures:</strong> The responsiveness of the On-call team.</p>
</li>
<li><p><strong>The Goal:</strong> To keep the time an alert sits unaddressed as short as possible. A rising MTTA is often an early sign of "alert fatigue."</p>
</li>
</ul>
<h3 id="heading-mttr-the-recovery-standard">MTTR: The Recovery Standard</h3>
<p>In the industry, we differentiate between two approaches for MTTR:</p>
<ul>
<li><p><strong>MTTR (Recovery):</strong> From T0 to T4. This is the total downtime experienced by the customer.</p>
</li>
<li><p><strong>MTTR (Repair):</strong> From T2 to T3. It measures technical agility in applying a solution once the problem is identified.</p>
</li>
</ul>
<h3 id="heading-mtbf-mean-time-between-failures-t0-t4">MTBF: Mean Time Between Failures (T4 to the next T0)</h3>
<ul>
<li><p><strong>What it measures:</strong> The structural stability of the architecture.</p>
</li>
<li><p><strong>The Insight:</strong> If you repair quickly (low MTTR) but the system fails constantly (low MTBF), you have a root-cause technical debt issue that must be addressed.</p>
</li>
</ul>
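<p>The intervals above can be computed directly from incident timestamps. The sketch below is one possible way to do it in Python, assuming each incident is recorded as a dictionary of T0 to T4 timestamps. The two incidents are invented, and MTBF is taken here as the uptime between one recovery (T4) and the next incident start (T0), which is one common convention.</p>
<pre><code class="lang-python">from datetime import datetime, timedelta

# Hypothetical incidents; each maps a stage (T0..T4) to a timestamp.
incidents = [
    {
        "T0": datetime(2026, 1, 5, 10, 0),
        "T1": datetime(2026, 1, 5, 10, 2),
        "T2": datetime(2026, 1, 5, 10, 7),
        "T3": datetime(2026, 1, 5, 10, 25),
        "T4": datetime(2026, 1, 5, 10, 30),
    },
    {
        "T0": datetime(2026, 1, 15, 10, 30),
        "T1": datetime(2026, 1, 15, 10, 34),
        "T2": datetime(2026, 1, 15, 10, 37),
        "T3": datetime(2026, 1, 15, 10, 49),
        "T4": datetime(2026, 1, 15, 10, 50),
    },
]


def mean_minutes(deltas):
    """Average a list of timedeltas and return the result in minutes."""
    total = sum(deltas, timedelta())
    return total.total_seconds() / 60 / len(deltas)


def mean_interval(start, end):
    """Mean time in minutes between two stages across all incidents."""
    return mean_minutes([i[end] - i[start] for i in incidents])


mttd = mean_interval("T0", "T1")           # detection
mtta = mean_interval("T1", "T2")           # acknowledgment
mttr_repair = mean_interval("T2", "T3")    # repair
mttr_recovery = mean_interval("T0", "T4")  # recovery (customer downtime)

# MTBF here: mean uptime between one recovery (T4) and the next start (T0).
ordered = sorted(incidents, key=lambda i: i["T0"])
gaps = [nxt["T0"] - prev["T4"] for prev, nxt in zip(ordered, ordered[1:])]
mtbf_hours = mean_minutes(gaps) / 60

print(f"MTTD={mttd} min, MTTA={mtta} min")
print(f"MTTR(repair)={mttr_repair} min, MTTR(recovery)={mttr_recovery} min")
print(f"MTBF={mtbf_hours} h")
</code></pre>
<p>With this sample data the script reports an MTTD of 3 minutes, an MTTA of 4 minutes, a repair MTTR of 15 minutes, a recovery MTTR of 25 minutes, and an MTBF of 240 hours.</p>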
<hr />
<h2 id="heading-3-the-impact-on-the-error-budget">3. The Impact on the Error Budget</h2>
<p>Every minute of downtime is a direct withdrawal from your <strong>Error Budget</strong>.</p>
<blockquote>
<p><strong>Quick Calculation:</strong> If your SLO is 99.9% (approx. 43 minutes of allowed downtime per month) and a single incident has an MTTR of 30 minutes, you have consumed <strong>70% of your monthly budget</strong> in a single event. Precision in these metrics is fundamental for decision-making.</p>
</blockquote>
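<p>The arithmetic behind that quick calculation fits in a few lines of Python. This is only a sketch, and it assumes a 30-day month and an availability SLO measured in minutes of downtime.</p>
<pre><code class="lang-python">def error_budget_minutes(slo_percent, window_days=30):
    """Allowed downtime in minutes for an availability SLO over a window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100 - slo_percent) / 100


budget = error_budget_minutes(99.9)  # about 43.2 minutes for 30 days
incident_minutes = 30
consumed_percent = incident_minutes / budget * 100

print(f"Budget: {budget:.1f} min, consumed: {consumed_percent:.0f}%")
</code></pre>
<p>The unrounded result is about 69% of the budget, in line with the roughly 70% quoted above. Moving the SLO to 99.99% shrinks the same monthly budget to a little over 4 minutes.</p>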
<hr />
<h2 id="heading-4-optimization-with-automation-and-ai">4. Optimization with Automation and AI</h2>
<p>To drive these numbers down using a <strong>Cloud-Native</strong> approach, we apply technology at every stage:</p>
<ul>
<li><p><strong>Optimizing MTTD:</strong> We implement anomaly detection. AI can identify traffic variations that static rules might ignore, triggering T1 almost instantly.</p>
</li>
<li><p><strong>Optimizing MTTR:</strong> We prioritize <strong>Self-Healing</strong>. Through Kubernetes Operators or automation scripts, the system can execute a T3 (like an automatic restart) before a human even intervenes.</p>
</li>
<li><p><strong>Accelerating RCA:</strong> AI tools correlate events and logs to provide the "why" quickly, allowing engineers to move from T2 to T3 much faster.</p>
</li>
</ul>
<hr />
<h2 id="heading-conclusion-from-support-to-architecture">Conclusion: From Support to Architecture</h2>
<p>Mastering these metrics allows you to manage infrastructure with technical precision.</p>
<ul>
<li><p>Reducing <strong>MTTD</strong> provides clear visibility.</p>
</li>
<li><p>Reducing <strong>MTTR</strong> protects your <strong>Error Budget</strong>.</p>
</li>
<li><p>Increasing <strong>MTBF</strong> builds confidence in the platform.</p>
</li>
</ul>
<p>By integrating automation and AI into this flow, you shift from executing manual tasks to becoming the architect who designs resilient systems. Remember: in SRE, what cannot be measured cannot be improved.</p>
]]></content:encoded></item><item><title><![CDATA[SRE Guide: The Art of Measuring Trust (SLI, SLO, SLA)]]></title><description><![CDATA[In the world of infrastructure, we often obsess over whether a server is "alive" (ping). But a business doesn't care about a ping; it cares about the user experience.
To understand this, let's step away from the data center for a moment and imagine w...]]></description><link>https://blog.uptodeploy.com/sre-its-not-just-automation</link><guid isPermaLink="true">https://blog.uptodeploy.com/sre-its-not-just-automation</guid><category><![CDATA[SRE]]></category><category><![CDATA[Error budget]]></category><category><![CDATA[computing]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[Cloud Computing]]></category><category><![CDATA[IA]]></category><dc:creator><![CDATA[Jose Alvarez R.]]></dc:creator><pubDate>Thu, 22 Jan 2026 14:00:29 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/poI7DelFiVA/upload/15879e7587902c1776cbf7fd60314484.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the world of infrastructure, we often obsess over whether a server is "alive" (ping). But a business doesn't care about a ping; it cares about the <strong>user experience</strong>.</p>
<p>To understand this, let's step away from the data center for a moment and imagine we are the <strong>owners of a busy Burger Restaurant</strong>.</p>
<hr />
<h3 id="heading-the-restaurant-metaphor">The Restaurant Metaphor</h3>
<h4 id="heading-1-sli-service-level-indicator-the-thermometer">1. SLI (Service Level Indicator) - "The Thermometer"</h4>
<p>The SLI is the raw, real-time measurement of what is happening right now. It is a snapshot of reality.</p>
<ul>
<li><p><strong>In the Restaurant:</strong> It’s the exact time it takes for a waiter to bring a burger to the table after the customer orders.</p>
</li>
<li><p><strong>In SRE:</strong> It’s the latency (e.g., 300ms) or the success rate of requests (e.g., 99.9% of responses are 200 OK).</p>
</li>
</ul>
<blockquote>
<p><strong>Rule of thumb:</strong> The SLI answers the question: <strong>"How is the service performing at this very second?"</strong></p>
</blockquote>
<h4 id="heading-2-slo-service-level-objective-the-internal-promise">2. SLO (Service Level Objective) - "The Internal Promise"</h4>
<p>The SLO is the target you set for your team to keep the customers happy. It’s your "Line in the Sand."</p>
<ul>
<li><p><strong>In the Restaurant:</strong> You decide that <strong>95% of burgers must be served in under 15 minutes</strong>.</p>
<ul>
<li>Why not 100%? Because you know that sometimes the kitchen gets slammed or a waiter trips. Aiming for 100% would require hiring 50 waiters for one table, and you would go bankrupt.</li>
</ul>
</li>
<li><p><strong>In SRE:</strong> 99.9% of requests to the <strong>Azure</strong> API must respond in less than 200ms over a rolling 30-day window.</p>
</li>
</ul>
<blockquote>
<p><strong>Key Concept:</strong> The SLO is the balance between <strong>user happiness</strong> and <strong>operational cost</strong>.</p>
</blockquote>
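<p>Here is a small Python sketch of how an SLI is compared against an SLO, using the 200ms and 99.9% figures from the SRE example. The ten response times are invented, and a real evaluation would cover every request in the rolling 30-day window rather than a handful of samples.</p>
<pre><code class="lang-python"># Hypothetical response times in milliseconds collected over the SLO window.
response_times_ms = [80, 95, 110, 120, 150, 180, 190, 195, 240, 130]

SLO_TARGET_PERCENT = 99.9  # target from the SRE example
LATENCY_LIMIT_MS = 200     # a request is "good" when it is faster than this


def latency_sli(samples, limit_ms):
    """Return the percentage of requests that finished under the limit."""
    good = sum(1 for value in samples if limit_ms > value)
    return 100 * good / len(samples)


sli = latency_sli(response_times_ms, LATENCY_LIMIT_MS)
slo_met = sli >= SLO_TARGET_PERCENT
print(f"SLI = {sli:.1f}%, SLO met = {slo_met}")
</code></pre>
<p>One slow request out of ten puts the SLI at 90%, far below the target. The SLI is the measurement; the SLO is the line it is judged against.</p>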
<h4 id="heading-3-sla-service-level-agreement-the-legal-contract">3. SLA (Service Level Agreement) - "The Legal Contract"</h4>
<p>The SLA is what you promise the customer in writing, including the consequences if you fail.</p>
<ul>
<li><p><strong>In the Restaurant:</strong> You hang a sign on the door: <em>"If your food takes longer than 30 minutes, it’s free!"</em></p>
<ul>
<li>Note that the SLA (30 min) is much more relaxed than your internal goal/SLO (15 min). This gives you a "safety buffer."</li>
</ul>
</li>
<li><p><strong>In SRE:</strong> This is the legal contract. If the platform falls below 99% uptime, the provider owes the customer service credits or refunds.</p>
</li>
</ul>
<hr />
<h3 id="heading-quick-comparison">Quick Comparison</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Acronym</strong></td><td><strong>Name</strong></td><td><strong>Who watches it?</strong></td><td><strong>What happens if it fails?</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>SLI</strong></td><td><strong>Indicator</strong></td><td>The Engineer</td><td>We tune the code or the resources.</td></tr>
<tr>
<td><strong>SLO</strong></td><td><strong>Objective</strong></td><td>The SRE Team</td><td>We stop new changes (<strong>Error Budget</strong>).</td></tr>
<tr>
<td><strong>SLA</strong></td><td><strong>Agreement</strong></td><td>The Lawyer / Client</td><td>There are financial consequences.</td></tr>
</tbody>
</table>
</div><hr />
<h3 id="heading-the-error-budget-your-room-for-innovation">The Error Budget: Your "Room for Innovation"</h3>
<p>If your SLO is to deliver 95% of burgers on time, you have a 5% <strong>margin of error</strong>. That is your <strong>Error Budget</strong>.</p>
<ul>
<li><p><strong>Budget is full?</strong> You can spend that 5% experimenting with a risky new recipe (<strong>Innovation</strong>).</p>
</li>
<li><p><strong>Budget is empty?</strong> You made too many mistakes this month. Stop experimenting and focus 100% on making the kitchen stable (<strong>Reliability</strong>).</p>
</li>
</ul>
<hr />
<h3 id="heading-how-ai-helps-us">How Does AI Help Us?</h3>
<p>The AI acts like a <strong>Highly Intelligent Kitchen Supervisor</strong>:</p>
<ol>
<li><p><strong>Detection:</strong> The AI notices the oil is taking longer than usual to heat up, before any meat comes out undercooked (<strong>AIOps</strong>).</p>
</li>
<li><p><strong>Prediction:</strong> It warns you: <em>"At the current rate you're burning burgers, you'll have to start giving them away for free in 3 days (SLA breach prediction)."</em></p>
</li>
<li><p><strong>Action:</strong> If it sees a crowd coming, it automatically fires up a second grill (e.g. Auto-scaling in Azure).</p>
</li>
</ol>
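<p>The "Prediction" step can be approximated without any AI at all. The Python sketch below uses plain linear extrapolation of the error budget rather than a trained model, and it tracks the internal budget instead of the SLA threshold. The budget and usage figures are hypothetical.</p>
<pre><code class="lang-python">def days_until_budget_exhausted(budget_minutes, used_minutes, days_elapsed):
    """Project when the error budget runs out at the current burn rate.

    Returns None when nothing has been burned yet (no projection possible).
    """
    if used_minutes == 0 or days_elapsed == 0:
        return None
    burn_per_day = used_minutes / days_elapsed
    remaining = max(0.0, budget_minutes - used_minutes)
    return remaining / burn_per_day


# Hypothetical month: 43.2 min budget, 28.8 min already used after 6 days.
days_left = days_until_budget_exhausted(43.2, 28.8, 6)
print(f"Budget exhausted in about {days_left:.1f} days at this rate")
</code></pre>
<p>An AIOps tool adds value on top of this by spotting that the burn rate itself is changing, which a straight-line projection cannot see.</p>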
<hr />
<h3 id="heading-final-conclusion">Final Conclusion</h3>
<p>At the end of the day, whether you are managing a Raspberry Pi at home or a multi-region infrastructure with Azure or AWS, the lesson is the same: You cannot manage what you do not measure.</p>
<p>The <strong>SLI, SLO,</strong> and <strong>SLA</strong> framework isn't just a set of acronyms; it is a shared language between technology and business.</p>
<ul>
<li><p><strong>SLIs</strong> give us the truth.</p>
</li>
<li><p><strong>SLOs</strong> give us a goal.</p>
</li>
<li><p><strong>SLAs</strong> define our commitment.</p>
</li>
</ul>
<p>By mastering this framework—and accelerating it with <strong>AI</strong>—you stop being the person who "fixes servers" and become the architect who ensures the business can keep its promises to its users. Reliability is not a lucky accident; it is a calculated decision.</p>
]]></content:encoded></item><item><title><![CDATA[How to Securely Expose Your Local Lab Using Cloudflare Tunnel and Docker]]></title><description><![CDATA[In the world of Cloud Engineering, security is not an afterthought; it’s a foundation. When hosting a portfolio or a home lab, the old-school method of Port Forwarding is a major security risk. It exposes your home IP and leaves your network vulnerab...]]></description><link>https://blog.uptodeploy.com/cloudflare-tunnel-docker-zero-trust</link><guid isPermaLink="true">https://blog.uptodeploy.com/cloudflare-tunnel-docker-zero-trust</guid><category><![CDATA[SRE]]></category><category><![CDATA[cloudflare]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Docker]]></category><category><![CDATA[Security]]></category><category><![CDATA[zerotrust]]></category><dc:creator><![CDATA[Jose Alvarez R.]]></dc:creator><pubDate>Tue, 06 Jan 2026 03:49:49 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767669477076/3452df0b-ef60-487e-9bd2-659126d65a71.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the world of Cloud Engineering, security is not an afterthought; it’s a foundation. When hosting a portfolio or a home lab, the old-school method of <strong>Port Forwarding</strong> is a major security risk. It exposes your home IP and leaves your network vulnerable.</p>
<p>Today, we are taking a <strong>Zero Trust</strong> approach. We will use <strong>Cloudflare Tunnel (cloudflared)</strong> and <strong>Docker Compose</strong> to create a secure bridge that allows the world to see your work without ever opening a port on your router.</p>
<hr />
<h3 id="heading-the-architecture">The Architecture</h3>
<p>Before we dive into the terminal, let's look at the flow of traffic:</p>
<ol>
<li><p><strong>User</strong> requests domain <a target="_blank" href="http://uptodeploy.com"><code>example.com</code></a>.</p>
</li>
<li><p><strong>Cloudflare Edge</strong> receives the request.</p>
</li>
<li><p><strong>Cloudflared Connector</strong> (running in a Docker container or directly on your server) pulls the request through an outbound-only encrypted tunnel.</p>
</li>
<li><p><strong>Webserver</strong> serves the static files locally.</p>
</li>
</ol>
<p><img src="https://developers.cloudflare.com/_astro/handshake.eh3a-Ml1_1IcAgC.webp" alt="Cloudflare Tunnel · Cloudflare One docs" /></p>
<blockquote>
<p>URL: <a target="_blank" href="https://developers.cloudflare.com/cloudflare-one/networks/connectors/cloudflare-tunnel/">https://developers.cloudflare.com/cloudflare-one/networks/connectors/cloudflare-tunnel/</a></p>
</blockquote>
<hr />
<h3 id="heading-prerequisites-for-our-project">Prerequisites for our project:</h3>
<ul>
<li><p><strong>Domain:</strong> Managed by Cloudflare (e.g., <a target="_blank" href="http://up2runc.com"><code>up2runc.com</code></a>).</p>
</li>
<li><p><strong>Environment:</strong> Docker &amp; Docker Compose installed.</p>
</li>
<li><p><strong>Access:</strong> Cloudflare Zero Trust dashboard (Free Tier).</p>
</li>
</ul>
<hr />
<h3 id="heading-step-1-initialize-the-cloudflare-tunnel">Step 1: Initialize the Cloudflare Tunnel</h3>
<p>First, we need to create the tunnel's identity on the Cloudflare side.</p>
<ol>
<li><p>Navigate to the <strong>Zero Trust Dashboard</strong> &gt; <strong>Networks</strong> &gt; <strong>Connectors</strong>.</p>
</li>
<li><p>Click <strong>Create a Tunnel</strong> and select <strong>Cloudflared</strong> as the connector type.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767668222759/73418942-000c-4e08-b449-aa7b31c55bea.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Name it (e.g., <code>website-prod</code>).</p>
</li>
<li><p><strong>Save Tunnel</strong></p>
</li>
<li><p>Choose <strong>Docker</strong> as the environment:</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767668334963/e12c78eb-fa7a-46a4-8ce9-b0c8c8d7d4e0.png" alt class="image--center mx-auto" /></p>
<p> <strong>Important:</strong> Never share your tunnel token. Anyone who holds it can run a connector for your tunnel.</p>
</li>
</ol>
<hr />
<h3 id="heading-step-2-provisioning-with-docker-compose">Step 2: Provisioning with Docker Compose</h3>
<p>We will define our infrastructure as code. This ensures our setup is reproducible.</p>
<p><strong>File:</strong> <code>docker-compose.yml</code></p>
<pre><code class="lang-yaml"><span class="hljs-attr">version:</span> <span class="hljs-string">'3.8'</span>
<span class="hljs-attr">services:</span>
  <span class="hljs-comment"># Service 1: The Nginx web server (Alpine-based image)</span>
  <span class="hljs-attr">web:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">nginx:alpine</span>
    <span class="hljs-attr">container_name:</span> <span class="hljs-string">web-server</span>
    <span class="hljs-attr">restart:</span> <span class="hljs-string">unless-stopped</span>
    <span class="hljs-attr">volumes:</span>
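      <span class="hljs-comment"># Everything in this directory is served publicly: keep .env files and other secrets elsewhere</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">.:/usr/share/nginx/html:ro</span>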

  <span class="hljs-comment"># Service 2: The Tunnel Connector</span>
  <span class="hljs-attr">tunnel:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">cloudflare/cloudflared:latest</span>
    <span class="hljs-attr">container_name:</span> <span class="hljs-string">cloudflare-connector</span>
    <span class="hljs-attr">restart:</span> <span class="hljs-string">unless-stopped</span>
    <span class="hljs-attr">command:</span> <span class="hljs-string">tunnel</span> <span class="hljs-string">--no-autoupdate</span> <span class="hljs-string">run</span> <span class="hljs-string">--token</span> <span class="hljs-string">${TUNNEL_TOKEN}</span>
    <span class="hljs-attr">depends_on:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">web</span>
</code></pre>
<hr />
<h3 id="heading-step-3-configure-public-hostnames">Step 3: Configure Public Hostnames</h3>
<p>Go back to your Cloudflare Tunnel settings and click the <strong>Public Hostname</strong> tab.</p>
<ul>
<li><p><strong>Public Hostname:</strong> <a target="_blank" href="http://uptodeploy.com"><code>up2runc.com</code></a></p>
</li>
<li><p><strong>Service:</strong> <a target="_blank" href="http://web:80"><code>http://web:80</code></a></p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767668742647/c950bd59-858e-495e-9878-1503ef66ed5a.png" alt class="image--center mx-auto" /></p>
<blockquote>
<p><strong>Note:</strong> We use <code>web:80</code> because Docker Compose creates an internal network where the services can talk to each other by name.</p>
</blockquote>
<hr />
<h3 id="heading-step-4-deployment-amp-verification">Step 4: Deployment &amp; Verification</h3>
<p>Execute the deployment:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">export</span> TUNNEL_TOKEN=your_token_here
docker-compose up -d
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767670925971/53bd9349-3795-4db9-bcbe-c2be049e02db.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767671022636/484b0278-2a76-4d8a-b714-5bb2c7bc1c53.png" alt class="image--center mx-auto" /></p>
<h4 id="heading-verification-checklist">Verification Checklist:</h4>
<ul>
<li><p><strong>Status:</strong> Check the tunnel status in the dashboard; it should show <strong>HEALTHY</strong>.</p>
</li>
<li><p><strong>Connectivity:</strong> Run <code>curl -I</code> <a target="_blank" href="https://uptodeploy.com"><code>https://up2runc.com</code></a> and look for the <code>Server: cloudflare</code> header.</p>
</li>
<li><p><strong>Security:</strong> Confirm your router has <strong>NO</strong> ports forwarded to your machine.</p>
</li>
</ul>
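<p>If you prefer to script the connectivity check instead of running <code>curl -I</code> by hand, a minimal Python sketch could look like this. The function names are hypothetical, and the check only inspects the <code>Server</code> response header, the same signal as the manual test.</p>
<pre><code class="lang-python">import urllib.request


def served_by_cloudflare(headers):
    """Return True when the Server response header identifies Cloudflare."""
    server = headers.get("Server") or headers.get("server") or ""
    return server.strip().lower() == "cloudflare"


def check(url):
    """Send a HEAD request (like curl -I) and inspect the Server header."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return served_by_cloudflare(response.headers)


# Example (needs network access; replace with your own hostname):
# print(check("https://up2runc.com"))
</code></pre>
<p>Keeping the header test in its own function means it can be exercised with a plain dictionary, without touching the network. Note that <code>urlopen</code> raises an exception on HTTP error responses, so a production check would need error handling around the call.</p>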
<hr />
<h3 id="heading-key-takeaways">Key Takeaways</h3>
<ol>
<li><p><strong>Outbound Only:</strong> The tunnel only makes outbound connections. This means your firewall stays closed.</p>
</li>
<li><p><strong>Identity-Aware:</strong> You can now add Cloudflare Access to require a login before anyone even sees your site.</p>
</li>
<li><p><strong>Static Content:</strong> Nginx on Alpine is an excellent choice for lightweight, high-performance static hosting.</p>
</li>
</ol>
<hr />
<h3 id="heading-conclusion">Conclusion</h3>
<p>And just like that—simple and efficient—we’ve implemented a robust, Cloudflare-protected solution. We’ve moved from a 'Home Hobbyist' setup to a Professional Zero Trust Architecture, proving that high-level security doesn't have to be over-complicated.</p>
<p>By eliminating open inbound ports and leveraging Docker's immutability, we have made <a target="_blank" href="http://up2runc.com">up2runc.com</a> production-ready.</p>
<p><strong>Are you ready for the next project?</strong></p>
]]></content:encoded></item><item><title><![CDATA[Beyond Port Forwarding: The SRE Way]]></title><description><![CDATA[When I decided to host my personal brand site, https://www.UpToDeploy.com, I faced a classic dilemma: pay for a VPS or use the hardware I already own. I chose my Raspberry Pi, but as someone focused on Security and Reliability, simply opening ports o...]]></description><link>https://blog.uptodeploy.com/beyond-port-forwarding-the-sre-way</link><guid isPermaLink="true">https://blog.uptodeploy.com/beyond-port-forwarding-the-sre-way</guid><category><![CDATA[Cloud Computing]]></category><category><![CDATA[Raspberry Pi]]></category><category><![CDATA[cloudflare]]></category><category><![CDATA[Linux]]></category><category><![CDATA[zerotrust]]></category><category><![CDATA[architecture]]></category><category><![CDATA[SRE]]></category><dc:creator><![CDATA[Jose Alvarez R.]]></dc:creator><pubDate>Tue, 30 Dec 2025 23:21:52 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767136778306/68eecb1b-5228-40dd-b4b3-303884b3c543.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When I decided to host my personal brand site, <a target="_blank" href="https://www.google.com/search?q=UpToDeploy.com"><strong>https://www.UpToDeploy.com</strong></a>, I faced a classic dilemma: pay for a VPS or use the hardware I already own. I chose my <strong>Raspberry Pi</strong>, but as someone focused on <strong>Security and Reliability</strong>, simply opening ports on my home router (Port Forwarding) was not an option.</p>
<p>In this article, I’ll show you how I leverage the <strong>Zero Trust</strong> architecture to expose my local environment to the world securely.</p>
<h2 id="heading-the-problem-the-risks-of-traditional-hosting">The Problem: The Risks of Traditional Hosting</h2>
<p>Traditional home hosting requires exposing your public IP and opening ports (like 80 or 443). This makes your home network a target for DDoS attacks and port scanning. I needed a solution that followed the principle of "least privilege."</p>
<h2 id="heading-the-solution-cloudflare-tunnels">The Solution: Cloudflare Tunnels</h2>
<p>Cloudflare Tunnels (part of the Zero Trust suite) allow you to create a secure, outbound-only connection from your infrastructure to Cloudflare’s edge.</p>
<p><strong>Why this is a game-changer:</strong></p>
<ul>
<li><p><strong>No Inbound Ports:</strong> My router remains closed to the internet.</p>
</li>
<li><p><strong>Identity-Based Access:</strong> I can layer authentication if needed.</p>
</li>
<li><p><strong>Hidden IP:</strong> My home IP is never exposed to the public; only Cloudflare’s IP addresses are visible.</p>
</li>
</ul>
<h2 id="heading-the-stack">The Stack</h2>
<p>To keep the deployment clean and reproducible, I used a <strong>containerized approach</strong>:</p>
<ul>
<li><p><strong>Hardware:</strong> Raspberry Pi.</p>
</li>
<li><p><strong>Web Server:</strong> Nginx (Alpine-based for a tiny footprint).</p>
</li>
<li><p><strong>Orchestration:</strong> Docker Compose.</p>
</li>
<li><p><strong>Connectivity:</strong> <code>cloudflared</code> (The Cloudflare Tunnel connector).</p>
</li>
</ul>
<h3 id="heading-the-deployment-docker-compose">The Deployment (Docker Compose)</h3>
<p>Instead of installing the connector directly on the OS, I deployed it as a sidecar container. This ensures that if I move my site to another machine, the entire infrastructure moves with it.</p>
<p>This is my simple <code>docker-compose.yaml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">services:</span>
  <span class="hljs-attr">web:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">nginx:alpine</span>
    <span class="hljs-attr">container_name:</span> <span class="hljs-string">website-linkbio</span>
    <span class="hljs-attr">restart:</span> <span class="hljs-string">unless-stopped</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">./index.html:/usr/share/nginx/html/index.html:ro</span>

  <span class="hljs-attr">tunnel:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">cloudflare/cloudflared:latest</span>
    <span class="hljs-attr">restart:</span> <span class="hljs-string">always</span>
    <span class="hljs-attr">environment:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">TUNNEL_TOKEN=${CLOUDFLARE_TOKEN}</span>
    <span class="hljs-attr">command:</span> <span class="hljs-string">tunnel</span> <span class="hljs-string">--no-autoupdate</span> <span class="hljs-string">run</span>
</code></pre>
<h2 id="heading-key-takeaways">Key Takeaways</h2>
<ol>
<li><p><strong>Security first:</strong> By using a tunnel, I've eliminated the primary attack surface of home hosting.</p>
</li>
<li><p><strong>Resilience:</strong> The restart policies in the Compose file bring the site and the tunnel back online automatically after the Raspberry Pi reboots, as long as the Docker daemon starts on boot.</p>
</li>
<li><p><strong>Professionalism:</strong> The site is served from my custom domain <a target="_blank" href="http://uptodeploy.com"><code>uptodeploy.com</code></a> with full SSL/TLS encryption, despite being hosted on a residential network.</p>
</li>
</ol>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Setting up <strong>UpToDeploy</strong> wasn't just about a website; it was about practicing the <strong>SRE and Architecture</strong> principles I believe in. It’s a proof of concept that professional, secure, and highly available services can be built from anywhere.</p>
]]></content:encoded></item><item><title><![CDATA[UpToDeploy: Elevating Reliability in the Age of AI and Cloud]]></title><description><![CDATA[The Journey Toward Systems Architecture
Hello! My name is Jose Alvarez Rodriguez. If you are coming from my LinkedIn or know me from the industry, you’ll know that my passion is making things work—but above all, making them resilient.
Today, I am lau...]]></description><link>https://blog.uptodeploy.com/uptodeploy-elevating-reliability-in-the-age-of-ai-and-cloud</link><guid isPermaLink="true">https://blog.uptodeploy.com/uptodeploy-elevating-reliability-in-the-age-of-ai-and-cloud</guid><category><![CDATA[SRE]]></category><category><![CDATA[Cloud Computing]]></category><category><![CDATA[Devops]]></category><category><![CDATA[architecture]]></category><category><![CDATA[AI]]></category><category><![CDATA[Kubernetes]]></category><dc:creator><![CDATA[Jose Alvarez R.]]></dc:creator><pubDate>Tue, 30 Dec 2025 01:59:31 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/FocSgUZ10JM/upload/93639a498621a43907e491c079097cb4.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-the-journey-toward-systems-architecture">The Journey Toward Systems Architecture</h2>
<p>Hello! My name is <strong>Jose Alvarez Rodriguez</strong>. If you are coming from my LinkedIn or know me from the industry, you’ll know that my passion is making things work—but above all, making them resilient.</p>
<p>Today, I am launching <strong>UpToDeploy</strong>, a space where Site Reliability Engineering (SRE) meets modern architecture. My goal is to transition from pure operations toward strategic consultancy, and I want to invite you to join me in this process.</p>
<h2 id="heading-my-three-pillars-the-uptodeploy-dna">My Three Pillars: The UpToDeploy DNA</h2>
<p>To me, a "production-ready" system is not just code that runs; it is a structure sustained by three pillars that I consider non-negotiable:</p>
<ul>
<li><p><strong>Cloud (Azure &amp; AWS):</strong> The foundation of scalability. My focus is not just "moving things to the cloud," but designing cost-efficient and high-availability architectures.</p>
</li>
<li><p><strong>Security:</strong> As an SRE, I understand that security is not a final step, but an integral part of the software development lifecycle (DevSecOps).</p>
</li>
<li><p><strong>Containerization (Kubernetes &amp; Docker):</strong> The unit of measurement for modern computing. Containers are the key to portability and the agility that consultancy firms demand today.</p>
</li>
</ul>
<h2 id="heading-the-new-horizon-ai-and-observability">The New Horizon: AI and Observability</h2>
<p>We cannot talk about architecture in 2025 without mentioning Artificial Intelligence. Part of my mission with this blog is to investigate how AI is transforming the SRE role: from predictive failure analysis to intelligent infrastructure automation.</p>
<p>At <strong>UpToDeploy</strong>, we will explore how to integrate AI models to make our architectures not only robust but also "self-healing."</p>
<h2 id="heading-what-to-expect-from-this-space">What to Expect from This Space?</h2>
<p>Whether you are a recruiter, a fellow engineer, or someone seeking technology consultancy, here you will find:</p>
<ul>
<li><p>Architecture analysis on <strong>Azure and AWS</strong>.</p>
</li>
<li><p>Security guides for <strong>containerized environments</strong>.</p>
</li>
<li><p>Reflections on the impact of <strong>AI in IT operations</strong>.</p>
</li>
</ul>
]]></content:encoded></item></channel></rss>