UpToDeploy | SRE, Cloud Architecture & Security

30 Essential Docker Commands: A Practical Guide for SREs

Jose Alvarez R. — Wed, 11 Feb 2026 06:00:54 GMT

1. Container Management (Lifecycle)

Command	Purpose	Example
`docker ps`	List running containers.	`docker ps -a` (to see all containers, including those that exited with errors).
`docker run`	Create and start a container.	`docker run -d --name web nginx` (run a web server in the background/detached).
`docker stop`	Stop a running container.	`docker stop $(docker ps -q)` (stop all running containers at once).
`docker rm`	Remove a container.	`docker rm -f web` (force removal of a running container).
`docker exec`	Run a command in a container.	`docker exec -it web bash` (get an interactive shell inside a container).
`docker logs`	Fetch container logs.	`docker logs -f --tail 100 web` (follow the last 100 log lines in real-time).

2. Image Management

Command	Purpose	Example
`docker images`	List available images.	`docker images -q` (get only the IDs of all local images).
`docker pull`	Download an image.	`docker pull redis:latest` (ensure you have the latest version before deployment).
`docker build`	Build an image from a Dockerfile.	`docker build -t my-app:v1 .` (tag a new version of your application).
`docker rmi`	Remove an image.	`docker rmi $(docker images -f "dangling=true" -q)` (clean up unused/dangling images).
`docker tag`	Create a tag for an image.	`docker tag my-app:v1 myrepo/my-app:v1` (prepare an image for a registry).
`docker push`	Upload an image to a registry.	`docker push myrepo/my-app:v1` (publish the image to Docker Hub or a private registry such as ACR/ECR).

3. Observability & Troubleshooting

Command	Purpose	Example
`docker stats`	Live resource usage.	`docker stats --no-stream` (get a snapshot of CPU/RAM usage per container).
`docker inspect`	Low-level info on objects.	`docker inspect web` (dump the container's full JSON metadata, such as its IP address, mounts, and environment).
`docker top`	Display running processes.	`docker top web` (see the container's processes as the host sees them).
`docker port`	List port mappings.	`docker port web` (verify which host port is mapped to the container's port).
`docker events`	Real-time events from server.	`docker events --since 1h` (audit what happened in the Docker engine recently).
`docker diff`	Changes in the FS.	`docker diff web` (see what files were modified/added in the container layer).

4. Network & Volumes

Command	Purpose	Example
`docker network ls`	List networks.	`docker network ls` (check available drivers like bridge, host, or overlay).
`docker network inspect`	Detailed network info.	`docker network inspect bridge` (see which containers are attached to a network).
`docker volume ls`	List volumes.	`docker volume ls` (identify persistent data volumes).
`docker volume prune`	Remove unused volumes.	`docker volume prune` (recover disk space from orphaned volumes; since Docker 23.0 this removes only anonymous volumes unless you add `-a`).
`docker cp`	Copy files to/from container.	`docker cp web:/etc/nginx/nginx.conf .` (extract a config file for local review).
`docker network connect`	Connect container to network.	`docker network connect my-net web` (dynamically attach a container to a new network).

5. Advanced & Cleanup

Command	Purpose	Example
`docker system prune`	Total cleanup.	`docker system prune -a --volumes` (wipe everything unused to free up disk).
`docker system df`	Show docker disk usage.	`docker system df` (diagnose why the Docker partition is full).
`docker-compose up`	Orchestrate multiple containers.	`docker-compose up -d` (deploy an entire stack defined in a YAML).
`docker-compose logs`	View logs from a stack.	`docker-compose logs -f app` (follow logs for a specific service in the stack).
`docker commit`	Create image from container.	`docker commit web my-emergency-image` (save a container's state for forensics).
`docker wait`	Wait for a container to stop.	`docker wait web` (useful in scripts to take action after a container finishes).

30 Essential Commands in Linux: Quick Guide for SREs

Jose Alvarez R. — Sun, 08 Feb 2026 02:34:28 GMT

1. System Observability & Performance

Command	Purpose	Real-World SRE Example
`top` / `htop`	Process monitoring.	`htop` (to visually identify which CPU core is pinned at 100%).
`uptime`	System load average.	`uptime` (to check if the load average exceeds the number of CPU cores).
`vmstat`	Memory & CPU stats.	`vmstat 1 5` (take five samples, one per second, to detect context switching or swapping).
`iostat`	Disk I/O utilization.	`iostat -xz 1` (monitor extended disk latency in real-time).
`free -h`	Available RAM & Swap.	`free -h` (quick check before deploying a high-memory container).
`sar`	Historical performance.	`sar -u -f /var/log/sa/sa10` (analyze CPU usage from the 10th of the month).

2. Networking & Connectivity

Command	Purpose	Real-World SRE Example
`ip addr`	IP configuration.	`ip addr show eth0` (validate the IP assigned to a specific interface).
`ss -tulpn`	Open ports & sockets.	`ss -tulpn` (list listening TCP/UDP ports together with the process that owns each one).
`dig`	DNS lookups.	`dig +short blog.uptodeploy.com` (get only the A record IP for a domain).
`curl -Iv`	HTTP(S) debugging.	`curl -Iv https://google.com` (inspect TLS handshake and headers).
`traceroute`	Network path routing.	`traceroute -I 8.8.8.8` (use ICMP to find where the packet drop occurs).
`tcpdump`	Packet capturing.	`tcpdump -i eth0 port 443` (sniffing HTTPS traffic for deep debugging).

3. File System & Logs

Command	Purpose	Real-World SRE Example
`df -h`	Disk space usage.	`df -h /` (check if the root partition is at 100% capacity).
`du -sh *`	Directory size.	`du -sh /var/log/*` (find which log file is consuming the most space).
`lsof`	List open files.	`lsof -i :22` (see which processes hold connections on the SSH port).
`tail -f`	Real-time log follow.	`tail -f /var/log/syslog` (monitor system events as they happen).
`grep -r`	Pattern searching.	`grep -r "error" /var/log/nginx/` (search for errors across all Nginx logs).
`find`	Locate files.	`find /etc -name "*.conf"` (locate all configuration files in /etc).

4. Process Management & Security

Command	Purpose	Real-World SRE Example
`ps aux`	List active processes.	`ps aux --sort=-%mem` (list processes by highest RAM consumption).
`kill -9`	Force termination.	`kill -9 1234` (force-kill a hung process with PID 1234; a zombie is already dead and only disappears once its parent reaps it).
`systemctl`	Manage services.	`systemctl restart docker` (restart the Docker daemon).
`journalctl -xe`	Systemd logs.	`journalctl -u nginx.service -f` (follow logs for a specific service).
`sudo`	Superuser privileges.	`sudo visudo` (safely edit the sudoers file to manage permissions).
`chmod` / `chown`	Permissions & Ownership.	`chown -R www-data:www-data /var/www/html` (fix web server file ownership).

5. SRE Power Tools

Command	Purpose	Real-World SRE Example
`strace`	Trace system calls.	`strace -p 1234` (debug why a process is stuck or failing).
`dmesg -T`	Kernel ring buffer.	`dmesg -T` (read kernel messages with human-readable timestamps, e.g., to spot OOM kills or disk errors).
`awk`	Text processing.	`awk '{print $1}' access.log` (extract only the IPs from an access log).
`rsync`	Efficient file sync.	`rsync -avz ./data/ remote:/backup/` (sync data with compression and delta).
`openssl`	SSL/TLS management.	`openssl x509 -in cert.crt -text -noout` (check certificate expiration/details).

SRE Guide: Eliminating "Toil" – The Art of Scaling Without Burning Out

Jose Alvarez R. — Tue, 03 Feb 2026 06:00:10 GMT

Automate to scale, not just to survive

In the world of Site Reliability Engineering (SRE), not all automation is created equal. There is a silent enemy that consumes engineers' time, stalls innovation, and drains operational budgets: Toil.

If your daily routine revolves around manually putting out fires, you aren't doing SRE; you are doing traditional operations with a modern title.

1. What is Toil, really? (And what it isn't)

We often mistake "boring work" for Toil. However, for a task to be technically classified as Toil, it must meet all four of the following criteria:

Manual: It is performed by a human (e.g., Connecting through SSH to a server to restart a pod or clear logs).
Repetitive: You do it over and over again, week after week.
Automatable: If a Bash script or a Python workflow could handle it, it’s Toil.
No Enduring Value: Once you are done, the system hasn't structurally improved. The state simply returned to "point zero."

Note: If you are designing a new architecture in Azure or hardening a Linux server, that is not Toil; that is engineering. You are leaving the system better than you found it.

2. The 50% Rule

In big organizations, we follow a strict mandate: An SRE should spend no more than 50% of their time on Toil.

The other 50%: Must be dedicated exclusively to engineering projects. This includes developing new tools, optimizing Infrastructure as Code (IaC), or implementing advanced security policies.
Why it's vital: If Toil grows at the same rate as your systems, you will eventually need an army of operators just to "keep the lights on." Toil doesn't scale; engineering does.

3. Strategies to Eliminate Toil

To tackle this, we need clear solutions. Here is how we implement them in real-world scenarios:

Self-Healing Systems: Instead of manual intervention during a failure, we use or design the system to perform its own Health Checks and repair itself (e.g., restarting services or replacing unhealthy instances) without human input.
Declarative Infrastructure: We eliminate the human error that manual configuration introduces. Every change to networks, firewalls, or servers must be defined in code, ensuring auditability, repeatability, and consistent deployments.
Event-Driven Operations: We set up automated triggers. If the system detects a resource will hit its limit within a specific timeframe, a logical routine should handle the expansion before it turns into an incident.
GitOps & Continuous Delivery: The "truth" for the infrastructure resides in a controlled repository. Any drift between the live environment and the code is automatically reconciled, eliminating manual configuration drift.
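The Event-Driven Operations idea above (act before a resource hits its limit) can be sketched in a few lines. This is a minimal illustration that assumes evenly spaced usage samples and simple linear growth; the function names `hours_until_limit` and `should_expand` are hypothetical, and a real system would pull the samples from its monitoring stack and likely use a better forecast.

```python
from typing import Optional, Sequence


def hours_until_limit(samples: Sequence[float], limit: float,
                      interval_hours: float = 1.0) -> Optional[float]:
    """Estimate hours until `limit` is reached, by linear extrapolation.

    `samples` are evenly spaced usage readings, oldest first (e.g. GB used).
    Returns None when usage is flat or shrinking, 0.0 when already at the limit.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    if samples[-1] >= limit:
        return 0.0
    # Average growth per interval between the first and last sample.
    growth = (samples[-1] - samples[0]) / (len(samples) - 1)
    if growth <= 0:
        return None
    return (limit - samples[-1]) / growth * interval_hours


def should_expand(samples: Sequence[float], limit: float,
                  horizon_hours: float = 24.0,
                  interval_hours: float = 1.0) -> bool:
    """True when the limit is projected to be hit within the horizon."""
    eta = hours_until_limit(samples, limit, interval_hours)
    return eta is not None and eta <= horizon_hours
```

With hourly readings of 70, 72, 74, and 76 GB on a 100 GB volume, the projection is 12 hours to the limit, so a 24-hour horizon would trigger the expansion routine instead of a 3 AM page.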

4. The Role of AI Against Toil

AI is the ultimate weapon because, unlike a static Bash script, AI is adaptive. As SREs, we must leverage this:

Smart Alert Classification: AI can filter out monitoring "noise," discarding false positives and automatically handling low-impact alerts that previously woke someone up at 3 AM.
Code & Config Generation: Using AI to generate Ansible playbooks or Azure Policies drastically reduces manual writing time, letting you focus on the logic and security of the design.
Anomaly Analysis: Moving from threshold-based alerts (e.g., CPU > 80%) to behavior-based alerts (e.g., "this traffic pattern is unusual for a Monday morning").
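To make the difference between threshold-based and behavior-based alerts concrete, here is a deliberately simple sketch. It is a plain z-score check, not an AI model: it assumes you already have the values observed in the same time slot on previous weeks, and the name `is_anomalous` and the default threshold of 3 standard deviations are illustrative choices.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(history: Sequence[float], current: float,
                 z_threshold: float = 3.0) -> bool:
    """Flag `current` when it deviates strongly from its own baseline.

    `history` holds values seen in the same slot before, for example
    requests per minute on previous Monday mornings.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat baseline: any change at all is unusual.
        return current != baseline
    return abs(current - baseline) / spread > z_threshold
```

Unlike a fixed "traffic > N" rule, this also flags a sudden drop in traffic, which is often the more important signal.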

Conclusion: "Automate yourself out of your current job"

The goal of an SRE is to automate themselves out of their daily operational tasks. This doesn't mean losing your job—it means freeing up your time to focus on other needs.

Essential Protocols: Quick Guide for SREs & SysAdmins

Jose Alvarez R. — Mon, 02 Feb 2026 05:53:23 GMT

In infrastructure, you don't guess—you verify. Whether you're debugging a Kubernetes CNI or a cloud-native database, knowing the underlying protocol is the key to faster troubleshooting.

Here is a cheat sheet of the 25 most important protocols and the commands to test them directly from your terminal.

Protocol	Role in Infrastructure	Command to Test
HTTPS	Secure web traffic & API entry points.	`curl -I https://google.com`
SSH	Secure remote server management.	`ssh -v user@host`
DNS	The backbone of service discovery.	`dig +short google.com`
TCP	Connection-oriented, reliable delivery.	`nc -zv host 443`
UDP	Fast delivery for VoIP, VPN, & DNS.	`nc -zuv host 1194`
ICMP	Network reachability & latency testing.	`ping -c 4 8.8.8.8`
BGP	Internet-scale routing (AS path).	`mtr --aslookup 8.8.8.8`
NTP	Log & certificate synchronization.	`chronyc sources`
TLS/SSL	Encryption layer for all secure traffic.	`openssl s_client -connect host:443`
DHCP	Dynamic IP allocation & management.	`ip addr show`
ARP	IP to MAC resolution (Layer 2).	`ip neighbor`
NFS	Network file sharing (Linux/Unix).	`showmount -e host`
SMB	Network file sharing (Windows/Azure Files).	`smbclient -L //host`
SFTP	Secure file transfer over SSH.	`sftp user@host`
SMTP	Outbound mail & alert notifications.	`swaks --to user@domain.com`
SNMP	Monitoring hardware & network devices.	`snmpwalk -v2c -c public host`
LDAP	Centralized user authentication.	`ldapsearch -x -H ldap://host`
RDP	Remote desktop access (Windows/VDI).	`telnet host 3389`
iSCSI	Storage over IP (SAN connectivity).	`iscsiadm -m discovery`
gRPC	High-performance microservices.	`grpcurl host:port list`
Redis	Fast caching & session management.	`redis-cli -h host ping`
SQL	Database connectivity (MySQL/PostgreSQL).	`mysql -h host -P 3306`
MQTT	IoT & event-driven messaging.	`mosquitto_pub -h host -t test -m ping`
VXLAN	Overlay networks (CNIs like Flannel/Calico).	`ip -d link show type vxlan`
HTTP/2	Modern web performance (Multiplexing).	`curl -I --http2 https://google.com`
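When the same reachability check has to run from a script rather than a terminal, the TCP row above (`nc -zv host 443`) can be approximated with the standard library. This is a rough sketch; the function name `tcp_port_open` and the 3-second default timeout are assumptions, and it only proves that the TCP handshake completes, not that the service behind the port is healthy.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        # create_connection resolves the name and completes the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `tcp_port_open("host", 3389)` covers the same ground as the `telnet host 3389` check listed for RDP.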

SRE Guide: Blame-Free Post-mortems – From Chaos to Systemic Resilience

Jose Alvarez R. — Wed, 28 Jan 2026 14:00:52 GMT

The Incident Doesn't End at the "Fix"

In the daily life of an SRE, the first reaction to an outage is the "Quick Fix": restarting a pod, scaling a node, or triggering a rollback. However, an incident isn’t truly closed when the service returns to normal (T4). In my experience, it only ends when the team fully understands the root cause and takes concrete steps to ensure it never happens again.

This is where the Post-mortem becomes our most powerful tool for building resilient infrastructures.

1. The "Blame-Free" Philosophy: Why It’s Non-Negotiable

Human error is a symptom, not the cause. If an engineer accidentally executes a destructive command in production, the question shouldn't be "Who did it?" but rather "Why did the system allow a single command to compromise our availability?"

The Psychology of Reliability: If the team fears retaliation, they will hide mistakes. In SRE, a hidden error is a ticking time bomb.
Systemic Focus: We look for flaws in design, architecture, or CI/CD processes, not individuals.
Learning Culture: A Blame-Free Post-mortem encourages everyone to share their findings, preventing the rest of the team from making the same mistake.

2. Anatomy of a High-Level Post-mortem

A technical document should be a clear roadmap. To make it effective for your workflow, ensure it includes:

A. Executive Summary & Impact

State what happened directly: "The payment API was down for 45 minutes, affecting 30% of transactions." It is vital to include which SLO/SLA metrics were compromised.

B. Detailed Timeline (Do you remember it?)

This is the "log" of the crisis. It’s fundamental for understanding our MTTD (Detection) and MTTR (Recovery).

T0: Incident start (via metrics or logs).
T1: Alert triggered.
T2: Investigation begins.
T3: Mitigation applied.
T4: Service restored and stable.

C. Root Cause Analysis (RCA)

This is where we dive into the "nuts and bolts": Was it a memory leak in a microservice? A database deadlock? A misconfigured Firewall rule in the Cloud?

3. Workshop: Applying the "5 Whys"

To get to the bottom of the issue, don't stop at the first logical answer. Look at this real-world example:

Scenario: The authentication service failed.

Why did the service fail? Because the container entered a crash loop (CrashLoopBackOff).
Why was it crashing? Because it couldn't connect to the Redis cluster.
Why couldn't it connect to Redis? Because the credentials in the Kubernetes Secret were incorrect.
Why were the credentials incorrect? Because they were rotated manually and not updated in the deployment.
Why were they rotated manually? (Root Cause): We lack a secrets management system (like HashiCorp Vault or Azure Key Vault) to automate rotation and syncing.

4. Powering the Process with AIOps

AI doesn't replace our technical judgment, but it accelerates administrative tasks so we can focus on strategy:

Timeline Reconstruction: AI can analyze thousands of logs and messages across communication channels to build a timeline in seconds.
Anomaly Detection: It identifies unusual traffic patterns that occurred before the incident which might have gone unnoticed.
Intelligent Drafting: Generating a first draft based on raw data allows engineers to focus on adding high-value context and definitive fixes.

5. The "Action Plan": No Tasks, No Improvement

The final output must be a list of tasks in your backlog. Each task must be:

Specific: Instead of "Improve monitoring," use "Configure a latency alert at the 99th percentile (p99) for the /auth endpoint."
Prioritized: Distinguish between immediate actions (preventing a recurrence tomorrow) and structural improvements.
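A "specific" action item such as a 99th-percentile latency alert translates directly into a check. The sketch below uses the nearest-rank percentile method, which is one of several definitions; monitoring systems often interpolate or estimate from histograms instead. The names `percentile` and `p99_alert` and any threshold value are illustrative.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least
    pct% of all samples at or below it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def p99_alert(latencies_ms: Sequence[float], threshold_ms: float) -> bool:
    """True when the p99 latency exceeds the agreed threshold."""
    return percentile(latencies_ms, 99) > threshold_ms
```

With 98 requests at 100 ms and two slow ones at 900 ms and 950 ms, the average stays near 117 ms while the p99 is 900 ms, which is why the action item names a percentile rather than "latency" in general.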

Conclusion

Failure is an investment you’ve already paid for. You’ve already spent time, money, and "points" from your Error Budget. Don't waste that investment: document it, learn from it, and above all, automate the solution.

SRE Metrics Guide: Measuring the Incident Lifecycle

Jose Alvarez R. — Mon, 26 Jan 2026 06:00:40 GMT

In SRE (Site Reliability Engineering), time is not just a number; it is the core resource that determines whether we meet or breach our SLO (Service Level Objective). To manage incidents professionally, we must deconstruct the timeline into specific metrics that reveal exactly where we can optimize our systems and processes.

1. The Incident Lifecycle: From T0 to T4

An incident is not an isolated event but a sequence of stages. Whether it is a failing Kubernetes pod or a misconfigured security rule, every event follows this chronology:

T0: Incident Start (The actual moment the failure occurs).
T1: Detection. The monitoring system identifies the failure and triggers an alert.
T2: Acknowledgment. An engineer acknowledges the alert and begins the investigation.
T3: Mitigation. A fix is applied (Hotfix, Rollback, Restart).
T4: Full Recovery. The service is 100% operational for the user again.

2. Key Metrics (MTTx)

Understanding these intervals allows us to move from "guessing" to "managing with data."

MTTD: Mean Time to Detect (T0 - T1)

What it measures: The effectiveness of our observability stack.
The Goal: We aim for seconds. If a user notifies you before your tools do, your monitoring needs adjustment.

MTTA: Mean Time to Acknowledge (T1 - T2)

What it measures: The responsiveness of the On-call team.
The Goal: To reduce the time an alert remains unaddressed, which helps mitigate "alert fatigue."

MTTR: The Recovery Standard

In the industry, we differentiate between two approaches for MTTR:

MTTR (Recovery): From T0 to T4. This is the total downtime experienced by the customer.
MTTR (Repair): From T2 to T3. It measures technical agility in applying a solution once the problem is identified.

MTBF: Mean Time Between Failures (T4 of one incident to T0 of the next)

What it measures: The structural stability of the architecture.
The Insight: If you repair quickly (low MTTR) but the system fails constantly (low MTBF), you have a root-cause technical debt issue that must be addressed.
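As a rough sketch, the intervals above can be computed from recorded T0 to T4 timestamps. The `Incident` class and the metric keys are hypothetical names, and MTBF is taken here as the uptime between one incident's T4 and the next incident's T0, which is an assumption about how you define it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Incident:
    t0: datetime  # failure starts
    t1: datetime  # alert fires
    t2: datetime  # engineer acknowledges
    t3: datetime  # mitigation applied
    t4: datetime  # service fully restored


def _mean(deltas: List[timedelta]) -> timedelta:
    return sum(deltas, timedelta()) / len(deltas)


def incident_metrics(incidents: List[Incident]) -> Dict[str, timedelta]:
    """Average the lifecycle intervals over a list of incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    ordered = sorted(incidents, key=lambda i: i.t0)
    metrics = {
        "MTTD": _mean([i.t1 - i.t0 for i in ordered]),
        "MTTA": _mean([i.t2 - i.t1 for i in ordered]),
        "MTTR_repair": _mean([i.t3 - i.t2 for i in ordered]),
        "MTTR_recovery": _mean([i.t4 - i.t0 for i in ordered]),
    }
    # MTBF needs at least two incidents: uptime between consecutive ones.
    gaps = [nxt.t0 - prev.t4 for prev, nxt in zip(ordered, ordered[1:])]
    if gaps:
        metrics["MTBF"] = _mean(gaps)
    return metrics
```

Feeding this from your incident tracker's export makes it possible to track the trend of each metric per quarter instead of debating single incidents.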

3. The Impact on the Error Budget

Every minute of downtime is a direct withdrawal from your Error Budget.

Quick Calculation: If your SLO is 99.9% (approx. 43 minutes of allowed downtime per 30-day month) and a single incident causes 30 minutes of downtime (T0 to T4), you have consumed roughly 70% of your monthly budget in a single event. Precision in these metrics is fundamental for decision-making.
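The arithmetic behind this quick calculation is short enough to script. The sketch below assumes a 30-day window and a purely time-based (availability) SLO; the function names are illustrative.

```python
def allowed_downtime_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of downtime the SLO tolerates over the window."""
    return (1 - slo_percent / 100) * window_days * 24 * 60


def budget_consumed(downtime_minutes: float, slo_percent: float,
                    window_days: int = 30) -> float:
    """Fraction of the error budget used by the given downtime (1.0 = all of it)."""
    return downtime_minutes / allowed_downtime_minutes(slo_percent, window_days)
```

For a 99.9% SLO the window allows about 43.2 minutes, so a 30-minute outage consumes a little over two thirds of the budget. The same functions show how quickly the margin shrinks at 99.99%, where the allowance drops to about 4.3 minutes.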

4. Optimization with Automation and AI

To drive these numbers down using a Cloud-Native approach, we apply technology at every stage:

Optimizing MTTD: We implement anomaly detection. AI can identify traffic variations that static rules might ignore, triggering T1 almost instantly.
Optimizing MTTR: We prioritize Self-Healing. Through Kubernetes Operators or automation scripts, the system can execute a T3 (like an automatic restart) before a human even intervenes.
Accelerating RCA: AI tools correlate events and logs to provide the "why" quickly, allowing engineers to move from T2 to T3 much faster.

Conclusion: From Support to Architecture

Mastering these metrics allows you to manage infrastructure with technical precision.

Reducing MTTD provides clear visibility.
Reducing MTTR protects your Error Budget.
Increasing MTBF builds confidence in the platform.

By integrating automation and AI into this flow, you shift from executing manual tasks to becoming the architect who designs resilient systems. Remember: in SRE, what cannot be measured, cannot be improved.

SRE Guide: The Art of Measuring Trust (SLI, SLO, SLA)

Jose Alvarez R. — Thu, 22 Jan 2026 14:00:29 GMT

In the world of infrastructure, we often obsess over whether a server is "alive" (ping). But a business doesn't care about a ping; it cares about the user experience.

To understand this, let's step away from the data center for a moment and imagine we are the owners of a busy Burger Restaurant.

The Restaurant Metaphor

1. SLI (Service Level Indicator) - "The Thermometer"

The SLI is the raw, real-time measurement of what is happening right now. It is a snapshot of reality.

In the Restaurant: It’s the exact time it takes for a waiter to bring a burger to the table after the customer orders.
In SRE: It’s the latency (e.g., 300ms) or the success rate of requests (e.g., 99.9% of responses are 200 OK).

Rule of thumb: The SLI answers the question: "How is the service performing at this very second?"

2. SLO (Service Level Objective) - "The Internal Promise"

The SLO is the target you set for your team to keep the customers happy. It’s your "Line in the Sand."

In the Restaurant: You decide that 95% of burgers must be served in under 15 minutes.
- Why not 100%? Because you know that sometimes the kitchen gets slammed or a waiter trips. Aiming for 100% would require hiring 50 waiters for one table, and you would go bankrupt.
In SRE: 99.9% of requests to the Azure API must respond in less than 200ms over a rolling 30-day window.

Key Concept: The SLO is the balance between user happiness and operational cost.

3. SLA (Service Level Agreement) - "The Legal Contract"

The SLA is what you promise the customer in writing, including the consequences if you fail.

In the Restaurant: You hang a sign on the door: "If your food takes longer than 30 minutes, it’s free!"
- Note that the SLA (30 min) is much more relaxed than your internal goal/SLO (15 min). This gives you a "safety buffer."
In SRE: This is the legal contract. If the platform falls below 99% uptime, the provider must pay back credits or refunds.

Quick Comparison

Acronym	Name	Who watches it?	What happens if it fails?
SLI	Indicator	The Engineer	We tune the code or the resources.
SLO	Objective	The SRE Team	We stop new changes (Error Budget).
SLA	Agreement	The Lawyer / Client	There are financial consequences.

The Error Budget: Your "Room for Innovation"

If your SLO is to deliver 95% of burgers on time, you have a 5% margin of error. That is your Error Budget.

Budget is full? You can spend that 5% experimenting with a risky new recipe (Innovation).
Budget is empty? You made too many mistakes this month. Stop experimenting and focus 100% on making the kitchen stable (Reliability).
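The "full or empty" decision can be expressed as a small check over counted events (burgers served on time, or successful requests). This is a sketch under simple assumptions: the SLO is given as a fraction such as 0.95, the names `BudgetStatus` and `error_budget_status` are hypothetical, and the freeze rule (stop risky changes once the budget is gone) is one possible policy, not the only one.

```python
from dataclasses import dataclass


@dataclass
class BudgetStatus:
    sli: float               # fraction of good events actually measured
    budget_remaining: float  # share of the error budget left; negative = overspent
    freeze_changes: bool     # True when reliability work should take priority


def error_budget_status(good: int, total: int, slo: float) -> BudgetStatus:
    """Compare measured good events against the SLO target."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1")
    sli = good / total
    allowed_bad = (1 - slo) * total   # failures the SLO tolerates
    actual_bad = total - good
    remaining = 1 - actual_bad / allowed_bad
    return BudgetStatus(sli, remaining, freeze_changes=remaining <= 0)
```

With 1,000 burgers and a 95% SLO, 30 late orders leave about 40% of the budget for experiments, while 60 late orders overspend it and trigger the freeze.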

How Does AI Help Us?

The AI acts like a Highly Intelligent Kitchen Supervisor:

Detection: The AI notices the oil is taking longer than usual to heat up before the meat starts coming out undercooked (AIOps).
Prediction: It warns you: "At the current rate you're burning burgers, you'll have to start giving them away for free in 3 days (SLA breach prediction)."
Action: If it sees a crowd coming, it automatically fires up a second grill (e.g. Auto-scaling in Azure).

Final Conclusion

At the end of the day, whether you are managing a Raspberry Pi at home or a multi-region infrastructure with Azure or AWS, the lesson is the same: You cannot manage what you do not measure.

The SLI, SLO, and SLA framework isn't just a set of acronyms; it is a shared language between technology and business.

SLIs give us the truth.
SLOs give us a goal.
SLAs define our commitment.

By mastering this framework—and accelerating it with AI—you stop being the person who "fixes servers" and become the architect who ensures the business can keep its promises to its users. Reliability is not a lucky accident; it is a calculated decision.

How to Securely Expose Your Local Lab Using Cloudflare Tunnel and Docker

Jose Alvarez R. — Tue, 06 Jan 2026 03:49:49 GMT

In the world of Cloud Engineering, security is not an afterthought; it’s a foundation. When hosting a portfolio or a home lab, the old-school method of Port Forwarding is a major security risk. It exposes your home IP and leaves your network vulnerable.

Today, we are taking a Zero Trust approach. We will use Cloudflare Tunnel (cloudflared) and Docker Compose to create a secure bridge that allows the world to see your work without ever opening a port on your router.

The Architecture

Before we dive into the terminal, let's look at the flow of traffic:

User requests domain example.com.
Cloudflare Edge receives the request.
Cloudflared Connector (running in Docker or directly on your server) pulls the request through an outbound-only encrypted tunnel.
Webserver serves the static files locally.

URL: https://developers.cloudflare.com/cloudflare-one/networks/connectors/cloudflare-tunnel/

Prerequisites for our project:

Domain: Managed by Cloudflare (e.g., up2runc.com).
Environment: Docker & Docker Compose installed.
Access: Cloudflare Zero Trust dashboard (Free Tier).

Step 1: Initialize the Cloudflare Tunnel

We need to create the tunnel identity in the Cloudflare Cloud.

Navigate to the Zero Trust Dashboard > Networks > Connectors
Click Create a tunnel and select Cloudflared as the connector type
Name it (e.g., website-prod)
Save Tunnel
Choose Docker as the environment; the dashboard then shows a command that contains your tunnel token

Important: Never share your token!!

Step 2: Provisioning with Docker Compose

We will define our infrastructure as code. This ensures our setup is reproducible.

File: docker-compose.yml

YAML FILE

version: '3.8'
services:
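  # Service 1: Nginx web server (Alpine-based image)
  web:
    image: nginx:alpine
    container_name: web-server
    restart: unless-stopped
    volumes:
      # Serves the current directory read-only; keep docker-compose.yml
      # and any .env file out of it so they are never published.
      - .:/usr/share/nginx/html:ro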

  # Service 2: The Tunnel Connector
  tunnel:
    image: cloudflare/cloudflared:latest
    container_name: cloudflare-connector
    restart: unless-stopped
    command: tunnel --no-autoupdate run --token ${TUNNEL_TOKEN}
    depends_on:
      - web

Step 3: Configure Public Hostnames

Go back to your Cloudflare Tunnel settings and click the Public Hostname tab.

Public Hostname: up2runc.com
Service: http://web:80

Note: We use web:80 because Docker Compose creates an internal network where the services can talk to each other by name.

Step 4: Deployment & Verification

Execute the deployment:

Bash

export TUNNEL_TOKEN=your_token_here
docker-compose up -d

Verification Checklist:

Status: Check the tunnel status in the dashboard; it should show HEALTHY.
Connectivity: Run curl -I https://up2runc.com and look for the Server: cloudflare header.
Security: Confirm your router has NO ports forwarded to your machine.
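If you want to repeat the connectivity check from a script, the header test can be sketched as follows. The function names are hypothetical, the check only looks at the Server response header exactly as the curl step does, and `urlopen` raises an exception for 4xx/5xx responses, which a real script would need to handle.

```python
import urllib.request
from typing import Mapping


def served_by_cloudflare(headers: Mapping[str, str]) -> bool:
    """True when the Server header names Cloudflare (case-insensitive)."""
    for name, value in headers.items():
        if name.lower() == "server":
            return "cloudflare" in value.lower()
    return False


def check_site(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and inspect the response headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return served_by_cloudflare(dict(response.headers.items()))
```

Calling `check_site("https://up2runc.com")` from a scheduled job gives you a cheap external probe that the tunnel is still the path into the site.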

Key Takeaways

Outbound Only: The tunnel only makes outbound connections. This means your firewall stays closed.
Identity-Aware: You can now add Cloudflare Access to require a login before anyone even sees your site.
Static Content: Nginx on Alpine is an excellent choice for lightweight, high-performance static hosting.

Conclusion

And just like that—simple and efficient—we’ve implemented a robust, Cloudflare-protected solution. We’ve moved from a 'Home Hobbyist' setup to a Professional Zero Trust Architecture, proving that high-level security doesn't have to be over-complicated.

By eliminating the public attack surface and leveraging Docker's immutability, up2runc.com is now production-ready.

Are you ready for the next project?

Beyond Port Forwarding: The SRE Way

Jose Alvarez R. — Tue, 30 Dec 2025 23:21:52 GMT

When I decided to host my personal brand site, https://www.UpToDeploy.com, I faced a classic dilemma: pay for a VPS or use the hardware I already own. I chose my Raspberry Pi, but as someone focused on Security and Reliability, simply opening ports on my home router (Port Forwarding) was not an option.

In this article, I’ll show you how I leverage the Zero Trust architecture to expose my local environment to the world securely.

The Problem: The Risks of Traditional Hosting

Traditional home hosting requires exposing your public IP and opening ports (like 80 or 443). This makes your home network a target for DDoS attacks and port scanning. I needed a solution that followed the principle of "least privilege."

The Solution: Cloudflare Tunnels

Cloudflare Tunnels (part of the Zero Trust suite) allow you to create a secure, outbound-only connection from your infrastructure to Cloudflare’s edge.

Why this is a game-changer:

No Inbound Ports: My router remains closed to the internet.
Identity-Based Access: I can layer authentication if needed.
Hidden IP: My home IP is never exposed to the public; only Cloudflare’s IP addresses are visible.

The Stack

To keep the deployment clean and reproducible, I used a containerized approach:

Hardware: Raspberry Pi.
Web Server: Nginx (Alpine-based for a tiny footprint).
Orchestration: Docker Compose.
Connectivity: cloudflared (The Cloudflare Tunnel connector).

The Deployment (Docker Compose)

Instead of installing the connector directly on the OS, I deployed it as a sidecar container. This ensures that if I move my site to another machine, the entire infrastructure moves with it.

This is my simple docker-compose.yaml:

services:
  web:
    image: nginx:alpine
    container_name: website-linkbio
    restart: unless-stopped
    volumes:
      - ./index.html:/usr/share/nginx/html/index.html:ro

  tunnel:
    image: cloudflare/cloudflared:latest
    restart: always
    environment:
      - TUNNEL_TOKEN=${CLOUDFLARE_TOKEN}
    command: tunnel --no-autoupdate run

Key Takeaways

Security first: By using a tunnel, I've eliminated the primary attack surface of home hosting.
Resilience: Docker's restart policies ensure that if the Raspberry Pi reboots, the site and the tunnel come back online automatically.
Professionalism: The site runs on my custom domain uptodeploy.com with full SSL/TLS encryption, despite being hosted on a residential network.

Conclusion

Setting up UpToDeploy wasn't just about a website; it was about practicing the SRE and Architecture principles I believe in. It’s a proof of concept that professional, secure, and highly available services can be built from anywhere.

UpToDeploy: Elevating Reliability in the Age of AI and Cloud

Jose Alvarez R. — Tue, 30 Dec 2025 01:59:31 GMT

The Journey Toward Systems Architecture

Hello! My name is Jose Alvarez Rodriguez. If you are coming from my LinkedIn or know me from the industry, you’ll know that my passion is making things work—but above all, making them resilient.

Today, I am launching UpToDeploy, a space where Site Reliability Engineering (SRE) meets modern architecture. My goal is to transition from pure operations toward strategic consultancy, and I want to invite you to join me in this process.

My Three Pillars: The UpToDeploy DNA

To me, a "production-ready" system is not just code that runs; it is a structure sustained by three pillars that I consider non-negotiable:

Cloud (Azure & AWS): The foundation of scalability. My focus is not just "moving things to the cloud," but designing cost-efficient and high-availability architectures.
Security: As an SRE, I understand that security is not a final step, but an integral part of the software development lifecycle (DevSecOps).
Containerization (Kubernetes & Docker): The unit of measurement for modern computing. Containers are the key to portability and the agility that consultancy firms demand today.

The New Horizon: AI and Observability

We cannot talk about architecture in 2025 without mentioning Artificial Intelligence. Part of my mission with this blog is to investigate how AI is transforming the SRE role: from predictive failure analysis to intelligent infrastructure automation.

At UpToDeploy, we will explore how to integrate AI models to make our architectures not only robust but also "self-healing."

What to Expect from This Space?

Whether you are a recruiter, a fellow engineer, or someone seeking technology consultancy, here you will find:

Architecture analysis on Azure and AWS.
Security guides for containerized environments.
Reflections on the impact of AI in IT operations.