From 16e11d9f998e5f0900bf7bbf486458f76ff0263e Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Tue, 18 Nov 2025 06:19:25 +0000
Subject: [PATCH] =?UTF-8?q?fix(ci):=20CR=C3=8DTICO=20-=20Corrige=20timeout?=
 =?UTF-8?q?s=20y=20agrega=20detecci=C3=B3n=20de=20crash=20loops=20en=20Doc?=
 =?UTF-8?q?ker?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PROBLEMA CRÍTICO IDENTIFICADO:
El frontend de la imagen Docker se reiniciaba constantemente en un crash loop,
imposibilitando ver logs y hacer debugging. Los builds tomaban demasiado tiempo.

═══════════════════════════════════════════════════════════════════════════════
🔴 CORRECCIONES CRÍTICAS
═══════════════════════════════════════════════════════════════════════════════

1️⃣ AJUSTE DE TIMEOUTS (builds tardaban más de lo esperado):
   • docker-intellidocs.yml build-and-push: 120min → 45min
   • ci.yml build-docker-image: 120min → 60min
   ⚠️  Timeouts anteriores eran excesivos para detección temprana de fallos

2️⃣ DETECCIÓN DE CRASH LOOPS (CRÍTICO):

   📍 Nuevo step: "Start container for health check"
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   • Inicia container en modo detached con configuración mínima
   • Variables de entorno: PAPERLESS_SECRET_KEY, PAPERLESS_REDIS
   • Nombre único para tracking: intellidocs-health-check-$$
   • Wait 15s para inicialización
   • Verifica que container sigue corriendo (detecta crashes inmediatos)
   • Si falla: muestra logs completos + docker inspect
   • Cleanup automático en caso de error

   📍 Nuevo step: "Check for crash loops (CRITICAL)"
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   • Lee RestartCount del container
   • Si RestartCount > 0: FALLO CRÍTICO con logs detallados
   • Muestra últimas 100 líneas de logs
   • Muestra estado del container (docker inspect + jq)
   • Verifica Health.Status si está disponible
   • Double-check que container sigue corriendo

   📍 Nuevo step: "Check frontend is responding (HTTP health check)"
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   • Wait adicional de 10s para web server
   • 30 intentos con 2s entre cada uno (60s total)
   • curl -f -s -m 5 http://localhost:8000/
   • En cada intento:
     ✓ Verifica que container sigue corriendo
     ✓ Verifica RestartCount = 0 (detecta nuevos crashes)
     ✓ Si hay restart: FALLO con logs
   • Si no responde después de 60s:
     ✓ Muestra últimas 100 líneas de logs
     ✓ Muestra procesos: ps aux
     ✓ Muestra network: netstat -tlnp
     ✓ Fallo con información completa para debugging

   📍 Nuevo step: "Stop and remove test container"
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   • Siempre ejecuta (if: always())
   • Muestra últimas 50 líneas de logs antes de cleanup
   • Limpia container aunque haya fallado
   • Evita containers huérfanos

3️⃣ REPORTE ACTUALIZADO:
   • "✅ CRITICAL: No crash loops detected"
   • "✅ CRITICAL: Frontend responds correctly"

═══════════════════════════════════════════════════════════════════════════════
🎯 BENEFICIOS
═══════════════════════════════════════════════════════════════════════════════

✅ DETECCIÓN TEMPRANA DE CRASH LOOPS:
   Antes: Container crasheaba en silencio, imposible debuggear
   Ahora: Detecta restarts en tiempo real con logs completos

✅ VERIFICACIÓN DE FRONTEND FUNCIONAL:
   Antes: No se verificaba si frontend respondía
   Ahora: HTTP health check robusto con 30 intentos

✅ DEBUGGING COMPLETO:
   Si falla, muestra:
   • Logs completos (últimas 100 líneas)
   • Estado del container (docker inspect)
   • Procesos corriendo (ps aux)
   • Conexiones de red (netstat)

✅ TIMEOUTS REALISTAS:
   Antes: 120min permitía fallos muy tardíos
   Ahora: 45-60min falla más rápido si hay problemas

═══════════════════════════════════════════════════════════════════════════════
📊 ESTADÍSTICAS
═══════════════════════════════════════════════════════════════════════════════

• Nuevos steps de health check: 4
• Verificaciones de crash loops: 3 puntos de chequeo
• Timeout total de HTTP health check: 60 segundos
• Logs capturados en caso de fallo: 3 fuentes (logs, ps, netstat)
• Cleanup garantizado: if: always()

═══════════════════════════════════════════════════════════════════════════════

Refs: CRITICAL-CRASH-LOOP-DETECTION
Fixes: Frontend crasheando constantemente sin logs visibles
---
 .github/workflows/ci.yml                 |   2 +-
 .github/workflows/docker-intellidocs.yml | 158 ++++++++++++++++++++++-
 2 files changed, 158 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 7ed3d63de..15d059140 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -607,7 +607,7 @@ except Exception as e:
   build-docker-image:
     name: Build Docker image for ${{ github.ref_name }}
     runs-on: ubuntu-24.04
-    timeout-minutes: 120
+    timeout-minutes: 60
     if: github.event_name == 'push' && (startsWith(github.ref, 'refs/heads/feature-') || startsWith(github.ref, 'refs/heads/fix-') || github.ref == 'refs/heads/dev' || github.ref == 'refs/heads/beta' || contains(github.ref, 'beta.rc') || startsWith(github.ref, 'refs/tags/v') || startsWith(github.ref, 'refs/heads/l10n_'))
     concurrency:
       group: ${{ github.workflow }}-build-docker-image-${{ github.ref_name }}
diff --git a/.github/workflows/docker-intellidocs.yml b/.github/workflows/docker-intellidocs.yml
index 33c83044c..11eefebf1 100644
--- a/.github/workflows/docker-intellidocs.yml
+++ b/.github/workflows/docker-intellidocs.yml
@@ -116,7 +116,7 @@ jobs:
   build-and-push:
     name: Build IntelliDocs Docker Image
     runs-on: ubuntu-24.04
-    timeout-minutes: 120
+    timeout-minutes: 45
     needs: test-ml-dependencies
     permissions:
       contents: read
@@ -272,6 +272,160 @@ jobs:
           fi
           echo "✓ OpenCV system dependencies verified"
 
+      - name: Start container for health check (CRITICAL - Detect crash loops)
+        id: start-container
+        run: |
+          IMAGE_TAG="${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.tag.outputs.tag }}"
+
+          echo "🔍 Starting container in detached mode to test for crash loops..."
+
+          # Start container with a name
+          CONTAINER_NAME="intellidocs-health-check-$$"
+
+          if ! docker run -d --name "$CONTAINER_NAME" \
+            -e PAPERLESS_SECRET_KEY=test-secret-key-for-ci \
+            -e PAPERLESS_REDIS=redis://localhost:6379 \
+            -p 8000:8000 \
+            "$IMAGE_TAG"; then
+            echo "✗ Failed to start container"
+            exit 1
+          fi
+
+          echo "container_name=$CONTAINER_NAME" >> $GITHUB_OUTPUT
+          echo "✓ Container started: $CONTAINER_NAME"
+
+          # Wait for container to initialize
+          echo "⏳ Waiting 15 seconds for container to initialize..."
+          sleep 15
+
+          # Check if container is still running (detect immediate crashes)
+          if ! docker ps | grep -q "$CONTAINER_NAME"; then
+            echo "✗ Container is not running - it may have crashed immediately"
+            echo "📋 Container logs:"
+            docker logs "$CONTAINER_NAME" 2>&1 || true
+            echo ""
+            echo "🔍 Container inspect:"
+            docker inspect "$CONTAINER_NAME" 2>&1 || true
+            docker rm -f "$CONTAINER_NAME" 2>/dev/null || true
+            exit 1
+          fi
+
+          echo "✓ Container is running after 15 seconds"
+
+      - name: Check for crash loops (CRITICAL)
+        run: |
+          CONTAINER_NAME="${{ steps.start-container.outputs.container_name }}"
+
+          echo "🔍 Checking restart count to detect crash loops..."
+
+          # Get restart count
+          RESTART_COUNT=$(docker inspect --format='{{.RestartCount}}' "$CONTAINER_NAME" 2>/dev/null || echo "0")
+
+          echo "📊 Restart count: $RESTART_COUNT"
+
+          if [ "$RESTART_COUNT" -gt 0 ]; then
+            echo "✗ CRITICAL: Container has restarted $RESTART_COUNT times - CRASH LOOP DETECTED!"
+            echo ""
+            echo "📋 Last 100 lines of container logs:"
+            docker logs --tail 100 "$CONTAINER_NAME" 2>&1 || true
+            echo ""
+            echo "🔍 Container state:"
+            docker inspect "$CONTAINER_NAME" | jq '.[0].State' 2>/dev/null || docker inspect "$CONTAINER_NAME" 2>&1 || true
+            docker rm -f "$CONTAINER_NAME" 2>/dev/null || true
+            exit 1
+          fi
+
+          # Check container health status
+          HEALTH_STATUS=$(docker inspect --format='{{.State.Health.Status}}' "$CONTAINER_NAME" 2>/dev/null || echo "none")
+          echo "🏥 Health status: $HEALTH_STATUS"
+
+          # Check if container is still running
+          if ! docker ps | grep -q "$CONTAINER_NAME"; then
+            echo "✗ Container stopped running during health check"
+            echo "📋 Container logs:"
+            docker logs "$CONTAINER_NAME" 2>&1 || true
+            docker rm -f "$CONTAINER_NAME" 2>/dev/null || true
+            exit 1
+          fi
+
+          echo "✓ No crash loops detected - container is stable"
+
+      - name: Check frontend is responding (HTTP health check)
+        run: |
+          CONTAINER_NAME="${{ steps.start-container.outputs.container_name }}"
+
+          echo "🌐 Testing if frontend is responding on http://localhost:8000..."
+
+          # Give the web server more time to start
+          echo "⏳ Waiting 10 more seconds for web server to be ready..."
+          sleep 10
+
+          # Try to connect to the frontend (up to 30 attempts with 2s between each)
+          MAX_ATTEMPTS=30
+          ATTEMPT=0
+          SUCCESS=false
+
+          while [ $ATTEMPT -lt $MAX_ATTEMPTS ]; do
+            ATTEMPT=$((ATTEMPT + 1))
+            echo "Attempt $ATTEMPT/$MAX_ATTEMPTS: Testing HTTP connection..."
+
+            # Try HTTP request with timeout
+            if curl -f -s -m 5 http://localhost:8000/ > /dev/null 2>&1; then
+              echo "✓ Frontend is responding!"
+              SUCCESS=true
+              break
+            fi
+
+            # Check if container is still running
+            if ! docker ps | grep -q "$CONTAINER_NAME"; then
+              echo "✗ Container stopped during health check!"
+              echo "📋 Container logs:"
+              docker logs "$CONTAINER_NAME" 2>&1 || true
+              docker rm -f "$CONTAINER_NAME" 2>/dev/null || true
+              exit 1
+            fi
+
+            # Check for new restarts
+            RESTART_COUNT=$(docker inspect --format='{{.RestartCount}}' "$CONTAINER_NAME" 2>/dev/null || echo "0")
+            if [ "$RESTART_COUNT" -gt 0 ]; then
+              echo "✗ Container restarted during health check - CRASH LOOP!"
+              echo "📋 Container logs:"
+              docker logs --tail 100 "$CONTAINER_NAME" 2>&1 || true
+              docker rm -f "$CONTAINER_NAME" 2>/dev/null || true
+              exit 1
+            fi
+
+            sleep 2
+          done
+
+          if [ "$SUCCESS" = false ]; then
+            echo "✗ Frontend did not respond after $MAX_ATTEMPTS attempts (60 seconds)"
+            echo ""
+            echo "📋 Last 100 lines of container logs:"
+            docker logs --tail 100 "$CONTAINER_NAME" 2>&1 || true
+            echo ""
+            echo "🔍 Container processes:"
+            docker exec "$CONTAINER_NAME" ps aux 2>&1 || true
+            echo ""
+            echo "🌐 Network connections:"
+            docker exec "$CONTAINER_NAME" netstat -tlnp 2>&1 || true
+            docker rm -f "$CONTAINER_NAME" 2>/dev/null || true
+            exit 1
+          fi
+
+          echo "✓ Frontend health check PASSED - application is responding correctly"
+
+      - name: Stop and remove test container
+        if: always()
+        run: |
+          CONTAINER_NAME="${{ steps.start-container.outputs.container_name }}"
+          if [ -n "$CONTAINER_NAME" ]; then
+            echo "🧹 Cleaning up test container: $CONTAINER_NAME"
+            docker logs --tail 50 "$CONTAINER_NAME" 2>&1 || true
+            docker rm -f "$CONTAINER_NAME" 2>/dev/null || true
+            echo "✓ Cleanup complete"
+          fi
+
       - name: Generate test report
         if: always()
         run: |
@@ -281,6 +435,8 @@ jobs:
           echo "✅ ML dependencies verified" >> $GITHUB_STEP_SUMMARY
           echo "✅ Django migrations validated" >> $GITHUB_STEP_SUMMARY
           echo "✅ System dependencies verified" >> $GITHUB_STEP_SUMMARY
+          echo "✅ **CRITICAL: No crash loops detected**" >> $GITHUB_STEP_SUMMARY
+          echo "✅ **CRITICAL: Frontend responds correctly**" >> $GITHUB_STEP_SUMMARY
           echo "" >> $GITHUB_STEP_SUMMARY
           echo "**Image:** \`${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.tag.outputs.tag }}\`" >> $GITHUB_STEP_SUMMARY