Infrastructure

Docker Browser Automation: Containerized Scaling Guide

How to run browser automation in Docker containers with proper resource allocation, shared memory, and process management.

Introduction

Docker containers are the standard deployment unit for browser automation at scale. They provide isolation, reproducible environments, and horizontal scaling. However, browsers are resource-intensive applications with specific needs around shared memory, display servers, and process management, all of which demand careful container configuration.

Docker Configuration Essentials

Shared Memory

Chrome uses shared memory (/dev/shm) for inter-process communication. The default Docker shared memory size (64 MB) is too small and causes Chrome to crash:

services:
  worker:
    image: your-automation-image
    shm_size: '2gb'  # Required for Chrome

Or with docker run:

docker run --shm-size=2g your-automation-image
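You can confirm the setting took effect by checking the mount size inside a running container:

```shell
# Start a throwaway container with 2 GB of shared memory and
# inspect the /dev/shm mount; the Size column should report 2.0G
# instead of the 64M default.
docker run --rm --shm-size=2g your-automation-image df -h /dev/shm
```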

Dockerfile

A minimal Dockerfile for browser automation with BotCloud:

FROM node:20-slim

# Install only essential system dependencies
RUN apt-get update && apt-get install -y \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY package*.json ./
RUN npm ci --omit=dev

COPY . .

CMD ["node", "worker.js"]

Since BotCloud runs browsers in the cloud, you do not need Chrome or its system dependencies in your container. The container only needs Node.js and your automation code.
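To keep the image lean, a .dockerignore file stops local artifacts from being swept in by the `COPY . .` step. A suggested sketch (adjust the entries to your repo layout):

```
node_modules
.git
data/
*.log
.env
```

Excluding node_modules also guarantees dependencies come only from `npm ci` inside the image, not from the host.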

Resource Limits

Set memory and CPU limits to prevent runaway containers:

services:
  worker:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
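For single-container testing, the same limits can be applied directly with docker run:

```shell
docker run --memory=2g --cpus=2 --shm-size=2g your-automation-image
```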

Docker Compose for Multi-Worker

version: '3.8'

services:
  worker:
    build: .
    shm_size: '2gb'
    environment:
      - BOTCLOUD_API_KEY=${BOTCLOUD_API_KEY}
      - CONCURRENCY=5
    deploy:
      replicas: 4
      resources:
        limits:
          cpus: '2'
          memory: 2G
    restart: unless-stopped
    volumes:
      - ./data:/app/data
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

Worker Pattern

A production worker that processes tasks from a queue:

const puppeteer = require('puppeteer-core');

const CONCURRENCY = parseInt(process.env.CONCURRENCY || '5', 10);
const API_KEY = process.env.BOTCLOUD_API_KEY;

async function processTask(task) {
  const browser = await puppeteer.connect({
    browserWSEndpoint:
      `wss://bots.win/ws?apiKey=${API_KEY}&proxy=${encodeURIComponent(task.proxy)}`,
  });

  try {
    const page = await browser.newPage();
    await page.goto(task.url, { timeout: 30000 });
    const result = await page.evaluate(task.extractScript);
    return { taskId: task.id, status: 'success', data: result };
  } catch (error) {
    return { taskId: task.id, status: 'error', error: error.message };
  } finally {
    await browser.close();
  }
}

let shuttingDown = false;

async function worker() {
  while (!shuttingDown) {
    const tasks = await fetchTasks(CONCURRENCY);
    if (tasks.length === 0) {
      await new Promise(r => setTimeout(r, 5000));
      continue;
    }

    const results = await Promise.allSettled(
      tasks.map(task => processTask(task))
    );

    await reportResults(results);
  }
}

// Graceful shutdown: stop accepting new tasks, finish the
// current batch, then let the loop (and the process) exit.
process.on('SIGTERM', () => {
  console.log('Received SIGTERM, finishing current tasks...');
  shuttingDown = true;
});

worker().catch(console.error);
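The worker assumes fetchTasks and reportResults helpers; their implementation depends on your queue (Redis, SQS, a REST API). A minimal in-memory sketch, useful for local testing only (the queue and field names here are illustrative, not part of BotCloud):

```javascript
// In-memory stand-in for a real task queue.
const queue = [];
const completed = [];

function enqueue(task) {
  queue.push(task);
}

// Pull up to `limit` tasks off the queue in one batch.
async function fetchTasks(limit) {
  return queue.splice(0, limit);
}

// Persist results; here we just unwrap Promise.allSettled
// outcomes ({ status, value | reason }) and tally them.
async function reportResults(results) {
  for (const r of results) {
    completed.push(
      r.status === 'fulfilled'
        ? r.value
        : { status: 'error', error: String(r.reason) }
    );
  }
}
```

Swapping this for a Redis- or HTTP-backed version changes only these two functions; the worker loop stays the same.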

Health Checks

Add health checks to detect stuck containers:

services:
  worker:
    healthcheck:
      test: ["CMD", "node", "healthcheck.js"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s

The health check script pings a local status endpoint:

// healthcheck.js
const http = require('http');
http.get('http://localhost:3001/health', (res) => {
  process.exit(res.statusCode === 200 ? 0 : 1);
}).on('error', () => process.exit(1));

Scaling Strategies

Horizontal Scaling

Increase the number of container replicas:

docker compose up --scale worker=10

Kubernetes

For larger deployments, use Kubernetes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: automation-worker
spec:
  replicas: 10
  selector:
    matchLabels:
      app: automation-worker
  template:
    metadata:
      labels:
        app: automation-worker
    spec:
      containers:
        - name: worker
          image: your-automation-image
          resources:
            limits:
              memory: "2Gi"
              cpu: "2000m"
          env:
            - name: BOTCLOUD_API_KEY
              valueFrom:
                secretKeyRef:
                  name: botcloud-secrets
                  key: api-key
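If load varies, replicas can be driven automatically. A sketch of a CPU-based HorizontalPodAutoscaler targeting the Deployment above (min/max and the 70% threshold are illustrative starting points):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: automation-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: automation-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```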

Best Practices

  1. Set --shm-size=2g whenever Chrome runs inside the container; with cloud browsers the 64 MB default is usually fine, since shared memory is consumed on the remote browser host
  2. Use --init flag or tini to handle zombie processes
  3. Set resource limits to prevent one container from starving others
  4. Implement health checks to detect and restart stuck workers
  5. Log to stdout/stderr for Docker's log aggregation
  6. Handle SIGTERM for graceful shutdown during rolling deployments
  7. Use secrets management for API keys, not environment variables in compose files
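Item 2 in practice: Docker's --init flag runs tini as PID 1 so defunct child processes get reaped, and Compose supports the same via `init: true`:

```shell
docker run --init --shm-size=2g your-automation-image
```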
#docker #containers #scaling #deployment