Browser Automation Performance: Optimization Guide for Scale

Optimize cloud browser automation for throughput and resource efficiency. Covers concurrency, resource blocking, and connection pooling.

Introduction

Performance optimization for browser automation is about maximizing throughput while maintaining reliability. Each unnecessary resource load, unoptimized wait, or mismanaged connection reduces the number of tasks you can process per hour.

Connection Management

Connection Pooling

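Reuse browser connections when possible. The proxy is fixed when the WebSocket connection is opened, so an idle connection should only be handed to a task that needs the same proxy. Note that poolSize caps the number of idle connections kept for reuse, not the total number open at once; pair the pool with a concurrency limit for that.

// Use 'puppeteer-core' instead if you only ever connect to remote browsers.
import puppeteer from 'puppeteer';

class BrowserPool {
  constructor(apiKey, poolSize = 5) {
    this.apiKey = apiKey;
    this.poolSize = poolSize; // max idle connections kept for reuse
    this.available = [];      // idle entries: { browser, proxy }
    this.inUse = new Map();   // browser -> proxy it was opened with
  }

  async acquire(proxy) {
    // Reuse an idle connection only if it was opened with the same proxy.
    const i = this.available.findIndex((entry) => entry.proxy === proxy);
    if (i !== -1) {
      const [{ browser }] = this.available.splice(i, 1);
      this.inUse.set(browser, proxy);
      return browser;
    }

    const browser = await puppeteer.connect({
      browserWSEndpoint:
        `wss://bots.win/ws?apiKey=${encodeURIComponent(this.apiKey)}&proxy=${encodeURIComponent(proxy)}`,
    });

    this.inUse.set(browser, proxy);
    return browser;
  }

  async release(browser) {
    const proxy = this.inUse.get(browser);
    this.inUse.delete(browser);
    if (this.available.length < this.poolSize) {
      this.available.push({ browser, proxy });
    } else {
      // Pool is full: close the browser and wait for the close to finish.
      await browser.close();
    }
  }
}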
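A connection that is acquired but never released stays marked as in use, so it helps to pair every acquire with a release in a try/finally. The following is a minimal sketch, not part of the original pool: withBrowser is a hypothetical helper name, and it assumes only that the pool exposes acquire(proxy) and release(browser) as shown above.

```javascript
// Hypothetical helper: borrow a browser from any pool that exposes
// acquire(proxy) and release(browser), and always hand it back.
async function withBrowser(pool, proxy, task) {
  const browser = await pool.acquire(proxy);
  try {
    return await task(browser);
  } finally {
    // Runs on success and on error, so the pool never leaks a connection.
    // The await is harmless if release() is synchronous.
    await pool.release(browser);
  }
}
```

A task would then be written as withBrowser(pool, proxy, async (browser) => { ... }), with the page opened and closed inside the callback.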

Concurrency Control

Limit concurrent operations to prevent resource exhaustion:

class ConcurrencyLimiter {
  constructor(limit) {
    this.limit = limit;
    this.running = 0;
    this.queue = [];
  }

  async run(fn) {
    while (this.running >= this.limit) {
      await new Promise(resolve => this.queue.push(resolve));
    }

    this.running++;
    try {
      return await fn();
    } finally {
      this.running--;
      if (this.queue.length > 0) {
        this.queue.shift()();
      }
    }
  }
}

const limiter = new ConcurrencyLimiter(20);
const tasks = urls.map(url =>
  limiter.run(() => processUrl(url))
);
await Promise.all(tasks);
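One caveat with Promise.all is that a single rejected task rejects the whole batch, even though the remaining tasks keep running. If partial results are acceptable, Promise.allSettled can be used instead. The sketch below is one way to do that; settleTasks is a hypothetical helper, and it assumes the tasks array is in the same order as urls.

```javascript
// Hypothetical helper: wait for every task to finish, then separate
// successes from failures while keeping track of which URL each belongs to.
async function settleTasks(urls, tasks) {
  const settled = await Promise.allSettled(tasks);
  const succeeded = [];
  const failed = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      succeeded.push({ url: urls[i], value: outcome.value });
    } else {
      failed.push({ url: urls[i], error: outcome.reason });
    }
  });
  return { succeeded, failed };
}
```

The failed list can then be logged or fed back into the limiter for a retry pass.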

Resource Optimization

Block Unnecessary Resources

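Most automation does not need images, media, fonts, stylesheets, or analytics scripts. Keep stylesheets if the task takes screenshots or depends on element visibility or layout: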

await page.setRequestInterception(true);
page.on('request', (req) => {
  const type = req.resourceType();
  const url = req.url();

  // Block heavy resources
  if (['image', 'media', 'font', 'stylesheet'].includes(type)) {
    req.abort();
    return;
  }

  // Block analytics and tracking
  if (url.includes('google-analytics') ||
      url.includes('facebook.net') ||
      url.includes('doubleclick.net')) {
    req.abort();
    return;
  }

  req.continue();
});

Savings: Blocking images and media typically reduces page load time by 40-60% and bandwidth by 70-80%, though the actual gain depends on how media-heavy the target pages are.
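The blocking decision is easier to test and tune when it lives in a pure function rather than inside the event handler. A minimal sketch, using the same resource types and URL fragments as the handler above; shouldBlock and the two constant names are hypothetical.

```javascript
// Resource types and URL fragments taken from the request handler above.
const BLOCKED_TYPES = new Set(['image', 'media', 'font', 'stylesheet']);
const BLOCKED_URL_PARTS = ['google-analytics', 'facebook.net', 'doubleclick.net'];

// Hypothetical helper: return true if the request should be aborted.
function shouldBlock(resourceType, url) {
  if (BLOCKED_TYPES.has(resourceType)) return true;
  return BLOCKED_URL_PARTS.some((part) => url.includes(part));
}
```

The handler then reduces to calling req.abort() when shouldBlock(req.resourceType(), req.url()) is true and req.continue() otherwise.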

Disable JavaScript When Possible

For content extraction that does not require JavaScript rendering:

await page.setJavaScriptEnabled(false);
await page.goto(url);
const content = await page.content();

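Savings: 50-80% faster page loads for static content. Pages that render their content client-side will come back mostly empty, so check the extracted HTML on a sample before disabling JavaScript for a whole job.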

Choose the Right Wait Strategy

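// Fastest: resolve once the initial HTML is parsed (DOMContentLoaded)
await page.goto(url, { waitUntil: 'domcontentloaded' });

// Standard: resolve once there are no more than 2 network connections for at least 500 ms
await page.goto(url, { waitUntil: 'networkidle2' });

// Slowest: resolve once there are no network connections for at least 500 ms
await page.goto(url, { waitUntil: 'networkidle0' });

Use domcontentloaded when you only need the HTML structure. Use networkidle2 for most cases; it tolerates a couple of long-lived connections such as polling requests. Reserve networkidle0 for pages that must be fully loaded, and expect it to run into the navigation timeout on pages whose network activity never stops.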
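When the content you need is rendered shortly after the initial HTML, a common middle ground is to navigate with domcontentloaded and then wait only for the specific element the task depends on. This is a sketch under that assumption; gotoAndWaitFor is a hypothetical helper, and the 15-second default timeout is an arbitrary choice.

```javascript
// Hypothetical helper: return as soon as the HTML is parsed, then wait only
// for the element the task actually needs. `page` is a Puppeteer Page.
async function gotoAndWaitFor(page, url, selector, timeoutMs = 15000) {
  await page.goto(url, { waitUntil: 'domcontentloaded', timeout: timeoutMs });
  // waitForSelector resolves once the selector matches an element in the DOM.
  return page.waitForSelector(selector, { timeout: timeoutMs });
}
```

This avoids waiting for unrelated network traffic while still guaranteeing that the target element exists before extraction starts.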

Parallelize Independent Operations

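// page.url() is synchronous: it needs no await and gains nothing from Promise.all
const url = page.url();

// Bad: sequential operations, each waiting for the previous round trip
const title = await page.title();
const html = await page.content();
const cookies = await page.cookies();

// Good: the same calls issued in parallel (use this instead of the block above)
const [title, html, cookies] = await Promise.all([
  page.title(),
  page.content(),
  page.cookies(),
]);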

Throughput Benchmarks

Expected throughput varies by task complexity:

Task Type                        Typical Duration    Pages/Hour (20 concurrent)
Simple page load                 2-5s                14,400-36,000
Form submission                  5-10s               7,200-14,400
Full page render + screenshot    10-20s              3,600-7,200
Complex SPA interaction          15-30s              2,400-4,800
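The pages-per-hour figures follow from simple arithmetic: concurrency × 3,600 seconds / task duration. They are upper bounds that assume every slot stays busy and ignore connection setup and retries. A small sketch for estimating your own numbers; pagesPerHour is a hypothetical helper name.

```javascript
// Upper-bound throughput: each of `concurrency` slots finishes one page
// every `secondsPerPage` seconds, for 3,600 seconds.
function pagesPerHour(concurrency, secondsPerPage) {
  return Math.floor((concurrency * 3600) / secondsPerPage);
}

console.log(pagesPerHour(20, 2));  // 36000
console.log(pagesPerHour(20, 5));  // 14400
console.log(pagesPerHour(20, 30)); // 2400
```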

Monitoring

Track these metrics to identify optimization opportunities:

function reportMetrics(taskId, startTime, result) {
  const duration = Date.now() - startTime;
  console.log(JSON.stringify({
    taskId,
    duration,
    status: result.status,
    bytesTransferred: result.bytesTransferred,
    requestCount: result.requestCount,
  }));
}
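Individual log lines become more useful once they are aggregated. The sketch below summarizes a batch of records shaped like the reportMetrics output into a success rate and duration percentiles. It is illustrative only: summarizeMetrics and percentile are hypothetical helpers, and the assumption that status equals 'success' for a completed task is not from the original code.

```javascript
// Nearest-rank percentile over an ascending array; returns null when empty.
function percentile(sortedValues, q) {
  if (sortedValues.length === 0) return null;
  const rank = Math.max(1, Math.ceil(q * sortedValues.length));
  return sortedValues[rank - 1];
}

// Hypothetical summary of records shaped like the reportMetrics output.
// Assumes status === 'success' marks a completed task.
function summarizeMetrics(records) {
  const durations = records.map((r) => r.duration).sort((a, b) => a - b);
  const succeeded = records.filter((r) => r.status === 'success').length;
  return {
    count: records.length,
    successRate: records.length === 0 ? null : succeeded / records.length,
    p50: percentile(durations, 0.5),
    p95: percentile(durations, 0.95),
  };
}
```

A p95 that is far above the p50 usually points at a small set of slow pages or hung waits rather than a uniformly slow pipeline.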

Best Practices

  1. Block unnecessary resources to reduce load time and bandwidth
  2. Use the fastest wait strategy that meets your requirements
  3. Limit concurrency to prevent connection exhaustion
  4. Reuse browser connections when processing multiple pages
  5. Monitor throughput metrics to identify bottlenecks
  6. Set timeouts on everything to prevent hung tasks from blocking the queue
  7. Process tasks in batches rather than one at a time
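The timeout practice in the list above can be sketched with Promise.race. withTimeout is a hypothetical helper, not a Puppeteer API. It only stops waiting; it does not cancel the underlying work, so the caller should still close the page or browser after a timeout.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms`.
function withTimeout(promise, ms, label = 'task') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Wrapping the whole task, for example withTimeout(processUrl(url), 60000, url), keeps one hung page from holding a concurrency slot indefinitely.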