Browser Automation Performance: Optimization Guide for Scale
Optimize cloud browser automation for throughput and resource efficiency. Covers concurrency, resource blocking, and connection pooling.
Introduction
Performance optimization for browser automation is about maximizing throughput while maintaining reliability. Each unnecessary resource load, unoptimized wait, or mismanaged connection reduces the number of tasks you can process per hour.
Connection Management
Connection Pooling
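Reuse browser connections when possible. A connection is tied to the proxy it was opened with, so the pool only hands back an idle connection that matches the requested proxy:
import puppeteer from 'puppeteer'; // puppeteer-core also provides connect()

class BrowserPool {
  constructor(apiKey, poolSize = 5) {
    this.apiKey = apiKey;
    this.poolSize = poolSize; // max idle connections kept per proxy
    this.available = new Map(); // proxy -> idle browsers
    this.proxyOf = new WeakMap(); // browser -> proxy it was opened with
    this.inUse = new Set();
  }

  async acquire(proxy) {
    // Reuse an idle connection only if it uses the same proxy.
    const idle = this.available.get(proxy);
    if (idle && idle.length > 0) {
      const browser = idle.pop();
      this.inUse.add(browser);
      return browser;
    }
    const browser = await puppeteer.connect({
      browserWSEndpoint:
        `wss://bots.win/ws?apiKey=${encodeURIComponent(this.apiKey)}&proxy=${encodeURIComponent(proxy)}`,
    });
    this.proxyOf.set(browser, proxy);
    // Forget connections that drop, so they are never handed out again.
    browser.once('disconnected', () => this.forget(browser));
    this.inUse.add(browser);
    return browser;
  }

  async release(browser) {
    this.inUse.delete(browser);
    if (!this.proxyOf.has(browser)) return; // already disconnected
    const proxy = this.proxyOf.get(browser);
    const idle = this.available.get(proxy) ?? [];
    this.available.set(proxy, idle);
    if (idle.length < this.poolSize) {
      idle.push(browser);
    } else {
      await browser.close();
    }
  }

  forget(browser) {
    this.inUse.delete(browser);
    const idle = this.available.get(this.proxyOf.get(browser)) ?? [];
    const index = idle.indexOf(browser);
    if (index !== -1) idle.splice(index, 1);
    this.proxyOf.delete(browser);
  }
}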
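A pool only pays off if every acquired connection is returned, including when a task throws. The following is a minimal sketch of a wrapper that guarantees this, assuming a pool object that exposes the `acquire(proxy)` and `release(browser)` methods shown above. `withBrowser` is a hypothetical helper name, not part of Puppeteer.

```javascript
// Hypothetical helper: borrow a browser, run a task, and always hand the
// browser back to the pool, even if the task throws.
async function withBrowser(pool, proxy, task) {
  const browser = await pool.acquire(proxy);
  try {
    return await task(browser);
  } finally {
    // Works whether release() is synchronous or returns a promise.
    await pool.release(browser);
  }
}

// Example call (not executed here):
// const title = await withBrowser(pool, proxy, async (browser) => {
//   const page = await browser.newPage();
//   try {
//     await page.goto(url);
//     return await page.title();
//   } finally {
//     await page.close();
//   }
// });
```

Keeping the acquire/release pair in one place means individual tasks cannot leak connections by forgetting a cleanup path.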
Concurrency Control
Limit concurrent operations to prevent resource exhaustion:
class ConcurrencyLimiter {
  constructor(limit) {
    this.limit = limit;
    this.running = 0;
    this.queue = []; // resolvers for callers waiting on a free slot
  }

  async run(fn) {
    // Re-check after waking: another caller may have taken the slot.
    while (this.running >= this.limit) {
      await new Promise(resolve => this.queue.push(resolve));
    }
    this.running++;
    try {
      return await fn();
    } finally {
      this.running--;
      // Wake the longest-waiting caller, if any.
      if (this.queue.length > 0) {
        this.queue.shift()();
      }
    }
  }
}

// `urls` and `processUrl` are supplied by your application.
const limiter = new ConcurrencyLimiter(20);
const tasks = urls.map(url =>
  limiter.run(() => processUrl(url))
);
await Promise.all(tasks);
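With `Promise.all`, the first rejected task rejects the whole batch even though the remaining tasks keep running in the background. One possible alternative is to collect per-item outcomes with `Promise.allSettled`. The sketch below assumes a limiter that exposes the `run(fn)` method shown above and a worker that returns a promise; `runAll` is a hypothetical helper name.

```javascript
// Hypothetical helper: run `worker` for every item through a limiter and
// report successes and failures separately instead of failing fast.
async function runAll(limiter, items, worker) {
  const settled = await Promise.allSettled(
    items.map((item) => limiter.run(() => worker(item)))
  );
  const succeeded = [];
  const failed = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      succeeded.push({ item: items[i], value: outcome.value });
    } else {
      failed.push({ item: items[i], error: outcome.reason });
    }
  });
  return { succeeded, failed };
}

// Example call (not executed here):
// const { succeeded, failed } = await runAll(limiter, urls, processUrl);
```

The `failed` list can then be retried or logged without losing the results that did succeed.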
Resource Optimization
Block Unnecessary Resources
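Most automation does not need images, media, fonts, or analytics. Stylesheets can usually be blocked too, but remove 'stylesheet' from the list if your task depends on layout, such as visibility checks or screenshots:
await page.setRequestInterception(true);
page.on('request', (req) => {
  const type = req.resourceType();
  const url = req.url();
  // Block heavy resources
  if (['image', 'media', 'font', 'stylesheet'].includes(type)) {
    req.abort();
    return;
  }
  // Block analytics and tracking
  if (url.includes('google-analytics') ||
      url.includes('facebook.net') ||
      url.includes('doubleclick.net')) {
    req.abort();
    return;
  }
  req.continue();
});
Savings: Blocking images and media typically reduces page load time by 40-60% and bandwidth by 70-80% on media-heavy pages. Actual savings vary by site, so measure against your own targets. Note that Puppeteer disables page caching while request interception is enabled.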
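Substring checks on the full URL can match requests they were not meant to, for example a page whose query string happens to mention `facebook.net`. The sketch below expresses the same blocking decision as a pure function that compares hostnames instead. `shouldBlock`, `BLOCKED_TYPES`, and `BLOCKED_HOSTS` are illustrative names, and both lists are assumptions to tune per target site.

```javascript
// Illustrative lists; adjust for the sites you automate.
const BLOCKED_TYPES = new Set(['image', 'media', 'font', 'stylesheet']);
const BLOCKED_HOSTS = ['google-analytics.com', 'facebook.net', 'doubleclick.net'];

// Returns true when a request should be aborted.
function shouldBlock(resourceType, url) {
  if (BLOCKED_TYPES.has(resourceType)) return true;
  let host;
  try {
    host = new URL(url).hostname;
  } catch {
    return false; // not a parseable URL: let it through
  }
  // Match the domain itself or any subdomain of it.
  return BLOCKED_HOSTS.some((d) => host === d || host.endsWith('.' + d));
}

// Example wiring (not executed here):
// page.on('request', (req) =>
//   shouldBlock(req.resourceType(), req.url()) ? req.abort() : req.continue());
```

Keeping the rule in a plain function also makes it easy to unit test without launching a browser.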
Disable JavaScript When Possible
For content extraction that does not require JavaScript rendering:
await page.setJavaScriptEnabled(false);
await page.goto(url);
const content = await page.content();
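Savings: typically 50-80% faster page loads for static content. Pages that build their content client-side will return an incomplete document, so confirm that the data you need is present in the raw HTML first.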
Navigation Optimization
Choose the Right Wait Strategy
// Fastest: resolve once the initial HTML has been parsed
await page.goto(url, { waitUntil: 'domcontentloaded' });
// Standard: resolve once there are at most 2 network connections for 500 ms
await page.goto(url, { waitUntil: 'networkidle2' });
// Slowest: resolve once there are no network connections for 500 ms
await page.goto(url, { waitUntil: 'networkidle0' });
Use domcontentloaded when you only need the HTML structure. Use networkidle2 for most cases. Reserve networkidle0 for pages that must be fully loaded, and pair it with a timeout, because pages that poll or stream in the background may never go fully idle.
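When a task depends on a single element, a common alternative to the network-idle strategies is to navigate with `domcontentloaded` and then wait for that element explicitly. The following is a sketch under the assumption that `page` is a Puppeteer `Page`; `gotoAndWaitFor` is a hypothetical helper, and the 15-second default is an arbitrary placeholder.

```javascript
// Hypothetical helper: navigate quickly, then wait only for what the task needs.
async function gotoAndWaitFor(page, url, selector, timeoutMs = 15000) {
  // Return as soon as the HTML has been parsed...
  await page.goto(url, { waitUntil: 'domcontentloaded', timeout: timeoutMs });
  // ...then block only until the required element is in the DOM.
  return page.waitForSelector(selector, { timeout: timeoutMs });
}

// Example call (not executed here):
// const priceEl = await gotoAndWaitFor(page, url, '.price');
```

This tends to finish earlier than `networkidle2` on pages that keep loading secondary content after the main element has rendered.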
Parallelize Independent Operations
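// Bad: sequential operations, one round trip after another
const title = await page.title();
const cookies = await page.cookies();
// Good: independent operations issued together (replaces the two lines above)
const [title, cookies] = await Promise.all([
  page.title(),
  page.cookies(),
]);
// page.url() is synchronous, so it needs no await and gains nothing from Promise.all
const url = page.url();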
Throughput Benchmarks
Expected throughput varies by task complexity. The figures below are upper bounds that assume all 20 slots stay busy for the full hour:
| Task Type | Typical Duration | Pages/Hour (20 concurrent) |
|---|---|---|
| Simple page load | 2-5s | 14,400-36,000 |
| Form submission | 5-10s | 7,200-14,400 |
| Full page render + screenshot | 10-20s | 3,600-7,200 |
| Complex SPA interaction | 15-30s | 2,400-4,800 |
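The figures in the table can be reproduced with simple arithmetic: with N tasks in flight and each taking d seconds, the ceiling is N × 3600 / d pages per hour. The sketch below encodes that formula; `pagesPerHour` and `throughputRange` are illustrative names, and the result ignores queueing gaps, retries, and connection setup time.

```javascript
// Upper bound on pages per hour when every slot stays busy.
function pagesPerHour(concurrency, secondsPerTask) {
  return Math.floor((concurrency * 3600) / secondsPerTask);
}

// Range for a task type, given its fastest and slowest typical duration.
function throughputRange(concurrency, minSeconds, maxSeconds) {
  return {
    low: pagesPerHour(concurrency, maxSeconds),
    high: pagesPerHour(concurrency, minSeconds),
  };
}

console.log(throughputRange(20, 2, 5));   // { low: 14400, high: 36000 }
console.log(throughputRange(20, 15, 30)); // { low: 2400, high: 4800 }
```

Plugging in your own measured durations gives a quick sanity check on whether a target throughput is reachable at a given concurrency.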
Monitoring
Track these metrics to identify optimization opportunities:
function reportMetrics(taskId, startTime, result) {
  const duration = Date.now() - startTime; // milliseconds
  console.log(JSON.stringify({
    taskId,
    duration,
    status: result.status,
    bytesTransferred: result.bytesTransferred,
    requestCount: result.requestCount,
  }));
}
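Per-task log lines become more useful once they are aggregated. The sketch below summarizes a batch of records shaped like the JSON above, with `duration` in milliseconds. `summarize` and `percentile` are hypothetical helpers, and treating `'ok'` as the success status is an assumption, since the source does not define what `result.status` contains.

```javascript
// Nearest-rank percentile over an ascending array; null when empty.
function percentile(sortedValues, p) {
  if (sortedValues.length === 0) return null;
  const rank = Math.ceil((p / 100) * sortedValues.length);
  return sortedValues[Math.max(0, rank - 1)];
}

// Summarize task records: volume, latency percentiles, and error rate.
function summarize(records) {
  const durations = records.map((r) => r.duration).sort((a, b) => a - b);
  // Assumption: any status other than 'ok' counts as a failure.
  const failures = records.filter((r) => r.status !== 'ok').length;
  return {
    count: records.length,
    p50: percentile(durations, 50),
    p95: percentile(durations, 95),
    errorRate: records.length ? failures / records.length : 0,
  };
}
```

A rising p95 with a stable p50 usually points at a subset of slow targets rather than a general slowdown, which is a different fix from lowering concurrency.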
Best Practices
- Block unnecessary resources to reduce load time and bandwidth
- Use the fastest wait strategy that meets your requirements
- Limit concurrency to prevent connection exhaustion
- Reuse browser connections when processing multiple pages
- Monitor throughput metrics to identify bottlenecks
- Set timeouts on everything to prevent hung tasks from blocking the queue
- Process tasks in batches rather than one at a time
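The timeout advice in the list above can be applied to any promise-returning task with a small wrapper. The following is a sketch built on `Promise.race`; `withTimeout` is a hypothetical helper. It only stops waiting: the underlying task keeps running unless the caller also closes the page or browser in its cleanup code.

```javascript
// Hypothetical helper: reject if `task` has not settled within `ms` milliseconds.
async function withTimeout(task, ms, label = 'task') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  try {
    // Whichever settles first decides the outcome.
    return await Promise.race([task(), timeout]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}

// Example call (not executed here), combined with a concurrency limiter:
// limiter.run(() => withTimeout(() => processUrl(url), 30000, url));
```

Wrapping the task inside the limiter callback means a hung task frees its slot when the timeout fires instead of blocking the queue indefinitely.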