Headless Browser Best Practices for Production Automation
Production-ready patterns for running headless browser automation, covering error handling, resource management, and operational stability.
Introduction
Running browser automation in production demands more than what works during development. A development script can crash, leak memory, or ignore edge cases without consequence; production automation must be reliable, observable, and resource-efficient.
This guide covers the patterns that separate prototype automation from production-quality systems.
Process Lifecycle Management
Always Close Browsers
The most common resource leak in browser automation is failing to close browser instances. Every session left open keeps consuming memory and CPU until something closes it:
// Bad: browser left open if an error occurs
const browser = await puppeteer.connect({ ... });
const page = await browser.newPage();
await page.goto(url);
const data = await page.evaluate(() => document.title);
// Good: try/finally ensures cleanup even when an error is thrown
import puppeteer from 'puppeteer';

async function getTitle(url) {
  const browser = await puppeteer.connect({
    browserWSEndpoint: 'wss://bots.win/ws?apiKey=YOUR_API_KEY',
  });
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.evaluate(() => document.title);
  } finally {
    await browser.close();
  }
}
Set Hard Timeouts
Every operation should have a timeout. Puppeteer's defaults (30 seconds for both navigation and waits) are often longer than a production task should block, so set limits explicitly:
const page = await browser.newPage();
page.setDefaultNavigationTimeout(30000); // 30 seconds
page.setDefaultTimeout(15000); // 15 seconds for other operations
await page.goto(url, { timeout: 30000 });
await page.waitForSelector('.content', { timeout: 10000 });
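Per-operation timeouts do not bound the total time a task takes when it chains many operations. One way to add a hard ceiling is to race the whole task against a timer. The sketch below is a minimal illustration; `withTimeout` and its `label` parameter are hypothetical names, not part of Puppeteer.

```javascript
// Hypothetical helper: rejects if `promise` has not settled within `ms`.
function withTimeout(promise, ms, label = 'task') {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} exceeded ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer so it cannot
  // keep the process alive after the task finishes.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}
```

A call might look like `await withTimeout(scrape(url), 60000, 'scrape')`, where `scrape` stands in for your own task function. Losing the race does not cancel the underlying work, so the caller still needs a `finally` block that closes the page or browser.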
Handle Disconnections
WebSocket connections to cloud browsers can drop due to network issues:
browser.on('disconnected', () => {
  console.error('Browser disconnected - reconnecting');
  // Implement reconnection logic
});
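The reconnection logic itself could be as simple as retrying the connect call with exponential backoff. The following is a sketch under assumptions: `connectWithRetry`, `retries`, and `baseDelayMs` are hypothetical names, and `connect` is whatever async function you use to obtain a browser (for example, a wrapper around `puppeteer.connect`).

```javascript
// Hypothetical helper: `connect` is any async function that returns a browser,
// e.g. () => puppeteer.connect({ browserWSEndpoint }).
async function connectWithRetry(connect, { retries = 5, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await connect();
    } catch (error) {
      lastError = error;
      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      // Skip the wait after the final attempt.
      if (attempt < retries - 1) {
        await new Promise(r => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError ?? new Error('retries must be at least 1');
}
```

Page handles from the dropped connection should be treated as unusable, so any task that was in flight when the connection dropped generally has to be restarted on the new browser rather than resumed.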
Memory Management
Close Pages After Use
Each open page consumes memory. Close pages as soon as you are done:
const urls = ['url1', 'url2', 'url3' /* ... */];
for (const url of urls) {
  const page = await browser.newPage();
  try {
    await page.goto(url);
    // Process the page
  } finally {
    await page.close(); // Free memory immediately
  }
}
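If the open/process/close pattern appears in many places, it can be factored into a small wrapper so that no call site can forget the `finally`. This is a sketch only; `withPage` is a hypothetical helper name, and it assumes `browser` exposes `newPage()` as in the loop above.

```javascript
// Hypothetical helper: opens a page, runs `fn(page)`, and always closes the page.
async function withPage(browser, fn) {
  const page = await browser.newPage();
  try {
    return await fn(page);
  } finally {
    // Swallow close errors so they do not mask an error thrown by fn.
    await page.close().catch(() => {});
  }
}
```

A call site then reduces to something like `const title = await withPage(browser, page => page.title());`.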
Block Unnecessary Resources
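Pages load many resources that data extraction rarely needs, such as images, media, and fonts. Aborting those requests saves bandwidth and memory: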
await page.setRequestInterception(true);
page.on('request', (req) => {
  const type = req.resourceType();
  if (['image', 'media', 'font'].includes(type)) {
    req.abort();
  } else {
    req.continue();
  }
});
Error Handling
Categorize Errors
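Not all errors deserve the same response. Transient network errors are worth retrying after a delay, a crashed page can be retried immediately, and anything unrecognized should surface right away: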
async function processWithRetry(task) {
  const maxAttempts = 3;
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await processTask(task);
    } catch (error) {
      lastError = error;
      if (error.message.includes('net::ERR_')) {
        // Network error - retry with a linearly increasing delay
        await new Promise(r => setTimeout(r, 2000 * (attempt + 1)));
        continue;
      }
      if (error.message.includes('Target closed')) {
        // Page crashed - retry immediately
        continue;
      }
      // Unknown error - do not retry
      throw error;
    }
  }
  // All attempts failed - surface the last error instead of returning undefined
  throw lastError;
}
Log Structured Data
For every automation task, log enough context to debug failures:
function logTask(taskId, status, details) {
  console.log(JSON.stringify({
    timestamp: new Date().toISOString(),
    taskId,
    status,
    duration: details.duration,
    url: details.url,
    error: details.error?.message,
  }));
}
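To make sure a record is emitted on both success and failure, the timing and logging can be wrapped around the task itself. The sketch below is one possible shape; `runLogged` and its injectable `log` parameter are hypothetical, and the record fields mirror the ones used above.

```javascript
// Hypothetical wrapper: times `fn` and emits one structured record per task.
// `log` defaults to console.log so records land on stdout as JSON lines.
async function runLogged(taskId, url, fn, log = console.log) {
  const start = Date.now();
  const emit = (status, error) =>
    log(JSON.stringify({
      timestamp: new Date().toISOString(),
      taskId,
      status,
      duration: Date.now() - start, // milliseconds
      url,
      error: error?.message, // omitted from the JSON when undefined
    }));
  try {
    const result = await fn();
    emit('success');
    return result;
  } catch (error) {
    emit('failure', error);
    throw error; // logging must not swallow the failure
  }
}
```

Passing `log` as a parameter keeps the wrapper easy to test and lets you swap in a real logging library later without touching call sites.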
Monitoring
Track these metrics for operational visibility:
| Metric | Threshold | Action |
|---|---|---|
| Error rate | > 10% | Investigate, pause if > 25% |
| P95 duration | > 60s | Optimize or increase timeouts |
| Memory per task | > 500 MB | Check for memory leaks |
| Queue depth | Growing | Scale up workers |
| Success rate | < 90% | Check target site changes |
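As an illustration of how the first two rows could be computed from task records, here is a minimal sketch. The record shape `{ ok, durationMs }` and the function names `summarize` and `checkThresholds` are assumptions made for this example; the thresholds come from the table above, and P95 uses the nearest-rank method.

```javascript
// Hypothetical record shape: { ok: boolean, durationMs: number }.
function summarize(records) {
  if (records.length === 0) return { errorRate: 0, p95Ms: 0 };
  const failures = records.filter(r => !r.ok).length;
  const sorted = records.map(r => r.durationMs).sort((a, b) => a - b);
  // Nearest-rank percentile: the value at 1-indexed position ceil(0.95 * n).
  // Integer arithmetic avoids floating-point surprises in the rank.
  const rank = Math.ceil((95 * sorted.length) / 100);
  return { errorRate: failures / records.length, p95Ms: sorted[rank - 1] };
}

// Thresholds taken from the table above.
function checkThresholds({ errorRate, p95Ms }) {
  const alerts = [];
  if (errorRate > 0.25) alerts.push('pause: error rate above 25%');
  else if (errorRate > 0.10) alerts.push('investigate: error rate above 10%');
  if (p95Ms > 60000) alerts.push('optimize or raise timeouts: P95 above 60s');
  return alerts;
}
```

In practice these numbers would usually be computed over a sliding time window by a metrics system rather than in application code; the sketch only shows the arithmetic.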
Graceful Shutdown
Handle process termination cleanly:
const activeBrowsers = new Set();
process.on('SIGTERM', async () => {
  console.log('Shutting down gracefully...');
  // Close all active browsers
  await Promise.all(
    Array.from(activeBrowsers).map(browser =>
      browser.close().catch(() => {})
    )
  );
  process.exit(0);
});
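The handler above only works if every browser is actually added to `activeBrowsers`. One way to keep the set accurate is to register each browser when it is created and remove it when its connection ends. The sketch below assumes a hypothetical `trackBrowser` helper and relies on the browser object emitting a `disconnected` event, as Puppeteer's `Browser` does.

```javascript
// Same role as the Set used by the SIGTERM handler above.
const activeBrowsers = new Set();

// Hypothetical helper: register a browser so the shutdown handler can find it,
// and forget it once its connection ends.
function trackBrowser(browser) {
  activeBrowsers.add(browser);
  browser.once('disconnected', () => activeBrowsers.delete(browser));
  return browser;
}
```

Usage might look like `const browser = trackBrowser(await puppeteer.connect({ ... }));`, so tracking happens at the single place where browsers are created.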
Best Practices
- Always use try/finally for browser and page cleanup
- Set explicit timeouts on every operation
- Block unnecessary resources to reduce memory and bandwidth
- Log structured data for every task
- Implement circuit breakers to stop when error rates spike
- Handle disconnections with automatic reconnection
- Monitor resource consumption and set alerts
- Test failure scenarios - simulate timeouts, crashes, and network drops
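The circuit breaker item in the list above can be sketched as a small class that watches a rolling window of task outcomes. Everything here is an assumption chosen for illustration: the class name, the option names, the window size, the 25% threshold (borrowed from the pause level in the monitoring table), and the cooldown length.

```javascript
// Hypothetical breaker: opens when the failure rate over the last `windowSize`
// tasks exceeds `maxErrorRate`, and stays open for `cooldownMs`.
class CircuitBreaker {
  constructor({ windowSize = 20, maxErrorRate = 0.25, cooldownMs = 60000, now = Date.now } = {}) {
    this.windowSize = windowSize;
    this.maxErrorRate = maxErrorRate;
    this.cooldownMs = cooldownMs;
    this.now = now;        // injectable clock, useful for tests
    this.results = [];     // rolling window of booleans, true = success
    this.openedAt = null;  // timestamp when the breaker tripped
  }

  record(ok) {
    this.results.push(ok);
    if (this.results.length > this.windowSize) this.results.shift();
    const failures = this.results.filter(r => !r).length;
    // Only judge a full window so a single early failure cannot trip it.
    if (this.results.length === this.windowSize &&
        failures / this.windowSize > this.maxErrorRate) {
      this.openedAt = this.now();
      this.results = []; // start fresh after the cooldown
    }
  }

  // True while new tasks should be rejected.
  isOpen() {
    if (this.openedAt === null) return false;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      this.openedAt = null; // cooldown elapsed, allow traffic again
      return false;
    }
    return true;
  }
}
```

A worker loop would check `breaker.isOpen()` before picking up a task and call `breaker.record(true)` or `breaker.record(false)` when the task finishes.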