Monitoring & Observability at the Edge
Structured logging, real-time metrics, alerting strategies, and debugging patterns for 28 Cloudflare Workers in production.
You can't fix what you can't see. With 28 Workers processing requests across 300+ edge locations, observability isn't optional; it's the difference between "we noticed a 10% revenue drop" and "we fixed the bug before users noticed."
Here's the monitoring stack keeping our edge infrastructure visible and debuggable.
Pattern 1: Structured Logging
Every log entry follows the same structure. No exceptions:
interface LogEntry {
  timestamp: string;
  level: 'debug' | 'info' | 'warn' | 'error';
  requestId: string;
  worker: string;
  environment: string;
  message: string;
  data?: Record<string, any>;
  error?: {
    name: string;
    message: string;
    stack?: string;
  };
  duration?: number;
  cf?: {
    colo: string;
    country: string;
  };
}
class Logger {
  constructor(
    private worker: string,
    private requestId: string,
    private cf?: IncomingRequestCfProperties
  ) {}

  info(message: string, data?: Record<string, any>) {
    this.log('info', message, data);
  }

  error(message: string, error: Error, data?: Record<string, any>) {
    this.log('error', message, {
      ...data,
      error: {
        name: error.name,
        message: error.message,
        stack: error.stack
      }
    });
  }

  private log(level: LogEntry['level'], message: string, data?: Record<string, any>) {
    const entry: LogEntry = {
      timestamp: new Date().toISOString(),
      level,
      requestId: this.requestId,
      worker: this.worker,
      environment: ENV, // set per environment, e.g. via a build-time define or an env binding
      message,
      data,
      cf: this.cf
        ? { colo: this.cf.colo, country: this.cf.country ?? 'unknown' }
        : undefined
    };

    // One JSON object per line: trivially parseable by wrangler tail and log drains
    console.log(JSON.stringify(entry));
  }
}
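A minimal usage sketch: one Logger per request, constructed in the fetch handler (the worker name, message, and data fields here are illustrative):

// Sketch: construct the Logger once per request and reuse it everywhere
export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const requestId = request.headers.get('X-Request-ID') ?? crypto.randomUUID();
    const logger = new Logger('lead-processor', requestId, request.cf); // illustrative worker name

    logger.info('Lead received', { source: 'webform' }); // illustrative event
    return new Response('ok');
  }
};

Because every entry is a single JSON line on stdout, the output is directly consumable by wrangler tail and a Logpush destination without extra parsing.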
Pattern 2: Request Tracing
One request ID flows through all services:
export function withTracing(handler: Handler): Handler {
  return async (request, env, ctx) => {
    // Get or create request ID
    const requestId = request.headers.get('X-Request-ID')
      || crypto.randomUUID();
    const startTime = Date.now();
    const logger = new Logger('api-gateway', requestId, request.cf);

    logger.info('Request started', {
      method: request.method,
      url: request.url,
      userAgent: request.headers.get('User-Agent')
    });

    try {
      const response = await handler(request, env, ctx);

      logger.info('Request completed', {
        status: response.status,
        duration: Date.now() - startTime
      });

      // Add tracing headers to the response. Copy status and headers explicitly:
      // spreading a Response object does not carry its properties over.
      const headers = new Headers(response.headers);
      headers.set('X-Request-ID', requestId);
      headers.set('X-Response-Time', `${Date.now() - startTime}ms`);

      return new Response(response.body, {
        status: response.status,
        statusText: response.statusText,
        headers
      });
    } catch (error) {
      logger.error('Request failed', error as Error, {
        duration: Date.now() - startTime
      });
      throw error;
    }
  };
}
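The ID only "flows through all services" if every subrequest forwards it, so downstream workers pick it up in their own withTracing wrapper. A sketch; the endpoint path and payload are illustrative:

// Sketch: forward X-Request-ID on every call to a downstream worker
async function forwardLead(payload: unknown, requestId: string): Promise<Response> {
  return fetch('https://leads.proptechusa.ai/leads', { // illustrative path on the host from Pattern 4
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-Request-ID': requestId // the downstream worker reuses this instead of generating its own
    },
    body: JSON.stringify(payload)
  });
}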
Pattern 3: Real-Time Metrics
Push metrics to Analytics Engine or external services:
export function trackMetrics(
  request: Request,
  response: Response,
  duration: number,
  ctx: ExecutionContext,
  env: Env
) {
  const datapoint = {
    // Dimensions (groupable)
    worker: 'api-gateway',
    method: request.method,
    path: new URL(request.url).pathname,
    status: response.status.toString(),
    statusGroup: Math.floor(response.status / 100) + 'xx',
    colo: request.cf?.colo || 'unknown',
    country: request.cf?.country || 'unknown',

    // Metrics (aggregatable)
    count: 1,
    duration,
    success: response.ok ? 1 : 0,
    error: response.ok ? 0 : 1
  };

  // Fire and forget. Position matters: blobs become blob1, blob2, ... and
  // doubles become double1, double2, ... when you query the dataset.
  ctx.waitUntil(
    env.ANALYTICS.writeDataPoint({
      blobs: [
        datapoint.worker,
        datapoint.method,
        datapoint.path,
        datapoint.statusGroup,
        datapoint.colo,
        datapoint.country
      ],
      doubles: [datapoint.duration, datapoint.count, datapoint.success, datapoint.error],
      indexes: [datapoint.status]
    })
  );
}
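Wiring this into the request path is one line in the fetch handler. A sketch, assuming ANALYTICS is bound to an Analytics Engine dataset in wrangler.toml and handleRequest stands in for the worker's routing logic:

// Sketch: measure and record every response without delaying it
export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const start = Date.now();
    const response = await handleRequest(request, env, ctx); // hypothetical router
    trackMetrics(request, response, Date.now() - start, ctx, env);
    return response;
  }
};

Since trackMetrics uses ctx.waitUntil internally, the datapoint is written after the response has already been returned to the client.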
Pattern 4: Health Checks with Cron
const ENDPOINTS = [
  { name: 'API Gateway', url: 'https://api.proptechusa.ai/health' },
  { name: 'Lead Processor', url: 'https://leads.proptechusa.ai/health' },
  { name: 'AI Chatbot', url: 'https://chat.proptechusa.ai/health' },
];

export default {
  async scheduled(event: ScheduledEvent, env: Env, ctx: ExecutionContext) {
    const results = await Promise.all(
      ENDPOINTS.map(async (endpoint) => {
        const start = Date.now();
        try {
          const res = await fetch(endpoint.url, {
            signal: AbortSignal.timeout(5000)
          });
          return {
            name: endpoint.name,
            healthy: res.ok,
            latency: Date.now() - start,
            status: res.status
          };
        } catch (e) {
          return {
            name: endpoint.name,
            healthy: false,
            latency: Date.now() - start,
            error: (e as Error).message
          };
        }
      })
    );

    const unhealthy = results.filter(r => !r.healthy);
    if (unhealthy.length > 0) {
      // This variant of sendSlackAlert takes a raw Block Kit payload;
      // Pattern 5 below shows the error-specific version used by request handlers
      await sendSlackAlert({
        text: `🚨 Health Check Failed`,
        blocks: unhealthy.map(r => ({
          type: 'section',
          text: { type: 'mrkdwn', text: `*${r.name}*: ${r.error || r.status}` }
        }))
      }, env);
    }
  }
};
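The probed /health routes are just lightweight handlers in each worker. A sketch of one, where the KV read is an illustrative stand-in for whatever dependency that worker needs to prove is reachable:

// Sketch: a /health route that exercises one cheap dependency
async function handleHealth(env: Env): Promise<Response> {
  try {
    await env.CONFIG_KV.get('health-probe'); // CONFIG_KV is an illustrative binding name
    return Response.json({ status: 'ok', time: new Date().toISOString() });
  } catch {
    return Response.json({ status: 'degraded' }, { status: 503 });
  }
}

How often the scheduled handler above fires is set by the worker's cron trigger; the checklist below assumes every minute.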
Pattern 5: Error Alerting
async function sendSlackAlert(
  error: Error,
  context: {
    requestId: string;
    worker: string;
    url: string;
  },
  env: Env
) {
  await fetch(env.SLACK_WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      blocks: [
        {
          type: 'header',
          text: { type: 'plain_text', text: '🚨 Production Error' }
        },
        {
          type: 'section',
          fields: [
            { type: 'mrkdwn', text: `*Worker:*\n${context.worker}` },
            { type: 'mrkdwn', text: `*Request ID:*\n\`${context.requestId}\`` },
            { type: 'mrkdwn', text: `*Error:*\n${error.message}` },
            { type: 'mrkdwn', text: `*URL:*\n${context.url}` }
          ]
        },
        {
          type: 'section',
          text: {
            type: 'mrkdwn',
            text: `\`\`\`${error.stack?.slice(0, 500)}\`\`\``
          }
        }
      ]
    })
  });
}
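In practice this is called from the outermost catch, for example the catch block in Pattern 2's withTracing, and pushed through ctx.waitUntil so the alert never delays the error response. A sketch of that catch block rewritten this way, using the variables defined in the wrapper:

// Sketch: replace withTracing's catch block so the alert goes out in the
// background and callers get a clean 500 instead of an unhandled rethrow
catch (error) {
  logger.error('Request failed', error as Error, { duration: Date.now() - startTime });
  ctx.waitUntil(
    sendSlackAlert(error as Error, {
      requestId,
      worker: 'api-gateway',
      url: request.url
    }, env)
  );
  return new Response('Internal Server Error', { status: 500 });
}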
Observability Checklist
- Structured JSON logs with consistent schema
- Request IDs propagated through all services
- Latency tracking at p50, p95, p99 percentiles (see the query sketch after this list)
- Error rate monitoring with alerting thresholds
- Health checks running every minute via Cron
- Slack alerts for critical errors and outages
- Dashboard for real-time metrics visualization
- Log retention for debugging historical issues
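For the latency percentiles in the checklist, the datapoints written in Pattern 3 can be read back through the Analytics Engine SQL API. A sketch with several assumptions to verify against the current Cloudflare docs: the dataset is named WORKER_METRICS in wrangler.toml, duration is the first double (as written in Pattern 3), and ACCOUNT_ID / API_TOKEN are placeholders for an account ID and an API token with analytics read access:

// Sketch: p50/p95/p99 request duration per worker over the last hour
// ACCOUNT_ID and API_TOKEN are placeholders, not real values
const SQL = `
  SELECT
    blob1 AS worker,
    quantileWeighted(0.50)(double1, _sample_interval) AS p50,
    quantileWeighted(0.95)(double1, _sample_interval) AS p95,
    quantileWeighted(0.99)(double1, _sample_interval) AS p99
  FROM WORKER_METRICS
  WHERE timestamp > NOW() - INTERVAL '1' HOUR
  GROUP BY worker
`;

const res = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/analytics_engine/sql`,
  {
    method: 'POST',
    headers: { Authorization: `Bearer ${API_TOKEN}` },
    body: SQL
  }
);
console.log(await res.text());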
Observability isn't about collecting data; it's about answering questions. "Why did that request fail?" should take seconds to answer, not hours of digging through logs.
Need Observability Setup?
We build monitoring systems that catch issues before users do.
→ Get Started