Uptime Monitoring & Alerting

~20 min · Monitoring

Primary Source

Better Stack — Uptime Monitoring Guide

Practical guide to setting up uptime monitoring with realistic alerting thresholds.

Why Uptime Monitoring?

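Sentry catches errors in running code. Uptime monitoring catches the case where your app stops running entirely. An external monitor requests your /health endpoint on a fixed interval, typically every 30–60 seconds (free tiers often check less frequently, every 1–5 minutes). If the endpoint fails to respond in time or returns an error status, you get alerted, ideally before your users notice.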
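
To make the idea concrete, here is a minimal sketch of what a single check might look like if you wrote the monitor yourself. It assumes Node 18+ (global fetch and AbortSignal.timeout). The names classifyProbe and probe, the 5-second timeout, and the URL in the usage comment are illustrative choices, not part of any monitoring product.

```javascript
// Pure decision step: turn a raw probe outcome into "up" or "down".
function classifyProbe(outcome) {
  if (outcome.error) return { up: false, reason: outcome.error };
  if (outcome.status < 200 || outcome.status > 299) {
    return { up: false, reason: `HTTP ${outcome.status}` };
  }
  return { up: true, reason: `HTTP ${outcome.status}` };
}

// Network step: request the health endpoint once, giving up after timeoutMs.
async function probe(url, timeoutMs = 5000) {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    return classifyProbe({ status: res.status });
  } catch (err) {
    // Covers DNS failures, refused connections, and timeouts.
    const reason = err.name === 'TimeoutError' ? 'timeout' : err.message;
    return classifyProbe({ error: reason });
  }
}

// Usage (hypothetical URL), checking once a minute:
// setInterval(async () => console.log(await probe('https://yourapp.com/health')), 60_000);
```

A hosted monitor does the same thing from several regions, which is why it can still alert you when your own server (and anything running on it) is down.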

The /health Endpoint

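// Assumes an Express `app` plus `db` and `redis` clients created elsewhere.
app.get('/health', async (req, res) => {
  const health = { status: 'healthy', uptime: process.uptime() };
  // Cheapest possible query: proves the database connection works.
  try {
    await db.query('SELECT 1');
    health.database = 'ok';
  } catch {
    health.database = 'down';
    health.status = 'degraded';
  }
  // Same idea for Redis: a PING round trip.
  try {
    await redis.ping();
    health.redis = 'ok';
  } catch {
    health.redis = 'down';
    health.status = 'degraded';
  }
  // Return 503 when any dependency is down so monitors count the check as failed.
  res.status(health.status === 'healthy' ? 200 : 503).json(health);
});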
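
One gap in the handler above: if the database or Redis hangs instead of failing fast, /health hangs too. A possible refinement, sketched here under assumptions, is to put a time limit on each dependency check and run the checks in parallel. The helper names withTimeout, runCheck, and summarize and the 2-second limit are hypothetical; db and redis are the same clients used above.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run one named check and report 'ok' or 'down' without throwing.
async function runCheck(label, fn, ms = 2000) {
  try {
    await withTimeout(fn(), ms, label);
    return 'ok';
  } catch {
    return 'down';
  }
}

// Pure step: combine per-dependency results into a response body and status code.
function summarize(checks) {
  const allOk = Object.values(checks).every((state) => state === 'ok');
  return {
    httpStatus: allOk ? 200 : 503,
    body: { status: allOk ? 'healthy' : 'degraded', ...checks },
  };
}

// Usage inside the handler:
// const [database, cache] = await Promise.all([
//   runCheck('database', () => db.query('SELECT 1')),
//   runCheck('redis', () => redis.ping()),
// ]);
// const { httpStatus, body } = summarize({ database, redis: cache });
// res.status(httpStatus).json(body);
```

Keep the time limit well below your monitor's own request timeout, so a slow dependency shows up as a clear 503 rather than as a timed-out check.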

Free Uptime Tools

Tool	Free tier	Best for
UptimeRobot	50 monitors, 5-min checks	Getting started
Better Stack	10 monitors, 3-min checks	Startups — great integrations
Freshping	50 monitors, 1-min checks	Fastest free checks

Alerting Best Practices

Alert on 2+ consecutive failures — not just 1 (reduces false alarms)
Alert channels: Slack #alerts for visibility, plus PagerDuty or Opsgenie to page whoever is on call
SSL cert expiry monitoring: alert 30 days before
Status page: publish at status.yourapp.com for transparent communication
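
Hosted monitors implement the first and third practices for you, but the logic is small enough to sketch. The code below is an illustration under assumptions, not any vendor's implementation: createFailureTracker, daysUntilExpiry, and certNeedsAlert are hypothetical names, and the thresholds (2 failures, 30 days) come from the list above.

```javascript
// Track consecutive failures and emit an event only when the state changes.
function createFailureTracker(threshold = 2) {
  let consecutiveFailures = 0;
  let alerting = false;
  return function record(up) {
    if (up) {
      consecutiveFailures = 0;
      if (alerting) {
        alerting = false;
        return 'recovered';
      }
      return null;
    }
    consecutiveFailures += 1;
    if (!alerting && consecutiveFailures >= threshold) {
      alerting = true;
      return 'alert'; // fires once, not on every later failure
    }
    return null;
  };
}

// Whole days remaining before a certificate expires.
// In Node, an expiry value can be read from tls.TLSSocket#getPeerCertificate().valid_to.
function daysUntilExpiry(validTo, now = new Date()) {
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.floor((new Date(validTo).getTime() - now.getTime()) / msPerDay);
}

// True once the certificate is inside the warning window (or already expired).
function certNeedsAlert(validTo, now = new Date(), warnDays = 30) {
  return daysUntilExpiry(validTo, now) <= warnDays;
}
```

With a threshold of 2, a single failed check followed by a success produces no alert at all, which is what filters out brief hiccups.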

Incident Response

Acknowledge — confirm someone is looking
Assess — docker compose ps, docker compose logs -f api
Mitigate — restart service or rollback deploy
Communicate — update status page
Postmortem — write what happened, why, prevention

💡 Write a runbook before you need it: "if alert X fires, do steps 1-5." At 3am you want a checklist, not a problem-solving session.
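
A runbook can live in a wiki, but keeping it as data next to your alert definitions means the steps can be pasted straight into the alert message. Here is one possible shape, as a sketch: the alert name api-health-down and the function formatRunbook are made up for illustration, and the steps mirror the incident response list above.

```javascript
// Hypothetical runbook table: alert name -> ordered steps.
const runbooks = new Map([
  ['api-health-down', [
    'Acknowledge the alert so the team knows someone is looking',
    'Run: docker compose ps',
    'Run: docker compose logs -f api',
    'Restart the service or roll back the last deploy',
    'Update the status page',
  ]],
]);

// Render the steps for an alert as a numbered checklist.
function formatRunbook(alertName) {
  const steps = runbooks.get(alertName);
  if (!steps) return `No runbook found for "${alertName}".`;
  return steps.map((step, i) => `${i + 1}. ${step}`).join('\n');
}

// Usage:
// console.log(formatRunbook('api-health-down'));
```

An alert that comes back with "No runbook found" is a useful signal in itself: it tells you which checklist to write next.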

Check Your Understanding

1. /health returns 200 but doesn't check the database. What problem does this cause?

2. You get 3 false alarm alerts weekly because your server hiccups for 30 seconds. Fix?

3. Your SSL cert expired. A user found out and emailed you. What monitoring would have prevented this?