Health Endpoint Monitoring Pattern

Link McKinney · February 20, 2025 · About 2 min

Context and Problem

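Applications need a way to verify that their components are healthy and available, so that failures are detected proactively rather than reported by users. Without such a mechanism, teams commonly face the following problems: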

Lack of visibility into application health.
Difficulty detecting partial failures within a distributed system.
Inability to route traffic away from unhealthy instances.
Challenges in automating service recovery.

Solution

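The Health Endpoint Monitoring pattern provides dedicated health check endpoints that external systems, such as load balancers, orchestrators, and monitoring tools, can query to determine application status. A typical implementation involves the following: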

Implement a `/health` endpoint that reports application status.
Include checks for database connectivity, external dependencies, and internal processes.
Integrate health checks with load balancers and orchestrators.
Log and alert on health check failures.
Use different levels of health checks: liveness (is the process running?), readiness (can it accept traffic?), and startup (has initialization finished?) probes.
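The three probe levels can be sketched as a single, framework-independent handler. This is a minimal illustration, not a specific library's API: the paths `/health/live`, `/health/ready`, and `/health/startup`, the `AppState` shape, and the function name `handleProbe` are all assumptions introduced here.

```typescript
// Hypothetical names throughout; a sketch of the probe levels, not a framework API.
type ProbeKind = "liveness" | "readiness" | "startup";

interface ProbeResponse {
  statusCode: 200 | 503 | 404;
  body: { status: "ok" | "unavailable" | "unknown"; failed?: string[] };
}

// A check resolves to true when the component it covers is usable.
type Check = () => Promise<boolean>;

interface AppState {
  started: boolean;                       // set once initialization finishes
  readinessChecks: Record<string, Check>; // e.g. database, downstream APIs
}

// Assumed URL layout; the article itself only names `/health`.
const ROUTES = new Map<string, ProbeKind>([
  ["/health/live", "liveness"],
  ["/health/ready", "readiness"],
  ["/health/startup", "startup"],
]);

async function handleProbe(path: string, state: AppState): Promise<ProbeResponse> {
  const kind = ROUTES.get(path);
  if (kind === undefined) return { statusCode: 404, body: { status: "unknown" } };

  // Liveness: the process can answer at all, so no dependency checks run here.
  if (kind === "liveness") return { statusCode: 200, body: { status: "ok" } };

  // Startup: succeeds only after initialization has completed.
  if (kind === "startup") {
    return state.started
      ? { statusCode: 200, body: { status: "ok" } }
      : { statusCode: 503, body: { status: "unavailable" } };
  }

  // Readiness: the app must be started and every dependency check must pass.
  if (!state.started) {
    return { statusCode: 503, body: { status: "unavailable", failed: ["startup"] } };
  }
  const failed: string[] = [];
  for (const [name, check] of Object.entries(state.readinessChecks)) {
    const ok = await check().catch(() => false); // a rejected check counts as a failure
    if (!ok) failed.push(name);
  }
  return failed.length === 0
    ? { statusCode: 200, body: { status: "ok" } }
    : { statusCode: 503, body: { status: "unavailable", failed } };
}
```

In a Node.js service, `handleProbe` could be called from an `http.createServer` request handler or an Express route, copying `statusCode` and the JSON-serialized `body` onto the response. Keeping liveness free of dependency checks is a deliberate choice in this sketch: a database outage should take an instance out of rotation, not cause it to be restarted.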

Benefits

Proactive failure detection: Identifies service failures before they impact users.
Automated recovery: Enables load balancers and orchestrators to handle failures.
Improved observability: Provides insight into the health of system components.
Enhanced resiliency: Helps ensure high availability of services.

Trade-offs

Increased overhead: Frequent health checks consume system resources, especially when each check queries a database or a downstream service.
False positives/negatives: Improperly configured checks may misreport status.
Security risks: Exposing health endpoints publicly can provide attackers with system insights.
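One way to limit the overhead of frequent polling is to cache the result of an expensive check for a short interval. The sketch below assumes a hypothetical helper named `cachedCheck` and a caller-chosen `ttlMs`; it is an illustration of the idea rather than a prescribed implementation.

```typescript
type Status = "healthy" | "unhealthy";

interface CachedResult {
  status: Status;
  checkedAt: number; // milliseconds, as returned by `now`
}

// Wraps an expensive check so that at most one real check runs per `ttlMs` window.
// `now` is injectable so the behaviour can be exercised without real timers.
function cachedCheck(
  check: () => Promise<Status>,
  ttlMs: number,
  now: () => number = Date.now,
): () => Promise<CachedResult> {
  let last: CachedResult | undefined;
  let inFlight: Promise<CachedResult> | undefined;

  return async () => {
    // Serve the cached result while it is still fresh.
    if (last !== undefined && now() - last.checkedAt < ttlMs) return last;

    // Share a single in-flight check between concurrent callers.
    if (inFlight === undefined) {
      inFlight = check()
        .catch((): Status => "unhealthy") // a rejected check is reported as unhealthy
        .then((status) => {
          last = { status, checkedAt: now() };
          inFlight = undefined;
          return last;
        });
    }
    return inFlight;
  };
}
```

The trade-off is staleness: with a TTL of a few seconds, a failure may be reported up to one TTL late, so the value should be shorter than the polling interval of whatever consumes the endpoint.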

Issues and Considerations

Dependency failures: A failing dependency does not always mean the application itself has failed; it may still be able to serve some requests.
Monitoring granularity: Defining the right level of checks for different scenarios.
Scalability: Managing health checks in large-scale distributed systems.
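The dependency-failure concern above can be handled by labelling each dependency as critical or non-critical and reporting a middle state when only non-critical ones fail. The `critical` flag, the `degraded` status, and the function name `overallStatus` are assumptions made for this sketch.

```typescript
type Status = "healthy" | "degraded" | "unhealthy";

interface DependencyResult {
  name: string;
  ok: boolean;
  critical: boolean; // assumption: each dependency is classified by the service owner
}

// A failed critical dependency makes the service unhealthy;
// a failed non-critical dependency only degrades it.
function overallStatus(results: DependencyResult[]): Status {
  if (results.some((r) => !r.ok && r.critical)) return "unhealthy";
  if (results.some((r) => !r.ok)) return "degraded";
  return "healthy";
}
```

For example, a service might treat its primary database as critical and a recommendations API as non-critical, so that an outage of the latter is visible to operators without removing the instance from rotation.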

When to Use This Pattern

When monitoring application health for automated recovery.
When integrating with load balancers to avoid routing traffic to unhealthy instances.
When improving visibility into service dependencies.
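To illustrate the load balancer case, the sketch below shows how a consumer of the health endpoint might decide which instances remain routable. It is a simplified model under stated assumptions: `FAILURE_THRESHOLD`, `recordProbe`, and `routable` are hypothetical names, and real load balancers expose this behaviour through their own configuration.

```typescript
interface Instance {
  url: string;
  consecutiveFailures: number;
}

// Hypothetical threshold: requiring several failures in a row
// avoids ejecting an instance because of a single slow or dropped probe.
const FAILURE_THRESHOLD = 3;

// Returns an updated copy of the instance after one probe result.
function recordProbe(instance: Instance, healthy: boolean): Instance {
  return {
    ...instance,
    consecutiveFailures: healthy ? 0 : instance.consecutiveFailures + 1,
  };
}

// Only instances below the failure threshold receive traffic.
function routable(instances: Instance[]): Instance[] {
  return instances.filter((i) => i.consecutiveFailures < FAILURE_THRESHOLD);
}
```

A single successful probe resets the counter here, which returns a recovered instance to rotation quickly; a stricter design could also require several consecutive successes before doing so.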

Example Implementation

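// The types and helper methods below are illustrative assumptions; the original
// snippet leaves them undefined. Metrics collection assumes a Node.js runtime.
type Status = "healthy" | "degraded" | "unhealthy";
type Probe = () => Promise<void>; // resolves if reachable, rejects otherwise

interface ServiceDependencies {
  database: Probe;
  cache: Probe;
  external: Probe[];
}

interface HealthStatus {
  status: Status;
  metrics?: Record<string, number>;
  error?: string;
  timestamp: Date;
}

class HealthEndpoint {
  constructor(private services: ServiceDependencies) {}

  async checkHealth(): Promise<HealthStatus> {
    try {
      // The checks are independent, so run them concurrently; total latency
      // is then bounded by the slowest check rather than the sum of all four.
      const [basicHealth, dbHealth, cacheHealth, dependencyHealth] = await Promise.all([
        this.checkBasicHealth(),    // 1. Basic Health Check
        this.checkDatabaseHealth(), // 2. Database Health
        this.checkCacheHealth(),    // 3. Cache Health
        this.checkDependencies(),   // 4. External Dependencies
      ]);

      // 5. System Metrics
      const metrics = await this.collectMetrics();

      return {
        status: this.aggregateStatus([basicHealth, dbHealth, cacheHealth, dependencyHealth]),
        metrics,
        timestamp: new Date(),
      };
    } catch (error) {
      // `error` is `unknown` under strict TypeScript, so narrow it before reading `.message`.
      return {
        status: "unhealthy",
        error: error instanceof Error ? error.message : String(error),
        timestamp: new Date(),
      };
    }
  }

  // Converts a probe outcome into a status instead of letting it throw.
  private run(probe: Probe, onFailure: Status): Promise<Status> {
    return probe().then((): Status => "healthy", (): Status => onFailure);
  }

  // If this code runs at all, the process itself is responsive.
  private async checkBasicHealth(): Promise<Status> { return "healthy"; }
  private checkDatabaseHealth(): Promise<Status> { return this.run(this.services.database, "unhealthy"); }
  private checkCacheHealth(): Promise<Status> { return this.run(this.services.cache, "degraded"); }
  private async checkDependencies(): Promise<Status> {
    const results = await Promise.all(this.services.external.map((p) => this.run(p, "degraded")));
    return this.aggregateStatus(results);
  }
  private async collectMetrics(): Promise<Record<string, number>> {
    return { uptimeSeconds: process.uptime(), heapUsedBytes: process.memoryUsage().heapUsed };
  }

  // The worst individual status determines the overall status.
  private aggregateStatus(statuses: Status[]): Status {
    if (statuses.includes("unhealthy")) return "unhealthy";
    if (statuses.includes("degraded")) return "degraded";
    return "healthy";
  }
}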