Class: AgentHarness::ProviderHealthCheck

Inherits:
Object
Defined in:
lib/agent_harness/provider_health_check.rb

Overview

Performs health checks on configured providers.

Validates provider setup, authentication status, and reachability. Returns one status hash per provider with :name, :status, :message, and :latency_ms keys.

Examples:

Check all providers

results = AgentHarness::ProviderHealthCheck.check_all
results.each { |r| puts "#{r[:name]}: #{r[:status]}" }

Check a single provider

result = AgentHarness::ProviderHealthCheck.check(:claude)
puts result[:status] # => "ok", "error", or "degraded"

Constant Summary

DEFAULT_TIMEOUT = HealthCheckConfig.new.timeout

Single source of truth: the fallback is derived from HealthCheckConfig’s default so that the timeout isn’t duplicated here and in configuration.rb.
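
The derivation described above can be illustrated with a minimal, self-contained sketch. The class and module names below (`SketchHealthCheckConfig`, `SketchHealthCheck`) are hypothetical stand-ins, and the default of 5 seconds is an assumption made only for illustration; the real default lives in configuration.rb and is not shown on this page.

```ruby
# Hypothetical stand-in for HealthCheckConfig. The real class is defined
# in configuration.rb; the value 5 is an assumed default, not the library's.
class SketchHealthCheckConfig
  attr_accessor :timeout

  def initialize
    @timeout = 5
  end
end

module SketchHealthCheck
  # Read the fallback from a fresh config object instead of repeating
  # the literal, so the default is defined in exactly one place.
  DEFAULT_TIMEOUT = SketchHealthCheckConfig.new.timeout
end

puts SketchHealthCheck::DEFAULT_TIMEOUT
```

With this arrangement, changing the default in the config class changes the constant as well, which is the point of keeping a single source of truth.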

Class Method Details

.check(provider_name, timeout: configured_timeout) ⇒ Hash

Check health of a single provider

Parameters:

  • provider_name (Symbol, String)

    the provider name

  • timeout (Integer) (defaults to: configured_timeout)

    timeout in seconds

Returns:

  • (Hash)

    health status with :name, :status, :message, :latency_ms keys



# File 'lib/agent_harness/provider_health_check.rb', line 43

def check(provider_name, timeout: configured_timeout)
  name = normalize_name(provider_name)
  start_time = monotonic_now
  timeout = validate_timeout(timeout)

  Timeout.timeout(timeout) do
    perform_check(name, start_time)
  end
rescue Timeout::Error
  build_result(
    name: name,
    status: "error",
    message: "Health check timed out after #{timeout}s",
    start_time: start_time || monotonic_now
  )
rescue NotImplementedError => e
  # NotImplementedError inherits from ScriptError, not StandardError,
  # so it must be rescued explicitly. Its messages are safe internal
  # setup errors (e.g., missing provider methods) that help users
  # diagnose configuration problems.
  AgentHarness.logger&.error("ProviderHealthCheck error for #{name}: #{e.class}")
  build_result(
    name: name,
    status: "error",
    message: "Health check failed: #{e.class}: #{e.message}",
    start_time: start_time || monotonic_now
  )
rescue => e
  # Return a generic message to avoid leaking sensitive details
  # (e.g., tokens embedded in exception messages). Log only the
  # exception class (not the message) to avoid leaking secrets.
  AgentHarness.logger&.error("ProviderHealthCheck error for #{name}: #{e.class}")
  build_result(
    name: name,
    status: "error",
    message: "Health check failed: #{e.class}",
    start_time: start_time || monotonic_now
  )
end
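
The rescue ordering in .check matters because NotImplementedError descends from ScriptError rather than StandardError, so a bare `rescue` does not catch it. The following sketch isolates that behaviour. The helper `classify_failure` is hypothetical and only mirrors the shape of the method above; it is not part of AgentHarness.

```ruby
require "timeout"

# Hypothetical helper that mirrors the rescue ordering used by .check.
# It runs the given block under a timeout and reports how it ended.
def classify_failure(timeout_seconds)
  Timeout.timeout(timeout_seconds) { yield }
  "ok"
rescue Timeout::Error
  "timed out"
rescue NotImplementedError => e
  # Not a StandardError, so the bare rescue below would miss it.
  "not implemented: #{e.message}"
rescue => e
  # Report only the class, never the message, which may contain secrets.
  "failed: #{e.class}"
end

puts NotImplementedError.ancestors.include?(StandardError) # => false
puts classify_failure(1) { :done }                          # => ok
puts classify_failure(1) { raise NotImplementedError, "missing method" }
puts classify_failure(1) { raise ArgumentError, "token=abc123" }
puts classify_failure(0.05) { sleep 1 }                     # => timed out
```

The third call reports the exception message because it is treated as a safe setup error, while the fourth reports only `ArgumentError`, which keeps the embedded token out of the result.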

.check_all(timeout: configured_timeout) ⇒ Array<Hash>

Check health of all configured providers

Parameters:

  • timeout (Integer) (defaults to: configured_timeout)

    timeout in seconds for each check

Returns:

  • (Array<Hash>)

    health status for each provider



# File 'lib/agent_harness/provider_health_check.rb', line 28

def check_all(timeout: configured_timeout)
  provider_names = if AgentHarness.configuration.providers.empty?
    Providers::Registry.instance.all
  else
    enabled_provider_names
  end

  provider_names.map { |name| check(name, timeout: timeout) }
end
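
A caller will often want to turn the array returned by .check_all into a pass/fail decision. The sketch below shows one possible way to do that on hand-written sample data in the documented hash shape. The provider names `provider_b` and `provider_c`, the messages, and the latencies are invented for illustration, and whether :name is a String or a Symbol in real results is an assumption here.

```ruby
# Sample results in the documented shape (:name, :status, :message, :latency_ms).
# In real use this array would come from
# AgentHarness::ProviderHealthCheck.check_all(timeout: 10).
results = [
  {name: "claude", status: "ok", message: "OK", latency_ms: 120},
  {name: "provider_b", status: "degraded", message: "Slow response", latency_ms: 2400},
  {name: "provider_c", status: "error", message: "Health check failed: RuntimeError", latency_ms: 15}
]

# Group by status so each category can be inspected separately.
by_status = results.group_by { |r| r[:status] }

# Treat only "error" as fatal; "degraded" providers are reported but tolerated.
failing_names = by_status.fetch("error", []).map { |r| r[:name] }
exit_code = failing_names.empty? ? 0 : 1

puts "failing: #{failing_names.join(", ")}" unless failing_names.empty?
puts "exit code: #{exit_code}"
```

Whether a degraded provider should also produce a non-zero exit code is a policy choice left to the caller.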

.format_results(results) ⇒ String

Format health check results for CLI output

Parameters:

  • results (Array<Hash>)

    health check results

Returns:

  • (String)

    formatted output



# File 'lib/agent_harness/provider_health_check.rb', line 87

def format_results(results)
  lines = ["Checking providers..."]

  if results.empty?
    lines << ""
    lines << "No providers checked."
    return lines.join("\n")
  end

  results.each do |result|
    name = result[:name].to_s.ljust(16)
    case result[:status]
    when "ok"
      latency = result[:latency_ms] ? "(#{result[:latency_ms]}ms)" : ""
      lines << "#{name} OK #{latency}".rstrip
    when "degraded"
      lines << "  ~ #{name} #{result[:message]}"
    else
      lines << "#{name} #{result[:message]}"
    end
  end

  failed = results.count { |r| r[:status] == "error" }
  degraded = results.count { |r| r[:status] == "degraded" }
  total = results.size

  lines << ""
  summary_parts = []
  summary_parts << "#{failed} failed" if failed > 0
  summary_parts << "#{degraded} degraded" if degraded > 0

  provider_word = (total == 1) ? "provider" : "providers"
  lines << if summary_parts.any?
    "#{total} #{provider_word} checked: #{summary_parts.join(", ")}."
  else
    "All #{total} #{provider_word} healthy."
  end

  lines.join("\n")
end
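
To make the summary line easier to reason about, here is a standalone copy of just that part of the logic, run on sample input. The method name `summary_line` is hypothetical; the counting and pluralisation rules are taken from the source above.

```ruby
# Standalone copy of the summary-line logic from format_results.
# Only :status is read, so the sample hashes omit the other keys.
def summary_line(results)
  failed = results.count { |r| r[:status] == "error" }
  degraded = results.count { |r| r[:status] == "degraded" }
  total = results.size

  parts = []
  parts << "#{failed} failed" if failed > 0
  parts << "#{degraded} degraded" if degraded > 0

  word = (total == 1) ? "provider" : "providers"
  if parts.any?
    "#{total} #{word} checked: #{parts.join(", ")}."
  else
    "All #{total} #{word} healthy."
  end
end

puts summary_line([{status: "ok"}])
# => All 1 provider healthy.
puts summary_line([{status: "ok"}, {status: "error"}, {status: "degraded"}])
# => 3 providers checked: 1 failed, 1 degraded.
```

Note that only "error" and "degraded" are counted, so a result with any other status value is treated as healthy by the summary.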