Telemetry Smoke

imperfect-cli telemetry-smoke verifies the external-fetch failure and Alice retry telemetry paths without touching the live AQICN or Alice dependencies. The command swaps in deterministic in-process HTTP clients and, when LOGFIRE_TOKEN is configured, forwards the real Logfire metric and log calls. It fails if any expected attribute is missing or if the sentinel AQICN token appears in telemetry attributes.
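The pass/fail rule described above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the command's actual implementation: SENTINEL_TOKEN, REQUIRED_ATTRS, and check_event are hypothetical names, the sentinel value is a placeholder, and each captured telemetry event is assumed to be a plain dict of attributes.

```python
# Hypothetical sketch: names, sentinel value, and event shape are assumptions,
# not imperfect-cli's real code.
SENTINEL_TOKEN = "sentinel-aqicn-token"  # placeholder value for illustration

# Attributes each event is expected to carry (taken from the expected output).
REQUIRED_ATTRS = {
    "external.fetch.outcome": {"target", "status"},
    "environment.aqicn_failed": {"target", "status"},
}


def check_event(name: str, attributes: dict) -> list[str]:
    """Return a list of problems found in one captured telemetry event."""
    problems = []

    # Rule 1: every expected attribute must be present.
    missing = REQUIRED_ATTRS.get(name, set()) - set(attributes)
    if missing:
        problems.append(f"{name}: missing attributes {sorted(missing)}")

    # Rule 2: the sentinel token must never appear in any attribute value.
    for key, value in attributes.items():
        if SENTINEL_TOKEN in str(value):
            problems.append(f"{name}: sentinel token leaked in attribute {key!r}")

    return problems
```

An empty list means the event passes; a smoke run in this style would exit non-zero as soon as any event reports a problem.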

Hermetic local check (LOGFIRE_TOKEN and DATABASE_URL are blanked, and nothing is exported to Logfire):

env $(grep -v '^#' env.template | xargs) LOGFIRE_TOKEN= DATABASE_URL= \
  uv run imperfect-cli telemetry-smoke --no-emit-logfire

Logfire-exporting smoke, using LOGFIRE_TOKEN from .env.cli or the shell (the LOGFIRE_TOKEN line in env.template is filtered out so it cannot override the real token):

env $(grep -v '^#' env.template | grep -v '^LOGFIRE_TOKEN=' | xargs) \
  uv run imperfect-cli telemetry-smoke

Expected command output includes:

  • external.fetch.outcome with target=aqicn, status=http_error
  • environment.aqicn_failed with target=aqicn, status=http_error
  • "Alice fetch_data transient error, retrying" with target=alice.fetch_data, attempt=1, max_attempts=2
  • "Alice fetch_data connection error" with target=alice.fetch_data, attempt=2, max_attempts=2
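The two Alice lines follow the usual shape of a bounded retry: a non-final failure is logged as "retrying", and the final failure is logged as the connection error. The sketch below illustrates that pattern with a deterministic in-process stand-in client. It is an assumption-laden illustration, not the project's code: AlwaysFailingClient, fetch_with_retry, the logger name, and the choice of warning versus error level are all hypothetical, and the standard logging module stands in for Logfire.

```python
import logging

logger = logging.getLogger("alice_sketch")  # hypothetical logger name


class AlwaysFailingClient:
    """Deterministic in-process stand-in: every call raises ConnectionError."""

    def fetch_data(self) -> dict:
        raise ConnectionError("simulated connection failure")


def fetch_with_retry(client, max_attempts: int = 2):
    """Call client.fetch_data(), logging each failed attempt with its attributes."""
    for attempt in range(1, max_attempts + 1):
        try:
            return client.fetch_data()
        except ConnectionError:
            attrs = {
                "target": "alice.fetch_data",
                "attempt": attempt,
                "max_attempts": max_attempts,
            }
            if attempt < max_attempts:
                # Non-final failure: another attempt will follow.
                logger.warning("Alice fetch_data transient error, retrying", extra=attrs)
            else:
                # Final failure: retries are exhausted.
                logger.error("Alice fetch_data connection error", extra=attrs)
    return None
```

With max_attempts=2 and a client that always fails, this emits one message per attempt, carrying attempt=1 and attempt=2 respectively, which matches the attribute pattern listed above.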

Logfire Queries

Run these against the imperfect-api Logfire project after the exporting smoke.

Metric:

SELECT service_name, metric_name, attributes->>'target' AS target,
       attributes->>'status' AS status, sum(scalar_value) AS n,
       max(recorded_timestamp) AS latest
FROM metrics
WHERE recorded_timestamp > now() - interval '30 minutes'
  AND service_name = 'imperfect-cli'
  AND metric_name = 'external.fetch.outcome'
GROUP BY service_name, metric_name, target, status
ORDER BY latest DESC
LIMIT 20;

Warnings/errors:

SELECT service_name, message, level, attributes->>'target' AS target,
       attributes->>'status' AS status, attributes->>'attempt' AS attempt,
       attributes->>'max_attempts' AS max_attempts,
       attributes->>'error_type' AS error_type,
       attributes->>'status_code' AS status_code,
       count(*) AS n, max(start_timestamp) AS latest
FROM records
WHERE start_timestamp > now() - interval '30 minutes'
  AND service_name = 'imperfect-cli'
  AND message IN (
    'environment.aqicn_failed',
    'Alice fetch_data transient error, retrying',
    'Alice fetch_data connection error'
  )
GROUP BY service_name, message, level, target, status, attempt, max_attempts,
         error_type, status_code
ORDER BY latest DESC
LIMIT 50;
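
To confirm that all three messages from the warnings/errors query arrived, the result rows can be compared against the expected set. A minimal sketch, assuming the rows have been exported as a list of dicts keyed by column name; the missing_messages helper is hypothetical and is not part of imperfect-cli or Logfire.

```python
# Messages the warnings/errors query filters on.
EXPECTED_MESSAGES = {
    "environment.aqicn_failed",
    "Alice fetch_data transient error, retrying",
    "Alice fetch_data connection error",
}


def missing_messages(rows: list[dict]) -> set[str]:
    """Return the expected messages that do not appear in any result row."""
    seen = {row.get("message") for row in rows}
    return EXPECTED_MESSAGES - seen
```

An empty set means every expected message was recorded within the query window.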