Monitoring

The Foreman project uses a sponsored Grafana instance at https://theforeman.grafana.net for metrics and alerting.

Access

People who work on infrastructure can be added to the organization by Eric, Evgeni or Ewoud.

Dashboards

  • Blackbox Exporter (HTTP Prober) shows HTTP status at https://theforeman.grafana.net/d/NEzutrbMk/blackbox-exporter-http-prober
  • Restic Exporter shows backup status at https://theforeman.grafana.net/d/9f4a1fae-9438-41af-97f3-ca0f87f8ba3f/restic-exporter
  • Various OS-level views can be seen at https://theforeman.grafana.net/dashboards/f/integration---linux-node/

Alerting

Contact points

Contact points in Grafana are "notification groups" that can use different integrations (such as email or Slack) and targets (such as an email address).

Right now only one contact point is defined: grafana-default-email, which sends email to Eric, Ewoud and Evgeni.

Alert rules

pending package updates

When apt_upgrades_pending or yum_upgrades_pending is > 0, an alert is sent to grafana-default-email.
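
As a rough illustration of this condition, the sketch below evaluates it against a set of metric samples. The metric names come from the rule above; the function name, the dictionary input, and the choice to treat a missing metric as 0 are assumptions made for this example, not how Grafana evaluates the rule internally.

```python
def pending_updates_alert(samples):
    """Return True if either pending-upgrade metric is above zero.

    `samples` maps a metric name to its latest value (hypothetical input shape).
    """
    watched = ("apt_upgrades_pending", "yum_upgrades_pending")
    # Assumption of this sketch: a metric that is absent counts as 0.
    return any(samples.get(name, 0) > 0 for name in watched)
```

For example, `pending_updates_alert({"apt_upgrades_pending": 3})` evaluates to true, while a host reporting `{"yum_upgrades_pending": 0}` does not trigger the alert.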

reboot required

When node_reboot_required is > 0, an alert is sent to grafana-default-email.

missing backup

When time() - restic_snapshot_timestamp_seconds is > 90000 (that is, the last restic snapshot is older than 25 hours), an alert is sent to grafana-default-email.
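
The threshold arithmetic can be checked with a small sketch. The constant and function names are hypothetical, chosen for this example; only the 90000-second threshold and the "current time minus snapshot timestamp" comparison come from the rule above.

```python
import time

# 25 hours expressed in seconds: 25 * 60 * 60 = 90000
MAX_SNAPSHOT_AGE_SECONDS = 25 * 60 * 60


def backup_is_missing(snapshot_timestamp, now=None):
    """Return True if the snapshot is older than the 25-hour threshold.

    `snapshot_timestamp` is a Unix timestamp in seconds, like the value of
    restic_snapshot_timestamp_seconds. `now` defaults to the current time.
    """
    if now is None:
        now = time.time()
    # Mirrors the rule: time() - restic_snapshot_timestamp_seconds > 90000
    return now - snapshot_timestamp > MAX_SNAPSHOT_AGE_SECONDS
```

Because the comparison is strict, a snapshot that is exactly 90000 seconds old does not yet count as missing; the alert condition only holds once the age exceeds that value.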