Modules for Grafana alerts and dashboards

Alerting

For background on Grafana alerting, see the documentation here, and the official Grafana documentation for a deeper look.

The Terraform modules are separated per resource type; check the README in each module directory for specific examples. Below is an example for alerts that use a "Prometheus/Thanos" datasource and send notifications to "Google Chat".

Authentication

Set Grafana credentials as Terraform variables:

export TF_VAR_grafana_url="https://grafana.example.com"
export TF_VAR_grafana_username="admin"
export TF_VAR_grafana_password="super-secret"

These credentials are used by all modules to authenticate with the Grafana API.
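
As a minimal sketch of how these variables might be consumed in the root module: the snippet below declares them and passes them to the Grafana provider. The `url` and `auth` arguments are provider settings (`auth` accepts basic auth in `username:password` form). Whether the modules expect the provider to be configured this way in the root module is an assumption; check each module's README.

```hcl
terraform {
  required_providers {
    grafana = {
      source = "grafana/grafana"
    }
  }
}

# Populated from TF_VAR_grafana_url, TF_VAR_grafana_username, TF_VAR_grafana_password.
variable "grafana_url" {
  type        = string
  description = "Root URL of the Grafana instance"
}

variable "grafana_username" {
  type = string
}

variable "grafana_password" {
  type      = string
  sensitive = true # keeps the value out of plan output
}

# Assumption: the provider is configured once in the root module
# and inherited by the modules used below.
provider "grafana" {
  url  = var.grafana_url
  auth = "${var.grafana_username}:${var.grafana_password}"
}
```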


Directory Structure

Organize alerts, templates, and Terraform code as follows:

.
├── alerts/
│   ├── common-infra/
│   │   ├── loki/
│   │   │   └── alert-loki.yaml
│   │   └── thanos/
│   │       └── alert-thanos.yaml
│   ├── oncall/
│   │   └── alert-oncall.yaml
│   └── heartbeats/
│       └── alert-heartbeat.yaml
├── templates/
│   └── myteam/
│       └── gchat-message.tmpl
└── main.tf
  • Alerts: YAML files defining rule groups (apiVersion: 1, groups: [...]).
  • Templates: Notification templates for Google Chat contact points.
  • Terraform code: References modules and binds everything together.

Defining Secrets

Datasource URLs and credentials should be stored in Terraform variables, not hardcoded.

Example: Environment Variables

export TF_VAR_thanos_coin_prd_url="https://thanos.example.com"
export TF_VAR_thanos_coin_prd_user="reader"
export TF_VAR_thanos_coin_prd_pass="password"

export TF_VAR_loki_coin_prd_url="https://loki.example.com"
export TF_VAR_loki_coin_prd_user="reader"
export TF_VAR_loki_coin_prd_pass="password"

export TF_VAR_opsgenie_api_key="xxxxxx"
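
Each `TF_VAR_*` value needs a matching variable declaration in the root module. A possible set of declarations is sketched below; marking secrets as `sensitive` is a suggestion, not something the modules require. `gchat_url_coin` is included because the Google Chat example further down references it; setting it through `TF_VAR_gchat_url_coin` is an assumption.

```hcl
variable "thanos_coin_prd_url" {
  type = string
}

variable "thanos_coin_prd_user" {
  type = string
}

variable "thanos_coin_prd_pass" {
  type      = string
  sensitive = true
}

variable "loki_coin_prd_url" {
  type = string
}

variable "loki_coin_prd_user" {
  type = string
}

variable "loki_coin_prd_pass" {
  type      = string
  sensitive = true
}

variable "opsgenie_api_key" {
  type      = string
  sensitive = true
}

# Assumption: the Google Chat webhook URL is supplied the same way as the
# other secrets. Webhook URLs embed a token, so treat it as sensitive.
variable "gchat_url_coin" {
  type      = string
  sensitive = true
}
```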

Module Usage

Datasources

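Define multiple datasources (Prometheus, Loki, etc.) in one module call. Each datasource refers to its URL, username, and password by key; the keys are looked up in the datasource_urls, datasource_users, and datasource_passwords maps: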

module "datasource" {
  source = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/datasource?ref=main"

  datasources = {
    Thanos-Common-Infra-PRD = {
      type                = "prometheus"
      url_key             = "thanos_coin_prd"
      basic_auth_user_key = "thanos_coin_prd"
      pass_key            = "thanos_coin_prd"
      is_default          = true
    }
    Loki-Common-Infra-PRD = {
      type                = "loki"
      url_key             = "loki_coin_prd"
      basic_auth_user_key = "loki_coin_prd"
      pass_key            = "loki_coin_prd"
    }
  }

  datasource_urls = {
    thanos_coin_prd = var.thanos_coin_prd_url
    loki_coin_prd   = var.loki_coin_prd_url
  }

  datasource_users = {
    thanos_coin_prd = var.thanos_coin_prd_user
    loki_coin_prd   = var.loki_coin_prd_user
  }

  datasource_passwords = {
    thanos_coin_prd = var.thanos_coin_prd_pass
    loki_coin_prd   = var.loki_coin_prd_pass
  }
}
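
The alerts module later reads `module.datasource.datasource_uids`, a map keyed by datasource name. To inspect the generated UIDs after an apply, an output along these lines could be added. The output name is hypothetical.

```hcl
# Hypothetical output name; the value uses the same module output that the
# alert definitions section indexes by datasource name.
output "grafana_datasource_uids" {
  description = "Datasource UIDs keyed by datasource name"
  value       = module.datasource.datasource_uids
}
```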

Contact Points

Google Chat

Each Google Chat space is configured as a contact point:

module "gchat-contact-point-coin" {
  source             = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/contact-point-gchat?ref=main"
  gchat_url          = var.gchat_url_coin
  contact_point_name = "gchat-coin"
  templates_dir      = "templates/coin"
  template_prefix    = "coin-"
  disable_provenance = true
}

OpsGenie

OpsGenie contact points use API keys:

module "opsgenie-contact-point" {
  source             = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/contact-point-opsgenie?ref=main"
  contact_point_name = "opsgenie-dev"
  opsgenie_api_key   = var.opsgenie_api_key
}

Alert Folders

Organize alerts in Grafana folders for logical separation:

module "alert-folder" {
  source       = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/alert-folder?ref=main"
  alert-folder = "Common-Infra-Alerts"
}
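
The notification policy example in the next section also routes a "Common-Infra-OnCall-Alerts" folder. A second folder could be created by calling the same module again, as sketched here. The module label `alert-folder-oncall` is a hypothetical name.

```hcl
# Hypothetical second instance of the alert-folder module for on-call alerts.
module "alert-folder-oncall" {
  source       = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/alert-folder?ref=main"
  alert-folder = "Common-Infra-OnCall-Alerts"
}
```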

Notification Policies

Map folders to contact points (e.g., send “Common-Infra-Alerts” to Google Chat):

module "notification-policy" {
  source                    = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/notification-policy?ref=main"
  default_contact_point_uid = module.gchat-contact-point-coin.contact_point_name
  group_by                  = ["alertname"]

  folder_policies = {
    "Common-Infra-Alerts"        = module.gchat-contact-point-coin.contact_point_name
    "Common-Infra-OnCall-Alerts" = module.opsgenie-contact-point.contact_name
  }
}

Alert Definitions

Alert rules are defined in YAML and applied via the module:

module "alerting-coin" {
  source             = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/alerts?ref=main"
  alerts_dir         = "alerts/common-infra/thanos"
  datasource_uid     = module.datasource.datasource_uids["Thanos-Common-Infra-PRD"]
  folder_uid         = module.alert-folder.folder_uid
  receiver           = module.gchat-contact-point-coin.contact_point_name
  disable_provenance = true
}
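
Since each module call takes one alerts directory and one datasource UID, rules for another datasource would presumably need their own call. A sketch for the Loki rules from the directory layout above, reusing the same folder and contact point (the module label `alerting-coin-loki` is hypothetical):

```hcl
# Hypothetical second instance: Loki rules from alerts/common-infra/loki,
# evaluated against the Loki datasource defined in the datasource module.
module "alerting-coin-loki" {
  source             = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/alerts?ref=main"
  alerts_dir         = "alerts/common-infra/loki"
  datasource_uid     = module.datasource.datasource_uids["Loki-Common-Infra-PRD"]
  folder_uid         = module.alert-folder.folder_uid
  receiver           = module.gchat-contact-point-coin.contact_point_name
  disable_provenance = true
}
```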

Alert YAML Format

Alerts are defined in Grafana's YAML format. The easiest way to get a starting example is to define the alert in the Grafana UI and then export it with the "Export rules" button. Before using the export, remove the fields that the Terraform modules set automatically:

  • datasourceUid (set by the alerts module)
  • notification_settings (set by the notification policy module)

Each file must have apiVersion: 1 and define groups:

apiVersion: 1
groups:
  - name: infra-alerts
    interval: 1m
    rules:
      - uid: pod-restart-alert
        title: "Pod Restart Count High"
        condition: C
        data:
          - refId: A
            relativeTimeRange:
              from: 600
              to: 0
            model:
              expr: increase(kube_pod_container_status_restarts_total{}[5m]) > 3
              instant: true
              refId: A
          - refId: C
            datasourceUid: __expr__
            model:
              conditions:
                - evaluator:
                    params: [0]
                    type: gt
                  operator:
                    type: and
                  query:
                    params: [C]
                  reducer:
                    type: last
                  type: query
              expression: A
              type: threshold
        noDataState: OK
        execErrState: Error
        for: 5m
        annotations:
          description: "Pod is restarting too often"
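
The two requirements above (apiVersion: 1 and a groups list) can be checked locally before an apply. The sketch below uses Terraform's built-in `fileset`, `file`, and `yamldecode` functions; it is not how the alerts module itself is implemented, and all local and output names are hypothetical.

```hcl
locals {
  # Directory to check; same path as alerts_dir in the module example.
  alerts_dir  = "${path.module}/alerts/common-infra/thanos"
  alert_files = fileset(local.alerts_dir, "*.yaml")

  # Parse every alert file into a Terraform value, keyed by file name.
  alert_docs = {
    for f in local.alert_files :
    f => yamldecode(file("${local.alerts_dir}/${f}"))
  }

  # Files that lack apiVersion: 1 or have a missing or empty groups list.
  invalid_alert_files = [
    for f, doc in local.alert_docs : f
    if !try(doc.apiVersion == 1, false) || try(length(doc.groups), 0) == 0
  ]
}

output "invalid_alert_files" {
  description = "Alert YAML files that fail the basic format check"
  value       = local.invalid_alert_files
}
```

An empty list in the `invalid_alert_files` output after `terraform plan` means every file in the directory passed this basic check.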

Updating Alerts or Contact Points

  • Add new YAML files under alerts/ for additional rules.
  • Add new modules in main.tf for new datasources or contact points.
  • Run terraform apply to sync changes to Grafana.

You can find examples of alerts and templates in the examples folder.