[AWS] ECS Fargate + Prometheus + Grafana (with Terraform)


하우아유두잉 2024. 4. 1. 11:34

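I want to set up monitoring for services running on ECS Fargate.

After some research, I concluded that the Prometheus + Grafana combination is the most sensible choice.

Both are free and the output is decent.

If you want the details, a quick search will turn up plenty.

Enough preamble; here is a record of the main steps.

The result is nothing special, but it cost me a few days of trial and error.

First, let's look at the infrastructure setup.

I added a prometheus/node-exporter container to each of the backend and frontend services.

I did this as IaC (Infrastructure as Code) with Terraform.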

...

resource "aws_ecs_task_definition" "example" {
  # Other arguments of the task definition (family and so on) are omitted in this excerpt.

  container_definitions = jsonencode([
    {
      name      = "node-exporter"
      image     = "quay.io/prometheus/node-exporter:latest"
      cpu       = 256
      memory    = 512
      essential = true
      portMappings = [
        {
          containerPort = 9100
          hostPort      = 9100
          protocol      = "tcp"
        }
      ]
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          awslogs-group         = aws_cloudwatch_log_group.example.name
          awslogs-region        = "[ap-west-1]" # placeholder: replace with your own region
          awslogs-stream-prefix = "ecs"
        }
      }
    }
  ])
}

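To keep the story of my trial and error as short as possible:

I was trying to convert the docker-compose snippet below into Terraform.

I assumed the volume setting and the command were both mandatory, tried to apply them as-is, and failed.

On Fargate you cannot bind-mount a host path such as /; from what I read, using a volume there means mounting EFS.

The EFS mount kept failing and I could not track down the cause, so I switched to ecs-exporter.

Its metrics were too limited, though, so I tried node-exporter again and found that it works fine without the volume and the command. (All that digging for nothing.)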

# docker-compose.yml
...
  node-exporter:
    image: quay.io/prometheus/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /:/host:ro
    command:
      - "--path.rootfs=/host"
    ports:
      - 9100:9100

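If the container is added to the task and its status is Running, it worked!

Opening the node-exporter endpoint shows a result like the one below.

Now let's collect the metrics with Prometheus.

Below is the configuration file.

Set the target to your own value.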

# prometheus.yml

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:

  - job_name: "example"
    scheme: https
    static_configs:
      - targets: ["example.com:9100"]
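
Since node-exporter runs next to both the backend and the frontend service, the same job can list one target per service. The following is only a sketch under assumptions: the hostnames `backend.example.com` and `frontend.example.com`, the job name, and the `service` label are hypothetical and not taken from the actual setup.

```yaml
# prometheus.yml (sketch; hostnames, job name and label values are hypothetical)
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "node-exporter"
    scheme: https
    static_configs:
      # One entry per service so each target carries its own label.
      - targets: ["backend.example.com:9100"]
        labels:
          service: "backend"
      - targets: ["frontend.example.com:9100"]
        labels:
          service: "frontend"
```

The extra `service` label is attached to every series scraped from that target, which makes it easier to tell the two services apart in queries and dashboards.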

Below is the docker compose file for Prometheus.

# docker-compose.yml


version: "3.8"

volumes:
  prometheus_data: {}

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--web.console.libraries=/etc/prometheus/console_libraries"
      - "--web.console.templates=/etc/prometheus/consoles"
      - "--web.enable-lifecycle"
    ports:
      - 9090:9090

Start Prometheus

$ docker-compose up -d

Open localhost:9090 to check.

Go to Status > Targets.

If everything is working, the State shows UP.
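
The UP state on the Targets page corresponds to the `up` metric that Prometheus records for every scrape (1 when the scrape succeeds, 0 when it fails). Below is a minimal sketch of an alerting rule built on it. It assumes a hypothetical rule file named `alert.rules.yml` and the job name "example" from the configuration above; the alert name, the 5m duration, and the severity label are illustrative choices.

```yaml
# alert.rules.yml (hypothetical file name)
groups:
  - name: target-health
    rules:
      - alert: TargetDown
        # up is 0 whenever the last scrape of the target failed.
        expr: up{job="example"} == 0
        # Only fire after the target has been down for 5 minutes.
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Target {{ $labels.instance }} is down"
```

For Prometheus to load the rule, the file would have to be mounted into the container and listed under `rule_files` in prometheus.yml. Without an Alertmanager, a firing alert is only visible on the Alerts page of the Prometheus UI.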

Now let's add Grafana.

Create datasource.yml

# config file version
apiVersion: 1

deleteDatasources:
  - name: Prometheus

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    orgId: 1
    url: http://prometheus:9090
    basicAuth: false
    isDefault: false
    version: 1
    editable: false

Update docker-compose.yml

version: "3.8"

volumes:
  prometheus_data: {}

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--web.console.libraries=/etc/prometheus/console_libraries"
      - "--web.console.templates=/etc/prometheus/consoles"
      - "--web.enable-lifecycle"
    ports:
      - 9090:9090

  grafana:
    image: grafana/grafana
    ports:
      - 9091:3000
    volumes:
      - ./datasource.yml:/etc/grafana/provisioning/datasources/datasource.yml
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin

Restart docker-compose.

Open localhost:9091.

Create an admin account and log in.
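
Note that the compose file above enables anonymous access with the Admin role, so anyone who can reach the port has admin rights without logging in. If that is not intended, a variant along the following lines may be safer. This is a sketch under assumptions: the password value is a placeholder and the `grafana_data` volume name is hypothetical; `GF_SECURITY_ADMIN_USER` and `GF_SECURITY_ADMIN_PASSWORD` are Grafana's environment variables for the initial admin account.

```yaml
# docker-compose.yml fragment (sketch; credentials and volume name are placeholders)
volumes:
  grafana_data: {}

services:
  grafana:
    image: grafana/grafana
    ports:
      - 9091:3000
    volumes:
      - ./datasource.yml:/etc/grafana/provisioning/datasources/datasource.yml
      # Keep users and dashboards across container restarts.
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=change-me
      - GF_AUTH_ANONYMOUS_ENABLED=false
```

With anonymous access disabled, the login page appears at localhost:9091 and the admin credentials above apply on first start.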

Now let's add a dashboard.

Grafana is a tool that visualizes the time-series metrics stored by Prometheus.

It is convenient to grab a suitable template from the site below.

https://grafana.com/grafana/dashboards/?search=node


Since I used node-exporter, I added the template below.

https://grafana.com/grafana/dashboards/1860-node-exporter-full/


Copy the ID shown at the bottom right of the page.

Back in Grafana, click Dashboards > New > Import.

Paste the copied ID and click the Load button.
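
The import through the UI can also be replaced by file-based provisioning, in the same way the data source is provisioned. The following is a sketch of a dashboard provider file. It assumes a hypothetical file name `dashboards.yml`, a provider name of "default", and that the dashboard JSON downloaded from grafana.com has been placed under `/var/lib/grafana/dashboards` inside the container.

```yaml
# dashboards.yml (hypothetical name), to be mounted under
# /etc/grafana/provisioning/dashboards/ in the grafana container
apiVersion: 1

providers:
  - name: "default"
    orgId: 1
    type: file
    disableDeletion: false
    editable: true
    options:
      # Directory that Grafana scans for dashboard JSON files.
      path: /var/lib/grafana/dashboards
```

Both the provider file and the JSON directory would need matching volume entries in docker-compose.yml. A dashboard exported for the import dialog may also contain a data source placeholder that has to be adjusted before it loads correctly from a file.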

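After some time, once data has accumulated, it looks like the following.

That covers the basics of monitoring with Prometheus and Grafana.

It can even send alerts; for a free tool, that is remarkably generous.

There is more to explore in the details, but this is not the only thing on my plate, so I will stop here for now.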
