一、背景

  • 本地开发测试环境使用vm虚拟化搭建,同时运行这k8s集群,当前存在着一个问题则是如何监控其他虚拟主机(非k8s集群node节点)。为了降低监控成本这里同样采用prometheus进行监控。参考文档:https://github.com/prometheus/node_exporter

  • 官方就node expoter自动发现监控上没有做过多的介绍(事实上本地自建的虚拟机做自动发现功能存在一定的难度。对于云计算平台自动发现功能监控参考文档:https://prometheus.io/docs/prometheus/latest/configuration/configuration/)

  • 这里通过scrape_configs定义consul_sd_config对本地虚拟主机进行自动发现监控。参考文档:https://prometheus.io/docs/prometheus/latest/configuration/configuration/
  • 架构说明,通过将本地每台的虚拟主机的注册信息注册至consul,prometheus+consul部署自动发现虚拟主机。而后运行周期性注销连接失败的服务(主机下线)。

二、安装consul

[k8s-dev-test@rancher-k8s-conn consul]$ helm upgrade -i node-expoter-consul -n kube-ops .

三、关联promtheus

参考:prometheus配置additional 参考文档见:https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/additional-scrape-config.md

[k8s-dev-test@rancher-k8s-conn tmp]$ vi prometheus-additional.yaml
- job_name: 'node-expoter-consul-prometheus'
  consul_sd_configs:
  - server: 'node-expoter-consul.kube-ops:8500'
    services: ['server']

#备注:这里我仅定义了server这个组,如果有多个需求可以定义多个不同分类。例如webserver、dbserver

四、注册

参考脚本:https://github.com/xiangys0134/deploy/blob/master/%E5%BE%AE%E6%9C%8D%E5%8A%A1/prometheus/register_consul_node.sh

#!/bin/bash
# linux初始化nodeport
# Author yousong.xiang
# Date 2021.9.30
# v1.0.1

node_exporter_version='1.1.2'
# download_path="https://github.com/prometheus/node_exporter/releases/download/v{node_exporter_version}/node_exporter-{node_exporter_version}.linux-amd64.tar.gz"
download_path="https://shelllinux.oss-cn-shanghai.aliyuncs.com/deploy/source/node_exporter-{node_exporter_version}.linux-amd64.tar.gz"
node_exporter_conf='/usr/lib/systemd/system/node_exporter.service'

function env_check {
    uid=(id -u)
    if [ {uid} -ne 0 ]; then
        echo '==此脚本需要root用户执行,程序即将退出.'
        exit 2
    fi

    ping -c 1 -W 2 www.aliyun.com >/dev/null 2>&1
    if [? -ne 0 ]; then
        echo '==网络不通,请检查网络'
        exit 6
    fi
}

function check_rpm {
    rpm_name=1
    num=`rpm -qa|grep{rpm_name}|wc -l`
    echo {num}
}

function register_conf {
  if [ ! -f{node_exporter_conf} ]; then
    cat>>{node_exporter_conf}<<EOF
[Unit]
Description=node_exporter

[Service]
Type=simple
Restart=on-failure
ExecStart=/usr/local/node_exporter/node_exporter --web.listen-address=0.0.0.0:9101

[Install]
WantedBy=multi-user.target
EOF
  fi
  systemctl daemon-reload
  systemctl start node_exporter.service
  systemctl enable node_exporter.service
}

function register_consul {
  service_name=1
  instance_id=2
  ip=3
  port=4
  curl -X PUT -d '{"id":"'"instance_id"'",
                   "name":"'"service_name"'",
                   "address":"'"ip"'",
                   "port":'"port"',
                   "tags":["'"service_name"'"],
                   "checks":[{"http": "http://'"ip"':'"port"'","interval":"10s"}]}' \
                   http://192.168.7.45:30501/v1/agent/service/register
}

function main {
  env_check
  if [ `check_rpm wget` -eq 0 ]; then
    yum install -y wget
  fi

  echo '正在下载包...'
  wget {download_path}
  if [? -ne 0 ]; then
    echo '包下载失败,请检查网络!'
    exit 3
  fi
  tar -zxvf node_exporter-{node_exporter_version}.linux-amd64.tar.gz
  mv node_exporter-{node_exporter_version}.linux-amd64 /usr/local/node_exporter

  register_conf
  rm -rf node_exporter-{node_exporter_version}.linux-amd64.tar.gz
}



if [# -eq 0 ]; then
  main
else
  main
  #传参示例:bash -x a.sh server rancher-k8s-conn 192.168.7.58 9101
  register_consul $@
fi

五、注销服务

consul注销服务参考:https://www.consul.io/api-docs/catalog

周期性的注销无效服务(可以将脚本封装至docker镜像中,通过cron周期性的运行任务),参考脚本如下:

#!/bin/bash
# consul删除安装
# Author yousong.xiang
# Date 2021.9.30
# v1.0.1

#k8s service负载地址
consul_address='node-expoter-consul.kube-ops:8500'

[ -f /etc/profile ] && . /etc/profile

yum install -y epel-release jq
yum install -y jq

function consul_del {
  CONSUL_NODES=(curl -s -XGET http://{consul_address}/v1/catalog/nodes | jq -r '.[].Address')
  CONSUL_CRITICAL=(curl -s -XGET http://{consul_address}/v1/health/state/critical | jq -r '.[].ServiceID')
  for critical in {CONSUL_CRITICAL}

  do
    for consul_ip in{CONSUL_NODES}
    do
        curl -s -XPUT http://{consul_ip}:8500/v1/agent/service/deregister/{critical} &> /dev/null
        echo "${critical} 已删除" >> consul_del-`date +%Y%m%d`.log
    done
  done
}

consul_del
最后修改日期: 2021年10月8日

作者

留言

撰写回覆或留言

发布留言必须填写的电子邮件地址不会公开。