Configure Docker: /etc/docker/daemon.json
Specify the address and port of the metrics endpoint; Prometheus periodically scrapes data from this port.
{ "metrics-addr" : "127.0.0.1:9323", "experimental" : true }
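If daemon.json already contains other settings, the two keys can be merged in rather than overwriting the file. The following is a minimal sketch, assuming Python 3; the function name `enable_metrics` and the sample `log-driver` entry are illustrative and not part of Docker.

```python
import json


def enable_metrics(config: dict, addr: str = "127.0.0.1:9323") -> dict:
    """Return a copy of a daemon.json dict with the metrics endpoint enabled."""
    merged = dict(config)            # leave the caller's dict untouched
    merged["metrics-addr"] = addr    # address the daemon serves /metrics on
    merged["experimental"] = True    # set alongside metrics-addr, as in the config above
    return merged


# Hypothetical existing configuration, used only for illustration.
existing = {"log-driver": "json-file"}
print(json.dumps(enable_metrics(existing), indent=2))
```

The printed JSON can then be written back to /etc/docker/daemon.json, after which the Docker daemon needs a restart to pick up the change.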
View Docker status information
iMac:prometheus neo$ curl http://localhost:9323/metrics
# HELP builder_builds_failed_total Number of failed image builds
# TYPE builder_builds_failed_total counter
builder_builds_failed_total{reason="build_canceled"} 0
builder_builds_failed_total{reason="build_target_not_reachable_error"} 0
builder_builds_failed_total{reason="command_not_supported_error"} 0
builder_builds_failed_total{reason="dockerfile_empty_error"} 0
builder_builds_failed_total{reason="dockerfile_syntax_error"} 0
builder_builds_failed_total{reason="error_processing_commands_error"} 0
builder_builds_failed_total{reason="missing_onbuild_arguments_error"} 0
builder_builds_failed_total{reason="unknown_instruction_error"} 0
# HELP builder_builds_triggered_total Number of triggered image builds
# TYPE builder_builds_triggered_total counter
builder_builds_triggered_total 0
# HELP engine_daemon_container_actions_seconds The number of seconds it takes to process each container action
# TYPE engine_daemon_container_actions_seconds histogram
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.005"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.01"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.025"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.05"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.1"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.25"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.5"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="1"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="2.5"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="5"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="10"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="+Inf"} 1
engine_daemon_container_actions_seconds_sum{action="changes"} 0
engine_daemon_container_actions_seconds_count{action="changes"} 1
engine_daemon_container_actions_seconds_bucket{action="commit",le="0.005"} 1
engine_daemon_container_actions_seconds_bucket{action="commit",le="0.01"} 1
engine_daemon_container_actions_seconds_bucket{action="commit",le="0.025"} 1
engine_daemon_container_actions_seconds_bucket{action="commit",le="0.05"} 1
engine_daemon_container_actions_seconds_bucket{action="commit",le="0.1"} 1
engine_daemon_container_actions_seconds_bucket{action="commit",le="0.25"} 1
engine_daemon_container_actions_seconds_bucket{action="commit",le="0.5"} 1
engine_daemon_container_actions_seconds_bucket{action="commit",le="1"} 1
engine_daemon_container_actions_seconds_bucket{action="commit",le="2.5"} 1
engine_daemon_container_actions_seconds_bucket{action="commit",le="5"} 1
engine_daemon_container_actions_seconds_bucket{action="commit",le="10"} 1
engine_daemon_container_actions_seconds_bucket{action="commit",le="+Inf"} 1
engine_daemon_container_actions_seconds_sum{action="commit"} 0
engine_daemon_container_actions_seconds_count{action="commit"} 1
engine_daemon_container_actions_seconds_bucket{action="create",le="0.005"} 1
engine_daemon_container_actions_seconds_bucket{action="create",le="0.01"} 1
engine_daemon_container_actions_seconds_bucket{action="create",le="0.025"} 1
engine_daemon_container_actions_seconds_bucket{action="create",le="0.05"} 1
engine_daemon_container_actions_seconds_bucket{action="create",le="0.1"} 1
engine_daemon_container_actions_seconds_bucket{action="create",le="0.25"} 1
engine_daemon_container_actions_seconds_bucket{action="create",le="0.5"} 1
engine_daemon_container_actions_seconds_bucket{action="create",le="1"} 2
engine_daemon_container_actions_seconds_bucket{action="create",le="2.5"} 2
engine_daemon_container_actions_seconds_bucket{action="create",le="5"} 2
engine_daemon_container_actions_seconds_bucket{action="create",le="10"} 2
engine_daemon_container_actions_seconds_bucket{action="create",le="+Inf"} 2
engine_daemon_container_actions_seconds_sum{action="create"} 0.552623576
engine_daemon_container_actions_seconds_count{action="create"} 2
engine_daemon_container_actions_seconds_bucket{action="delete",le="0.005"} 1
engine_daemon_container_actions_seconds_bucket{action="delete",le="0.01"} 1
engine_daemon_container_actions_seconds_bucket{action="delete",le="0.025"} 1
engine_daemon_container_actions_seconds_bucket{action="delete",le="0.05"} 1
engine_daemon_container_actions_seconds_bucket{action="delete",le="0.1"} 2
engine_daemon_container_actions_seconds_bucket{action="delete",le="0.25"} 2
engine_daemon_container_actions_seconds_bucket{action="delete",le="0.5"} 2
engine_daemon_container_actions_seconds_bucket{action="delete",le="1"} 2
engine_daemon_container_actions_seconds_bucket{action="delete",le="2.5"} 2
engine_daemon_container_actions_seconds_bucket{action="delete",le="5"} 2
engine_daemon_container_actions_seconds_bucket{action="delete",le="10"} 2
engine_daemon_container_actions_seconds_bucket{action="delete",le="+Inf"} 2
engine_daemon_container_actions_seconds_sum{action="delete"} 0.097789156
engine_daemon_container_actions_seconds_count{action="delete"} 2
engine_daemon_container_actions_seconds_bucket{action="start",le="0.005"} 1
engine_daemon_container_actions_seconds_bucket{action="start",le="0.01"} 1
engine_daemon_container_actions_seconds_bucket{action="start",le="0.025"} 1
engine_daemon_container_actions_seconds_bucket{action="start",le="0.05"} 1
engine_daemon_container_actions_seconds_bucket{action="start",le="0.1"} 1
engine_daemon_container_actions_seconds_bucket{action="start",le="0.25"} 1
engine_daemon_container_actions_seconds_bucket{action="start",le="0.5"} 1
engine_daemon_container_actions_seconds_bucket{action="start",le="1"} 1
engine_daemon_container_actions_seconds_bucket{action="start",le="2.5"} 3
engine_daemon_container_actions_seconds_bucket{action="start",le="5"} 3
engine_daemon_container_actions_seconds_bucket{action="start",le="10"} 3
engine_daemon_container_actions_seconds_bucket{action="start",le="+Inf"} 3
engine_daemon_container_actions_seconds_sum{action="start"} 2.804409176
engine_daemon_container_actions_seconds_count{action="start"} 3
# HELP engine_daemon_container_states_containers The count of containers in various states
# TYPE engine_daemon_container_states_containers gauge
engine_daemon_container_states_containers{state="paused"} 0
engine_daemon_container_states_containers{state="running"} 2
engine_daemon_container_states_containers{state="stopped"} 2
# HELP engine_daemon_engine_cpus_cpus The number of cpus that the host system of the engine has
# TYPE engine_daemon_engine_cpus_cpus gauge
engine_daemon_engine_cpus_cpus 2
# HELP engine_daemon_engine_info The information related to the engine and the OS it is running on
# TYPE engine_daemon_engine_info gauge
engine_daemon_engine_info{architecture="x86_64",commit="ff3fbc9d55",daemon_id="JXJ2:2434:PD5N:4UXM:POXB:ANLF:HHOE:G25W:Y3AG:UFUO:CBZP:H7K4",graphdriver="overlay2",kernel="4.19.76-linuxkit",os="Docker Desktop",os_type="linux",version="19.03.13-beta2"} 1
# HELP engine_daemon_engine_memory_bytes The number of bytes of memory that the host system of the engine has
# TYPE engine_daemon_engine_memory_bytes gauge
engine_daemon_engine_memory_bytes 2.088206336e+09
# HELP engine_daemon_events_subscribers_total The number of current subscribers to events
# TYPE engine_daemon_events_subscribers_total gauge
engine_daemon_events_subscribers_total 7
# HELP engine_daemon_events_total The number of events logged
# TYPE engine_daemon_events_total counter
engine_daemon_events_total 11
# HELP engine_daemon_health_checks_failed_total The total number of failed health checks
# TYPE engine_daemon_health_checks_failed_total counter
engine_daemon_health_checks_failed_total 0
# HELP engine_daemon_health_checks_total The total number of health checks
# TYPE engine_daemon_health_checks_total counter
engine_daemon_health_checks_total 0
# HELP engine_daemon_network_actions_seconds The number of seconds it takes to process each network action
# TYPE engine_daemon_network_actions_seconds histogram
engine_daemon_network_actions_seconds_bucket{action="allocate",le="0.005"} 0
engine_daemon_network_actions_seconds_bucket{action="allocate",le="0.01"} 0
engine_daemon_network_actions_seconds_bucket{action="allocate",le="0.025"} 0
engine_daemon_network_actions_seconds_bucket{action="allocate",le="0.05"} 0
engine_daemon_network_actions_seconds_bucket{action="allocate",le="0.1"} 0
engine_daemon_network_actions_seconds_bucket{action="allocate",le="0.25"} 1
engine_daemon_network_actions_seconds_bucket{action="allocate",le="0.5"} 1
engine_daemon_network_actions_seconds_bucket{action="allocate",le="1"} 2
engine_daemon_network_actions_seconds_bucket{action="allocate",le="2.5"} 2
engine_daemon_network_actions_seconds_bucket{action="allocate",le="5"} 2
engine_daemon_network_actions_seconds_bucket{action="allocate",le="10"} 2
engine_daemon_network_actions_seconds_bucket{action="allocate",le="+Inf"} 2
engine_daemon_network_actions_seconds_sum{action="allocate"} 0.721134186
engine_daemon_network_actions_seconds_count{action="allocate"} 2
engine_daemon_network_actions_seconds_bucket{action="connect",le="0.005"} 0
engine_daemon_network_actions_seconds_bucket{action="connect",le="0.01"} 0
engine_daemon_network_actions_seconds_bucket{action="connect",le="0.025"} 0
engine_daemon_network_actions_seconds_bucket{action="connect",le="0.05"} 0
engine_daemon_network_actions_seconds_bucket{action="connect",le="0.1"} 0
engine_daemon_network_actions_seconds_bucket{action="connect",le="0.25"} 1
engine_daemon_network_actions_seconds_bucket{action="connect",le="0.5"} 1
engine_daemon_network_actions_seconds_bucket{action="connect",le="1"} 2
engine_daemon_network_actions_seconds_bucket{action="connect",le="2.5"} 2
engine_daemon_network_actions_seconds_bucket{action="connect",le="5"} 2
engine_daemon_network_actions_seconds_bucket{action="connect",le="10"} 2
engine_daemon_network_actions_seconds_bucket{action="connect",le="+Inf"} 2
engine_daemon_network_actions_seconds_sum{action="connect"} 0.70473929
engine_daemon_network_actions_seconds_count{action="connect"} 2
# HELP etcd_debugging_snap_save_marshalling_duration_seconds The marshalling cost distributions of save called by snapshot.
# TYPE etcd_debugging_snap_save_marshalling_duration_seconds histogram
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.001"} 0
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.002"} 0
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.004"} 0
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.008"} 0
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.016"} 0
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.032"} 0
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.064"} 0
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.128"} 0
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.256"} 0
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.512"} 0
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="1.024"} 0
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="2.048"} 0
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="4.096"} 0
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="8.192"} 0
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="+Inf"} 0
etcd_debugging_snap_save_marshalling_duration_seconds_sum 0
etcd_debugging_snap_save_marshalling_duration_seconds_count 0
# HELP etcd_debugging_snap_save_total_duration_seconds The total latency distributions of save called by snapshot.
# TYPE etcd_debugging_snap_save_total_duration_seconds histogram
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.001"} 0
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.002"} 0
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.004"} 0
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.008"} 0
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.016"} 0
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.032"} 0
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.064"} 0
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.128"} 0
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.256"} 0
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.512"} 0
etcd_debugging_snap_save_total_duration_seconds_bucket{le="1.024"} 0
etcd_debugging_snap_save_total_duration_seconds_bucket{le="2.048"} 0
etcd_debugging_snap_save_total_duration_seconds_bucket{le="4.096"} 0
etcd_debugging_snap_save_total_duration_seconds_bucket{le="8.192"} 0
etcd_debugging_snap_save_total_duration_seconds_bucket{le="+Inf"} 0
etcd_debugging_snap_save_total_duration_seconds_sum 0
etcd_debugging_snap_save_total_duration_seconds_count 0
# HELP etcd_disk_wal_fsync_duration_seconds The latency distributions of fsync called by wal.
# TYPE etcd_disk_wal_fsync_duration_seconds histogram
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.002"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.004"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.008"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.016"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.032"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.064"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.128"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.256"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.512"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="1.024"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="2.048"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="4.096"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="8.192"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="+Inf"} 0
etcd_disk_wal_fsync_duration_seconds_sum 0
etcd_disk_wal_fsync_duration_seconds_count 0
# HELP etcd_snap_db_fsync_duration_seconds The latency distributions of fsyncing .snap.db file
# TYPE etcd_snap_db_fsync_duration_seconds histogram
etcd_snap_db_fsync_duration_seconds_bucket{le="0.001"} 0
etcd_snap_db_fsync_duration_seconds_bucket{le="0.002"} 0
etcd_snap_db_fsync_duration_seconds_bucket{le="0.004"} 0
etcd_snap_db_fsync_duration_seconds_bucket{le="0.008"} 0
etcd_snap_db_fsync_duration_seconds_bucket{le="0.016"} 0
etcd_snap_db_fsync_duration_seconds_bucket{le="0.032"} 0
etcd_snap_db_fsync_duration_seconds_bucket{le="0.064"} 0
etcd_snap_db_fsync_duration_seconds_bucket{le="0.128"} 0
etcd_snap_db_fsync_duration_seconds_bucket{le="0.256"} 0
etcd_snap_db_fsync_duration_seconds_bucket{le="0.512"} 0
etcd_snap_db_fsync_duration_seconds_bucket{le="1.024"} 0
etcd_snap_db_fsync_duration_seconds_bucket{le="2.048"} 0
etcd_snap_db_fsync_duration_seconds_bucket{le="4.096"} 0
etcd_snap_db_fsync_duration_seconds_bucket{le="8.192"} 0
etcd_snap_db_fsync_duration_seconds_bucket{le="+Inf"} 0
etcd_snap_db_fsync_duration_seconds_sum 0
etcd_snap_db_fsync_duration_seconds_count 0
# HELP etcd_snap_db_save_total_duration_seconds The total latency distributions of v3 snapshot save
# TYPE etcd_snap_db_save_total_duration_seconds histogram
etcd_snap_db_save_total_duration_seconds_bucket{le="0.1"} 0
etcd_snap_db_save_total_duration_seconds_bucket{le="0.2"} 0
etcd_snap_db_save_total_duration_seconds_bucket{le="0.4"} 0
etcd_snap_db_save_total_duration_seconds_bucket{le="0.8"} 0
etcd_snap_db_save_total_duration_seconds_bucket{le="1.6"} 0
etcd_snap_db_save_total_duration_seconds_bucket{le="3.2"} 0
etcd_snap_db_save_total_duration_seconds_bucket{le="6.4"} 0
etcd_snap_db_save_total_duration_seconds_bucket{le="12.8"} 0
etcd_snap_db_save_total_duration_seconds_bucket{le="25.6"} 0
etcd_snap_db_save_total_duration_seconds_bucket{le="51.2"} 0
etcd_snap_db_save_total_duration_seconds_bucket{le="+Inf"} 0
etcd_snap_db_save_total_duration_seconds_sum 0
etcd_snap_db_save_total_duration_seconds_count 0
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.1441e-05
go_gc_duration_seconds{quantile="0.25"} 1.7381e-05
go_gc_duration_seconds{quantile="0.5"} 4.7132e-05
go_gc_duration_seconds{quantile="0.75"} 8.847e-05
go_gc_duration_seconds{quantile="1"} 0.000336452
go_gc_duration_seconds_sum 0.000573966
go_gc_duration_seconds_count 7
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 124
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 1.3152408e+07
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 3.7942088e+07
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.458259e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 239116
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 2.4064e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 1.3152408e+07
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 4.8480256e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 1.67936e+07
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 134382
# HELP go_memstats_heap_released_bytes_total Total number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes_total counter
go_memstats_heap_released_bytes_total 4.6186496e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.5273856e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.6024955900357985e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 373498
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 3472
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 215424
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 229376
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 1.8665712e+07
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 542885
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 1.835008e+06
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 1.835008e+06
# HELP go_memstats_sys_bytes Number of bytes obtained by system. Sum of all system allocations.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.1762168e+07
# HELP http_request_duration_microseconds The HTTP request latencies in microseconds.
# TYPE http_request_duration_microseconds summary
http_request_duration_microseconds{handler="prometheus",quantile="0.5"} 5785.224
http_request_duration_microseconds{handler="prometheus",quantile="0.9"} 18160.443
http_request_duration_microseconds{handler="prometheus",quantile="0.99"} 18160.443
http_request_duration_microseconds_sum{handler="prometheus"} 27367.838
http_request_duration_microseconds_count{handler="prometheus"} 3
# HELP http_request_size_bytes The HTTP request sizes in bytes.
# TYPE http_request_size_bytes summary
http_request_size_bytes{handler="prometheus",quantile="0.5"} 232
http_request_size_bytes{handler="prometheus",quantile="0.9"} 232
http_request_size_bytes{handler="prometheus",quantile="0.99"} 232
http_request_size_bytes_sum{handler="prometheus"} 696
http_request_size_bytes_count{handler="prometheus"} 3
# HELP http_requests_total Total number of HTTP requests made.
# TYPE http_requests_total counter
http_requests_total{code="200",handler="prometheus",method="get"} 3
# HELP http_response_size_bytes The HTTP response sizes in bytes.
# TYPE http_response_size_bytes summary
http_response_size_bytes{handler="prometheus",quantile="0.5"} 4145
http_response_size_bytes{handler="prometheus",quantile="0.9"} 4171
http_response_size_bytes{handler="prometheus",quantile="0.99"} 4171
http_response_size_bytes_sum{handler="prometheus"} 12422
http_response_size_bytes_count{handler="prometheus"} 3
# HELP logger_log_entries_size_greater_than_buffer_total Number of log entries which are larger than the log buffer
# TYPE logger_log_entries_size_greater_than_buffer_total counter
logger_log_entries_size_greater_than_buffer_total 0
# HELP logger_log_read_operations_failed_total Number of log reads from container stdio that failed
# TYPE logger_log_read_operations_failed_total counter
logger_log_read_operations_failed_total 0
# HELP logger_log_write_operations_failed_total Number of log write operations that failed
# TYPE logger_log_write_operations_failed_total counter
logger_log_write_operations_failed_total 0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 1.36
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 88
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 6.0104704e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.6024954353e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.223262208e+09
# HELP swarm_dispatcher_scheduling_delay_seconds Scheduling delay is the time a task takes to go from NEW to RUNNING state.
# TYPE swarm_dispatcher_scheduling_delay_seconds histogram
swarm_dispatcher_scheduling_delay_seconds_bucket{le="0.005"} 0
swarm_dispatcher_scheduling_delay_seconds_bucket{le="0.01"} 0
swarm_dispatcher_scheduling_delay_seconds_bucket{le="0.025"} 0
swarm_dispatcher_scheduling_delay_seconds_bucket{le="0.05"} 0
swarm_dispatcher_scheduling_delay_seconds_bucket{le="0.1"} 0
swarm_dispatcher_scheduling_delay_seconds_bucket{le="0.25"} 0
swarm_dispatcher_scheduling_delay_seconds_bucket{le="0.5"} 0
swarm_dispatcher_scheduling_delay_seconds_bucket{le="1"} 0
swarm_dispatcher_scheduling_delay_seconds_bucket{le="2.5"} 0
swarm_dispatcher_scheduling_delay_seconds_bucket{le="5"} 0
swarm_dispatcher_scheduling_delay_seconds_bucket{le="10"} 0
swarm_dispatcher_scheduling_delay_seconds_bucket{le="+Inf"} 0
swarm_dispatcher_scheduling_delay_seconds_sum 0
swarm_dispatcher_scheduling_delay_seconds_count 0
# HELP swarm_manager_configs_total The number of configs in the cluster object store
# TYPE swarm_manager_configs_total gauge
swarm_manager_configs_total 0
# HELP swarm_manager_leader Indicates if this manager node is a leader
# TYPE swarm_manager_leader gauge
swarm_manager_leader 0
# HELP swarm_manager_networks_total The number of networks in the cluster object store
# TYPE swarm_manager_networks_total gauge
swarm_manager_networks_total 0
# HELP swarm_manager_nodes The number of nodes
# TYPE swarm_manager_nodes gauge
swarm_manager_nodes{state="disconnected"} 0
swarm_manager_nodes{state="down"} 0
swarm_manager_nodes{state="ready"} 0
swarm_manager_nodes{state="unknown"} 0
# HELP swarm_manager_secrets_total The number of secrets in the cluster object store
# TYPE swarm_manager_secrets_total gauge
swarm_manager_secrets_total 0
# HELP swarm_manager_services_total The number of services in the cluster object store
# TYPE swarm_manager_services_total gauge
swarm_manager_services_total 0
# HELP swarm_manager_tasks_total The number of tasks in the cluster object store
# TYPE swarm_manager_tasks_total gauge
swarm_manager_tasks_total{state="accepted"} 0
swarm_manager_tasks_total{state="assigned"} 0
swarm_manager_tasks_total{state="complete"} 0
swarm_manager_tasks_total{state="failed"} 0
swarm_manager_tasks_total{state="new"} 0
swarm_manager_tasks_total{state="orphaned"} 0
swarm_manager_tasks_total{state="pending"} 0
swarm_manager_tasks_total{state="preparing"} 0
swarm_manager_tasks_total{state="ready"} 0
swarm_manager_tasks_total{state="rejected"} 0
swarm_manager_tasks_total{state="remove"} 0
swarm_manager_tasks_total{state="running"} 0
swarm_manager_tasks_total{state="shutdown"} 0
swarm_manager_tasks_total{state="starting"} 0
# HELP swarm_node_manager Whether this node is a manager or not
# TYPE swarm_node_manager gauge
swarm_node_manager 0
# HELP swarm_raft_snapshot_latency_seconds Raft snapshot create latency.
# TYPE swarm_raft_snapshot_latency_seconds histogram
swarm_raft_snapshot_latency_seconds_bucket{le="0.005"} 0
swarm_raft_snapshot_latency_seconds_bucket{le="0.01"} 0
swarm_raft_snapshot_latency_seconds_bucket{le="0.025"} 0
swarm_raft_snapshot_latency_seconds_bucket{le="0.05"} 0
swarm_raft_snapshot_latency_seconds_bucket{le="0.1"} 0
swarm_raft_snapshot_latency_seconds_bucket{le="0.25"} 0
swarm_raft_snapshot_latency_seconds_bucket{le="0.5"} 0
swarm_raft_snapshot_latency_seconds_bucket{le="1"} 0
swarm_raft_snapshot_latency_seconds_bucket{le="2.5"} 0
swarm_raft_snapshot_latency_seconds_bucket{le="5"} 0
swarm_raft_snapshot_latency_seconds_bucket{le="10"} 0
swarm_raft_snapshot_latency_seconds_bucket{le="+Inf"} 0
swarm_raft_snapshot_latency_seconds_sum 0
swarm_raft_snapshot_latency_seconds_count 0
# HELP swarm_raft_transaction_latency_seconds Raft transaction latency.
# TYPE swarm_raft_transaction_latency_seconds histogram
swarm_raft_transaction_latency_seconds_bucket{le="0.005"} 0
swarm_raft_transaction_latency_seconds_bucket{le="0.01"} 0
swarm_raft_transaction_latency_seconds_bucket{le="0.025"} 0
swarm_raft_transaction_latency_seconds_bucket{le="0.05"} 0
swarm_raft_transaction_latency_seconds_bucket{le="0.1"} 0
swarm_raft_transaction_latency_seconds_bucket{le="0.25"} 0
swarm_raft_transaction_latency_seconds_bucket{le="0.5"} 0
swarm_raft_transaction_latency_seconds_bucket{le="1"} 0
swarm_raft_transaction_latency_seconds_bucket{le="2.5"} 0
swarm_raft_transaction_latency_seconds_bucket{le="5"} 0
swarm_raft_transaction_latency_seconds_bucket{le="10"} 0
swarm_raft_transaction_latency_seconds_bucket{le="+Inf"} 0
swarm_raft_transaction_latency_seconds_sum 0
swarm_raft_transaction_latency_seconds_count 0
# HELP swarm_store_batch_latency_seconds Raft store batch latency.
# TYPE swarm_store_batch_latency_seconds histogram
swarm_store_batch_latency_seconds_bucket{le="0.005"} 0
swarm_store_batch_latency_seconds_bucket{le="0.01"} 0
swarm_store_batch_latency_seconds_bucket{le="0.025"} 0
swarm_store_batch_latency_seconds_bucket{le="0.05"} 0
swarm_store_batch_latency_seconds_bucket{le="0.1"} 0
swarm_store_batch_latency_seconds_bucket{le="0.25"} 0
swarm_store_batch_latency_seconds_bucket{le="0.5"} 0
swarm_store_batch_latency_seconds_bucket{le="1"} 0
swarm_store_batch_latency_seconds_bucket{le="2.5"} 0
swarm_store_batch_latency_seconds_bucket{le="5"} 0
swarm_store_batch_latency_seconds_bucket{le="10"} 0
swarm_store_batch_latency_seconds_bucket{le="+Inf"} 0
swarm_store_batch_latency_seconds_sum 0
swarm_store_batch_latency_seconds_count 0
# HELP swarm_store_lookup_latency_seconds Raft store read latency.
# TYPE swarm_store_lookup_latency_seconds histogram
swarm_store_lookup_latency_seconds_bucket{le="0.005"} 0
swarm_store_lookup_latency_seconds_bucket{le="0.01"} 0
swarm_store_lookup_latency_seconds_bucket{le="0.025"} 0
swarm_store_lookup_latency_seconds_bucket{le="0.05"} 0
swarm_store_lookup_latency_seconds_bucket{le="0.1"} 0
swarm_store_lookup_latency_seconds_bucket{le="0.25"} 0
swarm_store_lookup_latency_seconds_bucket{le="0.5"} 0
swarm_store_lookup_latency_seconds_bucket{le="1"} 0
swarm_store_lookup_latency_seconds_bucket{le="2.5"} 0
swarm_store_lookup_latency_seconds_bucket{le="5"} 0
swarm_store_lookup_latency_seconds_bucket{le="10"} 0
swarm_store_lookup_latency_seconds_bucket{le="+Inf"} 0
swarm_store_lookup_latency_seconds_sum 0
swarm_store_lookup_latency_seconds_count 0
# HELP swarm_store_memory_store_lock_duration_seconds Duration for which the raft memory store lock was held.
# TYPE swarm_store_memory_store_lock_duration_seconds histogram
swarm_store_memory_store_lock_duration_seconds_bucket{le="0.005"} 0
swarm_store_memory_store_lock_duration_seconds_bucket{le="0.01"} 0
swarm_store_memory_store_lock_duration_seconds_bucket{le="0.025"} 0
swarm_store_memory_store_lock_duration_seconds_bucket{le="0.05"} 0
swarm_store_memory_store_lock_duration_seconds_bucket{le="0.1"} 0
swarm_store_memory_store_lock_duration_seconds_bucket{le="0.25"} 0
swarm_store_memory_store_lock_duration_seconds_bucket{le="0.5"} 0
swarm_store_memory_store_lock_duration_seconds_bucket{le="1"} 0
swarm_store_memory_store_lock_duration_seconds_bucket{le="2.5"} 0
swarm_store_memory_store_lock_duration_seconds_bucket{le="5"} 0
swarm_store_memory_store_lock_duration_seconds_bucket{le="10"} 0
swarm_store_memory_store_lock_duration_seconds_bucket{le="+Inf"} 0
swarm_store_memory_store_lock_duration_seconds_sum 0
swarm_store_memory_store_lock_duration_seconds_count 0
# HELP swarm_store_read_tx_latency_seconds Raft store read tx latency.
# TYPE swarm_store_read_tx_latency_seconds histogram
swarm_store_read_tx_latency_seconds_bucket{le="0.005"} 0
swarm_store_read_tx_latency_seconds_bucket{le="0.01"} 0
swarm_store_read_tx_latency_seconds_bucket{le="0.025"} 0
swarm_store_read_tx_latency_seconds_bucket{le="0.05"} 0
swarm_store_read_tx_latency_seconds_bucket{le="0.1"} 0
swarm_store_read_tx_latency_seconds_bucket{le="0.25"} 0
swarm_store_read_tx_latency_seconds_bucket{le="0.5"} 0
swarm_store_read_tx_latency_seconds_bucket{le="1"} 0
swarm_store_read_tx_latency_seconds_bucket{le="2.5"} 0
swarm_store_read_tx_latency_seconds_bucket{le="5"} 0
swarm_store_read_tx_latency_seconds_bucket{le="10"} 0
swarm_store_read_tx_latency_seconds_bucket{le="+Inf"} 0
swarm_store_read_tx_latency_seconds_sum 0
swarm_store_read_tx_latency_seconds_count 0
# HELP swarm_store_write_tx_latency_seconds Raft store write tx latency.
# TYPE swarm_store_write_tx_latency_seconds histogram
swarm_store_write_tx_latency_seconds_bucket{le="0.005"} 0
swarm_store_write_tx_latency_seconds_bucket{le="0.01"} 0
swarm_store_write_tx_latency_seconds_bucket{le="0.025"} 0
swarm_store_write_tx_latency_seconds_bucket{le="0.05"} 0
swarm_store_write_tx_latency_seconds_bucket{le="0.1"} 0
swarm_store_write_tx_latency_seconds_bucket{le="0.25"} 0
swarm_store_write_tx_latency_seconds_bucket{le="0.5"} 0
swarm_store_write_tx_latency_seconds_bucket{le="1"} 0
swarm_store_write_tx_latency_seconds_bucket{le="2.5"} 0
swarm_store_write_tx_latency_seconds_bucket{le="5"} 0
swarm_store_write_tx_latency_seconds_bucket{le="10"} 0
swarm_store_write_tx_latency_seconds_bucket{le="+Inf"} 0
swarm_store_write_tx_latency_seconds_sum 0
swarm_store_write_tx_latency_seconds_count 0
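The output above is in the Prometheus text exposition format: `# HELP` and `# TYPE` comment lines followed by `name{labels} value` samples. As a rough sketch (not a full parser; it assumes label values contain no escaped quotes or braces), a few of the samples shown above can be read in Python like this. The names `parse_metrics`, `LINE`, and `LABEL` are illustrative.

```python
import re

# A few samples copied from the curl output above.
SAMPLE = """\
# HELP engine_daemon_container_states_containers The count of containers in various states
# TYPE engine_daemon_container_states_containers gauge
engine_daemon_container_states_containers{state="paused"} 0
engine_daemon_container_states_containers{state="running"} 2
engine_daemon_container_states_containers{state="stopped"} 2
engine_daemon_container_actions_seconds_sum{action="start"} 2.804409176
engine_daemon_container_actions_seconds_count{action="start"} 3
"""

LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Map (metric name, sorted label pairs) to a float value, skipping comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments and blank lines carry no sample
        match = LINE.match(line)
        if match is None:
            continue  # ignore anything this simplified pattern cannot read
        name, raw_labels, value = match.groups()
        labels = tuple(sorted(LABEL.findall(raw_labels or "")))
        samples[(name, labels)] = float(value)
    return samples


samples = parse_metrics(SAMPLE)
running = samples[("engine_daemon_container_states_containers", (("state", "running"),))]
start = (("action", "start"),)
mean_start = (samples[("engine_daemon_container_actions_seconds_sum", start)]
              / samples[("engine_daemon_container_actions_seconds_count", start)])
print(f"running containers: {running:.0f}, mean start time: {mean_start:.3f}s")
```

Dividing a histogram's `_sum` by its `_count` gives the mean duration since the daemon started; here the three container starts averaged a little under one second each.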
Configure /etc/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'netkiller-monitor'

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['host.docker.internal:9090'] # Only works on Docker Desktop for Mac
  - job_name: 'docker'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['docker.for.mac.host.internal:9323']
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
$ docker service create --replicas 1 --name my-prometheus \
    --mount type=bind,source=/tmp/prometheus.yml,destination=/etc/prometheus/prometheus.yml \
    --publish published=9090,target=9090,protocol=tcp \
    prom/prometheus
docker-compose
version: '3.9'
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    volumes:
      - ./mac/prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - "--web.console.libraries=/usr/share/prometheus/console_libraries"
      - "--web.console.templates=/usr/share/prometheus/consoles"
    ports:
      - '9090:9090'
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    ports:
      - '9100:9100'
https://grafana.com/grafana/dashboards/8919
version: '3.9'
services:
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    ports:
      - '9100:9100'
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - --collector.filesystem.ignored-mount-points
      - "^/(sys|proc|dev|host|etc|rootfs/var/lib/docker/containers|rootfs/var/lib/docker/overlay2|rootfs/run/docker/netns|rootfs/var/lib/docker/aufs)($$|/)"
docker run \
    --volume=/:/rootfs:ro \
    --volume=/var/run:/var/run:rw \
    --volume=/sys:/sys:ro \
    --volume=/var/lib/docker/:/var/lib/docker:ro \
    --publish=8080:8080 \
    --detach=true \
    --name=cadvisor \
    google/cadvisor:latest
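Edit prometheus.yml to add a cadvisor scrape job. cAdvisor listens on port 8080 inside the container by default, so the target uses that port.
  - job_name: cadvisor1
    static_configs:
      - targets: ['cadvisor:8080']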
Nginx configuration: enable the status page
/etc/nginx/conf.d/status.conf:
server {
    listen 80;
    server_name 127.0.0.1;
    location = /status {
        stub_status;
        access_log off;
        allow 127.0.0.1;
        deny all;
    }
}
If nginx runs in Docker, server_name must be set; on a physical host, server_name does not need to be specified.
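The exporter configured in the next step reads this status page and turns its counters into Prometheus metrics. The following is a minimal Python sketch of that conversion, assuming the standard stub_status text layout. The sample text and the function name `parse_stub_status` are illustrative and are not taken from the exporter's source; the dictionary keys mirror the metric names that nginx-prometheus-exporter exposes.

```python
import re

# Illustrative stub_status response (assumed values, not captured from a real server).
SAMPLE_STATUS = """Active connections: 10
server accepts handled requests
 53 53 390
Reading: 0 Writing: 1 Waiting: 9
"""


def parse_stub_status(text):
    """Convert stub_status text into a dict keyed by exporter-style metric names."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    # The three totals sit on their own line: accepts, handled, requests.
    totals = re.search(r"\n\s*(\d+)\s+(\d+)\s+(\d+)", text)
    states = re.search(
        r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text
    )
    if not (active and totals and states):
        raise ValueError("unexpected stub_status format")
    accepted, handled, requests = map(int, totals.groups())
    reading, writing, waiting = map(int, states.groups())
    return {
        "nginx_connections_active": int(active.group(1)),
        "nginx_connections_accepted": accepted,
        "nginx_connections_handled": handled,
        "nginx_http_requests_total": requests,
        "nginx_connections_reading": reading,
        "nginx_connections_writing": writing,
        "nginx_connections_waiting": waiting,
    }


if __name__ == "__main__":
    for name, value in parse_stub_status(SAMPLE_STATUS).items():
        print(name, value)
```

If the page is unreachable from the exporter (for example because of the allow/deny rules), there is nothing to parse, which is why the status URL should be checked with curl first.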
docker-compose.yml orchestration file
version: '3.9'
services:
  nginx-prometheus-exporter:
    image: nginx/nginx-prometheus-exporter:latest
    command: -nginx.scrape-uri http://your_ipaddress_or_domain/status
    ports:
      - "9113:9113"
nginx-prometheus-exporter official download: https://github.com/nginxinc/nginx-prometheus-exporter
Debugging
$ nginx-prometheus-exporter -nginx.scrape-uri http://<nginx>/status
neo@MacBook-Pro-Neo ~/workspace/Linux % curl http://localhost:9113/metrics
# HELP nginx_connections_accepted Accepted client connections
# TYPE nginx_connections_accepted counter
nginx_connections_accepted 53
# HELP nginx_connections_active Active client connections
# TYPE nginx_connections_active gauge
nginx_connections_active 10
# HELP nginx_connections_handled Handled client connections
# TYPE nginx_connections_handled counter
nginx_connections_handled 53
# HELP nginx_connections_reading Connections where NGINX is reading the request header
# TYPE nginx_connections_reading gauge
nginx_connections_reading 0
# HELP nginx_connections_waiting Idle client connections
# TYPE nginx_connections_waiting gauge
nginx_connections_waiting 9
# HELP nginx_connections_writing Connections where NGINX is writing the response back to the client
# TYPE nginx_connections_writing gauge
nginx_connections_writing 1
# HELP nginx_http_requests_total Total http requests
# TYPE nginx_http_requests_total counter
nginx_http_requests_total 390
# HELP nginx_up Status of the last metric scrape
# TYPE nginx_up gauge
nginx_up 1
# HELP nginxexporter_build_info Exporter build information
# TYPE nginxexporter_build_info gauge
nginxexporter_build_info{commit="5f88afbd906baae02edfbab4f5715e06d88538a0",date="2021-03-22T20:16:09Z",version="0.9.0"} 1
Add the job to prometheus.yml
  - job_name: 'nginx_exporter'
    static_configs:
      - targets: ['nginx-exporter:9113']
NGINX exporter dashboard: https://grafana.com/grafana/dashboards/12708
Official dashboard for NGINX Prometheus exporter for https://github.com/nginxinc/nginx-prometheus-exporter
https://github.com/oliver006/redis_exporter
version: '3.9'
services:
  redis-exporter:
    image: oliver006/redis_exporter
    container_name: redis-exporter
    hostname: redis-exporter
    restart: always
    ports:
      - "9121:9121"
    command:
      - '--redis.addr=redis://:passw0rd@redis.netkiller.cn:6379'
Use the following command to confirm that redis-exporter is working
root@production:~/prometheus# curl -s http://redis.netkiller.cn:9121/metrics | head
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
# HELP go_goroutines Number of goroutines that currently exist.
Edit prometheus.yml and add the following configuration
scrape_configs:
  - job_name: redis_exporter
    static_configs:
      - targets: ['<<REDIS-EXPORTER-HOSTNAME>>:9121']
Grafana dashboard: https://grafana.com/grafana/dashboards/763
https://github.com/percona/mongodb_exporter
docker-compose.yml build file
version: '3.9'
services:
  mongodb_exporter:
    image: noenv/mongo-exporter:latest
    container_name: mongodb_exporter
    hostname: mongodb_exporter
    restart: always
    ports:
      - "9216:9216"
    command:
      - '--mongodb.uri=mongodb://admin:admin@mongo.netkiller.cn:27017/admin'
Check that the exporter is collecting data
root@production:~/prometheus# curl -s http://localhost:9216/metrics | head
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 2.4908e-05
go_gc_duration_seconds{quantile="0.25"} 2.7779e-05
go_gc_duration_seconds{quantile="0.5"} 2.9463e-05
go_gc_duration_seconds{quantile="0.75"} 3.736e-05
go_gc_duration_seconds{quantile="1"} 0.000120332
go_gc_duration_seconds_sum 0.001014832
go_gc_duration_seconds_count 26
# HELP go_goroutines Number of goroutines that currently exist.
Edit prometheus.yml and add the following configuration
  - job_name: mongo_exporter
    static_configs:
      - targets: ['mongo.netkiller.cn:9216']
Dashboard for Grafana (ID: 2583)
https://github.com/prometheus/mysqld_exporter
Create a MySQL monitoring user
mysql> CREATE USER 'exporter'@'%' IDENTIFIED BY 'exporterpassword' WITH MAX_USER_CONNECTIONS 3;
mysql> GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';
version: '3.9'
services:
  mysqld_exporter:
    image: prom/mysqld-exporter:latest
    container_name: mysqld_exporter
    hostname: mysqld_exporter
    restart: always
    ports:
      - "9104:9104"
    environment:
      # The password must match the one given in CREATE USER.
      - DATA_SOURCE_NAME=exporter:exporterpassword@(db.netkiller.cn:3306)/neo
    # command:
    #   --collect.info_schema.processlist
    #   --collect.info_schema.innodb_metrics
    #   --collect.info_schema.tablestats
    #   --collect.info_schema.tables
    #   --collect.info_schema.userstats
    #   --collect.engine_innodb_status
Check that the exporter is collecting data
root@production:~# curl -s http://db.netkiller.cn:9104/metrics | head
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.9298e-05
go_gc_duration_seconds{quantile="0.25"} 2.846e-05
go_gc_duration_seconds{quantile="0.5"} 3.8975e-05
go_gc_duration_seconds{quantile="0.75"} 6.0157e-05
go_gc_duration_seconds{quantile="1"} 0.000150234
go_gc_duration_seconds_sum 0.007067359
go_gc_duration_seconds_count 145
# HELP go_goroutines Number of goroutines that currently exist.
Edit prometheus.yml and add the following configuration
  - job_name: mysql_exporter
    static_configs:
      - targets: ['db.netkiller.cn:9104']
https://grafana.com/oss/prometheus/exporters/mysql-exporter/
Grafana dashboard ID: 14057
Default configuration file
version: '3.9'
services:
  blackbox_exporter:
    image: prom/blackbox-exporter:latest
    container_name: blackbox_exporter
    hostname: blackbox-exporter
    restart: always
    ports:
      - "9115:9115"
    # environment:
    volumes:
      - ${PWD}/blackbox-exporter/config.yml:/etc/blackbox_exporter/config.yml
/etc/blackbox_exporter/config.yml
modules:
  http_2xx:
    prober: http
    timeout: 10s
    http:
      method: GET
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
    timeout: 10s
  pop3s_banner:
    prober: tcp
    timeout: 10s
    tcp:
      query_response:
        - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
        - expect: "^SSH-2.0-"
        - send: "SSH-2.0-blackbox-ssh-check"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
        - send: "NICK prober"
        - send: "USER prober prober prober :prober"
        - expect: "PING :([^ ]+)"
          send: "PONG ${1}"
        - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
    timeout: 2s
Configure Prometheus: add the following to prometheus.yml
scrape_configs:
  - job_name: blackbox_exporter
    static_configs:
      - targets: ['blackbox-exporter:9115']

  - job_name: blackbox-http
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - http://192.168.30.10
          - http://192.168.30.11
          - http://192.168.3.15
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115

  - job_name: 'blackbox-ping'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
          - 8.8.8.8
        labels:
          instance: Google DNS
      - targets:
          - 247.192.129.167
        labels:
          instance: test
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115

  - job_name: 'blackbox_tcp_connect'
    scrape_interval: 30s
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - 127.0.0.1:3306
          - 127.0.0.1:6379
          - 127.0.0.1:27017
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115
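The relabel_configs in the blackbox jobs above do three things: copy each listed target into the `target` URL parameter, reuse it as the `instance` label, and replace the scrape address with the exporter's own address. The following Python sketch shows the request that should result for each target. The helper name `probe_url` is hypothetical, and the query parameter order is an arbitrary choice for illustration.

```python
from urllib.parse import urlencode


def probe_url(target, module, exporter="blackbox-exporter:9115"):
    """Build the /probe URL that a relabeled blackbox job ends up requesting.

    target   -> becomes the `target` query parameter (__param_target)
    module   -> comes from the job's `params: module: [...]`
    exporter -> the value written into __address__ by the last relabel rule
    """
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter}/probe?{query}"


if __name__ == "__main__":
    for target in ["http://192.168.30.10", "http://192.168.30.11"]:
        print(probe_url(target, "http_2xx"))
    print(probe_url("127.0.0.1:3306", "tcp_connect"))
```

Requesting one of these URLs by hand with curl is a quick way to confirm that a module name and target are valid before adding them to prometheus.yml.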
neo@MacBook-Pro-Neo ~/workspace/docker/prometheus % mkdir blackbox-exporter
neo@MacBook-Pro-Neo ~/workspace/docker/prometheus % docker-compose cp blackbox_exporter:/etc/blackbox_exporter/config.yml blackbox-exporter
neo@MacBook-Pro-Neo ~/workspace/docker/prometheus % cat blackbox-exporter/config.yml
modules:
  http_2xx:
    prober: http
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
        - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
        - expect: "^SSH-2.0-"
        - send: "SSH-2.0-blackbox-ssh-check"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
        - send: "NICK prober"
        - send: "USER prober prober prober :prober"
        - expect: "PING :([^ ]+)"
          send: "PONG ${1}"
        - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
neo@MacBook-Pro-Neo ~ % curl -s http://localhost:9115/metrics | head
# HELP blackbox_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which blackbox_exporter was built.
# TYPE blackbox_exporter_build_info gauge
blackbox_exporter_build_info{branch="HEAD",goversion="go1.16.4",revision="5d575b88eb12c65720862e8ad2c5890ba33d1ed0",version="0.19.0"} 1
# HELP blackbox_exporter_config_last_reload_success_timestamp_seconds Timestamp of the last successful configuration reload.
# TYPE blackbox_exporter_config_last_reload_success_timestamp_seconds gauge
blackbox_exporter_config_last_reload_success_timestamp_seconds 1.6298732380407274e+09
# HELP blackbox_exporter_config_last_reload_successful Blackbox exporter config loaded successfully.
# TYPE blackbox_exporter_config_last_reload_successful gauge
blackbox_exporter_config_last_reload_successful 1
# HELP blackbox_module_unknown_total Count of unknown modules requested by probes
Prometheus Blackbox Exporter Grafana dashboard ID: 12275
Ping
curl -s 'http://127.0.0.1:9115/probe?target=127.0.0.1&module=icmp'
neo@MacBook-Pro-Neo ~/workspace/docker/prometheus % curl -s http://127.0.0.1:9115/probe\?target\=127.0.0.1\&module\=icmp | grep ^\probe_success
probe_success 1
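The default timeout is too long: testing with an unreachable IP address such as 13.13.13.13 makes the probe wait a long time before it reports failure.
neo@MacBook-Pro-Neo ~/workspace/docker/prometheus % curl -s http://127.0.0.1:9115/probe\?target\=13.13.13.13\&module\=icmp | grep ^\probe_success
probe_success 0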
The fix is to set a timeout. Edit /etc/blackbox_exporter/config.yml and set it to 2 seconds, so the ping result for an IP address comes back within 2 seconds.
  icmp:
    prober: icmp
    timeout: 2s
TCP port check
curl -s 'http://127.0.0.1:9115/probe?target=127.0.0.1:8080&module=tcp_connect&debug=true'
HTTP/HTTPS URL
curl -s 'http://127.0.0.1:9115/probe?target=http://www.netkiller.cn&module=http_2xx'
For HTTP, do not look at probe_success alone; also check probe_http_status_code, the status code returned by the HTTP server, which is normally 200.
neo@MacBook-Pro-Neo ~/workspace/docker/prometheus % curl -s http://127.0.0.1:9115/probe\?target\=http://192.168.30.11\&module\=http_2xx | grep -v ^#
probe_dns_lookup_time_seconds 0.000241511
probe_duration_seconds 0.011169367
probe_failed_due_to_regex 0
probe_http_content_length -1
probe_http_duration_seconds{phase="connect"} 0.003367677
probe_http_duration_seconds{phase="processing"} 0.006039874
probe_http_duration_seconds{phase="resolve"} 0.000241511
probe_http_duration_seconds{phase="tls"} 0
probe_http_duration_seconds{phase="transfer"} 0.000451174
probe_http_redirects 0
probe_http_ssl 0
probe_http_status_code 200
probe_http_uncompressed_body_length 407
probe_http_version 1.1
probe_ip_addr_hash 2.66977244e+08
probe_ip_protocol 4
probe_success 1
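The rule of checking both probe_success and probe_http_status_code can be automated. The following is a minimal Python sketch that evaluates probe output such as the listing above. The function name `http_probe_ok` and the sample strings are illustrative; it assumes each sample line ends with the value and carries no timestamp.

```python
def http_probe_ok(probe_text, expected_status=200):
    """Return True only if the probe succeeded AND returned the expected status."""
    values = {}
    for line in probe_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last whitespace-separated token on the line.
        key, _, raw = line.rpartition(" ")
        values[key] = float(raw)
    return (
        values.get("probe_success") == 1
        and values.get("probe_http_status_code") == expected_status
    )


GOOD = """probe_http_status_code 200
probe_success 1
"""

# A probe can succeed with another accepted code, e.g. when a module
# lists 301 in valid_status_codes; that case is still flagged here.
REDIRECT = """probe_http_status_code 301
probe_success 1
"""

if __name__ == "__main__":
    print(http_probe_ok(GOOD))
    print(http_probe_ok(REDIRECT))
```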
HTTPS
neo@MacBook-Pro-Neo ~/workspace/docker/prometheus % curl -s http://127.0.0.1:9115/probe\?target\=https://www.netkiller.cn/api/captcha\&module\=http_2xx | grep -v ^#
probe_dns_lookup_time_seconds 0.023551527
probe_duration_seconds 0.054094864
probe_failed_due_to_regex 0
probe_http_content_length -1
probe_http_duration_seconds{phase="connect"} 0.005037651
probe_http_duration_seconds{phase="processing"} 0.009932338
probe_http_duration_seconds{phase="resolve"} 0.023551527
probe_http_duration_seconds{phase="tls"} 0.011010897
probe_http_duration_seconds{phase="transfer"} 0.0009768
probe_http_redirects 0
probe_http_ssl 1
probe_http_status_code 200
probe_http_uncompressed_body_length 2604
probe_http_version 2
probe_ip_addr_hash 7.14414465e+08
probe_ip_protocol 4
probe_ssl_earliest_cert_expiry 1.661299199e+09
probe_ssl_last_chain_expiry_timestamp_seconds 1.661299199e+09
probe_ssl_last_chain_info{fingerprint_sha256="fd49505ad2ab79ef02070a20172ae56acbe525195ae0ddbe18359ce4144fea6b"} 1
probe_success 1
probe_tls_version_info{version="TLS 1.2"} 1
⚠️ Note these items: probe_http_ssl 1, probe_http_version 2, probe_tls_version_info{version="TLS 1.2"} 1
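probe_dns_lookup_time_seconds                              # DNS lookup time, in seconds
probe_duration_seconds                                     # Total probe time from start to finish, in seconds (the response time for this page)
probe_failed_due_to_regex 0                                # Whether the probe failed because of a regex check
probe_http_content_length                                  # Length of the HTTP response content
# Time spent in each phase of the request
probe_http_duration_seconds{phase="connect"} 0.050388884    # Connection time
probe_http_duration_seconds{phase="processing"} 0.45868667  # Time the server took to process the request
probe_http_duration_seconds{phase="resolve"} 0.040037612    # DNS resolution time
probe_http_duration_seconds{phase="tls"} 0.145433254        # TLS handshake time (includes certificate verification)
probe_http_duration_seconds{phase="transfer"} 0.000566269   # Time to transfer the response body
probe_http_redirects 1                                     # Number of redirects followed
probe_http_ssl 1                                           # Whether SSL/TLS was used (1 = yes)
probe_http_status_code 200                                 # HTTP status code returned
probe_http_uncompressed_body_length                        # Length of the uncompressed response body
probe_http_version 2                                       # HTTP protocol version
probe_ip_protocol 4                                        # IP protocol version: 4 is IPv4, 6 is IPv6
probe_ssl_earliest_cert_expiry                             # Expiry time of the earliest-expiring SSL certificate, as a Unix timestamp
probe_success 1                                            # Whether the probe succeeded: 1 = success, 0 = failure
probe_tls_version_info{version="TLS 1.2"} 1                # TLS version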
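probe_ssl_earliest_cert_expiry is a Unix timestamp, so the remaining certificate lifetime has to be computed from it. The following is a small Python sketch of that arithmetic, using the value 1.661299199e+09 from the HTTPS probe output in this section. The function name `days_until_expiry` and the fixed reference date are illustrative assumptions.

```python
from datetime import datetime, timezone


def days_until_expiry(expiry_ts, now=None):
    """Return the number of days (may be fractional or negative) until expiry.

    expiry_ts: value of probe_ssl_earliest_cert_expiry (seconds since epoch)
    now:       timezone-aware datetime; defaults to the current UTC time
    """
    if now is None:
        now = datetime.now(timezone.utc)
    expiry = datetime.fromtimestamp(expiry_ts, tz=timezone.utc)
    return (expiry - now).total_seconds() / 86400


if __name__ == "__main__":
    expiry_ts = float("1.661299199e+09")
    print(datetime.fromtimestamp(expiry_ts, tz=timezone.utc).isoformat())
    # Fixed reference date so the result is reproducible (an assumption for the demo).
    reference = datetime(2022, 8, 1, tzinfo=timezone.utc)
    print(round(days_until_expiry(expiry_ts, now=reference), 2))
```

The same arithmetic can be done on the Prometheus side by subtracting `time()` from the metric, which is a common basis for a certificate expiry alert.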
RESTful
  http_post_2xx:
    prober: http
    timeout: 5s
    http:
      method: POST
      headers:
        Content-Type: application/json
      body: '{}'
HTTP basic auth
  http_basic_auth_example:
    prober: http
    timeout: 5s
    http:
      method: POST
      headers:
        Host: "login.example.com"
      basic_auth:
        username: "username"
        password: "mysecret"
  http_2xx_example:
    prober: http
    timeout: 5s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2"]
      valid_status_codes: [200,301,302]
SSL certificate check
  http_2xx_example:
    prober: http
    timeout: 5s
    http:
      valid_status_codes: []
      method: GET
      no_follow_redirects: false
      fail_if_ssl: false
      fail_if_not_ssl: false
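Check the response body. Recent blackbox_exporter releases name these options fail_if_body_matches_regexp and fail_if_body_not_matches_regexp.
  http_2xx_example:
    prober: http
    timeout: 5s
    http:
      method: GET
      fail_if_body_matches_regexp:
        - "Could not connect to database"
      fail_if_body_not_matches_regexp:
        - "Download the latest version here"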
% docker-compose cp snmp_exporter:/etc/snmp_exporter/snmp.yml snmp-exporter
% vim snmp-exporter/snmp.yml
auth:
  community: public
Make sure SNMP is enabled on the switch or router. For how to enable SNMP on switches and routers, see 《Netkiller Network 手札》.
neo@MacBook-Pro-Neo ~/workspace % snmpwalk -v2c -c public 172.16.254.254 | more
SNMPv2-MIB::sysDescr.0 = STRING: H3C Series Router MSR26-00 H3C Comware Platform Software Comware Software Version 5.20, Release 2516P15 Copyright(c) 2004-..}> New H3C Technologies Co., Ltd.
SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.25506.1.913
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (794793008) 91 days, 23:45:30.08
SNMPv2-MIB::sysContact.0 = STRING: R&D Hangzhou, New H3C Technologies Co., Ltd.
SNMPv2-MIB::sysName.0 = STRING: MSR2610
SNMPv2-MIB::sysLocation.0 = STRING: Hangzhou, China
SNMPv2-MIB::sysServices.0 = INTEGER: 78
IF-MIB::ifNumber.0 = INTEGER: 24
IF-MIB::ifIndex.1 = INTEGER: 1
IF-MIB::ifIndex.2 = INTEGER: 2
IF-MIB::ifIndex.3 = INTEGER: 3
IF-MIB::ifIndex.4 = INTEGER: 4
IF-MIB::ifIndex.5 = INTEGER: 5
IF-MIB::ifIndex.6 = INTEGER: 6
IF-MIB::ifIndex.7 = INTEGER: 7
IF-MIB::ifIndex.8 = INTEGER: 8
IF-MIB::ifIndex.9 = INTEGER: 9
IF-MIB::ifIndex.10 = INTEGER: 10
Test page: http://localhost:9116
Alternatively, use curl to make sure SNMP data can be read from the device you are monitoring.
neo@MacBook-Pro-Neo ~/workspace % curl -s http://localhost:9116/snmp\?target\=172.16.254.254 | more
# HELP ifAdminStatus The desired state of the interface - 1.3.6.1.2.1.2.2.1.7
# TYPE ifAdminStatus gauge
ifAdminStatus{ifAlias="Aux0 Interface",ifDescr="Aux0",ifIndex="1",ifName="Aux0"} 1
ifAdminStatus{ifAlias="Cellular0/0 Interface",ifDescr="Cellular0/0",ifIndex="2",ifName="Cellular0/0"} 1
ifAdminStatus{ifAlias="Dialer1 Interface",ifDescr="Dialer1",ifIndex="14",ifName="Dialer1"} 1
ifAdminStatus{ifAlias="GigabitEthernet0/0 Interface",ifDescr="GigabitEthernet0/0",ifIndex="3",ifName="GigabitEthernet0/0"} 1
ifAdminStatus{ifAlias="GigabitEthernet0/1 Interface",ifDescr="GigabitEthernet0/1",ifIndex="4",ifName="GigabitEthernet0/1"} 1
ifAdminStatus{ifAlias="GigabitEthernet0/2 Interface",ifDescr="GigabitEthernet0/2",ifIndex="5",ifName="GigabitEthernet0/2"} 1
ifAdminStatus{ifAlias="GigabitEthernet0/3 Interface",ifDescr="GigabitEthernet0/3",ifIndex="6",ifName="GigabitEthernet0/3"} 1
ifAdminStatus{ifAlias="GigabitEthernet0/4 Interface",ifDescr="GigabitEthernet0/4",ifIndex="7",ifName="GigabitEthernet0/4"} 1
ifAdminStatus{ifAlias="GigabitEthernet0/5 Interface",ifDescr="GigabitEthernet0/5",ifIndex="8",ifName="GigabitEthernet0/5"} 1
ifAdminStatus{ifAlias="GigabitEthernet0/6 Interface",ifDescr="GigabitEthernet0/6",ifIndex="9",ifName="GigabitEthernet0/6"} 1
ifAdminStatus{ifAlias="GigabitEthernet0/7 Interface",ifDescr="GigabitEthernet0/7",ifIndex="10",ifName="GigabitEthernet0/7"} 1
ifAdminStatus{ifAlias="GigabitEthernet0/8 Interface",ifDescr="GigabitEthernet0/8",ifIndex="11",ifName="GigabitEthernet0/8"} 1
ifAdminStatus{ifAlias="GigabitEthernet0/9 Interface",ifDescr="GigabitEthernet0/9",ifIndex="12",ifName="GigabitEthernet0/9"} 1
ifAdminStatus{ifAlias="NULL0 Interface",ifDescr="NULL0",ifIndex="13",ifName="NULL0"} 1
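Each SNMP sample above carries the interface identity in its labels (ifName, ifIndex and so on). The following is a rough Python sketch of splitting one such line into metric name, labels and value, which can help when checking that the expected interfaces show up. The names `parse_labeled` and `admin_up_interfaces` are hypothetical, and the regular expressions only cover the simple label syntax seen in this output.

```python
import re

# name{labels} value  (no timestamp, as in the output shown in this section)
SAMPLE_RE = re.compile(r"^([A-Za-z_:][\w:]*)\{(.*)\}\s+(\S+)$")
# key="value" pairs; the value may contain escaped characters
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_labeled(line):
    """Split a labeled sample line into (name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        return None  # comment line, blank line, or unlabeled sample
    name, labels, value = match.groups()
    return name, dict(LABEL_RE.findall(labels)), float(value)


def admin_up_interfaces(text):
    """Return ifName values whose ifAdminStatus is 1 (administratively up)."""
    names = []
    for line in text.splitlines():
        parsed = parse_labeled(line)
        if parsed and parsed[0] == "ifAdminStatus" and parsed[2] == 1:
            names.append(parsed[1].get("ifName"))
    return names


SAMPLE = """# TYPE ifAdminStatus gauge
ifAdminStatus{ifAlias="Aux0 Interface",ifDescr="Aux0",ifIndex="1",ifName="Aux0"} 1
ifAdminStatus{ifAlias="NULL0 Interface",ifDescr="NULL0",ifIndex="13",ifName="NULL0"} 1
"""

if __name__ == "__main__":
    print(admin_up_interfaces(SAMPLE))
```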
The Grafana dashboard ID for SNMP monitoring is 10523.