Monitoring - Data sources
Introduction
Linux provides monitoring capabilities through various interfaces, allowing administrators to observe system performance and status. This article explores the core monitoring data sources available in the Linux kernel. Understanding these mechanisms is essential before diving into monitoring solutions like Zabbix or Prometheus.
The /proc virtual filesystem
The /proc filesystem is a virtual filesystem that doesn't exist physically on disk but is created in memory when the system boots up. It serves as an interface to kernel data structures, providing a wealth of information about the running OS.
Basic OS statistics in /proc
Process (PID) information
Each running process has its own directory in /proc, named after its Process ID. Inside these directories, you'll find files containing information about that process:
- /proc/[PID]/status - Contains process state, memory usage, and other details
- /proc/[PID]/cmdline - Command that started the process
- /proc/[PID]/fd/ - Directory containing file descriptors
- /proc/[PID]/maps - Memory maps of the process
- /proc/[PID]/environ - Environment variables
For more information about process-specific files in /proc, see the Linux kernel documentation.
System-wide information
Several files provide system-wide statistics:
- /proc/cpuinfo - Detailed information about CPU(s)
- /proc/meminfo - Memory usage statistics
- /proc/loadavg - System load averages
- /proc/stat - Various system statistics including CPU utilization
- /proc/diskstats - Disk I/O statistics
- /proc/net/dev - Network interface statistics
Reading /proc files
Most /proc files can be read using standard utilities like cat, less, or grep. For example:
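# System load averages (1, 5, and 15 minutes, runnable/total tasks, last PID)
cat /proc/loadavg
# Total and currently available memory
grep -E 'MemTotal|MemAvailable' /proc/meminfo
# Key fields from the status of the current shell process ($$ expands to its PID)
grep -E 'Name|State|VmRSS' /proc/$$/status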
Calculating metrics from /proc
Many monitoring tools calculate derived metrics from raw /proc data. For example, CPU utilization percentage is calculated by comparing values from /proc/stat across time intervals:
- Read CPU time values from /proc/stat
- Wait a defined interval
- Read values again
- Calculate the difference in values
- Determine percentages based on these differences
Here's a simplified example of calculating CPU usage:
# First reading: pull all CPU time fields in a single pass over /proc/stat
# so every value comes from the same snapshot
read -r user1 nice1 system1 idle1 iowait1 < <(awk '/^cpu / {print $2, $3, $4, $5, $6}' /proc/stat)
total1=$((user1 + nice1 + system1 + idle1 + iowait1))
# Wait 1 second
sleep 1
# Second reading
read -r user2 nice2 system2 idle2 iowait2 < <(awk '/^cpu / {print $2, $3, $4, $5, $6}' /proc/stat)
total2=$((user2 + nice2 + system2 + idle2 + iowait2))
# Calculate differences
total_diff=$((total2 - total1))
idle_diff=$((idle2 - idle1))
# Calculate CPU usage percentage (non-idle share of elapsed CPU time)
cpu_usage=$(( 100 * (total_diff - idle_diff) / total_diff ))
echo "CPU usage: $cpu_usage%"
And that's it - a complete bash script to check current CPU usage, no more top!
The /sys filesystem
The /sys filesystem (or interface) is another virtual filesystem that exposes kernel objects, device drivers, and their attributes. It's useful for hardware monitoring:
- /sys/class/thermal/ - Temperature sensors
- /sys/class/power_supply/ - Battery information
- /sys/devices/system/cpu/ - CPU control information
Disk and Block Devices – /sys/block/
The /sys/block/ directory contains entries for all block devices, such as hard drives (sda, nvme0n1, etc.) and their partitions. These entries can give you detailed insights into the state and behavior of storage devices.
Each device subdirectory (e.g. /sys/block/sda/) contains various files and subdirectories that provide:
a) I/O statistics
Located in /sys/block/<device_name>/stat, this file provides statistics such as reads, writes, sectors transferred, and I/O time.
Example:
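# Raw I/O counters for sda (sda is just an example; list your devices with: ls /sys/block/)
cat /sys/block/sda/stat
# Illustrative output - the first eleven counters are described below:
#   126204  31894  9469178  47397  88653  60907  6406792  126867  0  90740  174264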
Each column (from kernel documentation) represents:
1. Reads completed
2. Reads merged
3. Sectors read
4. Time spent reading (ms)
5. Writes completed
6. Writes merged
7. Sectors written
8. Time spent writing (ms)
9. I/Os currently in progress
10. Time spent doing I/Os (ms)
11. Weighted time spent on I/Os (ms)
These statistics are cumulative since boot and can be used to calculate disk throughput, latency, and usage over time.
b) Rotational vs non-rotational
Located in /sys/block/<device_name>/queue/rotational, this file indicates whether a device is a traditional spinning HDD (1) or an SSD (0).
Example:
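# Check whether sda reports as rotational (sda is an example device)
cat /sys/block/sda/queue/rotational
# 1 = spinning HDD, 0 = SSD/NVMe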
This can help optimize I/O scheduling strategies in the kernel or monitoring tools.
c) I/O scheduler
Located in /sys/block/<device>/queue/scheduler, this file lists available I/O schedulers and highlights the active one in brackets.
Example:
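# Show available I/O schedulers for sda; the active one is shown in brackets
cat /sys/block/sda/queue/scheduler
# Typical output: [mq-deadline] kyber bfq none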
To change the scheduler (e.g., to bfq):
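# Writing to the scheduler file requires root, hence tee
# (bfq must be available in your kernel for this to succeed)
echo bfq | sudo tee /sys/block/sda/queue/scheduler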
Note: Changing the I/O scheduler this way persists only until reboot. To make the change survive a reboot, configure it in GRUB (kernel boot parameters) or through a udev rule.
NVMe/SSD drives usually have their own built-in scheduler.
d) Device information and topology
Located in:
* /sys/block/<device>/device/ – Contains udev attributes and hardware-specific details
* /sys/block/<device>/queue/logical_block_size – Logical block size in bytes
* /sys/block/<device>/queue/physical_block_size – Physical block size (important for SSDs)
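For example, to read the block sizes a drive reports (sda used as an example device):
cat /sys/block/sda/queue/logical_block_size    # often 512
cat /sys/block/sda/queue/physical_block_size   # often 4096 on modern drives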
Other important /sys directories
Power Management – /sys/power/
Contains system-wide power management settings:
cat /sys/power/state # Shows available sleep states
cat /sys/power/disk # Shows disk power management mode
CPU Information – /sys/devices/system/cpu/
Provides detailed CPU information and control:
# CPU topology
ls /sys/devices/system/cpu/cpu0/topology/
# CPU frequency information
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
# CPU governor (power management)
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
Network Interfaces – /sys/class/net/
Contains information about network interfaces:
# Get interface status (up/down)
cat /sys/class/net/eth0/operstate
# Check interface speed
cat /sys/class/net/eth0/speed
# MAC address
cat /sys/class/net/eth0/address
The /sys filesystem is a powerful kernel interface for both monitoring and configuring hardware at a low level. When it comes to disks, /sys/block provides real-time access to I/O performance metrics, scheduling, and device characteristics. Tools such as iostat, udevadm, and smartctl often use this data behind the scenes.
The /sys filesystem is especially valuable for automated monitoring scripts that need direct access to hardware parameters without parsing complex command output. Unlike /proc, which focuses primarily on processes and system statistics, /sys is organized around hardware devices and subsystems, making it the go-to source for hardware-related monitoring.
SNMP (Simple Network Management Protocol)
SNMP is a standard protocol for collecting and organizing information about managed devices on networks. It's widely used for monitoring network devices and servers.
SNMP Architecture
SNMP architecture consists of:
- SNMP Manager - the monitoring system that requests information
- SNMP Agent - software running on monitored devices that collects and provides data
- Management Information Base (MIB) - a hierarchical database that defines what data can be collected
- Object Identifiers (OIDs) - unique identifiers for each data point in the MIB
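Each OID is a dotted numeric path into the MIB tree. For example, the standard system description object can be resolved with snmptranslate, which ships with the net-snmp utilities:
# Translate a symbolic name to its numeric OID
$ snmptranslate -On SNMPv2-MIB::sysDescr.0
.1.3.6.1.2.1.1.1.0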
Setting up SNMP on Linux
To use SNMP in a Linux OS, you need to install and configure an SNMP agent like snmpd:
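# RHEL-like systems (agent plus command-line utilities)
$ yum install net-snmp net-snmp-utils
# Debian-like systems
$ apt install snmpd snmp
# Start the agent
$ systemctl enable --now snmpd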
The configuration file is located by default at /etc/snmp/snmpd.conf. Basic configuration involves:
- Setting community strings (like passwords)
- Configuring access control
- Defining which system information to expose
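A minimal read-only configuration might look like this (the community string and allowed subnet below are placeholders to adapt):
# /etc/snmp/snmpd.conf
rocommunity  public  192.168.1.0/24    # read-only community string, restricted to one subnet
sysLocation  "Server room 1"
sysContact   admin@example.com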
Querying SNMP data
The snmpwalk tool can query all available SNMP data from a device:
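# Walk the whole MIB tree of localhost using SNMP v2c and the "public" community string
$ snmpwalk -v2c -c public localhost
# Limit the walk to the system subtree
$ snmpwalk -v2c -c public localhost system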
For specific values, use snmpget:
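# Fetch a single value - here the agent's uptime
$ snmpget -v2c -c public localhost SNMPv2-MIB::sysUpTime.0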
Linux-specific SNMP extensions
The net-snmp package provides Linux-specific SNMP extensions that expose data from /proc through SNMP:
- CPU utilization
- Memory usage
- Disk statistics
- Process information
- Network interface statistics
Extending SNMP
SNMP can be extended to collect custom metrics through:
- Extending the agent - writing scripts that are executed by the SNMP agent (see the example below)
- SNMP traps - asynchronous notifications sent from agents to managers when thresholds are crossed
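For instance, the agent's extend directive runs a custom script and publishes its output under the NET-SNMP-EXTEND-MIB subtree (the script path and name here are only illustrative):
# /etc/snmp/snmpd.conf
extend disk-temp /usr/local/bin/disk_temp.sh
# After restarting snmpd, the script's output is readable over SNMP:
$ snmpwalk -v2c -c public localhost NET-SNMP-EXTEND-MIB::nsExtendOutput1Table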
sysstat tools
The sysstat package provides a collection of performance monitoring tools:
- sar - System Activity Reporter
- iostat - I/O statistics
- mpstat - Multiprocessor statistics
- pidstat - Per-process statistics
These tools collect data from /proc and format it for easier reading:
# Install sysstat on RHEL-like systems
$ yum install sysstat
# Debian-like systems
$ apt install sysstat
# View CPU statistics
$ sar -u 1 5
# View memory statistics
$ sar -r 1 5
# View disk I/O statistics
$ iostat -x 1 5
Linux kernel counters
The Linux kernel maintains counters that track system activity; several classic tools read and report them:
- netstat - Network statistics
- vmstat - Virtual memory statistics
- perf - Performance analysis tool
Example usage:
# Network connections
$ netstat -an
# Memory statistics
$ vmstat 1 5
# Performance counters
$ perf stat -p <PID>
cgroups (Control Groups)
cgroups are a Linux kernel feature that isolates and tracks the resource usage of process groups:
- CPU time
- Memory
- Disk I/O
- Network bandwidth
cgroup data can be accessed through:
- /sys/fs/cgroup/<controller>/<cgroup>/ (cgroups v1, one hierarchy per controller)
- /sys/fs/cgroup/<cgroup>/ (cgroups v2, single unified hierarchy)
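For example, on a cgroups v2 host you can read a group's current usage directly (the system.slice path is systemd-specific and used here only as an illustration):
# Current memory usage in bytes for everything under system.slice
cat /sys/fs/cgroup/system.slice/memory.current
# Cumulative CPU time (usage_usec, user_usec, system_usec)
cat /sys/fs/cgroup/system.slice/cpu.stat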
Kernel events via eBPF
Extended Berkeley Packet Filter (eBPF) is a powerful technology that allows sandboxed programs to run inside the Linux kernel. It's increasingly used for monitoring:
- Performance analysis
- Security monitoring
- Network packet inspection
Tools like bpftrace
and bcc
(BPF Compiler Collection) leverage eBPF:
# Count system calls by process
$ bpftrace -e 'tracepoint:syscalls:sys_enter_* { @[comm] = count(); }'
# Monitor file opens
$ bpftrace -e 'tracepoint:syscalls:sys_enter_open { printf("%s opened %s\n", comm, str(args->filename)); }'
Conclusion 
Linux provides rich interfaces for OS monitoring through the /proc filesystem, SNMP, and various kernel facilities. These data sources form the foundation for more sophisticated monitoring solutions like Zabbix, Prometheus, Nagios, and many more.
Understanding how these underlying mechanisms work helps administrators:
- Diagnose problems more effectively
- Create custom monitoring solutions
- Better interpret the output of monitoring tools
- Optimize system performance
In the next part, I'll dive into how these data sources can be utilized.