Monitoring - Data sources
Introduction
Linux provides monitoring capabilities through various interfaces, allowing administrators to observe system performance and status. This article explores the core monitoring data sources available in the Linux kernel. Understanding these mechanisms is essential before diving into monitoring solutions like Zabbix or Prometheus.
The /proc virtual filesystem
The /proc filesystem is a virtual filesystem that doesn't exist physically on disk but is created in memory when the system boots up. It serves as an interface to kernel data structures, providing a wealth of information about the running OS.
Basic OS statistics in /proc
Process (PID) information
Each running process has its own directory in /proc, named after its Process ID. Inside these directories, you'll find files containing information about that process:
- /proc/[PID]/status - Contains process state, memory usage, and other details
- /proc/[PID]/cmdline - Command that started the process
- /proc/[PID]/fd/ - Directory containing file descriptors
- /proc/[PID]/maps - Memory maps of the process
- /proc/[PID]/environ - Environment variables
For more information about process-specific files in /proc, see the Linux kernel documentation.
System-wide information
Several files provide system-wide statistics:
- /proc/cpuinfo - Detailed information about CPU(s)
- /proc/meminfo - Memory usage statistics
- /proc/loadavg - System load averages
- /proc/stat - Various system statistics including CPU utilization
- /proc/diskstats - Disk I/O statistics
- /proc/net/dev - Network interface statistics
Reading /proc files
Most /proc files can be read using standard utilities like cat, less, or grep. For example:
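# System load averages (1, 5, and 15 minutes, runnable/total tasks, last PID)
cat /proc/loadavg
# Total and currently available memory
grep -E 'MemTotal|MemAvailable' /proc/meminfo
# Key fields from the status of the current shell process ($$ expands to its PID)
grep -E 'Name|State|VmRSS' /proc/$$/status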
Calculating metrics from /proc
Many monitoring tools calculate derived metrics from raw /proc data. For example, CPU utilization percentage is calculated by comparing values from /proc/stat across time intervals:
- Read CPU time values from /proc/stat
- Wait a defined interval
- Read values again
- Calculate the difference in values
- Determine percentages based on these differences
Here's a simplified example of calculating CPU usage:
# First reading: pull all CPU time fields in a single pass over /proc/stat
# so every value comes from the same snapshot
read -r user1 nice1 system1 idle1 iowait1 < <(awk '/^cpu / {print $2, $3, $4, $5, $6}' /proc/stat)
total1=$((user1 + nice1 + system1 + idle1 + iowait1))
# Wait 1 second
sleep 1
# Second reading
read -r user2 nice2 system2 idle2 iowait2 < <(awk '/^cpu / {print $2, $3, $4, $5, $6}' /proc/stat)
total2=$((user2 + nice2 + system2 + idle2 + iowait2))
# Calculate differences
total_diff=$((total2 - total1))
idle_diff=$((idle2 - idle1))
# Calculate CPU usage percentage (non-idle share of elapsed CPU time)
cpu_usage=$(( 100 * (total_diff - idle_diff) / total_diff ))
echo "CPU usage: $cpu_usage%"
And that's it - a complete bash script to check current CPU usage, no more top!
The /sys filesystem
The /sys filesystem (or interface) is another virtual filesystem that exposes kernel objects, device drivers, and their attributes. It's useful for hardware monitoring:
- /sys/class/thermal/ - Temperature sensors
- /sys/class/power_supply/ - Battery information
- /sys/devices/system/cpu/ - CPU control information
Disk and Block Devices – /sys/block/
The /sys/block/ directory contains entries for all block devices, such as hard drives (sda, nvme0n1, etc.) and their partitions. These entries can give you detailed insights into the state and behavior of storage devices.
Each device subdirectory (e.g. /sys/block/sda/) contains various files and subdirectories that provide:
a) I/O statistics
Located in /sys/block/<device_name>/stat, this file provides statistics such as reads, writes, sectors transferred, and I/O time.
Example:
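# Raw I/O counters for sda (sda is just an example; list your devices with: ls /sys/block/)
cat /sys/block/sda/stat
# Illustrative output - the first eleven counters are described below:
#   126204  31894  9469178  47397  88653  60907  6406792  126867  0  90740  174264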
Each column (from kernel documentation) represents:
1. Reads completed
2. Reads merged
3. Sectors read
4. Time spent reading (ms)
5. Writes completed
6. Writes merged
7. Sectors written
8. Time spent writing (ms)
9. I/Os currently in progress
10. Time spent doing I/Os (ms)
11. Weighted time spent on I/Os (ms)
These statistics are cumulative since boot and can be used to calculate disk throughput, latency, and usage over time.
b) Rotational vs non-rotational
Located in /sys/block/<device_name>/queue/rotational, this file indicates whether a device is a traditional spinning HDD (1) or an SSD (0).
Example:
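# Check whether sda reports as rotational (sda is an example device)
cat /sys/block/sda/queue/rotational
# 1 = spinning HDD, 0 = SSD/NVMe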
This can help optimize I/O scheduling strategies in the kernel or monitoring tools.
c) I/O scheduler
Located in /sys/block/<device>/queue/scheduler, this file lists available I/O schedulers and highlights the active one in brackets.
Example:
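# Show available I/O schedulers for sda; the active one is shown in brackets
cat /sys/block/sda/queue/scheduler
# Typical output: [mq-deadline] kyber bfq none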
To change the scheduler (e.g., to bfq):
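# Writing to the scheduler file requires root, hence tee
# (bfq must be available in your kernel for this to succeed)
echo bfq | sudo tee /sys/block/sda/queue/scheduler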
Note: Changing the I/O scheduler this way persists only until reboot. To make the change survive a reboot, configure it in GRUB (kernel boot parameters) or through a udev rule.
NVMe/SSD drives usually have their own built-in scheduler.
d) Device information and topology
Located in:
* /sys/block/<device>/device/ – Contains udev attributes and hardware-specific details
* /sys/block/<device>/queue/logical_block_size – Logical block size in bytes
* /sys/block/<device>/queue/physical_block_size – Physical block size (important for SSDs)
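For example, to read the block sizes a drive reports (sda used as an example device):
cat /sys/block/sda/queue/logical_block_size    # often 512
cat /sys/block/sda/queue/physical_block_size   # often 4096 on modern drives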
Other important /sys directories
Power Management – /sys/power/
Contains system-wide power management settings:
cat /sys/power/state # Shows available sleep states
cat /sys/power/disk # Shows disk power management mode
CPU Information – /sys/devices/system/cpu/
Provides detailed CPU information and control:
# CPU topology
ls /sys/devices/system/cpu/cpu0/topology/
# CPU frequency information
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
# CPU governor (power management)
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
Network Interfaces – /sys/class/net/
Contains information about network interfaces:
# Get interface status (up/down)
cat /sys/class/net/eth0/operstate
# Check interface speed
cat /sys/class/net/eth0/speed
# MAC address
cat /sys/class/net/eth0/address
The /sys filesystem is a powerful kernel interface for both monitoring and configuring hardware at a low level. When it comes to disks, /sys/block provides real-time access to I/O performance metrics, scheduling, and device characteristics. Tools such as iostat, udevadm, and smartctl often use this data behind the scenes.
The /sys filesystem is especially valuable for automated monitoring scripts that need direct access to hardware parameters without parsing complex command output. Unlike /proc, which focuses primarily on processes and system statistics, /sys is organized around hardware devices and subsystems, making it the go-to source for hardware-related monitoring.
SNMP (Simple Network Management Protocol)
SNMP is a standard protocol for collecting and organizing information about managed devices on networks. It's widely used for monitoring network devices and servers.
SNMP Architecture
SNMP architecture consists of:
- SNMP Manager - the monitoring system that requests information
- SNMP Agent - software running on monitored devices that collects and provides data
- Management Information Base (MIB) - a hierarchical database that defines what data can be collected
- Object Identifiers (OIDs) - unique identifiers for each data point in the MIB
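Each OID is a dotted numeric path into the MIB tree. For example, the standard system description object can be resolved with snmptranslate, which ships with the net-snmp utilities:
# Translate a symbolic name to its numeric OID
$ snmptranslate -On SNMPv2-MIB::sysDescr.0
.1.3.6.1.2.1.1.1.0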
Setting up SNMP on Linux
To use SNMP in a Linux OS, you need to install and configure an SNMP agent like snmpd:
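# RHEL-like systems (agent plus command-line utilities)
$ yum install net-snmp net-snmp-utils
# Debian-like systems
$ apt install snmpd snmp
# Start the agent
$ systemctl enable --now snmpd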
The configuration file is located by default at /etc/snmp/snmpd.conf. Basic configuration involves:
- Setting community strings (like passwords)
- Configuring access control
- Defining which system information to expose
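A minimal read-only configuration might look like this (the community string and allowed subnet below are placeholders to adapt):
# /etc/snmp/snmpd.conf
rocommunity  public  192.168.1.0/24    # read-only community string, restricted to one subnet
sysLocation  "Server room 1"
sysContact   admin@example.com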
Querying SNMP data
The snmpwalk tool can query all available SNMP data from a device:
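# Walk the whole MIB tree of localhost using SNMP v2c and the "public" community string
$ snmpwalk -v2c -c public localhost
# Limit the walk to the system subtree
$ snmpwalk -v2c -c public localhost system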
For specific values, use snmpget:
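# Fetch a single value - here the agent's uptime
$ snmpget -v2c -c public localhost SNMPv2-MIB::sysUpTime.0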
Linux-specific SNMP extensions
The net-snmp package provides Linux-specific SNMP extensions that expose data from /proc through SNMP:
- CPU utilization
- Memory usage
- Disk statistics
- Process information
- Network interface statistics
Extending SNMP
SNMP can be extended to collect custom metrics through:
- Extending the agent - writing scripts that are executed by the SNMP agent (see the example below)
- SNMP traps - asynchronous notifications sent from agents to managers when thresholds are crossed
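For instance, the agent's extend directive runs a custom script and publishes its output under the NET-SNMP-EXTEND-MIB subtree (the script path and name here are only illustrative):
# /etc/snmp/snmpd.conf
extend disk-temp /usr/local/bin/disk_temp.sh
# After restarting snmpd, the script's output is readable over SNMP:
$ snmpwalk -v2c -c public localhost NET-SNMP-EXTEND-MIB::nsExtendOutput1Table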
sysstat tools
The sysstat package provides a collection of performance monitoring tools:
- sar - System Activity Reporter
- iostat - I/O statistics
- mpstat - Multiprocessor statistics
- pidstat - Per-process statistics
These tools collect data from /proc and format it for easier reading:
# Install sysstat on RHEL-like systems
$ yum install sysstat
# Debian-like systems
$ apt install sysstat
# View CPU statistics
$ sar -u 1 5
# View memory statistics
$ sar -r 1 5
# View disk I/O statistics
$ iostat -x 1 5
Linux kernel counters
The Linux kernel maintains counters that track system activity; several classic tools read and report them:
- netstat - Network statistics
- vmstat - Virtual memory statistics
- perf - Performance analysis tool
Example usage:
# Network connections
$ netstat -an
# Memory statistics
$ vmstat 1 5
# Performance counters
$ perf stat -p <PID>
cgroups (Control Groups)
cgroups are a Linux kernel feature that isolates and tracks the resource usage of process groups:
- CPU time
- Memory
- Disk I/O
- Network bandwidth
cgroup data can be accessed through:
- /sys/fs/cgroup/<controller>/<cgroup>/ (cgroups v1, one hierarchy per controller)
- /sys/fs/cgroup/<cgroup>/ (cgroups v2, single unified hierarchy)
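For example, on a cgroups v2 host you can read a group's current usage directly (the system.slice path is systemd-specific and used here only as an illustration):
# Current memory usage in bytes for everything under system.slice
cat /sys/fs/cgroup/system.slice/memory.current
# Cumulative CPU time (usage_usec, user_usec, system_usec)
cat /sys/fs/cgroup/system.slice/cpu.stat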
Kernel events via eBPF
Extended Berkeley Packet Filter (eBPF) is a powerful technology that allows sandboxed programs to run inside the Linux kernel. It's increasingly used for monitoring:
- Performance analysis
- Security monitoring
- Network packet inspection
Tools like bpftrace
and bcc
(BPF Compiler Collection) leverage eBPF:
# Count system calls by process
$ bpftrace -e 'tracepoint:syscalls:sys_enter_* { @[comm] = count(); }'
# Monitor file opens
$ bpftrace -e 'tracepoint:syscalls:sys_enter_open { printf("%s opened %s\n", comm, str(args->filename)); }'
Conclusion 
Linux provides rich interfaces for OS monitoring through the /proc filesystem, SNMP, and various kernel facilities. These data sources form the foundation for more sophisticated monitoring solutions like Zabbix, Prometheus, Nagios, and many more.
Understanding how these underlying mechanisms work helps administrators:
- Diagnose problems more effectively
- Create custom monitoring solutions
- Better interpret the output of monitoring tools
- Optimize system performance
In the next part, I'll dive into how these data sources can be utilized.