GPU Monitoring
Beszel can monitor GPU usage, temperature, and power draw.
AMD GPUs
Work in progress
AMD has deprecated rocm-smi in favor of amd-smi. The agent works with rocm-smi on Linux, but hasn't been updated to work with amd-smi yet.
Beszel uses rocm-smi to monitor AMD GPUs. This must be available on the system, and you must use the binary agent (not the Docker agent).
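A quick way to confirm availability is to check whether rocm-smi resolves on the PATH, falling back to the ROCm install directory. The sketch below is illustrative only: the function name find_rocm_smi is hypothetical, and the fallback path /opt/rocm/bin/rocm-smi is an assumption based on the default install location.

```shell
#!/bin/sh
# Hypothetical helper: print the path to rocm-smi if it can be found.
# $1 (optional) overrides the fallback location; /opt/rocm/bin/rocm-smi is
# an assumption based on the default ROCm install prefix.
find_rocm_smi() {
    fallback="${1:-/opt/rocm/bin/rocm-smi}"
    if command -v rocm-smi >/dev/null 2>&1; then
        command -v rocm-smi
    elif [ -x "$fallback" ]; then
        printf '%s\n' "$fallback"
    else
        return 1
    fi
}

if found=$(find_rocm_smi); then
    echo "rocm-smi found at: $found"
else
    echo "rocm-smi not found on PATH or in /opt/rocm/bin"
fi
```

Run the check as the same user that runs beszel-agent, since the PATH can differ between users.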
Make sure rocm-smi is accessible
Installing rocm-smi-lib on Arch and Debian places the rocm-smi binary in /opt/rocm/bin. If this directory isn't in the PATH of the user running beszel-agent, symlink the binary into /usr/local/bin:
sudo ln -s /opt/rocm/bin/rocm-smi /usr/local/bin/rocm-smi
Nvidia GPUs
Docker agent
Make sure NVIDIA Container Toolkit is installed on the host system.
Use henrygd/beszel-agent-nvidia and add the following deploy block to your docker-compose.yml.
beszel-agent:
  image: henrygd/beszel-agent-nvidia
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities:
              - utility
Binary agent
You must have nvidia-smi available on the system.
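To confirm that nvidia-smi reports the metrics Beszel displays (usage, temperature, and power draw), you can query it directly. This is a minimal sketch and not necessarily the exact command the agent runs; the helper name format_gpu_line is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn one CSV line from nvidia-smi into a readable summary.
# Expected field order: name, temperature (C), utilization (%), power draw (W).
format_gpu_line() {
    printf '%s\n' "$1" |
        awk -F', ' '{ printf "%s: %s C, %s%% util, %s W\n", $1, $2, $3, $4 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # One CSV line per GPU, without a header row or unit suffixes.
    nvidia-smi --query-gpu=name,temperature.gpu,utilization.gpu,power.draw \
               --format=csv,noheader,nounits |
    while IFS= read -r line; do
        format_gpu_line "$line"
    done
else
    echo "nvidia-smi not found on PATH"
fi
```

Running this as the user that runs beszel-agent also shows whether that user can reach the GPU devices.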
If GPU monitoring doesn't work, you may need to allow access to your devices in the service configuration. See discussion #563 for more information.
[Service]
DeviceAllow=/dev/nvidiactl rw
DeviceAllow=/dev/nvidia0 rw
# If you have multiple GPUs, make sure to allow all of them
DeviceAllow=/dev/nvidia1 rw
DeviceAllow=/dev/nvidia2 rw
systemctl daemon-reload
systemctl restart beszel-agent
Nvidia Jetson
You must use the binary agent and have tegrastats installed.
Intel GPUs
Note that only one GPU per system is supported. We may add support for multiple GPUs in the future.
Docker agent
Use the henrygd/beszel-agent-intel image with the additional options below.
beszel-agent:
  image: henrygd/beszel-agent-intel
  cap_add:
    - CAP_PERFMON
  devices:
    - /dev/dri/card0:/dev/dri/card0
Use ls /dev/dri to find the name of your GPU:
ls /dev/dri
by-path  card0  renderD128
Binary agent
You must have intel_gpu_top installed. This is typically part of the intel-gpu-tools package.
# Debian / Ubuntu
sudo apt install intel-gpu-tools
# Arch
sudo pacman -S intel-gpu-tools
Assuming you're not running the agent as root, you'll need to set the cap_perfmon capability on the intel_gpu_top binary.
sudo setcap cap_perfmon=ep /usr/bin/intel_gpu_top
Troubleshooting
To independently test the intel_gpu_top command:
# docker
docker exec -it beszel-agent intel_gpu_top -s 3000 -l
# binary
sudo -u beszel intel_gpu_top -s 3000 -l
Specify the device name
On some systems you need to specify the device name for intel_gpu_top. Use the INTEL_GPU_DEVICE environment variable to set the -d value.
INTEL_GPU_DEVICE=drm:/dev/dri/card0
This is equivalent to running intel_gpu_top -s 3000 -l -d drm:/dev/dri/card0.
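The mapping from the environment variable to the command line can be sketched as follows. This only illustrates the equivalence and is not the agent's actual implementation; build_intel_gpu_top_cmd is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the intel_gpu_top command line that corresponds
# to the current value of INTEL_GPU_DEVICE. The -d flag is omitted when the
# variable is unset or empty.
build_intel_gpu_top_cmd() {
    cmd="intel_gpu_top -s 3000 -l"
    if [ -n "${INTEL_GPU_DEVICE:-}" ]; then
        cmd="$cmd -d $INTEL_GPU_DEVICE"
    fi
    printf '%s\n' "$cmd"
}

build_intel_gpu_top_cmd
```

Copying the printed command into a terminal lets you test the same device selection by hand.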
Lower the perf_event_paranoid kernel parameter
You may need to lower the value for the perf_event_paranoid kernel parameter. See issue #1150 or #1203 for more information.
sudo sysctl kernel.perf_event_paranoid=2
To make this change persistent across reboots, add it to the sysctl configuration:
echo "kernel.perf_event_paranoid=2" | sudo tee /etc/sysctl.d/99-intel-gpu-beszel.conf
sudo sysctl --system
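Before changing the setting, it can help to check the current value. This is a minimal sketch, assuming a Linux system that exposes the value at /proc/sys/kernel/perf_event_paranoid; the function name paranoid_needs_lowering is hypothetical, and the target of 2 is taken from the sysctl command in this section.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the current value is stricter
# (numerically higher) than the target. The target defaults to 2.
paranoid_needs_lowering() {
    current_value="$1"
    target_value="${2:-2}"
    [ "$current_value" -gt "$target_value" ]
}

proc_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$proc_file" ]; then
    current=$(cat "$proc_file")
    if paranoid_needs_lowering "$current" 2; then
        echo "perf_event_paranoid is $current; consider lowering it to 2"
    else
        echo "perf_event_paranoid is $current; already at or below 2"
    fi
else
    echo "cannot read $proc_file"
fi
```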