GPU visibility is deployment policy; hardcoding it in the image reduces portability.
| Property | Value       |
| -------- | ----------- |
| Severity | Warning     |
| Category | Correctness |
| Default  | Enabled     |
| Auto-fix | Partial     |

Description

Detects ENV instructions that hardcode GPU device visibility variables (NVIDIA_VISIBLE_DEVICES, CUDA_VISIBLE_DEVICES) inside the image. GPU visibility is deployment policy that should be set at runtime via docker run --gpus, NVIDIA_VISIBLE_DEVICES in the orchestrator, or similar mechanisms — not baked into the container image.
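In practice, the runtime mechanisms mentioned above look like this (illustrative commands; `my-image` is a placeholder, and the `--runtime=nvidia` variant assumes the NVIDIA Container Toolkit is installed on the host):

```shell
# Expose all GPUs to the container at run time — no ENV baked into the image
docker run --gpus all my-image

# Restrict the container to specific devices via the --gpus flag
docker run --gpus '"device=0,1"' my-image

# Or pass visibility through the environment when using the NVIDIA runtime
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 my-image
```

Because the choice of devices lives in the run command, the same image runs unchanged on hosts with different GPU topologies.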

Why this matters

  • Portability — images with hardcoded device indices or UUIDs cannot run on hosts with different GPU topologies without rebuilding
  • Orchestrator conflict — Kubernetes device plugins, Slurm, and other schedulers set GPU visibility externally; image-level settings can conflict with or override orchestrator intent
  • Redundancy — official nvidia/cuda base images already set NVIDIA_VISIBLE_DEVICES=all via image labels; re-declaring it in the Dockerfile is pure noise

What is flagged

| Pattern | Flagged? | Fix safety |
| --- | --- | --- |
| `ENV NVIDIA_VISIBLE_DEVICES=all` on `nvidia/cuda:*` base | Yes (redundant) | FixSafe — safe to delete |
| `ENV NVIDIA_VISIBLE_DEVICES=0` or `=0,1` (device indices) | Yes | FixSuggestion |
| `ENV NVIDIA_VISIBLE_DEVICES=GPU-<uuid>` or `MIG-<uuid>` | Yes | FixSuggestion |
| `ENV CUDA_VISIBLE_DEVICES=<non-empty>` | Yes | FixSuggestion |
| `ENV NVIDIA_VISIBLE_DEVICES=all` on non-CUDA base | No — intentional for custom GPU images | — |
| `ENV NVIDIA_VISIBLE_DEVICES=none` / `void` / empty | No — intentional disable signal | — |
| `ENV NVIDIA_VISIBLE_DEVICES=${VAR}` (variable reference) | No — parameterized, not hardcoded | — |
| `ENV CUDA_VISIBLE_DEVICES=none` / `NoDevFiles` / empty | No — intentional disable | — |

Examples

Violation

```dockerfile
# Redundant: nvidia/cuda already sets NVIDIA_VISIBLE_DEVICES=all
FROM nvidia/cuda:12.2.0-runtime-ubuntu22.04
ENV NVIDIA_VISIBLE_DEVICES=all

# Hardcoded device indices make the image non-portable
FROM nvidia/cuda:12.2.0-runtime-ubuntu22.04
ENV NVIDIA_VISIBLE_DEVICES=0,1

# CUDA_VISIBLE_DEVICES bakes deployment policy into the image
FROM ubuntu:22.04
ENV CUDA_VISIBLE_DEVICES=0

# GPU UUIDs are host-specific
FROM ubuntu:22.04
ENV NVIDIA_VISIBLE_DEVICES=GPU-aaaa-bbbb-cccc-dddd-eeee-ffffffffffff
```

No violation

```dockerfile
# NVIDIA_VISIBLE_DEVICES=all on a non-CUDA base is intentional
FROM ubuntu:22.04
ENV NVIDIA_VISIBLE_DEVICES=all

# Disable signals are intentional
FROM nvidia/cuda:12.2.0-runtime-ubuntu22.04
ENV NVIDIA_VISIBLE_DEVICES=none

# Variable references are parameterized, not hardcoded
FROM nvidia/cuda:12.2.0-runtime-ubuntu22.04
ARG GPU_DEVICES=all
ENV NVIDIA_VISIBLE_DEVICES=${GPU_DEVICES}
```

Auto-fix behavior

The rule offers two fix safety levels:
  • FixSafe (applied with --fix): removes the redundant NVIDIA_VISIBLE_DEVICES=all on nvidia/cuda base images. This is 100% behavior-preserving because the base image already sets this value.
  • FixSuggestion (applied with --fix --fix-unsafe): removes hardcoded device indices, UUIDs, or CUDA_VISIBLE_DEVICES values. This improves portability but changes deployment semantics — the user must ensure GPU visibility is provided at runtime.
For multi-key ENV instructions, only the flagged key is removed; other keys are preserved.

Configuration

This rule has no rule-specific options. Its severity can be adjusted like any other rule:

```toml
[rules.tally."gpu/no-hardcoded-visible-devices"]
severity = "warning"
```
