API Reference

Autogenerated docs for the public modules most users interact with. The CLI module exposes the Typer entry point, while the controllers and utilities power both the command-line and Python recipes.

For global sessions, gpu_ids=None means all visible GPUs. Explicit values are visible device ordinals after CUDA or ROCm visibility filtering. Empty, duplicate, or out-of-range lists are invalid; lists with more than 64 entries are also invalid. Startup raises ValueError if discovery resolves to zero devices. Telemetry may expose metadata such as physical_id, but those fields are not accepted as selection IDs. On CUDA, NVML telemetry records are returned only for visible ordinals that Torch CUDA can select, and surviving visible IDs are not compacted after filtering. NVML-only devices are not exposed as public gpu_ids. On ROCm, telemetry records are returned only for visible ordinals that Torch can select; nullable memory fields mean memory telemetry is unavailable after successful selection.
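The selection rules above can be sketched as a small standalone validator. This is an illustration only, not KeepGPU's implementation: `resolve_gpu_ids`, `MAX_GPU_IDS`, and the `visible_count` parameter are hypothetical names, and the check order is an assumption.

```python
from typing import List, Optional, Sequence

MAX_GPU_IDS = 64  # cap on explicit list length, as stated in the text


def resolve_gpu_ids(gpu_ids: Optional[Sequence[int]], visible_count: int) -> List[int]:
    """Hypothetical helper mirroring the documented gpu_ids rules."""
    if gpu_ids is not None:
        ids = list(gpu_ids)
        # Local checks that do not need the device count.
        if not ids:
            raise ValueError("gpu_ids must not be empty")
        if len(ids) > MAX_GPU_IDS:
            raise ValueError("gpu_ids may not contain more than 64 entries")
        if len(set(ids)) != len(ids):
            raise ValueError("gpu_ids must not contain duplicates")
    # Checks that depend on discovery.
    if visible_count <= 0:
        raise ValueError("discovery resolved to zero devices")
    if gpu_ids is None:
        # None means every visible ordinal.
        return list(range(visible_count))
    for ordinal in ids:
        if not 0 <= ordinal < visible_count:
            raise ValueError(f"gpu id {ordinal} is outside 0..{visible_count - 1}")
    return ids
```

Values such as `physical_id` from telemetry are deliberately not accepted here; only visible ordinals are.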

GlobalGPUController validates local constructor inputs (gpu_ids, interval, busy_threshold, and vram_to_keep) before platform or hardware probing. Visible-count checks for explicit IDs still run after backend discovery because they depend on the current visible device count. When vram_to_keep is omitted, it defaults to 1GiB, the shared low-power public default that the CLI and service APIs also use. Direct CudaGPUController, RocmGPUController, and MacMGPUController constructors use the same vram_to_keep="1GiB" default when the argument is omitted. Public interval values must be finite positive seconds (fractional seconds are allowed), capped by the Python runtime wait limit. Public VRAM byte-equivalent values below 4 bytes or above 1 PiB are rejected, and internal tensor element counts round up to cover the requested byte-equivalent amount.
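A minimal sketch of the interval and VRAM bounds, written as standalone helpers. The function names are hypothetical, the 4-byte element size is an assumption (for example float32 keep tensors), and the runtime wait-limit cap is left out because its exact behavior is not described here.

```python
import math

MIN_VRAM_BYTES = 4          # lower bound from the text
MAX_VRAM_BYTES = 1 << 50    # 1 PiB upper bound from the text
BYTES_PER_ELEMENT = 4       # assumption: 4-byte tensor elements


def validate_interval(interval: float) -> float:
    """Accept finite, strictly positive seconds, including fractions."""
    if not math.isfinite(interval) or interval <= 0:
        raise ValueError("interval must be a finite positive number of seconds")
    return float(interval)


def vram_element_count(vram_bytes: int) -> int:
    """Round up so the elements cover at least the requested bytes."""
    if vram_bytes < MIN_VRAM_BYTES or vram_bytes > MAX_VRAM_BYTES:
        raise ValueError("vram_to_keep must be between 4 bytes and 1 PiB")
    # Ceiling division without floating point.
    return (vram_bytes + BYTES_PER_ELEMENT - 1) // BYTES_PER_ELEMENT
```

Under these assumptions a 5-byte request rounds up to two elements (8 bytes), so the allocation never falls short of the requested amount.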

Direct CUDA, ROCm, and Mac M controller rank values are public visible device ordinals. CUDA and ROCm ranks must be plain integers within 0..torch.cuda.device_count()-1 in the current process environment; Mac M ranks must be the plain integer 0. Non-integer ranks raise TypeError, and negative or out-of-range ranks raise ValueError during construction, before KeepGPU creates a torch.device, checks backend availability, calls backend device selection, starts a worker, or queries telemetry. CUDA and ROCm call torch.cuda.device_count() only after the rank is known to be a plain integer.
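The ordering described above (type check first, device count only afterwards) might look like the following sketch. `validate_rank` is a hypothetical helper, `count_devices` stands in for `torch.cuda.device_count`, and treating `bool` as a non-plain integer is an assumption.

```python
from typing import Callable


def validate_rank(rank: object, count_devices: Callable[[], int]) -> int:
    """Hypothetical rank check; count_devices stands in for torch.cuda.device_count."""
    # Assumption: "plain integer" means exactly int, so bool and float are rejected.
    if type(rank) is not int:
        raise TypeError(f"rank must be a plain integer, got {type(rank).__name__}")
    if rank < 0:
        raise ValueError("rank must not be negative")
    # The device count is queried only once the type is known to be valid.
    device_count = count_devices()
    if rank >= device_count:
        raise ValueError(f"rank {rank} is outside 0..{device_count - 1}")
    return rank
```

For the Mac M case the same shape applies with a counter that always returns 1, so that 0 is the only accepted rank.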

CUDA telemetry resolves visible ordinals through CUDA_VISIBLE_DEVICES before querying NVML. Numeric tokens, full UUID tokens, and unique UUID prefixes are supported, and parsing stops at -1 after any valid preceding tokens. Malformed, duplicate/equivalent, ambiguous, out-of-range, or unresolved mappings report unavailable utilization instead of guessing a physical device. ROCm telemetry resolves ROCR_VISIBLE_DEVICES as the base mask and one matching HIP_VISIBLE_DEVICES/CUDA_VISIBLE_DEVICES overlay before querying ROCm SMI. Unresolved mappings report unavailable utilization instead of falling back to a possibly wrong physical device.
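To make the "report unavailable instead of guessing" rule concrete, here is a simplified sketch that handles numeric tokens only. UUID tokens and UUID prefixes, which the text says are supported, are not resolved in this sketch and are treated as unresolved. The function name and the `physical_count` parameter are hypothetical.

```python
from typing import List, Optional


def map_visible_ordinals(value: str, physical_count: int) -> Optional[List[int]]:
    """Map visible ordinals to physical indices, or None when unresolved.

    The list index is the visible ordinal; the entry is the physical index.
    """
    mapping: List[int] = []
    for raw in value.split(","):
        token = raw.strip()
        if token == "-1":
            # Parsing stops here; earlier valid tokens are kept.
            break
        if not (token.isascii() and token.isdigit()):
            return None  # malformed, or a UUID form this sketch does not resolve
        index = int(token)
        if index >= physical_count or index in mapping:
            return None  # out-of-range or duplicate: do not guess
        mapping.append(index)
    return mapping
```

A caller would treat `None` as "utilization unavailable" for every ordinal rather than querying NVML with a possibly wrong index.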

Public Python controller, CLI, REST, JSON-RPC, and MCP defaults use vram_to_keep/vram=1GiB and busy_threshold=25, so omitted settings target a modest VRAM signal only when utilization backoff permits. When telemetry reports the GPU as busy, or telemetry is unavailable, the controller sleeps before allocating keep tensors or running compute. Pass busy_threshold=-1 only when you intentionally want unconditional keepalive compute without utilization backoff.
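The backoff decision can be summarized in a few lines. This is a sketch under assumptions: `should_back_off` is a hypothetical name, utilization is taken to be a percentage, and "busy" is assumed to mean strictly above the threshold.

```python
from typing import Optional


def should_back_off(utilization: Optional[float], busy_threshold: int = 25) -> bool:
    """Return True when the keep loop should sleep instead of doing work."""
    if busy_threshold == -1:
        # Unconditional keepalive: telemetry is ignored entirely.
        return False
    if utilization is None:
        # Unavailable telemetry is treated like a busy device.
        return True
    # Assumption: busy means utilization strictly above the threshold.
    return utilization > busy_threshold
```

With the default threshold, an idle GPU at 10% would proceed to allocate and compute, while one at 90% or one with no readable telemetry would sleep for the configured interval.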

CUDA and ROCm keep() calls block until backend startup setup has succeeded before reporting success; failures during this setup are fatal. Startup failures such as device-selection errors are raised synchronously, while normal low-power allocation can still be deferred by utilization backoff after startup succeeds.
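One common way to get this synchronous-failure behavior from a background worker is a ready-event handshake. The sketch below shows the pattern only; `KeepWorker` and its `setup` callable are hypothetical and are not KeepGPU classes.

```python
import threading
from typing import Callable, Optional


class KeepWorker:
    """Hypothetical worker whose keep() surfaces setup errors to the caller."""

    def __init__(self, setup: Callable[[], None]) -> None:
        self._setup = setup
        self._ready = threading.Event()
        self._stop = threading.Event()
        self._error: Optional[Exception] = None
        self._thread: Optional[threading.Thread] = None

    def _run(self) -> None:
        try:
            self._setup()  # e.g. device selection
        except Exception as exc:
            self._error = exc
            self._ready.set()
            return
        self._ready.set()
        # Placeholder keep loop; real work and backoff would go here.
        while not self._stop.wait(0.01):
            pass

    def keep(self) -> None:
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        self._ready.wait()  # block until setup has succeeded or failed
        if self._error is not None:
            self._thread.join()
            raise self._error

    def release(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

The caller sees a setup exception from `keep()` itself rather than discovering later that a background thread died, while work inside the loop can still be deferred after startup.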

Single-GPU workload iteration controls must be positive integers. CUDA exposes relu_iterations; ROCm and Mac M expose iterations. Non-integer values raise TypeError, and non-positive values raise ValueError before a worker can start, rather than creating a silent no-op keep loop or a background thread crash.

For service session IDs, job_id=None is the only omitted/all-sessions sentinel. Custom IDs must be non-empty strings containing only letters, digits, ., _, -, or ~, and may not be standalone . or ..; invalid values raise ValueError before session state changes.
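The job ID rule maps naturally onto a regular expression. The helper below is a sketch: `validate_job_id` is a hypothetical name, and reading "letters" and "digits" as ASCII only is an assumption.

```python
import re
from typing import Optional

# Assumption: letters and digits are ASCII; allowed punctuation is . _ - ~
_JOB_ID_PATTERN = re.compile(r"[A-Za-z0-9._~-]+")


def validate_job_id(job_id: Optional[str]) -> Optional[str]:
    """Return job_id unchanged, or None for the all-sessions sentinel."""
    if job_id is None:
        return None
    if not isinstance(job_id, str) or _JOB_ID_PATTERN.fullmatch(job_id) is None:
        raise ValueError("job_id must be a non-empty string of letters, digits, '.', '_', '-', or '~'")
    if job_id in (".", ".."):
        raise ValueError("job_id may not be '.' or '..'")
    return job_id
```

Running the check before touching any session state keeps a rejected ID from leaving a partially created or partially released session behind.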

CLI entrypoint for KeepGPU.