viva_tensor/optim/auto_tune

Auto-Tuning System - Self-Optimizing Tensor Library

Inspired by HuggingChat + ggml + Candle research

Features:

  1. GPU Auto-Detection (detects VRAM, load, capabilities)
  2. Adaptive Quantization (int8 for inference, fp32 for training)
  3. Zero-Copy between Gleam and Rust NIFs
  4. Auto Batch Size Optimizer (learns the best batch for the hardware)

Target hardware: RTX 4090 (24 GB VRAM) with 32 GB system RAM
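The int8-versus-fp32 trade-off is easy to put in numbers. Each fp32 element takes 4 bytes and each int8 element takes 1 byte, so a hypothetical 1-billion-parameter model needs about 4 GB for its weights in fp32 and about 1 GB in int8. That leaves noticeably more of the 24 GB VRAM budget for activations and larger batches. The sketch below only does this arithmetic. `Precision`, `bytes_per_element` and `footprint_bytes` are illustrative names, not part of viva_tensor, and scale factors and activations are ignored.

```gleam
import gleam/int
import gleam/io

/// Illustrative precision tag (hypothetical, not a viva_tensor type).
pub type Precision {
  Fp32
  Int8
}

/// Bytes needed to store one element at the given precision.
pub fn bytes_per_element(p: Precision) -> Int {
  case p {
    Fp32 -> 4
    Int8 -> 1
  }
}

/// Total bytes for `n` elements; ignores per-tensor scale overhead.
pub fn footprint_bytes(n: Int, p: Precision) -> Int {
  n * bytes_per_element(p)
}

pub fn main() -> Nil {
  let params = 1_000_000_000
  io.println("fp32: " <> int.to_string(footprint_bytes(params, Fp32)) <> " bytes")
  io.println("int8: " <> int.to_string(footprint_bytes(params, Int8)) <> " bytes")
}
```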

Types

Auto-tuner state

pub type AutoTuner {
  AutoTuner(
    hardware: HardwareProfile,
    quant_mode: QuantMode,
    history: List(BatchResult),
    current_batch_size: Int,
  )
}

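Constructors

  • AutoTuner(
      hardware: HardwareProfile,
      quant_mode: QuantMode,
      history: List(BatchResult),
      current_batch_size: Int,
    )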

Result of a batch execution

pub type BatchResult {
  BatchResult(
    batch_size: Int,
    duration_ms: Float,
    throughput: Float,
  )
}

Constructors

  • BatchResult(
      batch_size: Int,
      duration_ms: Float,
      throughput: Float,
    )

Detected device type

pub type Device {
  Cuda(gpu_id: Int, vram_gb: Float)
  Metal(device_id: Int)
  Cpu(cores: Int)
}

Constructors

  • Cuda(gpu_id: Int, vram_gb: Float)
  • Metal(device_id: Int)
  • Cpu(cores: Int)

Complete hardware profile

pub type HardwareProfile {
  HardwareProfile(
    device: Device,
    total_vram_gb: Float,
    available_vram_gb: Float,
    total_ram_gb: Float,
    gpu_load_pct: Float,
    optimal_batch_size: Int,
  )
}

Constructors

  • HardwareProfile(
      device: Device,
      total_vram_gb: Float,
      available_vram_gb: Float,
      total_ram_gb: Float,
      gpu_load_pct: Float,
      optimal_batch_size: Int,
    )

Memory pressure level

pub type MemoryPressure {
  Low
  Medium
  High
  Critical
}

Constructors

  • Low
  • Medium
  • High
  • Critical

Memory management strategy

pub type MemoryStrategy {
  MemoryStrategy(
    batch_size_mult: Float,
    quant_mode: QuantMode,
    gc_aggressive: Bool,
  )
}

Constructors

  • MemoryStrategy(
      batch_size_mult: Float,
      quant_mode: QuantMode,
      gc_aggressive: Bool,
    )

Quantization context

pub type QuantContext {
  QuantContext(mode: QuantMode, scales: List(Float))
}

Constructors

  • QuantContext(mode: QuantMode, scales: List(Float))

Quantization mode

pub type QuantMode {
  Inference
  Training
  Adaptive
}

Constructors

  • Inference
  • Training
  • Adaptive

Values

pub fn check_memory_pressure(
  hw: HardwareProfile,
) -> MemoryPressure

Classifies the current memory pressure of a hardware profile.
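The page does not say how the four pressure levels are derived. One plausible approach is to classify by the fraction of VRAM in use. The sketch below takes `total_vram_gb` and `available_vram_gb` directly, as they appear in `HardwareProfile`. The name `classify_pressure` and the 0.5 / 0.75 / 0.9 thresholds are assumptions made for illustration; they are not the library's actual values.

```gleam
/// Mirrors the documented MemoryPressure constructors.
pub type MemoryPressure {
  Low
  Medium
  High
  Critical
}

/// Hypothetical classifier based on the fraction of VRAM in use.
/// The 0.5 / 0.75 / 0.9 thresholds are illustrative assumptions.
pub fn classify_pressure(
  total_vram_gb: Float,
  available_vram_gb: Float,
) -> MemoryPressure {
  case total_vram_gb <=. 0.0 {
    // No VRAM reported (e.g. a CPU-only profile): report no GPU pressure.
    True -> Low
    False -> {
      let used_fraction = 1.0 -. available_vram_gb /. total_vram_gb
      case used_fraction {
        f if f <. 0.5 -> Low
        f if f <. 0.75 -> Medium
        f if f <. 0.9 -> High
        _ -> Critical
      }
    }
  }
}
```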

pub fn compute_scale(tensor: tensor.Tensor) -> Float

Computes the scale factor for absmax (int8) quantization.
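In symmetric absmax quantization, the scale maps the largest absolute value onto the int8 limit: `scale = max(|x|) / 127`. Each value is then stored as `round(x / scale)`, clamped to [-127, 127], and recovered approximately as `q * scale`. The sketch below uses a plain `List(Float)` in place of `tensor.Tensor`, because the tensor's internals are not documented here. `absmax_scale`, `quantize` and `dequantize` are hypothetical helpers and are not `compute_scale` itself.

```gleam
import gleam/float
import gleam/int
import gleam/list

/// Absmax scale for symmetric int8: max(|x|) / 127.
/// A List(Float) stands in for tensor.Tensor (assumption).
pub fn absmax_scale(values: List(Float)) -> Float {
  let max_abs =
    list.fold(values, 0.0, fn(acc, x) { float.max(acc, float.absolute_value(x)) })
  case max_abs == 0.0 {
    // All-zero input: any non-zero scale works; 1.0 avoids division by zero.
    True -> 1.0
    False -> max_abs /. 127.0
  }
}

/// Quantize to the symmetric int8 range [-127, 127].
pub fn quantize(values: List(Float), scale: Float) -> List(Int) {
  list.map(values, fn(x) { float.round(x /. scale) |> int.clamp(-127, 127) })
}

/// Map int8 codes back to approximate float values.
pub fn dequantize(codes: List(Int), scale: Float) -> List(Float) {
  list.map(codes, fn(q) { int.to_float(q) *. scale })
}
```

Using 127 rather than 128 keeps the range symmetric, so `x` and `-x` always quantize to codes of equal magnitude.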

pub fn detect_cpu_only() -> HardwareProfile

Creates a CPU-only hardware profile.

pub fn detect_hardware() -> HardwareProfile

Detects the available hardware and returns its profile.

pub fn get_memory_strategy(
  pressure: MemoryPressure,
) -> MemoryStrategy

Returns the memory strategy for a given memory pressure level.
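A pressure-to-strategy mapping might shrink the batch and switch to int8 as pressure rises. The sketch below mirrors the documented `MemoryStrategy` fields. The specific multipliers, the chosen quantization modes, and the helpers `strategy_for` and `scaled_batch` are all assumptions for illustration, not the behaviour of `get_memory_strategy`.

```gleam
import gleam/float
import gleam/int

pub type QuantMode {
  Inference
  Training
  Adaptive
}

pub type MemoryPressure {
  Low
  Medium
  High
  Critical
}

pub type MemoryStrategy {
  MemoryStrategy(batch_size_mult: Float, quant_mode: QuantMode, gc_aggressive: Bool)
}

/// Hypothetical mapping; multipliers and modes are illustrative assumptions.
pub fn strategy_for(pressure: MemoryPressure) -> MemoryStrategy {
  case pressure {
    Low ->
      MemoryStrategy(batch_size_mult: 1.0, quant_mode: Adaptive, gc_aggressive: False)
    Medium ->
      MemoryStrategy(batch_size_mult: 0.75, quant_mode: Adaptive, gc_aggressive: False)
    High ->
      MemoryStrategy(batch_size_mult: 0.5, quant_mode: Inference, gc_aggressive: True)
    Critical ->
      MemoryStrategy(batch_size_mult: 0.25, quant_mode: Inference, gc_aggressive: True)
  }
}

/// Apply the strategy's multiplier to a base batch size, never below 1.
pub fn scaled_batch(base: Int, strategy: MemoryStrategy) -> Int {
  int.max(1, float.truncate(int.to_float(base) *. strategy.batch_size_mult))
}
```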

pub fn main() -> Nil

pub fn new() -> AutoTuner

Creates a new auto-tuner.

pub fn new_quant_context(mode: QuantMode) -> QuantContext

Creates a quantization context.

pub fn profile(
  tuner: AutoTuner,
  batch_size: Int,
  duration_ms: Float,
) -> AutoTuner

Records a batch execution result and re-tunes the batch size.
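One simple way to "learn the best batch for the hardware" is to log every run and move the current batch size to whichever run had the highest throughput. The sketch below does this with a simplified `Tuner` record that keeps only `history` and `current_batch_size`. The name `record`, the `Tuner` type, and the choice of samples per second as the throughput unit are assumptions; the real `profile` may use a different rule.

```gleam
import gleam/int
import gleam/list

pub type BatchResult {
  BatchResult(batch_size: Int, duration_ms: Float, throughput: Float)
}

/// Simplified stand-in for AutoTuner (hardware and quant_mode omitted).
pub type Tuner {
  Tuner(history: List(BatchResult), current_batch_size: Int)
}

/// Log one run, then pick the batch size with the best throughput so far.
pub fn record(tuner: Tuner, batch_size: Int, duration_ms: Float) -> Tuner {
  // Assumption: throughput is measured in samples per second.
  let throughput = case duration_ms >. 0.0 {
    True -> int.to_float(batch_size) *. 1000.0 /. duration_ms
    False -> 0.0
  }
  let history = [BatchResult(batch_size, duration_ms, throughput), ..tuner.history]
  // Start from the current batch size so it is kept if nothing beats it.
  let best =
    list.fold(history, BatchResult(tuner.current_batch_size, 0.0, 0.0), fn(best, r) {
      case r.throughput >. best.throughput {
        True -> r
        False -> best
      }
    })
  Tuner(history: history, current_batch_size: best.batch_size)
}
```

For example, runs of 32 in 100 ms (320/s), 64 in 150 ms (about 427/s) and 128 in 500 ms (256/s) leave the tuner at batch size 64.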

pub fn run_hardware_profile() -> Nil

Runs a complete hardware profile.

pub fn should_quantize(
  ctx: QuantContext,
  is_inference: Bool,
) -> Bool

Decides whether to quantize, based on the context's quantization mode and whether the operation is inference.
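The feature list pairs int8 with inference and fp32 with training. A decision rule consistent with that would always quantize in `Inference` mode and never in `Training` mode, and would let `Adaptive` follow the operation. The sketch below takes the mode directly instead of a `QuantContext`. `decide_quantize` is a hypothetical name, and the rule is an assumption rather than the documented behaviour of `should_quantize`.

```gleam
pub type QuantMode {
  Inference
  Training
  Adaptive
}

/// Hypothetical rule: int8 for inference, fp32 for training,
/// and Adaptive defers to the kind of operation being run.
pub fn decide_quantize(mode: QuantMode, is_inference: Bool) -> Bool {
  case mode {
    Inference -> True
    Training -> False
    Adaptive -> is_inference
  }
}
```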
