viva_tensor/auto_tune

Auto-Tuning System - Self-Optimizing Tensor Library

Inspired by research from HuggingChat, ggml, and Candle

Features:

  1. GPU Auto-Detection (detects VRAM, load, capabilities)
  2. Adaptive Quantization (int8 for inference, fp32 for training)
  3. Zero-Copy between Gleam and Rust NIFs
  4. Auto Batch Size Optimizer (learns the best batch size for the hardware)

Target: RTX 4090 24GB VRAM + 32GB RAM

Types

Auto-tuner state

pub type AutoTuner {
  AutoTuner(
    hardware: HardwareProfile,
    quant_mode: QuantMode,
    history: List(BatchResult),
    current_batch_size: Int,
  )
}

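Constructors

  • AutoTuner(
      hardware: HardwareProfile,
      quant_mode: QuantMode,
      history: List(BatchResult),
      current_batch_size: Int,
    )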

Result of a batch run

pub type BatchResult {
  BatchResult(
    batch_size: Int,
    duration_ms: Float,
    throughput: Float,
  )
}

Constructors

  • BatchResult(
      batch_size: Int,
      duration_ms: Float,
      throughput: Float,
    )
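
The documentation does not say how `throughput` relates to the other two fields. A minimal sketch, assuming it means samples per second and using a local copy of the record so the snippet compiles on its own; `make_result` is a hypothetical helper, not part of the module:

```gleam
import gleam/int

// Local copy of the documented record, so the sketch is self-contained.
pub type BatchResult {
  BatchResult(batch_size: Int, duration_ms: Float, throughput: Float)
}

// Hypothetical helper. Assumption: throughput is samples per second.
pub fn make_result(batch_size: Int, duration_ms: Float) -> BatchResult {
  let throughput = case duration_ms >. 0.0 {
    // samples / (milliseconds / 1000) = samples per second
    True -> int.to_float(batch_size) *. 1000.0 /. duration_ms
    // Avoid dividing by a zero or negative duration.
    False -> 0.0
  }
  BatchResult(
    batch_size: batch_size,
    duration_ms: duration_ms,
    throughput: throughput,
  )
}
```

Under that assumption, a batch of 32 samples that takes 16 ms corresponds to 2000 samples per second.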

Detected device type

pub type Device {
  Cuda(gpu_id: Int, vram_gb: Float)
  Metal(device_id: Int)
  Cpu(cores: Int)
}

Constructors

  • Cuda(gpu_id: Int, vram_gb: Float)
  • Metal(device_id: Int)
  • Cpu(cores: Int)
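
Since `Device` is a plain custom type, callers can branch on it with `case`. The sketch below uses a local copy of the type so it compiles on its own; `describe` is a hypothetical helper introduced for illustration only:

```gleam
import gleam/float
import gleam/int

// Local copy of the documented type, so the sketch is self-contained.
pub type Device {
  Cuda(gpu_id: Int, vram_gb: Float)
  Metal(device_id: Int)
  Cpu(cores: Int)
}

// Hypothetical helper: builds a one-line label for each variant.
pub fn describe(device: Device) -> String {
  case device {
    Cuda(gpu_id, vram_gb) ->
      "CUDA GPU "
      <> int.to_string(gpu_id)
      <> " ("
      <> float.to_string(vram_gb)
      <> " GB VRAM)"
    Metal(device_id) -> "Metal device " <> int.to_string(device_id)
    Cpu(cores) -> "CPU with " <> int.to_string(cores) <> " cores"
  }
}
```

Because the `case` covers every constructor, the compiler will flag this function if a new device variant is added later.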

Complete hardware profile

pub type HardwareProfile {
  HardwareProfile(
    device: Device,
    total_vram_gb: Float,
    available_vram_gb: Float,
    total_ram_gb: Float,
    gpu_load_pct: Float,
    optimal_batch_size: Int,
  )
}

Constructors

  • HardwareProfile(
      device: Device,
      total_vram_gb: Float,
      available_vram_gb: Float,
      total_ram_gb: Float,
      gpu_load_pct: Float,
      optimal_batch_size: Int,
    )

pub type MemoryPressure {
  Low
  Medium
  High
  Critical
}

Constructors

  • Low
  • Medium
  • High
  • Critical

pub type MemoryStrategy {
  MemoryStrategy(
    batch_size_mult: Float,
    quant_mode: QuantMode,
    gc_aggressive: Bool,
  )
}

Constructors

  • MemoryStrategy(
      batch_size_mult: Float,
      quant_mode: QuantMode,
      gc_aggressive: Bool,
    )

Quantization context

pub type QuantContext {
  QuantContext(mode: QuantMode, scales: List(Float))
}

Constructors

  • QuantContext(mode: QuantMode, scales: List(Float))

Quantization mode

pub type QuantMode {
  Inference
  Training
  Adaptive
}

Constructors

  • Inference
  • Training
  • Adaptive

Values

pub fn check_memory_pressure(
  hw: HardwareProfile,
) -> MemoryPressure

Checks memory pressure

pub fn compute_scale(tensor: tensor.Tensor) -> Float

Computes the scale for absmax (int8) quantization
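
For readers unfamiliar with absmax quantization, here is a minimal sketch of the idea over a plain `List(Float)`. It is not the module's implementation: the internals of `tensor.Tensor` are not shown here, the names `absmax_scale` and `quantize` are hypothetical, and the convention `scale = max|x| / 127` is an assumption (some libraries use the reciprocal).

```gleam
import gleam/float
import gleam/int
import gleam/list

// Hypothetical helper. Assumption: scale = max(|x|) / 127, so that the
// largest magnitude in the input maps to the int8 value 127.
pub fn absmax_scale(values: List(Float)) -> Float {
  let max_abs =
    list.fold(values, 0.0, fn(acc, v) {
      float.max(acc, float.absolute_value(v))
    })
  case max_abs == 0.0 {
    // Empty or all-zero input: fall back to 1.0 to avoid a zero scale.
    True -> 1.0
    False -> max_abs /. 127.0
  }
}

// Hypothetical helper: maps one value to the symmetric int8 range.
pub fn quantize(value: Float, scale: Float) -> Int {
  int.clamp(float.round(value /. scale), min: -127, max: 127)
}
```

Dequantizing is the inverse step, `int.to_float(q) *. scale`, which recovers the original value up to rounding error.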

pub fn detect_cpu_only() -> HardwareProfile

Creates a CPU-only hardware profile for the auto-tuner

pub fn detect_hardware() -> HardwareProfile

Detects available hardware

pub fn get_memory_strategy(
  pressure: MemoryPressure,
) -> MemoryStrategy

Returns the strategy for a given memory pressure level
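
The signatures of `detect_hardware`, `check_memory_pressure`, and `get_memory_strategy` chain naturally. The following is a usage sketch built only from those signatures; `current_strategy` and `scaled_batch_size` are hypothetical helpers, and applying `batch_size_mult` as a plain multiplier on a base batch size is an assumption about how the field is meant to be used.

```gleam
import gleam/float
import gleam/int
import viva_tensor/auto_tune

// Hypothetical helper: hardware profile -> pressure level -> strategy.
pub fn current_strategy() -> auto_tune.MemoryStrategy {
  auto_tune.detect_hardware()
  |> auto_tune.check_memory_pressure
  |> auto_tune.get_memory_strategy
}

// Hypothetical helper. Assumption: batch_size_mult scales a base batch size.
pub fn scaled_batch_size(base: Int) -> Int {
  let strategy = current_strategy()
  float.round(int.to_float(base) *. strategy.batch_size_mult)
}
```

Which multiplier each pressure level yields is decided by the library and is not documented here, so the sketch makes no claim about the resulting values.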

pub fn main() -> Nil

pub fn new() -> AutoTuner

Creates a new auto-tuner

pub fn new_quant_context(mode: QuantMode) -> QuantContext

Creates a quantization context

pub fn profile(
  tuner: AutoTuner,
  batch_size: Int,
  duration_ms: Float,
) -> AutoTuner

Records the result of a run and optimizes
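
Because `profile` takes a tuner and returns an updated one, a series of measurements can be threaded through it with a fold. This is a usage sketch based only on the signatures of `new` and `profile`; `record_runs` is a hypothetical helper and the timings are made-up placeholder values, not benchmark results.

```gleam
import gleam/list
import viva_tensor/auto_tune

// Hypothetical helper: feeds several measurements into a fresh tuner.
pub fn record_runs() -> auto_tune.AutoTuner {
  // Placeholder measurements as #(batch_size, duration_ms).
  let runs = [#(16, 9.5), #(32, 17.0), #(64, 41.0)]

  list.fold(runs, auto_tune.new(), fn(tuner, run) {
    let #(batch_size, duration_ms) = run
    // Each call returns a new AutoTuner value; nothing is mutated in place.
    auto_tune.profile(tuner, batch_size, duration_ms)
  })
}
```

The returned record exposes `current_batch_size` and `history` as fields. How the library chooses the batch size from the recorded history is not described here.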

pub fn run_hardware_profile() -> Nil

Runs a complete hardware profile

pub fn should_quantize(
  ctx: QuantContext,
  is_inference: Bool,
) -> Bool

Decides whether to quantize based on the type of operation
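
A usage sketch built from the signatures of `new_quant_context` and `should_quantize`. The wrapper name `quantize_for_inference` is hypothetical, and the sketch makes no claim about which `Bool` the library returns for each mode.

```gleam
import viva_tensor/auto_tune

// Hypothetical helper: asks whether an inference step should be quantized
// when the context is in Adaptive mode.
pub fn quantize_for_inference() -> Bool {
  let ctx = auto_tune.new_quant_context(auto_tune.Adaptive)
  // The second argument flags the operation as inference (True)
  // rather than training (False).
  auto_tune.should_quantize(ctx, True)
}
```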
