viva_tensor/auto_tune

Auto-Tuning System - Self-Optimizing Tensor Library

Inspired by research from HuggingChat, ggml, and Candle.

Features:

GPU Auto-Detection (detects VRAM, load, and capabilities)
Adaptive Quantization (int8 for inference, fp32 for training)
Zero-Copy between Gleam and Rust NIFs
Auto Batch Size Optimizer (learns the best batch size for the hardware)

Target: RTX 4090 24GB VRAM + 32GB RAM

Types

AutoTuner

State of the auto-tuner.

pub type AutoTuner {
  AutoTuner(
    hardware: HardwareProfile,
    quant_mode: QuantMode,
    history: List(BatchResult),
    current_batch_size: Int,
  )
}

Constructors

AutoTuner(
  hardware: HardwareProfile,
  quant_mode: QuantMode,
  history: List(BatchResult),
  current_batch_size: Int,
)

BatchResult

Result of a single batch execution.

pub type BatchResult {
  BatchResult(
    batch_size: Int,
    duration_ms: Float,
    throughput: Float,
  )
}

Constructors

BatchResult(
  batch_size: Int,
  duration_ms: Float,
  throughput: Float,
)

Device

Detected device type.

pub type Device {
  Cuda(gpu_id: Int, vram_gb: Float)
  Metal(device_id: Int)
  Cpu(cores: Int)
}

Constructors

```
Cuda(gpu_id: Int, vram_gb: Float)
```
```
Metal(device_id: Int)
```
```
Cpu(cores: Int)
```
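
Because `Device` has three constructors, callers will usually branch on it with a `case` expression. The sketch below assumes the package is installed and that its types are importable as shown; `describe` is a hypothetical helper, not part of this module.

```gleam
import gleam/float
import gleam/int
import viva_tensor/auto_tune.{type Device, Cpu, Cuda, Metal}

/// Hypothetical helper: renders a Device as a human-readable string.
pub fn describe(device: Device) -> String {
  case device {
    // Fields are matched positionally, in the order the constructors declare them
    Cuda(gpu_id, vram_gb) ->
      "CUDA GPU "
      <> int.to_string(gpu_id)
      <> " ("
      <> float.to_string(vram_gb)
      <> " GB VRAM)"
    Metal(device_id) -> "Metal device " <> int.to_string(device_id)
    Cpu(cores) -> "CPU with " <> int.to_string(cores) <> " cores"
  }
}
```

The compiler checks that the `case` covers every constructor, so adding a new device variant to the library would surface as a compile error here rather than a silent fallthrough.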

HardwareProfile

Complete hardware profile.

pub type HardwareProfile {
  HardwareProfile(
    device: Device,
    total_vram_gb: Float,
    available_vram_gb: Float,
    total_ram_gb: Float,
    gpu_load_pct: Float,
    optimal_batch_size: Int,
  )
}

Constructors

HardwareProfile(
  device: Device,
  total_vram_gb: Float,
  available_vram_gb: Float,
  total_ram_gb: Float,
  gpu_load_pct: Float,
  optimal_batch_size: Int,
)

MemoryPressure

pub type MemoryPressure {
  Low
  Medium
  High
  Critical
}

Constructors

```
Low
```
```
Medium
```
```
High
```
```
Critical
```

MemoryStrategy

pub type MemoryStrategy {
  MemoryStrategy(
    batch_size_mult: Float,
    quant_mode: QuantMode,
    gc_aggressive: Bool,
  )
}

Constructors

MemoryStrategy(
  batch_size_mult: Float,
  quant_mode: QuantMode,
  gc_aggressive: Bool,
)

QuantContext

Quantization context.

pub type QuantContext {
  QuantContext(mode: QuantMode, scales: List(Float))
}

Constructors

QuantContext(mode: QuantMode, scales: List(Float))

QuantMode

Quantization mode.

pub type QuantMode {
  Inference
  Training
  Adaptive
}

Constructors

```
Inference
```
```
Training
```
```
Adaptive
```

Values

check_memory_pressure

pub fn check_memory_pressure(
  hw: HardwareProfile,
) -> MemoryPressure

Checks memory pressure.

compute_scale

pub fn compute_scale(tensor: tensor.Tensor) -> Float

Computes the scale for absmax (int8) quantization.
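
For readers unfamiliar with absmax quantization, the following is a minimal sketch of the technique over a plain `List(Float)` rather than a `tensor.Tensor`. The helper names `absmax_scale` and `quantize` are hypothetical, and the convention used here (scale = largest absolute value divided by 127) is an assumption; the library may store the reciprocal or handle empty input differently.

```gleam
import gleam/float
import gleam/int
import gleam/list

/// Hypothetical helper: absmax scale for symmetric int8 quantization.
/// Assumption: scale = max(|x|) / 127, falling back to 1.0 when all values are zero.
pub fn absmax_scale(values: List(Float)) -> Float {
  let max_abs =
    list.fold(values, 0.0, fn(acc, x) {
      float.max(acc, float.absolute_value(x))
    })
  case max_abs == 0.0 {
    True -> 1.0
    False -> max_abs /. 127.0
  }
}

/// Hypothetical helper: maps one value into the int8 range [-127, 127].
pub fn quantize(x: Float, scale: Float) -> Int {
  float.round(x /. scale)
  |> int.clamp(min: -127, max: 127)
}
```

Dividing a quantized integer's source value by the scale and multiplying back recovers an approximation of the original, with error bounded by half the scale for values inside the observed range.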

detect_cpu_only

pub fn detect_cpu_only() -> HardwareProfile

Creates a CPU-only hardware profile for the auto-tuner.

detect_hardware

pub fn detect_hardware() -> HardwareProfile

Detects the available hardware.

get_memory_strategy

pub fn get_memory_strategy(
  pressure: MemoryPressure,
) -> MemoryStrategy

Selects a strategy based on memory pressure.
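
The three functions `detect_hardware`, `check_memory_pressure`, and `get_memory_strategy` chain naturally. The sketch below uses only the signatures and record fields documented on this page. How `batch_size_mult` is meant to be applied is not stated here, so multiplying it by `optimal_batch_size` is an assumption, and `scale_batch` is a hypothetical helper.

```gleam
import gleam/float
import gleam/int
import gleam/io
import viva_tensor/auto_tune

/// Hypothetical helper: applies a multiplier to a batch size, never going below 1.
pub fn scale_batch(base: Int, mult: Float) -> Int {
  int.max(float.round(int.to_float(base) *. mult), 1)
}

pub fn main() {
  let hw = auto_tune.detect_hardware()

  // HardwareProfile -> MemoryPressure -> MemoryStrategy
  let strategy =
    hw
    |> auto_tune.check_memory_pressure
    |> auto_tune.get_memory_strategy

  // Assumption: the multiplier is meant to scale the profile's optimal batch size
  let batch = scale_batch(hw.optimal_batch_size, strategy.batch_size_mult)
  io.println("Suggested batch size: " <> int.to_string(batch))
}
```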

main

pub fn main() -> Nil

new

pub fn new() -> AutoTuner

Creates a new auto-tuner.

new_quant_context

pub fn new_quant_context(mode: QuantMode) -> QuantContext

Creates a quantization context.

profile

pub fn profile(
  tuner: AutoTuner,
  batch_size: Int,
  duration_ms: Float,
) -> AutoTuner

Records a batch execution result and optimizes the tuner.
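
This page does not say how the tuner picks a batch size from its history. One plausible approach, sketched below as a self-contained illustration, is to compute throughput for each run and keep the batch size with the highest value. Everything here is an assumption: the local `BatchResult` type only mirrors the documented fields, throughput is taken to mean samples per second, and `measure` and `best_batch_size` are hypothetical names.

```gleam
import gleam/int
import gleam/list

/// Local stand-in that mirrors the documented BatchResult fields.
pub type BatchResult {
  BatchResult(batch_size: Int, duration_ms: Float, throughput: Float)
}

/// Assumption: throughput is samples per second; non-positive durations yield 0.0.
pub fn measure(batch_size: Int, duration_ms: Float) -> BatchResult {
  let throughput = case duration_ms >. 0.0 {
    True -> int.to_float(batch_size) /. { duration_ms /. 1000.0 }
    False -> 0.0
  }
  BatchResult(
    batch_size: batch_size,
    duration_ms: duration_ms,
    throughput: throughput,
  )
}

/// Returns the batch size with the highest throughput, or `fallback` if history is empty.
pub fn best_batch_size(history: List(BatchResult), fallback: Int) -> Int {
  // Track #(best batch size, best throughput); -1.0 is below any measured throughput
  let #(size, _) =
    list.fold(history, #(fallback, -1.0), fn(best, r) {
      case r.throughput >. best.1 {
        True -> #(r.batch_size, r.throughput)
        False -> best
      }
    })
  size
}
```

With the real module, the equivalent flow would be repeated calls to `profile(tuner, batch_size, duration_ms)` followed by reading `current_batch_size` from the returned `AutoTuner`.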

run_hardware_profile

pub fn run_hardware_profile() -> Nil

Runs a complete hardware profile.

should_quantize

pub fn should_quantize(
  ctx: QuantContext,
  is_inference: Bool,
) -> Bool

Decides whether to quantize based on the operation (inference or training).