viva_tensor/optim/auto_tune
Auto-Tuning System - Self-Optimizing Tensor Library
Inspired by HuggingChat + ggml + Candle research
Features:
- GPU Auto-Detection (detects VRAM, load, capabilities)
- Adaptive Quantization (int8 for inference, fp32 for training)
- Zero-Copy between Gleam and Rust NIFs
- Auto Batch Size Optimizer (learns the best batch size for the hardware)
Target: RTX 4090 24GB VRAM + 32GB RAM
Types
Auto-tuner state
pub type AutoTuner {
  AutoTuner(
    hardware: HardwareProfile,
    quant_mode: QuantMode,
    history: List(BatchResult),
    current_batch_size: Int,
  )
}
Constructors
- AutoTuner(hardware: HardwareProfile, quant_mode: QuantMode, history: List(BatchResult), current_batch_size: Int)
Result of a batch execution
pub type BatchResult {
  BatchResult(
    batch_size: Int,
    duration_ms: Float,
    throughput: Float,
  )
}
Constructors
- BatchResult(batch_size: Int, duration_ms: Float, throughput: Float)
Detected device type
pub type Device {
  Cuda(gpu_id: Int, vram_gb: Float)
  Metal(device_id: Int)
  Cpu(cores: Int)
}
Constructors
- Cuda(gpu_id: Int, vram_gb: Float)
- Metal(device_id: Int)
- Cpu(cores: Int)
Complete hardware profile
pub type HardwareProfile {
  HardwareProfile(
    device: Device,
    total_vram_gb: Float,
    available_vram_gb: Float,
    total_ram_gb: Float,
    gpu_load_pct: Float,
    optimal_batch_size: Int,
  )
}
Constructors
- HardwareProfile(device: Device, total_vram_gb: Float, available_vram_gb: Float, total_ram_gb: Float, gpu_load_pct: Float, optimal_batch_size: Int)
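As an illustration of how the fields fit together, the following sketch builds a profile for the target machine named at the top of this page (RTX 4090, 24 GB VRAM, 32 GB RAM). The types are local copies of the documented ones so the snippet compiles on its own. The helper name `rtx_4090_profile` is hypothetical, and the values for `available_vram_gb`, `gpu_load_pct`, and `optimal_batch_size` are assumptions, not measurements.

```gleam
// Local copies of the documented types, so the sketch is self-contained.
pub type Device {
  Cuda(gpu_id: Int, vram_gb: Float)
  Metal(device_id: Int)
  Cpu(cores: Int)
}

pub type HardwareProfile {
  HardwareProfile(
    device: Device,
    total_vram_gb: Float,
    available_vram_gb: Float,
    total_ram_gb: Float,
    gpu_load_pct: Float,
    optimal_batch_size: Int,
  )
}

// Hypothetical helper: a hand-written profile for the stated target hardware.
pub fn rtx_4090_profile() -> HardwareProfile {
  HardwareProfile(
    device: Cuda(gpu_id: 0, vram_gb: 24.0),
    total_vram_gb: 24.0,
    // Assumed: part of the VRAM is already in use.
    available_vram_gb: 20.0,
    total_ram_gb: 32.0,
    // Assumed: the GPU is idle.
    gpu_load_pct: 0.0,
    // Assumed starting point, not a measured optimum.
    optimal_batch_size: 32,
  )
}
```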
pub type MemoryPressure {
  Low
  Medium
  High
  Critical
}
Constructors
- Low
- Medium
- High
- Critical
Values
pub fn check_memory_pressure(
  hw: HardwareProfile,
) -> MemoryPressure
Classifies the memory pressure of a hardware profile as Low, Medium, High, or Critical.
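This page does not state how the pressure level is derived. One plausible approach, sketched below, classifies by the fraction of VRAM that is still free. The function name `classify_pressure` and the thresholds (50%, 25%, 10%) are assumptions chosen for illustration; the library's actual rule may differ.

```gleam
pub type MemoryPressure {
  Low
  Medium
  High
  Critical
}

// Hypothetical classifier: maps the free-VRAM fraction to a pressure level.
// The thresholds are illustrative assumptions, not the library's values.
pub fn classify_pressure(
  available_vram_gb: Float,
  total_vram_gb: Float,
) -> MemoryPressure {
  // Gleam's `/.` yields 0.0 when dividing by zero, so a profile with
  // no VRAM at all is treated as Critical rather than crashing.
  let free_fraction = available_vram_gb /. total_vram_gb
  case free_fraction {
    f if f >=. 0.5 -> Low
    f if f >=. 0.25 -> Medium
    f if f >=. 0.1 -> High
    _ -> Critical
  }
}
```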
pub fn compute_scale(tensor: tensor.Tensor) -> Float
Computes the scale factor for absmax (int8) quantization of a tensor.
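To make the absmax idea concrete, here is a minimal sketch over a plain `List(Float)` instead of `tensor.Tensor`, whose internals are not shown on this page. It assumes the convention `x ≈ q * scale` with `q` in the range -127..127, so that `scale = max|x| / 127`. The inverse convention (`127 / max|x|`) is also common, and this page does not say which one `compute_scale` uses. All function names below are hypothetical.

```gleam
import gleam/float
import gleam/int
import gleam/list

// Largest absolute value in the data; 0.0 for an empty list.
pub fn absmax(values: List(Float)) -> Float {
  list.fold(values, 0.0, fn(acc, x) { float.max(acc, float.absolute_value(x)) })
}

// Scale under the assumed convention x ≈ q * scale, with q in -127..127.
pub fn absmax_scale(values: List(Float)) -> Float {
  case absmax(values) {
    m if m >. 0.0 -> m /. 127.0
    // Empty or all-zero input: fall back to 1.0 so later division is safe.
    _ -> 1.0
  }
}

// Maps each value to the nearest int8 step.
pub fn quantize(values: List(Float), scale: Float) -> List(Int) {
  list.map(values, fn(x) { float.round(x /. scale) })
}

// Recovers approximate float values from the quantized integers.
pub fn dequantize(quantized: List(Int), scale: Float) -> List(Float) {
  list.map(quantized, fn(q) { int.to_float(q) *. scale })
}
```

With this convention the largest-magnitude element always maps to 127 or -127, and the round-trip error of every other element is at most half a scale step.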
pub fn get_memory_strategy(
  pressure: MemoryPressure,
) -> MemoryStrategy
Returns the memory strategy to apply for a given level of memory pressure.
pub fn new_quant_context(mode: QuantMode) -> QuantContext
Creates a quantization context for the given quantization mode.
pub fn profile(
  tuner: AutoTuner,
  batch_size: Int,
  duration_ms: Float,
) -> AutoTuner
Records the result of a batch execution (batch size and duration in milliseconds) and returns the re-optimized tuner.
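The record-then-optimize step could work roughly as follows. This is a sketch under assumptions: it operates on a bare `List(BatchResult)` and leaves out the rest of `AutoTuner`, it treats `throughput` as samples per second (the unit is not stated on this page), and it "optimizes" by picking the batch size with the highest recorded throughput. The names `throughput`, `record`, and `best_batch_size` are hypothetical.

```gleam
import gleam/int
import gleam/list

// Local copy of the documented type.
pub type BatchResult {
  BatchResult(batch_size: Int, duration_ms: Float, throughput: Float)
}

// Assumed unit: samples per second. Gleam's `/.` returns 0.0 for a
// zero duration, so a bogus measurement never wins the comparison below.
pub fn throughput(batch_size: Int, duration_ms: Float) -> Float {
  int.to_float(batch_size) *. 1000.0 /. duration_ms
}

// Prepends a new measurement to the history.
pub fn record(
  history: List(BatchResult),
  batch_size: Int,
  duration_ms: Float,
) -> List(BatchResult) {
  let result =
    BatchResult(
      batch_size: batch_size,
      duration_ms: duration_ms,
      throughput: throughput(batch_size, duration_ms),
    )
  [result, ..history]
}

// Batch size with the highest recorded throughput, or `fallback` when
// the history is empty or holds no positive throughput.
pub fn best_batch_size(history: List(BatchResult), fallback: Int) -> Int {
  let #(size, _) =
    list.fold(
      history,
      #(fallback, 0.0),
      fn(best: #(Int, Float), result: BatchResult) {
        case result.throughput >. best.1 {
          True -> #(result.batch_size, result.throughput)
          False -> best
        }
      },
    )
  size
}
```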
pub fn should_quantize(
  ctx: QuantContext,
  is_inference: Bool,
) -> Bool
Decides whether to quantize, based on the quantization context and on whether the operation is inference or training.
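The feature list at the top of this page describes the policy as int8 for inference and fp32 for training. The sketch below expresses only that rule. `Precision` and `pick_precision` are hypothetical names, and the snippet ignores `QuantContext`, whose fields are not documented here.

```gleam
// Hypothetical stand-in for the precision a tensor operation runs in.
pub type Precision {
  Int8
  Fp32
}

// Adaptive quantization as stated in the feature list:
// quantize to int8 for inference, keep fp32 for training.
pub fn pick_precision(is_inference: Bool) -> Precision {
  case is_inference {
    True -> Int8
    False -> Fp32
  }
}
```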