viva_tensor/auto_tune
Auto-Tuning System - Self-Optimizing Tensor Library
Inspired by research from HuggingChat, ggml, and Candle
Features:
- GPU Auto-Detection (detects VRAM, load, capabilities)
- Adaptive Quantization (int8 for inference, fp32 for training)
- Zero-Copy between Gleam and Rust NIFs
- Auto Batch Size Optimizer (learns the best batch size for the hardware)
Target: RTX 4090 24GB VRAM + 32GB RAM
Types
State of the auto-tuner
pub type AutoTuner {
AutoTuner(
hardware: HardwareProfile,
quant_mode: QuantMode,
history: List(BatchResult),
current_batch_size: Int,
)
}
Constructors
- AutoTuner(hardware: HardwareProfile, quant_mode: QuantMode, history: List(BatchResult), current_batch_size: Int)
Result of a single batch run
pub type BatchResult {
BatchResult(
batch_size: Int,
duration_ms: Float,
throughput: Float,
)
}
Constructors
- BatchResult(batch_size: Int, duration_ms: Float, throughput: Float)
Detected device type
pub type Device {
Cuda(gpu_id: Int, vram_gb: Float)
Metal(device_id: Int)
Cpu(cores: Int)
}
Constructors
- Cuda(gpu_id: Int, vram_gb: Float)
- Metal(device_id: Int)
- Cpu(cores: Int)
Complete hardware profile
pub type HardwareProfile {
HardwareProfile(
device: Device,
total_vram_gb: Float,
available_vram_gb: Float,
total_ram_gb: Float,
gpu_load_pct: Float,
optimal_batch_size: Int,
)
}
Constructors
- HardwareProfile(device: Device, total_vram_gb: Float, available_vram_gb: Float, total_ram_gb: Float, gpu_load_pct: Float, optimal_batch_size: Int)
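As an illustration of how the `Device` and `HardwareProfile` constructors fit together, the sketch below builds a profile by hand for the stated target machine (RTX 4090 with 24GB VRAM and 32GB RAM). The function name `rtx_4090_profile` is hypothetical, and the values for available VRAM, GPU load, and optimal batch size are made-up placeholders, not measurements.

```gleam
import viva_tensor/auto_tune.{type HardwareProfile, Cuda, HardwareProfile}

/// Hypothetical helper: a hand-written profile for the target hardware.
/// Only the 24.0 and 32.0 figures come from the module description;
/// the remaining values are illustrative placeholders.
pub fn rtx_4090_profile() -> HardwareProfile {
  HardwareProfile(
    device: Cuda(gpu_id: 0, vram_gb: 24.0),
    total_vram_gb: 24.0,
    available_vram_gb: 20.0,
    total_ram_gb: 32.0,
    gpu_load_pct: 10.0,
    optimal_batch_size: 32,
  )
}
```

A hand-built profile like this is mainly useful in tests, where real GPU detection is unavailable.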
pub type MemoryPressure {
Low
Medium
High
Critical
}
Constructors
- Low
- Medium
- High
- Critical
Values
pub fn check_memory_pressure(
hw: HardwareProfile,
) -> MemoryPressure
Checks the memory pressure for a hardware profile
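A caller would typically branch on the returned `MemoryPressure` variant. The sketch below shows one way to do that; the function name `describe_pressure` and the messages are hypothetical, and the thresholds that separate the four levels are internal to the library and not documented here.

```gleam
import viva_tensor/auto_tune.{
  type HardwareProfile, Critical, High, Low, Medium,
}

/// Hypothetical helper: turns the pressure level into a log message.
/// The case expression is exhaustive over the four documented variants.
pub fn describe_pressure(hw: HardwareProfile) -> String {
  case auto_tune.check_memory_pressure(hw) {
    Low -> "memory pressure is low"
    Medium -> "memory pressure is moderate"
    High -> "memory pressure is high"
    Critical -> "memory pressure is critical"
  }
}
```

Because `MemoryPressure` is a closed custom type, the compiler rejects the `case` if a variant is left unhandled.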
pub fn compute_scale(tensor: tensor.Tensor) -> Float
Computes the scale for absmax (int8) quantization
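For readers unfamiliar with absmax quantization, here is a minimal standalone sketch of the idea on a plain `List(Float)` rather than a `tensor.Tensor`. It assumes the common convention `scale = max(|x|) / 127`; whether `compute_scale` uses this convention or its reciprocal is not stated in this page. The names `absmax`, `absmax_scale`, and `quantize` are hypothetical and not part of the library.

```gleam
import gleam/float
import gleam/list

/// Largest absolute value in the list, or 0.0 for an empty list.
pub fn absmax(values: List(Float)) -> Float {
  list.fold(values, 0.0, fn(acc, x) {
    float.max(acc, float.absolute_value(x))
  })
}

/// Assumed convention: scale = absmax / 127, so that dividing by the
/// scale maps the largest magnitude to 127 or -127.
/// Falls back to 1.0 when every value is zero, to avoid a zero scale.
pub fn absmax_scale(values: List(Float)) -> Float {
  let m = absmax(values)
  case m == 0.0 {
    True -> 1.0
    False -> m /. 127.0
  }
}

/// Quantizes each value to the nearest integer step of `scale`.
pub fn quantize(values: List(Float), scale: Float) -> List(Int) {
  list.map(values, fn(x) { float.round(x /. scale) })
}
```

Dequantization under this convention is a multiplication of each integer by the same scale, so the scale must be stored alongside the quantized data.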
pub fn get_memory_strategy(
pressure: MemoryPressure,
) -> MemoryStrategy
Selects a memory strategy based on memory pressure
pub fn new_quant_context(mode: QuantMode) -> QuantContext
Creates a quantization context
pub fn profile(
tuner: AutoTuner,
batch_size: Int,
duration_ms: Float,
) -> AutoTuner
Records the result of a batch run and optimizes the tuner
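Since `profile` takes an `AutoTuner` and returns an updated one, successive measurements can be threaded through with the pipe operator. The sketch below assumes the caller already has a tuner; the function name `record_two_runs` and the batch sizes and durations are invented for illustration.

```gleam
import viva_tensor/auto_tune.{type AutoTuner}

/// Hypothetical helper: feeds two made-up measurements into the tuner
/// and returns the batch size it holds afterwards.
pub fn record_two_runs(tuner: AutoTuner) -> Int {
  let updated =
    tuner
    |> auto_tune.profile(32, 120.0)
    |> auto_tune.profile(64, 210.0)

  // current_batch_size is a documented field of AutoTuner.
  updated.current_batch_size
}
```

The tuner is an immutable value, so the original `tuner` is left unchanged and only `updated` reflects the recorded runs.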
pub fn should_quantize(
ctx: QuantContext,
is_inference: Bool,
) -> Bool
Decides whether to quantize based on the operation (inference or training)
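The two quantization functions are designed to be used together: create a context once, then query it per operation. The sketch below is one possible usage; `precision_label` is a hypothetical name, and `mode` is taken from the caller without a type annotation because the `QuantMode` constructors are not listed in this page.

```gleam
import viva_tensor/auto_tune

/// Hypothetical helper: reports which precision path an operation takes.
/// The type of `mode` is inferred from new_quant_context.
pub fn precision_label(mode, is_inference: Bool) -> String {
  let ctx = auto_tune.new_quant_context(mode)
  case auto_tune.should_quantize(ctx, is_inference) {
    True -> "quantized"
    False -> "full precision"
  }
}
```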