viva_tensor/blackwell
Blackwell-Inspired Compression Engine
INSPIRED BY THE NVIDIA BLACKWELL ULTRA ARCHITECTURE:
- NVFP4: two-level scaling (micro-block FP8 E4M3 + tensor-level FP32)
- Hardware decompression: 800 GB/s
- Micro-block size: 16 values
- Memory hierarchy: HBM3e → L2 → L1 → Registers
WHAT GLEAM ADDS:
- GenServer actors to manage memory chunks
- OTP supervisors for fault tolerance
- Zero-copy views via Erlang binaries
- BEAM schedulers for massive parallelism
SILICON PHYSICS (real limits):
- 8-bit multiplier: 64 area units
- 32-bit multiplier: 576 area units (9x larger)
- HBM4 (2026): 2 TB/s per chip
- Blackwell: 8 TB/s HBM3e bandwidth
GOAL: make pure Gleam compete with dedicated hardware.
Types
Blackwell-style compressed tensor
pub type BlackwellTensor {
BlackwellTensor(
blocks: List(MicroBlock),
global_scale: Float,
shape: List(Int),
num_elements: Int,
memory_bytes: Int,
compression_ratio: Float,
)
}
Constructors
- BlackwellTensor(blocks: List(MicroBlock), global_scale: Float, shape: List(Int), num_elements: Int, memory_bytes: Int, compression_ratio: Float)
Arguments
- blocks: micro-blocks of 16 values each
- global_scale: tensor-level scale (FP32)
- shape: original shape
- num_elements: number of elements
- memory_bytes: actual memory footprint in bytes
- compression_ratio: compression ratio achieved
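The relationship between num_elements, memory_bytes and compression_ratio can be illustrated with a little arithmetic. The sketch below is not the library's accounting: `estimated_bytes` and `estimated_ratio` are hypothetical helpers, and they assume one byte per micro-block scale (FP8), four bytes for the tensor-level scale (FP32), and an FP32 original.

```gleam
import gleam/int

/// Estimated size in bytes of a tensor quantized into micro-blocks.
/// Assumption: 1 byte per block scale, 4 bytes for the global scale.
pub fn estimated_bytes(
  num_elements: Int,
  block_size: Int,
  bits_per_value: Int,
) -> Int {
  // Round up: a partial final block still needs its own scale.
  let num_blocks = { num_elements + block_size - 1 } / block_size
  // Round the packed payload up to whole bytes.
  let payload_bytes = { num_elements * bits_per_value + 7 } / 8
  payload_bytes + num_blocks + 4
}

/// Ratio between the FP32 size (4 bytes per element) and the estimate above.
pub fn estimated_ratio(
  num_elements: Int,
  block_size: Int,
  bits_per_value: Int,
) -> Float {
  let original = int.to_float(num_elements * 4)
  let compressed =
    int.to_float(estimated_bytes(num_elements, block_size, bits_per_value))
  original /. compressed
}
```

Under these assumptions, 1024 elements at 4 bits with 16-value blocks take 512 + 64 + 4 = 580 bytes against 4096 bytes in FP32, a ratio of roughly 7x. The values actually reported by the library may differ, for example if it also counts the per-block zero_point.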
Compression configuration
pub type CompressionConfig {
CompressionConfig(
block_size: Int,
bits_per_value: Int,
symmetric: Bool,
max_error_pct: Float,
)
}
Constructors
- CompressionConfig(block_size: Int, bits_per_value: Int, symmetric: Bool, max_error_pct: Float)
Arguments
- block_size: micro-block size (default: 16 for NVFP4)
- bits_per_value: bits per value (4 for NVFP4, 8 for INT8)
- symmetric: whether to use symmetric quantization
- max_error_pct: maximum error tolerance, as a percentage
Compression statistics
pub type CompressionStats {
CompressionStats(
original_bytes: Int,
compressed_bytes: Int,
compression_ratio: Float,
mean_error: Float,
max_error: Float,
blocks_processed: Int,
)
}
Constructors
- CompressionStats(original_bytes: Int, compressed_bytes: Int, compression_ratio: Float, mean_error: Float, max_error: Float, blocks_processed: Int)
Distribution statistics
pub type DistributionStats {
DistributionStats(
mean: Float,
std: Float,
min_val: Float,
max_val: Float,
dynamic_range: Float,
sparsity: Float,
)
}
Constructors
- DistributionStats(mean: Float, std: Float, min_val: Float, max_val: Float, dynamic_range: Float, sparsity: Float)
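The page does not say how these fields are computed. As a rough sketch of what three of them usually mean, the hypothetical helpers below compute the mean, the population standard deviation, and the sparsity (taken here to be the fraction of exact zeros) of a list of floats. The definitions are assumptions, not the library's implementation.

```gleam
import gleam/float
import gleam/int
import gleam/list

/// Arithmetic mean; returns 0.0 for an empty list.
pub fn mean(xs: List(Float)) -> Float {
  case xs {
    [] -> 0.0
    _ -> float.sum(xs) /. int.to_float(list.length(xs))
  }
}

/// Population standard deviation (divides by n, not n - 1).
pub fn std(xs: List(Float)) -> Float {
  let m = mean(xs)
  let variance = mean(list.map(xs, fn(x) { { x -. m } *. { x -. m } }))
  case float.square_root(variance) {
    Ok(s) -> s
    // Unreachable for a non-negative variance, kept for totality.
    Error(_) -> 0.0
  }
}

/// Fraction of elements that are exactly zero (assumed definition).
pub fn sparsity(xs: List(Float)) -> Float {
  case xs {
    [] -> 0.0
    _ -> {
      let zeros = list.count(xs, fn(x) { x == 0.0 })
      int.to_float(zeros) /. int.to_float(list.length(xs))
    }
  }
}
```

Statistics like these are what a function such as analyze_and_compress could plausibly inspect before picking a configuration, although the selection rule itself is not documented here.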
Level in the memory hierarchy
pub type MemoryLevel {
Registers
L1Cache
L2Cache
Hbm
SystemRam
Storage
}
Constructors
- Registers: registers (fastest, ~10 KB)
- L1Cache: L1 cache (~128 KB, 100+ GB/s)
- L2Cache: L2 cache (~6 MB, 50 GB/s)
- Hbm: HBM/DRAM (~24 GB, 8 TB/s on Blackwell)
- SystemRam: system RAM (~32 GB, 50 GB/s)
- Storage: NVMe SSD (~1 TB, 7 GB/s)
Micro-block of 16 values (inspired by Blackwell NVFP4)
pub type MicroBlock {
MicroBlock(values: List(Int), scale: Float, zero_point: Float)
}
Constructors
- MicroBlock(values: List(Int), scale: Float, zero_point: Float)
Arguments
- values: quantized data (4 bits each, packed)
- scale: micro-block scale (simulated FP8 E4M3)
- zero_point: zero point, used to represent negative values
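To make the roles of values, scale and zero_point concrete, here is a minimal sketch of affine 4-bit quantization for one block. It is an illustration under assumptions, not the library's algorithm: `SketchBlock`, `quantize_block` and `dequantize_block` are hypothetical names, codes are unsigned (0..15), the scale is kept as a plain Float rather than rounded to FP8 E4M3, and the zero point is taken to be the block minimum.

```gleam
import gleam/float
import gleam/int
import gleam/list

/// Hypothetical stand-in for MicroBlock so the sketch compiles on its own.
pub type SketchBlock {
  SketchBlock(values: List(Int), scale: Float, zero_point: Float)
}

/// Map floats to 4-bit codes with x ≈ code * scale + zero_point.
pub fn quantize_block(data: List(Float)) -> SketchBlock {
  case data {
    [] -> SketchBlock(values: [], scale: 1.0, zero_point: 0.0)
    [first, ..rest] -> {
      let lo = list.fold(rest, first, float.min)
      let hi = list.fold(rest, first, float.max)
      // Avoid a zero scale when every value in the block is equal.
      let scale = case hi == lo {
        True -> 1.0
        False -> { hi -. lo } /. 15.0
      }
      let values =
        list.map(data, fn(x) {
          float.round({ x -. lo } /. scale) |> int.clamp(min: 0, max: 15)
        })
      SketchBlock(values: values, scale: scale, zero_point: lo)
    }
  }
}

/// Reverse the mapping; the result is exact only up to quantization error.
pub fn dequantize_block(block: SketchBlock) -> List(Float) {
  list.map(block.values, fn(q) {
    int.to_float(q) *. block.scale +. block.zero_point
  })
}
```

For example, the block [-3.0, 0.0, 12.0] spans a range of 15, so the scale is 1.0, the codes are [0, 3, 15], and dequantizing recovers the inputs exactly. Inputs that do not fall on the 16-level grid come back with an error of at most half a scale step.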
Streaming data chunk
pub type StreamChunk {
StreamChunk(id: Int, block: MicroBlock, compressed: Bool)
}
Constructors
- StreamChunk(id: Int, block: MicroBlock, compressed: Bool)
State of the streaming compressor
pub type StreamState {
StreamState(
config: CompressionConfig,
processed_chunks: Int,
total_bytes_in: Int,
total_bytes_out: Int,
)
}
Constructors
- StreamState(config: CompressionConfig, processed_chunks: Int, total_bytes_in: Int, total_bytes_out: Int)
Values
pub fn analyze_and_compress(t: tensor.Tensor) -> BlackwellTensor
Analyzes the tensor and picks the best configuration
pub fn benchmark_blackwell_compression() -> Nil
pub fn compress(
t: tensor.Tensor,
config: CompressionConfig,
) -> BlackwellTensor
Compresses a tensor using NVFP4-style quantization
pub fn compression_stats(
original: tensor.Tensor,
compressed: BlackwellTensor,
) -> CompressionStats
Computes compression statistics
pub fn decompress(bt: BlackwellTensor) -> tensor.Tensor
Decompresses a Blackwell tensor back to FP32
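A round trip through the functions documented on this page might look like the sketch below. It relies only on the signatures listed here. The import path `viva_tensor/tensor` is an assumption inferred from the `tensor.Tensor` type name, and `round_trip` is a hypothetical helper.

```gleam
import viva_tensor/blackwell
// Assumed module path for the Tensor type; adjust to the actual package layout.
import viva_tensor/tensor

/// Compress with the NVFP4-style defaults, collect statistics,
/// then restore the tensor to FP32.
pub fn round_trip(
  t: tensor.Tensor,
) -> #(tensor.Tensor, blackwell.CompressionStats) {
  let compressed = blackwell.compress(t, blackwell.nvfp4_config())
  let stats = blackwell.compression_stats(t, compressed)
  #(blackwell.decompress(compressed), stats)
}
```

Because 4-bit quantization is lossy, the restored tensor should be expected to differ from the input. The mean_error and max_error fields of the returned CompressionStats describe by how much.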
pub fn memory_bandwidth_gbps(level: MemoryLevel) -> Float
Returns the simulated bandwidth of a memory level, in GB/s
pub fn new_stream(config: CompressionConfig) -> StreamState
Creates a new streaming state
pub fn nvfp4_config() -> CompressionConfig
Default NVFP4 configuration (Blackwell style)
pub fn process_chunk(
state: StreamState,
data: List(Float),
) -> #(StreamState, MicroBlock)
Processes one chunk of streaming data
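Since process_chunk returns the updated state together with the produced block, a whole stream can be compressed by threading the state through a fold. The sketch below uses only the signatures documented on this page. `stream_all` is a hypothetical helper, and feeding 16 values per call is an assumption based on the NVFP4 micro-block size, not a documented requirement.

```gleam
import gleam/list
import viva_tensor/blackwell

/// Feed a flat list of floats through the streaming compressor,
/// 16 values at a time, collecting the micro-blocks in order.
pub fn stream_all(
  data: List(Float),
) -> #(blackwell.StreamState, List(blackwell.MicroBlock)) {
  let start = blackwell.new_stream(blackwell.nvfp4_config())
  let #(final_state, reversed_blocks) =
    data
    |> list.sized_chunk(into: 16)
    |> list.fold(#(start, []), fn(acc, chunk) {
      let #(state, blocks) = acc
      let #(next_state, block) = blackwell.process_chunk(state, chunk)
      // Prepend for O(1) accumulation; order is restored below.
      #(next_state, [block, ..blocks])
    })
  #(final_state, list.reverse(reversed_blocks))
}
```

The final StreamState carries the running totals (processed_chunks, total_bytes_in, total_bytes_out), so the overall compression ratio of the stream can be read from it afterwards.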
pub fn transfer_time_us(
size_mb: Float,
level: MemoryLevel,
) -> Float
Computes the time, in microseconds, to transfer size_mb megabytes at the given memory level
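The unit conversion behind a transfer-time estimate is simple to sketch. The example below is a self-contained illustration, not the library's code: `Level`, `bandwidth_gbps` and `transfer_us` are hypothetical, the bandwidth figures are the ones quoted in the MemoryLevel descriptions above, and decimal units (1 GB = 1000 MB) are assumed.

```gleam
/// Hypothetical subset of MemoryLevel, so the sketch compiles on its own.
pub type Level {
  Hbm
  SystemRam
  Storage
}

/// Bandwidth in GB/s, using the figures from the MemoryLevel descriptions.
pub fn bandwidth_gbps(level: Level) -> Float {
  case level {
    Hbm -> 8000.0
    SystemRam -> 50.0
    Storage -> 7.0
  }
}

/// Transfer time in microseconds for size_mb megabytes.
pub fn transfer_us(size_mb: Float, level: Level) -> Float {
  // MB divided by GB/s gives milliseconds; multiply by 1000 for microseconds.
  size_mb /. bandwidth_gbps(level) *. 1000.0
}
```

With these numbers, moving 100 MB takes about 2000 µs from system RAM but only 12.5 µs from HBM, which is the gap that motivates keeping tensors compressed in the slower levels.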