viva_tensor/blackwell

Blackwell-Inspired Compression Engine

INSPIRED BY THE NVIDIA BLACKWELL ULTRA ARCHITECTURE:

THE GLEAM ADVANTAGE:

SILICON PHYSICS (real limits):

GOAL: Make Pure Gleam compete with dedicated hardware!

Types

Blackwell-style compressed tensor

pub type BlackwellTensor {
  BlackwellTensor(
    blocks: List(MicroBlock),
    global_scale: Float,
    shape: List(Int),
    num_elements: Int,
    memory_bytes: Int,
    compression_ratio: Float,
  )
}

Constructors

  • BlackwellTensor(
      blocks: List(MicroBlock),
      global_scale: Float,
      shape: List(Int),
      num_elements: Int,
      memory_bytes: Int,
      compression_ratio: Float,
    )

    Arguments

    blocks

    Micro-blocks of 16 values each

    global_scale

    Global scale of the tensor (FP32)

    shape

    Original shape

    num_elements

    Number of elements

    memory_bytes

    Memory in bytes (actual)

    compression_ratio

    Compression ratio achieved
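
A rough sketch of how memory_bytes and compression_ratio could relate to the block layout. It assumes packed values plus a fixed number of scale bytes per micro-block. The function names and the scale_bytes parameter are hypothetical, and the library's own accounting may differ.

```gleam
import gleam/int

/// Estimated compressed size in bytes for `num_elements` values.
/// Assumption: packed values plus `scale_bytes` of metadata per block.
pub fn estimate_bytes(
  num_elements: Int,
  block_size: Int,
  bits_per_value: Int,
  scale_bytes: Int,
) -> Int {
  // Ceiling divisions: partial blocks and partial bytes still take space.
  let num_blocks = { num_elements + block_size - 1 } / block_size
  let value_bytes = { num_elements * bits_per_value + 7 } / 8
  value_bytes + num_blocks * scale_bytes
}

/// Ratio of the FP32 size (4 bytes per element) to the compressed size.
pub fn estimate_ratio(num_elements: Int, compressed_bytes: Int) -> Float {
  int.to_float(num_elements * 4) /. int.to_float(compressed_bytes)
}
```

Under these assumptions, 1024 elements at 4 bits with 16-value blocks and one scale byte per block take 512 + 64 = 576 bytes, a ratio of roughly 7.1 against FP32.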

Compression configuration

pub type CompressionConfig {
  CompressionConfig(
    block_size: Int,
    bits_per_value: Int,
    symmetric: Bool,
    max_error_pct: Float,
  )
}

Constructors

  • CompressionConfig(
      block_size: Int,
      bits_per_value: Int,
      symmetric: Bool,
      max_error_pct: Float,
    )

    Arguments

    block_size

    Micro-block size (default: 16 for NVFP4)

    bits_per_value

    Bits per value (4 for NVFP4, 8 for INT8)

    symmetric

    Use symmetric quantization

    max_error_pct

    Maximum error tolerance (percent)
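
As an illustration, a configuration can be built directly with the documented constructor. The function name custom_config and the field values below are illustrative only, and they may not match what nvfp4_config() returns.

```gleam
import viva_tensor/blackwell.{type CompressionConfig, CompressionConfig}

/// A hand-built 4-bit configuration with illustrative values.
pub fn custom_config() -> CompressionConfig {
  CompressionConfig(
    block_size: 16,
    bits_per_value: 4,
    symmetric: True,
    // Assumed tolerance; the library's default is not documented here.
    max_error_pct: 5.0,
  )
}
```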

Compression statistics

pub type CompressionStats {
  CompressionStats(
    original_bytes: Int,
    compressed_bytes: Int,
    compression_ratio: Float,
    mean_error: Float,
    max_error: Float,
    blocks_processed: Int,
  )
}

Constructors

  • CompressionStats(
      original_bytes: Int,
      compressed_bytes: Int,
      compression_ratio: Float,
      mean_error: Float,
      max_error: Float,
      blocks_processed: Int,
    )

Distribution statistics

pub type DistributionStats {
  DistributionStats(
    mean: Float,
    std: Float,
    min_val: Float,
    max_val: Float,
    dynamic_range: Float,
    sparsity: Float,
  )
}

Constructors

  • DistributionStats(
      mean: Float,
      std: Float,
      min_val: Float,
      max_val: Float,
      dynamic_range: Float,
      sparsity: Float,
    )
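
The fields above are not defined further on this page. The following sketch shows one plausible way to compute mean, std and sparsity from a list of floats. It assumes a population standard deviation and treats sparsity as the fraction of exactly-zero entries, and the library may define them differently.

```gleam
import gleam/float
import gleam/int
import gleam/list

/// Mean of a list; 0.0 for an empty list, because Gleam's `/.`
/// yields 0.0 on division by zero.
pub fn mean(xs: List(Float)) -> Float {
  float.sum(xs) /. int.to_float(list.length(xs))
}

/// Population standard deviation (an assumption, not confirmed by the docs).
pub fn std(xs: List(Float)) -> Float {
  let m = mean(xs)
  let variance =
    xs
    |> list.map(fn(x) { { x -. m } *. { x -. m } })
    |> mean
  case float.square_root(variance) {
    Ok(s) -> s
    Error(Nil) -> 0.0
  }
}

/// Fraction of exactly-zero entries.
pub fn sparsity(xs: List(Float)) -> Float {
  let zeros = xs |> list.filter(fn(x) { x == 0.0 }) |> list.length
  int.to_float(zeros) /. int.to_float(list.length(xs))
}
```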

Level in the memory hierarchy

pub type MemoryLevel {
  Registers
  L1Cache
  L2Cache
  Hbm
  SystemRam
  Storage
}

Constructors

  • Registers

    Registers (fastest, ~10KB)

  • L1Cache

    L1 Cache (~128KB, 100+ GB/s)

  • L2Cache

    L2 Cache (~6MB, 50 GB/s)

  • Hbm

    HBM/DRAM (~24GB, 8 TB/s for Blackwell)

  • SystemRam

    System RAM (~32GB, 50 GB/s)

  • Storage

    NVMe SSD (~1TB, 7 GB/s)

Micro-block of 16 values (inspired by Blackwell NVFP4)

pub type MicroBlock {
  MicroBlock(values: List(Int), scale: Float, zero_point: Float)
}

Constructors

  • MicroBlock(values: List(Int), scale: Float, zero_point: Float)

    Arguments

    values

    Quantized data (4 bits each, packed)

    scale

    Micro-block scale (simulated FP8 E4M3)

    zero_point

    Zero-point for negative values
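
To make the per-block scale concrete, here is a minimal sketch of symmetric block quantization and its inverse. It uses a uniform signed integer grid clamped to -7..7, not the real NVFP4 E2M1 encoding, and omits bit packing and the zero-point. Block, quantize and dequantize are hypothetical local names, not the library's API.

```gleam
import gleam/float
import gleam/int
import gleam/list

/// Local stand-in for the documented MicroBlock (values left unpacked).
pub type Block {
  Block(values: List(Int), scale: Float)
}

/// Symmetric quantization: the largest magnitude in the block maps to 7.
pub fn quantize(xs: List(Float)) -> Block {
  let max_abs =
    list.fold(xs, 0.0, fn(acc, x) { float.max(acc, float.absolute_value(x)) })
  let scale = max_abs /. 7.0
  let values =
    list.map(xs, fn(x) {
      // `/.` yields 0.0 when scale is 0.0, so an all-zero block is safe.
      float.round(x /. scale) |> int.clamp(min: -7, max: 7)
    })
  Block(values: values, scale: scale)
}

/// Reconstruct approximate floats: each value is q * scale.
pub fn dequantize(block: Block) -> List(Float) {
  list.map(block.values, fn(q) { int.to_float(q) *. block.scale })
}
```

With one scale per 16 values, a single outlier only degrades the precision of its own block rather than the whole tensor, which is the usual motivation for micro-block scaling.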

Streaming data chunk

pub type StreamChunk {
  StreamChunk(id: Int, block: MicroBlock, compressed: Bool)
}

Constructors

  • StreamChunk(id: Int, block: MicroBlock, compressed: Bool)

Streaming compressor state

pub type StreamState {
  StreamState(
    config: CompressionConfig,
    processed_chunks: Int,
    total_bytes_in: Int,
    total_bytes_out: Int,
  )
}

Constructors

  • StreamState(
      config: CompressionConfig,
      processed_chunks: Int,
      total_bytes_in: Int,
      total_bytes_out: Int,
    )

Values

pub fn analyze_and_compress(t: tensor.Tensor) -> BlackwellTensor

Analyzes the tensor and chooses the best configuration

pub fn benchmark_blackwell_compression() -> Nil

pub fn compress(
  t: tensor.Tensor,
  config: CompressionConfig,
) -> BlackwellTensor

Compresses a tensor using NVFP4-style quantization

pub fn compression_stats(
  original: tensor.Tensor,
  compressed: BlackwellTensor,
) -> CompressionStats

Computes compression statistics

pub fn decompress(bt: BlackwellTensor) -> tensor.Tensor

Decompresses a Blackwell tensor back to FP32
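
A possible round trip using only the signatures listed on this page. The import path viva_tensor/tensor is an assumption based on the tensor.Tensor qualifier, round_trip is a hypothetical helper, and obtaining the input tensor is left to the caller.

```gleam
import gleam/float
import gleam/io
import viva_tensor/blackwell
import viva_tensor/tensor

/// Compress with the NVFP4-style config, print two of the reported
/// statistics, and reconstruct an FP32 tensor.
pub fn round_trip(t: tensor.Tensor) -> tensor.Tensor {
  let compressed = blackwell.compress(t, blackwell.nvfp4_config())
  let stats = blackwell.compression_stats(t, compressed)
  io.println("ratio: " <> float.to_string(stats.compression_ratio))
  io.println("max error: " <> float.to_string(stats.max_error))
  blackwell.decompress(compressed)
}
```

Since quantization is lossy, the returned tensor should be expected to approximate the input rather than equal it.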

pub fn int8_config() -> CompressionConfig

INT8 configuration (more precise)

pub fn main() -> Nil

pub fn memory_bandwidth_gbps(level: MemoryLevel) -> Float

Simulates bandwidth in GB/s

pub fn memory_latency_ns(level: MemoryLevel) -> Int

Simulates access latency in nanoseconds

pub fn new_stream(config: CompressionConfig) -> StreamState

Creates a new streaming state

pub fn nvfp4_config() -> CompressionConfig

Default NVFP4 configuration (Blackwell style)

pub fn process_chunk(
  state: StreamState,
  data: List(Float),
) -> #(StreamState, MicroBlock)

Processes one chunk of streaming data
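
One way the streaming functions might be combined, threading the state through each call. This sketch assumes each call to process_chunk should receive one block's worth of values (16 here), which this page does not confirm. compress_stream is a hypothetical helper.

```gleam
import gleam/list
import viva_tensor/blackwell.{type MicroBlock, type StreamState}

/// Feed `data` through the streaming compressor 16 values at a time.
/// Returns the final state and the blocks in input order.
pub fn compress_stream(
  data: List(Float),
) -> #(StreamState, List(MicroBlock)) {
  let state = blackwell.new_stream(blackwell.nvfp4_config())
  data
  |> list.sized_chunk(into: 16)
  // process_chunk already has the fn(acc, item) -> #(acc, out) shape
  // that list.map_fold expects.
  |> list.map_fold(from: state, with: blackwell.process_chunk)
}
```

The final StreamState carries total_bytes_in and total_bytes_out, so the overall ratio can be read off after the last chunk.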

pub fn transfer_time_us(
  size_mb: Float,
  level: MemoryLevel,
) -> Float

Computes transfer time in microseconds
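
The underlying arithmetic is size divided by bandwidth. A minimal sketch follows, assuming decimal units (1 GB = 1000 MB) and ignoring access latency. transfer_us is a hypothetical helper that takes the bandwidth directly instead of a MemoryLevel, and the library's formula may differ.

```gleam
/// Time in microseconds to move `size_mb` megabytes at `bandwidth_gbps`.
/// Assumes 1 GB = 1000 MB and no latency term.
pub fn transfer_us(size_mb: Float, bandwidth_gbps: Float) -> Float {
  // MB / (GB/s) gives milliseconds; multiply by 1000 for microseconds.
  size_mb *. 1000.0 /. bandwidth_gbps
}
```

For instance, 100 MB at the 7 GB/s listed for NVMe works out to about 14,286 microseconds, or roughly 14 ms.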
