viva_tensor/optim/blackwell

Blackwell-Inspired Compression Engine

INSPIRED BY THE NVIDIA BLACKWELL ULTRA ARCHITECTURE:

GLEAM DIFFERENTIATOR:

SILICON PHYSICS (real limits):

GOAL: Make Pure Gleam compete with dedicated hardware!

Types

Blackwell-style compressed tensor

pub type BlackwellTensor {
  BlackwellTensor(
    blocks: List(MicroBlock),
    global_scale: Float,
    shape: List(Int),
    num_elements: Int,
    memory_bytes: Int,
    compression_ratio: Float,
  )
}

Constructors

  • BlackwellTensor(
      blocks: List(MicroBlock),
      global_scale: Float,
      shape: List(Int),
      num_elements: Int,
      memory_bytes: Int,
      compression_ratio: Float,
    )

    Arguments

    blocks

    Micro-blocks of 16 values each

    global_scale

    Global tensor scale (FP32)

    shape

    Original shape

    num_elements

    Number of elements

    memory_bytes

    Actual memory footprint in bytes

    compression_ratio

    Achieved compression ratio

Compression configuration

pub type CompressionConfig {
  CompressionConfig(
    block_size: Int,
    bits_per_value: Int,
    symmetric: Bool,
    max_error_pct: Float,
  )
}

Constructors

  • CompressionConfig(
      block_size: Int,
      bits_per_value: Int,
      symmetric: Bool,
      max_error_pct: Float,
    )

    Arguments

    block_size

    Micro-block size (default: 16 for NVFP4)

    bits_per_value

    Bits per value (4 for NVFP4, 8 for INT8)

    symmetric

    Whether to use symmetric quantization

    max_error_pct

    Maximum error tolerance, in percent
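
As a rough illustration of how `block_size` and `bits_per_value` bound the achievable compression ratio, the sketch below estimates the compressed size of an FP32 tensor. It assumes each micro-block stores its values plus a one-byte scale and ignores the single FP32 global scale. `estimated_bytes` and `estimated_ratio` are hypothetical helpers, not part of this module, and the module's own accounting for `memory_bytes` may differ.

```gleam
import gleam/int

/// Hypothetical helper: estimated compressed size in bytes, assuming
/// `bits_per_value` bits per element plus a one-byte scale per micro-block.
pub fn estimated_bytes(
  num_elements: Int,
  block_size: Int,
  bits_per_value: Int,
) -> Int {
  // Number of blocks, rounded up so a partial final block still counts.
  let blocks = { num_elements + block_size - 1 } / block_size
  // Payload bits rounded up to whole bytes.
  let payload_bytes = { num_elements * bits_per_value + 7 } / 8
  payload_bytes + blocks
}

/// Hypothetical helper: ratio of FP32 size (4 bytes per element) to the
/// estimated compressed size.
pub fn estimated_ratio(
  num_elements: Int,
  block_size: Int,
  bits_per_value: Int,
) -> Float {
  let original = int.to_float(num_elements * 4)
  let compressed =
    int.to_float(estimated_bytes(num_elements, block_size, bits_per_value))
  original /. compressed
}
```

Under these assumptions, 1024 elements with 16-value blocks take 576 bytes at 4 bits per value (about 7.1x smaller than FP32) and 1088 bytes at 8 bits per value (about 3.8x).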

Compression statistics

pub type CompressionStats {
  CompressionStats(
    original_bytes: Int,
    compressed_bytes: Int,
    compression_ratio: Float,
    mean_error: Float,
    max_error: Float,
    blocks_processed: Int,
  )
}

Constructors

  • CompressionStats(
      original_bytes: Int,
      compressed_bytes: Int,
      compression_ratio: Float,
      mean_error: Float,
      max_error: Float,
      blocks_processed: Int,
    )
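
The error fields can be pictured with the sketch below, which compares original values against their reconstructed counterparts. It assumes `mean_error` and `max_error` are absolute per-element errors. The module may define them differently (for example as relative errors), and `error_stats` is a hypothetical helper.

```gleam
import gleam/float
import gleam/int
import gleam/list

/// Hypothetical helper: returns #(mean_error, max_error) as absolute
/// per-element differences between two equally long lists.
pub fn error_stats(
  original: List(Float),
  restored: List(Float),
) -> #(Float, Float) {
  // Pair up elements and take the absolute difference of each pair.
  let errors =
    list.zip(original, restored)
    |> list.map(fn(pair) { float.absolute_value(pair.0 -. pair.1) })
  let max_error = list.fold(errors, 0.0, float.max)
  // Gleam's `/.` yields 0.0 on division by zero, so empty input gives 0.0.
  let mean_error = float.sum(errors) /. int.to_float(list.length(errors))
  #(mean_error, max_error)
}
```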

Distribution statistics

pub type DistributionStats {
  DistributionStats(
    mean: Float,
    std: Float,
    min_val: Float,
    max_val: Float,
    dynamic_range: Float,
    sparsity: Float,
  )
}

Constructors

  • DistributionStats(
      mean: Float,
      std: Float,
      min_val: Float,
      max_val: Float,
      dynamic_range: Float,
      sparsity: Float,
    )
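
A minimal sketch of how some of these fields could be computed from a flat list of values follows. The definitions used here (population standard deviation, sparsity as the fraction of exact zeros) are assumptions rather than the module's documented behaviour, and the function names are hypothetical.

```gleam
import gleam/float
import gleam/int
import gleam/list

/// Arithmetic mean. Returns 0.0 for an empty list because Gleam's `/.`
/// yields 0.0 on division by zero.
pub fn mean(data: List(Float)) -> Float {
  float.sum(data) /. int.to_float(list.length(data))
}

/// Population standard deviation (assumed definition).
pub fn std(data: List(Float)) -> Float {
  let m = mean(data)
  let variance = mean(list.map(data, fn(x) { { x -. m } *. { x -. m } }))
  case float.square_root(variance) {
    Ok(s) -> s
    Error(_) -> 0.0
  }
}

/// Fraction of elements that are exactly zero (assumed definition).
pub fn sparsity(data: List(Float)) -> Float {
  let zeros = list.length(list.filter(data, fn(x) { x == 0.0 }))
  int.to_float(zeros) /. int.to_float(list.length(data))
}
```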

Level in the memory hierarchy

pub type MemoryLevel {
  Registers
  L1Cache
  L2Cache
  Hbm
  SystemRam
  Storage
}

Constructors

  • Registers

    Registers (fastest, ~10KB)

  • L1Cache

    L1 Cache (~128KB, 100+ GB/s)

  • L2Cache

    L2 Cache (~6MB, 50 GB/s)

  • Hbm

    HBM/DRAM (~24GB, 8 TB/s for Blackwell)

  • SystemRam

    System RAM (~32GB, 50 GB/s)

  • Storage

    NVMe SSD (~1TB, 7 GB/s)

Micro-block of 16 values (Blackwell NVFP4 inspired)

pub type MicroBlock {
  MicroBlock(values: List(Int), scale: Float, zero_point: Float)
}

Constructors

  • MicroBlock(values: List(Int), scale: Float, zero_point: Float)

    Arguments

    values

    Quantized data (4-bit each, packed)

    scale

    Micro-block scale (simulated FP8 E4M3)

    zero_point

    Zero-point offset used to represent negative values
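
To make the micro-block idea concrete, here is a minimal sketch of symmetric quantization of one block to signed 4-bit integer codes. It is a simplification and not the module's implementation: real NVFP4 uses a floating-point 4-bit format, and the `Block` type, `quantize_block`, and `dequantize_block` are hypothetical stand-ins that only mirror the documented `MicroBlock` shape.

```gleam
import gleam/float
import gleam/int
import gleam/list

/// Local stand-in mirroring the documented MicroBlock fields.
pub type Block {
  Block(values: List(Int), scale: Float, zero_point: Float)
}

/// Quantizes one block of floats to integer codes in [-7, 7].
pub fn quantize_block(data: List(Float)) -> Block {
  // The largest magnitude in the block determines the scale.
  let abs_max =
    list.fold(data, 0.0, fn(acc, x) { float.max(acc, float.absolute_value(x)) })
  // Avoid a zero scale for an all-zero block.
  let scale = case abs_max == 0.0 {
    True -> 1.0
    False -> abs_max /. 7.0
  }
  let values =
    list.map(data, fn(x) {
      int.clamp(float.round(x /. scale), min: -7, max: 7)
    })
  // Symmetric quantization needs no offset, so the zero point stays 0.0.
  Block(values: values, scale: scale, zero_point: 0.0)
}

/// Reconstructs approximate floats from a symmetric quantized block.
pub fn dequantize_block(block: Block) -> List(Float) {
  list.map(block.values, fn(q) { int.to_float(q) *. block.scale })
}
```

Because every block carries its own scale, one outlier only degrades the precision of the values that share its block, not of the whole tensor.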

Streaming data chunk

pub type StreamChunk {
  StreamChunk(id: Int, block: MicroBlock, compressed: Bool)
}

Constructors

  • StreamChunk(id: Int, block: MicroBlock, compressed: Bool)

Streaming compressor state

pub type StreamState {
  StreamState(
    config: CompressionConfig,
    processed_chunks: Int,
    total_bytes_in: Int,
    total_bytes_out: Int,
  )
}

Constructors

  • StreamState(
      config: CompressionConfig,
      processed_chunks: Int,
      total_bytes_in: Int,
      total_bytes_out: Int,
    )

Values

pub fn analyze_and_compress(t: tensor.Tensor) -> BlackwellTensor

Analyzes the tensor, chooses the best configuration for it, and returns the compressed result

pub fn benchmark_blackwell_compression() -> Nil

pub fn compress(
  t: tensor.Tensor,
  config: CompressionConfig,
) -> BlackwellTensor

Compresses a tensor NVFP4-style, using the given configuration

pub fn compression_stats(
  original: tensor.Tensor,
  compressed: BlackwellTensor,
) -> CompressionStats

Computes compression statistics for a compressed tensor relative to its original

pub fn decompress(bt: BlackwellTensor) -> tensor.Tensor

Decompresses a BlackwellTensor back to an FP32 tensor
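
A typical round trip might look like the sketch below. It calls only functions documented on this page. The import path `viva_tensor/tensor` and the helper name `roundtrip` are assumptions.

```gleam
import viva_tensor/optim/blackwell
// Assumption: the `tensor` module referenced in the signatures lives here.
import viva_tensor/tensor

/// Hypothetical helper: compress with the NVFP4 defaults, collect
/// statistics, then decompress back to FP32.
pub fn roundtrip(
  t: tensor.Tensor,
) -> #(tensor.Tensor, blackwell.CompressionStats) {
  let compressed = blackwell.compress(t, blackwell.nvfp4_config())
  let stats = blackwell.compression_stats(t, compressed)
  #(blackwell.decompress(compressed), stats)
}
```

Swapping `nvfp4_config()` for `int8_config()` would trade some of the compression ratio for higher precision.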

pub fn int8_config() -> CompressionConfig

INT8 configuration (higher precision)

pub fn main() -> Nil

pub fn memory_bandwidth_gbps(level: MemoryLevel) -> Float

Returns the simulated bandwidth of a memory level, in GB/s

pub fn memory_latency_ns(level: MemoryLevel) -> Int

Returns the simulated access latency of a memory level, in nanoseconds

pub fn new_stream(config: CompressionConfig) -> StreamState

Creates a new streaming state for the given configuration

pub fn nvfp4_config() -> CompressionConfig

Default NVFP4 configuration (Blackwell style)

pub fn process_chunk(
  state: StreamState,
  data: List(Float),
) -> #(StreamState, MicroBlock)

Processes a data chunk in streaming mode, returning the updated state and the resulting micro-block
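
Since `process_chunk` returns the updated state alongside each micro-block, a stream can be consumed by threading the state through a fold. The sketch below uses only `new_stream`, `nvfp4_config`, and `process_chunk` as documented here. `compress_stream` is a hypothetical helper name.

```gleam
import gleam/list
import viva_tensor/optim/blackwell

/// Hypothetical helper: runs every chunk through the streaming compressor
/// and returns the final state plus the micro-blocks in input order.
pub fn compress_stream(
  chunks: List(List(Float)),
) -> #(blackwell.StreamState, List(blackwell.MicroBlock)) {
  let initial = blackwell.new_stream(blackwell.nvfp4_config())
  let #(final_state, reversed_blocks) =
    list.fold(chunks, #(initial, []), fn(acc, chunk) {
      let #(state, blocks) = acc
      let #(next_state, block) = blackwell.process_chunk(state, chunk)
      // Prepend for O(1) accumulation; the order is restored below.
      #(next_state, [block, ..blocks])
    })
  #(final_state, list.reverse(reversed_blocks))
}
```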

pub fn transfer_time_us(
  size_mb: Float,
  level: MemoryLevel,
) -> Float

Computes the time, in microseconds, to transfer size_mb megabytes at the given memory level
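
The unit conversion behind such an estimate can be sketched as follows, assuming a pure bandwidth model with decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and no latency term. `transfer_us` is a hypothetical stand-in that takes the bandwidth directly. The module's own function may account for more than this.

```gleam
/// Hypothetical stand-in: microseconds needed to move `size_mb` megabytes
/// at `bandwidth_gbps` gigabytes per second.
pub fn transfer_us(size_mb: Float, bandwidth_gbps: Float) -> Float {
  // seconds      = (size_mb * 1e6) / (bandwidth_gbps * 1e9)
  // microseconds = seconds * 1e6 = size_mb / bandwidth_gbps * 1000
  size_mb /. bandwidth_gbps *. 1000.0
}
```

For example, 100 MB at 50 GB/s comes to 2000 microseconds, so compressing a tensor to a quarter of its size cuts that estimate to 500 microseconds at the same bandwidth.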
