viva_tensor/metrics

Advanced Metrics for Quantization

Based on Qwen3-235B analysis of state-of-the-art algorithms:

QWEN3 INSIGHTS:

  1. AWQ: Protecting about 1% of the most salient weights drastically reduces quantization error
  2. NF4: Non-uniform levels placed at the quantiles of a normal distribution outperform uniform levels
  3. GPTQ: Weighting the error by the Hessian improves precision
  4. Flash Attention: Online softmax that subtracts the running maximum avoids overflow

Types

Per-layer metrics (for LLMs)

pub type LayerMetrics {
  LayerMetrics(
    layer_name: String,
    metrics: QuantMetrics,
    sensitivity: Float,
  )
}

Constructors

  • LayerMetrics(
      layer_name: String,
      metrics: QuantMetrics,
      sensitivity: Float,
    )

Complete quantization metrics

pub type QuantMetrics {
  QuantMetrics(
    mse: Float,
    mae: Float,
    rmse: Float,
    cosine_sim: Float,
    snr_db: Float,
    sqnr_db: Float,
    max_error: Float,
    p99_error: Float,
    outlier_pct: Float,
  )
}

Constructors

  • QuantMetrics(
      mse: Float,
      mae: Float,
      rmse: Float,
      cosine_sim: Float,
      snr_db: Float,
      sqnr_db: Float,
      max_error: Float,
      p99_error: Float,
      outlier_pct: Float,
    )

    Arguments

      • mse: Mean Squared Error
      • mae: Mean Absolute Error
      • rmse: Root Mean Squared Error
      • cosine_sim: Cosine Similarity (1.0 = perfect)
      • snr_db: Signal-to-Noise Ratio (dB)
      • sqnr_db: Signal-to-Quantization-Noise Ratio (dB)
      • max_error: Maximum absolute error
      • p99_error: 99th percentile of the error
      • outlier_pct: Percentage of values with error > 1%

Values

pub fn benchmark_metrics() -> Nil

pub fn compute_all(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> QuantMetrics

Computes all metrics at once

pub fn compute_saliency(
  weights: tensor.Tensor,
  activations: List(List(Float)),
) -> List(Float)

Computes weight saliency from activation statistics:

  saliency(w) = Var(activation) * w²
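The formula above can be sketched on plain lists. This is only an illustration, not the library's implementation: `saliency_sketch` and `variance` are hypothetical names, the variance is the population variance, and the sketch assumes that `activations` holds one list of samples per weight, in the same order as the weights.

```gleam
import gleam/float
import gleam/int
import gleam/list

// Population variance of a list of samples.
// Gleam's float division returns 0.0 for an empty list (division by zero).
fn variance(samples: List(Float)) -> Float {
  let n = int.to_float(list.length(samples))
  let mean = float.sum(samples) /. n
  let squared_dev =
    list.map(samples, fn(x) { { x -. mean } *. { x -. mean } })
  float.sum(squared_dev) /. n
}

// saliency(w) = Var(activation) * w²
// Assumption: activations[i] are the samples that multiply weights[i].
pub fn saliency_sketch(
  weights: List(Float),
  activations: List(List(Float)),
) -> List(Float) {
  list.map2(weights, activations, fn(w, samples) {
    variance(samples) *. w *. w
  })
}
```

A small weight can still be salient under this score if its activations vary a lot, which is the motivation for combining the two factors.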

pub fn cosine_similarity(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> Float

Cosine Similarity - measures direction, not magnitude.
1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite direction
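As a rough sketch of the definition (dot product divided by the product of the norms), assuming the tensors have been flattened to `List(Float)`; `cosine_sketch` and `norm` are hypothetical helpers, not part of this module:

```gleam
import gleam/float
import gleam/list
import gleam/result

// Euclidean norm of a vector.
fn norm(v: List(Float)) -> Float {
  list.map(v, fn(x) { x *. x })
  |> float.sum
  |> float.square_root
  |> result.unwrap(0.0)
}

// cos(a, b) = (a · b) / (|a| * |b|)
// If either vector is all zeros, Gleam's float division yields 0.0.
pub fn cosine_sketch(a: List(Float), b: List(Float)) -> Float {
  let dot = float.sum(list.map2(a, b, fn(x, y) { x *. y }))
  dot /. { norm(a) *. norm(b) }
}
```

Because the norms cancel any scaling, `[3.0, 4.0]` and `[6.0, 8.0]` score the same as two identical vectors, so this metric should be read together with a magnitude-sensitive one such as MSE.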

pub fn error_percentile(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
  percentile: Float,
) -> Float

Error percentile (approximated via sorting)
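One common way to get a percentile by sorting is the nearest-rank method, sketched below. The details are assumptions: `percentile_sketch` is a hypothetical name, it takes precomputed absolute errors as a list, and `percentile` is taken on a 0 to 100 scale. The library may interpolate or use a different scale.

```gleam
import gleam/float
import gleam/int
import gleam/list
import gleam/result

// Nearest-rank percentile of a list of errors.
// Assumption: `percentile` is on a 0..100 scale (99.0 for p99).
pub fn percentile_sketch(errors: List(Float), percentile: Float) -> Float {
  let sorted = list.sort(errors, by: float.compare)
  let n = list.length(sorted)
  // rank = ceil(p / 100 * n), converted to a zero-based index and clamped
  let rank = float.ceiling(percentile /. 100.0 *. int.to_float(n))
  let index = int.min(int.max(float.truncate(rank) - 1, 0), n - 1)
  sorted
  |> list.drop(index)
  |> list.first
  // An empty input has no percentile; this sketch returns 0.0 in that case.
  |> result.unwrap(0.0)
}
```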

pub fn find_salient_weights(
  saliency: List(Float),
  top_pct: Float,
) -> List(Int)

Identifies top K% of salient weights
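A minimal sketch of such a selection, under two assumptions that the signature alone does not confirm: the returned `List(Int)` holds weight indices, and `top_pct` is on a 0 to 100 scale. `top_salient_sketch` and `enumerate` are hypothetical names.

```gleam
import gleam/float
import gleam/int
import gleam/list

// Pairs every value with its zero-based position.
fn enumerate(values: List(Float), index: Int) -> List(#(Int, Float)) {
  case values {
    [] -> []
    [first, ..rest] -> [#(index, first), ..enumerate(rest, index + 1)]
  }
}

// Returns the indices of the most salient weights, most salient first.
// Assumption: top_pct is a percentage (1.0 means the top 1%).
pub fn top_salient_sketch(saliency: List(Float), top_pct: Float) -> List(Int) {
  let n = list.length(saliency)
  let keep =
    float.truncate(float.ceiling(top_pct /. 100.0 *. int.to_float(n)))
  enumerate(saliency, 0)
  // Sort by saliency in descending order.
  |> list.sort(fn(a: #(Int, Float), b: #(Int, Float)) {
    float.compare(b.1, a.1)
  })
  |> list.take(keep)
  |> list.map(fn(pair: #(Int, Float)) { pair.0 })
}
```

In an AWQ-style flow, the indices returned here would mark the weights to protect from aggressive quantization.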

pub fn mae(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> Float

MAE - Mean Absolute Error

pub fn main() -> Nil

pub fn max_error(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> Float

Max Error - worst case

pub fn mse(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> Float

MSE - Mean Squared Error

pub fn outlier_percentage(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
  threshold: Float,
) -> Float

Percentage of outliers (error > threshold)
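The idea can be sketched as a count over element-wise errors. The sketch assumes the error is the absolute difference (not a relative error) and that the result is on a 0 to 100 scale; `outlier_pct_sketch` is a hypothetical name operating on flattened lists.

```gleam
import gleam/float
import gleam/int
import gleam/list

// Share of elements whose absolute error exceeds `threshold`, in percent.
pub fn outlier_pct_sketch(
  original: List(Float),
  quantized: List(Float),
  threshold: Float,
) -> Float {
  let errors =
    list.map2(original, quantized, fn(o, q) { float.absolute_value(o -. q) })
  let outliers = list.length(list.filter(errors, fn(e) { e >. threshold }))
  100.0 *. int.to_float(outliers) /. int.to_float(list.length(errors))
}
```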

pub fn rmse(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> Float

RMSE - Root Mean Squared Error
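The element-wise error metrics in this module (mae, max_error, mse, rmse) all derive from the same list of absolute errors. The sketch below shows the definitions on flattened `List(Float)` inputs of equal length; the `*_sketch` names and `abs_errors` are hypothetical and do not reflect how the library handles tensors internally.

```gleam
import gleam/float
import gleam/int
import gleam/list
import gleam/result

// Element-wise |original - quantized|.
// list.map2 stops at the shorter list, so equal lengths are assumed.
fn abs_errors(original: List(Float), quantized: List(Float)) -> List(Float) {
  list.map2(original, quantized, fn(o, q) { float.absolute_value(o -. q) })
}

// Mean of the squared errors.
pub fn mse_sketch(original: List(Float), quantized: List(Float)) -> Float {
  let errors = abs_errors(original, quantized)
  let squared = list.map(errors, fn(e) { e *. e })
  float.sum(squared) /. int.to_float(list.length(errors))
}

// Mean of the absolute errors.
pub fn mae_sketch(original: List(Float), quantized: List(Float)) -> Float {
  let errors = abs_errors(original, quantized)
  float.sum(errors) /. int.to_float(list.length(errors))
}

// Square root of the MSE, in the same units as the data.
pub fn rmse_sketch(original: List(Float), quantized: List(Float)) -> Float {
  mse_sketch(original, quantized)
  |> float.square_root
  |> result.unwrap(0.0)
}

// Largest absolute error (worst case).
pub fn max_error_sketch(
  original: List(Float),
  quantized: List(Float),
) -> Float {
  list.fold(abs_errors(original, quantized), 0.0, float.max)
}
```

MSE and RMSE penalize large deviations more heavily than MAE, while the maximum error reports only the single worst element.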

pub fn snr_db(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> Float

SNR - Signal-to-Noise Ratio in dB

  SNR = 10 * log10(signal_power / noise_power)
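A sketch of this formula on flattened lists, assuming signal power is the sum of squared original values and noise power is the sum of squared errors (the 1/n factors cancel in the ratio). `snr_db_sketch` is a hypothetical name; it returns a `Result` because the ratio is undefined when either power is zero, which may differ from what the library does. The `log10` binding assumes the Erlang target.

```gleam
import gleam/float
import gleam/list

// Base-10 logarithm from Erlang's math module (Erlang target only).
@external(erlang, "math", "log10")
fn log10(x: Float) -> Float

// SNR = 10 * log10(signal_power / noise_power)
pub fn snr_db_sketch(
  original: List(Float),
  quantized: List(Float),
) -> Result(Float, Nil) {
  let signal_power = float.sum(list.map(original, fn(o) { o *. o }))
  let noise_power =
    float.sum(
      list.map2(original, quantized, fn(o, q) { { o -. q } *. { o -. q } }),
    )
  // log10 is only defined for positive arguments, so guard both powers.
  case signal_power >. 0.0 && noise_power >. 0.0 {
    True -> Ok(10.0 *. log10(signal_power /. noise_power))
    False -> Error(Nil)
  }
}
```

For example, a signal power 100 times the noise power gives about 20 dB, and every further factor of 10 adds another 10 dB.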

pub fn theoretical_sqnr(bits: Int) -> Float

SQNR - Signal-to-Quantization-Noise Ratio
Theoretical value for an ideal N-bit uniform quantizer with a full-scale sine-wave input:

  SQNR = 6.02 * N + 1.76 dB
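The formula translates directly into code. The function name below is hypothetical and the body is a sketch of the stated formula, not a copy of the library source:

```gleam
import gleam/int

// SQNR = 6.02 * N + 1.76 dB for an N-bit quantizer
pub fn theoretical_sqnr_sketch(bits: Int) -> Float {
  6.02 *. int.to_float(bits) +. 1.76
}
```

Each extra bit adds about 6 dB: 4 bits gives roughly 25.84 dB and 8 bits roughly 49.92 dB. A measured SNR well below this bound for a given bit width suggests that the quantizer is wasting range, for example because of outliers.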
