viva_tensor/metrics

Advanced Metrics for Quantization

Based on Qwen3-235B analysis of state-of-the-art algorithms:

MSE (Mean Squared Error)
MAE (Mean Absolute Error)
Cosine Similarity
SNR (Signal-to-Noise Ratio)
SQNR (Signal-to-Quantization-Noise Ratio)
Perplexity Delta (for LLMs)

QWEN3 INSIGHTS:

AWQ: Protecting 1% of salient weights drastically reduces error
NF4: Non-uniform quantiles (fitted to a normal distribution) outperform uniform levels
GPTQ: Weighting error by Hessian improves precision
Flash Attention: Online softmax with shifting avoids overflow

Types

LayerMetrics

Per-layer metrics (for LLMs)

```
pub type LayerMetrics {
  LayerMetrics(
    layer_name: String,
    metrics: QuantMetrics,
    sensitivity: Float,
  )
}
```

Constructors

```
LayerMetrics(
  layer_name: String,
  metrics: QuantMetrics,
  sensitivity: Float,
)
```

QuantMetrics

Complete quantization metrics

```
pub type QuantMetrics {
  QuantMetrics(
    mse: Float,
    mae: Float,
    rmse: Float,
    cosine_sim: Float,
    snr_db: Float,
    sqnr_db: Float,
    max_error: Float,
    p99_error: Float,
    outlier_pct: Float,
  )
}
```

Constructors

```
QuantMetrics(
  mse: Float,
  mae: Float,
  rmse: Float,
  cosine_sim: Float,
  snr_db: Float,
  sqnr_db: Float,
  max_error: Float,
  p99_error: Float,
  outlier_pct: Float,
)
```

Arguments

mse

Mean Squared Error

mae

Mean Absolute Error

rmse

Root Mean Squared Error

cosine_sim

Cosine Similarity (1.0 = perfect)

snr_db

Signal-to-Noise Ratio (dB)

sqnr_db

Signal-to-Quantization-Noise Ratio (dB)

max_error

Max absolute error

p99_error

99th percentile of error

outlier_pct

Percentage of values with error > 1%

Values

benchmark_metrics

```
pub fn benchmark_metrics() -> Nil
```

compute_all

```
pub fn compute_all(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> QuantMetrics
```

Computes all metrics at once

compute_saliency

```
pub fn compute_saliency(
  weights: tensor.Tensor,
  activations: List(List(Float)),
) -> List(Float)
```

Computes weight saliency from activations: Saliency(w) = Var(activation) * w²
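The formula above can be sketched over plain lists. This is a minimal illustration, not the library's implementation: `saliency_sketch` and `variance` are hypothetical names, the use of population variance is an assumption, and so is the layout of `activations` (here, one inner list of samples per weight).

```gleam
import gleam/float
import gleam/int
import gleam/list

/// Population variance of a list of samples; 0.0 for an empty list.
fn variance(samples: List(Float)) -> Float {
  case list.length(samples) {
    0 -> 0.0
    n -> {
      let count = int.to_float(n)
      let mean = float.sum(samples) /. count
      let squared_deviations =
        list.map(samples, fn(x) {
          let d = x -. mean
          d *. d
        })
      float.sum(squared_deviations) /. count
    }
  }
}

/// Hypothetical helper: Var(activation) * w² for each weight, assuming
/// `activations` holds one list of samples per weight.
pub fn saliency_sketch(
  weights: List(Float),
  activations: List(List(Float)),
) -> List(Float) {
  list.map2(weights, activations, fn(w, samples) {
    variance(samples) *. w *. w
  })
}
```

A weight whose activations never vary gets a saliency of zero under this definition, however large the weight is.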

cosine_similarity

```
pub fn cosine_similarity(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> Float
```

Cosine similarity: measures direction, not magnitude. 1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite
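To make the three reference values concrete, here is a minimal sketch over plain lists of equal length. `cosine_similarity_sketch` and `dot` are hypothetical names, and returning 0.0 for a zero-length vector is an assumption rather than documented library behaviour.

```gleam
import gleam/float
import gleam/list

/// Dot product of two lists (extra elements of the longer list are ignored).
fn dot(a: List(Float), b: List(Float)) -> Float {
  list.map2(a, b, fn(x, y) { x *. y }) |> float.sum
}

/// Hypothetical helper: dot(a, b) / (|a| * |b|), or 0.0 if either norm is zero.
pub fn cosine_similarity_sketch(a: List(Float), b: List(Float)) -> Float {
  // |a| * |b| computed as sqrt(dot(a, a) * dot(b, b))
  let norm_product = dot(a, a) *. dot(b, b)
  case float.square_root(norm_product) {
    Ok(norm) if norm >. 0.0 -> dot(a, b) /. norm
    _ -> 0.0
  }
}
```

Scaling either input leaves the result unchanged: `[1.0, 2.0]` and `[2.0, 4.0]` score 1.0 even though every element differs, which is why this metric is usually read alongside MSE or max error.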

error_percentile

```
pub fn error_percentile(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
  percentile: Float,
) -> Float
```

Error percentile (approximated via sorting)
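One common way to approximate a percentile by sorting is the nearest-rank method, sketched below over plain lists. `error_percentile_sketch` is a hypothetical name; the 0 to 100 scale for `percentile` and the use of absolute error are assumptions, and the library may interpolate or index differently.

```gleam
import gleam/float
import gleam/int
import gleam/list

/// Hypothetical helper: nearest-rank percentile of the absolute errors.
/// Assumes `percentile` is on a 0.0 to 100.0 scale.
pub fn error_percentile_sketch(
  original: List(Float),
  quantized: List(Float),
  percentile: Float,
) -> Float {
  let sorted_errors =
    list.map2(original, quantized, fn(o, q) { float.absolute_value(o -. q) })
    |> list.sort(by: float.compare)
  let n = list.length(sorted_errors)
  // Nearest rank: smallest 1-based rank r with r / n >= percentile / 100.
  let rank = float.round(float.ceiling(percentile /. 100.0 *. int.to_float(n)))
  // Convert to a 0-based index and keep it inside the list bounds.
  let index = int.max(0, int.min(rank - 1, n - 1))
  case list.drop(sorted_errors, index) {
    [value, ..] -> value
    [] -> 0.0
  }
}
```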

find_salient_weights

```
pub fn find_salient_weights(
  saliency: List(Float),
  top_pct: Float,
) -> List(Int)
```

Identifies the top K% most salient weights, where K is given by top_pct
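A top-K% selection like the one described above can be sketched by pairing each score with its position, sorting in descending order, and keeping the first K% of entries. `top_salient_indices` is a hypothetical name; treating `top_pct` as a 0 to 100 percentage, rounding the count up, and returning list indices are all assumptions about the library's behaviour.

```gleam
import gleam/float
import gleam/int
import gleam/list

/// Hypothetical helper: indices of the highest-saliency entries.
/// Assumes `top_pct` is a percentage on a 0.0 to 100.0 scale.
pub fn top_salient_indices(saliency: List(Float), top_pct: Float) -> List(Int) {
  let n = list.length(saliency)
  // Number of entries to keep, rounded up.
  let keep = float.round(float.ceiling(top_pct /. 100.0 *. int.to_float(n)))
  saliency
  |> list.index_map(fn(score, index) { #(index, score) })
  // Sort by score, highest first.
  |> list.sort(by: fn(a, b) { float.compare(b.1, a.1) })
  |> list.take(keep)
  |> list.map(fn(pair) { pair.0 })
}
```

The returned indices are ordered by decreasing saliency, not by position.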

mae

```
pub fn mae(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> Float
```

MAE - Mean Absolute Error

main

```
pub fn main() -> Nil
```

max_error

```
pub fn max_error(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> Float
```

Max Error - the worst-case (largest) absolute error

mse

```
pub fn mse(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> Float
```

MSE - Mean Squared Error

outlier_percentage

```
pub fn outlier_percentage(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
  threshold: Float,
) -> Float
```

Percentage of outliers (error > threshold)
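As a rough illustration, the outlier percentage can be sketched over plain lists as the share of elements whose error exceeds the threshold. `outlier_percentage_sketch` is a hypothetical name, and comparing the absolute error (not a relative one) against `threshold` is an assumption.

```gleam
import gleam/float
import gleam/int
import gleam/list

/// Hypothetical helper: percentage (0.0 to 100.0) of elements whose
/// absolute error is strictly greater than `threshold`.
pub fn outlier_percentage_sketch(
  original: List(Float),
  quantized: List(Float),
  threshold: Float,
) -> Float {
  let errors =
    list.map2(original, quantized, fn(o, q) { float.absolute_value(o -. q) })
  let outliers = list.filter(errors, fn(e) { e >. threshold })
  case list.length(errors) {
    0 -> 0.0
    n -> 100.0 *. int.to_float(list.length(outliers)) /. int.to_float(n)
  }
}
```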

rmse

```
pub fn rmse(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> Float
```

RMSE - Root Mean Squared Error
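The element-wise error metrics in this module (`mae`, `max_error`, `mse`, `rmse`) all derive from the same list of absolute errors. The sketch below shows that relationship over plain lists; every function name in it is hypothetical, and returning 0.0 for empty input is an assumption.

```gleam
import gleam/float
import gleam/int
import gleam/list

/// Arithmetic mean; 0.0 for an empty list.
fn mean(values: List(Float)) -> Float {
  case list.length(values) {
    0 -> 0.0
    n -> float.sum(values) /. int.to_float(n)
  }
}

/// |original_i - quantized_i| for each pair of elements.
fn abs_errors(original: List(Float), quantized: List(Float)) -> List(Float) {
  list.map2(original, quantized, fn(o, q) { float.absolute_value(o -. q) })
}

pub fn mae_sketch(original: List(Float), quantized: List(Float)) -> Float {
  mean(abs_errors(original, quantized))
}

pub fn mse_sketch(original: List(Float), quantized: List(Float)) -> Float {
  mean(list.map(abs_errors(original, quantized), fn(e) { e *. e }))
}

pub fn rmse_sketch(original: List(Float), quantized: List(Float)) -> Float {
  // MSE is never negative, so the Error branch is only a safeguard.
  case float.square_root(mse_sketch(original, quantized)) {
    Ok(root) -> root
    Error(_) -> 0.0
  }
}

pub fn max_error_sketch(original: List(Float), quantized: List(Float)) -> Float {
  list.fold(abs_errors(original, quantized), 0.0, float.max)
}
```

Because MSE squares each error, a single large deviation moves MSE and RMSE far more than it moves MAE, and `max_error` reports that deviation directly.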

snr_db

```
pub fn snr_db(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> Float
```

SNR - Signal-to-Noise Ratio in dB. SNR = 10 * log10(signal_power / noise_power)
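Written out, and assuming the conventional definitions of the two powers (the mean square of the original values $x_i$ for the signal, and the mean square of the error against the quantized values $\hat{x}_i$ for the noise; the library's exact definitions are not stated here), the formula reads:

```latex
\mathrm{SNR}_{\mathrm{dB}}
  = 10 \log_{10}\!\left(\frac{P_{\text{signal}}}{P_{\text{noise}}}\right),
\qquad
P_{\text{signal}} = \frac{1}{n}\sum_{i=1}^{n} x_i^{2},
\qquad
P_{\text{noise}} = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \hat{x}_i\right)^{2}
```

Under these definitions the noise power equals the MSE, and each factor of 10 in the power ratio adds 10 dB: a noise power 100 times smaller than the signal power gives 20 dB.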

theoretical_sqnr

```
pub fn theoretical_sqnr(bits: Int) -> Float
```

SQNR - Signal-to-Quantization-Noise Ratio. Theoretical value for N bits: SQNR = 6.02 * N + 1.76 dB
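The formula above is the standard result for an ideal uniform quantizer driven by a full-scale sine wave, so each extra bit is worth about 6.02 dB. A one-line sketch, with `theoretical_sqnr_sketch` as a hypothetical name:

```gleam
import gleam/int

/// Hypothetical helper: 6.02 * N + 1.76 dB for an N-bit uniform quantizer.
pub fn theoretical_sqnr_sketch(bits: Int) -> Float {
  6.02 *. int.to_float(bits) +. 1.76
}
```

For example, 4 bits gives roughly 25.84 dB and 8 bits roughly 49.92 dB. These values are a useful upper reference when reading a measured `sqnr_db`.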
