viva_tensor/metrics
Advanced Metrics for Quantization
Based on Qwen3-235B analysis of state-of-the-art algorithms:
- MSE (Mean Squared Error)
- MAE (Mean Absolute Error)
- Cosine Similarity
- SNR (Signal-to-Noise Ratio)
- SQNR (Signal-to-Quantization-Noise Ratio)
- Perplexity Delta (for LLMs)
QWEN3 INSIGHTS:
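- AWQ: Protecting the 1% most salient weights drastically reduces quantization error
- NF4: Quantization levels placed at the quantiles of a normal distribution outperform uniformly spaced levels
- GPTQ: Weighting the quantization error by the Hessian improves precision
- Flash Attention: Online softmax that subtracts the running maximum before exponentiating avoids overflow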
Types
Per-layer metrics (for LLMs)
pub type LayerMetrics {
LayerMetrics(
layer_name: String,
metrics: QuantMetrics,
sensitivity: Float,
)
}
Complete quantization metrics
pub type QuantMetrics {
QuantMetrics(
mse: Float,
mae: Float,
rmse: Float,
cosine_sim: Float,
snr_db: Float,
sqnr_db: Float,
max_error: Float,
p99_error: Float,
outlier_pct: Float,
)
}
Arguments
- mse: Mean Squared Error
- mae: Mean Absolute Error
- rmse: Root Mean Squared Error
- cosine_sim: Cosine Similarity (1.0 = perfect)
- snr_db: Signal-to-Noise Ratio (dB)
- sqnr_db: Signal-to-Quantization-Noise Ratio (dB)
- max_error: Max absolute error
- p99_error: 99th percentile of error
- outlier_pct: Percentage of values with error > 1%
Values
pub fn benchmark_metrics() -> Nil
pub fn compute_all(
original: tensor.Tensor,
quantized: tensor.Tensor,
) -> QuantMetrics
Computes all metrics at once
pub fn compute_saliency(
weights: tensor.Tensor,
activations: List(List(Float)),
) -> List(Float)
Computes weight saliency from activations:
Saliency(w) = Var(activation) * w²
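The formula above can be sketched on plain lists. This is an illustration only, not viva_tensor's implementation: `saliency_sketch` and `variance` are hypothetical names, the weights are a `List(Float)` rather than a `tensor.Tensor`, and the sketch assumes that the i-th inner list of `activations` holds the activation samples seen by the i-th weight.

```gleam
import gleam/float
import gleam/int
import gleam/list

// Population variance of one channel's activation samples.
// An empty sample list is treated as having zero variance.
fn variance(samples: List(Float)) -> Float {
  let n = int.to_float(list.length(samples))
  case n >. 0.0 {
    False -> 0.0
    True -> {
      let mean = float.sum(samples) /. n
      let squared_devs =
        list.map(samples, fn(x) { { x -. mean } *. { x -. mean } })
      float.sum(squared_devs) /. n
    }
  }
}

// Hypothetical helper: saliency_i = Var(activations_i) * w_i^2.
// Assumes activations[i] belongs to weights[i]; extra entries in the
// longer list are ignored by list.map2.
pub fn saliency_sketch(
  weights: List(Float),
  activations: List(List(Float)),
) -> List(Float) {
  list.map2(weights, activations, fn(w, samples) {
    variance(samples) *. w *. w
  })
}
```

For weights `[0.5, -2.0]` and activations `[[1.0, 3.0], [2.0, 2.0]]`, the first weight gets saliency 1.0 * 0.25 = 0.25, while the second gets 0.0 because its activations never vary, even though the weight itself is larger.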
pub fn cosine_similarity(
original: tensor.Tensor,
quantized: tensor.Tensor,
) -> Float
Cosine Similarity - measures direction, not magnitude
1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite direction
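A minimal sketch of the computation on plain lists, to make the "direction, not magnitude" point concrete. The names `dot` and `cosine_sketch` are hypothetical, and returning 0.0 for a zero-length vector is an assumption of this sketch rather than documented library behaviour.

```gleam
import gleam/float
import gleam/list

// Dot product of two equally long vectors.
fn dot(a: List(Float), b: List(Float)) -> Float {
  list.map2(a, b, fn(x, y) { x *. y })
  |> float.sum
}

// cos(a, b) = (a . b) / (|a| * |b|)
pub fn cosine_sketch(a: List(Float), b: List(Float)) -> Float {
  let norm_product = case float.square_root(dot(a, a) *. dot(b, b)) {
    Ok(value) -> value
    Error(_) -> 0.0
  }
  case norm_product >. 0.0 {
    True -> dot(a, b) /. norm_product
    // Assumption: a zero vector has no direction, so report 0.0.
    False -> 0.0
  }
}
```

With this sketch, `[1.0, 0.0]` against `[2.0, 0.0]` gives 1.0 even though the magnitudes differ, and `[1.0, 0.0]` against `[0.0, 1.0]` gives 0.0. A quantizer that uniformly rescales a tensor would therefore score perfectly here while still showing a large MSE.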
pub fn error_percentile(
original: tensor.Tensor,
quantized: tensor.Tensor,
percentile: Float,
) -> Float
Error percentile (approximated via sorting)
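One way a sort-based percentile can work is the nearest-rank method, sketched below on plain lists. This is an assumption about the approach, not the library's code: `error_percentile_sketch` is a hypothetical name, and the sketch assumes `percentile` is given on a 0 to 100 scale, which the signature above does not confirm.

```gleam
import gleam/float
import gleam/int
import gleam/list

pub fn error_percentile_sketch(
  original: List(Float),
  quantized: List(Float),
  percentile: Float,
) -> Float {
  // Absolute error per element, sorted ascending.
  let sorted_errors =
    list.map2(original, quantized, fn(o, q) { float.absolute_value(o -. q) })
    |> list.sort(by: float.compare)
  let n = list.length(sorted_errors)
  // Nearest-rank: 1-based rank = ceil(p / 100 * n), clamped to 1..n.
  let rank =
    float.ceiling(percentile /. 100.0 *. int.to_float(n))
    |> float.truncate
    |> int.clamp(min: 1, max: n)
  case list.drop(sorted_errors, rank - 1) {
    [value, ..] -> value
    // Empty input: the sketch reports 0.0.
    [] -> 0.0
  }
}
```

Nearest-rank always returns an error value that actually occurs in the data, which is why it is only an approximation of an interpolated percentile on small inputs.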
pub fn find_salient_weights(
saliency: List(Float),
top_pct: Float,
) -> List(Int)
Identifies the top K% most salient weights (K = top_pct)
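A minimal sketch of a top-K% selection, under several assumptions that the signature does not confirm: the returned `List(Int)` holds positions into `saliency`, `top_pct` is on a 0 to 100 scale, and the number of kept weights is rounded to the nearest integer. `top_salient_sketch` is a hypothetical name.

```gleam
import gleam/float
import gleam/int
import gleam/list

pub fn top_salient_sketch(saliency: List(Float), top_pct: Float) -> List(Int) {
  // Pair every score with its position. The fold prepends, so the pairs
  // come out in reverse order; the sort below makes that irrelevant.
  let #(indexed, count) =
    list.fold(saliency, #([], 0), fn(acc, score) {
      let #(pairs, index) = acc
      #([#(score, index), ..pairs], index + 1)
    })
  // Assumption: round count * pct / 100 to the nearest integer.
  let keep = float.round(int.to_float(count) *. top_pct /. 100.0)
  indexed
  |> list.sort(by: fn(a: #(Float, Int), b: #(Float, Int)) {
    // Descending by score.
    float.compare(b.0, a.0)
  })
  |> list.take(keep)
  |> list.map(fn(pair: #(Float, Int)) { pair.1 })
}
```

For saliency `[0.1, 0.9, 0.4, 0.7]` and `top_pct` of 50.0 the sketch keeps two entries and returns `[1, 3]`. With this rounding rule a small tensor and a 1% budget keeps nothing, so a real implementation may prefer to keep at least one weight.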
pub fn mae(
original: tensor.Tensor,
quantized: tensor.Tensor,
) -> Float
MAE - Mean Absolute Error
pub fn max_error(
original: tensor.Tensor,
quantized: tensor.Tensor,
) -> Float
Max Error - worst case
pub fn mse(
original: tensor.Tensor,
quantized: tensor.Tensor,
) -> Float
MSE - Mean Squared Error
pub fn outlier_percentage(
original: tensor.Tensor,
quantized: tensor.Tensor,
threshold: Float,
) -> Float
Percentage of outliers (error > threshold)
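The outlier count can be sketched as a filter over per-element errors. The sketch below assumes the comparison uses absolute error and that the result is on a 0 to 100 scale; neither is stated above, and a relative-error variant would divide each error by the magnitude of the original value first. `outlier_percentage_sketch` is a hypothetical name.

```gleam
import gleam/float
import gleam/int
import gleam/list

pub fn outlier_percentage_sketch(
  original: List(Float),
  quantized: List(Float),
  threshold: Float,
) -> Float {
  let errors =
    list.map2(original, quantized, fn(o, q) { float.absolute_value(o -. q) })
  // Assumption: an outlier is any element whose absolute error
  // strictly exceeds the threshold.
  let outliers = list.filter(errors, fn(e) { e >. threshold })
  // Gleam defines x /. 0.0 as 0.0, so empty input yields 0.0.
  100.0
  *. int.to_float(list.length(outliers))
  /. int.to_float(list.length(errors))
}
```

For original `[1.0, 2.0, 3.0, 4.0]`, quantized `[1.0, 2.0, 3.0, 4.5]` and a threshold of 0.1, one element out of four exceeds the threshold, so the sketch returns 25.0.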
pub fn rmse(
original: tensor.Tensor,
quantized: tensor.Tensor,
) -> Float
RMSE - Root Mean Squared Error
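The pointwise error metrics in this module (`mae`, `max_error`, `mse`, `rmse`) all derive from the same per-element error. The sketch below shows the relationship on plain lists; it is not the library's implementation, and every `*_sketch` name is hypothetical.

```gleam
import gleam/float
import gleam/int
import gleam/list

// |original_i - quantized_i| for every element.
fn abs_errors(original: List(Float), quantized: List(Float)) -> List(Float) {
  list.map2(original, quantized, fn(o, q) { float.absolute_value(o -. q) })
}

fn mean(values: List(Float)) -> Float {
  // Gleam defines x /. 0.0 as 0.0, so an empty list yields 0.0.
  float.sum(values) /. int.to_float(list.length(values))
}

// MAE: average error magnitude.
pub fn mae_sketch(original: List(Float), quantized: List(Float)) -> Float {
  mean(abs_errors(original, quantized))
}

// MSE: average squared error, which penalizes large errors more.
pub fn mse_sketch(original: List(Float), quantized: List(Float)) -> Float {
  abs_errors(original, quantized)
  |> list.map(fn(e) { e *. e })
  |> mean
}

// RMSE: square root of MSE, back in the units of the data.
pub fn rmse_sketch(original: List(Float), quantized: List(Float)) -> Float {
  case float.square_root(mse_sketch(original, quantized)) {
    Ok(root) -> root
    Error(_) -> 0.0
  }
}

// Max error: the single worst element.
pub fn max_error_sketch(original: List(Float), quantized: List(Float)) -> Float {
  list.fold(abs_errors(original, quantized), 0.0, float.max)
}
```

For original `[1.0, 2.0, 3.0]` and quantized `[1.0, 2.5, 2.0]` the errors are 0.0, 0.5 and 1.0, giving an MAE of 0.5, an MSE of about 0.417, an RMSE of about 0.645 and a max error of 1.0. RMSE sitting above MAE is the usual sign that a few large errors dominate.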
pub fn snr_db(
original: tensor.Tensor,
quantized: tensor.Tensor,
) -> Float
SNR - Signal-to-Noise Ratio in dB
SNR = 10 * log10(signal_power / noise_power)
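A minimal sketch of the formula above, assuming signal power is the sum of squared original values and noise power is the sum of squared differences (using sums instead of means leaves the ratio unchanged). The name `snr_db_sketch` is hypothetical, the `log10` binding assumes the Erlang target, and returning 0.0 when either power is zero is a choice made for this sketch, not documented behaviour.

```gleam
import gleam/float
import gleam/list

// Assumption: Erlang target. math:log10/1 is part of Erlang/OTP.
@external(erlang, "math", "log10")
fn log10(x: Float) -> Float

pub fn snr_db_sketch(original: List(Float), quantized: List(Float)) -> Float {
  let signal_power =
    list.map(original, fn(o) { o *. o })
    |> float.sum
  let noise_power =
    list.map2(original, quantized, fn(o, q) { { o -. q } *. { o -. q } })
    |> float.sum
  case signal_power >. 0.0 && noise_power >. 0.0 {
    True -> 10.0 *. log10(signal_power /. noise_power)
    // log10 is undefined for zero, so the sketch returns 0.0 here.
    False -> 0.0
  }
}
```

For original `[1.0, 1.0]` and quantized `[0.9, 1.1]` the signal power is 2.0 and the noise power is about 0.02, a ratio of roughly 100, so the result is about 20 dB. Each additional 10 dB means the noise power drops by another factor of ten.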
pub fn theoretical_sqnr(bits: Int) -> Float
SQNR - Signal-to-Quantization-Noise Ratio
Theoretical for N bits: SQNR = 6.02 * N + 1.76 dB
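The rule of thumb above is the standard result for an ideal uniform quantizer driven by a full-scale sine wave, so it works as an upper reference point rather than a prediction for real weight tensors. A direct transcription, with `theoretical_sqnr_sketch` as a hypothetical name:

```gleam
import gleam/int

// SQNR = 6.02 * N + 1.76 dB for an N-bit uniform quantizer.
pub fn theoretical_sqnr_sketch(bits: Int) -> Float {
  6.02 *. int.to_float(bits) +. 1.76
}
```

This gives about 25.84 dB for 4 bits, 49.92 dB for 8 bits and 98.08 dB for 16 bits: each extra bit buys roughly 6 dB. Comparing a measured `snr_db` value against this figure shows how far a quantization scheme is from the ideal uniform case at the same bit width.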