viva_tensor/metrics

Advanced Metrics for Quantization

Based on Qwen3-235B's analysis of state-of-the-art algorithms:

INSIGHTS FROM QWEN3:

  1. AWQ: Protecting the 1% most salient weights drastically reduces error
  2. NF4: Non-uniform quantiles (normal distribution) outperform uniform ones
  3. GPTQ: Weighting the error by the Hessian improves accuracy
  4. Flash Attention: Online softmax with max-shifting avoids overflow
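The fourth insight can be illustrated with a small, language-neutral sketch. It is written in Python over plain lists and is not this module's Gleam code; the name `online_softmax` is hypothetical. It computes the softmax normalizer in a single pass by tracking a running maximum and rescaling the running sum whenever the maximum grows, assuming all scores are finite.

```python
import math


def online_softmax(scores):
    """Softmax whose normalizer is built in one pass with max-shifting."""
    running_max = float("-inf")
    running_sum = 0.0
    for s in scores:
        new_max = max(running_max, s)
        # Rescale the sum accumulated so far to the new maximum,
        # then add the current term shifted by that same maximum.
        running_sum = running_sum * math.exp(running_max - new_max) + math.exp(s - new_max)
        running_max = new_max
    # Every exponent is <= 0 here, so exp() cannot overflow.
    return [math.exp(s - running_max) / running_sum for s in scores]


if __name__ == "__main__":
    # math.exp(1000.0) alone would raise OverflowError; the shifted form does not.
    print(online_softmax([1000.0, 1001.0, 1002.0]))
```

Because each exponent is taken relative to the current maximum, inputs as large as 1000 are handled without overflow.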

Types

Per-layer metrics (for LLMs)

pub type LayerMetrics {
  LayerMetrics(
    layer_name: String,
    metrics: QuantMetrics,
    sensitivity: Float,
  )
}

Constructors

  • LayerMetrics(
      layer_name: String,
      metrics: QuantMetrics,
      sensitivity: Float,
    )

Complete quantization metrics

pub type QuantMetrics {
  QuantMetrics(
    mse: Float,
    mae: Float,
    rmse: Float,
    cosine_sim: Float,
    snr_db: Float,
    sqnr_db: Float,
    max_error: Float,
    p99_error: Float,
    outlier_pct: Float,
  )
}

Constructors

  • QuantMetrics(
      mse: Float,
      mae: Float,
      rmse: Float,
      cosine_sim: Float,
      snr_db: Float,
      sqnr_db: Float,
      max_error: Float,
      p99_error: Float,
      outlier_pct: Float,
    )

    Arguments

    mse

    Mean Squared Error

    mae

    Mean Absolute Error

    rmse

    Root Mean Squared Error

    cosine_sim

    Cosine Similarity (1.0 = perfect)

    snr_db

    Signal-to-Noise Ratio (dB)

    sqnr_db

    Signal-to-Quantization-Noise Ratio (dB)

    max_error

    Max absolute error

    p99_error

    99th percentile of the error

    outlier_pct

    Percentage of values with error > 1%

Values

pub fn benchmark_metrics() -> Nil

pub fn compute_all(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> QuantMetrics

Computes all metrics at once

pub fn compute_saliency(
  weights: tensor.Tensor,
  activations: List(List(Float)),
) -> List(Float)

Computes weight saliency from activations: Salience(w) = Var(activation) * w²
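The saliency formula can be sketched as follows. This is an illustrative Python sketch over plain lists, not the module's Gleam implementation. It assumes that `activations[i]` holds the samples observed for the input that `weights[i]` multiplies and that the variance is the population variance; neither detail is stated in these docs, and the function names are hypothetical.

```python
def variance(xs):
    """Population variance of a non-empty list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def compute_saliency_sketch(weights, activations):
    """Salience(w) = Var(activation) * w^2, one value per weight.

    Assumption: activations[i] are the samples seen by weights[i].
    """
    return [variance(acts) * w * w for w, acts in zip(weights, activations)]


if __name__ == "__main__":
    # First weight sees varying activations, second sees a constant input.
    print(compute_saliency_sketch([2.0, 1.0], [[1.0, 3.0], [5.0, 5.0]]))
```

A weight whose input never varies gets zero saliency under this formula, regardless of its magnitude.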

pub fn cosine_similarity(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> Float

Cosine Similarity - measures direction, not magnitude. 1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite
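A minimal sketch of the standard cosine similarity formula, in Python over plain lists rather than `tensor.Tensor` values. The function name is hypothetical, and returning 0.0 for a zero-norm input is an assumption made here, not documented behavior of this module.

```python
import math


def cosine_similarity_sketch(a, b):
    """dot(a, b) / (|a| * |b|); depends on direction only."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as having no direction in common.
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity_sketch([1.0, 2.0], [2.0, 4.0]))    # same direction
    print(cosine_similarity_sketch([1.0, 0.0], [0.0, 1.0]))    # orthogonal
    print(cosine_similarity_sketch([1.0, 2.0], [-1.0, -2.0]))  # opposite
```

Scaling either vector leaves the result unchanged, which is why this metric should be read alongside a magnitude-aware metric such as MSE.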

pub fn error_percentile(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
  percentile: Float,
) -> Float

Error percentile (approximated via sorting)
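One common way to approximate a percentile by sorting is the nearest-rank method, sketched below in Python over plain lists. Whether this module uses nearest-rank, and whether `percentile` is expressed on a 0 to 100 scale, are assumptions; the function name is hypothetical.

```python
import math


def error_percentile_sketch(original, quantized, percentile):
    """Nearest-rank percentile of the absolute error.

    Assumptions: percentile is in (0, 100] and the inputs are non-empty.
    """
    errors = sorted(abs(o - q) for o, q in zip(original, quantized))
    # Smallest rank such that at least `percentile` percent of the
    # errors are at or below the returned value.
    rank = max(1, math.ceil(percentile / 100.0 * len(errors)))
    return errors[rank - 1]


if __name__ == "__main__":
    original = [0.0, 0.0, 0.0, 0.0]
    quantized = [1.0, 2.0, 3.0, 4.0]
    print(error_percentile_sketch(original, quantized, 50.0))
    print(error_percentile_sketch(original, quantized, 99.0))
```

Nearest-rank always returns an error value that actually occurs in the data, at the cost of coarse steps when the tensor is small.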

pub fn find_salient_weights(
  saliency: List(Float),
  top_pct: Float,
) -> List(Int)

Identifies the top K% most salient weights
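Selecting the most salient weights amounts to ranking indices by saliency and keeping the leading fraction. The Python sketch below assumes `top_pct` is a percentage on a 0 to 100 scale and that the returned integers are positions in the saliency list; both are guesses based on the signature, and the function name is hypothetical.

```python
import math


def find_salient_weights_sketch(saliency, top_pct):
    """Indices of the top `top_pct` percent of weights by saliency."""
    # Assumption: round the count up so a non-zero percentage keeps at least one weight.
    k = math.ceil(len(saliency) * top_pct / 100.0)
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    # Return the kept indices in ascending order for easier lookup.
    return sorted(ranked[:k])


if __name__ == "__main__":
    print(find_salient_weights_sketch([0.1, 5.0, 0.3, 2.0], 50.0))
```

In an AWQ-style scheme, the returned indices would mark the weights to protect from aggressive quantization.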

pub fn mae(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> Float

MAE - Mean Absolute Error

pub fn main() -> Nil

pub fn max_error(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> Float

Max Error - worst case
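The difference between the worst-case error and an average error is easiest to see side by side. The Python sketch below uses plain lists and hypothetical function names, and assumes non-empty inputs of equal length.

```python
def max_error_sketch(original, quantized):
    """Largest absolute element-wise difference (worst case)."""
    return max(abs(o - q) for o, q in zip(original, quantized))


def mae_sketch(original, quantized):
    """Mean of the absolute element-wise differences."""
    errors = [abs(o - q) for o, q in zip(original, quantized)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    original = [1.0, 2.0, 3.0, 4.0]
    quantized = [1.0, 2.0, 3.0, 6.0]  # a single badly quantized value
    print(max_error_sketch(original, quantized))
    print(mae_sketch(original, quantized))
```

A single badly quantized element dominates the maximum error while barely moving the mean, which is why both are reported.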

pub fn mse(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> Float

MSE - Mean Squared Error

pub fn outlier_percentage(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
  threshold: Float,
) -> Float

Percentage of outliers (error > threshold)
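A minimal Python sketch of the outlier count. It assumes the error is the absolute element-wise difference, the comparison is strict, and the result is on a 0 to 100 scale; these docs do not say whether the threshold is absolute or relative, so that choice is an assumption. The function name is hypothetical.

```python
def outlier_percentage_sketch(original, quantized, threshold):
    """Percentage (0 to 100) of elements whose absolute error exceeds threshold."""
    errors = [abs(o - q) for o, q in zip(original, quantized)]
    if not errors:
        return 0.0
    outliers = sum(1 for e in errors if e > threshold)
    return 100.0 * outliers / len(errors)


if __name__ == "__main__":
    original = [1.0, 2.0, 3.0, 4.0]
    quantized = [1.0, 2.5, 3.0, 4.0]
    print(outlier_percentage_sketch(original, quantized, 0.1))
```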

pub fn rmse(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> Float

RMSE - Root Mean Squared Error
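RMSE is the square root of MSE, which brings the error back to the same units as the tensor values. A small Python sketch over plain lists, with hypothetical function names and assuming non-empty inputs of equal length:

```python
import math


def mse_sketch(original, quantized):
    """Mean of the squared element-wise differences."""
    squared = [(o - q) ** 2 for o, q in zip(original, quantized)]
    return sum(squared) / len(squared)


def rmse_sketch(original, quantized):
    """Square root of the MSE, in the same units as the inputs."""
    return math.sqrt(mse_sketch(original, quantized))


if __name__ == "__main__":
    original = [0.0, 0.0, 0.0, 0.0]
    quantized = [4.0, 0.0, 0.0, 0.0]
    print(mse_sketch(original, quantized))
    print(rmse_sketch(original, quantized))
```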

pub fn snr_db(
  original: tensor.Tensor,
  quantized: tensor.Tensor,
) -> Float

SNR - Signal-to-Noise Ratio in dB: SNR = 10 * log10(signal_power / noise_power)
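The SNR formula can be sketched in Python as follows. Taking signal power as the mean square of the original values and noise power as the mean square of the error is the usual convention and is assumed here. The handling of zero noise and zero signal is also an assumption, and the function name is hypothetical.

```python
import math


def snr_db_sketch(original, quantized):
    """SNR in dB: 10 * log10(signal_power / noise_power)."""
    n = len(original)
    signal_power = sum(o * o for o in original) / n
    noise_power = sum((o - q) ** 2 for o, q in zip(original, quantized)) / n
    if noise_power == 0.0:
        return float("inf")   # assumption: lossless reconstruction
    if signal_power == 0.0:
        return float("-inf")  # assumption: all-zero signal with non-zero noise
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # Each value is off by 10% of its magnitude, giving roughly 20 dB.
    print(snr_db_sketch([1.0, -1.0], [0.9, -0.9]))
```

Each 10 dB step corresponds to a tenfold change in the power ratio, so an amplitude error of 10% lands near 20 dB.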

pub fn theoretical_sqnr(bits: Int) -> Float

SQNR - Signal-to-Quantization-Noise Ratio. Theoretical value for N bits: SQNR = 6.02 * N + 1.76 dB
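The rule of thumb above is the textbook result for a full-scale sine wave through an ideal uniform N-bit quantizer, where SQNR = 20 * log10(2^N) + 10 * log10(3/2). The Python sketch below compares the rounded constants with that closed form; the function names are hypothetical.

```python
import math


def theoretical_sqnr_sketch(bits):
    """Rule-of-thumb SQNR in dB for an ideal uniform quantizer."""
    return 6.02 * bits + 1.76


def closed_form_sqnr(bits):
    """Unrounded form: 20*log10(2^N) + 10*log10(3/2), full-scale sine input."""
    return 20.0 * math.log10(2.0 ** bits) + 10.0 * math.log10(1.5)


if __name__ == "__main__":
    for bits in (4, 8, 16):
        print(bits, theoretical_sqnr_sketch(bits), closed_form_sqnr(bits))
```

Each extra bit adds about 6 dB, so measured SNR values far below this bound suggest that the quantization range or the value distribution is the limiting factor rather than the bit width.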
