viva_tensor/core/ops

Tensor operations - where the actual computation happens.

Design philosophy: correctness first, then optimize the hot paths. The naive O(n³) matmul is fine for small matrices. For large ones, we delegate to BLAS via NIF (Apple Accelerate on macOS, OpenBLAS elsewhere).

Broadcasting follows NumPy semantics exactly because (1) it’s well-documented, (2) everyone expects it, and (3) I tried inventing my own rules once. Never again.

Historical note: the Hadamard product (element-wise mul) is named after Jacques Hadamard. It is also called the Schur product, after Issai Schur. Most people just call it “element-wise multiplication” now.

Values

abs

pub fn abs(t: tensor.Tensor) -> tensor.Tensor

Element-wise absolute value

add

pub fn add(
  a: tensor.Tensor,
  b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)

Element-wise add. Shapes must match (use add_broadcast for different shapes).

add_auto

pub fn add_auto(
  a: tensor.Tensor,
  b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)

Auto-selecting element-wise add. Delegates to Zig SIMD or Accelerate.

add_broadcast

pub fn add_broadcast(
  a: tensor.Tensor,
  b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)

Element-wise addition with broadcasting

add_scalar

pub fn add_scalar(t: tensor.Tensor, s: Float) -> tensor.Tensor

Add constant to all elements. Useful for bias terms.

all_backends_info

pub fn all_backends_info() -> String

Get detailed status of all backends

argmax

pub fn argmax(t: tensor.Tensor) -> Int

Index of max element. Returns 0 for empty tensors (debatable choice).
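This is a minimal sketch of the argmax contract described above. It works on a plain List(Float) instead of tensor.Tensor; that representation is an assumption for illustration and does not reflect the library's internals. When several elements tie for the maximum, it returns the first index, which is also an assumption.

```gleam
import gleam/list

/// Index of the largest element; 0 for an empty list.
pub fn argmax(xs: List(Float)) -> Int {
  case xs {
    [] -> 0
    [first, ..rest] -> {
      // Track #(best_index, best_value); indices in `rest` are offset by 1.
      let #(best_i, _) =
        list.index_fold(rest, #(0, first), fn(acc, x, i) {
          let #(_, best) = acc
          // Strict >. keeps the first occurrence on ties.
          case x >. best {
            True -> #(i + 1, x)
            False -> acc
          }
        })
      best_i
    }
  }
}
```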

argmin

pub fn argmin(t: tensor.Tensor) -> Int

Index of minimum element

backend_info

pub fn backend_info() -> String

Get the current backend info string. Shows the best available backend for tensor operations.

broadcast_shape

pub fn broadcast_shape(
  a: List(Int),
  b: List(Int),
) -> Result(List(Int), error.TensorError)

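Compute the shape that results from broadcasting shapes a and b together (NumPy rules). Returns an error if they are incompatible.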
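This is a minimal sketch of the NumPy broadcasting rules. Shapes are aligned from the right, and missing leading dimensions count as 1. Each pair of dimensions must either be equal or contain a 1. BroadcastError and the helper go are hypothetical stand-ins. The real function returns error.TensorError, whose variants are not documented here.

```gleam
import gleam/list

/// Hypothetical error type, standing in for error.TensorError.
pub type BroadcastError {
  IncompatibleShapes(a: List(Int), b: List(Int))
}

pub fn broadcast_shape(
  a: List(Int),
  b: List(Int),
) -> Result(List(Int), BroadcastError) {
  // Compare dimensions right-to-left by walking the reversed shapes.
  go(list.reverse(a), list.reverse(b), [], a, b)
}

fn go(
  ra: List(Int),
  rb: List(Int),
  acc: List(Int),
  a: List(Int),
  b: List(Int),
) -> Result(List(Int), BroadcastError) {
  case ra, rb {
    // Prepending while walking right-to-left leaves acc in normal order.
    [], [] -> Ok(acc)
    // One shape is shorter: its missing leading dims act as 1.
    [], [d, ..rest] | [d, ..rest], [] -> go([], rest, [d, ..acc], a, b)
    // Equal dims, or the right-hand dim is 1: keep x.
    [x, ..xs], [y, ..ys] if x == y || y == 1 -> go(xs, ys, [x, ..acc], a, b)
    // Left-hand dim is 1: stretch it to y.
    [x, ..xs], [y, ..ys] if x == 1 -> go(xs, ys, [y, ..acc], a, b)
    // Any other mismatch cannot broadcast.
    _, _ -> Error(IncompatibleShapes(a, b))
  }
}
```

Under the same assumptions, can_broadcast amounts to asking whether broadcast_shape returns Ok.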

broadcast_to

pub fn broadcast_to(
  t: tensor.Tensor,
  target_shape: List(Int),
) -> Result(tensor.Tensor, error.TensorError)

Broadcast tensor to target shape

can_broadcast

pub fn can_broadcast(a: List(Int), b: List(Int)) -> Bool

Check if shapes can broadcast together.

clamp

pub fn clamp(
  t: tensor.Tensor,
  min_val: Float,
  max_val: Float,
) -> tensor.Tensor

Clamp each element to [min_val, max_val]. Useful for gradient clipping.

div

pub fn div(
  a: tensor.Tensor,
  b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)

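a / b element-wise. Watch out for division by zero. BEAM floats cannot represent Infinity, so check your divisors yourself. For example, Gleam's /. operator silently returns 0.0 when the divisor is zero.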

dot

pub fn dot(
  a: tensor.Tensor,
  b: tensor.Tensor,
) -> Result(Float, error.TensorError)

Dot product: Σ(a_i * b_i). The foundation of all neural networks, really. Uses array-based access for large vectors, list-based for small ones.
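This list-based sketch computes Σ(a_i * b_i). It assumes the inputs are plain List(Float) values and treats mismatched lengths as an error. DotError is a hypothetical type standing in for error.TensorError. The library's own version switches to Erlang arrays for large vectors, but this sketch does not.

```gleam
import gleam/list

/// Hypothetical error type, standing in for error.TensorError.
pub type DotError {
  LengthMismatch(left: Int, right: Int)
}

pub fn dot(a: List(Float), b: List(Float)) -> Result(Float, DotError) {
  let la = list.length(a)
  let lb = list.length(b)
  case la == lb {
    False -> Error(LengthMismatch(la, lb))
    True ->
      // Pairwise products, then a running sum.
      list.map2(a, b, fn(x, y) { x *. y })
      |> list.fold(0.0, fn(acc, p) { acc +. p })
      |> Ok
  }
}
```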

dot_auto

pub fn dot_auto(
  a: tensor.Tensor,
  b: tensor.Tensor,
) -> Result(Float, error.TensorError)

Smart dot: delegates to the fastest available backend at runtime. Instrumented: records latency and backend selection metrics.

dot_fast

pub fn dot_fast(
  a: tensor.Tensor,
  b: tensor.Tensor,
) -> Result(Float, error.TensorError)

Faster dot using Erlang arrays. ~2-3x speedup for large vectors.

exp

pub fn exp(t: tensor.Tensor) -> tensor.Tensor

Element-wise exponential

log

pub fn log(t: tensor.Tensor) -> tensor.Tensor

Element-wise natural log

map

pub fn map(
  t: tensor.Tensor,
  f: fn(Float) -> Float,
) -> tensor.Tensor

Map a function over all elements. The workhorse of tensor ops.

map_indexed

pub fn map_indexed(
  t: tensor.Tensor,
  f: fn(Float, Int) -> Float,
) -> tensor.Tensor

Like map but you also get the index. Useful for positional encoding.

matmul

pub fn matmul(
  a: tensor.Tensor,
  b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)

Matrix multiplication. The operation that launched a thousand GPUs. For A: [m, n] and B: [n, p], C[i,j] = Σ_k A[i,k] * B[k,j].

Uses array-based O(1) element access, so the naive triple loop is O(mnp) total. For serious work, use matmul_auto(), which delegates to BLAS.
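This minimal sketch computes C[i,j] = Σ_k A[i,k] * B[k,j] over nested lists, with each row stored as a List(Float). It assumes the caller has already checked that A is [m, n] and B is [n, p]. It is not the array-based implementation described above. Instead, it gets the columns of B from gleam/list's transpose and computes each entry as a row-column dot product.

```gleam
import gleam/list

/// Naive matmul over row lists. No shape checking.
pub fn matmul(a: List(List(Float)), b: List(List(Float))) -> List(List(Float)) {
  // The columns of B are the rows of its transpose.
  let b_cols = list.transpose(b)
  list.map(a, fn(row) {
    list.map(b_cols, fn(col) {
      // C[i,j] is the dot product of row i of A with column j of B.
      list.map2(row, col, fn(x, y) { x *. y })
      |> list.fold(0.0, fn(acc, p) { acc +. p })
    })
  })
}
```

list.map2 silently truncates to the shorter list, so leaving out the shape check would hide a mismatched inner dimension rather than report it.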

matmul_auto

pub fn matmul_auto(
  a: tensor.Tensor,
  b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)

Smart matmul. Can be 1400x faster than pure Gleam for 500x500 matrices. Instrumented: logs backend selection and records latency metrics.

matmul_fast

pub fn matmul_fast(
  a: tensor.Tensor,
  b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)

Faster matmul using arrays. Use this for matrices > 50x50.

matmul_vec

pub fn matmul_vec(
  mat: tensor.Tensor,
  vec: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)

Matrix-vector multiplication: [m, n] @ [n] -> [m]. Uses array-based O(1) access for O(mn) total instead of O(mn^2).

max

pub fn max(t: tensor.Tensor) -> Float

Maximum value

mean

pub fn mean(t: tensor.Tensor) -> Float

Mean of all elements

min

pub fn min(t: tensor.Tensor) -> Float

Minimum value

mul

pub fn mul(
  a: tensor.Tensor,
  b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)

Hadamard product (element-wise multiply). Not to be confused with matmul!

mul_auto

pub fn mul_auto(
  a: tensor.Tensor,
  b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)

Auto-selecting element-wise multiply. Delegates to Zig SIMD when available.

mul_broadcast

pub fn mul_broadcast(
  a: tensor.Tensor,
  b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)

Element-wise multiplication with broadcasting

negate

pub fn negate(t: tensor.Tensor) -> tensor.Tensor

Negate all elements (multiply by -1).

norm

pub fn norm(t: tensor.Tensor) -> Float

L2 (Euclidean) norm: sqrt(Σ x_i²).

normalize

pub fn normalize(t: tensor.Tensor) -> tensor.Tensor

Normalize to unit length by dividing every element by the L2 norm.
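This is a sketch of norm and normalize over a plain List(Float). The library does not document how it handles the zero vector. Returning it unchanged is an assumption made here to avoid dividing by zero.

```gleam
import gleam/float
import gleam/list

/// sqrt(Σ x_i²)
pub fn norm(xs: List(Float)) -> Float {
  let sum_sq = list.fold(xs, 0.0, fn(acc, x) { acc +. x *. x })
  // float.square_root only fails on negative input, and sum_sq >= 0.0.
  case float.square_root(sum_sq) {
    Ok(r) -> r
    Error(Nil) -> 0.0
  }
}

/// Scale to unit length; the zero vector is returned as-is (assumption).
pub fn normalize(xs: List(Float)) -> List(Float) {
  case norm(xs) {
    0.0 -> xs
    n -> list.map(xs, fn(x) { x /. n })
  }
}
```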

outer

pub fn outer(
  a: tensor.Tensor,
  b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)

Outer product: [m] ⊗ [n] -> [m, n], where C[i,j] = a_i * b_j.
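The outer product is just every pairwise product a_i * b_j. This one-function sketch assumes a nested List(List(Float)) result instead of an [m, n] tensor.

```gleam
import gleam/list

/// Row i holds a_i multiplied by every element of b.
pub fn outer(a: List(Float), b: List(Float)) -> List(List(Float)) {
  list.map(a, fn(x) { list.map(b, fn(y) { x *. y }) })
}
```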

pow

pub fn pow(t: tensor.Tensor, exponent: Float) -> tensor.Tensor

Element-wise power

product

pub fn product(t: tensor.Tensor) -> Float

Product of all elements

relu

pub fn relu(t: tensor.Tensor) -> tensor.Tensor

ReLU activation: max(0, x) element-wise.

scale

pub fn scale(t: tensor.Tensor, s: Float) -> tensor.Tensor

Scalar multiplication. The s stands for scalar, not “slow”.

scale_auto

pub fn scale_auto(t: tensor.Tensor, s: Float) -> tensor.Tensor

Auto-selecting scalar multiplication. Delegates to Zig SIMD or Accelerate.

sigmoid

pub fn sigmoid(t: tensor.Tensor) -> tensor.Tensor

Sigmoid activation: 1 / (1 + exp(-x)) element-wise.
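These sketches implement the relu and sigmoid activations as element-wise maps over a List(Float). The exp function is bound to Erlang's math:exp/1 through @external, so the code assumes the Erlang target.

```gleam
import gleam/float
import gleam/list

// Erlang-target FFI binding to math:exp/1.
@external(erlang, "math", "exp")
fn exp(x: Float) -> Float

/// max(0, x) for every element.
pub fn relu(xs: List(Float)) -> List(Float) {
  list.map(xs, fn(x) { float.max(x, 0.0) })
}

/// 1 / (1 + exp(-x)) for every element. Gleam groups with braces.
pub fn sigmoid(xs: List(Float)) -> List(Float) {
  list.map(xs, fn(x) { 1.0 /. { 1.0 +. exp(0.0 -. x) } })
}
```

For very negative x, exp(-x) overflows and math:exp/1 raises badarith. A sturdier version would compute exp(x) / (1 + exp(x)) when x < 0.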

softmax

pub fn softmax(t: tensor.Tensor) -> tensor.Tensor

Softmax: exp(x_i) / Σexp(x_j). Converts logits to probabilities.

The “subtract max” trick prevents overflow. Without it, softmax([1000, 1001, 1002]) needs exp(1000), which overflows. In IEEE-754 arithmetic that gives Inf/Inf = NaN. The BEAM has no Inf or NaN floats, so Erlang's math:exp/1 raises a badarith error instead.

With it: softmax([1000, 1001, 1002]) = softmax([0, 1, 2]) ≈ [0.09, 0.24, 0.67] ✓

Mathematically equivalent because softmax(x) = softmax(x - c) for any constant c.

TODO: add an axis parameter; this only works on 1D vectors right now.
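This minimal sketch implements the subtract-max softmax on a 1D List(Float), matching the current 1D-only limitation. It assumes the Erlang target for math:exp/1. Returning an empty list for empty input is also an assumption.

```gleam
import gleam/float
import gleam/list

// Erlang-target FFI binding to math:exp/1.
@external(erlang, "math", "exp")
fn exp(x: Float) -> Float

pub fn softmax(xs: List(Float)) -> List(Float) {
  case xs {
    [] -> []
    [first, ..rest] -> {
      // Shift by the max so the largest exponent is exp(0.0) = 1.0.
      let m = list.fold(rest, first, float.max)
      let exps = list.map(xs, fn(x) { exp(x -. m) })
      let total = list.fold(exps, 0.0, fn(acc, e) { acc +. e })
      list.map(exps, fn(e) { e /. total })
    }
  }
}
```

Both [1000.0, 1001.0, 1002.0] and [0.0, 1.0, 2.0] end up evaluating exp(-2.0), exp(-1.0) and exp(0.0). As a result, this sketch returns identical lists for the two inputs.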

sqrt

pub fn sqrt(t: tensor.Tensor) -> tensor.Tensor

Element-wise square root

square

pub fn square(t: tensor.Tensor) -> tensor.Tensor

Element-wise square

std

pub fn std(t: tensor.Tensor) -> Float

Standard deviation

sub

pub fn sub(
  a: tensor.Tensor,
  b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)

a - b, element-wise.

sub_auto

pub fn sub_auto(
  a: tensor.Tensor,
  b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)

Auto-selecting element-wise subtract.

sum

pub fn sum(t: tensor.Tensor) -> Float

Sum all elements. O(n) time, O(1) space.

sum_auto

pub fn sum_auto(t: tensor.Tensor) -> Float

Auto-selecting sum reduction. Priority: Zig SIMD > Apple Accelerate > Pure Erlang. Instrumented: records operation latency.
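The priority order Zig SIMD > Apple Accelerate > Pure Erlang amounts to picking the first available backend. This is a hypothetical sketch: the Backend type and the two availability flags are made up for illustration. The library's real detection logic is not documented here.

```gleam
/// Hypothetical backend enumeration.
pub type Backend {
  ZigSimd
  Accelerate
  PureErlang
}

/// Pick the highest-priority backend that is available.
pub fn select_backend(zig_loaded: Bool, accelerate_loaded: Bool) -> Backend {
  case zig_loaded, accelerate_loaded {
    True, _ -> ZigSimd
    False, True -> Accelerate
    // Pure Erlang is always available as the fallback.
    False, False -> PureErlang
  }
}
```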

tanh

pub fn tanh(t: tensor.Tensor) -> tensor.Tensor

Tanh activation

transpose

pub fn transpose(
  t: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)

Matrix transpose: [m, n] -> [n, m]. Uses array-based O(1) access for O(m*n) total.

variance

pub fn variance(t: tensor.Tensor) -> Float

Variance of all elements. Single pass: computes sum and sum_sq simultaneously.
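This sketch computes the single-pass variance over a List(Float). One fold accumulates the count, the sum and the sum of squares, and the result is variance = sum_sq / n - mean². Dividing by n gives the population variance, which is an assumption. Returning 0.0 for an empty list is another assumption. The library may do either differently.

```gleam
import gleam/int
import gleam/list

pub fn variance(xs: List(Float)) -> Float {
  // One pass: #(count, sum, sum of squares).
  let #(n, s, sq) =
    list.fold(xs, #(0, 0.0, 0.0), fn(acc, x) {
      let #(n, s, sq) = acc
      #(n + 1, s +. x, sq +. x *. x)
    })
  case n {
    0 -> 0.0
    _ -> {
      let nf = int.to_float(n)
      let mean = s /. nf
      // E[x²] - E[x]²
      sq /. nf -. mean *. mean
    }
  }
}
```

The E[x²] - E[x]² form is cheap, but it can lose precision through cancellation when the mean is large relative to the spread. Welford's algorithm is the usual single-pass alternative.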