viva_tensor/core/ops
Tensor operations - where the actual computation happens.
Design philosophy: correctness first, then optimize the hot paths. The naive O(n³) matmul is fine for small matrices. For large ones, we delegate to BLAS via NIF (Apple Accelerate on macOS, OpenBLAS elsewhere).
Broadcasting follows NumPy semantics exactly because (1) it’s well-documented, (2) everyone expects it, and (3) I tried inventing my own rules once. Never again.
Historical note: Hadamard product (element-wise mul) is named after Jacques Hadamard, who used it in his 1893 theorem on determinants. Most people just call it “element-wise multiplication” now.
Values
pub fn add(
a: tensor.Tensor,
b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)
Element-wise add. Shapes must match (use add_broadcast for different shapes).
pub fn add_auto(
a: tensor.Tensor,
b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)
Auto-selecting element-wise add. Delegates to Zig SIMD or Accelerate.
pub fn add_broadcast(
a: tensor.Tensor,
b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)
Element-wise addition with broadcasting.
pub fn add_scalar(t: tensor.Tensor, s: Float) -> tensor.Tensor
Add constant to all elements. Useful for bias terms.
pub fn argmax(t: tensor.Tensor) -> Int
Index of max element. Returns 0 for empty tensors (debatable choice).
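The empty-tensor convention can be sketched on a plain list. This is only an illustration under stated assumptions: `argmax_sketch` and `scan` are hypothetical names, the input is a `List(Float)` rather than a `tensor.Tensor`, and ties are assumed to resolve to the first maximum, which the library may or may not do.

```gleam
/// Hypothetical stand-in for `argmax`, operating on a plain list.
pub fn argmax_sketch(xs: List(Float)) -> Int {
  case xs {
    // Mirrors the documented behaviour: an empty input yields index 0.
    [] -> 0
    [first, ..rest] -> scan(rest, 1, 0, first)
  }
}

fn scan(xs: List(Float), index: Int, best_index: Int, best: Float) -> Int {
  case xs {
    [] -> best_index
    [x, ..rest] ->
      case x >. best {
        // Strictly greater, so the first maximum wins on ties (assumption).
        True -> scan(rest, index + 1, index, x)
        False -> scan(rest, index + 1, best_index, best)
      }
  }
}
```

Note that with this convention an empty input and an input whose maximum sits at position 0 are indistinguishable, so callers that care should check for emptiness first.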
pub fn backend_info() -> String
Get the current backend info string. Shows the best available backend for tensor operations.
pub fn broadcast_shape(
a: List(Int),
b: List(Int),
) -> Result(List(Int), error.TensorError)
Compute the shape that results from broadcasting two shapes together. Returns an error if the shapes are incompatible.
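The NumPy rule is: align shapes from the trailing dimension, treat a missing leading dimension as 1, and accept a pair of dimensions when they are equal or one of them is 1. A minimal sketch of that rule follows. `broadcast_shape_sketch` and `merge` are hypothetical names, and the error type is plain `Nil` here rather than `error.TensorError`.

```gleam
import gleam/list

/// Hypothetical stand-in for `broadcast_shape`.
pub fn broadcast_shape_sketch(
  a: List(Int),
  b: List(Int),
) -> Result(List(Int), Nil) {
  // Reverse both shapes so the walk starts at the trailing dimension.
  merge(list.reverse(a), list.reverse(b), [])
}

fn merge(a: List(Int), b: List(Int), acc: List(Int)) -> Result(List(Int), Nil) {
  case a, b {
    // Prepending while walking reversed input restores the original order.
    [], [] -> Ok(acc)
    // A missing leading dimension behaves like size 1.
    [x, ..xs], [] -> merge(xs, [], [x, ..acc])
    [], [y, ..ys] -> merge([], ys, [y, ..acc])
    [x, ..xs], [y, ..ys] ->
      case x == y, x == 1, y == 1 {
        True, _, _ -> merge(xs, ys, [x, ..acc])
        _, True, _ -> merge(xs, ys, [y, ..acc])
        _, _, True -> merge(xs, ys, [x, ..acc])
        _, _, _ -> Error(Nil)
      }
  }
}
```

For example, `broadcast_shape_sketch([8, 1, 6, 1], [7, 1, 5])` gives `Ok([8, 7, 6, 5])`, while `[2, 3]` against `[4]` is an error because 3 and 4 differ and neither is 1.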
pub fn broadcast_to(
t: tensor.Tensor,
target_shape: List(Int),
) -> Result(tensor.Tensor, error.TensorError)
Broadcast tensor to target shape
pub fn can_broadcast(a: List(Int), b: List(Int)) -> Bool
Check if shapes can broadcast together.
pub fn clamp(
t: tensor.Tensor,
min_val: Float,
max_val: Float,
) -> tensor.Tensor
Clamp to [min, max]. Useful for gradient clipping.
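As a rough illustration of the gradient-clipping use case, here is the same idea on a plain list using `float.clamp` from the Gleam standard library. `clip_gradients` and its symmetric `limit` parameter are hypothetical and not part of this module.

```gleam
import gleam/float
import gleam/list

/// Hypothetical helper: clip every gradient value into [-limit, limit].
pub fn clip_gradients(grads: List(Float), limit: Float) -> List(Float) {
  let lower = 0.0 -. limit
  list.map(grads, fn(g) { float.clamp(g, min: lower, max: limit) })
}
```

With a limit of 1.0, the values `[-5.0, 0.5, 3.0]` become `[-1.0, 0.5, 1.0]`: only the out-of-range entries change.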
pub fn div(
a: tensor.Tensor,
b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)
a / b element-wise. Watch out for division by zero (you get Infinity).
pub fn dot(
a: tensor.Tensor,
b: tensor.Tensor,
) -> Result(Float, error.TensorError)
Dot product: Σ(a_i * b_i). The foundation of all neural networks, really. Uses array-based access for large vectors, list-based for small ones.
pub fn dot_auto(
a: tensor.Tensor,
b: tensor.Tensor,
) -> Result(Float, error.TensorError)
Smart dot - delegates to fastest available backend at runtime. Instrumented: records latency and backend selection metrics.
pub fn dot_fast(
a: tensor.Tensor,
b: tensor.Tensor,
) -> Result(Float, error.TensorError)
Faster dot using Erlang arrays. ~2-3x speedup for large vectors.
pub fn map(
t: tensor.Tensor,
f: fn(Float) -> Float,
) -> tensor.Tensor
Map a function over all elements. The workhorse of tensor ops.
pub fn map_indexed(
t: tensor.Tensor,
f: fn(Float, Int) -> Float,
) -> tensor.Tensor
Like map but you also get the index. Useful for positional encoding.
pub fn matmul(
a: tensor.Tensor,
b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)
Matrix multiplication. The operation that launched a thousand GPUs. C[i,j] = Σ_k A[i,k] * B[k,j]
Uses array-based O(1) element access, so multiplying [m, n] by [n, p] costs O(mnp) in total. For serious work, use matmul_auto(), which delegates to BLAS.
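The formula C[i,j] = Σ_k A[i,k] * B[k,j] can be written directly over nested lists. This is a sketch of the definition, not of the array-based implementation described above: `matmul_sketch` is a hypothetical name, matrices are `List(List(Float))` in row-major order, and no shape checking is done (the real `matmul` returns a `Result`).

```gleam
import gleam/list

/// Hypothetical reference matmul on row-major nested lists.
/// Assumes every row of `a` has as many entries as `b` has rows.
pub fn matmul_sketch(
  a: List(List(Float)),
  b: List(List(Float)),
) -> List(List(Float)) {
  // Transposing B once lets each output cell be a row-by-column dot product.
  let columns = list.transpose(b)
  list.map(a, fn(row) {
    list.map(columns, fn(col) {
      list.map2(row, col, fn(x, y) { x *. y })
      |> list.fold(0.0, fn(acc, p) { acc +. p })
    })
  })
}
```

Because `list.map2` stops at the shorter list, mismatched inner dimensions would silently produce a wrong answer here, which is one reason the library function validates shapes and returns an error instead.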
pub fn matmul_auto(
a: tensor.Tensor,
b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)
Smart matmul. Can be 1400x faster than pure Gleam for 500x500 matrices. Instrumented: logs backend selection and records latency metrics.
pub fn matmul_fast(
a: tensor.Tensor,
b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)
Faster matmul using arrays. Use this for matrices > 50x50.
pub fn matmul_vec(
mat: tensor.Tensor,
vec: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)
Matrix-vector multiplication: [m, n] @ [n] -> [m]. Uses array-based O(1) access for O(mn) total instead of O(mn^2).
pub fn mul(
a: tensor.Tensor,
b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)
Hadamard product (element-wise multiply). Not to be confused with matmul!
pub fn mul_auto(
a: tensor.Tensor,
b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)
Auto-selecting element-wise multiply. Delegates to Zig SIMD when available.
pub fn mul_broadcast(
a: tensor.Tensor,
b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)
Element-wise multiplication with broadcasting.
pub fn outer(
a: tensor.Tensor,
b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)
Outer product: [m] @ [n] -> [m, n]
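To make the shape rule concrete: entry [i, j] of the result is a_i * b_j. A minimal sketch on plain lists, where `outer_sketch` is a hypothetical name and the result is a nested list rather than a `tensor.Tensor`:

```gleam
import gleam/list

/// Hypothetical stand-in for `outer`: one output row per element of `a`.
pub fn outer_sketch(a: List(Float), b: List(Float)) -> List(List(Float)) {
  list.map(a, fn(x) { list.map(b, fn(y) { x *. y }) })
}
```

A vector of length 2 and a vector of length 3 give a 2 by 3 result: `[1.0, 2.0]` and `[3.0, 4.0, 5.0]` produce `[[3.0, 4.0, 5.0], [6.0, 8.0, 10.0]]`.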
pub fn scale(t: tensor.Tensor, s: Float) -> tensor.Tensor
Scalar multiplication. The s stands for scalar, not “slow”.
pub fn scale_auto(t: tensor.Tensor, s: Float) -> tensor.Tensor
Auto-selecting scalar multiplication. Delegates to Zig SIMD or Accelerate.
pub fn softmax(t: tensor.Tensor) -> tensor.Tensor
Softmax: exp(x_i) / Σexp(x_j). Converts logits to probabilities.
The “subtract max” trick prevents overflow.
Without it: softmax([1000, 1001, 1002]) = [exp(1000)/…, …] = [Inf/Inf, Inf/Inf, Inf/Inf] = NaN
With it: softmax([1000, 1001, 1002]) = softmax([0, 1, 2]) = [0.09, 0.24, 0.67] ✓
Mathematically equivalent because softmax(x) = softmax(x - c) for any c.
TODO: add axis parameter, this only works on 1D vectors right now
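The shift-by-max idea can be sketched on a 1D list as follows. Assumptions: `softmax_sketch` is a hypothetical name, the input is a `List(Float)` rather than a `tensor.Tensor`, an empty input returns an empty list, and the code targets Erlang so that `math:exp/1` can be bound directly as an external function.

```gleam
import gleam/float
import gleam/list

// Binds Erlang's math:exp/1. Erlang target only (assumption).
@external(erlang, "math", "exp")
fn exp(x: Float) -> Float

/// Hypothetical 1D softmax using the subtract-max trick.
pub fn softmax_sketch(xs: List(Float)) -> List(Float) {
  case xs {
    [] -> []
    [first, ..rest] -> {
      // After the shift every exponent is <= 0, so exp cannot overflow.
      let max = list.fold(rest, first, float.max)
      let exps = list.map(xs, fn(x) { exp(x -. max) })
      let total = list.fold(exps, 0.0, fn(acc, e) { acc +. e })
      list.map(exps, fn(e) { e /. total })
    }
  }
}
```

Since the largest shifted value is exactly 0, the denominator is at least 1, so the final division is always safe. The inputs `[1000.0, 1001.0, 1002.0]` and `[0.0, 1.0, 2.0]` shift to the same values and therefore give identical outputs.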
pub fn sub(
a: tensor.Tensor,
b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)
a - b, element-wise.
pub fn sub_auto(
a: tensor.Tensor,
b: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)
Auto-selecting element-wise subtract.
pub fn sum_auto(t: tensor.Tensor) -> Float
Auto-selecting sum reduction. Priority: Zig SIMD > Apple Accelerate > Pure Erlang. Instrumented: records operation latency.
pub fn transpose(
t: tensor.Tensor,
) -> Result(tensor.Tensor, error.TensorError)
Matrix transpose. Uses array-based O(1) access for O(m*n) total.
pub fn variance(t: tensor.Tensor) -> Float
Variance of all elements. Single pass: computes sum and sum_sq simultaneously.
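A single-pass computation along these lines accumulates the count, the sum, and the sum of squares together, then uses Var(x) = E[x²] - (E[x])². The sketch below makes several assumptions that the documentation does not confirm: `variance_sketch` is a hypothetical name, it computes the population variance (dividing by n), and it returns 0.0 for an empty input.

```gleam
import gleam/int
import gleam/list

/// Hypothetical single-pass population variance on a plain list.
pub fn variance_sketch(xs: List(Float)) -> Float {
  // One fold carries all three running totals.
  let #(n, sum, sum_sq) =
    list.fold(xs, #(0, 0.0, 0.0), fn(acc, x) {
      let #(n, sum, sum_sq) = acc
      #(n + 1, sum +. x, sum_sq +. x *. x)
    })
  case n {
    0 -> 0.0
    _ -> {
      let count = int.to_float(n)
      let mean = sum /. count
      sum_sq /. count -. mean *. mean
    }
  }
}
```

For `[1.0, 2.0, 3.0, 4.0]` the totals are n = 4, sum = 10, sum_sq = 30, giving 7.5 - 6.25 = 1.25. The trade-off of this formula is that it can lose precision when the mean is large relative to the spread, because two nearly equal numbers are subtracted.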