viva_tensor/backend/protocol

Backend Protocol - Pluggable tensor computation backends

The BEAM’s actor model makes distributed tensor sharding natural. Each shard can be handled by an ordinary BEAM process, and processes on connected nodes are addressed the same way as local ones, so no special distributed runtime is needed. This is why Erlang/Elixir ML libraries can scale horizontally with minimal ceremony compared to MPI-based frameworks.

Performance reality (measured on M1 MacBook Pro, 1024x1024 matmul):

Priority: Zig > Accelerate > Pure. Why? SIMD everywhere beats Apple-only, which beats slow-but-portable. Zig NIFs compile to native code with explicit SIMD intrinsics, run on Linux, Windows, and macOS, and approach vendor-library speed.

Distributed overhead: only worth it for matrices > 10K x 10K. Below that, network latency dominates compute time. The BEAM makes it easy, but easy != free.

Usage:

  import viva_tensor/backend/protocol

  let backend = protocol.auto_select()
  let result = protocol.matmul(backend, a, b, m, n, k)

Types

Available computation backends

Design decision: explicit variants rather than trait objects. Gleam’s pattern matching makes dispatch fast and obvious. No runtime type checking, no vtable indirection.

pub type Backend {
  Pure
  Accelerate
  Zig
  Distributed(nodes: List(Node))
}

Constructors

  • Pure

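    Pure Erlang - always available, ~100 MFLOPS. Uses Erlang's :array module for fast indexed access (a shallow tree, so near-constant rather than strictly O(1) time), but is still slow because it has no SIMD.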

  • Accelerate

    Apple Accelerate - macOS only, ~50 GFLOPS. Wraps cblas_sgemm and vDSP for vectorized ops.

  • Zig

    Zig SIMD - cross-platform, ~40 GFLOPS. Uses explicit SIMD intrinsics and works on every platform the Zig compiler targets. Zig SIMD beats handwritten assembly here: the compiler knows your CPU better than you do.

  • Distributed(nodes: List(Node))

    Distributed - shards computation across BEAM nodes.
    Row sharding: simple, but can be unbalanced for non-square matrices.
    Column sharding: better for tall matrices, but needs a more complex gather.
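Because the variants are explicit, dispatch is a single exhaustive `case` that the compiler checks for missing branches. The sketch below is a minimal illustration, assuming local copies of `Backend` and `Node` so it compiles on its own; `label` is a hypothetical helper, not the module's `name` function.

```gleam
import gleam/int
import gleam/list

// Local copies of the documented types so the sketch is self-contained.
pub type Node {
  Node(name: String)
}

pub type Backend {
  Pure
  Accelerate
  Zig
  Distributed(nodes: List(Node))
}

// Hypothetical helper: each branch is a plain function body, and removing
// any branch is a compile error because the case must be exhaustive.
pub fn label(backend: Backend) -> String {
  case backend {
    Pure -> "pure"
    Accelerate -> "accelerate"
    Zig -> "zig"
    Distributed(nodes) ->
      "distributed(" <> int.to_string(list.length(nodes)) <> " nodes)"
  }
}
```

Adding a new backend variant forces every `case` like this one to be updated, which is the trade-off against open-ended trait objects.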

Represents a BEAM node for distributed computing. It can be local (same machine) or remote (over the network).

pub type Node {
  Node(name: String)
}

Constructors

  • Node(name: String)
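Row sharding, mentioned under Distributed above, can be sketched on the flat row-major layout the API uses. This is a minimal sketch, assuming `a` holds an m x k matrix in row-major order; `shard_rows` is a hypothetical helper and `Node` is a local copy of the documented type. Contiguous row blocks are dealt out using ceiling division, which is where the imbalance comes from: 3 rows over 2 nodes gives blocks of 2 rows and 1 row.

```gleam
import gleam/list

// Local copy of the documented Node type so the sketch is self-contained.
pub type Node {
  Node(name: String)
}

// Hypothetical helper: split a row-major m x k matrix into contiguous row
// blocks, one per node. Each node would multiply its block by the full B,
// and concatenating the partial results in node order reassembles C.
pub fn shard_rows(
  a: List(Float),
  k: Int,
  nodes: List(Node),
) -> List(#(Node, List(Float))) {
  case nodes, k > 0 {
    [], _ -> []
    _, False -> []
    _, True -> {
      let rows = list.sized_chunk(a, into: k)
      let n_nodes = list.length(nodes)
      // Ceiling division so every row is assigned; the last block may be short.
      let per_shard = { list.length(rows) + n_nodes - 1 } / n_nodes
      let blocks =
        rows
        |> list.sized_chunk(into: per_shard)
        |> list.map(list.flatten)
      list.zip(nodes, blocks)
    }
  }
}
```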

Values

pub fn add(
  backend: Backend,
  a: List(Float),
  b: List(Float),
) -> Result(List(Float), String)

Element-wise addition using selected backend
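A minimal sketch of what a Pure path for `add` might look like, assuming a length mismatch should be reported as an `Error` (the docs do not say how mismatches are handled). `add_pure` is a hypothetical name. The explicit length check matters because `list.map2` silently drops the extra elements of the longer list.

```gleam
import gleam/list

// Hypothetical Pure-path addition matching the documented
// Result(List(Float), String) shape. Rejecting mismatched lengths is an
// assumption, not documented behaviour.
pub fn add_pure(a: List(Float), b: List(Float)) -> Result(List(Float), String) {
  case list.length(a) == list.length(b) {
    True -> Ok(list.map2(a, b, fn(x, y) { x +. y }))
    False -> Error("add: length mismatch")
  }
}
```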

pub fn auto_select() -> Backend

Automatically select the best available backend

Priority: Zig > Accelerate > Pure

Rationale:

  • Zig: portable SIMD, works everywhere, ~40 GFLOPS
  • Accelerate: Apple-specific but highly optimized
  • Pure: fallback, always works, predictable (if slow)

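
The priority order amounts to "first available candidate wins, otherwise Pure". This sketch assumes availability can be expressed as a predicate; `select` and its `available` argument are hypothetical stand-ins for whatever probing `auto_select` actually does, and the local `Backend` copy omits `Distributed`, which the documented priority never picks.

```gleam
import gleam/list

// Trimmed local copy of Backend (no Distributed) for a self-contained sketch.
pub type Backend {
  Pure
  Accelerate
  Zig
}

// Hypothetical selector: try candidates in priority order and fall back to
// Pure, which needs no native code and is therefore always available.
pub fn select(available: fn(Backend) -> Bool) -> Backend {
  case list.find([Zig, Accelerate], available) {
    Ok(backend) -> backend
    Error(Nil) -> Pure
  }
}
```

Passing the probe in as a function also makes the fallback chain easy to exercise in tests without a Mac or a compiled NIF.
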
pub fn dot(
  backend: Backend,
  a: List(Float),
  b: List(Float),
) -> Result(Float, String)

Dot product using selected backend

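For Distributed, this falls back to a local backend. Why? For an O(n) operation, shipping the inputs to another node costs about as much as computing on them, so communication overhead exceeds any compute saved. Only parallelize when compute dominates communication.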
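The arithmetic behind that rule: a length-n dot product does about 2n FLOPs on 2n inputs, roughly one FLOP per value shipped, while an n x n matmul does about 2n³ FLOPs on 2n² inputs, roughly n FLOPs per value. A local dot product is a single pass, as in this minimal sketch; `dot_pure` is a hypothetical name, and returning `Error` on mismatched lengths is an assumption.

```gleam
import gleam/float
import gleam/list

// Hypothetical single-pass dot product for the local (Pure) path.
// Erroring on mismatched lengths is an assumption; list.map2 alone would
// silently truncate to the shorter list.
pub fn dot_pure(a: List(Float), b: List(Float)) -> Result(Float, String) {
  case list.length(a) == list.length(b) {
    True -> Ok(list.map2(a, b, fn(x, y) { x *. y }) |> float.sum)
    False -> Error("dot: length mismatch")
  }
}
```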

pub fn info(backend: Backend) -> String

Get detailed backend info including version/capability strings

pub fn is_available(backend: Backend) -> Bool

Check if a specific backend is available

Used for graceful degradation and testing

pub fn matmul(
  backend: Backend,
  a: List(Float),
  b: List(Float),
  m: Int,
  n: Int,
  k: Int,
) -> Result(List(Float), String)

Matrix multiplication using selected backend: A[m,k] @ B[k,n] -> C[m,n]

Complexity: O(mnk) FLOPs. Memory: O(m*n) for the result.
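A naive reference implementation on the flat row-major layout makes the FLOP count concrete: each of the m*n outputs is a length-k inner product, about k multiplies plus k adds, so roughly 2mnk FLOPs in total. For the 1024x1024 case from the introduction that is 2·1024³ ≈ 2.1 billion FLOPs, roughly 54 ms at the quoted ~40 GFLOPS and about 21 s at ~100 MFLOPS. The sketch is hypothetical (`matmul_pure` is not the module's code) and assumes shape errors are reported as `Error`.

```gleam
import gleam/float
import gleam/list

// Hypothetical naive matmul on flat row-major lists:
// A is m x k, B is k x n, and the result C is m x n.
pub fn matmul_pure(
  a: List(Float),
  b: List(Float),
  m: Int,
  n: Int,
  k: Int,
) -> Result(List(Float), String) {
  let dims_ok = m > 0 && n > 0 && k > 0
  case dims_ok && list.length(a) == m * k && list.length(b) == k * n {
    False -> Error("matmul: shape mismatch")
    True -> {
      let rows = list.sized_chunk(a, into: k)
      // Transposing B turns its columns into lists we can walk alongside a row.
      let cols = list.transpose(list.sized_chunk(b, into: n))
      Ok(
        list.flat_map(rows, fn(row) {
          // One inner product per output element: C[i][j] = row_i . col_j
          list.map(cols, fn(col) {
            list.map2(row, col, fn(x, y) { x *. y }) |> float.sum
          })
        }),
      )
    }
  }
}
```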

Strassen/Winograd variants are not implemented: their constant factors only win for matrices larger than 1000x1000, and BLAS is already optimized.

pub fn name(backend: Backend) -> String

Get human-readable backend name

pub fn scale(
  backend: Backend,
  data: List(Float),
  scalar: Float,
) -> Result(List(Float), String)

Scale (multiply by scalar) using selected backend

pub fn sum(
  backend: Backend,
  data: List(Float),
) -> Result(Float, String)

Sum reduction using selected backend
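`scale` and `sum` are the simplest of the Pure fallbacks. A minimal sketch, with hypothetical names `scale_pure` and `sum_pure` and the assumption that neither can fail on the Pure path; the `Result` wrapper is kept only to match the documented signatures.

```gleam
import gleam/list

// Hypothetical Pure-path scale: multiply every element by the scalar.
pub fn scale_pure(data: List(Float), scalar: Float) -> Result(List(Float), String) {
  Ok(list.map(data, fn(x) { x *. scalar }))
}

// Hypothetical Pure-path sum: a left fold starting from 0.0, so an empty
// list sums to 0.0.
pub fn sum_pure(data: List(Float)) -> Result(Float, String) {
  Ok(list.fold(data, 0.0, fn(acc, x) { acc +. x }))
}
```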
