viva_tensor/backend/protocol
Backend Protocol - Pluggable tensor computation backends
The BEAM’s actor model makes distributed tensor sharding natural. A worker on a remote node is addressed like any local process, so no special distributed runtime is needed. This is why Erlang/Elixir ML libraries can scale horizontally with less ceremony than MPI-based frameworks.
Performance reality (measured on M1 MacBook Pro, 1024x1024 matmul):
- Pure Erlang: ~100 MFLOPS (lists are not contiguous memory)
- Apple Accelerate: ~50 GFLOPS (500x faster - that’s BLAS for you)
- Zig SIMD: ~40 GFLOPS (portable, nearly as fast as vendor libs)
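To put the throughput figures above in wall-clock terms, note that A[m,k] @ B[k,n] performs roughly 2*m*n*k floating-point operations (one multiply and one add per inner step). The sketch below converts a sustained throughput into an estimated run time. The helper names `matmul_flops` and `estimated_seconds` are hypothetical and not part of this module.

```gleam
import gleam/int

/// Approximate FLOP count for A[m,k] @ B[k,n]:
/// one multiply and one add per inner-loop step.
pub fn matmul_flops(m: Int, n: Int, k: Int) -> Float {
  2.0 *. int.to_float(m) *. int.to_float(n) *. int.to_float(k)
}

/// Estimated run time in seconds at a sustained throughput in FLOP/s.
/// Hypothetical helper for illustration only.
pub fn estimated_seconds(
  m: Int,
  n: Int,
  k: Int,
  flops_per_second: Float,
) -> Float {
  matmul_flops(m, n, k) /. flops_per_second
}
```

For the 1024x1024 case this is about 2.15 GFLOP of work: roughly 21 s at ~100 MFLOPS, 54 ms at ~40 GFLOPS, and 43 ms at ~50 GFLOPS.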
Priority: Zig > Accelerate > Pure
Why? SIMD everywhere > Apple-only > slow but portable. Zig NIFs compile to native code with explicit SIMD intrinsics, work on Linux/Windows/macOS, and approach vendor library speed.
Distributed overhead: only worth it for matrices > 10K x 10K. Below that, network latency dominates compute time. The BEAM makes it easy, but easy != free.
Usage:
let backend = protocol.auto_select()
let result = protocol.matmul(backend, a, b, m, n, k)
Types
Available computation backends
Design decision: explicit variants rather than trait objects. Gleam’s pattern matching makes dispatch fast and obvious. No runtime type checking, no vtable indirection.
pub type Backend {
  Pure
  Accelerate
  Zig
  Distributed(nodes: List(Node))
}
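The design note above can be illustrated with a small self-contained sketch. The `Node` type here is a stand-in (its real definition is not shown in this page), the `Backend` type is a local copy so the example compiles on its own, and `describe` is a hypothetical helper, not a function of this module.

```gleam
import gleam/int
import gleam/list

/// Stand-in for the module's Node type, whose definition is not shown here.
pub type Node {
  Node(name: String)
}

/// Local copy of the Backend type so the sketch compiles on its own.
pub type Backend {
  Pure
  Accelerate
  Zig
  Distributed(nodes: List(Node))
}

/// Hypothetical helper: dispatch is a single exhaustive `case`.
/// The compiler rejects this function if a variant is left unhandled.
pub fn describe(backend: Backend) -> String {
  case backend {
    Pure -> "pure"
    Accelerate -> "accelerate"
    Zig -> "zig"
    Distributed(nodes) ->
      "distributed(" <> int.to_string(list.length(nodes)) <> ")"
  }
}
```

Adding a new variant to the type turns every non-exhaustive `case` into a compile error, which is the practical benefit of explicit variants over open-ended dispatch.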
Constructors
- Pure
  Pure Erlang - always available, ~100 MFLOPS. Uses :array for fast indexed access (a shallow tree, so O(log n) rather than true O(1)), but still slow because there is no SIMD.
- Accelerate
  Apple Accelerate - macOS only, ~50 GFLOPS. Wraps cblas_sgemm and vDSP for vectorized ops.
- Zig
  Zig SIMD - cross-platform, ~40 GFLOPS. Explicit SIMD intrinsics, works on any platform with a Zig compiler. Zig SIMD > handwritten assembly. The compiler knows your CPU better than you.
- Distributed(nodes: List(Node))
  Distributed - shards computation across BEAM nodes.
  Row sharding: simple but can be unbalanced for non-square matrices.
  Column sharding: better for tall matrices, more complex gather.
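As a rough illustration of row sharding, the sketch below splits a flat matrix into contiguous row blocks, one per worker. It assumes row-major layout, which this page does not state. The name `shard_rows` and its error message are hypothetical, and how the module actually distributes and gathers blocks is not shown here.

```gleam
import gleam/list

/// Split a row-major m x k matrix into at most `shards` contiguous
/// row blocks. Hypothetical helper; row-major layout is an assumption.
pub fn shard_rows(
  a: List(Float),
  m: Int,
  k: Int,
  shards: Int,
) -> Result(List(List(Float)), String) {
  case shards > 0 && m > 0 && k > 0 && list.length(a) == m * k {
    False -> Error("shard_rows: invalid shape or shard count")
    True -> {
      // Ceiling division so that no rows are dropped.
      let rows_per_shard = { m + shards - 1 } / shards
      Ok(list.sized_chunk(a, rows_per_shard * k))
    }
  }
}
```

With m = 3 and two shards, the first block holds two rows and the second only one, which is the kind of imbalance the row-sharding note refers to. Each worker would multiply its block by the full B, and gathering is a simple concatenation in shard order.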
Values
pub fn add(
  backend: Backend,
  a: List(Float),
  b: List(Float),
) -> Result(List(Float), String)
Element-wise addition using selected backend
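For reference, a pure-Gleam version of element-wise addition might look like the sketch below. The name `add_pure` is hypothetical, and treating a length mismatch as the error case is an assumption: this page does not say which conditions produce `Error`.

```gleam
import gleam/list

/// Element-wise addition over two lists of equal length.
/// Hypothetical reference version; the error condition is an assumption.
pub fn add_pure(
  a: List(Float),
  b: List(Float),
) -> Result(List(Float), String) {
  // strict_zip fails when the lists differ in length.
  case list.strict_zip(a, b) {
    Ok(pairs) -> Ok(list.map(pairs, fn(pair) { pair.0 +. pair.1 }))
    Error(_) -> Error("add: length mismatch")
  }
}
```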
pub fn auto_select() -> Backend
Automatically select the best available backend
Priority: Zig > Accelerate > Pure
Rationale:
- Zig: portable SIMD, works everywhere, ~40 GFLOPS
- Accelerate: Apple-specific but highly optimized
- Pure: fallback, always works, predictable (if slow)
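The priority order above amounts to "take the first available backend, else fall back to Pure". A minimal sketch of that logic follows. The availability check is passed in as a function so the example is self-contained, the `Backend` type is a local copy with `Distributed` omitted, and `select` is a hypothetical name rather than the module's implementation.

```gleam
import gleam/list
import gleam/result

/// Local copy of the backend variants; Distributed is omitted here.
pub type Backend {
  Pure
  Accelerate
  Zig
}

/// Walk the priority list and return the first backend that reports
/// as available. Pure is the fallback because it always works.
pub fn select(is_available: fn(Backend) -> Bool) -> Backend {
  [Zig, Accelerate]
  |> list.find(is_available)
  |> result.unwrap(Pure)
}
```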
pub fn dot(
  backend: Backend,
  a: List(Float),
  b: List(Float),
) -> Result(Float, String)
Dot product using selected backend
For distributed: falls back to local backend. Why? Communication overhead > compute for O(n) operations. Only parallelize when compute dominates communication.
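The O(n) argument is easy to see in a reference version: a dot product is one pass with about 2n floating-point operations, while shipping both vectors to another node moves 2n floats, so transfer cost is of the same order as the compute. The sketch below is a hypothetical pure-Gleam version (`dot_pure` is not part of this module, and the length-mismatch error is an assumption).

```gleam
import gleam/list

/// Dot product in a single pass over both lists.
/// Hypothetical reference version; the error condition is an assumption.
pub fn dot_pure(a: List(Float), b: List(Float)) -> Result(Float, String) {
  case list.strict_zip(a, b) {
    Ok(pairs) ->
      // One multiply and one add per element.
      Ok(list.fold(pairs, 0.0, fn(acc, pair) { acc +. pair.0 *. pair.1 }))
    Error(_) -> Error("dot: length mismatch")
  }
}
```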
pub fn info(backend: Backend) -> String
Get detailed backend info including version/capability strings
pub fn is_available(backend: Backend) -> Bool
Check if a specific backend is available
Used for graceful degradation and testing
pub fn matmul(
  backend: Backend,
  a: List(Float),
  b: List(Float),
  m: Int,
  n: Int,
  k: Int,
) -> Result(List(Float), String)
Matrix multiplication using selected backend
A[m,k] @ B[k,n] -> C[m,n]
Complexity: O(m*n*k) FLOPs
Memory: O(m*n) for result
Strassen/Winograd variants are not implemented: their lower asymptotic cost only outweighs the extra constant-factor overhead for matrices larger than roughly 1000x1000, and BLAS is already optimized.
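A naive reference version makes the shape convention and the O(m*n*k) cost concrete: it computes m*n dot products, each of length k. The sketch below assumes row-major flat lists, which this page does not state. The names `matmul_pure` and `row_dot` are hypothetical, and the validation rules are an assumption about what should produce `Error`.

```gleam
import gleam/list

/// Reference A[m,k] @ B[k,n] -> C[m,n] over row-major flat lists.
/// Hypothetical sketch; layout and validation rules are assumptions.
pub fn matmul_pure(
  a: List(Float),
  b: List(Float),
  m: Int,
  n: Int,
  k: Int,
) -> Result(List(Float), String) {
  let shapes_ok =
    m > 0
    && n > 0
    && k > 0
    && list.length(a) == m * k
    && list.length(b) == k * n
  case shapes_ok {
    False -> Error("matmul: dimensions do not match input lengths")
    True -> {
      // Rows of A (length k each) and columns of B (rows of B, transposed).
      let a_rows = list.sized_chunk(a, k)
      let b_cols = list.transpose(list.sized_chunk(b, n))
      // One dot product per output element, emitted in row-major order.
      Ok(
        list.flat_map(a_rows, fn(row) {
          list.map(b_cols, fn(col) { row_dot(row, col) })
        }),
      )
    }
  }
}

/// Sum of pairwise products of two equal-length lists.
fn row_dot(xs: List(Float), ys: List(Float)) -> Float {
  list.map2(xs, ys, fn(x, y) { x *. y })
  |> list.fold(0.0, fn(acc, p) { acc +. p })
}
```

Note the argument order: `m` and `n` are the output dimensions and `k` is the shared inner dimension, matching the signature above.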
pub fn scale(
  backend: Backend,
  data: List(Float),
  scalar: Float,
) -> Result(List(Float), String)
Scale (multiply by scalar) using selected backend