Project Structure
viva_tensor is a Gleam package with a stable root facade, internal
implementation modules, and optional native acceleration. Keep this split clear
when adding features: package users should depend on the public Gleam contract,
not on the current native or planner internals.
Package Layout
| Path | Purpose |
|---|---|
| src/viva_tensor.gleam | Stable public facade. User examples should prefer import viva_tensor as t. |
| src/viva_tensor/axis.gleam, layout.gleam, named.gleam | Public companion modules for durable tensor concepts. |
| src/viva_tensor/tensor.gleam | Internal tensor implementation used by the facade. It owns pure operations, native dispatch, shape behavior, and fallback paths. |
| src/viva_tensor/core/ | Internal storage, shape, dtype, errors, layout math, and FFI wrappers. |
| src/viva_tensor/backend/ | Backend protocol and capability descriptions used by planner-style selection code. |
| src/viva_tensor/native/ | Gleam-facing native helpers for BLAS, CUDA, sparse kernels, and TFLOPS/backend diagnostics. |
| src/viva_tensor/quant.gleam | Internal quantization entrypoint that re-exports the supported quantization modules. |
| src/viva_tensor/quant/ | Quantization implementations: compression, NF4, AWQ, Hadamard preprocessing, tensor-core layout helpers, and TurboQuant reference code. |
| src/viva_tensor/nn/, optim/, observability/, experimental/ | Internal domain modules until their contracts are stable enough for public API. |
| src/*_ffi.erl, src/*_nif.erl, src/*_zig.erl | Erlang bridge modules required by the BEAM target and NIF loading path. |
| zig_src/ | Native C, CUDA, and Zig implementation for the optional NIF. |
| priv/ | Runtime native artifacts loaded by Erlang when present. |
| test/ | Unit, behavior, public API contract, and NIF/no-NIF compatibility tests. |
| dev/ | Development-only Gleam examples and benchmark entrypoints. These modules are runnable with gleam run -m ... but are not supported package API. |
| bench/ | External benchmark scripts, grouped by runtime or tool: python/, r/, erlang/, cuda/, scripts/, and windows/. Generated data/ and reports/ stay ignored. |
| docs/ | Maintainer-authored guides and long-form documentation. |
Public API Boundary
The package boundary is defined by gleam.toml: the root module and a small set
of companion modules are public, while backend, core, native, quant,
tensor, nn, optim, observability, and experimental are internal.
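
As a rough illustration of how that boundary can be expressed, the excerpt below uses the `internal_modules` field of gleam.toml. The exact entries and glob patterns are assumptions made for this sketch; the repository's own gleam.toml is the source of truth.

```toml
# Illustrative gleam.toml excerpt; the real list may differ.
name = "viva_tensor"

# Modules matching these patterns are treated as internal by Gleam's
# documentation tooling, so they are not presented as public API.
internal_modules = [
  "viva_tensor/tensor",
  "viva_tensor/core/*",
  "viva_tensor/backend/*",
  "viva_tensor/native/*",
  "viva_tensor/quant",
  "viva_tensor/quant/*",
]
```

Promoting a module would then mean removing its pattern from this list once it meets the criteria for public API.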
Promote a module out of internal_modules only when it has:
- documented shape and dtype behavior
- recoverable failures represented with Result
- tests through the root facade or a stable companion module
- a defined pure Gleam behavior when native acceleration is unavailable
- generated documentation that helps package users, not only maintainers
Benchmarks, demos, and research probes belong in dev/ or bench/ until they
become supported runtime features.
Pure Gleam And NIF Fallback
Native acceleration is optional. Public tensor operations must continue to work when the NIF is missing, fails to load, or returns an error.
The usual flow is:
- The public facade delegates to internal tensor code.
- Internal code validates shape/broadcasting behavior in Gleam.
- If inputs are native tensors and a matching NIF operation exists, the native
  path is attempted through core/ffi.gleam and the Erlang bridge modules.
- If the native path is unavailable or fails, the operation falls back to the pure Gleam implementation.
This contract is important for Hex users, CI portability, and development on
machines without CUDA, MKL, or a compiled priv/viva_tensor_zig.* artifact.
Functions that are truly native-only must say so explicitly in their docs and
tests.
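
The flow above can be sketched in a few lines of Gleam. This is a minimal illustration, not the package's implementation: `TensorError`, `NativeError`, `native_add`, `pure_add`, and `add` are hypothetical names, tensors are stood in for by `List(Float)`, and the native call is a stub that always reports a missing NIF.

```gleam
import gleam/list

/// Hypothetical error type for this sketch.
pub type TensorError {
  ShapeMismatch(left: Int, right: Int)
}

/// Hypothetical reasons a native call can be skipped or fail.
pub type NativeError {
  NifNotLoaded
  NifFailed(reason: String)
}

/// Stand-in for a native call routed through the FFI wrappers.
/// It always reports a missing NIF so the fallback path is exercised.
fn native_add(
  _a: List(Float),
  _b: List(Float),
) -> Result(List(Float), NativeError) {
  Error(NifNotLoaded)
}

/// Pure Gleam reference implementation; callers validate lengths first.
fn pure_add(a: List(Float), b: List(Float)) -> List(Float) {
  list.map2(a, b, fn(x, y) { x +. y })
}

/// Validate in Gleam, attempt the native path, then fall back.
pub fn add(a: List(Float), b: List(Float)) -> Result(List(Float), TensorError) {
  let left = list.length(a)
  let right = list.length(b)
  case left == right {
    False -> Error(ShapeMismatch(left, right))
    True ->
      case native_add(a, b) {
        Ok(values) -> Ok(values)
        // Any native failure is recoverable: use the pure path instead.
        Error(_) -> Ok(pure_add(a, b))
      }
  }
}
```

Because validation happens before the native attempt, a shape error is reported the same way whether or not the NIF is present, and only the successful path differs in where the work is done.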
The detailed FFI ownership and split contract lives in
FFI Architecture. Keep core/ffi.gleam as the
forwarding facade until any core/ffi/* split modules are validated in Gleam
and migrated one disjoint resource family at a time.
Backend Planner
Backend selection is split between small internal layers:
- backend/protocol.gleam defines backend types, availability checks, pure operations, local auto-selection, and distributed matmul hooks.
- backend/capability.gleam describes what the planner can reason about, including CPU, native, CUDA, and tensor-core capability records.
- native/cuda.gleam contains the higher-level acceleration planner for CUDA, MKL/native CPU, and CPU fallback. CUDA tensors stay on device until an API boundary requires conversion back to CPU tensors.
- native/blas.gleam, native/sparse.gleam, and native/tflops.gleam expose backend detection and diagnostics used by tests, benchmarks, and planner decisions.
Keep planner code descriptive rather than magical: record why a backend was chosen, preserve CPU fallback, and do not let benchmark-only assumptions leak into the stable facade.
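
One way to keep selection descriptive is to return the reason alongside the chosen backend. The sketch below assumes hypothetical `Backend`, `Capabilities`, and `Decision` types and a `choose_backend` function; none of these names are taken from the package's planner modules.

```gleam
/// Hypothetical backend options for this sketch.
pub type Backend {
  Cuda
  NativeCpu
  PureGleam
}

/// Hypothetical capability record; real capability records may carry more detail.
pub type Capabilities {
  Capabilities(cuda_available: Bool, native_cpu_available: Bool)
}

/// A decision carries the backend and a human-readable reason.
pub type Decision {
  Decision(backend: Backend, reason: String)
}

/// Pick the most capable available backend and record why.
/// The last branch keeps the CPU fallback reachable in every case.
pub fn choose_backend(caps: Capabilities) -> Decision {
  case caps {
    Capabilities(cuda_available: True, ..) ->
      Decision(Cuda, "CUDA device detected")
    Capabilities(native_cpu_available: True, ..) ->
      Decision(NativeCpu, "native CPU library loaded, no CUDA device")
    _ -> Decision(PureGleam, "no native backend available, using CPU fallback")
  }
}
```

A caller can log or assert on `decision.reason` in tests, which makes an unexpected backend choice easier to diagnose than a bare enum value.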
Quantization Layout
Quantization code is intentionally layered:
- quant/compression.gleam, nf4.gleam, and awq.gleam hold concrete quantization algorithms.
- quant/hadamard.gleam and quant/turboquant.gleam are pure Gleam reference paths for Hadamard-style preprocessing and low-bit experiments before native kernels exist.
- quant/layout.gleam documents tensor-core-oriented packing assumptions such as block and tile shapes.
- zig_src/nif_quant.c and CUDA files in zig_src/ are the native landing zone once a quantization contract is locked down by the pure Gleam implementation and tests.
Prefer a readable reference implementation first. Move hot loops to NIF/CUDA only after the Gleam contract, invalid-input behavior, and no-NIF fallback are covered.
Native Backend Locations
The native build is centered in zig_src/build.zig.
- MKL is wired from zig_src/build.zig and CPU/NIF code such as zig_src/nif_entry.c and zig_src/nif_cpu_ops.c.
- macOS Accelerate support lives in zig_src/accelerate.c.
- CUDA and sparse GPU work live in zig_src/cuda_*.c, zig_src/cuda_*.cu, zig_src/nif_cuda_*.c, zig_src/nif_sparse.c, and zig_src/sage/.
- NIF registration and shared declarations live in zig_src/nif_entry.c and zig_src/viva_nif.h.
Gleam modules should call native code only through the existing FFI wrappers and bridge modules. Do not make public APIs depend on a specific native library being installed unless the API is explicitly documented as native-only.