Aicraft

SIMD API

#include "aicraft/simd.h"

Overview

Aicraft includes hand-tuned SIMD kernels for all hot paths. The backend is selected at compile time based on compiler flags.

Backend Selection

| Flag | Backend | Width |
| --- | --- | --- |
| (default) | Scalar | 1 |
| -msse4.2 | SSE 4.2 | 128-bit |
| -mavx2 -mfma | AVX2 + FMA | 256-bit |
| -mavx512f | AVX-512 | 512-bit |
| -mfpu=neon | ARM NEON | 128-bit |
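
To check which row of the table a given build falls into, one option is to inspect the macros that GCC and Clang predefine for these flags. The sketch below is not part of `aicraft/simd.h`; `simd_backend_name` and `print_simd_backend` are hypothetical helpers, and the macros Aicraft itself tests internally are not documented here, so treat the mapping as an assumption.

```c
#include <stdio.h>

/* Hypothetical helper (not an Aicraft function): maps the compiler's
   predefined feature macros to the backend names used in the table.
   The order goes from widest to narrowest so the widest match wins. */
const char *simd_backend_name(void) {
#if defined(__AVX512F__)
    return "AVX-512";            /* set by -mavx512f */
#elif defined(__AVX2__) && defined(__FMA__)
    return "AVX2 + FMA";         /* set by -mavx2 -mfma */
#elif defined(__SSE4_2__)
    return "SSE 4.2";            /* set by -msse4.2 */
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    return "ARM NEON";           /* set by -mfpu=neon on 32-bit ARM */
#else
    return "Scalar";             /* no SIMD flag given */
#endif
}

/* Prints the detected backend, e.g. at program start-up. */
void print_simd_backend(void) {
    printf("SIMD backend: %s\n", simd_backend_name());
}
```

Calling `print_simd_backend()` once at start-up is a cheap way to confirm that the intended flags actually reached the compiler.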

Key Kernels

ac_simd_gemm

void ac_simd_gemm(float *C, const float *A, const float *B,
                  int M, int N, int K);

General matrix multiply using BLIS-style micro-kernels. This is the hottest path in the entire framework.
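
The signature does not state the memory layout, so the following scalar reference is only a sketch under explicit assumptions: row-major, densely packed matrices with A of size M×K, B of size K×N, C of size M×N, and C overwritten rather than accumulated into. `ref_gemm` is a hypothetical name. A reference of this kind is useful for checking a SIMD kernel's output on small inputs.

```c
#include <stddef.h>

/* Scalar reference GEMM: C = A * B.
   ASSUMPTIONS (not confirmed by the Aicraft docs): row-major storage,
   A is M x K, B is K x N, C is M x N, and C is overwritten. */
void ref_gemm(float *C, const float *A, const float *B,
              int M, int N, int K) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
                /* Row i of A times column j of B. */
                acc += A[(size_t)i * K + k] * B[(size_t)k * N + j];
            }
            C[(size_t)i * N + j] = acc;
        }
    }
}
```

If the layout assumption holds, calling `ac_simd_gemm` and `ref_gemm` with the same arguments should give results that agree to within floating-point rounding; exact bit equality is not expected, because a blocked kernel sums the K products in a different order.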

ac_simd_relu / ac_simd_sigmoid

void ac_simd_relu(float *out, const float *in, int n);
void ac_simd_sigmoid(float *out, const float *in, int n);

Vectorised activation functions.

ac_simd_dot

float ac_simd_dot(const float *a, const float *b, int n);

Vectorised dot product.
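
A vectorised dot product typically keeps several partial sums and combines them at the end, so its result can differ from a plain left-to-right loop in the last few bits. The sketch below gives a scalar reference with a double-precision accumulator and a relative-tolerance comparison; `ref_dot` and `dot_close` are hypothetical helpers, and the tolerance to use against `ac_simd_dot` is an assumption to tune per workload.

```c
/* Scalar reference dot product, accumulated in double precision to
   keep rounding error well below that of any float implementation. */
float ref_dot(const float *a, const float *b, int n) {
    double acc = 0.0;
    for (int i = 0; i < n; ++i) {
        acc += (double)a[i] * (double)b[i];
    }
    return (float)acc;
}

/* Returns 1 if got is within rel_tol of want, relative to
   max(|want|, 1), and 0 otherwise. */
int dot_close(float got, float want, float rel_tol) {
    float diff = got - want;
    if (diff < 0.0f) diff = -diff;
    float scale = want < 0.0f ? -want : want;
    if (scale < 1.0f) scale = 1.0f;
    return diff <= rel_tol * scale;
}
```

A check such as `dot_close(ac_simd_dot(a, b, n), ref_dot(a, b, n), 1e-5f)` is usually more robust in tests than comparing the two floats with `==`.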

ac_simd_add / ac_simd_mul

void ac_simd_add(float *out, const float *a, const float *b, int n);
void ac_simd_mul(float *out, const float *a, const float *b, int n);

Element-wise vectorised operations.
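
Element-wise kernels are commonly structured as a main loop over whole vectors plus a scalar tail for the leftover elements when `n` is not a multiple of the vector width. The sketch below illustrates that structure in portable C; it is not Aicraft's implementation, `sketch_add`, `sketch_mul`, and `SKETCH_LANES` are hypothetical names, and the lane count of 8 simply mirrors eight floats in a 256-bit register.

```c
#define SKETCH_LANES 8  /* 8 floats = one 256-bit vector; illustrative only */

/* out[i] = a[i] + b[i], processed in blocks of SKETCH_LANES plus a tail. */
void sketch_add(float *out, const float *a, const float *b, int n) {
    int i = 0;
    /* Main loop: whole blocks; a real kernel would use one vector add here. */
    for (; i + SKETCH_LANES <= n; i += SKETCH_LANES) {
        for (int l = 0; l < SKETCH_LANES; ++l) {
            out[i + l] = a[i + l] + b[i + l];
        }
    }
    /* Tail: the remaining n % SKETCH_LANES elements, one at a time. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

/* out[i] = a[i] * b[i], same block-plus-tail structure. */
void sketch_mul(float *out, const float *a, const float *b, int n) {
    int i = 0;
    for (; i + SKETCH_LANES <= n; i += SKETCH_LANES) {
        for (int l = 0; l < SKETCH_LANES; ++l) {
            out[i + l] = a[i + l] * b[i + l];
        }
    }
    for (; i < n; ++i) {
        out[i] = a[i] * b[i];
    }
}
```

With `n = 11`, for example, the main loop handles elements 0 to 7 and the tail handles elements 8 to 10.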

Performance

Typical speedups over the scalar baseline on an Intel i7-12700K (speedup = scalar time divided by SIMD time):

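| Operation | Scalar | AVX2 | AVX-512 | Speedup (AVX2 to AVX-512) |
| --- | --- | --- | --- | --- |
| GEMM 128×128 | 2.1 ms | 0.31 ms | 0.18 ms | 6.8-11.7× |
| ReLU 10K | 12 μs | 1.8 μs | 1.1 μs | 6.7-10.9× |
| Dot 10K | 9.5 μs | 1.4 μs | 0.9 μs | 6.8-10.6× |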