Aicraft

Activations API

#include "aicraft/activations.h"

Activation functions add non-linearity to neural networks, enabling them to learn complex patterns.


Overview

| Activation | Formula | Range | Use Case |
|------------|---------|-------|----------|
| ReLU | max(0, x) | [0, ∞) | Hidden layers (default) |
| LeakyReLU | max(αx, x) | (-∞, ∞) | Prevents dying neurons |
| Sigmoid | 1 / (1 + e^-x) | (0, 1) | Binary classification |
| Tanh | (e^x - e^-x) / (e^x + e^-x) | (-1, 1) | Hidden layers, RNNs |
| Softmax | e^xi / Σe^xj | (0, 1) | Multi-class output |
| GELU | x · Φ(x) | [≈ -0.17, ∞) | Transformers |
| SiLU/Swish | x · σ(x) | [≈ -0.28, ∞) | Modern architectures |

ReLU

Rectified Linear Unit, the most common activation for hidden layers.

AcTensor *ac_relu(AcTensor *x);

Formula: ReLU(x) = max(0, x)

Example

AcTensor *x = ac_tensor_from_data(
    (float[]){-2, -1, 0, 1, 2}, (int[]){5}, 1
);
AcTensor *y = ac_relu(x);
// y = [0, 0, 0, 1, 2]

Properties

  • Pros: Fast, sparse activations, no vanishing gradient for positive values
  • Cons: "Dying ReLU" problem: a neuron whose input stays negative outputs 0 and receives zero gradient, so it can stop learning
  • Derivative: 1 if x > 0, else 0
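
The formula and derivative above can be sketched on plain float arrays. This is a minimal illustration, not Aicraft's implementation; the names `relu_scalar`, `relu_grad_scalar`, and `relu_array` are hypothetical and not part of the Aicraft API.

```c
#include <stddef.h>

/* Scalar ReLU: max(0, x). */
float relu_scalar(float x) {
    return x > 0.0f ? x : 0.0f;
}

/* ReLU derivative with the convention stated above: 1 if x > 0, else 0. */
float relu_grad_scalar(float x) {
    return x > 0.0f ? 1.0f : 0.0f;
}

/* Apply ReLU element-wise from `in` to `out` (both hold n floats). */
void relu_array(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = relu_scalar(in[i]);
    }
}
```

Applied to the inputs from the example, `relu_array` produces the same values as the `ac_relu` comment: negative entries become 0 and positive entries pass through unchanged.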

Leaky ReLU

Prevents dying neurons by allowing small negative values.

AcTensor *ac_leaky_relu(AcTensor *x, float alpha);

Formula: LeakyReLU(x) = max(αx, x)

α is typically 0.01. If x > 0, the output is x; otherwise it is αx. The max form is equivalent as long as 0 < α < 1.

Example

AcTensor *y = ac_leaky_relu(x, 0.01f);
// For x = [-2, -1, 0, 1, 2]
// y = [-0.02, -0.01, 0, 1, 2]

Typical α values

  • 0.01: Standard (default)
  • 0.2: Aggressive leak
  • α learnable: Parametric ReLU (PReLU)
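
As a scalar sketch of the piecewise definition and its derivative (hypothetical helper names, not part of the Aicraft API):

```c
/* Leaky ReLU: x if x > 0, otherwise alpha * x. */
float leaky_relu_scalar(float x, float alpha) {
    return x > 0.0f ? x : alpha * x;
}

/* Derivative: 1 for x > 0, alpha otherwise.
   A nonzero alpha keeps the gradient alive for negative inputs. */
float leaky_relu_grad_scalar(float x, float alpha) {
    return x > 0.0f ? 1.0f : alpha;
}
```

Because the gradient is α rather than 0 on the negative side, a unit that receives only negative inputs still gets a (small) weight update.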

Sigmoid

Maps values to range (0, 1).

AcTensor *ac_sigmoid(AcTensor *x);

Formula: σ(x) = 1 / (1 + e^-x)

Example

AcTensor *y = ac_sigmoid(x);
// For x = [-2, 0, 2]
// y ≈ [0.119, 0.5, 0.881]

Use Cases

  • Binary classification output layer
  • Gates in LSTMs and GRUs
  • Attention weights

Numerical Stability

Aicraft uses a stable implementation:

// For negative x, use the equivalent form e^x / (1 + e^x),
// so expf is only ever called on a non-positive argument
if (x < 0) {
    float ex = expf(x);
    return ex / (1.0f + ex);
} else {
    return 1.0f / (1.0f + expf(-x));
}

Tanh

Hyperbolic tangent, similar to sigmoid but with zero-centered outputs in (-1, 1).

AcTensor *ac_tanh(AcTensor *x);

Formula: tanh(x) = (e^x - e^-x) / (e^x + e^-x)

Example

AcTensor *y = ac_tanh(x);
// For x = [-2, 0, 2]
// y ≈ [-0.964, 0, 0.964]

Relationship to Sigmoid

tanh(x) = 2σ(2x) - 1


Softmax

Converts logits to a probability distribution.

AcTensor *ac_softmax(AcTensor *x);  // Last axis
AcTensor *ac_softmax_axis(AcTensor *x, int axis);

Formula: Softmax(xi) = e^xi / Σe^xj

Example

AcTensor *logits = ac_tensor_from_data(
    (float[]){2.0f, 1.0f, 0.1f}, (int[]){3}, 1
);
AcTensor *probs = ac_softmax(logits);
// probs ≈ [0.659, 0.242, 0.099]
// Sum = 1.0

Numerical Stability

Aicraft subtracts the max before exponentiating:

// Stable softmax: subtract max to prevent overflow
float max_val = ac_tensor_max(x);
for (int i = 0; i < x->size; i++) {
    exp_vals[i] = expf(x->data[i] - max_val);
}

GELU

Gaussian Error Linear Unit, used in BERT, GPT, and other transformers.

AcTensor *ac_gelu(AcTensor *x);

Formula: GELU(x) = x · Φ(x) = x · 0.5 · (1 + erf(x / √2))

Approximation

Aicraft uses the fast tanh approximation:

GELU(x) ≈ 0.5x(1 + tanh[√(2/π)(x + 0.044715x³)])

Example

AcTensor *y = ac_gelu(x);
// Smooth everywhere; small negative inputs give small negative outputs instead of a hard 0

SiLU / Swish

Self-gated activation: x multiplied by its own sigmoid.

AcTensor *ac_silu(AcTensor *x);
// Alias: ac_swish(x)

Formula: SiLU(x) = x · σ(x) = x / (1 + e^-x)

Properties

  • Smooth, non-monotonic
  • Learnable β version: Swish-β = x · σ(βx)
  • Used in EfficientNet, YOLOv5

In-Place Operations

For memory efficiency:

// In-place versions (modify tensor directly)
void ac_relu_inplace(AcTensor *x);
void ac_sigmoid_inplace(AcTensor *x);
void ac_tanh_inplace(AcTensor *x);

Example

AcTensor *x = ac_tensor_rand((int[]){1024}, 1);
ac_relu_inplace(x); // No new tensor allocated
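
Conceptually, an in-place activation is a loop that overwrites each element of the underlying buffer. The sketch below works on a raw float array; `relu_inplace_buffer` is a hypothetical helper, and the `AcTensor` layout is deliberately not assumed.

```c
#include <stddef.h>

/* In-place ReLU over a raw float buffer: negative elements are
   overwritten with 0, so no second buffer is needed. */
void relu_inplace_buffer(float *data, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (data[i] < 0.0f) {
            data[i] = 0.0f;
        }
    }
}
```

The trade-off is that the pre-activation values are gone afterwards. For ReLU that is harmless for gradients, since the derivative can be recovered from the output (1 where the output is positive, 0 elsewhere).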

Using with Layers

When creating dense layers, pass the activation constant as the third argument:

// Built-in activation constants
AcLayer *l1 = ac_dense(784, 256, AC_RELU);
AcLayer *l2 = ac_dense(256, 128, AC_LEAKY_RELU);
AcLayer *l3 = ac_dense(128, 10, AC_SOFTMAX);

Available Constants

AC_NONE        // No activation (linear)
AC_RELU        // ReLU
AC_LEAKY_RELU  // LeakyReLU with α=0.01
AC_SIGMOID     // Sigmoid
AC_TANH        // Tanh
AC_SOFTMAX     // Softmax
AC_GELU        // GELU
AC_SILU        // SiLU/Swish

Custom Activations

Register your own activation function:

// Define function and derivative
float my_activation(float x) {
    return x * x; // Example: square
}

// The second parameter (`output`) is unused by this example
float my_activation_grad(float x, float output) {
    (void)output;
    return 2.0f * x; // d/dx of x^2
}

// Register
ac_register_activation(
    "square",
    my_activation,
    my_activation_grad
);

// Use
AcLayer *layer = ac_dense_with_activation(784, 256, "square");
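
Before registering a custom activation, it can be worth checking the hand-written derivative against a finite-difference estimate. The sketch below does this in plain C for the "square" example. The helpers `numeric_grad` and `grad_matches` are hypothetical and not part of Aicraft, and the gradient signature `(x, output)` is taken from the example above.

```c
#include <math.h>

/* Activation and analytic gradient, mirroring the "square" example. */
float square_act(float x) {
    return x * x;
}

float square_act_grad(float x, float output) {
    (void)output;                 /* unused for this activation */
    return 2.0f * x;
}

/* Central finite-difference estimate of f'(x) with step h. */
float numeric_grad(float (*f)(float), float x, float h) {
    return (f(x + h) - f(x - h)) / (2.0f * h);
}

/* Returns 1 if the analytic gradient g agrees with the numeric
   estimate of f' at x within tol, 0 otherwise. */
int grad_matches(float (*f)(float), float (*g)(float, float),
                 float x, float h, float tol) {
    float analytic = g(x, f(x));
    float numeric = numeric_grad(f, x, h);
    return fabsf(analytic - numeric) <= tol;
}
```

A call such as `grad_matches(square_act, square_act_grad, 1.5f, 1e-2f, 1e-2f)` returns 1 for a correct derivative. A mismatch usually points to a sign or factor error in the gradient function.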

Choosing Activations

| Task | Recommended |
|------|-------------|
| Hidden layers (default) | ReLU |
| Deep networks | LeakyReLU / GELU |
| Binary classification | Sigmoid (output) |
| Multi-class classification | Softmax (output) |
| Regression | None (linear) |
| Transformers | GELU |
| Object detection | SiLU / LeakyReLU |
