Implementing a Neural Network in Base R

ML
Animation
R
Author

Aleksei Prishchepo

Published

February 14, 2026

Introduction

This small project implements a fully connected neural network using only matrix operations, a logistic activation function, and manually derived gradients. There are no high-level frameworks like keras or torch, just linear algebra and backpropagation written explicitly.

The network is trained to approximate a parametric heart curve. During training, frames are saved and combined into a video, so you can watch the model gradually learn the shape.

Animation shows a model learning a parametric curve

Network Architecture

This network is a Multilayer Perceptron (MLP) with a \(1 \to 6 \to 7 \to 2\) feed-forward architecture. It maps a single scalar input, representing progress along the curve, through two hidden layers of 6 and 7 neurons to a 2D coordinate output.

  • Input Layer (\(L_0\)): A single node (\(n=1\)) representing the value \(x\), the amount traveled along the curve from 0 to 1.

  • Hidden Layer 1 (\(L_1\)): Six neurons using the logistic sigmoid activation function \(\sigma(z) = \frac{1}{1 + \exp(-z)}\) to capture initial non-linear features.

  • Hidden Layer 2 (\(L_2\)): Seven neurons that further process the signals to handle the complex geometric “lobes” of the heart curve.

  • Output Layer (\(L_3\)): Two nodes representing the \(x\) and \(y\) (or \(y_1, y_2\)) coordinates of the curve’s position in 2D space.

  • Connectivity: The topology is fully connected, meaning every neuron in one layer is linked to every neuron in the next via a weight matrix (\(W\)) and a bias vector (\(b\)).
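The architecture above can be sketched as a forward pass in base R. The weight shapes follow the text (\(W^{(1)}\) is 6x1, \(W^{(2)}\) is 7x6, \(W^{(3)}\) is 2x7); the random initialization and the `forward` helper name are illustrative, not taken from the original script:

```r
# Logistic sigmoid, applied element-wise
sigmoid <- function(z) 1 / (1 + exp(-z))

set.seed(42)
W1 <- matrix(rnorm(6 * 1), 6, 1); b1 <- rnorm(6)   # 1 -> 6
W2 <- matrix(rnorm(7 * 6), 7, 6); b2 <- rnorm(7)   # 6 -> 7
W3 <- matrix(rnorm(2 * 7), 2, 7); b3 <- rnorm(2)   # 7 -> 2

forward <- function(x) {
  a0 <- matrix(x, nrow = 1)          # 1 x N: progress along the curve
  a1 <- sigmoid(W1 %*% a0 + b1)      # 6 x N (bias recycles down columns)
  a2 <- sigmoid(W2 %*% a1 + b2)      # 7 x N
  a3 <- sigmoid(W3 %*% a2 + b3)      # 2 x N: predicted (x, y) coordinates
  a3
}

out <- forward(seq(0, 1, length.out = 5))
dim(out)  # 2 x 5
```

Batching all \(N\) inputs as columns of a matrix lets each layer be a single `%*%` call, which is exactly what makes a base-R implementation practical.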

Network topology: a multilayer perceptron


Gradient Descent Process

For each iteration of the training loop, the network updates its parameters to minimize the distance between the predicted coordinates (\(a^{(3)}\)) and the target heart curve coordinates (\(y\)).

1. Cost Gradient Calculation:

The error is propagated backward through the layers. For the final weight matrix (\(W^{(3)}\)), the Jacobian is calculated as:

\[\mathbf{J}_{\mathbf{W}^{(3)}} = \frac{1}{N} \sum 2\,(\mathbf{a}^{(3)} - \mathbf{y}) \odot \sigma'(\mathbf{z}^{(3)}) \cdot \mathbf{a}^{(2)T}\]
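In base R, the element-wise product and the sum over samples fall out of ordinary matrix operations: multiplying the 2 x N error term by the transposed 7 x N activations sums over all \(N\) samples at once. A sketch with placeholder data (the shapes match the text; the variable names are assumptions, not the original script's):

```r
sigmoid   <- function(z) 1 / (1 + exp(-z))
d_sigmoid <- function(z) sigmoid(z) * (1 - sigmoid(z))  # sigma'(z)

set.seed(1)
N  <- 4
a2 <- matrix(runif(7 * N), 7, N)   # hidden layer 2 activations
z3 <- matrix(rnorm(2 * N), 2, N)   # pre-activations of the output layer
a3 <- sigmoid(z3)                  # network output
y  <- matrix(runif(2 * N), 2, N)   # target coordinates

# J_W3 = (1/N) * sum over samples of 2 (a3 - y) ⊙ sigma'(z3) a2^T
delta3 <- 2 * (a3 - y) * d_sigmoid(z3)   # element-wise, 2 x N
J_W3   <- (delta3 %*% t(a2)) / N         # 2 x 7, matches dim(W3)
```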

2. Stochastic Update:

The code applies “aggression” (learning rate) and optional “noise” to prevent the model from getting stuck in local minima:

\[\mathbf{W}_{new} = \mathbf{W}_{old} - (\mathbf{J} \times (1 + \text{random\_noise})) \times \text{aggression}\]
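The update rule translates directly into R. This is a minimal sketch of that formula; `aggression` and the uniform noise amplitude are illustrative values, and the variable names are assumptions rather than the original script's:

```r
set.seed(7)
aggression <- 3.5    # learning rate, as named in the text
noise_amp  <- 0.1    # amplitude of the optional jitter

W <- matrix(rnorm(2 * 7), 2, 7)   # current weights (W3 here)
J <- matrix(rnorm(2 * 7), 2, 7)   # Jacobian from backpropagation

# Multiplicative noise perturbs each gradient entry by up to +/- 10%,
# helping the model escape shallow local minima
noise <- matrix(runif(length(W), -noise_amp, noise_amp), nrow(W), ncol(W))
W_new <- W - (J * (1 + noise)) * aggression
```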

3. Topological Impact:

  • \(W^{(1)}\) (6x1): Adjusts how the single input \(x\) is mapped to the 6 neurons of the first hidden layer.

  • \(W^{(2)}\) (7x6): Adjusts the 42 unique connections between the first and second hidden layers.

  • \(W^{(3)}\) (2x7): Adjusts how the 7 high-level features are combined into the final \(x\) and \(y\) coordinates of the heart.

The script performs approximately 20,000 of these updates to refine the curve. Each iteration slightly shifts the 2D coordinates produced by the output layer until they trace the distinctive heart shape defined in the training_data function.
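The whole process can be condensed into an end-to-end sketch: repeated forward passes, backpropagated deltas for all three weight matrices, and gradient-descent updates. This toy version fits a parametric circle rather than the heart curve, uses far fewer iterations, and omits the noise term and frame-saving; all names and hyperparameters here are illustrative:

```r
sigmoid <- function(z) 1 / (1 + exp(-z))
dsig    <- function(z) sigmoid(z) * (1 - sigmoid(z))

set.seed(3)
x <- matrix(seq(0, 1, length.out = 100), nrow = 1)  # 1 x N inputs
y <- rbind(0.5 + 0.35 * cos(2 * pi * x),            # 2 x N targets in (0, 1)
           0.5 + 0.35 * sin(2 * pi * x))

W1 <- matrix(rnorm(6), 6, 1);  b1 <- rnorm(6)
W2 <- matrix(rnorm(42), 7, 6); b2 <- rnorm(7)
W3 <- matrix(rnorm(14), 2, 7); b3 <- rnorm(2)
lr <- 2; N <- ncol(x)

fwd <- function() {
  a1 <- sigmoid(W1 %*% x + b1)
  a2 <- sigmoid(W2 %*% a1 + b2)
  sigmoid(W3 %*% a2 + b3)
}
cost0 <- mean((fwd() - y)^2)      # cost before training

for (i in 1:5000) {
  z1 <- W1 %*% x + b1;  a1 <- sigmoid(z1)
  z2 <- W2 %*% a1 + b2; a2 <- sigmoid(z2)
  z3 <- W3 %*% a2 + b3; a3 <- sigmoid(z3)
  d3 <- 2 * (a3 - y) * dsig(z3)              # output-layer delta
  d2 <- (t(W3) %*% d3) * dsig(z2)            # backpropagate to layer 2
  d1 <- (t(W2) %*% d2) * dsig(z1)            # backpropagate to layer 1
  W3 <- W3 - lr * (d3 %*% t(a2)) / N; b3 <- b3 - lr * rowMeans(d3)
  W2 <- W2 - lr * (d2 %*% t(a1)) / N; b2 <- b2 - lr * rowMeans(d2)
  W1 <- W1 - lr * (d1 %*% t(x))  / N; b1 <- b1 - lr * rowMeans(d1)
}
cost_final <- mean((a3 - y)^2)    # should be well below cost0
```

Plotting `t(a3)` every few hundred iterations is all it takes to produce the frames for the learning animation.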

Acknowledgments

This project is based on one of the lab exercises from Imperial College London’s “Multivariate Calculus for Machine Learning” course. I re-implemented the network in base R, translated the mathematical components from Python, and extended the project with a visualization pipeline to produce a frame-by-frame learning animation.

Source Code

Source code is available on GitHub: Backpropagating Love.

See Also

Here are some related posts that you might find interesting:

Title Reading Time
Animation of Spatial Data 7 min
Nerdy Valentine’s in Python, R, and Matlab 4 min