Linear Classification Loss Visualization

These linear classifiers were written in Javascript for Stanford's CS231n: Convolutional Neural Networks for Visual Recognition.

The class scores for linear classifiers are computed as \( f(x_i; W, b) = W x_i + b \), where the parameters consist of weights \(W\) and biases \(b\). The training data consists of datapoints \(x_i\) with labels \(y_i\). In this demo, the datapoints \(x_i\) are 2-dimensional and there are 3 classes, so the weight matrix is of size [3 x 2] and the bias vector is of size [3 x 1]. The multiclass loss function can be formulated in many ways. The default in this demo is an SVM that follows [Weston and Watkins 1999]. Denoting \( f \) as the [3 x 1] vector that holds the class scores for datapoint \(x_i\), the loss has the form:
$$ L = \underbrace{ \frac{1}{N} \sum_i \sum_{j \neq y_i} \max(0, f_j - f_{y_i} + 1)}_{\text{data loss}} + \lambda \underbrace{\sum_k\sum_l W_{k,l}^2 }_{\text{regularization loss}} $$
where \( N \) is the number of examples and \(\lambda\) is a hyperparameter that controls the strength of the L2 regularization penalty \(R(W) = \sum_k\sum_l W_{k,l}^2\). On the bottom right of this demo you can also switch to other formulations of the Multiclass SVM: One vs. All (OVA), where a separate binary SVM is trained independently for every class (with all other classes labeled as negatives), and Structured SVM, which maximizes the margin between the correct score and the score of the highest runner-up class. You can also choose the cross-entropy loss, which is used by the Softmax classifier. These losses are explained in the CS231n notes on Linear Classification.
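The loss formulations described above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the demo's actual source: the function names, the array layout (`W` as three rows of two weights) and the toy data are assumptions made for the example, and the One vs. All variant assumes a binary hinge loss with a margin of 1 for each class.

```javascript
// Class scores f = W x + b for one 2-D datapoint x.
// W holds one row of weights per class, b one bias per class.
function classScores(W, b, x) {
  return W.map((row, k) => row[0] * x[0] + row[1] * x[1] + b[k]);
}

// Weston-Watkins: sum of hinge terms over every incorrect class j != y.
function westonWatkinsLoss(f, y) {
  let loss = 0;
  for (let j = 0; j < f.length; j++) {
    if (j !== y) loss += Math.max(0, f[j] - f[y] + 1);
  }
  return loss;
}

// Structured SVM: hinge on the highest-scoring incorrect class only.
function structuredLoss(f, y) {
  let runnerUp = -Infinity;
  for (let j = 0; j < f.length; j++) {
    if (j !== y) runnerUp = Math.max(runnerUp, f[j]);
  }
  return Math.max(0, runnerUp - f[y] + 1);
}

// One vs. All (assumed form): one binary hinge per class,
// with target +1 for the correct class and -1 for all others.
function oneVsAllLoss(f, y) {
  let loss = 0;
  for (let j = 0; j < f.length; j++) {
    const target = j === y ? 1 : -1;
    loss += Math.max(0, 1 - target * f[j]);
  }
  return loss;
}

// Softmax cross-entropy: -log(exp(f_y) / sum_j exp(f_j)),
// computed with the max subtracted for numerical stability.
function softmaxLoss(f, y) {
  const m = Math.max(...f);
  const logSumExp = m + Math.log(f.reduce((s, v) => s + Math.exp(v - m), 0));
  return logSumExp - f[y];
}

// Full loss: mean per-example data loss plus lambda * sum of squared weights.
function totalLoss(W, b, X, Y, lambda, perExampleLoss) {
  let dataLoss = 0;
  for (let i = 0; i < X.length; i++) {
    dataLoss += perExampleLoss(classScores(W, b, X[i]), Y[i]);
  }
  dataLoss /= X.length;
  let reg = 0;
  for (const row of W) for (const w of row) reg += w * w;
  const regLoss = lambda * reg;
  return { dataLoss, regLoss, total: dataLoss + regLoss };
}

// Toy values (hypothetical, chosen so the arithmetic is easy to follow).
const W = [[1, 0], [0, 1], [-1, -1]];
const b = [0, 0, 0];
const X = [[1, 0], [0, 1]];
const Y = [1, 0];
const result = totalLoss(W, b, X, Y, 0.5, westonWatkinsLoss);
// result.dataLoss is 2, result.regLoss is 2, result.total is 4
```

Passing `structuredLoss`, `oneVsAllLoss` or `softmaxLoss` as the last argument swaps the data loss while leaving the regularization term unchanged.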
Datapoints are shown as circles colored by their class (red/green/blue). The background regions are colored by whichever class has the highest score at each point according to the current weights. Each classifier is visualized by a line that indicates its zero-score level set. For example, the blue classifier computes scores as \(W_{0,0} x_0 + W_{0,1} x_1 + b_0\) and the blue line shows the set of points \((x_0, x_1)\) that give a score of zero. The blue arrow draws the vector \((W_{0,0}, W_{0,1})\), which points in the direction of score increase; its length is proportional to how steep the increase is.
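The geometry of the line and the arrow can be checked numerically. The sketch below is illustrative only: `levelSetAndArrow`, `score` and `predictClass` are hypothetical helper names, and the weight vector is assumed to be nonzero. It finds one point on the zero-score line, a direction along the line, and the rate at which the score grows when moving along the weight vector.

```javascript
// Score of a single classifier with weights w = [w0, w1] and bias b.
function score(w, b, x) {
  return w[0] * x[0] + w[1] * x[1] + b;
}

// Describe the zero-score line w . x + b = 0 (assumes w is not the zero vector).
function levelSetAndArrow(w, b) {
  const norm2 = w[0] * w[0] + w[1] * w[1];
  // Point on the line closest to the origin.
  const point = [(-b * w[0]) / norm2, (-b * w[1]) / norm2];
  // Direction along the line, perpendicular to w.
  const direction = [-w[1], w[0]];
  // Score increase per unit distance travelled along w.
  const steepness = Math.sqrt(norm2);
  return { point, direction, steepness };
}

// Background coloring: the class with the highest score at x.
function predictClass(W, b, x) {
  let best = 0;
  for (let k = 1; k < W.length; k++) {
    if (score(W[k], b[k], x) > score(W[best], b[best], x)) best = k;
  }
  return best;
}

const line = levelSetAndArrow([3, 4], -5);
// Any point line.point + t * line.direction has score zero (up to rounding),
// and moving one unit along w / |w| raises the score by line.steepness.
```

Because the gradient of the score with respect to \((x_0, x_1)\) is the weight vector itself, a longer arrow means the score changes faster as you move away from the line.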
Note: you can drag the datapoints.
Parameters \(W,b\) are shown below. Each value is in bold, and its gradient (computed with backprop) is shown beneath it in red italics. Click the triangles to control the parameters.
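As an illustration of how such gradients can be obtained, here is a minimal sketch for the Weston-Watkins loss with L2 regularization, followed by one gradient descent step. It is not the demo's backprop code: the gradient is derived by hand (each active hinge term contributes \(+x_i\) to the row of the violating class and \(-x_i\) to the row of the correct class), and the names `lossAndGrad` and `gradientStep` are assumptions made for this example.

```javascript
// Loss and its (sub)gradient with respect to W and b.
function lossAndGrad(W, b, X, Y, lambda) {
  const K = W.length, D = W[0].length, N = X.length;
  const dW = W.map(row => row.map(() => 0));
  const db = b.map(() => 0);
  let loss = 0;
  for (let i = 0; i < N; i++) {
    const x = X[i], y = Y[i];
    // Scores f = W x + b for this datapoint.
    const f = W.map((row, k) => row.reduce((s, w, d) => s + w * x[d], b[k]));
    for (let j = 0; j < K; j++) {
      if (j === y) continue;
      const margin = f[j] - f[y] + 1;
      if (margin > 0) {
        loss += margin / N;
        for (let d = 0; d < D; d++) {
          dW[j][d] += x[d] / N; // violating class is pushed down
          dW[y][d] -= x[d] / N; // correct class is pushed up
        }
        db[j] += 1 / N;
        db[y] -= 1 / N;
      }
    }
  }
  // L2 penalty lambda * sum(W^2) has gradient 2 * lambda * W.
  for (let k = 0; k < K; k++) {
    for (let d = 0; d < D; d++) {
      loss += lambda * W[k][d] * W[k][d];
      dW[k][d] += 2 * lambda * W[k][d];
    }
  }
  return { loss, dW, db };
}

// One step of gradient descent; returns new parameters without mutating inputs.
function gradientStep(W, b, grads, stepSize) {
  return {
    W: W.map((row, k) => row.map((w, d) => w - stepSize * grads.dW[k][d])),
    b: b.map((v, k) => v - stepSize * grads.db[k]),
  };
}

// Toy values (hypothetical).
const W0 = [[1, 0], [0, 1], [-1, -1]];
const b0 = [0, 0, 0];
const X = [[1, 0], [0, 1]];
const Y = [1, 0];
const before = lossAndGrad(W0, b0, X, Y, 0.5);
const next = gradientStep(W0, b0, before, 0.1);
const after = lossAndGrad(next.W, next.b, X, Y, 0.5);
// before.loss is 4; after.loss is smaller, since the step moves against the gradient.
```

The hinge is not differentiable where a margin is exactly zero; this sketch treats such terms as inactive, which is one valid subgradient choice.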
Visualization of the data loss computation. Each row shows the loss due to one datapoint. The first three columns are the 2D data \(x_i\) and the label \(y_i\). The next three columns are the three class scores from each classifier \( f(x_i; W, b) = W x_i + b \) (e.g. s[0] = x[0] * W[0,0] + x[1] * W[0,1] + b[0]). The last column is the data loss for a single example, \(L_i\).
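The rows of such a table can be reproduced with a short sketch. It assumes the default Weston-Watkins formulation for \(L_i\); the function name `lossTable`, the field names and the toy data are made up for this example.

```javascript
// Build one table row per datapoint: x[0], x[1], label, three scores, and L_i.
function lossTable(W, b, X, Y) {
  return X.map((x, i) => {
    const y = Y[i];
    // s[k] = x[0] * W[k,0] + x[1] * W[k,1] + b[k]
    const s = W.map((row, k) => x[0] * row[0] + x[1] * row[1] + b[k]);
    let Li = 0;
    s.forEach((sj, j) => {
      if (j !== y) Li += Math.max(0, sj - s[y] + 1);
    });
    return { x0: x[0], x1: x[1], y, s0: s[0], s1: s[1], s2: s[2], Li };
  });
}

// Toy values (hypothetical).
const rows = lossTable([[1, 0], [0, 1], [-1, -1]], [0, 0, 0], [[1, 0], [0, 1]], [1, 0]);
// The data loss is the mean of the last column.
const dataLoss = rows.reduce((sum, r) => sum + r.Li, 0) / rows.length;
```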
L2 Regularization strength:

Step size:

Multiclass SVM loss formulation:
Weston Watkins 1999
One vs. All
Structured SVM