Triplet loss is a loss function introduced in the paper FaceNet: A Unified Embedding for Face Recognition and Clustering. The loss function is designed to optimize a neural network that produces embeddings used for comparison.
The loss function operates on triplets, which are three examples from the dataset:
- $a$ – an anchor example. In the context of FaceNet, $a$ is a photograph of a person’s face.
- $p$ – a positive example that has the same identity as the anchor. In FaceNet, this is a second picture of the same person as in the anchor example.
- $n$ – a negative example that represents a different identity. For FaceNet, this would be an image of a second person, someone different from the person in the anchor and positive examples.
The triplet loss function is designed to train the model to produce embeddings in which the positive example is closer to the anchor than the negative example is.
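The idea can be sketched numerically. The embeddings below are hand-picked vectors for illustration (in FaceNet they would come from the trained network), and the helper name `sq_dist` is an assumption, not from the paper:

```python
import numpy as np

# Hypothetical embeddings for an anchor, positive, and negative example.
anchor = np.array([0.0, 1.0])
positive = np.array([0.1, 0.9])   # same identity: close to the anchor
negative = np.array([1.0, 0.0])   # different identity: far from the anchor

def sq_dist(x, y):
    """Squared Euclidean distance between two embedding vectors."""
    return float(np.sum((x - y) ** 2))

# A well-trained embedding places the positive closer than the negative.
print(sq_dist(anchor, positive) < sq_dist(anchor, negative))  # True
```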
Math Details
More formally, for an embedding function $f$ that embeds input data into a $d$-dimensional vector, we want

$$\|f(a) - f(p)\|^2 + \alpha < \|f(a) - f(n)\|^2$$

for all possible triplets of $a$, $p$, and $n$. The operator $\|\cdot\|^2$ is the square of the Euclidean norm. The symbol $\alpha$ stands for a margin to ensure that the model doesn’t make the embeddings $f(a)$, $f(p)$, and $f(n)$ equal to each other to trivially satisfy the above inequality.
This leads to the following loss function over the $N$ possible triplets:

$$L = \sum_{i=1}^{N} \left[ \|f(a_i) - f(p_i)\|^2 - \|f(a_i) - f(n_i)\|^2 + \alpha \right]_+$$

The operator $[\cdot]_+$ stands for $\max(0, \cdot)$.
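A minimal NumPy sketch of the loss for a single triplet, assuming the embeddings are already computed; the default margin of 0.2 matches the value used in the FaceNet paper:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss for one (a, p, n) triplet of embedding vectors.

    Computes max(0, ||f(a)-f(p)||^2 - ||f(a)-f(n)||^2 + margin),
    where the inputs are the embeddings f(a), f(p), f(n).
    """
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((anchor - negative) ** 2)
    return float(np.maximum(0.0, pos_dist - neg_dist + margin))

# An "easy" triplet (negative already far away) contributes zero loss;
# a violating triplet contributes a positive loss.
print(triplet_loss(np.array([0.0, 0.0]), np.array([0.0, 0.1]), np.array([1.0, 0.0])))
print(triplet_loss(np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 0.1])))
```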
Triplet selection
In a typical dataset, many triplets of $a$, $p$, and $n$ will already satisfy the inequality in the previous section without the model having learned a useful embedding. Such triplets produce zero loss and therefore no gradient, which slows down the training of a machine learning algorithm that uses the triplet loss function.
To speed training back up, it makes sense to train the algorithm on triplets where $n$ is closer to $a$ than $p$ is in the embedding space (ignoring the margin term $\alpha$).
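One way to sketch this selection step, assuming a batch of precomputed embeddings stacked into `(N, d)` arrays; the function name and batch layout are illustrative, not from the paper:

```python
import numpy as np

def select_hard_triplets(anchors, positives, negatives):
    """Filter a batch of triplets, keeping only the 'hard' ones.

    A triplet is hard when the negative embedding is currently closer
    to the anchor than the positive embedding is (the margin term is
    ignored for this test). Each argument is an (N, d) array of
    embedding vectors; the i-th rows form one triplet.
    """
    pos_dists = np.sum((anchors - positives) ** 2, axis=1)
    neg_dists = np.sum((anchors - negatives) ** 2, axis=1)
    mask = neg_dists < pos_dists  # negative closer than positive: hard
    return anchors[mask], positives[mask], negatives[mask]

a = np.array([[0.0, 0.0], [0.0, 0.0]])
p = np.array([[0.0, 0.1], [1.0, 0.0]])
n = np.array([[1.0, 0.0], [0.0, 0.1]])
hard_a, hard_p, hard_n = select_hard_triplets(a, p, n)
print(len(hard_a))  # only the second (violating) triplet is kept
```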