Recommended loss for triplet data
Hello!
Nice work on these cool models; I've spotted a few of them over the last week. I wanted to give a recommendation: If you have triplet data, then usually the strongest loss is MultipleNegativesRankingLoss (https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) or its Cached variant. It's also known as InfoNCE, and is commonly used to train the latest embedding models.
In short: it's similar to TripletLoss, except rather than only considering the i-th negative for query i, the j-th positive and j-th negative for all j != i in the batch are also considered negatives for query i. These are called the in-batch negatives, and they help training performance a good bit.
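For reference, here is a minimal sketch of how it can be plugged into training with the `model.fit` API (the base model and the triplets below are just placeholders):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Placeholder triplets: (anchor/query, positive, negative). With MNRL the explicit
# negative is optional; all other in-batch positives/negatives serve as negatives too.
train_examples = [
    InputExample(texts=["What is hypertension?", "High blood pressure is ...", "A broken bone is ..."]),
    InputExample(texts=["What is an ETF?", "An exchange-traded fund is ...", "A savings account is ..."]),
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder base model
# Larger batches give MNRL more in-batch negatives, which usually helps.
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```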
Feel free to experiment with it.
- Tom Aarsen
Thanks Tom!
I've updated the training script to use MultipleNegativesRankingLoss as the default since, as you mentioned, it's commonly used.
I tested both losses on our datasets (note: they're relatively small, ~10k medical and ~7k finance QA pairs) with batch_size=8.
Results were essentially equivalent: MNRL converged slightly faster (best checkpoint at iteration 1 vs. sometimes needing iteration 2), but final MRR was within ~0.15%.
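In case anyone wants to reproduce the comparison, MRR can be computed with sentence-transformers' InformationRetrievalEvaluator; a minimal sketch with placeholder queries, corpus, and model path:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Placeholder held-out QA data: query id -> text, doc id -> text, query id -> relevant doc ids
queries = {"q1": "What are common symptoms of hypertension?"}
corpus = {
    "d1": "Hypertension is often asymptomatic, but it can cause headaches and dizziness.",
    "d2": "An exchange-traded fund is a basket of securities traded on an exchange.",
}
relevant_docs = {"q1": {"d1"}}

model = SentenceTransformer("path/to/finetuned-model")  # placeholder checkpoint
evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="dev")
metrics = evaluator(model)  # includes MRR@10 among other retrieval metrics (format depends on library version)
print(metrics)
```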
Note that I've also found LM-Cocktail (weight merging) to be essential: without it, domain fine-tuning caused the model to forget general knowledge. With α=0.7 cocktail merging, we preserve the base model's performance on out-of-domain queries while gaining the domain specialization.
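For anyone curious, the merging step is essentially a linear interpolation of the two checkpoints' weights. A minimal sketch of that idea (the model paths are placeholders, and I'm assuming α is the weight on the fine-tuned model):

```python
from sentence_transformers import SentenceTransformer

base = SentenceTransformer("all-MiniLM-L6-v2")            # placeholder base checkpoint
finetuned = SentenceTransformer("path/to/domain-model")   # placeholder fine-tuned checkpoint
alpha = 0.7  # assumption: alpha is the weight on the fine-tuned model

base_state = base.state_dict()
merged_state = {}
for name, ft_param in finetuned.state_dict().items():
    if ft_param.is_floating_point():
        # merged = alpha * finetuned + (1 - alpha) * base
        merged_state[name] = alpha * ft_param + (1 - alpha) * base_state[name]
    else:
        merged_state[name] = ft_param  # keep integer buffers (e.g. position ids) unchanged

finetuned.load_state_dict(merged_state)
finetuned.save("domain-model-cocktail")
```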
Thanks, Yossi.