Should we remove prompts for some FAMTEB retrieval datasets? #3174
Replies: 2 comments 1 reply
-
I think yes, you can do this.
-
For model developers, we have so far allowed people to select the prompt that their model uses. This, however, can lead to prompt-tuning: by running the benchmark multiple times with different prompts, you introduce random variation and can obtain better test-set performance without actually improving the model, i.e. you overfit on the benchmark. I see a few different approaches to prompting, reflecting different uses:

**Naive use - a free-form prompt.** The user writes a prompt that they expect will solve the task, potentially copying it from previous tasks. For example: "Embed these sentences such that political ideologies cluster close together."

**Informed use - fitting the prompt to the task.** Imagine a company deploying its model on its documentation site, which contains >100k technical documents, and taking the time to test which prompt gives its users the best results. They build a train and a test set, write ~20 prompts, and use the one that fits best (see the sketch below). We already see this with specific models that ship recommended prompts for categories such as retrieval. This ranges from the relatively generic (retrieval) to the very use-case specific (essentially fitting the prompt on the train set).

**Prompt-hacking - fitting the prompt to the benchmark.** The same as above, but you fit the prompt to the test set.
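To make the "informed use" scenario concrete, here is a minimal sketch of prompt fitting against a held-out dev set, assuming a recent sentence-transformers where `encode` accepts a `prompt` argument. The model name, the candidate prompts, and the tiny dev set are placeholders, not anything from this discussion; a real setup would use a proper retrieval metric such as nDCG@10.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical dev set: a few queries, their candidate documents, and the
# index of the relevant document for each query.
dev_queries = ["how do I reset my password?", "configure the proxy server"]
dev_docs = [
    "Password reset: open Settings > Account and click 'Reset password'.",
    "Proxy setup: set HTTP_PROXY in the service environment file.",
]
gold = [0, 1]

# Candidate prompts, from generic to use-case specific (both illustrative).
candidate_prompts = [
    "query: ",
    "Represent this technical support question for retrieving documentation: ",
]

model = SentenceTransformer("intfloat/multilingual-e5-small")  # placeholder model

def hit_at_1(prompt: str) -> float:
    """Fraction of dev queries whose gold document is ranked first."""
    q = model.encode(dev_queries, prompt=prompt, normalize_embeddings=True)
    d = model.encode(dev_docs, prompt="passage: ", normalize_embeddings=True)
    top = (q @ d.T).argmax(axis=1)  # cosine similarity via normalized dot product
    return float(np.mean(top == np.asarray(gold)))

# Pick the prompt that scores best on the dev set, never the test set.
best_prompt = max(candidate_prompts, key=hit_at_1)
print(best_prompt)
```

The crucial distinction between the three uses above is only which data this loop is run against: nothing (naive), a train/dev split (informed), or the benchmark's test set (prompt-hacking).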
-
Hi,
We've noticed that a small number of our FAMTEB retrieval datasets perform better when a prompt isn't used. Would it be acceptable to include these results without a prompt and remove them from the model's prompt list?
@Samoed @KennethEnevoldsen
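For reference, a hedged sketch of what this could look like on the model-definition side, assuming the prompts live in a `model_prompts`-style dict mapping prompt types or task names to prompt strings; the FAMTEB task name below is purely illustrative, not an actual dataset id.

```python
# Illustrative only: "FaMTEBExampleRetrieval" is a placeholder task name, and
# the layout assumes a model_prompts-style mapping as used when wrapping a
# sentence-transformers model for mteb.
model_prompts = {
    "query": "query: ",      # default prompt for retrieval queries
    "passage": "passage: ",  # default prompt for documents
    # Entries for the affected FAMTEB retrieval tasks are simply omitted, so
    # those tasks run without a prompt and the no-prompt results are reported.
    # "FaMTEBExampleRetrieval-query": "query: ",  # intentionally removed
}
```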