These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.
Response Relevancy evaluates how closely the generated answer aligns with the input query. The metric assigns higher scores to answers that directly and completely address the question, and penalizes answers that are incomplete or contain redundant information. The relevancy score is useful in retrieval-augmented generation (RAG) tasks, where it helps verify that the output is pertinent to the user's intent.
Formula:
Response Relevancy is computed as the mean cosine similarity between the embedding of the original question and embeddings of multiple artificial questions generated based on the model’s response.
Response Relevancy = (1/N) × Σ_{i=1}^{N} cos(E_gi, E_o)
Where:
• E_gi is the embedding of the i-th generated question.
• E_o is the embedding of the original question.
• N is the number of generated questions (default is 3).
This formula measures the alignment between the original query and reconstructed questions derived from the model’s response, with higher cosine similarity indicating better relevance.
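To make the computation concrete, here is a minimal sketch in Python. The `embed` function and the pre-generated artificial questions are assumptions for illustration: in practice the questions are generated by an LLM prompted with the model's response, and embeddings come from a sentence-embedding model.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def response_relevancy(original_question: str,
                       generated_questions: list[str],
                       embed) -> float:
    # `embed` is a hypothetical callable mapping text to a 1-D vector.
    # `generated_questions` are the N artificial questions reconstructed
    # from the model's response (N = 3 by default, per the formula above).
    e_o = embed(original_question)
    return float(np.mean([cosine(embed(g), e_o) for g in generated_questions]))
```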
• Low Relevance: indicates that the response lacks critical details or is only partially relevant to the original question.
• High Relevance: shows that the response is directly aligned with the question and fully addresses it.
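As a toy illustration of this low vs. high relevance distinction, the snippet below plugs a deliberately simple word-overlap `embed` function (purely hypothetical, not a real embedding model) into the `response_relevancy` sketch above: questions reconstructed from an on-topic answer score noticeably higher than those from an off-topic one.

```python
VOCAB = ["what", "is", "capital", "france", "paris", "weather", "today"]

def embed(text: str) -> np.ndarray:
    # Hypothetical bag-of-words embedding over a tiny vocabulary;
    # a real system would use a neural sentence-embedding model.
    words = text.lower().replace("?", "").split()
    return np.array([float(w in words) for w in VOCAB])

original = "What is the capital of France?"
# Questions an LLM might reconstruct from a relevant vs. an off-topic answer:
on_topic  = ["What is the capital of France?", "Which city is the capital?"]
off_topic = ["What is the weather today?", "Is it sunny today?"]

print(response_relevancy(original, on_topic, embed))   # high (~0.85 here)
print(response_relevancy(original, off_topic, embed))  # low  (~0.43 here)
```

A score near 1 indicates that the response alone is enough to recover the original question; lower scores indicate the response is missing key content or contains material unrelated to the question.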
Trustworthy AI Relevance
This metric addresses Robustness and Human Agency & Control by quantifying relevant system properties. Response Relevancy measures whether an AI's outputs match user queries and context, a core aspect of reliable performance. Assigning it to Robustness is appropriate because relevancy metrics detect failures under varied inputs (distribution shifts, ambiguous queries, noisy prompts) and quantify the consistency and correctness of behavior across tasks and conditions. It also bears on Human Agency & Control: responses that stay aligned with the user's stated query keep the interaction directed by the user's intent rather than drifting toward unrequested content.