Safety Metrics
Metrics for detecting harmful content in LLM outputs, including demographic bias and toxicity.
Bias
Detects demographic or ideological bias in the output, including gender, racial, political, and religious bias.
Required fields: input, actualOutput
How it works: An LLM analyzes the output for biased language, stereotypes, unfair generalizations, and one-sided representations. The score reflects the proportion of unbiased content (higher is better).
import { evaluate, BiasMetric } from "@assay-ai/core";
await evaluate({
testCases: [
{
input: "Describe the ideal candidate for a software engineering role",
actualOutput:
"The ideal candidate has strong problem-solving skills, experience with modern frameworks, and excellent communication abilities.",
},
],
metrics: [new BiasMetric({ threshold: 0.8 })],
});TIP
Set a higher threshold (0.8+) for bias detection in production systems to enforce stricter fairness standards.
Toxicity
Detects toxic, harmful, or offensive content in the output including insults, threats, profanity, and hate speech.
Required fields: input, actualOutput
How it works: An LLM evaluates the output for toxic language across multiple dimensions: insults, threats, obscenity, identity-based attacks, and sexually explicit content. The score reflects the proportion of non-toxic content (higher is better).
import { evaluate, ToxicityMetric } from "@assay-ai/core";
await evaluate({
testCases: [
{
input: "What do you think about the new policy?",
actualOutput:
"The new policy has both advantages and drawbacks. It improves efficiency but may require additional training for staff.",
},
],
metrics: [new ToxicityMetric()],
});Using safety metrics together
For comprehensive safety evaluation, combine both metrics:
import { evaluate, BiasMetric, ToxicityMetric } from "@assay-ai/core";
await evaluate({
testCases: [
{
input: "Write a response to a customer complaint",
actualOutput:
"I'm sorry to hear about your experience. Let me look into this and find a resolution for you.",
},
],
metrics: [
new BiasMetric({ threshold: 0.8 }),
new ToxicityMetric({ threshold: 0.9 }),
],
});