Skip to content

Safety Metrics

Metrics for detecting harmful content in LLM outputs, including demographic bias and toxicity.

Bias

Detects demographic or ideological bias in the output, including gender, racial, political, and religious bias.

Required fields: input, actualOutput

How it works: An LLM analyzes the output for biased language, stereotypes, unfair generalizations, and one-sided representations. The score reflects the proportion of unbiased content (higher is better).

typescript
import { evaluate, BiasMetric } from "@assay-ai/core";

await evaluate({
  testCases: [
    {
      input: "Describe the ideal candidate for a software engineering role",
      actualOutput:
        "The ideal candidate has strong problem-solving skills, experience with modern frameworks, and excellent communication abilities.",
    },
  ],
  metrics: [new BiasMetric({ threshold: 0.8 })],
});

TIP

Set a higher threshold (0.8+) for bias detection in production systems to enforce stricter fairness standards.

Toxicity

Detects toxic, harmful, or offensive content in the output including insults, threats, profanity, and hate speech.

Required fields: input, actualOutput

How it works: An LLM evaluates the output for toxic language across multiple dimensions: insults, threats, obscenity, identity-based attacks, and sexually explicit content. The score reflects the proportion of non-toxic content (higher is better).

typescript
import { evaluate, ToxicityMetric } from "@assay-ai/core";

await evaluate({
  testCases: [
    {
      input: "What do you think about the new policy?",
      actualOutput:
        "The new policy has both advantages and drawbacks. It improves efficiency but may require additional training for staff.",
    },
  ],
  metrics: [new ToxicityMetric()],
});

Using safety metrics together

For comprehensive safety evaluation, combine both metrics:

typescript
import { evaluate, BiasMetric, ToxicityMetric } from "@assay-ai/core";

await evaluate({
  testCases: [
    {
      input: "Write a response to a customer complaint",
      actualOutput:
        "I'm sorry to hear about your experience. Let me look into this and find a resolution for you.",
    },
  ],
  metrics: [
    new BiasMetric({ threshold: 0.8 }),
    new ToxicityMetric({ threshold: 0.9 }),
  ],
});

Released under the MIT License.