#production-ai

3 articles with this tag.

Guardrails, Safety & Output Validation: Building LLM Applications That Don't Break

Advanced · Best Practices

Production guardrails for LLM applications: input/output filtering, structured output enforcement with Pydantic and JSON mode, content moderation pipelines, PII detection and redaction, hallucination detection, and integration patterns with Guardrails AI and NeMo Guardrails.

May 31, 2026
#guardrails #safety #output-validation

LLM Evaluation & Benchmarking Beyond RAGAS: Production Eval Systems That Actually Work

Advanced · Best Practices

Build production-grade LLM evaluation from scratch: async JudgeClient, position-bias-corrected pairwise comparison, rubric scoring with normalization, judge calibration, meta-evaluation, human eval with SQLite and Cohen's kappa, pytest CI/CD integration, eval dataset construction, bootstrap confidence intervals, and online monitoring.

May 31, 2026
#llm #evaluation #benchmarking