PDF Knowledge Demo
RAG Copilot: Document Q&A with Citations
Upload a PDF document (max 5000 words), build a vector index, and chat over your document with full citation tracing. Run automated benchmarks to evaluate retrieval precision, answer relevance, and context coverage.
1. Upload & Index PDF
Select a PDF file (max 5000 words). The system will extract text, chunk it with sliding windows, and build a FAISS vector index.
4. What This Project Does
This is a Retrieval-Augmented Generation (RAG) system that lets you upload a PDF document and ask questions about it in natural language. Instead of searching for keywords, it understands the meaning of your question and finds relevant sections of your document to answer it.
The system solves the problem of extracting information from long documents. Rather than reading an entire PDF to find an answer, you can ask specific questions and get precise responses with references to the exact sections where the information came from.
This project demonstrates three capabilities: semantic search (finding relevant information based on meaning, not just keywords), question answering (generating accurate responses from document content), and automated evaluation (measuring how well the system performs without manual review).
The system includes guardrails to ensure answers come only from the uploaded document, provides citations so you can verify every answer, and includes an automated benchmark suite to measure retrieval quality, answer relevance, and index coverage.
5. Technical Deep-Dive
Purpose and Scope
This system implements a production-grade RAG pipeline for single-document question answering with citation tracing and automated evaluation. The scope is constrained to PDF documents under 5000 words, single-user sessions, and in-memory storage to keep the demo lightweight and deployable on free-tier infrastructure.
Architecture and Design Decisions
The architecture separates concerns into three distinct pipelines: ingest (document processing and indexing), retrieval (semantic search and answer generation), and evaluation (automated quality metrics). This design enables independent testing and optimization of each component. The system uses a stateless REST API with session-based storage, allowing horizontal scaling while maintaining simplicity.
Data Flow and Execution Model
Ingest Pipeline
PDF → Text extraction (pdfplumber) → Word count validation (5000-word limit) → Sliding-window chunking (200 words per chunk, 40-word overlap to preserve context across boundaries) → Sentence embedding (sentence-transformers with the all-MiniLM-L6-v2 model) → FAISS IndexFlatIP (cosine similarity on L2-normalized embeddings) → In-memory session store.
RAG Pipeline
User query → Sentence embedding → FAISS similarity search (top-4 chunks) → Context injection into GPT-4o Mini prompt → System prompt enforces answer-from-context-only constraint → Response with citations (chunk IDs, text snippets, cosine scores, retrieval latency).
Evaluation Pipeline
Generate synthetic test questions (extract first sentence from random chunks) → Run RAG pipeline on each question → Compute three reference-free metrics: retrieval precision (average top-1 cosine score), answer relevance (cosine similarity between question and answer embeddings), context coverage (fraction of unique chunks retrieved across all queries).
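The three metrics above can be sketched as a small pure-Python function. The field names (`top1_score`, `qa_similarity`, `chunk_ids`) are illustrative, not the project's actual schema; the math matches the description.

```python
def reference_free_metrics(results, n_chunks):
    """Compute the three proxy metrics from per-query results.

    Each result dict is assumed to carry: the top-1 retrieval cosine
    score, the cosine similarity between question and answer
    embeddings, and the set of chunk IDs retrieved for that query.
    """
    # Retrieval precision: mean top-1 cosine score across queries
    retrieval_precision = sum(r["top1_score"] for r in results) / len(results)
    # Answer relevance: mean question-answer embedding similarity
    answer_relevance = sum(r["qa_similarity"] for r in results) / len(results)
    # Context coverage: fraction of distinct chunks ever retrieved
    retrieved = set().union(*(r["chunk_ids"] for r in results))
    context_coverage = len(retrieved) / n_chunks
    return retrieval_precision, answer_relevance, context_coverage
```

Because all three are reference-free, the benchmark can run on any freshly uploaded document with no labeled question-answer pairs.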
Implementation Details
Chunking Strategy
Sliding-window chunking with 200-word chunks and 40-word overlap ensures that semantic units (paragraphs, concepts) are not split across chunk boundaries, improving retrieval quality. Fixed-size chunks simplify embedding and enable consistent retrieval latency.
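A minimal sketch of the sliding-window strategy (the production chunker's signature may differ): the stride is `chunk_size - overlap`, so the last 40 words of each chunk reappear at the start of the next.

```python
def sliding_window_chunks(words, chunk_size=200, overlap=40):
    """Split a list of words into fixed-size, overlapping chunks."""
    stride = chunk_size - overlap  # 160 words of new content per chunk
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # final window already covers the tail
    return chunks
```

A 400-word document yields three chunks, with each boundary covered by a 40-word overlap, so a sentence straddling a boundary still appears whole in at least one chunk.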
Embedding Model Selection
all-MiniLM-L6-v2 balances quality and speed (384-dimensional embeddings, ~22.7M parameters, ~120ms inference on 2-core CPU). Chosen for acceptable semantic similarity performance without requiring GPU infrastructure. Production systems would use larger models (e.g., text-embedding-3-large) or domain-specific fine-tuned models.
Vector Index Design
FAISS IndexFlatIP provides exact nearest-neighbor search with cosine similarity (via L2-normalized embeddings and inner product). No approximate search (ANN) is needed for small document collections (<100 chunks). Production systems with larger corpora would use HNSW or IVF indexes for sub-linear search complexity.
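The key identity here is that the inner product of L2-normalized vectors equals their cosine similarity. A NumPy sketch of the same exact search (NumPy stands in for faiss so the example has no extra dependency; in the real pipeline these calls map to `faiss.normalize_L2` and `IndexFlatIP.search`):

```python
import numpy as np

def cosine_top_k(index_vecs, query_vec, k=4):
    """Exact nearest-neighbor cosine search, as IndexFlatIP does it."""
    # L2-normalize both sides so inner product == cosine similarity
    idx = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    scores = idx @ q                 # one inner product per chunk
    order = np.argsort(-scores)[:k]  # best-first, top-k
    return order, scores[order]
```

For fewer than ~100 chunks this brute-force scan is microseconds of work, which is why no ANN index is needed.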
LLM Selection and Prompt Engineering
GPT-4o Mini was chosen for cost efficiency ($0.15 per 1M input tokens vs $2.50 for GPT-4o) while maintaining acceptable answer quality. The system prompt explicitly constrains the model to answer only from provided context and refuse out-of-scope questions. This reduces hallucination risk but requires good retrieval quality to avoid "I don't know" responses.
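The context-injection step can be sketched as follows. The prompt wording is illustrative, not the project's exact prompt; the pattern (retrieved chunks in the system message, strict answer-from-context instruction) matches the description above.

```python
def build_messages(context_chunks, question):
    """Assemble a chat request that constrains the model to the
    retrieved context. context_chunks is a list of (chunk_id, text)."""
    context = "\n\n".join(
        f"[chunk {cid}] {text}" for cid, text in context_chunks
    )
    system = (
        "Answer ONLY from the context below. If the answer is not in "
        "the context, say you don't know. Cite the chunk IDs you used.\n\n"
        f"Context:\n{context}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

The resulting message list is what gets passed to the OpenAI chat completions API; tagging each chunk with its ID is what lets the model emit verifiable citations.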
Full Technology Stack
Backend
- Python 3.11: Language runtime (async support, type hints, performance)
- FastAPI 0.104: Web framework (automatic OpenAPI docs, async, validation with Pydantic)
- pdfplumber 0.10: PDF text extraction (preserves layout, handles complex PDFs)
- sentence-transformers 2.2: Embedding generation (wraps HuggingFace models)
- FAISS 1.7: Vector similarity search (Facebook AI Similarity Search library)
- OpenAI Python SDK 1.3: GPT-4o Mini API client
- Uvicorn 0.24: ASGI server (production-ready, async, HTTP/1.1 and WebSockets)
- python-multipart 0.0.6: Multipart form data parsing (file uploads)
Infrastructure
- Docker: Containerization for reproducible builds and deployment
- Render: Backend hosting (free tier, auto-scaling, zero-config HTTPS, environment variables for secrets)
- Cloudflare Pages: Frontend hosting (global CDN, instant deploys, automatic HTTPS)
Frontend
- Vanilla JavaScript: No framework dependencies (faster load, easier debugging, no build step)
- HTML5 + CSS3: Native file upload API, Fetch API for HTTP, Canvas API for charts
MLOps/LLMOps
- Reference-free evaluation metrics: No human labeling required for quality assessment
- Session-based state management: Isolated user sessions, no shared state pollution
- Citation tracing: Every answer includes provenance (chunk IDs, scores, latency)
- Environment-based configuration: API keys injected via environment variables, never committed to source control
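The environment-based configuration point can be made concrete with a fail-fast loader. Variable names besides `OPENAI_API_KEY` are hypothetical:

```python
import os

def load_config():
    """Read secrets and limits from the environment, failing fast
    if a required secret is missing rather than at first API call."""
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "openai_api_key": api_key,
        # MAX_DOC_WORDS is a hypothetical override for the 5000-word cap
        "max_words": int(os.environ.get("MAX_DOC_WORDS", "5000")),
    }
```

On Render the variables are set in the dashboard, so the same image runs unmodified in every environment and no secret ever enters the repository.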
Technical Challenges and Trade-offs
Challenge: Chunking Strategy
Problem: Fixed-size chunks can split semantic units mid-sentence, degrading retrieval quality.
Solution: Sliding-window chunking with 40-word overlap preserves context across boundaries. Trade-off: roughly 25% storage overhead (with a 160-word stride, each word is stored in about 1.25 chunks on average) for better retrieval quality.
Alternative considered: Semantic chunking (split on paragraph boundaries); rejected because variable chunk sizes complicate embedding batching and retrieval normalization.
Challenge: Evaluation Without Ground Truth
Problem: Manual labeling of question-answer pairs is expensive and not scalable.
Solution: Reference-free metrics (retrieval precision, answer relevance, context coverage) provide proxy signals for system health without human annotation. Trade-off: Metrics don't capture factual correctness, only correlation and coverage.
Alternative considered: LLM-as-judge evaluation (GPT-4 evaluates answer quality); rejected due to cost ($2.50 per 1M tokens) and latency (adds 2-5s per evaluation).
Challenge: Cold Start on Free-Tier Hosting
Problem: Render free tier spins down services after 15 minutes of inactivity, causing 30-60s cold starts.
Solution: Health check with retry logic (3 attempts, 3s delay) and user-facing status indicator. Trade-off: Degraded UX on first request, but acceptable for demo purposes.
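The demo implements this retry loop in frontend JavaScript; an equivalent Python sketch of the pattern (3 attempts, 3s delay, assuming a `/health` endpoint that returns 200 when warm):

```python
import time
import urllib.request
import urllib.error

def wait_for_backend(url, attempts=3, delay=3.0):
    """Poll a health endpoint until it answers, absorbing the
    free-tier cold start before the first real request is sent."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                if resp.status == 200:
                    return True  # backend is warm
        except (urllib.error.URLError, TimeoutError):
            pass  # still spinning up; fall through to retry
        if attempt < attempts:
            time.sleep(delay)
    return False
```

While the loop runs, the UI shows a "waking up the server" status instead of a raw network error, which is the user-facing half of the mitigation.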
Alternative considered: Keep-alive pings to prevent spin-down; rejected to stay within free-tier limits and avoid unnecessary API usage.
Challenge: In-Memory Storage Constraints
Problem: FAISS index and document chunks stored in memory, limiting concurrent users and document sizes.
Solution: Session-based storage with 5000-word document limit. Trade-off: No persistence across restarts, single-user sessions only. Production systems would use vector databases (Pinecone, Weaviate, Qdrant) with persistent storage.
Alternative considered: SQLite + FAISS serialization; rejected due to deployment complexity and filesystem write permissions on Render free tier.
Technology Selection Rationale
Why FastAPI?
Automatic OpenAPI documentation (critical for API debugging), native async support (required for OpenAI API calls), and Pydantic validation (type-safe request/response models). Alternative (Flask) lacks async and automatic docs. Alternative (Django) is overkill for stateless API.
Why FAISS?
Industry-standard vector search library with battle-tested performance. Supports exact and approximate search algorithms. Alternatives (Annoy, ScaNN) have narrower feature sets or less Python support. Vector databases (Pinecone, Weaviate) are overkill for in-memory demo use case.
Why GPT-4o Mini?
Cost efficiency ($0.15 per 1M input tokens) with acceptable answer quality. GPT-4o ($2.50 per 1M tokens) provides marginal quality improvement at 16x cost. Open-source alternatives (Llama 3, Mistral) require GPU infrastructure and self-hosting complexity.
Why Vanilla JavaScript?
Zero build step (instant deploys), faster page load (no framework bundle), easier debugging (no transpilation). React/Vue would add complexity without meaningful UX benefit for this interactive demo.
Demo Limitations
- No persistence: Index and chunks stored in-memory, lost on server restart
- 5000-word limit: Prevents memory exhaustion on free-tier infrastructure
- Single-user sessions: No authentication or multi-tenancy support
- Reference-free metrics: Evaluation proxies don't measure factual correctness
- Cold start latency: 30-60s delay on first request after 15-minute idle period
- No production monitoring: No structured logging, error tracking, or observability
Architecture Diagram
┌───────────────────────────────────────────────────────────────────┐
│                          CLIENT (Browser)                         │
│   ┌──────────────┐      ┌──────────────┐      ┌──────────────┐    │
│   │  PDF Upload  │      │   Chat UI    │      │ Benchmark UI │    │
│   └──────┬───────┘      └──────┬───────┘      └──────┬───────┘    │
└──────────┼─────────────────────┼─────────────────────┼────────────┘
           │                     │                     │
           │ POST /ingest        │ POST /chat          │ POST /eval
           │ (multipart/form)    │ (JSON)              │ (JSON)
           ▼                     ▼                     ▼
┌───────────────────────────────────────────────────────────────────┐
│                      FASTAPI BACKEND (Render)                     │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │     Routers: /api/v1/ingest, /api/v1/chat, /api/v1/eval     │  │
│  └────────────┬───────────────────┬───────────────┬────────────┘  │
│               │                   │               │               │
│  ┌────────────▼──────┐  ┌─────────▼────────┐  ┌───▼────────────┐  │
│  │    PDF Parser     │  │    Retriever     │  │   Evaluator    │  │
│  │   (pdfplumber)    │  │  (FAISS search)  │  │ (metrics calc) │  │
│  └────────┬──────────┘  └─────────┬────────┘  └────────────────┘  │
│           │                       │                               │
│  ┌────────▼──────────┐  ┌─────────▼─────────┐                     │
│  │      Chunker      │  │    GPT-4o Mini    │                     │
│  │  (sliding window) │  │   (OpenAI API)    │                     │
│  └────────┬──────────┘  └───────────────────┘                     │
│           │                                                       │
│  ┌────────▼─────────────────────────┐                             │
│  │ Embedder (sentence-transformers) │                             │
│  │ Model: all-MiniLM-L6-v2          │                             │
│  │ FAISS IndexFlatIP (cosine sim)   │                             │
│  └──────────────────────────────────┘                             │
│                                                                   │
│  ┌──────────────────────────────────┐                             │
│  │ Session Store (in-memory)        │                             │
│  │ {session_id: {index, chunks, …}} │                             │
│  └──────────────────────────────────┘                             │
└───────────────────────────────────────────────────────────────────┘