Retrieval backend @sushrutalgs.ai
Built a search-and-answer backend that lets clinicians and medical students ask questions of major surgical textbooks and get cited, structured answers. It combines semantic vector search, a knowledge graph of how the books are organized, and a streaming service that uses Claude to plan and write each answer.
Source is private; sushrutalgs.ai is a live product. Happy to walk through the code or grant read access on request.
Stack
System architecture.
Overview
HybridFlow is the retrieval and answer engine behind sushrutalgs.ai. A user asks a question; the system finds the right passages across three surgical textbooks and streams back a cited, structured answer. It is built for the hard case where a single search method is not enough.
Approach
Results
Sustained 30 concurrent queries with zero errors at about 14.7 times the throughput of a sequential baseline, passing the answer-quality suite 20 of 20. Component vector retrieval scored success@5 of 0.90 and MRR 0.79 on a frozen gold set, with p50 search latency around 178 ms.
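For readers unfamiliar with the two retrieval metrics quoted above, here is a minimal sketch of how success@5 and MRR are commonly computed. The function names and the toy data are hypothetical and are not taken from the project's evaluation harness. The sketch assumes MRR is computed over the full ranked list; the actual gold-set evaluation may truncate it.

```python
from typing import List, Sequence, Set, Tuple


def success_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Return 1.0 if any relevant passage appears in the top k results, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant passage, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: List[Tuple[Sequence[str], Set[str]]], k: int = 5) -> Tuple[float, float]:
    """Average success@k and reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    n = len(runs)
    s_at_k = sum(success_at_k(ranked, rel, k) for ranked, rel in runs) / n
    mrr = sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / n
    return s_at_k, mrr


# Toy data (hypothetical passage ids), only to show the call shape.
toy_runs = [
    (["a", "b", "c"], {"a"}),                 # relevant hit at rank 1
    (["x", "y", "z"], {"y"}),                 # relevant hit at rank 2
    (["p", "q", "r", "s", "t", "u"], {"u"}),  # hit at rank 6, outside the top 5
    (["m", "n"], {"zz"}),                     # relevant passage never retrieved
]
toy_success, toy_mrr = evaluate(toy_runs, k=5)
```

success@5 rewards any hit in the top five, while MRR also credits how high the first hit ranks, so the two numbers are usually read together.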
Engineering
Python and FastAPI with server-sent-events streaming, Qdrant and Neo4j for storage, and the Anthropic Claude API for planning and generation. Prompt caching saves thousands of tokens on each planning call at an 80 percent cache hit rate, holding cost near $0.05 to $0.06 per answered query. Served behind a Cloudflare Worker gateway with two-factor service auth.
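As an illustration of the server-sent-events streaming mentioned above, the sketch below frames answer chunks in the SSE wire format using only the standard library. The event names ("token", "done") and the payload fields are assumptions made for the example, not the service's actual schema.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Serialize one SSE frame: optional 'event:' line, a 'data:' line, blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload always fits on one data line.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def stream_answer(chunks: Iterable[str]) -> Iterator[str]:
    """Yield one frame per answer chunk, then a final frame marking completion."""
    for index, text in enumerate(chunks):
        yield sse_event({"index": index, "text": text}, event="token")
    yield sse_event({"finished": True}, event="done")


# Hypothetical chunks standing in for model output.
frames = list(stream_answer(["Hello", " world"]))
```

In a FastAPI app, a generator like this could be returned through StreamingResponse with media_type="text/event-stream"; that wiring is left out here so the sketch runs without third-party packages.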