Visit my Portfolio! →

HybridFlow.

Retrieval backend @sushrutalgs.ai

Built a search-and-answer backend that lets clinicians and medical students ask questions of major surgical textbooks and get cited, structured answers. It combines semantic vector search, a knowledge graph of how the books are organized, and a streaming service that uses Claude to plan and write each answer.

Source is private; sushrutalgs.ai is a live product. Happy to walk through the code or grant read access on request.

Links

  • live product↗
  • Request repo access→

Stack

  • Python
  • FastAPI
  • Qdrant
  • Neo4j
  • Docker
  • Cloudflare

System architecture. Tap to enlarge.

Overview

HybridFlow is the retrieval and answer engine behind sushrutalgs.ai. A user asks a question; the system finds the right passages across three surgical textbooks and streams back a cited, structured answer. It is built for the hard case where a single search method is not enough.

Approach

  • Hybrid retrieval. Semantic search over medical-domain vectors in Qdrant runs alongside a Neo4j knowledge graph that captures the chapter-to-section-to-paragraph hierarchy, cross-references, and concepts.
  • Agentic orchestration. A streaming FastAPI service runs a multi-call Claude pipeline: Haiku validates and selects chapters and scores figures and tables, then Sonnet writes the answer, all exposed through 15 retrieval tools.
  • Quality gates. Answers pass an eight-gate regression suite covering classification, citation integrity, hallucination density, format, and fallbacks.

Results

Sustained 30 concurrent queries with zero errors at about 14.7 times the throughput of a sequential baseline, passing the answer-quality suite 20 of 20. Component vector retrieval scored success@5 of 0.90 and MRR 0.79 on a frozen gold set, with p50 search latency around 178 ms.

Engineering

Python and FastAPI with server-sent-events streaming, Qdrant and Neo4j for storage, and the Anthropic Claude API for planning and generation. Prompt caching cuts the planning call by thousands of tokens at an 80 percent hit rate, holding cost near 0.05 to 0.06 dollars per answered query. Served behind a Cloudflare Worker gateway with two-factor service auth.