The @ooneex/rag package is a Retrieval-Augmented Generation toolkit. It turns your documents into searchable knowledge: convert PDFs into clean text chunks, embed them with OpenAI models, store them in a LanceDB vector database, and retrieve the most relevant passages with hybrid search.
You define a typed vector database class, open a table, add records, and search — every record carries an id, the searchable text, and a metadata object of your own typed fields.
Why this package
- Hybrid retrieval. Full-text and vector search run together and are merged with RRF reranking, so you get both keyword precision and semantic recall.
- Typed end to end. Your metadata shape drives the types for records, filters, and selected fields.
- PDF in, chunks out. The Convertor parses PDFs into heading-aware chunks with page metadata, ready to embed.
- Local-first storage. LanceDB stores vectors on disk, so there is no separate database server to run.
- Composable filters. Combine field conditions with AND, OR, and NOT to narrow results.
- Container-friendly. Register databases with a decorator and resolve them from the DI container.
The building blocks
| Block | What it is | Page |
|---|---|---|
| Vector Database | A typed class describing where data lives, which embedding model to use, and the schema. | Vector Database |
| Vector Table | The handle you add records to, look them up, and index. | Vector Table |
| Convertor | Turns PDFs into structured, embeddable chunks. | Convertor |
| Embeddings | The OpenAI models that turn text into vectors. | Embeddings |
| Search | Hybrid full-text + vector search with RRF reranking. | Search |
| Filtering | Composable conditions to scope queries. | Filtering |
Installation
The record shape
Every row in a table has the same three top-level fields: id, text, and metadata. You describe metadata with a DataType, and the database carries it through everywhere: added records, search results, filters, and selected columns are all typed against it.
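As a rough sketch of that three-field shape (an id, the searchable text, and a typed metadata object), the types below show how one metadata type can flow through a record. The names RagRecord, ArticleMetadata, and makeRecord are hypothetical and are not taken from the package, and the string type for id is an assumption.

```typescript
// Hypothetical names for illustration only; these are NOT the package's exported types.
// Assumption: ids are strings. The source only says each record carries an id.
type RagRecord<M extends Record<string, unknown>> = {
  id: string;   // unique identifier of the row
  text: string; // the searchable text that gets embedded
  metadata: M;  // your own typed fields
};

// Example metadata shape chosen for this sketch.
type ArticleMetadata = { title: string; page: number };

// Builds a record whose metadata is checked against M at compile time.
function makeRecord<M extends Record<string, unknown>>(
  id: string,
  text: string,
  metadata: M,
): RagRecord<M> {
  return { id, text, metadata };
}

const record = makeRecord<ArticleMetadata>(
  "intro-1",
  "LanceDB stores vectors on disk.",
  { title: "Intro", page: 1 },
);
```

With a shape like this, passing a metadata object that lacks page or carries a string where a number is expected fails at compile time, which is the kind of guarantee the typed record shape is meant to provide.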
End-to-end example
How retrieval works
When you call search(), the table:
- Runs a vector search over the embedded text and a full-text search over the same column.
- Merges both result sets with an RRF reranker (Reciprocal Rank Fusion).
- Applies your filter and select, then returns the top limit records, typed against your metadata.
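To make the merge step more concrete, here is a minimal, standalone sketch of Reciprocal Rank Fusion. It is not the package's or LanceDB's implementation: the function name rrfMerge, the sample ids, and the constant k = 60 (a value commonly used for RRF) are assumptions for illustration. Each record scores 1 / (k + rank) in every list it appears in, and the scores are summed.

```typescript
// Illustrative RRF merge; rrfMerge and the sample data are hypothetical.
// Each input list holds record ids ordered best-first.
function rrfMerge(
  lists: string[][],
  k = 60, // assumed smoothing constant, not confirmed by the package docs
  limit = 10,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Sort by fused score, highest first, and keep the top `limit` entries.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const vectorHits = ["doc-2", "doc-1", "doc-3"];   // semantic ranking
const fullTextHits = ["doc-1", "doc-4", "doc-2"]; // keyword ranking
const merged = rrfMerge([vectorHits, fullTextHits], 60, 3);
console.log(merged.map((hit) => hit.id));
```

Records that rank well in both lists (doc-1 and doc-2 here) rise above records that appear in only one, which is how the fused ranking combines keyword precision with semantic recall.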
Tables are created with three indexes on first open(): a btree index on id, a full-text (FTS) index on text, and an IVF-PQ vector index on vector, so search is fast out of the box. See Vector Database for details.