emily.index_classes.OkapiIndex
Created on Wed May 6 14:09:33 2026
@author: Dr Peter J Bleackley
Classes
OkapiIndex | Indexes documents using BM25
Module Contents
- class emily.index_classes.OkapiIndex.OkapiIndex(config_path: pathlib.Path | None = None)
Indexes documents using the Okapi BM25 ranking function.
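For reference, the textbook Okapi BM25 score of a document D for a query Q = q_1, …, q_n is shown below, where f(q_i, D) is the number of times q_i occurs in D, |D| is the length of D in tokens, avgdl is the mean document length, N_docs is the number of indexed documents and n(q_i) is the number of documents containing q_i. This is the standard form only; the IDF variant and the values of k_1 and b that OkapiIndex actually uses are not documented on this page, and the commonly quoted defaults (k_1 between 1.2 and 2.0, b = 0.75) are assumptions here.

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1\left(1 - b + b\,\dfrac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(q_i) = \ln\!\left(\frac{N_{\mathrm{docs}} - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)
```

The fraction is the saturated, length-normalised term-frequency (TF) component; it depends only on the documents, which is why it can be computed when documents are added, while the IDF weights and the sum over query terms are applied at query time.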
- async add_documents(corpus: collections.abc.AsyncIterable[tuple[str, list[str]]])
Adds documents to the index and computes the term-frequency (TF) component of the BM25 score.
- Parameters:
corpus (AsyncIterable[tuple[str, list[str]]]) – Documents to add. Each item is a tuple of a filename and a list of strings containing the preprocessed (lower-cased, punctuation-removed) contents of the document.
- Return type:
None
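The sketch below illustrates how a TF component could be computed from an async stream of (filename, tokens) pairs like the one add_documents accepts. It is not the OkapiIndex implementation: the helper name build_tf_table, the plain-dict storage (the real class presumably uses polars) and the parameter values k1 = 1.5 and b = 0.75 are all assumptions for illustration.

```python
import asyncio
from collections import Counter
from collections.abc import AsyncIterable


async def build_tf_table(
    corpus: AsyncIterable[tuple[str, list[str]]],
    k1: float = 1.5,  # assumed saturation parameter
    b: float = 0.75,  # assumed length-normalisation parameter
) -> dict[str, dict[str, float]]:
    """Hypothetical helper: BM25 TF component for every (document, term) pair."""
    counts: dict[str, Counter[str]] = {}
    # First pass: consume the async stream, storing raw term counts per file.
    async for filename, tokens in corpus:
        counts[filename] = Counter(tokens)
    if not counts:
        return {}
    lengths = {name: sum(c.values()) for name, c in counts.items()}
    avgdl = sum(lengths.values()) / len(lengths)
    # Second pass: saturate and length-normalise each raw count.
    return {
        name: {
            term: tf * (k1 + 1) / (tf + k1 * (1 - b + b * lengths[name] / avgdl))
            for term, tf in c.items()
        }
        for name, c in counts.items()
    }


async def demo_corpus():
    # Already preprocessed: lower-cased, punctuation removed.
    yield "a.txt", ["the", "cat", "sat", "on", "the", "mat"]
    yield "b.txt", ["a", "dog"]


table = asyncio.run(build_tf_table(demo_corpus()))
print(table["b.txt"]["dog"], table["a.txt"]["cat"])
```

Two passes are needed because the length normalisation uses the average document length, which is only known once the whole stream has been consumed. Every TF value is bounded above by k1 + 1, and the same raw count scores higher in a shorter document ("dog" in b.txt versus "cat" in a.txt).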
- __call__(query: list[str]) → polars.LazyFrame
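Finds the candidate documents with the highest BM25 scores. Uses the Pareto principle to threshold the results automatically: for N candidate documents, the top p·N results are returned, where p is such that those results account for (1 − p) of the total relevance of the sample. For example, p = 0.2 corresponds to the classic 80/20 split, in which the top 20% of candidates carry 80% of the relevance.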
- Parameters:
query (list[str]) – Parsed query, as a list of terms
- Returns:
LazyFrame containing a single column, “filename”
- Return type:
polars.LazyFrame
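A minimal sketch of the Pareto thresholding rule described above, applied to a precomputed mapping of filenames to BM25 scores. The helper name pareto_threshold is hypothetical, treating only documents with a positive score as candidates is an assumption, and the real method returns a polars LazyFrame rather than a list.

```python
def pareto_threshold(scores: dict[str, float]) -> list[str]:
    """Hypothetical helper: keep the top k of N candidates, where the top k
    (a fraction p = k / N) hold at least (1 - p) of the total score."""
    # Assumption: only documents with a positive score count as candidates.
    ranked = sorted(
        ((name, score) for name, score in scores.items() if score > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    n = len(ranked)
    if n == 0:
        return []
    total = sum(score for _, score in ranked)
    k = n
    cumulative = 0.0
    for i, (_, score) in enumerate(ranked, start=1):
        cumulative += score
        # Compare the share held by the top i documents with 1 - p, p = i / n.
        if cumulative / total >= 1 - i / n:
            k = i
            break
    return [name for name, _ in ranked[:k]]


example = {"a.txt": 10.0, "b.txt": 5.0, "c.txt": 1.0, "d.txt": 1.0, "e.txt": 1.0}
print(pareto_threshold(example))  # ['a.txt', 'b.txt']
```

As i grows, the cumulative share of relevance can only rise while 1 − i/N falls, so the first crossing is well defined and is always reached by i = N. In the example, the top two of five candidates (p = 0.4) hold 15/18 ≈ 83% of the total score, which exceeds 1 − p = 60%. To mirror the documented return type, the selected names could be wrapped as pl.LazyFrame({"filename": selected}) after import polars as pl.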