emily.index_classes.OkapiIndex

Created on Wed May 6 14:09:33 2026

@author: Dr Peter J Bleackley

Attributes

stop

Classes

OkapiIndex

Indexes documents using BM25

Module Contents

emily.index_classes.OkapiIndex.stop
class emily.index_classes.OkapiIndex.OkapiIndex(config_path: pathlib.Path | None = None)

Indexes documents using BM25
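For orientation, the following is a minimal sketch of textbook BM25 scoring over tokenised documents. It is not taken from the OkapiIndex source: the function name `bm25_scores`, the parameter values `k1=1.2` and `b=0.75` (conventional defaults), and the smoothed IDF variant are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: dict[str, list[str]],
                k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    """Score each document against the query with textbook BM25.

    Hypothetical helper; k1 and b are conventional defaults, not
    values read from OkapiIndex.
    """
    if not docs:
        return {}
    n_docs = len(docs)
    # Average document length; fall back to 1.0 if every document is empty.
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs or 1.0
    # Number of documents containing each term.
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        # Length normalisation used in the TF component.
        norm = k1 * (1 - b + b * len(tokens) / avg_len)
        score = 0.0
        for term in query:
            tf = counts[term]
            if tf == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            score += idf * tf * (k1 + 1) / (tf + norm)
        scores[name] = score
    return scores


example = bm25_scores(["cat"], {"a.txt": ["cat", "sat"], "b.txt": ["dog", "ran"]})
```

Only documents containing at least one query term receive a non-zero score, so in this example `b.txt` scores 0.0.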

async add_documents(corpus: collections.abc.AsyncIterable[tuple[str, list[str]]])

Adds documents to the index and computes the TF component

Parameters:

corpus (AsyncIterable[tuple[str,list[str]]]) – Documents to add. Each item is a tuple of a filename and a list of strings containing the preprocessed (lowercased, punctuation removed) contents of the document

Return type:

None.
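A minimal sketch of building a corpus in the shape described above, assuming the documents are already held in memory as plain strings. The helpers `preprocess`, `corpus_from_texts` and `collect` are hypothetical names introduced for illustration and are not part of the emily package; whitespace tokenisation is likewise an assumption.

```python
import asyncio
import string
from collections.abc import AsyncIterator

# Translation table that deletes ASCII punctuation.
_PUNCT = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation, and split on whitespace (assumed tokenisation)."""
    return text.lower().translate(_PUNCT).split()


async def corpus_from_texts(texts: dict[str, str]) -> AsyncIterator[tuple[str, list[str]]]:
    """Yield (filename, tokens) tuples, one per document."""
    for filename, text in texts.items():
        yield filename, preprocess(text)


async def collect(texts: dict[str, str]) -> list[tuple[str, list[str]]]:
    """Drain the async generator into a list so the output can be inspected."""
    return [item async for item in corpus_from_texts(texts)]


pairs = asyncio.run(collect({"notes.txt": "Hello, World!"}))
print(pairs)  # [('notes.txt', ['hello', 'world'])]
```

Given an OkapiIndex instance named `index`, the generator would presumably be passed straight through, as in `await index.add_documents(corpus_from_texts(texts))`.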

__call__(query: list[str]) → polars.LazyFrame

Finds the candidate documents with the highest BM25 scores. Uses the Pareto principle to automatically threshold the results: for N candidate documents, Np results are returned, such that they account for (1-p) of the total relevance of the sample

Parameters:

query (list[str]) – Parsed query string

Returns:

LazyFrame containing a single column, “filename”

Return type:

polars.LazyFrame
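One possible reading of the Pareto thresholding rule, sketched in plain Python rather than polars: rank the candidates by score and keep the smallest top-k set whose share of the total score is at least 1 - k/N. The function name `pareto_threshold` and this exact stopping condition are assumptions, not the library's confirmed implementation.

```python
def pareto_threshold(scores: dict[str, float]) -> list[str]:
    """Return the filenames of the top-k documents, where k is the smallest
    count whose cumulative score share reaches 1 - k/N (assumed rule)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    n = len(ranked)
    total = sum(score for _, score in ranked)
    if n == 0 or total <= 0:
        return []
    running = 0.0
    for k, (_, score) in enumerate(ranked, start=1):
        running += score
        # Equivalent to running / total >= 1 - k / n, written without division.
        if running * n >= total * (n - k):
            return [name for name, _ in ranked[:k]]
    return [name for name, _ in ranked]


skewed = pareto_threshold({"a.txt": 9.0, "b.txt": 0.5, "c.txt": 0.3, "d.txt": 0.1, "e.txt": 0.1})
uniform = pareto_threshold({"a.txt": 1, "b.txt": 1, "c.txt": 1, "d.txt": 1})
print(skewed)   # ['a.txt']
print(uniform)  # ['a.txt', 'b.txt']
```

Under this reading, a sharply skewed score distribution yields very few results, while uniform scores yield half of the candidates.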

save(path: pathlib.Path)

Saves the index to Parquet

Parameters:

path (Path) – Filename to save index to.

Return type:

None.

async clear()

Clears the index

Return type:

None.