emily.index_classes.OkapiIndex
==============================

.. py:module:: emily.index_classes.OkapiIndex

.. autoapi-nested-parse::

   Created on Wed May 6 14:09:33 2026

   @author: Dr Peter J Bleackley


Attributes
----------

.. autoapisummary::

   emily.index_classes.OkapiIndex.stop


Classes
-------

.. autoapisummary::

   emily.index_classes.OkapiIndex.OkapiIndex


Module Contents
---------------

.. py:data:: stop

.. py:class:: OkapiIndex(config_path: Optional[pathlib.Path] = None)

   Indexes documents using BM25.


   .. py:method:: add_documents(corpus: collections.abc.AsyncIterable[tuple[str, list[str]]])
      :async:

      Adds documents to the index and computes the TF component.

      :param corpus: Documents to add. Each item is a tuple of a filename and a
                     list of strings containing the preprocessed (lower-cased,
                     punctuation removed) contents of the document.
      :type corpus: AsyncIterable[tuple[str,list[str]]]

      :rtype: None


   .. py:method:: __call__(query: list[str]) -> polars.LazyFrame

      Finds the candidate documents with the highest BM25 scores.
      Uses the Pareto principle to automatically threshold the results.
      For N candidate documents, Np results are returned, such that they
      account for (1-p) of the total relevance of the sample.

      :param query: Parsed query string.
      :type query: list[str]

      :returns: LazyFrame containing a single column, "filename".
      :rtype: polars.LazyFrame


   .. py:method:: save(path: pathlib.Path)

      Saves the index to Parquet.

      :param path: Filename to save the index to.
      :type path: Path

      :rtype: None


   .. py:method:: clear()
      :async:

      Clears the index.

      :rtype: None
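
The Pareto thresholding described for ``__call__`` can be illustrated with a small, self-contained sketch. This is not the module's implementation: ``pareto_cutoff`` is a hypothetical helper, and it assumes one plausible reading of the rule, namely that the cutoff is the smallest count k for which the top k scores account for at least (1 - k/N) of the total score.

```python
def pareto_cutoff(scores: list[float]) -> int:
    """Return how many top-scoring candidates to keep.

    Hypothetical helper, not part of emily. Assumes the rule: keep the
    smallest k such that the top k scores hold at least (1 - k/N) of the
    total score, where N is the number of candidates.
    """
    ranked = sorted(scores, reverse=True)  # highest relevance first
    n = len(ranked)
    total = sum(ranked)
    if n == 0 or total <= 0:
        return 0  # nothing relevant to return

    cumulative = 0.0
    for k, score in enumerate(ranked, start=1):
        cumulative += score
        # Equivalent to cumulative / total >= 1 - k / n, written without
        # division to avoid rounding surprises at the boundary.
        if cumulative * n >= total * (n - k):
            return k
    return n


# Example: ten candidates with a heavily skewed score distribution.
example_scores = [60, 25, 5, 4, 2, 2, 1, 1, 0, 0]
kept = pareto_cutoff(example_scores)
top_scores = sorted(example_scores, reverse=True)[:kept]
```

With a skewed distribution such as the one above, only the few dominant candidates survive the cutoff. With perfectly uniform scores the same rule keeps half of the candidates, since k/N and the cumulative share then grow at the same rate.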