emily.index_classes.CooccurrenceIndex
=====================================

.. py:module:: emily.index_classes.CooccurrenceIndex

.. autoapi-nested-parse::

   Created on Thu May 7 08:43:03 2026

   @author: Dr Peter J Bleackley


Attributes
----------

.. autoapisummary::

   emily.index_classes.CooccurrenceIndex.stop


Classes
-------

.. autoapisummary::

   emily.index_classes.CooccurrenceIndex.CooccurrenceIndex


Module Contents
---------------

.. py:data:: stop

.. py:class:: CooccurrenceIndex(path: Optional[pathlib.Path] = None)

   Uses information theory to identify documents in which the query terms
   tend to occur in the same sentences. See `this page `_ for details of the algorithm.


   .. py:method:: add_documents(corpus: collections.abc.AsyncIterable[tuple[str, list[list[str]]]])
      :async:

      Adds documents to the index.

      :param corpus: Async iterable of (filename, document) tuples, where each
                     document is a list of lists of strings (parsed sentences)
      :type corpus: AsyncIterable[tuple[str,list[list[str]]]]

      :rtype: None.


   .. py:method:: __call__(query: list[str]) -> polars.LazyFrame

      Finds candidate documents where the words in the query tend to co-occur.
      Uses the Pareto principle to automatically threshold the results.
      For N candidate documents, Np results are returned, such that they
      account for (1-p) of the total relevance of the sample.

      :param query: The query as a list of strings
      :type query: list[str]

      :returns: Contains a single column, "filename"
      :rtype: pl.LazyFrame


   .. py:method:: save(path: pathlib.Path)

      Saves the indices to parquet files.

      :param path: Directory to save indices in
      :type path: Path

      :rtype: None.


   .. py:method:: clear()
      :async:

      Clears the indices.

      :rtype: None.
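
The Pareto thresholding rule described for ``__call__`` can be illustrated with a minimal sketch. This is not the library's implementation. The helper names ``pareto_cutoff`` and ``pareto_filter``, the filenames, and the relevance scores are all hypothetical. The sketch assumes non-negative relevance scores and returns the smallest number of results k such that the top k of N documents account for at least (1 - k/N) of the total relevance.

```python
def pareto_cutoff(scores: list[float]) -> int:
    """Return the smallest k such that the k highest scores account for
    at least (1 - k/N) of the total score. Hypothetical helper."""
    ranked = sorted(scores, reverse=True)
    n = len(ranked)
    total = sum(ranked)
    if n == 0 or total <= 0:
        return 0
    running = 0
    for k, score in enumerate(ranked, start=1):
        running += score
        # Equivalent to running / total >= 1 - k / n, without division.
        if running * n >= (n - k) * total:
            return k
    return n


def pareto_filter(relevance: dict[str, float]) -> list[str]:
    """Keep only the filenames that fall above the Pareto cutoff,
    ordered from most to least relevant. Hypothetical helper."""
    ranked = sorted(relevance.items(), key=lambda item: item[1], reverse=True)
    k = pareto_cutoff([score for _, score in ranked])
    return [filename for filename, _ in ranked[:k]]


# Made-up relevance scores for ten candidate documents (total = 100).
example = {
    "a.txt": 40, "b.txt": 25, "c.txt": 15, "d.txt": 8, "e.txt": 5,
    "f.txt": 3, "g.txt": 2, "h.txt": 1, "i.txt": 1, "j.txt": 0,
}
selected = pareto_filter(example)
print(selected)  # ['a.txt', 'b.txt', 'c.txt']
```

In this example the top 3 of 10 documents (p = 0.3) carry 80% of the total relevance, which exceeds the required 1 - p = 70%, so three filenames are kept. The more sharply relevance is concentrated in a few documents, the fewer results survive the cutoff.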