emily.DocSearch
===============

.. py:module:: emily.DocSearch

.. autoapi-nested-parse::

   Created on Thu May  7 12:26:18 2026

   @author: Dr Peter J Bleackley


Attributes
----------

.. autoapisummary::

   emily.DocSearch.splitter


Classes
-------

.. autoapisummary::

   emily.DocSearch.DocSearch


Functions
---------

.. autoapisummary::

   emily.DocSearch.parse


Module Contents
---------------

.. py:data:: splitter

.. py:function:: parse(sentence: str) -> list[str]

   Splits a sentence into lower-case words with punctuation removed.

   :param sentence: Sentence to be parsed
   :type sentence: str

   :returns: Sentence as a list of lower-case words
   :rtype: list[str]


.. py:class:: DocSearch(embedding_url: str, reranking_url: str, vector_dir: str, collection_name: str, index_dir: Optional[str] = None)

   Overall search component


   .. py:attribute:: vector_index


   .. py:attribute:: reranker


   .. py:attribute:: sentence_tokenizer


   .. py:method:: sentences(text: str) -> list[list[str]]

      Splits a text into sentences, and then splits each sentence into a
      list of lower-case words.

      :param text: A document to be split into sentences.
      :type text: str

      :returns: Each sentence in the document is represented by a list of strings
      :rtype: list[list[str]]


   .. py:method:: add_documents(corpus: collections.abc.AsyncIterable[tuple[str, str]])
      :async:

      Adds documents to the database.

      :param corpus: Documents as tuples of (filename, text)
      :type corpus: AsyncIterable[tuple[str,str]]

      :rtype: None.


   .. py:method:: __call__(query: str, top_k: int = 10) -> pandas.Series
      :async:

      Searches for documents relevant to query.

      Finds the 2*top_k best matches from the vector database and the
      automatically thresholded responses from each of the Okapi and
      Cooccurrence indices, then retrieves the text of these candidates
      and reranks them.

      :param query: Text to query for.
      :type query: str
      :param top_k: Number of results to return. The default is 10.
      :type top_k: int, optional

      :returns: The reranker scores of the top_k best matching documents, indexed by their filenames.
      :rtype: pd.Series


   .. py:method:: save(path: str)

      Saves search indices.

      :param path: Directory in which to save indices.
      :type path: str

      :rtype: None.


   .. py:method:: clear()
      :async:

      Clears indices.

      :rtype: None.
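
The behaviour documented for ``parse`` (lower-case words with punctuation removed) can be illustrated with a minimal sketch. The value of the module-level ``splitter`` is not documented here, so the regular expression below is an assumption, and ``splitter_sketch`` and ``parse_sketch`` are hypothetical names, not part of ``emily.DocSearch``.

```python
import re

# Assumption: a stand-in for the undocumented module-level `splitter`.
# It matches any run of non-word characters (spaces, punctuation).
splitter_sketch = re.compile(r"\W+")


def parse_sketch(sentence: str) -> list[str]:
    """Lower-case the sentence and split it into words, dropping punctuation."""
    # Splitting can leave empty strings at the ends, so filter them out.
    return [word for word in splitter_sketch.split(sentence.lower()) if word]


words = parse_sketch("Hello, World!")
```

With this stand-in pattern, apostrophes also act as separators, so a contraction such as "it's" would be split in two. The real ``parse`` may treat such cases differently.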
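
The candidate pooling described for ``__call__`` (2*top_k vector matches plus the thresholded Okapi and Cooccurrence responses, followed by reranking) might look roughly like the sketch below. It is a synchronous, dependency-free illustration only: ``search_sketch``, its parameters, and the toy ``overlap`` scorer are all hypothetical, and it returns a plain ``dict`` where the real method returns a ``pandas.Series``.

```python
from typing import Callable


def search_sketch(
    query: str,
    vector_hits: list[str],
    okapi_hits: list[str],
    cooccurrence_hits: list[str],
    texts: dict[str, str],
    rerank_score: Callable[[str, str], float],
    top_k: int = 10,
) -> dict[str, float]:
    """Pool candidates from three retrievers, rerank them, keep the top_k."""
    # Keep only the 2*top_k best vector matches (assumed sorted best-first).
    pooled = list(vector_hits[: 2 * top_k])
    # The two keyword-index hit lists are assumed to be thresholded already.
    pooled += okapi_hits + cooccurrence_hits
    # Deduplicate filenames while preserving first-seen order.
    candidates = list(dict.fromkeys(pooled))
    # Retrieve each candidate's text and score it against the query.
    scores = {name: rerank_score(query, texts[name]) for name in candidates}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:top_k])


def overlap(query: str, text: str) -> float:
    """Toy reranker: number of words shared by query and text."""
    return float(len(set(query.split()) & set(text.split())))


texts = {"a.txt": "cats chase mice", "b.txt": "dogs chase cats", "c.txt": "fish swim"}
result = search_sketch(
    "dogs chase", ["a.txt", "c.txt"], ["b.txt"], ["a.txt"], texts, overlap, top_k=2
)
```

Pooling before reranking lets a document found by only one index still reach the final list, provided the reranker scores it highly enough.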