emily.DocSearch
Created on Thu May 7 12:26:18 2026
@author: Dr Peter J Bleackley
Attributes
Classes
DocSearch | Overall search component
Functions
parse(sentence) | Splits sentence into lower-case words with punctuation removed
Module Contents
- emily.DocSearch.parse(sentence: str) → list[str]
Splits sentence into lower-case words with punctuation removed
- Parameters:
sentence (str) – Sentence to be parsed
- Returns:
Sentence as a list of lower-case words
- Return type:
list[str]
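The behaviour described above can be illustrated with a minimal sketch. The name `parse_sketch` is hypothetical, and stripping only ASCII punctuation via `string.punctuation` is an assumption; the actual `parse` may treat apostrophes, hyphens or Unicode punctuation differently.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION = str.maketrans("", "", string.punctuation)


def parse_sketch(sentence: str) -> list[str]:
    """Hypothetical stand-in for parse: strip punctuation, lower-case, split."""
    return sentence.translate(_PUNCTUATION).lower().split()


words = parse_sketch("Hello, World!")  # ['hello', 'world']
```

Under this assumption, punctuation inside a word is simply dropped, so "Don't" becomes "dont" rather than being split in two.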
- class emily.DocSearch.DocSearch(embedding_url: str, reranking_url: str, vector_dir: str, collection_name: str, index_dir: str | None = None)
Overall search component
- sentences(text: str) → list[list[str]]
Splits a text into sentences, and then each sentence into a list of lower-case strings
- Parameters:
text (str) – A document to be split into sentences.
- Returns:
Each sentence in the document is represented by a list of strings
- Return type:
list[list[str]]
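As an illustration of the shape of the return value, here is a rough sketch of a two-stage split. The function name `sentences_sketch` is hypothetical, and the sentence boundary rule (a full stop, exclamation mark or question mark followed by whitespace) is an assumption; the documentation does not say how `sentences` detects boundaries.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION = str.maketrans("", "", string.punctuation)


def sentences_sketch(text: str) -> list[list[str]]:
    """Hypothetical stand-in for DocSearch.sentences."""
    # Assumed boundary rule: split after ., ! or ? when whitespace follows.
    raw_sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Tokenise each sentence into lower-case words without punctuation.
    tokenised = [s.translate(_PUNCTUATION).lower().split() for s in raw_sentences]
    # Drop sentences that contain no words, e.g. for empty input.
    return [words for words in tokenised if words]


result = sentences_sketch("Hello, world. How are you?")
# [['hello', 'world'], ['how', 'are', 'you']]
```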
- async add_documents(corpus: collections.abc.AsyncIterable[tuple[str, str]])
Adds documents to the database
- Parameters:
corpus (AsyncIterable[tuple[str,str]]) – Documents as tuples of (filename,text)
- Return type:
None
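Because `corpus` must be an asynchronous iterable of (filename, text) tuples, an async generator is one convenient way to supply it. The sketch below builds such a corpus from an in-memory mapping; `corpus_from_mapping` and `collect` are hypothetical helpers, not part of the module, and the sample documents are made up for illustration.

```python
import asyncio
from collections.abc import AsyncIterable, AsyncIterator, Mapping


async def corpus_from_mapping(docs: Mapping[str, str]) -> AsyncIterator[tuple[str, str]]:
    """Yield (filename, text) tuples from an in-memory mapping."""
    for filename, text in docs.items():
        # A real corpus would typically await a file or network read here.
        await asyncio.sleep(0)
        yield filename, text


async def collect(corpus: AsyncIterable[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drain an async iterable into a list, to inspect what would be passed on."""
    return [item async for item in corpus]


docs = {"a.txt": "First document.", "b.txt": "Second document."}
pairs = asyncio.run(collect(corpus_from_mapping(docs)))
```

Given an already constructed `DocSearch` instance named `search` (an assumed variable), the same generator would be passed directly inside a coroutine: `await search.add_documents(corpus_from_mapping(docs))`.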
- async __call__(query: str, top_k: int = 10) → pandas.Series
Searches for documents relevant to the query. Finds the 2*top_k best matches from the vector database and the automatically thresholded responses from each of the Okapi and Cooccurrence indices, then retrieves the text of these candidates and reranks them.
- Parameters:
query (str) – Text to query for.
top_k (int, optional) – Number of results to return. The default is 10.
- Returns:
The reranker scores of the top_k best matching documents, indexed by their filenames.
- Return type:
pandas.Series
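The candidate pooling described for `__call__` can be sketched in plain Python. Everything here is illustrative: `pool_and_rerank` is a hypothetical function, the three hit lists stand in for the vector database and the Okapi and Cooccurrence indices, and `rerank_score` stands in for the reranking service. Deduplicating candidates and sorting scores in descending order are assumptions, and the sketch returns a dict, whereas the documented method returns a pandas.Series indexed by filename.

```python
from collections.abc import Callable, Sequence


def pool_and_rerank(
    vector_hits: Sequence[str],
    okapi_hits: Sequence[str],
    cooccurrence_hits: Sequence[str],
    rerank_score: Callable[[str], float],
    top_k: int = 10,
) -> dict[str, float]:
    """Pool candidates from three sources, score each once, keep the best top_k."""
    # Union of the three candidate lists, keeping first-seen order (assumed dedup).
    candidates = list(dict.fromkeys([*vector_hits, *okapi_hits, *cooccurrence_hits]))
    # Score every candidate with the stand-in reranker.
    scored = {filename: rerank_score(filename) for filename in candidates}
    # Highest score first (assumed ordering), truncated to top_k.
    ranked = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:top_k])


# Toy scores in place of a real reranker.
toy_scores = {"a.txt": 0.2, "b.txt": 0.9, "c.txt": 0.5, "d.txt": 0.1}
best = pool_and_rerank(
    ["a.txt", "b.txt"], ["b.txt", "c.txt"], ["d.txt"], toy_scores.__getitem__, top_k=2
)
# {'b.txt': 0.9, 'c.txt': 0.5}
```

With an existing `DocSearch` instance named `search` (an assumed variable), the documented call would be `results = await search("some query", top_k=5)` inside a coroutine.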