5.3.2.1. gemini_application.chatpopup.chatpopup
Interactive chat pop-up application using Azure OpenAI or local Ollama.
Supports document ingestion into ChromaDB and retrieval-augmented generation.
Classes
- ChatPopup: Retrieval-augmented chat application built on ChromaDB + Ollama/Azure OpenAI.
- ChunkRecord: A single text chunk with an id and metadata, used for storage and citation.
- class gemini_application.chatpopup.chatpopup.ChatPopup
Bases: ApplicationAbstract
Retrieval-augmented chat application built on ChromaDB + Ollama/Azure OpenAI.
The constructor only initializes configuration fields; the actual LLM, embedding, and Chroma clients are created in initialize_model().
- build_prompt(user_message, selected)
Build the RAG prompt and return it together with structured citation items for UI display.
- Return type:
Tuple[str, List[Dict[str, Any]]]
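A minimal sketch of what a RAG prompt builder with citation output can look like. It assumes each selected chunk is a dict with `text`, `source`, and `page` keys; the function name, the prompt wording, and the citation dict layout are all hypothetical, not the actual `build_prompt` implementation.

```python
from typing import Any, Dict, List, Tuple


def build_prompt_sketch(user_message: str,
                        selected: List[Dict[str, Any]]) -> Tuple[str, List[Dict[str, Any]]]:
    """Hypothetical stand-in for ChatPopup.build_prompt.

    Assumes each selected chunk is a dict with 'text', 'source' and 'page' keys.
    """
    context_lines: List[str] = []
    citations: List[Dict[str, Any]] = []
    for i, chunk in enumerate(selected, start=1):
        # Number each chunk so the model can cite it as [1], [2], ...
        context_lines.append(f"[{i}] {chunk['text']}")
        # The UI can map [n] back to the originating file and page.
        citations.append({"ref": i, "source": chunk["source"], "page": chunk["page"]})
    prompt = (
        "Answer the question using only the context below. "
        "Cite sources as [n].\n\n"
        "Context:\n" + "\n".join(context_lines)
        + f"\n\nQuestion: {user_message}\nAnswer:"
    )
    return prompt, citations
```

Returning the citation list separately from the prompt text lets the UI render sources without parsing the model's answer.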
- chunk_text_with_metadata(source, page, text, file_sig, lang='unknown', translated=False)
Normalize and chunk page text, then attach metadata for citations and filtering.
- Return type:
List[ChunkRecord]
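A sketch of page chunking with citation metadata, assuming a `ChunkRecord` shaped as an id, a text, and a metadata dict. The `ChunkRecordSketch` class, the fixed `max_words` chunk size, the id format, and merging `file_sig` into the metadata are illustrative assumptions, not the module's actual definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ChunkRecordSketch:
    """Hypothetical shape of a ChunkRecord: id, text and metadata."""
    id: str
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def chunk_page_sketch(source: str, page: int, text: str, file_sig: Dict[str, Any],
                      lang: str = "unknown", translated: bool = False,
                      max_words: int = 200) -> List[ChunkRecordSketch]:
    # str.split() with no argument collapses all whitespace runs (PDF line
    # breaks, double spaces) and drops empty strings: a cheap normalization.
    words = text.split()
    records: List[ChunkRecordSketch] = []
    for n, start in enumerate(range(0, len(words), max_words)):
        records.append(ChunkRecordSketch(
            # Deterministic id: re-ingesting the same page yields the same ids.
            id=f"{source}:p{page}:c{n}",
            text=" ".join(words[start:start + max_words]),
            # Metadata used later for citations (source/page) and filtering (lang).
            metadata={"source": source, "page": page, "lang": lang,
                      "translated": translated, **file_sig},
        ))
    return records
```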
- chunksplitter_for_embeddings(text, max_words, overlap_words=0)
Split text into overlapping word chunks suitable for embedding models.
- Return type:
List[str]
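Overlapping word windows can be produced by advancing a window of `max_words` words by `max_words - overlap_words` each step. The sketch below is a hypothetical re-implementation with the same parameters, not the module's own code; the validation rules are assumptions.

```python
from typing import List


def chunksplitter_sketch(text: str, max_words: int, overlap_words: int = 0) -> List[str]:
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap_words < max_words:
        # overlap >= window size would make the window stand still or move backwards
        raise ValueError("overlap_words must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap_words  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `chunksplitter_sketch("a b c d e f g", 3, 1)` yields `["a b c", "c d e", "e f g"]`: each chunk repeats the last word of the previous one, so a sentence cut at a boundary still appears intact in at least one chunk when the overlap is large enough.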
- detect_language(text)
Detect the language of a text sample. Returns a language code such as ‘en’ or ‘nl’.
- Return type:
str
- embed_one_ollama(text)
Embed one text, shrinking it iteratively if Ollama rejects it as too long.
- Return type:
List[float]
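The shrink-and-retry idea can be sketched independently of Ollama by passing the embedding call in as a function. `ContextLengthError`, `embed_fn`, and the halving factor are hypothetical; in the real class the error would be recognized by `is_context_error` rather than by a dedicated exception type.

```python
from typing import Callable, List


class ContextLengthError(Exception):
    """Hypothetical exception standing in for an 'input too long' embedding error."""


def embed_one_sketch(text: str, embed_fn: Callable[[str], List[float]],
                     shrink: float = 0.5, min_words: int = 1) -> List[float]:
    words = text.split()
    while True:
        try:
            return embed_fn(" ".join(words))
        except ContextLengthError:
            if len(words) <= min_words:
                raise  # nothing left to trim; surface the original error
            # Keep the leading words and retry; always drop at least one word
            # so the loop terminates even if shrink is close to 1.0.
            new_len = min(len(words) - 1, max(min_words, int(len(words) * shrink)))
            words = words[:new_len]
```

Truncating from the end keeps the beginning of the chunk, which usually carries its heading or topic sentence; the cost is that the tail of an over-long chunk is not represented in its vector.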
- file_signature(file_path)
Return a lightweight signature used to detect file changes.
- Return type:
Dict[str, Any]
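A "lightweight" signature typically avoids reading the file at all. This sketch assumes size plus modification time from a single `os.stat` call; the exact keys returned by the real `file_signature` are not documented here, so `path`, `size`, and `mtime_ns` are hypothetical.

```python
import os
from typing import Any, Dict


def file_signature_sketch(file_path: str) -> Dict[str, Any]:
    # One stat() call, no file read: cheap even for large PDFs.
    st = os.stat(file_path)
    return {
        "path": os.path.abspath(file_path),
        "size": st.st_size,          # changes when content is appended or truncated
        "mtime_ns": st.st_mtime_ns,  # changes when the file is rewritten in place
    }
```

Compared with hashing the file contents, this can miss an edit that keeps both size and timestamp unchanged, which is rare in practice and is the usual trade-off for incremental ingestion.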
- filter_context(context)
Extract the Chroma query results and keep only those that pass the similarity threshold.
- Return type:
Dict[str, Any]
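Chroma's `Collection.query()` returns a dict whose `ids`, `documents`, `metadatas`, and `distances` entries each hold one inner list per query embedding. The sketch below assumes a single query and a collection that uses cosine distance (for example one created with `metadata={"hnsw:space": "cosine"}`), so that similarity is `1 - distance`; the threshold value and the output keys are hypothetical.

```python
from typing import Any, Dict, List


def filter_context_sketch(context: Dict[str, Any], min_similarity: float = 0.3) -> Dict[str, Any]:
    """Hypothetical stand-in for ChatPopup.filter_context."""
    # One inner list per query embedding; only the first query is used here.
    ids = context["ids"][0]
    docs = context["documents"][0]
    metas = context["metadatas"][0]
    dists = context["distances"][0]
    kept: Dict[str, List[Any]] = {"ids": [], "documents": [], "metadatas": [], "similarities": []}
    for chunk_id, doc, meta, dist in zip(ids, docs, metas, dists):
        similarity = 1.0 - dist  # valid only for a cosine-distance collection
        if similarity >= min_similarity:
            kept["ids"].append(chunk_id)
            kept["documents"].append(doc)
            kept["metadatas"].append(meta)
            kept["similarities"].append(similarity)
    return kept
```

With Chroma's default L2 space, `1 - distance` is not a similarity, so the threshold would have to be expressed on distances instead.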
- get_embedding(user_message)
Embed a user query string for retrieval.
- Return type:
List[float]
- get_embedding_list(chunks)
Embed a list of chunks in batches.
- Return type:
List[List[float]]
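Batching means sending fixed-size slices of the chunk list to the embedding backend and concatenating the results in order. In this sketch `embed_batch` is a hypothetical callable (for instance a wrapper around a batch-capable embedding API) and `batch_size=32` is an arbitrary default.

```python
from typing import Callable, List


def embed_in_batches(chunks: List[str],
                     embed_batch: Callable[[List[str]], List[List[float]]],
                     batch_size: int = 32) -> List[List[float]]:
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    vectors: List[List[float]] = []
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        out = embed_batch(batch)
        # Guard the 1:1 mapping between chunks and vectors; a silent
        # mismatch would attach embeddings to the wrong chunk ids.
        if len(out) != len(batch):
            raise RuntimeError("embedding count does not match batch size")
        vectors.extend(out)
    return vectors
```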
- init_parameters(parameters)
Apply parameters from a dict and initialize models and database clients.
- Return type:
None
- initialize_model()
Create LLM/embedding clients and open the Chroma collection.
- Return type:
None
- is_context_error(e)
Return True if an exception indicates an embedding context-length overflow.
- Return type:
bool
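Since embedding backends usually report context overflow only through the error message, a check like this is commonly a case-insensitive substring match. The marker phrases below are hypothetical examples; the exact wording depends on the Ollama or Azure OpenAI version in use and should be taken from the errors actually observed.

```python
from typing import Tuple

# Hypothetical marker phrases; verify against real error messages.
CONTEXT_ERROR_MARKERS: Tuple[str, ...] = (
    "context length",
    "context window",
    "maximum context",
    "too long",
)


def is_context_error_sketch(e: BaseException) -> bool:
    # Case-insensitive substring match on the exception message.
    message = str(e).lower()
    return any(marker in message for marker in CONTEXT_ERROR_MARKERS)
```

Keeping the markers narrow matters: a too-generic phrase would make network or authentication failures look like length problems and trigger pointless shrink-and-retry loops.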
- load_manifest()
Load the manifest containing file signatures for incremental ingestion.
- Return type:
Dict[str, Any]
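Incremental ingestion compares the stored signature of each file with its current one and only re-ingests files that are new or changed. This sketch assumes a JSON manifest mapping file paths to signature dicts; the file format, `manifest_path` parameter, and the `files_to_ingest` helper are hypothetical.

```python
import json
import os
from typing import Any, Dict, List


def load_manifest_sketch(manifest_path: str) -> Dict[str, Any]:
    # A missing or unreadable manifest simply means "ingest everything".
    if not os.path.exists(manifest_path):
        return {}
    try:
        with open(manifest_path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, json.JSONDecodeError):
        return {}
    return data if isinstance(data, dict) else {}


def files_to_ingest(manifest: Dict[str, Any],
                    current: Dict[str, Dict[str, Any]]) -> List[str]:
    # Re-ingest a file when it is new or its signature changed since the last run.
    return sorted(path for path, sig in current.items() if manifest.get(path) != sig)
```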
- load_pdf_pages(file_path)
Load a PDF and return a list of (page_index, page_text) tuples.
- Return type:
List[Tuple[int, str]]
- maybe_translate_text(text)
Detect the language of the text and translate it to English if needed.
- Return type:
tuple[str, str, bool]
- mmr_rerank(query_emb, candidates, top_k, lam)
Select a diverse set of relevant chunks using Maximal Marginal Relevance (MMR).
- Return type:
List[Dict[str, Any]]
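Greedy MMR repeatedly picks the candidate d that maximizes λ·sim(q, d) − (1 − λ)·max over already-selected s of sim(d, s), so `lam` near 1 favors pure relevance and smaller values penalize near-duplicates. The sketch assumes cosine similarity and that each candidate dict carries its vector under an `embedding` key; both are assumptions about the real `candidates` format.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_rerank_sketch(query_emb: Sequence[float], candidates: List[Dict[str, Any]],
                      top_k: int, lam: float) -> List[Dict[str, Any]]:
    """Greedy MMR; assumes each candidate dict has an 'embedding' vector."""
    remaining = list(candidates)
    selected: List[Dict[str, Any]] = []

    def mmr_score(c: Dict[str, Any]) -> float:
        relevance = cosine(query_emb, c["embedding"])
        # Redundancy = similarity to the closest chunk already chosen (0 if none yet).
        redundancy = max((cosine(c["embedding"], s["embedding"]) for s in selected),
                         default=0.0)
        return lam * relevance - (1.0 - lam) * redundancy

    while remaining and len(selected) < top_k:
        best = max(range(len(remaining)), key=lambda i: mmr_score(remaining[i]))
        selected.append(remaining.pop(best))
    return selected
```

With query `[1, 0]` and candidates a `[1, 0]`, b `[0.9, 0.1]`, c `[0.5, 0.5]`, `lam=1.0` returns a and b (pure relevance), while `lam=0.3` returns a and c because b is almost a copy of a.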
- process_prompt(user_message)
Answer a question by retrieving relevant chunks and generating a grounded response.
- Return type:
Dict[str, Any]
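The overall question-answering flow can be sketched as a pipeline of injected steps: embed the question, retrieve chunks, build a grounded prompt, and generate. All callables, the early-return message, and the `answer`/`citations` keys of the result are hypothetical; they only illustrate how the methods listed in this class could fit together.

```python
from typing import Any, Callable, Dict, List, Tuple

Chunk = Dict[str, Any]


def process_prompt_sketch(
    user_message: str,
    embed: Callable[[str], List[float]],
    retrieve: Callable[[List[float]], List[Chunk]],
    build_prompt: Callable[[str, List[Chunk]], Tuple[str, List[Chunk]]],
    generate: Callable[[str], str],
) -> Dict[str, Any]:
    query_emb = embed(user_message)   # 1. embed the question
    selected = retrieve(query_emb)    # 2. query, threshold-filter and rerank chunks
    if not selected:
        # Nothing relevant enough: do not let the model answer ungrounded.
        return {"answer": "No relevant documents found.", "citations": []}
    prompt, citations = build_prompt(user_message, selected)  # 3. grounded prompt
    return {"answer": generate(prompt), "citations": citations}  # 4. generate
```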
- safe_ollama_embed_batch(texts)
Embed a batch; if the batch fails, embed items individually with shrink-on-failure.
- Return type:
List[List[float]]
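The batch-then-fallback strategy can be sketched with two injected callables: `embed_batch` for the fast path and `embed_one` for per-item embedding that is expected to shrink over-long inputs itself. Both names are hypothetical, and catching every `Exception` is a deliberate simplification; a stricter version would fall back only on context-length errors.

```python
from typing import Callable, List


def safe_embed_batch_sketch(texts: List[str],
                            embed_batch: Callable[[List[str]], List[List[float]]],
                            embed_one: Callable[[str], List[float]]) -> List[List[float]]:
    try:
        vectors = embed_batch(texts)
        if len(vectors) == len(texts):
            return vectors  # fast path: one request for the whole batch
    except Exception:
        pass  # fall through to per-item embedding
    # Slow path: a single over-long text no longer sinks the whole batch.
    return [embed_one(t) for t in texts]
```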