Note: Larger configurations with more choices can slow down optimization and require more trials to reach good results.
Introduction
The RAG system uses a YAML-based configuration file to define all aspects of your retrieval-augmented generation pipeline. This includes indexing parameters, vector stores, embeddings, retrieval strategies, and language models.
Configuration Structure
Your configuration file consists of several main sections:
- Indexing Parameters: Control how documents are chunked and processed
- Vector Stores: Choose and configure your vector database
- Embeddings: Select embedding models for document representation
- Search Configuration: Define retrieval strategies
- Language Models: Configure generation models
- Retrieval Settings: Fine-tune the number of retrieved documents
Search Space Types
Each parameter has a searchspace_type that defines how it can be configured:
Numeric parameters with a range of values:

```yaml
chunk_size:
  searchspace_type: continuous
  bounds: [500, 2000]
  dtype: int
```
Parameters with predefined choices:

```yaml
vector_store:
  searchspace_type: categorical
  choices:
    faiss: { ... }
    chroma: { ... }
```
True/false parameters:

```yaml
use_reranker:
  searchspace_type: boolean
  allow_multiple: true
```
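Putting the three search-space types together, one trial configuration can be drawn from definitions like those above. This is a minimal sketch under our own naming (`sample_param` is not part of the tool's API); it only illustrates how each `searchspace_type` constrains the sampled value.

```python
import random

def sample_param(spec, rng=random):
    """Draw one value from a single search-space definition (illustrative helper)."""
    kind = spec["searchspace_type"]
    if kind == "continuous":
        lo, hi = spec["bounds"]
        value = rng.uniform(lo, hi)
        # dtype controls whether the optimizer treats this as an int or float axis
        return int(value) if spec.get("dtype") == "int" else value
    if kind == "categorical":
        choices = spec["choices"]
        # choices may be a plain list or a mapping of name -> settings
        names = choices if isinstance(choices, list) else list(choices)
        return rng.choice(names)
    if kind == "boolean":
        return rng.choice([True, False])
    raise ValueError(f"unknown searchspace_type: {kind}")

space = {
    "chunk_size": {"searchspace_type": "continuous", "bounds": [500, 2000], "dtype": "int"},
    "vector_store": {"searchspace_type": "categorical", "choices": {"faiss": {}, "chroma": {}}},
    "use_reranker": {"searchspace_type": "boolean"},
}
trial = {name: sample_param(spec) for name, spec in space.items()}
```

Each optimization trial is one such dictionary; the optimizer then scores the resulting pipeline and proposes the next trial.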
Indexing Parameters
Configure how your documents are processed and chunked.
- chunk_size: Size of text chunks in characters. Range: 200-2000. Recommendation: 500-1000 for most use cases.
- chunk_overlap: Overlap between consecutive chunks in characters. Range: 0-500. Recommendation: 10-20% of chunk_size.
- max_tokens: Maximum tokens for generation. Range: 100-1000.
- temperature: Temperature for generation randomness. Range: 0.0-1.0. Lower values: more deterministic. Higher values: more creative.
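The 10-20% overlap rule of thumb can be sketched as a small helper (the function name is ours, not part of the tool):

```python
def recommended_overlap(chunk_size: int, fraction: float = 0.15) -> int:
    """Suggest a chunk_overlap of roughly 10-20% of chunk_size, capped at the 0-500 range."""
    if not 0.10 <= fraction <= 0.20:
        raise ValueError("stay within the recommended 10-20% band")
    return min(int(chunk_size * fraction), 500)
```

For example, a chunk_size of 1000 characters with the 15% midpoint gives an overlap of 150 characters.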
Example
```yaml
chunk_size:
  searchspace_type: continuous
  bounds: [500, 2000]
  dtype: int

chunk_overlap:
  searchspace_type: continuous
  bounds: [0, 500]
  dtype: int

temperature:
  searchspace_type: continuous
  bounds: [0.0, 1.0]
  dtype: float
```
Vector Stores
Choose from multiple vector database options, each with different pricing models:
- FAISS: Local, free vector store with no cloud costs
- Chroma: Open-source vector database with persistent storage
- Pinecone: Managed vector database with cloud pricing
- Weaviate: Open-source vector database with a cloud version available
Configuration Structure
```yaml
vector_store:
  searchspace_type: categorical
  choices:
    <store_name>:
      api_key: "your_api_key"
      api_base: null
      index_name: "your_index"
      cloud_config: null
      pricing:
        storage_per_gb_month: 0.0
        read_operations_per_1k: 0.0
        write_operations_per_1k: 0.0
        query_per_1k: 0.0
```
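The four pricing fields let the optimizer fold infrastructure cost into its objective. As a rough sketch (field names match the config; the workload numbers below are illustrative assumptions, and the helper name is ours), a monthly bill can be estimated like this:

```python
def monthly_cost(pricing, gb_stored, reads, writes, queries):
    """Estimate one month's vector-store cost from the pricing fields above."""
    return (
        pricing["storage_per_gb_month"] * gb_stored
        + pricing["read_operations_per_1k"] * reads / 1000
        + pricing["write_operations_per_1k"] * writes / 1000
        + pricing["query_per_1k"] * queries / 1000
    )

# Rates from the Weaviate example below; a local FAISS store would be all zeros.
weaviate_pricing = {
    "storage_per_gb_month": 0.25,
    "read_operations_per_1k": 0.0001,
    "write_operations_per_1k": 0.0005,
    "query_per_1k": 0.0003,
}
```

With 10 GB stored, 1M reads, 100k writes, and 500k queries, this comes to about $2.80/month at the Weaviate rates.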
Examples
FAISS:

```yaml
faiss:
  api_key: null
  api_base: null
  index_name: null
  cloud_config: null
  pricing:
    storage_per_gb_month: 0.0
    read_operations_per_1k: 0.0
    write_operations_per_1k: 0.0
    query_per_1k: 0.0
```

Chroma:

```yaml
chroma:
  api_key: null
  api_base: null
  index_name: null
  cloud_config: null
  pricing:
    storage_per_gb_month: 0.0
    read_operations_per_1k: 0.0
    write_operations_per_1k: 0.0
    query_per_1k: 0.0
```
Weaviate:

```yaml
weaviate:
  api_key: "wv_***"
  api_base: "https://your-instance.weaviate.network"
  index_name: "Documents"
  cloud_config: null
  pricing:
    storage_per_gb_month: 0.25
    read_operations_per_1k: 0.0001
    write_operations_per_1k: 0.0005
    query_per_1k: 0.0003
```
Embeddings
Select embedding models to convert text into vector representations.
Supported Providers
- OpenAI: High-quality embeddings in various model sizes
- HuggingFace: Free, open-source embedding models
- Sentence Transformers: Models optimized for semantic similarity
- Claude: Anthropic's embedding models
OpenAI Embeddings
```yaml
embedding:
  searchspace_type: categorical
  choices:
    openai:
      api_key: "sk-proj-***"
      api_base: null
      models:
        - "text-embedding-ada-002"
        - "text-embedding-3-small"
        - "text-embedding-3-large"
      pricing:
        text-embedding-ada-002:
          cost_per_1k_tokens: 0.0001
        text-embedding-3-small:
          cost_per_1k_tokens: 0.00002
        text-embedding-3-large:
          cost_per_1k_tokens: 0.00013
```
Recommendation: text-embedding-3-small offers the best balance of cost and performance.
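A quick back-of-envelope check of that recommendation, using the per-1k rates listed above (the helper name is ours, not part of the tool):

```python
# Per-1k-token rates copied from the OpenAI embedding config above.
COST_PER_1K = {
    "text-embedding-ada-002": 0.0001,
    "text-embedding-3-small": 0.00002,
    "text-embedding-3-large": 0.00013,
}

def embedding_cost(model: str, tokens: int) -> float:
    """Dollar cost to embed `tokens` tokens with the given model."""
    return COST_PER_1K[model] * tokens / 1000
```

Embedding one million tokens costs $0.02 with text-embedding-3-small versus $0.10 with ada-002 and $0.13 with text-embedding-3-large, i.e. the small model is 5-6x cheaper.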
HuggingFace Embeddings
```yaml
huggingface:
  api_key: "hf_***"
  api_base: null
  models:
    - "all-MiniLM-L6-v2"
    - "sentence-transformers/all-mpnet-base-v2"
  pricing:
    all-MiniLM-L6-v2:
      cost_per_1k_tokens: 0.0
    sentence-transformers/all-mpnet-base-v2:
      cost_per_1k_tokens: 0.0
```
HuggingFace models are free for local or self-hosted deployments.

Sentence Transformers

```yaml
sentence-transformers:
  api_key: null
  api_base: null
  models:
    - "all-MiniLM-L6-v2"
    - "multi-qa-mpnet-base-dot-v1"
    - "paraphrase-multilingual-mpnet-base-v2"
  pricing:
    all-MiniLM-L6-v2:
      cost_per_1k_tokens: 0.0
```
Claude Embeddings
```yaml
claude:
  api_key: "sk-ant-api03-***"
  api_base: null
  models:
    - "claude-3-embedding"
  pricing:
    claude-3-embedding:
      cost_per_1k_tokens: 0.00001
```
Search Types
Configure retrieval strategies for finding relevant documents.
```yaml
search_type:
  searchspace_type: categorical
  choices: ["similarity", "mmr", "bm25", "tfidf", "hybrid"]
```
Available Search Types
- similarity: Pure vector similarity search using cosine similarity
- mmr: Maximum Marginal Relevance for diversity in results
- bm25: Traditional keyword-based search algorithm
- tfidf: Term frequency-inverse document frequency ranking
- hybrid: Combines vector and keyword search for best results
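The "hybrid" idea can be sketched as a weighted sum of a vector-similarity score and a keyword score. This is only one common fusion scheme under our own naming; the tool's actual implementation may fuse the two signals differently.

```python
def hybrid_score(vector_score: float, keyword_score: float, alpha: float = 0.5) -> float:
    """alpha=1.0 is pure vector search, alpha=0.0 is pure keyword search."""
    return alpha * vector_score + (1 - alpha) * keyword_score

def hybrid_rank(docs, alpha=0.5):
    """Rank (doc_id, vector_score, keyword_score) tuples by the blended score."""
    return sorted(docs, key=lambda d: hybrid_score(d[1], d[2], alpha), reverse=True)

# A doc that wins on vector similarity vs. one that wins on keyword match:
docs = [("semantic_hit", 0.9, 0.1), ("keyword_hit", 0.2, 0.95)]
```

Sliding alpha from 1.0 to 0.0 moves the ranking from pure similarity search toward pure bm25/tfidf-style ranking, which is exactly the trade-off the optimizer explores when "hybrid" is in the choices list.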
Retrieval Settings
```yaml
k:
  searchspace_type: continuous
  bounds: [1, 20]
  dtype: int
```

Number of documents to retrieve. Range: 1-20. Lower values: more focused context. Higher values: broader context, which may include noise.
Reranking
Optionally rerank retrieved documents for better relevance.
```yaml
use_reranker:
  searchspace_type: boolean
  allow_multiple: true

reranker:
  searchspace_type: categorical
  choices:
    cross_encoder:
      model: "cross-encoder/ms-marco-MiniLM-L-6-v2"
      top_k: 5
    colbert:
      model: "colbert-ir/colbertv2.0"
      top_k: 5
    bge:
      model: "BAAI/bge-reranker-large"
      top_k: 5
```
Available Reranker Types
- cross_encoder: Cross-attention models for precise relevance scoring
- colbert: Late-interaction models for efficient reranking
- bge: BGE reranker models from the Beijing Academy of Artificial Intelligence
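Whichever model is chosen, the rerank step implied by the top_k setting has the same shape: score each retrieved document against the query and keep only the best top_k. In this sketch the word-overlap scorer is a deliberately toy stand-in for a real cross-encoder, ColBERT, or BGE model, and the helper names are ours.

```python
def rerank(query, docs, scorer, top_k=5):
    """Return the top_k documents ordered by descending relevance score."""
    return sorted(docs, key=lambda d: scorer(query, d), reverse=True)[:top_k]

def overlap_scorer(query, doc):
    # Toy relevance signal: count of query words appearing in the document.
    return len(set(query.lower().split()) & set(doc.lower().split()))
```

A real reranker replaces `overlap_scorer` with a model forward pass over (query, document) pairs; the surrounding sort-and-truncate logic stays the same.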
Language Models
Configure the LLM for generation. Multiple providers are supported.
OpenAI Models
```yaml
llm:
  searchspace_type: categorical
  choices:
    openai:
      api_key: "sk-proj-***"
      api_base: null
      models: ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo", "gpt-4o"]
      pricing:
        gpt-3.5-turbo:
          input: 0.0005
          output: 0.0015
        gpt-4o:
          input: 0.005
          output: 0.015
          cache_read: 0.0025
          cache_creation: 0.0125
```
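The input/output rates above are per 1k tokens, so a per-call cost estimate is a straightforward weighted sum. This is a sketch under our own naming; how the tool itself accounts for cached tokens is an assumption here.

```python
def llm_call_cost(pricing, input_tokens, output_tokens, cached_tokens=0):
    """Estimate one call's cost from per-1k-token rates like those in the config."""
    cost = (
        pricing["input"] * (input_tokens - cached_tokens) / 1000
        + pricing["output"] * output_tokens / 1000
    )
    if cached_tokens:
        # Cached prompt tokens are billed at the (cheaper) cache_read rate.
        cost += pricing.get("cache_read", pricing["input"]) * cached_tokens / 1000
    return cost

# Rates from the gpt-4o entry above.
gpt4o = {"input": 0.005, "output": 0.015, "cache_read": 0.0025, "cache_creation": 0.0125}
```

A call with 2000 input tokens and 500 output tokens costs about $0.0175 at these rates; if 1000 of the input tokens hit the prompt cache, the estimate drops to about $0.015.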
Anthropic Models
```yaml
anthropic:
  api_key: "sk-ant-api03-***"
  api_base: null
  models:
    - "claude-3-haiku-20240307"
    - "claude-3-7-sonnet-latest"
    - "claude-opus-4-1-20250805"
  pricing:
    claude-3-haiku-20240307:
      input: 0.00025
      output: 0.00125
    claude-opus-4-1-20250805:
      input: 0.015
      output: 0.075
```
Azure OpenAI
```yaml
azure:
  api_key: "***"
  api_base: "https://your-resource.openai.azure.com/"
  models: ["gpt-35-turbo", "gpt-4"]
  pricing:
    gpt-35-turbo:
      input: 0.0005
      output: 0.0015
```
DeepSeek Models
```yaml
deepseek:
  api_key: "sk-***"
  api_base: null
  models: ["deepseek-chat", "deepseek-coder"]
  pricing:
    deepseek-chat:
      input: 0.00014
      output: 0.00028
```
HuggingFace Models (Free)
```yaml
huggingface:
  api_key: "hf_***"
  api_base: null
  models:
    - "arnir0/Tiny-LLM"
    - "gpt2-medium"
    - "BEE-spoke-data/smol_llama-101M-GQA"
  pricing:
    gpt2-medium:
      input: 0.0
      output: 0.0
```
Complete Configuration Example
Here’s a complete configuration file with all sections:
```yaml
# Indexing parameters
chunk_size:
  searchspace_type: continuous
  bounds: [500, 2000]
  dtype: int

max_tokens:
  searchspace_type: continuous
  bounds: [100, 1000]
  dtype: int

chunk_overlap:
  searchspace_type: continuous
  bounds: [0, 500]
  dtype: int

temperature:
  searchspace_type: continuous
  bounds: [0.0, 1.0]
  dtype: float

# Vector store
vector_store:
  searchspace_type: categorical
  choices:
    faiss:
      api_key: null
      pricing:
        storage_per_gb_month: 0.0

# Search configuration
search_type:
  searchspace_type: categorical
  choices: ["similarity", "mmr", "hybrid"]

# Embeddings
embedding:
  searchspace_type: categorical
  choices:
    openai:
      api_key: "sk-proj-***"
      models: ["text-embedding-3-small"]
      pricing:
        text-embedding-3-small:
          cost_per_1k_tokens: 0.00002

# Retrieval
k:
  searchspace_type: continuous
  bounds: [1, 20]
  dtype: int

# Reranking
use_reranker:
  searchspace_type: boolean
  allow_multiple: true

# Generation
llm:
  searchspace_type: categorical
  choices:
    openai:
      api_key: "sk-proj-***"
      models: ["gpt-4o"]
      pricing:
        gpt-4o:
          input: 0.005
          output: 0.015
```
Replace all API keys in your configuration file with appropriate placeholders or environment variable references:

```yaml
api_key: "${OPENAI_API_KEY}"  # Reference environment variable
```
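The `${VAR}` substitution pattern can be sketched in a few lines; this is an assumed implementation for illustration, not necessarily how the tool expands variables internally.

```python
import os
import re

def expand_env(value: str) -> str:
    """Replace ${VAR} occurrences with os.environ values (missing vars become '')."""
    return re.sub(
        r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}",
        lambda m: os.environ.get(m.group(1), ""),
        value,
    )
```

Applying `expand_env` to every string in the loaded config keeps secrets out of the YAML file while leaving literal values untouched.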