Note: Larger configurations with more choices slow down optimization and require more trials to reach good results.

Introduction

The RAG system uses a YAML-based configuration file to define all aspects of your retrieval-augmented generation pipeline. This includes indexing parameters, vector stores, embeddings, retrieval strategies, and language models.

Configuration Structure

Your configuration file consists of several main sections:
  • Indexing Parameters: Control how documents are chunked and processed
  • Vector Stores: Choose and configure your vector database
  • Embeddings: Select embedding models for document representation
  • Search Configuration: Define retrieval strategies
  • Language Models: Configure generation models
  • Retrieval Settings: Fine-tune the number of retrieved documents

Search Space Types

Each parameter has a searchspace_type that defines how it can be configured:

continuous (numeric parameters with a range of values):

    chunk_size:
      searchspace_type: continuous
      bounds: [500, 2000]
      dtype: int

categorical (parameters with predefined choices):

    vector_store:
      searchspace_type: categorical
      choices:
        faiss: {...}
        chroma: {...}

boolean (true/false parameters):

    use_reranker:
      searchspace_type: boolean
      allow_multiple: true
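The three search-space types map naturally onto a sampler that draws one concrete value per parameter. A minimal sketch, assuming the dictionary shapes shown above (the `sample` helper is illustrative, not part of the system's API):

```python
import random

def sample(spec):
    """Draw one value from a search-space definition."""
    kind = spec["searchspace_type"]
    if kind == "continuous":
        lo, hi = spec["bounds"]
        value = random.uniform(lo, hi)
        return int(value) if spec.get("dtype") == "int" else value
    if kind == "categorical":
        # choices may be a list or a mapping of name -> sub-config
        return random.choice(list(spec["choices"]))
    if kind == "boolean":
        return random.choice([True, False])
    raise ValueError(f"unknown searchspace_type: {kind}")

chunk_size = sample({"searchspace_type": "continuous",
                     "bounds": [500, 2000], "dtype": "int"})
```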

Indexing Parameters

Configure how your documents are processed and chunked.
chunk_size (int, default: 512)
    Size of text chunks in characters. Range: 200-2000. Recommendation: 500-1000 for most use cases.

chunk_overlap (int, default: 50)
    Overlap between consecutive chunks in characters. Range: 0-500. Recommendation: 10-20% of chunk_size.

max_tokens (int, default: 500)
    Maximum tokens for generation. Range: 100-1000.

temperature (float, default: 0.7)
    Temperature for generation randomness. Range: 0.0-1.0. Lower values are more deterministic; higher values are more creative.
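To see how chunk_size and chunk_overlap interact, here is a minimal character-based chunker; a sketch only, since the pipeline's actual splitter may differ:

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 50):
    """Split text into chunks of up to chunk_size characters, each
    overlapping the previous chunk by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each chunk starts `step` chars later
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("a" * 1200, chunk_size=512, chunk_overlap=50)
```

With a 50-character overlap on 512-character chunks, consecutive chunks repeat about 10% of their content, matching the recommendation above.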

Example

chunk_size:
  searchspace_type: continuous
  bounds: [500, 2000]
  dtype: int

chunk_overlap:
  searchspace_type: continuous
  bounds: [0, 500]
  dtype: int

temperature:
  searchspace_type: continuous
  bounds: [0.0, 1.0]
  dtype: float

Vector Stores

Choose from multiple vector database options, each with different pricing models:
  • FAISS: Local, free vector store with no cloud costs
  • Chroma: Open-source vector database with persistent storage
  • Pinecone: Managed vector database with cloud pricing
  • Weaviate: Open-source vector database with cloud version available

Configuration Structure

vector_store:
  searchspace_type: categorical
  choices:
    <store_name>:
      api_key: "your_api_key"
      api_base: null
      index_name: "your_index"
      cloud_config: null
      pricing:
        storage_per_gb_month: 0.0
        read_operations_per_1k: 0.0
        write_operations_per_1k: 0.0
        query_per_1k: 0.0

Examples

    faiss:
      api_key: null
      api_base: null
      index_name: null
      cloud_config: null
      pricing:
        storage_per_gb_month: 0.0
        read_operations_per_1k: 0.0
        write_operations_per_1k: 0.0
        query_per_1k: 0.0

Embeddings

Select embedding models to convert text into vector representations.

Supported Providers

  • OpenAI: High-quality embeddings with various model sizes
  • HuggingFace: Free, open-source embedding models
  • Sentence Transformers: Optimized models for semantic similarity
  • Claude: Anthropic’s embedding models

OpenAI Embeddings

embedding:
  searchspace_type: categorical
  choices:
    openai:
      api_key: "sk-proj-***"
      api_base: null
      models:
        - "text-embedding-ada-002"
        - "text-embedding-3-small"
        - "text-embedding-3-large"
      pricing:
        text-embedding-ada-002:
          cost_per_1k_tokens: 0.0001
        text-embedding-3-small:
          cost_per_1k_tokens: 0.00002
        text-embedding-3-large:
          cost_per_1k_tokens: 0.00013
Recommendation: text-embedding-3-small offers the best balance of cost and performance
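With per-1k-token pricing, estimating embedding cost is a one-liner. For example, embedding a 250,000-token corpus at the text-embedding-3-small rate from the config above:

```python
def embedding_cost(total_tokens: int, cost_per_1k_tokens: float) -> float:
    """Estimated cost of embedding total_tokens at a per-1k-token rate."""
    return total_tokens / 1000 * cost_per_1k_tokens

cost = embedding_cost(250_000, 0.00002)
# → 0.005 (half a cent for 250k tokens)
```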

HuggingFace Embeddings

huggingface:
  api_key: "hf_***"
  api_base: null
  models:
    - "all-MiniLM-L6-v2"
    - "sentence-transformers/all-mpnet-base-v2"
  pricing:
    all-MiniLM-L6-v2:
      cost_per_1k_tokens: 0.0
    sentence-transformers/all-mpnet-base-v2:
      cost_per_1k_tokens: 0.0
HuggingFace models are free for local or self-hosted deployments

Sentence Transformers

sentence-transformers:
  api_key: null
  api_base: null
  models:
    - "all-MiniLM-L6-v2"
    - "multi-qa-mpnet-base-dot-v1"
    - "paraphrase-multilingual-mpnet-base-v2"
  pricing:
    all-MiniLM-L6-v2:
      cost_per_1k_tokens: 0.0

Claude Embeddings

claude:
  api_key: "sk-ant-api03-***"
  api_base: null
  models:
    - "claude-3-embedding"
  pricing:
    claude-3-embedding:
      cost_per_1k_tokens: 0.00001

Search Types

Configure retrieval strategies for finding relevant documents.
search_type:
  searchspace_type: categorical
  choices: ["similarity", "mmr", "bm25", "tfidf", "hybrid"]

Available Search Types

  • similarity: Pure vector similarity search using cosine similarity
  • mmr: Maximum Marginal Relevance for diversity in results
  • bm25: Traditional keyword-based search algorithm
  • tfidf: Term frequency-inverse document frequency ranking
  • hybrid: Combines vector and keyword search for best results
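One common way to combine vector and keyword scores, sketched below, is to min-max-normalize each score list and take a weighted sum; the weights and normalization here are illustrative assumptions, as the config does not specify the fusion method:

```python
def normalize(scores):
    """Min-max normalize a list of scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(vector_scores, bm25_scores, alpha=0.5):
    """Per-document weighted sum of normalized vector and keyword scores."""
    v, b = normalize(vector_scores), normalize(bm25_scores)
    return [alpha * vs + (1 - alpha) * bs for vs, bs in zip(v, b)]

# Document 0 is best by vector search, document 2 by BM25:
combined = hybrid_scores([0.9, 0.4, 0.1], [2.0, 5.0, 9.0], alpha=0.5)
```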

Retrieval Settings

k:
  searchspace_type: continuous
  bounds: [1, 20]
  dtype: int
k (int, default: 5)
    Number of documents to retrieve. Range: 1-20. Lower values give more focused context; higher values give broader context but may include noise.

Reranking

Optionally rerank retrieved documents for better relevance.
use_reranker:
  searchspace_type: boolean
  allow_multiple: true

reranker:
  searchspace_type: categorical
  choices:
    cross_encoder:
      model: "cross-encoder/ms-marco-MiniLM-L-6-v2"
      top_k: 5
    colbert:
      model: "colbert-ir/colbertv2.0"
      top_k: 5
    bge:
      model: "BAAI/bge-reranker-large"
      top_k: 5

Available Reranker Types

  • cross_encoder: Cross-attention models for precise relevance scoring
  • colbert: Late interaction models for efficient reranking
  • bge: BGE reranker models from the Beijing Academy of Artificial Intelligence (BAAI)
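Whichever reranker type is chosen, the flow is the same: score each (query, document) pair, sort by score, and keep the top_k. A sketch with a stand-in term-overlap scorer (a real deployment would call the configured cross-encoder, ColBERT, or BGE model instead):

```python
def rerank(query, documents, score_fn, top_k=5):
    """Score each (query, doc) pair and return the top_k documents."""
    scored = [(score_fn(query, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

# Stand-in scorer: count of shared terms between query and document.
def overlap_score(query, doc):
    return len(set(query.lower().split()) & set(doc.lower().split()))

docs = ["vector databases store embeddings",
        "the weather is sunny",
        "embeddings are vectors"]
top = rerank("vector embeddings", docs, overlap_score, top_k=2)
```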

Language Models

Configure the LLM for generation. Multiple providers are supported.

OpenAI Models

llm:
  searchspace_type: categorical
  choices:
    openai:
      api_key: "sk-proj-***"
      api_base: null
      models: ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo", "gpt-4o"]
      pricing:
        gpt-3.5-turbo:
          input: 0.0005
          output: 0.0015
        gpt-4o:
          input: 0.005
          output: 0.015
          cache_read: 0.0025
          cache_creation: 0.0125
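LLM pricing is quoted per 1k tokens, split into input and output rates (cache_read and cache_creation rates, where present, price cached-prompt tokens separately and are omitted from this sketch). A request's cost at the gpt-4o rates above:

```python
def llm_cost(input_tokens: int, output_tokens: int, pricing: dict) -> float:
    """Cost of one request given per-1k-token input/output rates."""
    return (input_tokens / 1000 * pricing["input"]
            + output_tokens / 1000 * pricing["output"])

gpt4o = {"input": 0.005, "output": 0.015}
cost = llm_cost(input_tokens=3000, output_tokens=500, pricing=gpt4o)
# → 3 * 0.005 + 0.5 * 0.015 = 0.0225
```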

Anthropic Models

anthropic:
  api_key: "sk-ant-api03-***"
  api_base: null
  models:
    - "claude-3-haiku-20240307"
    - "claude-3-7-sonnet-latest"
    - "claude-opus-4-1-20250805"
  pricing:
    claude-3-haiku-20240307:
      input: 0.00025
      output: 0.00125
    claude-opus-4-1-20250805:
      input: 0.015
      output: 0.075

Azure OpenAI

azure:
  api_key: "***"
  api_base: "https://your-resource.openai.azure.com/"
  models: ["gpt-35-turbo", "gpt-4"]
  pricing:
    gpt-35-turbo:
      input: 0.0005
      output: 0.0015

DeepSeek Models

deepseek:
  api_key: "sk-***"
  api_base: null
  models: ["deepseek-chat", "deepseek-coder"]
  pricing:
    deepseek-chat:
      input: 0.00014
      output: 0.00028

HuggingFace Models (Free)

huggingface:
  api_key: "hf_***"
  api_base: null
  models:
    - "arnir0/Tiny-LLM"
    - "gpt2-medium"
    - "BEE-spoke-data/smol_llama-101M-GQA"
  pricing:
    gpt2-medium:
      input: 0.0
      output: 0.0

Complete Configuration Example

Here’s a complete configuration file with all sections:
# Indexing parameters
chunk_size:
  searchspace_type: continuous
  bounds: [500, 2000]
  dtype: int

max_tokens:
  searchspace_type: continuous
  bounds: [100, 1000]
  dtype: int

chunk_overlap:
  searchspace_type: continuous
  bounds: [0, 500]
  dtype: int

temperature:
  searchspace_type: continuous
  bounds: [0.0, 1.0]
  dtype: float

# Vector store
vector_store:
  searchspace_type: categorical
  choices:
    faiss:
      api_key: null
      pricing:
        storage_per_gb_month: 0.0

# Search configuration
search_type:
  searchspace_type: categorical
  choices: ["similarity", "mmr", "hybrid"]

# Embeddings
embedding:
  searchspace_type: categorical
  choices:
    openai:
      api_key: "sk-proj-***"
      models: ["text-embedding-3-small"]
      pricing:
        text-embedding-3-small:
          cost_per_1k_tokens: 0.00002

# Retrieval
k:
  searchspace_type: continuous
  bounds: [1, 20]
  dtype: int

# Reranking
use_reranker:
  searchspace_type: boolean
  allow_multiple: true

# Generation
llm:
  searchspace_type: categorical
  choices:
    openai:
      api_key: "sk-proj-***"
      models: ["gpt-4o"]
      pricing:
        gpt-4o:
          input: 0.005
          output: 0.015
Replace all API keys in your configuration file with appropriate placeholders or environment variable references:
api_key: "${OPENAI_API_KEY}" # Reference environment variable
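If you load the configuration in Python, the standard library's os.path.expandvars performs exactly this ${VAR} substitution on the raw text before YAML parsing; a minimal sketch (the parsing step itself is left to whatever YAML parser the project uses, and the demo key value is made up):

```python
import os

def expand_config(raw_text: str) -> str:
    """Substitute ${VAR} environment references in the raw config text.
    Unset variables are left unchanged rather than replaced with ''."""
    return os.path.expandvars(raw_text)

os.environ["OPENAI_API_KEY"] = "sk-proj-example"  # demo value only
expanded = expand_config('api_key: "${OPENAI_API_KEY}"')
# → 'api_key: "sk-proj-example"'
```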