The RAGPipelineManager is the heart of the optimization process. It orchestrates component loading, caching, and configuration sampling to efficiently evaluate thousands of RAG configurations.

Overview

The manager handles:
  • Component Loading: Initialize LLMs, embeddings, vector stores, and rerankers
  • Caching: Reuse components across configurations to save time and cost
  • Configuration Sampling: Generate RAG configs from the search space
  • Encoding/Decoding: Convert between RAGConfig objects and optimization tensors
  • Parallel Processing: Batch evaluation with thread pools
Important: You typically don’t need to interact with the RAG Manager directly. The Optimizer class handles it automatically.
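The "Parallel Processing" bullet above can be sketched with a standard thread pool, mirroring the `max_workers` parameter shown in the examples on this page. The `evaluate` function here is a hypothetical stand-in for building and scoring one configuration; it is not part of the rag-opt API.

```python
# Illustrative sketch only: batch-evaluating configs with a thread pool,
# the same pattern the manager uses internally for parallel evaluation.
from concurrent.futures import ThreadPoolExecutor

def evaluate(config):
    # Placeholder for creating a RAG instance from `config` and scoring it
    return config["chunk_size"] / 1024

configs = [{"chunk_size": s} for s in (256, 512, 1024)]

with ThreadPoolExecutor(max_workers=5) as pool:
    # pool.map preserves input order, so scores line up with configs
    scores = list(pool.map(evaluate, configs))
```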

How It Works

The manager operates in several key phases:

1. Component Initialization

When created, the manager binds the search space and sets its loading and parallelism strategy:
from rag_opt.search_space import RAGSearchSpace
from rag_opt import RAGPipelineManager

search_space = RAGSearchSpace.from_yaml("./rag_config.yaml")

manager = RAGPipelineManager(
    search_space=search_space,
    eager_load=False,  # Load components on-demand
    max_workers=5      # Parallel workers
)

2. Lazy Loading & Caching

Components are loaded once and cached:
First Request: LLM(gpt-3.5) → Initialize → Cache
Second Request: LLM(gpt-3.5) → Return from Cache ✓
This dramatically reduces:
  • API initialization overhead
  • Memory usage
  • Evaluation time
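The lazy-load-and-cache behavior described above can be sketched as a simple keyed cache. The `ComponentCache` class and its `get` method are hypothetical stand-ins for illustration; the real manager keys its cache on component type and model identity in a similar way.

```python
# Illustrative sketch of lazy loading with caching: each component is
# initialized at most once, then served from the cache on repeat requests.
class ComponentCache:
    def __init__(self):
        self._cache = {}
        self.loads = 0  # counts actual initializations, not lookups

    def get(self, kind, name):
        key = (kind, name)
        if key not in self._cache:
            self.loads += 1
            # Stand-in for constructing a real client (LLM, embeddings, ...)
            self._cache[key] = f"{kind}:{name}"
        return self._cache[key]

cache = ComponentCache()
a = cache.get("llm", "gpt-3.5")  # first request: initialize and cache
b = cache.get("llm", "gpt-3.5")  # second request: served from cache
```

Because the cached object is shared, repeated configurations that select the same LLM or embedding model skip re-initialization entirely.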

3. Configuration Sampling

The manager samples configurations from the search space:
# Sample RAG configurations
from rag_opt import SamplerType, RAGPipelineManager
from rag_opt.search_space import RAGSearchSpace

search_space = RAGSearchSpace.from_yaml("./rag_config.yaml")
manager = RAGPipelineManager(
    search_space=search_space,
    eager_load=False,  # Load components on-demand
    max_workers=5      # Parallel workers
)

configs = manager.sample(
    n_samples=2,
    sampler_type=SamplerType.SOBOL
)

# Each config contains:
# - chunk_size, chunk_overlap, max_tokens
# - search_type, k
# - LLM, embeddings, vector store selections
# - temperature, reranker settings

4. Encoding for Optimization

Converts RAGConfig ↔ Tensor for Bayesian Optimization:

configs = manager.sample(
    n_samples=2,
    sampler_type=SamplerType.SOBOL
)
config = configs[0]

# Config → PyTorch tensor (for the optimizer)
tensor = manager.encode_rag_config_to_tensor(config)

# Tensor → Config (decode optimizer output)
config = manager.decode_sample_to_rag_config(tensor)

This allows the optimizer to work in a continuous space while evaluating discrete configurations.
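A minimal sketch of how a mixed search space can round-trip through a numeric vector. The parameter names mirror those listed on this page, but the choice lists and the index-scaling scheme below are assumptions for illustration, not rag-opt's exact encoding.

```python
# Illustrative encode/decode round trip: discrete and categorical choices
# become indices scaled into [0, 1]; continuous values pass through.
CHUNK_SIZES = [256, 512, 1024]   # discrete choices (assumed values)
LLMS = ["gpt-3.5", "gpt-4"]      # categorical choices (assumed values)

def encode(config):
    return [
        CHUNK_SIZES.index(config["chunk_size"]) / (len(CHUNK_SIZES) - 1),
        LLMS.index(config["llm"]) / (len(LLMS) - 1),
        config["temperature"],   # already continuous in [0, 1]
    ]

def decode(vec):
    # Snap the optimizer's continuous output back to the nearest valid choice
    return {
        "chunk_size": CHUNK_SIZES[round(vec[0] * (len(CHUNK_SIZES) - 1))],
        "llm": LLMS[round(vec[1] * (len(LLMS) - 1))],
        "temperature": vec[2],
    }

cfg = {"chunk_size": 512, "llm": "gpt-4", "temperature": 0.2}
roundtrip = decode(encode(cfg))
```

The rounding step in `decode` is what lets a Bayesian optimizer propose points anywhere in the continuous cube while every evaluated configuration remains a valid discrete choice.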

5. RAG Instance Creation

Creates RAGWorkflow instances with cached components:
rag = manager.create_rag_instance(
    config=rag_config,
    documents=train_docs,
    initialize=True
)

# rag now contains:
# - Cached LLM
# - Cached embeddings
# - Fresh vector store (per config)
# - Cached reranker (if enabled)

Integration with Optimizer

The optimizer uses a manager internally. By default it creates one automatically, but you can also pass your own:
from rag_opt.optimizer import Optimizer
from rag_opt import RAGPipelineManager

my_manager = RAGPipelineManager(
    search_space=search_space,
    eager_load=False,
    max_workers=5
)
optimizer = Optimizer(
    train_dataset=train_dataset,
    config_path="rag_config.yaml",
    verbose=True,
    custom_rag_pipeline_manager=my_manager
)
# If no custom manager is passed, one is created automatically;
# either way, optimizer.rag_pipeline_manager is ready to use

Create Custom Manager

class MyManager(AbstractRAGPipelineManager):
    """Create a custom pipeline manager."""
    # NOTE: you must implement every abstract method. For the full list, see
    # https://github.com/GaiaAI-Hub/rag-opt/blob/main/src/rag_opt/_manager.py
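As a self-contained sketch of the subclassing pattern, the example below stands in a minimal ABC for the base class, since the exact abstract interface lives in `rag_opt/_manager.py`. Only `sample` is shown because it appears elsewhere on this page; the signature and the stand-in base class are assumptions.

```python
# Illustrative sketch: overriding a manager method in a custom subclass.
from abc import ABC, abstractmethod

class AbstractRAGPipelineManager(ABC):  # stand-in for rag-opt's base class
    @abstractmethod
    def sample(self, n_samples, sampler_type):
        ...

class MyManager(AbstractRAGPipelineManager):
    def sample(self, n_samples, sampler_type):
        # Return fixed configs instead of sampling the search space,
        # e.g. to replay a known-good set of configurations
        return [{"chunk_size": 512}] * n_samples

manager = MyManager()
configs = manager.sample(3, None)
```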