Self-Hosted

This guide walks you through installing and running Simba on your local system using pip, Git, or Docker. Choose the method that suits you best: if you want to use the SDK for free, we recommend the pip installation; if you want more control over the source code, we recommend installing the full system from source; if you want a prebuilt solution, we recommend Docker.

Installation Methods

pip (SDK & Core)
Source (Full System)
Docker

Install simba-core

simba-core is the PyPI package that contains the server logic and API. It must be running before you can use the SDK.

pip install simba-core

To install the dependencies faster, we recommend using uv:

pip install uv 
uv pip install simba-core
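
To confirm that the installation succeeded before moving on, you can query the installed distribution from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of Simba.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "simba-core" is the distribution name used in the pip command above.
found = installed_version("simba-core")
print(f"simba-core: {found if found else 'not installed'}")
```

If this prints "not installed", check that you ran pip inside the same virtual environment you are running Python from.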

Create a config.yaml file

The config.yaml file is one of the most important files in this setup. It configures the embedding model, vector store type, retrieval strategy, database, Celery worker for parsing, and the LLM you are using. Go to your project root and create or modify config.yaml; you can use the example below as a starting point.

project:
  name: "Simba"
  version: "1.0.0"
  api_version: "/api/v1"

paths:
  base_dir: null  # Will be set programmatically
  faiss_index_dir: "vector_stores/faiss_index"
  vector_store_dir: "vector_stores"

llm:
  provider: "openai" #OPTIONS:ollama,openai
  model_name: "gpt-4o-mini"
  temperature: 0.0
  max_tokens: null
  streaming: true
  additional_params: {}

embedding:
  provider: "huggingface"
  model_name: "BAAI/bge-base-en-v1.5"
  device: "cpu"  # OPTIONS: cpu,cuda,mps
  additional_params: {}

vector_store:
  provider: "faiss"
  collection_name: "simba_collection"

  additional_params: {}

chunking:
  chunk_size: 512
  chunk_overlap: 200

retrieval:
  method: "hybrid" # OPTIONS: default, semantic, keyword, hybrid, ensemble, reranked
  k: 5
  # Method-specific parameters
  params:
    # Semantic retrieval parameters
    score_threshold: 0.5
    
    # Hybrid retrieval parameters
    prioritize_semantic: true
    
    # Ensemble retrieval parameters
    weights: [0.7, 0.3]  # Weights for semantic and keyword retrievers
    
    # Reranking parameters
    reranker_model: colbert
    reranker_threshold: 0.7

# Database configuration
database:
  provider: litedb # Options: litedb, sqlite
  additional_params: {}

celery: 
  broker_url: ${CELERY_BROKER_URL:-redis://redis:6379/0}
  result_backend: ${CELERY_RESULT_BACKEND:-redis://redis:6379/1}
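
The two celery entries use the shell-style `${NAME:-default}` form: the value of the environment variable if it is set and non-empty, otherwise the text after `:-`. The sketch below illustrates that rule in Python. It assumes Simba resolves the placeholders with these shell semantics; the `expand` helper is hypothetical and is not Simba's own loader.

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")


def expand(value, env=None):
    """Resolve ${NAME:-default} placeholders against env (defaults to os.environ)."""
    env = os.environ if env is None else env

    def _sub(match):
        name, default = match.group(1), match.group(2)
        # ":-" semantics: fall back to the default when the variable is unset or empty.
        return env.get(name) or (default if default is not None else "")

    return _PLACEHOLDER.sub(_sub, value)


template = "${CELERY_BROKER_URL:-redis://redis:6379/0}"
print(expand(template, env={}))
# redis://redis:6379/0
print(expand(template, env={"CELERY_BROKER_URL": "redis://localhost:6379/0"}))
# redis://localhost:6379/0
```

Note that the default host in the example config is `redis`, which usually only resolves inside a Docker network. For a pip installation, setting CELERY_BROKER_URL and CELERY_RESULT_BACKEND to a localhost address overrides it.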

The config file must be in the directory where you run simba; otherwise the setup will not work.
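
To get a feel for how `chunk_size` and `chunk_overlap` in the chunking section interact, here is a minimal sliding-window sketch. It counts characters and is only an illustration: Simba's actual splitter may count tokens or split on separators, so the exact boundaries can differ. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=512, chunk_overlap=200):
    """Split text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # 312 with the values from the example config
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "x" * 1000
print([len(c) for c in chunk_text(sample)])
# [512, 512, 376]
```

With an overlap of 200 on a size of 512, each new chunk advances by only 312 characters, so roughly 40% of every chunk repeats the previous one. A larger overlap preserves more context across chunk boundaries at the cost of a larger index.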

Create .env file

If you use OpenAI, Mistral AI, or Ollama, or want to log chatbot traces with LangSmith, specify the corresponding settings in your .env file:

OPENAI_API_KEY=your_openai_api_key #(optional)
MISTRAL_API_KEY=your_mistral_api_key #(optional)
LANGCHAIN_TRACING_V2=true #(optional)
LANGCHAIN_API_KEY=your_langchain_api_key #(optional)
REDIS_HOST=localhost
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/1
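
Before starting the server, it can help to check that the .env file contains the keys your configuration relies on. The sketch below parses KEY=VALUE lines with the standard library only. Which keys count as required depends on your config.yaml (for example, OPENAI_API_KEY when the LLM provider is openai), so the list passed to `missing_keys` is an assumption, and both helper names are hypothetical.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks, full-line comments, and trailing comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Treat " #" as the start of an inline comment (a simplification: no quoting support).
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_keys(values, required):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


env_text = """\
OPENAI_API_KEY=your_openai_api_key #(optional)
REDIS_HOST=localhost
CELERY_BROKER_URL=redis://localhost:6379/0
"""
values = parse_env(env_text)
print(missing_keys(values, ["OPENAI_API_KEY", "CELERY_BROKER_URL", "CELERY_RESULT_BACKEND"]))
# ['CELERY_RESULT_BACKEND']
```

To check a real file, read it with `open(".env").read()` and pass the result to `parse_env`.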

Run the server

Now that you have your .env and config.yaml files, run the following command:

simba server

This will start the server at http://localhost:8000. You will see log messages like the following in the console:

Starting Simba server...
INFO:     Started server process [62940]
INFO:     Waiting for application startup.
2025-03-12 16:42:50 - simba.__main__ - INFO - ==================================================
2025-03-12 16:42:50 - simba.__main__ - INFO - Starting SIMBA Application
2025-03-12 16:42:50 - simba.__main__ - INFO - ==================================================
2025-03-12 16:42:50 - simba.__main__ - INFO - Project Name: Simba
2025-03-12 16:42:50 - simba.__main__ - INFO - Version: 1.0.0
2025-03-12 16:42:50 - simba.__main__ - INFO - LLM Provider: openai
2025-03-12 16:42:50 - simba.__main__ - INFO - LLM Model: gpt-4o
2025-03-12 16:42:50 - simba.__main__ - INFO - Embedding Provider: huggingface
2025-03-12 16:42:50 - simba.__main__ - INFO - Embedding Model: BAAI/bge-base-en-v1.5
2025-03-12 16:42:50 - simba.__main__ - INFO - Embedding Device: mps
2025-03-12 16:42:50 - simba.__main__ - INFO - Vector Store Provider: faiss
2025-03-12 16:42:50 - simba.__main__ - INFO - Database Provider: litedb
2025-03-12 16:42:50 - simba.__main__ - INFO - Retrieval Method: hybrid
2025-03-12 16:42:50 - simba.__main__ - INFO - Retrieval Top-K: 5
2025-03-12 16:42:50 - simba.__main__ - INFO - Base Directory: /Users/mac/Documents/simba
2025-03-12 16:42:50 - simba.__main__ - INFO - Upload Directory: /Users/mac/Documents/simba/uploads
2025-03-12 16:42:50 - simba.__main__ - INFO - Vector Store Directory: /Users/mac/Documents/simba/vector_stores
2025-03-12 16:42:50 - simba.__main__ - INFO - ==================================================
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
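
If you want to verify from a script that the server is accepting connections, a plain TCP check is enough and does not depend on any particular API route. This is a minimal sketch; the helper name `is_listening` is hypothetical, and the host and port are the ones shown in the log output above.

```python
import socket


def is_listening(host="localhost", port=8000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


print("Simba reachable:", is_listening())
```

A result of False usually means the server has not finished starting, exited with an error, or is bound to a different port.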

Install SDK

You can now install the SDK to start using Simba in local mode:

pip install simba-client

Basic usage

from simba_sdk import SimbaClient

# Connect to the locally running Simba server
client = SimbaClient(api_url="http://localhost:8000")

# Upload a document and keep its ID
document = client.documents.create(file_path="path/to/your/document.pdf")
document_id = document[0]["id"]

# Parse the document synchronously with the docling parser
parsing_result = client.parser.parse_document(document_id, parser="docling", sync=True)

# Retrieve the chunks most relevant to a query
retrieval_results = client.retriever.retrieve(query="your-query")

for result in retrieval_results["documents"]:
    print(f"Content: {result['page_content']}")
    print(f"Metadata: {result['metadata']['source']}")
    print("====" * 10)

Dependencies

Simba has the following key dependencies:

Core Dependencies

FastAPI: Web framework for the backend API
Ollama: For running the LLM inference (optional)
Redis: For caching and task queues
PostgreSQL: For database interactions
Celery: Distributed task queue for background processing
Pydantic: Data validation and settings management

Vector Store Support

FAISS: Facebook AI Similarity Search for efficient vector storage
Chroma: ChromaDB integration for document embeddings
Pinecone (optional): For cloud-based vector storage
Milvus (optional): For distributed vector search

Embedding Models

OpenAI: For text embeddings
HuggingFace Transformers (optional): For text processing

Frontend

React: UI library
TypeScript: For type-safe JavaScript
Vite: Frontend build tool
Tailwind CSS: Utility-first CSS framework

Troubleshooting

to be added…

Next Steps

Once you have Simba installed, proceed to:

Getting Started

Core Concepts
