Cognitor: open-source semantic search engine. Automatically chunks, embeds and indexes the content of a target folder, making it searchable semantically.
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
https://github.com/tanaos/cognitor
Cognitor is an open-source semantic search engine and vector database which automatically chunks, embeds and indexes the entire content of a target folder (and its subfolders), making it easily searchable by both AI agents and humans. Processing happens 100% locally by default, via sentence-transformers.
It provides a simple REST API to query the indexed data via natural language, and can be used as a standalone semantic search engine, a vector database, or as a backend for your applications.
How does it work?
Cognitor consists of two main components:
- Search engine: a vector database which stores document embeddings, full text and metadata, and provides a simple REST API to query the indexed information.
- Worker: a background process that monitors a specified folder for changes, automatically chunks and embeds the content of the files, and updates the vector database accordingly.
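To make the chunk, embed and search flow above more concrete, here is a toy sketch in plain Python. It is not Cognitor's implementation: the post does not describe the chunking strategy or the similarity metric, so the fixed-size overlapping character windows, the cosine ranking, and the names `chunk_text`, `cosine` and `top_k` are all assumptions made for illustration. The vectors are hand-written stand-ins for the embeddings a sentence-transformers model would produce.

```python
import math


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (hypothetical strategy)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once the current window already reaches the end of the text.
        if start + size >= len(text):
            break
    return chunks


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 3):
    """Return the k (score, chunk) pairs most similar to query_vec, best first."""
    scored = [(cosine(query_vec, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    print(chunk_text("abcdefghij", size=4, overlap=2))
    # Toy 2-dimensional "embeddings"; a real model would produce these vectors.
    toy_index = [("chunk a", [1.0, 0.0]), ("chunk b", [0.0, 1.0]), ("chunk c", [1.0, 1.0])]
    print(top_k([1.0, 0.0], toy_index, k=2))
```

In a real deployment the worker would do the chunking and embedding, and the search engine would store the vectors and answer queries, so none of this code is needed to use Cognitor.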
How to use?
1. Clone the repo
    git clone https://github.com/tanaos/cognitor.git
    cd cognitor

2. Start search engine + worker
Configure the following environment variables in your .env file (at the root of the project):
    # Absolute path on your host machine to ingest
    DOCS_FOLDER=/path/to/your/docs
    # Name of the collection in which the worker will store the indexed documents
    COGNITOR_COLLECTION_NAME=cognitor-worker-documents

Start both the search engine and the worker with:

    docker compose --profile worker up -d

3. Integrate with your applications
We provide SDKs for:
Alternatively, you can use any HTTP client to interact with the REST API exposed at http://localhost:7530, or explore it through the Swagger UI at http://localhost:7530/docs.
Sample Python integration
Install the SDK:
    pip install cognitor

Use it in your code:

    from cognitor import Cognitor

    with Cognitor("http://localhost:7530") as client:
        # Check if the search engine is ready to accept requests
        print(client.health_ready())  # "ready" or "loading"

        # Search by text query
        response = client.search("my-collection", query_text="Hello", top_k=10)
        print(response)

See the Python SDK page for more examples and documentation.
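Since the sample notes that `health_ready()` reports either "ready" or "loading", an application may want to wait for the engine before issuing its first search. The helper below is a minimal sketch of that idea, not part of the SDK: `wait_until_ready` is a hypothetical name, and it assumes the check returns the plain string "ready" as the comment in the sample suggests. It takes any zero-argument callable, so it can be tried without a running server.

```python
import time
from typing import Callable


def wait_until_ready(check: Callable[[], str],
                     timeout: float = 60.0,
                     interval: float = 1.0) -> bool:
    """Poll check() until it returns "ready" or the timeout elapses.

    Returns True if the service became ready in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check() == "ready":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Stub standing in for client.health_ready; reports "loading" twice, then "ready".
    states = iter(["loading", "loading", "ready"])
    print(wait_until_ready(lambda: next(states), timeout=5.0, interval=0.0))
```

With the SDK installed this would presumably be called as `wait_until_ready(client.health_ready)` inside the `with Cognitor(...)` block, before the first `client.search(...)`.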