Python SDK
Cosdata Python SDK
A Python SDK for interacting with the Cosdata Vector Database.
Installation
You can install the Cosdata Python SDK from PyPI:
pip install cosdata-client
Basic Usage
Connecting to Cosdata
First, import the Cosdata client and establish a connection:
from cosdata import Client
# Initialize the client with your server details
client = Client(
    host="http://127.0.0.1:8443",  # Default host
    username="admin",              # Default username
    password="admin",              # Default password
    verify=False                   # SSL verification
)
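In practice you may not want credentials hard-coded in source. The following is a minimal sketch of reading the connection settings from environment variables. The COSDATA_* variable names and the client_kwargs_from_env helper are hypothetical and not part of the SDK; the fallback values are the defaults shown in the example above.

```python
import os


def client_kwargs_from_env(env=None):
    """Build keyword arguments for Client(...) from environment variables.

    The COSDATA_* names are hypothetical (not defined by the SDK); the
    fallbacks mirror the defaults in the connection example.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("COSDATA_HOST", "http://127.0.0.1:8443"),
        "username": env.get("COSDATA_USERNAME", "admin"),
        "password": env.get("COSDATA_PASSWORD", "admin"),
        # Only "1", "true", or "yes" (case-insensitive) enable SSL verification.
        "verify": env.get("COSDATA_VERIFY", "false").lower() in {"1", "true", "yes"},
    }


# Usage (requires the SDK and a running server):
# from cosdata import Client
# client = Client(**client_kwargs_from_env())
```

Passing a plain dict instead of os.environ makes the helper easy to exercise in tests.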
Creating a Collection
Create a new vector collection:
# Create a collection for storing 768-dimensional vectors
collection = client.create_collection(
    name="my_collection",
    dimension=768,  # Vector dimension
    description="My vector collection"
)
Creating an Index
Create an index for efficient vector search:
# Create an index with custom parameters
index = collection.create_index(
    distance_metric="cosine",      # Default: cosine
    num_layers=10,                 # API Reference signature default: 7
    max_cache_size=1000,           # Default: 1000
    ef_construction=128,           # API Reference signature default: 512
    ef_search=64,                  # API Reference signature default: 256
    neighbors_count=32,            # Default: 32
    level_0_neighbors_count=64     # Default: 64
)
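For intuition about the "cosine" distance metric, here is a plain-Python sketch of cosine similarity between two vectors. It illustrates the standard formula only; it is not the database's implementation, and the function name is chosen for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction give a value of 1, orthogonal vectors give 0, and opposite vectors give -1, regardless of their lengths.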
Adding Vectors
Add vectors to your collection using transactions:
# Generate some example vectors
import numpy as np

def generate_random_vector(id: int, dimension: int) -> dict:
    values = np.random.uniform(-1, 1, dimension).tolist()
    return {
        "id": f"vec_{id}",
        "dense_values": values,
        "document_id": f"doc_{id//10}",  # Group vectors into documents
        "metadata": {  # Optional metadata
            "created_at": "2024-03-20",
            "category": "example"
        }
    }

# Generate and insert vectors
vectors = [generate_random_vector(i, 768) for i in range(100)]

# Add vectors using a transaction
with collection.transaction() as txn:
    # Single vector upsert
    txn.upsert_vector(vectors[0])
    # Batch upsert for remaining vectors
    txn.batch_upsert_vectors(vectors[1:], max_workers=8, max_retries=3)
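A dimension mismatch is easier to diagnose before a transaction is opened than after it fails. The sketch below checks a dense vector dict of the shape used above. The validate_vector helper and its rules are assumptions made for illustration; the server may apply different or additional checks.

```python
from numbers import Real


def validate_vector(vector, dimension):
    """Return a list of problems found in a dense vector dict (empty if none)."""
    problems = []
    vector_id = vector.get("id")
    if not isinstance(vector_id, str) or not vector_id:
        problems.append("'id' must be a non-empty string")
    values = vector.get("dense_values")
    if not isinstance(values, list):
        problems.append("'dense_values' must be a list of numbers")
    else:
        if len(values) != dimension:
            problems.append(f"expected {dimension} values, got {len(values)}")
        # bool is a subclass of int, so exclude it explicitly.
        if not all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
            problems.append("'dense_values' must contain only numbers")
    return problems
```

For example, calling validate_vector(v, 768) on each generated vector and skipping any with a non-empty result keeps malformed entries out of the batch.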
Searching Vectors
Perform similarity search:
# Search for similar vectors
results = collection.search.dense(
    query_vector=vectors[0]["dense_values"],  # Use first vector as query
    top_k=5,  # Number of nearest neighbors
    return_raw_text=True
)

# Fetch a specific vector
vector = collection.vectors.get("vec_1")
Managing Collections
List and get information about collections:
# Get collection information
collection_info = collection.get_info()
print(f"Collection info: {collection_info}")

# List all collections
print("Available collections:")
for coll in client.collections():
    print(f" - {coll.name}")
Version Management
Manage collection versions:
# Get current version
current_version = collection.versions.get_current()
print(f"Current version: {current_version}")

# List all versions (list() returns a dict with a "versions" key)
all_versions = collection.versions.list()
for v in all_versions["versions"]:
    print(f"Version {v['hash']}: {v['version_number']}")
Generating Embeddings
The Cosdata SDK provides a convenience utility for generating embeddings with cosdata-fastembed. It is optional: if you already have your own embeddings, you can use those directly. To generate embeddings in Python, use the following utility:
from cosdata.embedding import embed_texts
texts = [
    "Cosdata makes vector search easy!",
    "This is a test of the embedding utility."
]
embeddings = embed_texts(texts, model_name="thenlper/gte-base")  # Specify any supported model
- See the cosdata-fastembed supported models list for available model names and dimensions.
- The output is a list of lists (one embedding per input text), ready to upsert into your collection.
- If cosdata-fastembed is not installed, a helpful error will be raised.
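Since the output is one embedding per input text, the two lists can be zipped into vector dicts for upserting. The following is a sketch under assumptions: build_vectors is a hypothetical helper, the dict keys follow the "Adding Vectors" example, and the optional "text" key is a guess based on the Vector dataclass's text attribute rather than something the upsert examples show.

```python
def build_vectors(texts, embeddings, document_id, id_prefix="vec", include_text=False):
    """Pair texts with their embeddings as dicts shaped like the upsert examples."""
    if len(texts) != len(embeddings):
        raise ValueError("texts and embeddings must have the same length")
    vectors = []
    for i, (text, embedding) in enumerate(zip(texts, embeddings)):
        vector = {
            "id": f"{id_prefix}_{i}",
            "dense_values": list(embedding),
            "document_id": document_id,
        }
        if include_text:
            # Assumption: a "text" key is accepted on upsert; not confirmed above.
            vector["text"] = text
        vectors.append(vector)
    return vectors
```

The resulting list can be passed to txn.batch_upsert_vectors inside a transaction, provided the embedding dimension matches the collection's dimension.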
API Reference
Client
The main client for interacting with the Vector Database API.
client = Client(
    host="http://127.0.0.1:8443",  # Optional
    username="admin",              # Optional
    password="admin",              # Optional
    verify=False                   # Optional
)
Methods:
create_collection(name: str, dimension: int = 1024, description: Optional[str] = None, dense_vector: Optional[Dict[str, Any]] = None, sparse_vector: Optional[Dict[str, Any]] = None, tf_idf_options: Optional[Dict[str, Any]] = None) -> Collection
- Returns a Collection object. Collection info can be accessed via collection.get_info():
  {
    "name": str,
    "description": str,
    "dense_vector": {"enabled": bool, "dimension": int},
    "sparse_vector": {"enabled": bool},
    "tf_idf_options": {"enabled": bool}
  }
collections() -> List[Collection]
- Returns a list of Collection objects.
get_collection(name: str) -> Collection
- Returns a Collection object for the given name.
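As an illustration of working with the collection info structure, the sketch below lists which vector types a collection reports as enabled. The enabled_vector_types helper is hypothetical, and it assumes the dict has the shape documented for collection.get_info().

```python
def enabled_vector_types(info):
    """List the vector-type keys whose "enabled" flag is true in a collection info dict."""
    keys = ("dense_vector", "sparse_vector", "tf_idf_options")
    # "or {}" tolerates keys that are missing or explicitly set to None.
    return [key for key in keys if (info.get(key) or {}).get("enabled", False)]
```

It would typically be called as enabled_vector_types(collection.get_info()).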
Collection
The Collection class provides access to all collection-specific operations.
collection = client.create_collection(
    name="my_collection",
    dimension=768,
    description="My collection"
)
Methods:
create_index(distance_metric: str = "cosine", num_layers: int = 7, max_cache_size: int = 1000, ef_construction: int = 512, ef_search: int = 256, neighbors_count: int = 32, level_0_neighbors_count: int = 64) -> Index
- Returns an Index object. Index info can be fetched (if implemented) as:
  {"dense": {...}, "sparse": {...}, "tf-idf": {...}}
create_sparse_index(name: str, quantization: int = 64, sample_threshold: int = 1000) -> Index
create_tf_idf_index(name: str, sample_threshold: int = 1000, k1: float = 1.2, b: float = 0.75) -> Index
get_index(name: str) -> Index
get_info() -> dict
- Returns collection metadata as above.
delete() -> None
load() -> None
unload() -> None
transaction() -> Transaction (context manager)
Transaction
The Transaction class provides methods for vector operations.
with collection.transaction() as txn:
    txn.upsert_vector(vector)  # Single vector
    txn.batch_upsert_vectors(vectors, max_workers=8, max_retries=3)  # Multiple vectors, with parallelism and retries
Methods:
upsert_vector(vector: Dict[str, Any]) -> None
batch_upsert_vectors(vectors: List[Dict[str, Any]], max_workers: Optional[int] = None, max_retries: int = 3) -> None
- vectors: List of vector dictionaries to upsert
- max_workers: Number of threads to use for parallel upserts (default: all available CPU threads)
- max_retries: Number of times to retry a failed batch (default: 3)
commit() -> None
abort() -> None
Search
The Search class provides methods for vector similarity search.
results = collection.search.dense(
    query_vector=vector,
    top_k=5,
    return_raw_text=True
)
Methods:
dense(query_vector: List[float], top_k: int = 5, return_raw_text: bool = False) -> dict
- Returns:
  {
    "results": [
      {"id": str, "document_id": str, "score": float, "text": str | None},
      ...
    ]
  }
sparse(query_terms: List[dict], top_k: int = 5, early_terminate_threshold: float = 0.0, return_raw_text: bool = False) -> dict
- Same structure as above.
text(query_text: str, top_k: int = 5, return_raw_text: bool = False) -> dict
- Same structure as above.
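Because all three search methods return the same structure, one helper can post-process any of them. The sketch below keeps hits at or above a score threshold, best first. The hits_above name is hypothetical, and the code assumes that a higher score means a closer match, which the reference above does not state.

```python
def hits_above(response, min_score):
    """Return (id, score) pairs with score >= min_score, highest score first."""
    hits = [
        (result["id"], result["score"])
        for result in response.get("results", [])
        if result["score"] >= min_score
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

For example, hits_above(results, 0.8) applied to the return value of collection.search.dense(...) drops weak matches before further processing.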
Vectors
The Vectors class provides methods for vector operations.
vector = collection.vectors.get("vec_1")
exists = collection.vectors.exists("vec_1")
Methods:
get(vector_id: str) -> Vector
- Returns a Vector dataclass object with attributes:
  vector.id: str
  vector.document_id: Optional[str]
  vector.dense_values: Optional[List[float]]
  vector.sparse_indices: Optional[List[int]]
  vector.sparse_values: Optional[List[float]]
  vector.text: Optional[str]
get_by_document_id(document_id: str) -> List[Vector]
- Returns a list of Vector objects as above.
exists(vector_id: str) -> bool
- Returns True if the vector exists, else False.
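The exists and get methods combine naturally when some IDs may be missing. The following sketch fetches only the vectors that exist. The fetch_existing helper is hypothetical; it relies solely on the exists(vector_id) and get(vector_id) signatures listed above, and it makes two calls per ID, which may be slow for long ID lists.

```python
def fetch_existing(vectors_api, vector_ids):
    """Map each ID that exists to its fetched vector; skip IDs that do not exist."""
    found = {}
    for vector_id in vector_ids:
        if vectors_api.exists(vector_id):
            found[vector_id] = vectors_api.get(vector_id)
    return found
```

It would typically be called as fetch_existing(collection.vectors, ["vec_1", "vec_2"]).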
Versions
The Versions class provides methods for version management.
current_version = collection.versions.get_current()
all_versions = collection.versions.list()
Methods:
list() -> dict
- Returns:
  {
    "versions": [
      {"hash": str, "version_number": int, "timestamp": int, "vector_count": int},
      ...
    ],
    "current_hash": str
  }
get_current() -> Version
- Returns a Version dataclass object with attributes:
  version.hash: str
  version.version_number: int
  version.timestamp: int
  version.vector_count: int
  version.created_at: datetime  # property for creation time
get(version_hash: str) -> Version
- Same as above.
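The dict returned by list() carries both the version entries and the hash of the current one, so the current entry can be located without a second call. The sketch below assumes the structure documented above; current_version_entry is a hypothetical helper, not an SDK function.

```python
def current_version_entry(listing):
    """Return the entry in listing["versions"] whose hash equals "current_hash", or None."""
    current_hash = listing.get("current_hash")
    for entry in listing.get("versions", []):
        if entry["hash"] == current_hash:
            return entry
    return None
```

For example, current_version_entry(collection.versions.list()) yields a dict with the hash, version_number, timestamp, and vector_count keys.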
Embedding Utility
embed_texts(texts: List[str], model_name: str = "BAAI/bge-small-en-v1.5") -> List[List[float]]
- Generates embeddings for a list of texts using cosdata-fastembed. Returns a list of embedding vectors (as plain Python lists). Raises ImportError if cosdata-fastembed is not installed.
Example:
from cosdata.embedding import embed_texts
embeddings = embed_texts(["hello world"], model_name="thenlper/gte-base")
Best Practices
- Connection Management
  - Reuse the client instance across your application
  - The client automatically handles authentication and token management
- Vector Operations
  - Use transactions for batch operations
  - The context manager (with statement) automatically handles commit/abort
  - Maximum batch size is 200 vectors per transaction
- Error Handling
  - All operations raise exceptions on failure
  - Use try/except blocks for error handling
  - Transactions automatically abort on exceptions when using the context manager
- Performance
  - Adjust index parameters based on your use case
  - Use appropriate vector dimensions
  - Consider batch sizes for large operations
- Version Management
  - Create versions before major changes
  - Use versions to track collection evolution
  - Clean up old versions when no longer needed
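The 200-vector limit per transaction and the context manager's commit/abort behavior can be combined into a simple batching loop. This is a minimal sketch, not part of the SDK: chunked and upsert_in_batches are hypothetical helpers, and the code relies only on collection.transaction() and txn.batch_upsert_vectors(...) as described above.

```python
def chunked(items, size):
    """Yield consecutive slices of items, each with at most size elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upsert_in_batches(collection, vectors, batch_size=200):
    """Upsert vectors using one transaction per batch; return the number of batches."""
    completed = 0
    for batch in chunked(vectors, batch_size):
        # Per the notes above, leaving the with block commits the transaction,
        # and an exception inside it aborts the transaction and propagates.
        with collection.transaction() as txn:
            txn.batch_upsert_vectors(batch)
        completed += 1
    return completed
```

If one batch raises, the exception propagates to the caller while earlier batches stay committed, so wrap the call in try/except if you need to resume or report partial progress.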
License
This project is licensed under the MIT License - see the LICENSE file for details.