Customization
Cosdata offers extensive customization options to tailor the system to your specific requirements.
Vector Configurations
Dimension Settings
Cosdata supports vectors of various dimensions:
- Low-dimensional (32-256): Ideal for simple embeddings
- Medium-dimensional (257-768): Suitable for most language models
- High-dimensional (above 768): For advanced embedding models
When creating a collection, you can specify the exact dimension:
{
  "name": "testdb",
  "description": "Test collection for vector database",
  "dense_vector": {
    "enabled": true,
    "auto_create_index": false,
    "dimension": 1024
  },
  "sparse_vector": {
    "enabled": false,
    "auto_create_index": false
  }
}
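Because the dimension is fixed when the collection is created, it can help to check vectors on the client side before sending them. The following is a minimal sketch, not part of any Cosdata SDK: the helper names `make_collection_payload` and `check_dimension` are hypothetical, and the payload simply mirrors the fields shown in the example above.

```python
def make_collection_payload(name: str, dimension: int, description: str = "") -> dict:
    """Build a collection payload with the same fields as the example above."""
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    return {
        "name": name,
        "description": description,
        "dense_vector": {
            "enabled": True,
            "auto_create_index": False,
            "dimension": dimension,
        },
        "sparse_vector": {"enabled": False, "auto_create_index": False},
    }


def check_dimension(vector, payload: dict) -> None:
    """Raise ValueError if the vector length differs from the configured dimension."""
    expected = payload["dense_vector"]["dimension"]
    if len(vector) != expected:
        raise ValueError(f"expected {expected} values, got {len(vector)}")


payload = make_collection_payload("testdb", 1024, "Test collection for vector database")
check_dimension([0.0] * 1024, payload)  # passes silently
```

Running this check before an upload surfaces a mismatched embedding model early, on the client, rather than as a server-side error.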
Distance Metrics
Choose from multiple distance metrics:
- Euclidean (L2)
- Cosine Similarity
- Dot Product
- Manhattan (L1)
When creating an index, you can specify the distance metric:
{
  "collection_name": "testdb",
  "name": "testdb_index",
  "distance_metric_type": "cosine",
  "quantization": "scalar",
  "data_type": "u8",
  "index_type": "hnsw"
}
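To make the choice of metric more concrete, the four metrics listed above can be written out in a few lines of plain Python. This is only an illustration of the standard mathematical definitions, not how Cosdata computes them internally; the function names are chosen for this example.

```python
import math


def euclidean(a, b):
    """L2 distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """L1 distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dot_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the two L2 norms."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


a, b = [1.0, 2.0], [2.0, 4.0]
print(euclidean(a, b), manhattan(a, b), dot_product(a, b), cosine_similarity(a, b))
```

Note that Euclidean and Manhattan are distances (smaller is closer), while cosine similarity and dot product are similarities (larger is closer). Cosine similarity ignores vector magnitude, whereas the dot product does not.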
Index Tuning
HNSW Parameters
Fine-tune your HNSW indexes with these advanced parameters:
{
  "collection_name": "testdb",
  "name": "testdb_index",
  "distance_metric_type": "cosine",
  "quantization": "scalar",
  "data_type": "u8",
  "index_type": "hnsw",
  "params": {
    "num_layers": 5,
    "max_cache_size": 1000
  }
}
Key parameters include:
- num_layers: Controls the number of layers in the HNSW graph
- max_cache_size: Maximum number of vectors to cache for performance
Filtering Options
Implement custom filters to narrow search results:
# Example: Searching with metadata filters
results = client.search(
    collection_name="products",
    query_vector=embedding,
    filter={
        "category": "electronics",
        "price": {"$lt": 1000}
    },
    limit=10
)
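The filter in the example combines an exact match (`category`) with a comparison (`price` below 1000), and a result must satisfy both. The sketch below illustrates that semantics on plain dictionaries. It is not Cosdata's filter engine: the `matches` function is hypothetical, and of the operators it supports only `$lt` appears in the example above; `$lte`, `$gt`, and `$gte` are assumed by analogy.

```python
import operator

# Only "$lt" is shown in the documentation; the other three are assumptions.
OPERATORS = {
    "$lt": operator.lt,
    "$lte": operator.le,
    "$gt": operator.gt,
    "$gte": operator.ge,
}


def matches(metadata: dict, flt: dict) -> bool:
    """Return True if the metadata satisfies every condition in the filter."""
    for field, condition in flt.items():
        if field not in metadata:
            return False
        value = metadata[field]
        if isinstance(condition, dict):
            # Comparison form, e.g. {"$lt": 1000}
            for op_name, operand in condition.items():
                if op_name not in OPERATORS:
                    raise ValueError(f"unsupported operator: {op_name}")
                if not OPERATORS[op_name](value, operand):
                    return False
        elif value != condition:
            # Plain value means exact match
            return False
    return True


flt = {"category": "electronics", "price": {"$lt": 1000}}
print(matches({"category": "electronics", "price": 499}, flt))
print(matches({"category": "electronics", "price": 1500}, flt))
```

A helper like this can also be useful in tests, to confirm that the results returned for a filtered search really do satisfy the filter that was sent.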
Transaction Management
Cosdata provides ACID transaction guarantees for vector operations, allowing you to:
- Group related operations in a single transaction
- Ensure atomicity across multiple vector operations
- Maintain consistency during concurrent access
Transaction Best Practices
For optimal performance:
- Keep transaction duration short
- Batch vector operations (100-1000 vectors per batch)
- Always commit or abort transactions to release resources
- Implement proper error handling with retry logic
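The practices above can be combined into one small loop: split the data into batches, run each batch in its own short transaction, abort on failure, and retry with backoff. The sketch below is a hedged illustration only. The transaction interface it assumes (`create_transaction`, `upsert`, `commit`, `abort`) is hypothetical and may not match the real Cosdata client, so an in-memory stand-in is included to keep the example runnable.

```python
import time


def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upsert_in_batches(client, vectors, batch_size=500, max_attempts=3, backoff_s=0.1):
    """Upsert each batch in its own short transaction, retrying on failure."""
    for batch in batched(vectors, batch_size):
        for attempt in range(1, max_attempts + 1):
            txn = client.create_transaction()  # hypothetical method name
            try:
                txn.upsert(batch)
                txn.commit()
                break
            except Exception:  # narrow this to transient errors in real code
                txn.abort()  # always release the transaction's resources
                if attempt == max_attempts:
                    raise
                time.sleep(backoff_s * 2 ** (attempt - 1))  # exponential backoff


class InMemoryTransaction:
    """Stand-in transaction used only to make this sketch runnable."""

    def __init__(self, store, fail):
        self._store, self._pending, self._fail = store, [], fail

    def upsert(self, batch):
        if self._fail:
            raise ConnectionError("simulated transient failure")
        self._pending.extend(batch)

    def commit(self):
        self._store.extend(self._pending)

    def abort(self):
        self._pending.clear()


class InMemoryClient:
    """Stand-in client whose first `failures` transactions fail."""

    def __init__(self, failures=0):
        self.store, self.failures = [], failures

    def create_transaction(self):
        fail = self.failures > 0
        if fail:
            self.failures -= 1
        return InMemoryTransaction(self.store, fail)


client = InMemoryClient(failures=1)
upsert_in_batches(client, list(range(1200)), batch_size=500, backoff_s=0)
print(len(client.store))
```

Opening one transaction per batch keeps each transaction short, and a failed batch can be retried without redoing batches that were already committed.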
Deployment Options
Cosdata can be deployed in various configurations:
- Standalone: Single-node deployment for simplicity
- Distributed: Multi-node deployment for high availability
- Hybrid: Combination of on-premise and cloud resources
Performance Optimization
For production deployments, consider these optimization strategies:
- Use parallel requests for large datasets
- Monitor response times and transaction timeouts
- Index important vector fields
- Normalize vectors to unit length (an L2 norm of 1, so every component falls between -1.0 and 1.0)
- Choose appropriate similarity metrics for your use case
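As an illustration of the normalization step, the following sketch scales a vector to unit length in plain Python. The function name `normalize` is chosen for this example and is not part of any Cosdata API.

```python
import math


def normalize(vector):
    """Scale a vector so that its L2 norm equals 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


unit = normalize([3.0, 4.0])
print(unit)
```

For unit-length vectors, the dot product equals the cosine similarity, so normalizing once at ingestion time makes the two metrics interchangeable for ranking.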