syzgydb

package module

Version: v0.0.0-...-e21bc5c | Published: Nov 1, 2024 | License: MIT | Imports: 26 | Imported by: 1

Repository

github.com/smhanov/syzgydb

README ¶

Syzgy DB

Introduction

SyzgyDB is a high-performance, embeddable vector database designed for applications requiring efficient handling of large datasets. Written in Go, it leverages disk-based storage to minimize memory usage, making it ideal for systems with limited resources. SyzgyDB supports a range of distance metrics, including Euclidean and Cosine, and offers multiple quantization levels to optimize storage and search performance.

With built-in integration for the Ollama server, SyzgyDB can automatically generate vector embeddings from text and images, simplifying the process of adding and querying data. This makes it well-suited for use cases such as image and video retrieval, recommendation systems, natural language processing, anomaly detection, and bioinformatics. With its RESTful API, SyzgyDB provides easy integration and management of collections and records, enabling developers to perform fast and flexible vector similarity searches.

Features

Disk-Based Storage: Operates with minimal memory usage by storing data on disk.
Automatic Embedding Generation: Seamlessly integrates with the Ollama server to generate vector embeddings from text and images, reducing the need for manual preprocessing.
Vector Quantization: Supports multiple quantization levels (4, 8, 16, 32, 64 bits) to optimize storage and performance.
Distance Metrics: Supports Euclidean and Cosine distance calculations for vector similarity.
Scalable: Efficiently handles large datasets with support for adding, updating, and removing documents.
Search Capabilities: Provides nearest neighbor and radius-based search functionalities.
Python Client: A Python client is available for easy integration with Python projects.

Running with Docker

docker run -p 8080:8080 -v /path/to/your/data:/data smhanov/syzgydb

This command will:

Pull the smhanov/syzgydb image from Docker Hub.
Map port 8080 of the container to port 8080 on your host machine.
Map the /data directory inside the container to /path/to/your/data on your host system, ensuring that your data is persisted outside the container.

Configuration

The configuration settings can be specified on the command line, using an environment variable, or in a file /etc/syzgydb.conf.

Configuration Setting	Description	Default Value
`DATA_FOLDER`	Specifies where the persistent files are kept.	`./data` (command line) or `/data` (Docker)
`OLLAMA_SERVER`	The optional Ollama server used to create embeddings.	`localhost:11434`
`TEXT_MODEL`	The name of the text embedding model to use with Ollama.	`all-minilm` (384 dimensions)
`IMAGE_MODEL`	The name of the image embedding model to use with Ollama.	`minicpm-v`
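
As a rough illustration of the table above, the following sketch resolves each setting from the environment and falls back to the documented default. How SyzgyDB itself orders the command line, the environment, and /etc/syzgydb.conf is not described here, so `resolve` and `defaults` are hypothetical names for this example only.

```go
package main

import (
	"fmt"
	"os"
)

// defaults mirrors the default values listed in the configuration table
// (using the command-line default for DATA_FOLDER).
var defaults = map[string]string{
	"DATA_FOLDER":   "./data",
	"OLLAMA_SERVER": "localhost:11434",
	"TEXT_MODEL":    "all-minilm",
	"IMAGE_MODEL":   "minicpm-v",
}

// resolve is a hypothetical helper: it returns the value reported by lookup
// when one is set and non-empty, and the documented default otherwise.
func resolve(name string, lookup func(string) (string, bool)) string {
	if v, ok := lookup(name); ok && v != "" {
		return v
	}
	return defaults[name]
}

func main() {
	for _, name := range []string{"DATA_FOLDER", "OLLAMA_SERVER", "TEXT_MODEL", "IMAGE_MODEL"} {
		fmt.Printf("%s=%s\n", name, resolve(name, os.LookupEnv))
	}
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the helper easy to exercise with a fake environment.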

RESTful API

SyzgyDB provides a RESTful API for managing collections and records. Below are the available endpoints and example curl requests.

Collections API

A collection is a self-contained database of records. You can create collections, drop them, and retrieve information about them.

Create a Collection

Endpoint: POST /api/v1/collections
Description: Creates a new collection with specified parameters.
Request Body (JSON):

{
  "name": "collection_name",
  "vector_size": 128,
  "quantization": 64,
  "distance_function": "cosine"
}

Example curl:

curl -X POST http://localhost:8080/api/v1/collections -H "Content-Type: application/json" -d '{"name":"collection_name","vector_size":128,"quantization":64,"distance_function":"cosine"}'
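
The same request can be built from Go with the standard library. This is a minimal sketch, not an official client: the type and function names are made up for the example, and the base URL assumes the Docker port mapping shown earlier.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// createCollectionBody mirrors the JSON request body shown above.
type createCollectionBody struct {
	Name             string `json:"name"`
	VectorSize       int    `json:"vector_size"`
	Quantization     int    `json:"quantization"`
	DistanceFunction string `json:"distance_function"`
}

// newCreateCollectionRequest builds (but does not send) the POST request.
// baseURL is an assumption, for example "http://localhost:8080".
func newCreateCollectionRequest(baseURL string, body createCollectionBody) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/collections", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newCreateCollectionRequest("http://localhost:8080", createCollectionBody{
		Name:             "collection_name",
		VectorSize:       128,
		Quantization:     64,
		DistanceFunction: "cosine",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it against a running server: resp, err := http.DefaultClient.Do(req)
}
```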

Drop a Collection

Endpoint: DELETE /api/v1/collections/{collection_name}
Description: Deletes the specified collection.
Example curl:

curl -X DELETE http://localhost:8080/api/v1/collections/collection_name

Get Collection Info

Endpoint: GET /api/v1/collections/{collection_name}
Description: Retrieves information about a collection.
Example curl:

curl -X GET http://localhost:8080/api/v1/collections/collection_name

Data API

Insert / update records

Endpoint: POST /api/v1/collections/{collection_name}/records
Description: Inserts multiple records into a collection. Overwrites if the ID exists. You can provide either a vector or a text field for each record. If a text field is provided, the server will automatically generate the vector embedding using the Ollama server. If an image field is provided, it should be in base64 format.
Request Body (JSON):

[
  {
    "id": 1234567890,
    "text": "example text", // Optional: Provide text to generate vector
    "vector": [0.1, 0.2, ..., 0.5], // Optional: Directly provide a vector
    "metadata": {
      "key1": "value1",
      "key2": "value2"
    }
  },
  {
    "id": 1234567891,
    "text": "another example text",
    "metadata": {
      "key1": "value3"
    }
  }
]

Example curl:

curl -X POST http://localhost:8080/api/v1/collections/collection_name/records -H "Content-Type: application/json" -d '[{"id":1234567890,"vector":[0.1,0.2,0.3,0.4,0.5],"metadata":{"key1":"value1","key2":"value2"}},{"id":1234567891,"text":"example text","metadata":{"key1":"value1","key2":"value2"}}]'
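
When the request body is produced from Go, `omitempty` tags are one way to send only the field a record actually uses (a vector or text). The sketch below only encodes the body. The `record` type and `encodeRecords` function are illustrative names, not part of SyzgyDB.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// record mirrors one element of the request array shown above. The omitempty
// tags leave out whichever of text, vector, or metadata is not supplied.
type record struct {
	ID       uint64                 `json:"id"`
	Text     string                 `json:"text,omitempty"`
	Vector   []float64              `json:"vector,omitempty"`
	Metadata map[string]interface{} `json:"metadata,omitempty"`
}

// encodeRecords returns the JSON array to use as the request body.
func encodeRecords(records []record) (string, error) {
	b, err := json.Marshal(records)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := encodeRecords([]record{
		{ID: 1234567890, Vector: []float64{0.1, 0.2, 0.3, 0.4, 0.5}, Metadata: map[string]interface{}{"key1": "value1"}},
		{ID: 1234567891, Text: "example text"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body)
}
```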

Update a Record's Metadata

Endpoint: PUT /api/v1/collections/{collection_name}/records/{id}/metadata
Description: Updates metadata for a record.
Request Body (JSON):

{
  "metadata": {
    "key1": "new_value1",
    "key3": "value3"
  }
}

Example curl:

curl -X PUT http://localhost:8080/api/v1/collections/collection_name/records/1234567890/metadata -H "Content-Type: application/json" -d '{"metadata":{"key1":"new_value1","key3":"value3"}}'

Delete a Record

Endpoint: DELETE /api/v1/collections/{collection_name}/records/{id}
Description: Deletes a record.
Example curl:

curl -X DELETE http://localhost:8080/api/v1/collections/collection_name/records/1234567890

Get All Document IDs

Endpoint: GET /api/v1/collections/{collection_name}/ids
Description: Retrieves a JSON array of all document IDs in the specified collection.
Example curl:

curl -X GET http://localhost:8080/api/v1/collections/collection_name/ids
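
A Go caller can decode the response straight into a `[]uint64`, which avoids the precision loss that large IDs would suffer if decoded as `float64`. The sketch assumes the body is a bare JSON array of integers, as the description suggests. The sample payload stands in for a real response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeIDs parses a JSON array of document IDs.
func decodeIDs(body []byte) ([]uint64, error) {
	var ids []uint64
	if err := json.Unmarshal(body, &ids); err != nil {
		return nil, err
	}
	return ids, nil
}

func main() {
	// Sample payload standing in for the body of a real response.
	ids, err := decodeIDs([]byte(`[1234567890, 1234567891]`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(ids), ids)
}
```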

Search Records

Endpoint: POST /api/v1/collections/{collection_name}/search
Description: Searches for records based on the provided criteria. If no search parameters are provided, it lists all records in the collection, allowing pagination with limit and offset.

Request Body (JSON):

{
  "vector": [0.1, 0.2, 0.3, ..., 0.5], // Optional: Provide a vector for similarity search
  "text": "example text",              // Optional: Provide text to generate vector for search
  "k": 5,                              // Optional: Number of nearest neighbors to return
  "radius": 0,                       // Optional: Radius for range search
  "limit": 0,                         // Optional: Maximum number of records to return
  "offset": 0,                         // Optional: Number of records to skip for pagination
  "precision": "",                 // Optional: Set to "exact" for exhaustive search
  "filter": "age >= 18 AND status == 'active'" // Optional: Query filter expression
}

Parameters Explanation:

vector: A numerical array representing the query vector. Used for similarity searches. If provided, the search will be based on this vector.
text: A string input that will be converted into a vector using the Ollama server. This is an alternative to providing a vector directly.
k: Specifies the number of nearest neighbors to return. Used when performing a k-nearest neighbor search.
radius: Defines the radius for a range search. All records within this distance from the query vector will be returned.
limit: Limits the number of records returned in the response. Useful for paginating results.
offset: Skips the specified number of records before starting to return results. Used in conjunction with limit for pagination.
precision: Specifies the search precision. Defaults to "medium". Set to "exact" to perform an exhaustive search of all points.
filter: A string containing a query filter expression. This allows for additional filtering of results based on metadata fields. See the Query Filter Language section for more details.

Example curl:

curl -X POST http://localhost:8080/api/v1/collections/collection_name/search -H "Content-Type: application/json" -d '{"vector":[0.1,0.2,0.3,0.4,0.5],"k":5,"limit":10,"offset":0,"filter":"age >= 18 AND status == \"active\""}'

Usage Scenarios:

List All Records: Call the endpoint with no parameters to list all records, using limit and offset to paginate.
Text-Based Search: Provide a text parameter to perform a search based on the text's vector representation.
Vector-Based Search: Use the vector parameter for direct vector similarity searches.
Range Query: Specify a radius to perform a range query, returning all records within the specified distance.
K-Nearest Neighbors: Use the k parameter to find the top k nearest records to the query vector.
Filtered Search: Use the filter parameter to apply additional constraints based on metadata fields.
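
The scenarios above differ only in which fields of the request body are populated. As a sketch, a single Go struct with `omitempty` tags can cover all of them. The struct and function names are illustrative, and the sketch assumes that omitting a field is equivalent to sending its zero value. HTML escaping is switched off so that operators such as `>=` in the filter stay readable in the encoded body.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// searchRequest mirrors the search request body; unset fields are omitted.
type searchRequest struct {
	Vector    []float64 `json:"vector,omitempty"`
	Text      string    `json:"text,omitempty"`
	K         int       `json:"k,omitempty"`
	Radius    float64   `json:"radius,omitempty"`
	Limit     int       `json:"limit,omitempty"`
	Offset    int       `json:"offset,omitempty"`
	Precision string    `json:"precision,omitempty"`
	Filter    string    `json:"filter,omitempty"`
}

// encodeSearch returns the JSON body for a search request.
func encodeSearch(req searchRequest) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // keep ">=" instead of "\u003e="
	if err := enc.Encode(req); err != nil {
		return "", err
	}
	return strings.TrimSpace(buf.String()), nil
}

func main() {
	requests := []searchRequest{
		{Limit: 10, Offset: 20}, // list all records, third page of 10
		{Text: "example text", K: 5, Filter: `age >= 18 AND status == "active"`}, // filtered text search
		{Vector: []float64{0.1, 0.2, 0.3}, Radius: 0.5, Precision: "exact"}, // exhaustive range query
	}
	for _, r := range requests {
		body, err := encodeSearch(r)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(body)
	}
}
```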

Usage in a Go Project

You don't need Docker or the REST API to use SyzgyDB. You can embed it directly in your Go project by importing the package:

    import "github.com/smhanov/syzgydb"

Creating a Collection

To create a new collection, define the collection options and initialize the collection:

options := syzgydb.CollectionOptions{
    Name:           "example.dat",
    DistanceMethod: syzgydb.Euclidean, // or Cosine
    DimensionCount: 128,       // Number of dimensions for each vector
    Quantization:   64,        // Quantization level (4, 8, 16, 32, 64)
}

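// NewCollection returns (*Collection, error), so check the error.
collection, err := syzgydb.NewCollection(options)
if err != nil {
    log.Fatalf("Error creating collection: %v", err)
}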

Adding Documents

Add documents to the collection by specifying an ID, vector, and optional metadata:

vector := []float64{0.1, 0.2, 0.3, ..., 0.128} // Example vector
metadata := []byte("example metadata")

collection.AddDocument(1, vector, metadata)

Searching

Perform a search to find similar vectors using either nearest neighbor or radius-based search:

searchVector := []float64{0.1, 0.2, 0.3, ..., 0.128} // Example search vector

// Nearest neighbor search
args := syzgydb.SearchArgs{
    Vector: searchVector,
    K:      5, // Return top 5 results
}

results := collection.Search(args)

// Radius-based search
args = syzgydb.SearchArgs{
    Vector: searchVector,
    Radius: 0.5, // Search within a radius of 0.5
}

results = collection.Search(args)
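
To make the two search modes concrete, here is a brute-force sketch that scans every vector. It is not how SyzgyDB searches internally (the reference below mentions pivots and a percentage of the database searched). It only shows what "top k" and "within a radius" mean. All names in it are illustrative.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// hit pairs a document ID with its distance from the query.
type hit struct {
	ID       uint64
	Distance float64
}

// euclidean assumes a and b have the same length.
func euclidean(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// scan measures every vector against the query and sorts the hits by
// ascending distance, breaking ties by ID.
func scan(vectors map[uint64][]float64, query []float64) []hit {
	hits := make([]hit, 0, len(vectors))
	for id, v := range vectors {
		hits = append(hits, hit{ID: id, Distance: euclidean(query, v)})
	}
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].Distance != hits[j].Distance {
			return hits[i].Distance < hits[j].Distance
		}
		return hits[i].ID < hits[j].ID
	})
	return hits
}

// nearestK keeps the k closest hits.
func nearestK(vectors map[uint64][]float64, query []float64, k int) []hit {
	hits := scan(vectors, query)
	if k < 0 {
		k = 0
	}
	if k < len(hits) {
		hits = hits[:k]
	}
	return hits
}

// withinRadius keeps every hit whose distance is at most radius.
func withinRadius(vectors map[uint64][]float64, query []float64, radius float64) []hit {
	hits := scan(vectors, query)
	n := sort.Search(len(hits), func(i int) bool { return hits[i].Distance > radius })
	return hits[:n]
}

func main() {
	vectors := map[uint64][]float64{1: {0, 0}, 2: {3, 4}, 3: {1, 0}}
	query := []float64{0, 0}
	fmt.Println(nearestK(vectors, query, 2))
	fmt.Println(withinRadius(vectors, query, 0.5))
}
```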

Using a Filter Function

You can apply a filter function during the search to include only documents that meet certain criteria. There are two ways to create a filter function:

Using a custom function:

filterFn := func(id uint64, metadata []byte) bool {
    return id%2 == 0 // Include only documents with even IDs
}

args := syzgydb.SearchArgs{
    Vector: searchVector,
    K:      5, // Return top 5 results
    Filter: filterFn,
}

results := collection.Search(args)

Using the BuildFilter method with a query string:

queryString := `age >= 18 AND status == "active"` // raw string literal: quotes need no escaping
filterFn, err := syzgydb.BuildFilter(queryString)
if err != nil {
    log.Fatalf("Error building filter: %v", err)
}

args := syzgydb.SearchArgs{
    Vector: searchVector,
    K:      5, // Return top 5 results
    Filter: filterFn,
}

results := collection.Search(args)

The BuildFilter method allows you to create a filter function from a query string using the Query Filter Language described in this document. This provides a flexible way to filter search results based on metadata fields without writing custom Go code for each filter.
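
For comparison, a hand-written filter equivalent to `age >= 18 AND status == "active"` might look like the sketch below. It assumes the metadata bytes hold a JSON object, which is an assumption of this example rather than a documented requirement of the Go API. The `filterFn` type is redeclared locally with the same signature as `syzgydb.FilterFn` so the sketch compiles on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// filterFn has the same signature as syzgydb.FilterFn.
type filterFn func(id uint64, metadata []byte) bool

// activeAdult assumes the metadata is a JSON object with "age" and "status"
// keys. Metadata that cannot be parsed never matches.
func activeAdult(id uint64, metadata []byte) bool {
	var m struct {
		Age    float64 `json:"age"`
		Status string  `json:"status"`
	}
	if err := json.Unmarshal(metadata, &m); err != nil {
		return false
	}
	return m.Age >= 18 && m.Status == "active"
}

func main() {
	var f filterFn = activeAdult
	fmt.Println(f(1, []byte(`{"age": 21, "status": "active"}`)))
	fmt.Println(f(2, []byte(`{"age": 16, "status": "active"}`)))
	fmt.Println(f(3, []byte(`not json`)))
}
```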

Updating and Removing Documents

Update the metadata of an existing document or remove a document from the collection:

// Update document metadata
err := collection.UpdateDocument(1, []byte("updated metadata"))

// Remove a document
err = collection.RemoveDocument(1)

Dumping the Collection

To dump the collection for inspection or backup, use the DumpIndex function:

syzgydb.DumpIndex("example.dat")

Python Client

A Python client for SyzgyDB is available, making it easy to integrate SyzgyDB with your Python projects.

Installation

You can install the Python client using pip:

pip install syzgy

The Python client package is available on PyPI at https://pypi.org/project/syzgy/0.1.0/

For usage instructions and more details, please refer to the Python client documentation.

Query Filter Language

SyzgyDB supports a powerful query filter language that allows you to filter search results based on metadata fields. This language can be used in the filter parameter of the search API.

Basic Syntax

Field Comparison: field_name operator value
- Example: age >= 18
Logical Operations: Combine conditions using AND, OR, NOT
- Example: (age >= 18 AND status == "active") OR role == "admin"
Parentheses: Use to group conditions and control evaluation order
- Example: (status == "active" AND age >= 18) OR role == "admin"

Supported Operators

Comparison: ==, !=, >, <, >=, <=
String Operations: CONTAINS, STARTS_WITH, ENDS_WITH, MATCHES (regex)
Existence: EXISTS, DOES NOT EXIST
Array Operations: IN, NOT IN

Functions

field.length: Returns the length of a string or array

Examples

Basic Comparison:
```
age >= 18 AND status == "active"
```

String Operations:

name STARTS_WITH "John" AND email ENDS_WITH "@example.com"

Array Operations:
```
status IN ["important", "urgent"] 
```

Nested Fields:

user.profile.verified == true AND user.friends.length > 5

Existence Checks:

phone_number EXISTS AND emergency_contact DOES NOT EXIST

Combining Existence with Other Conditions:

(status == "active" OR status == "pending") AND profile_picture EXISTS

Complex Query:

(status == "active" AND age >= 18) OR (role == "admin" AND NOT (department == "IT")) AND last_login EXISTS
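
Nested field names such as `user.profile.verified` imply a walk through the metadata object, and `EXISTS` asks whether that walk succeeds. The sketch below shows one plausible way to resolve such a path over decoded JSON. It is not SyzgyDB's implementation, and `decode` and `lookup` are names invented for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decode parses metadata bytes into a generic JSON object.
func decode(metadata []byte) (map[string]interface{}, error) {
	var doc map[string]interface{}
	err := json.Unmarshal(metadata, &doc)
	return doc, err
}

// lookup walks a dotted path through nested JSON objects. The boolean result
// reports whether the field exists.
func lookup(doc map[string]interface{}, path string) (interface{}, bool) {
	var current interface{} = doc
	for _, key := range strings.Split(path, ".") {
		obj, ok := current.(map[string]interface{})
		if !ok {
			return nil, false // tried to descend into a non-object
		}
		current, ok = obj[key]
		if !ok {
			return nil, false // key is missing at this level
		}
	}
	return current, true
}

func main() {
	doc, err := decode([]byte(`{"user": {"profile": {"verified": true}, "friends": ["a", "b"]}}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	verified, ok := lookup(doc, "user.profile.verified")
	fmt.Println(verified, ok)
	_, ok = lookup(doc, "user.phone_number")
	fmt.Println(ok)
}
```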

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue to discuss improvements or report bugs.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Documentation ¶

Overview ¶

Package syzgydb provides an embeddable vector database written in Go, designed to efficiently handle large datasets by keeping data on disk rather than in memory. This makes SyzgyDB ideal for systems with limited memory resources.

What is a Vector Database? ¶

A vector database is a specialized database designed to store and query high-dimensional vector data. Vectors are numerical representations of data points, often used in machine learning and data science to represent features of objects, such as images, text, or audio. Vector databases enable efficient similarity searches, allowing users to find vectors that are close to a given query vector based on a specified distance metric.
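
As a concrete illustration of "close", the two metrics this package names can be written out in a few lines. These are the textbook definitions, with cosine distance taken as one minus cosine similarity. The package's own formulas are not shown on this page, so treat the exact conventions, including the zero-vector case, as assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// euclidean returns the straight-line distance between a and b.
// Both slices are assumed to have the same length.
func euclidean(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// cosineDistance returns 1 minus the cosine of the angle between a and b.
// A zero vector has no direction, so this sketch returns 1 in that case.
func cosineDistance(a, b []float64) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 1
	}
	return 1 - dot/(math.Sqrt(normA)*math.Sqrt(normB))
}

func main() {
	a := []float64{1, 0}
	b := []float64{2, 0}
	c := []float64{0, 1}
	fmt.Println(euclidean(a, b), cosineDistance(a, b)) // same direction, different length
	fmt.Println(euclidean(a, c), cosineDistance(a, c)) // perpendicular
}
```

Euclidean distance is sensitive to vector length, while cosine distance compares direction only, which is why `a` and `b` above are one unit apart under the first metric and identical under the second.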

Features ¶

- Disk-Based Storage: Operates with minimal memory usage by storing data on disk.
- Vector Quantization: Supports multiple quantization levels (4, 8, 16, 32, 64 bits) to optimize storage and performance.
- Distance Metrics: Supports Euclidean and Cosine distance calculations for vector similarity.
- Scalable: Efficiently handles large datasets with support for adding, updating, and removing documents.
- Search Capabilities: Provides nearest neighbor and radius-based search functionalities.

Usage ¶

## Creating a Collection

To create a new collection, define the collection options and initialize the collection:

options := CollectionOptions{
    Name:           "example_collection",
    DistanceMethod: Euclidean, // or Cosine
    DimensionCount: 128,       // Number of dimensions for each vector
    Quantization:   64,        // Quantization level (4, 8, 16, 32, 64)
}

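// NewCollection returns (*Collection, error), so check the error.
collection, err := NewCollection(options)
if err != nil {
    log.Fatal(err)
}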

## Adding Documents

Add documents to the collection by specifying an ID, vector, and optional metadata:

vector := []float64{0.1, 0.2, 0.3, ..., 0.128} // Example vector
metadata := []byte("example metadata")

collection.AddDocument(1, vector, metadata)

## Searching

Perform a search to find similar vectors using either nearest neighbor or radius-based search:

searchVector := []float64{0.1, 0.2, 0.3, ..., 0.128} // Example search vector

// Nearest neighbor search
args := SearchArgs{
    Vector: searchVector,
    K:      5, // Return top 5 results
}

results := collection.Search(args)

// Radius-based search
args = SearchArgs{
    Vector: searchVector,
    Radius: 0.5, // Search within a radius of 0.5
}

results = collection.Search(args)

## Updating and Removing Documents

Update the metadata of an existing document or remove a document from the collection:

// Update document metadata
err := collection.UpdateDocument(1, []byte("updated metadata"))

// Remove a document
err = collection.removeDocument(1)

## Dumping the Collection

To dump the collection for inspection or backup, use the DumpIndex function:

DumpIndex("example_collection")

Contributing ¶

Contributions are welcome! Please feel free to submit a pull request or open an issue to discuss improvements or report bugs.

License ¶

This project is licensed under the MIT License. See the LICENSE file for details.

Index ¶

Constants
func Configure(cfg Config)
func DumpIndex(filename string)
func EmbedText(texts []string, useCache bool) ([][]float64, error)
func ExportJSON(c *Collection, w io.Writer) error
func ImportJSON(collectionName string, r io.Reader) error
func RunServer()
func SpanLog(format string, v ...interface{})
type Collection
- func NewCollection(options CollectionOptions) (*Collection, error)
- func (c *Collection) AddDocument(id uint64, vector []float64, metadata []byte)
- func (c *Collection) Close() error
- func (c *Collection) ComputeStats() CollectionStats
- func (c *Collection) GetAllIDs() []uint64
- func (c *Collection) GetDocument(id uint64) (*Document, error)
- func (c *Collection) GetDocumentCount() int
- func (c *Collection) GetOptions() CollectionOptions
- func (c *Collection) Search(args SearchArgs) SearchResults
- func (c *Collection) UpdateDocument(id uint64, newMetadata []byte) error
type CollectionOptions
type CollectionStats
type Config
type DataStream
type Document
type EmbedTextFunc
type FileMode
type FilterFn
- func BuildFilter(queryIn string) (FilterFn, error)
type FreeSpan
type IndexEntry
type SearchArgs
type SearchResult
type SearchResults
type Server
type Span
type SpanFile
- func OpenFile(filename string, mode FileMode) (*SpanFile, error)
- func (db *SpanFile) Close() error
- func (db *SpanFile) GetStats() (size uint64, numRecords int)
- func (db *SpanFile) IterateRecords(callback func(recordID string, sr *SpanReader) error) error
- func (db *SpanFile) IterateSortedRecords(callback func(recordID string, sr *SpanReader) error) error
- func (db *SpanFile) ReadRecord(recordID string) (*Span, error)
- func (db *SpanFile) RemoveRecord(recordID string) error
- func (db *SpanFile) WriteRecord(recordID string, dataStreams []DataStream) error
type SpanReader

Constants ¶

const (
	StopSearch    = iota // Indicates to stop the search due to an error
	PointAccepted        // Indicates the point was accepted and is better
	PointChecked         // Indicates the point was checked unnecessarily
	PointIgnored         // no action taken; pretend point did not exist
)

const (
	Euclidean = iota
	Cosine
)

Variables ¶

This section is empty.

Functions ¶

func Configure ¶

func Configure(cfg Config)

func DumpIndex ¶

func DumpIndex(filename string)

DumpIndex reads the specified file and displays its contents in a human-readable format.

func EmbedText ¶

func EmbedText(texts []string, useCache bool) ([][]float64, error)

EmbedText connects to the configured Ollama server and runs the configured text model to generate an embedding for the given text.

func ExportJSON ¶

func ExportJSON(c *Collection, w io.Writer) error

func ImportJSON ¶

func ImportJSON(collectionName string, r io.Reader) error

func RunServer ¶

func RunServer()

func SpanLog ¶

func SpanLog(format string, v ...interface{})

Types ¶

type Collection ¶

type Collection struct {
	CollectionOptions
	// contains filtered or unexported fields
}

Collection represents a collection of documents, supporting operations such as adding, updating, removing, and searching documents.

func NewCollection ¶

func NewCollection(options CollectionOptions) (*Collection, error)

NewCollection creates a new Collection with the specified options. It initializes the collection's memory file and pivots manager.

func (*Collection) AddDocument ¶

func (c *Collection) AddDocument(id uint64, vector []float64, metadata []byte)

AddDocument adds a new document to the collection with the specified ID, vector, and metadata. It manages pivots and encodes the document for storage.

func (*Collection) Close ¶

func (c *Collection) Close() error

Close closes the memfile associated with the collection.

Returns an error if the memfile cannot be closed.

func (*Collection) ComputeStats ¶

func (c *Collection) ComputeStats() CollectionStats

ComputeStats gathers and returns statistics about the collection. It returns a CollectionStats object filled with the relevant statistics.

func (*Collection) GetAllIDs ¶

func (c *Collection) GetAllIDs() []uint64

GetAllIDs returns a sorted list of all document IDs in the collection.

func (*Collection) GetDocument ¶

func (c *Collection) GetDocument(id uint64) (*Document, error)

GetDocument retrieves a document from the collection by its ID. It returns the document or an error if the document is not found.

func (*Collection) GetDocumentCount ¶

func (c *Collection) GetDocumentCount() int

GetDocumentCount returns the total number of documents in the collection.

This method provides a quick way to determine the size of the collection by returning the count of document IDs stored in the memfile.

func (*Collection) GetOptions ¶

func (c *Collection) GetOptions() CollectionOptions

GetOptions returns the collection options used to create the collection.

func (*Collection) Search ¶

func (c *Collection) Search(args SearchArgs) SearchResults

Search returns the search results, including the list of matching documents and the percentage of the database searched.

func (*Collection) UpdateDocument ¶

func (c *Collection) UpdateDocument(id uint64, newMetadata []byte) error

UpdateDocument updates the metadata of an existing document in the collection. It returns an error if the document is not found.

type CollectionOptions ¶

type CollectionOptions struct {
	// Name is the identifier for the collection.
	Name string `json:"name"`

	// DistanceMethod specifies the method used to calculate distances between vectors.
	// It can be either Euclidean or Cosine.
	DistanceMethod int `json:"distance_method"`

	// DimensionCount is the number of dimensions for each vector in the collection.
	DimensionCount int `json:"dimension_count"`

	// Quantization specifies the bit-level quantization for storing vectors.
	// Supported values are 4, 8, 16, 32, and 64, with 64 as the default.
	Quantization int `json:"quantization"`

	// FileMode specifies the mode for opening the memfile.
	FileMode FileMode `json:"-"`
}

CollectionOptions defines the configuration options for creating a Collection.
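
The effect of the Quantization option on storage can be estimated with simple arithmetic: a vector of DimensionCount components at Quantization bits each needs about DimensionCount × Quantization / 8 bytes. The sketch below computes only that payload size. It ignores IDs, metadata, and any per-record overhead, since the on-disk layout is not described here. `vectorBytes` is a name made up for the example.

```go
package main

import "fmt"

// vectorBytes returns the bytes needed for the quantized components of one
// vector: dims components at bits each, rounded up to a whole byte.
func vectorBytes(dims, bits int) int {
	return (dims*bits + 7) / 8
}

func main() {
	for _, bits := range []int{4, 8, 16, 32, 64} {
		fmt.Printf("%2d-bit: %4d bytes per 128-dimension vector\n", bits, vectorBytes(128, bits))
	}
}
```

For a 128-dimension vector this ranges from 64 bytes at 4 bits to 1024 bytes at 64 bits, a 16-fold difference in vector payload.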

type CollectionStats ¶

type CollectionStats struct {
	// Number of documents in the collection
	DocumentCount int `json:"document_count"`

	// Number of dimensions in each document vector
	DimensionCount int `json:"dimension_count"`

	// Quantization level used for storing vectors
	Quantization int `json:"quantization"`

	// Distance method used for calculating distances
	// cosine or euclidean
	DistanceMethod string `json:"distance_method"`

	// Storage on disk used by the collection
	StorageSize int64 `json:"storage_size"`

	// Average distance between random pairs of documents
	AverageDistance float64 `json:"average_distance"`
}

CollectionStats contains statistics about the collection.

type Config ¶

type Config struct {
	OllamaServer string `mapstructure:"ollama_server"`
	TextModel    string `mapstructure:"text_model"`
	ImageModel   string `mapstructure:"image_model"`
	DataFolder   string `mapstructure:"data_folder"`
	SyzgyHost    string `mapstructure:"syzgy_host"`
	HTMLRoot     string `mapstructure:"html_root"`

	// If non-zero, we will use pseudorandom numbers so everything is predictable for testing.
	RandomSeed int64
}

Config holds the configuration settings for the service.

type DataStream ¶

type DataStream struct {
	StreamID uint8
	Data     []byte
}

type Document ¶

type Document struct {
	// ID is the unique identifier for the document.
	ID uint64

	// Vector is the numerical representation of the document.
	Vector []float64

	// Metadata is additional information associated with the document.
	Metadata []byte
}

Document represents a single document in the collection, consisting of an ID, vector, and metadata.

type EmbedTextFunc ¶

type EmbedTextFunc func(text []string, useCache bool) ([][]float64, error)

type FileMode ¶

type FileMode int

const (
	CreateIfNotExists  FileMode = 0 // Create the file only if it doesn't exist
	ReadWrite          FileMode = 1 // Open the file for read/write access
	ReadOnly           FileMode = 2 // Open the file for read-only access
	CreateAndOverwrite FileMode = 3 // Always create and overwrite the file if it exists
)

type FilterFn ¶

type FilterFn func(id uint64, metadata []byte) bool

func BuildFilter ¶

func BuildFilter(queryIn string) (FilterFn, error)

BuildFilter compiles the query into a filter function that can be used with SearchArgs.

type FreeSpan ¶

type FreeSpan struct {
	Offset uint64
	Length uint64
}

type IndexEntry ¶

type IndexEntry struct {
	Offset         uint64
	Span           *Span
	SequenceNumber uint64
}

type SearchArgs ¶

type SearchArgs struct {
	// Vector is the search vector used to find similar documents.
	Vector []float64

	// Filter is an optional function to filter documents based on their ID and metadata.
	Filter FilterFn

	// K specifies the maximum number of nearest neighbors to return.
	K int

	// Radius specifies the maximum distance for radius-based search.
	Radius float64

	// When K and Radius are both 0, all documents are returned in order of ID.
	// Offset and Limit then specify the page of results to return.
	Offset    int
	Limit     int
	Precision string
}

SearchArgs defines the arguments for performing a search in the collection.

type SearchResult ¶

type SearchResult struct {
	// ID is the unique identifier of the document in the search result.
	ID uint64

	// Metadata is the associated metadata of the document in the search result.
	Metadata []byte

	// Distance is the calculated distance from the search vector to the document vector.
	Distance float64
}

SearchResult represents a single result from a search operation, including the document ID, metadata, and distance.

type SearchResults ¶

type SearchResults struct {
	// Results is a slice of SearchResult containing the documents that matched the search criteria.
	Results []SearchResult

	// PercentSearched indicates the percentage of the database that was searched to obtain the results.
	PercentSearched float64
}

SearchResults contains the results of a search operation, including the list of results and the percentage of the database searched.

type Server ¶

type Server struct {
	// contains filtered or unexported fields
}

type Span ¶

type Span struct {
	MagicNumber    uint32
	Length         uint64
	SequenceNumber uint32
	RecordID       string
	DataStreams    []DataStream
	Checksum       uint32
}

type SpanFile ¶

type SpanFile struct {
	// contains filtered or unexported fields
}

func OpenFile ¶

func OpenFile(filename string, mode FileMode) (*SpanFile, error)

func (*SpanFile) Close ¶

func (db *SpanFile) Close() error

func (*SpanFile) GetStats ¶

func (db *SpanFile) GetStats() (size uint64, numRecords int)

func (*SpanFile) IterateRecords ¶

func (db *SpanFile) IterateRecords(callback func(recordID string, sr *SpanReader) error) error

func (*SpanFile) IterateSortedRecords ¶

func (db *SpanFile) IterateSortedRecords(callback func(recordID string, sr *SpanReader) error) error

func (*SpanFile) ReadRecord ¶

func (db *SpanFile) ReadRecord(recordID string) (*Span, error)

func (*SpanFile) RemoveRecord ¶

func (db *SpanFile) RemoveRecord(recordID string) error

func (*SpanFile) WriteRecord ¶

func (db *SpanFile) WriteRecord(recordID string, dataStreams []DataStream) error

type SpanReader ¶

type SpanReader struct {
	// contains filtered or unexported fields
}

Directories ¶

Path	Synopsis
cmd
query
