package dataset

v3.4.2 Published: Feb 14, 2025 License: AGPL-3.0 Imports: 19 Imported by: 0

Repository

github.com/grafana/loki

Documentation ¶

Overview ¶

Package dataset contains utilities for working with datasets. Datasets hold columnar data across multiple pages.

Index ¶

func CompareValues(a, b Value) int
func Iter(ctx context.Context, columns []Column) result.Seq[Row]
type BuilderOptions
type Column
type ColumnBuilder
- func NewColumnBuilder(name string, opts BuilderOptions) (*ColumnBuilder, error)
- func (cb *ColumnBuilder) Append(row int, value Value) error
- func (cb *ColumnBuilder) Backfill(row int)
- func (cb *ColumnBuilder) EstimatedSize() int
- func (cb *ColumnBuilder) Flush() (*MemColumn, error)
- func (cb *ColumnBuilder) Reset()
type ColumnInfo
type CompressionOptions
type Dataset
- func FromMemory(columns []*MemColumn) Dataset
- func Sort(ctx context.Context, set Dataset, sortBy []Column, pageSizeHint int) (Dataset, error)
type MemColumn
- func (c *MemColumn) ColumnInfo() *ColumnInfo
- func (c *MemColumn) ListPages(_ context.Context) result.Seq[Page]
type MemPage
- func (p *MemPage) PageInfo() *PageInfo
- func (p *MemPage) ReadPage(_ context.Context) (PageData, error)
type Page
type PageData
type PageInfo
type Pages
type Row
type Value
- func Int64Value(v int64) Value
- func StringValue(v string) Value
- func Uint64Value(v uint64) Value
- func (v Value) Int64() int64
- func (v Value) IsNil() bool
- func (v Value) IsZero() bool
- func (v Value) String() string
- func (v Value) Type() datasetmd.ValueType
- func (v Value) Uint64() uint64

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func CompareValues ¶

func CompareValues(a, b Value) int

CompareValues returns -1 if a<b, 0 if a==b, or 1 if a>b. CompareValues panics if a and b are not the same type.

As a special case, either a or b may be nil. Two nil values are equal, and a nil value is always less than a non-nil value.
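The ordering rules can be illustrated with a small self-contained sketch. The kind and value types and compareValues below are hypothetical stand-ins, not the package's Value or CompareValues. Only two value types are modeled, but the nil-first ordering and the panic on mismatched types follow the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// kind is a hypothetical type tag standing in for datasetmd.ValueType.
// Its zero value means nil.
type kind int

const (
	kindNil kind = iota
	kindInt64
	kindString
)

// value is a hypothetical stand-in for dataset.Value.
type value struct {
	k   kind
	i   int64
	str string
}

// compareValues returns -1 if a<b, 0 if a==b, or 1 if a>b. Two nils are
// equal, nil sorts before any non-nil value, and non-nil values of different
// types cause a panic.
func compareValues(a, b value) int {
	switch {
	case a.k == kindNil && b.k == kindNil:
		return 0
	case a.k == kindNil:
		return -1
	case b.k == kindNil:
		return 1
	case a.k != b.k:
		panic(fmt.Sprintf("compareValues: mismatched types %d and %d", a.k, b.k))
	}

	switch a.k {
	case kindInt64:
		switch {
		case a.i < b.i:
			return -1
		case a.i > b.i:
			return 1
		}
		return 0
	case kindString:
		return strings.Compare(a.str, b.str)
	default:
		panic(fmt.Sprintf("compareValues: unsupported type %d", a.k))
	}
}

func main() {
	nilValue := value{}
	one := value{k: kindInt64, i: 1}
	two := value{k: kindInt64, i: 2}
	fmt.Println(compareValues(one, two))           // -1
	fmt.Println(compareValues(two, one))           // 1
	fmt.Println(compareValues(nilValue, one))      // -1: nil sorts first
	fmt.Println(compareValues(nilValue, nilValue)) // 0
}
```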

func Iter ¶

func Iter(ctx context.Context, columns []Column) result.Seq[Row]

Iter iterates over the rows of the given columns. Each Row in the returned sequence contains values only for those columns, ordered as in the columns argument.

Iter lazily fetches pages as needed.
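The lazy, column-parallel iteration can be sketched as follows. The page, column, and row types are hypothetical simplifications (string values, and a load callback in place of ReadPage), and the callback-style iterRows stands in for the package's result.Seq. Each column keeps its own cursor, and a page is loaded only when the previous page of that column is exhausted.

```go
package main

import "fmt"

// page is a hypothetical stand-in for dataset.Page. Its values are produced
// only when load is called, mimicking a lazy ReadPage.
type page struct {
	load func() []string
}

// column is a hypothetical stand-in for dataset.Column: an ordered list of pages.
type column []page

// row mirrors dataset.Row: the row index plus one value per requested column.
type row struct {
	Index  int
	Values []string
}

// cursor tracks one column's position: the next page to load, the values of
// the currently loaded page, and the offset within those values.
type cursor struct {
	pages  []page
	next   int
	buf    []string
	offset int
}

// advance returns the column's next value, loading another page only once
// the current one is exhausted. ok is false when the column has no more values.
func (c *cursor) advance() (v string, ok bool) {
	for c.offset >= len(c.buf) {
		if c.next >= len(c.pages) {
			return "", false
		}
		c.buf = c.pages[c.next].load()
		c.next++
		c.offset = 0
	}
	v = c.buf[c.offset]
	c.offset++
	return v, true
}

// iterRows calls yield once per row with values in the order of columns. It
// stops when yield returns false or when any column runs out of values.
func iterRows(columns []column, yield func(row) bool) {
	if len(columns) == 0 {
		return
	}
	cursors := make([]*cursor, len(columns))
	for i, col := range columns {
		cursors[i] = &cursor{pages: col}
	}
	for index := 0; ; index++ {
		values := make([]string, len(cursors))
		for i, c := range cursors {
			v, ok := c.advance()
			if !ok {
				return
			}
			values[i] = v
		}
		if !yield(row{Index: index, Values: values}) {
			return
		}
	}
}

func main() {
	loads := 0
	mk := func(vals ...string) page {
		return page{load: func() []string { loads++; return vals }}
	}
	names := column{mk("a", "b"), mk("c")}
	levels := column{mk("info"), mk("warn", "error")}

	iterRows([]column{names, levels}, func(r row) bool {
		fmt.Println(r.Index, r.Values)
		return true
	})
	fmt.Println("pages loaded:", loads) // pages loaded: 4
}
```

Because pages are pulled per column, columns with different page boundaries (as in the example) still produce aligned rows, and stopping early avoids loading the remaining pages.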

Types ¶

type BuilderOptions ¶

type BuilderOptions struct {
	// PageSizeHint is the soft limit for the size of the page. Builders try to
	// fill pages as close to this size as possible, but the actual size may be
	// slightly larger or smaller.
	PageSizeHint int

	// Value is the value type of data to write.
	Value datasetmd.ValueType

	// Encoding is the encoding algorithm to use for values.
	Encoding datasetmd.EncodingType

	// Compression is the compression algorithm to use for values.
	Compression datasetmd.CompressionType

	// CompressionOptions holds optional configuration for compression.
	CompressionOptions CompressionOptions
}

BuilderOptions configures common settings for building pages.

type Column ¶

type Column interface {
	// ColumnInfo returns the metadata for the Column.
	ColumnInfo() *ColumnInfo

	// ListPages returns the set of ordered pages in the column.
	ListPages(ctx context.Context) result.Seq[Page]
}

A Column represents a sequence of values within a dataset. Columns are split up across one or more [Page]s to limit the amount of memory needed to read a portion of the column at a time.

type ColumnBuilder ¶

type ColumnBuilder struct {
	// contains filtered or unexported fields
}

A ColumnBuilder builds a sequence of Value entries of a common type into a column. Values are accumulated into a buffer and flushed into [MemPage]s once the size of the buffered data exceeds the limit set by the PageSizeHint field of [BuilderOptions].
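A minimal sketch of the accumulate-then-cut behavior, assuming a hypothetical pageBuilder whose size estimate is simply the total length of the buffered strings. Real pages are also encoded and compressed, which this sketch omits. Because the limit is checked after each append, a page can end up somewhat larger than the hint, in line with the soft-limit wording of PageSizeHint.

```go
package main

import "fmt"

// pageBuilder is a hypothetical sketch: values accumulate in buf and are cut
// into a page once the estimated size of buf exceeds pageSizeHint.
type pageBuilder struct {
	pageSizeHint int
	buf          []string
	bufSize      int        // estimated size of buf: total string length
	pages        [][]string // completed ("cut") pages
}

// append buffers v and cuts a page if the soft limit has been exceeded.
func (b *pageBuilder) append(v string) {
	b.buf = append(b.buf, v)
	b.bufSize += len(v)
	if b.bufSize > b.pageSizeHint {
		b.cut()
	}
}

// cut moves the buffered values into a new page.
func (b *pageBuilder) cut() {
	if len(b.buf) == 0 {
		return
	}
	b.pages = append(b.pages, b.buf)
	b.buf = nil
	b.bufSize = 0
}

// flush cuts any remaining values, returns all pages, and resets the builder.
func (b *pageBuilder) flush() [][]string {
	b.cut()
	pages := b.pages
	b.pages = nil
	return pages
}

func main() {
	b := &pageBuilder{pageSizeHint: 8}
	for _, v := range []string{"hello", "world", "go", "loki", "x"} {
		b.append(v)
	}
	for i, p := range b.flush() {
		fmt.Println(i, p) // 0 [hello world], then 1 [go loki x]
	}
}
```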

func NewColumnBuilder ¶

func NewColumnBuilder(name string, opts BuilderOptions) (*ColumnBuilder, error)

NewColumnBuilder creates a new ColumnBuilder from the optional name and provided options. NewColumnBuilder returns an error if the options are invalid.

func (*ColumnBuilder) Append ¶

func (cb *ColumnBuilder) Append(row int, value Value) error

Append adds a new value into cb with the given zero-indexed row number. If the row number is higher than the current number of rows in cb, NULL values are added up to the new row.

Append returns an error if the row number is out-of-order.

func (*ColumnBuilder) Backfill ¶

func (cb *ColumnBuilder) Backfill(row int)

Backfill adds NULLs into cb up to (but not including) the provided row number. If values exist up to the provided row number, Backfill does nothing.

func (*ColumnBuilder) EstimatedSize ¶

func (cb *ColumnBuilder) EstimatedSize() int

EstimatedSize returns the estimated size of all data in cb: the compressed size of all pages already cut in cb plus the size estimate of the in-progress page.

Because compression isn't considered for the in-progress page, EstimatedSize tends to overestimate the actual size after flushing.
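As a worked example with made-up sizes: two cut pages that compressed to 1024 and 1100 bytes, plus an in-progress page currently estimated at 3000 bytes, give an estimate of 5124 bytes. The pageStat type and estimatedSize function are hypothetical.

```go
package main

import "fmt"

// pageStat is a hypothetical record of one cut page's sizes.
type pageStat struct {
	uncompressed, compressed int
}

// estimatedSize follows the formula described above: compressed sizes of all
// cut pages plus the (uncompressed) estimate of the in-progress page.
func estimatedSize(cut []pageStat, inProgress int) int {
	total := inProgress
	for _, p := range cut {
		total += p.compressed
	}
	return total
}

func main() {
	cut := []pageStat{
		{uncompressed: 4096, compressed: 1024},
		{uncompressed: 4096, compressed: 1100},
	}
	fmt.Println(estimatedSize(cut, 3000)) // 1024 + 1100 + 3000 = 5124
}
```

If the in-progress page later compressed to, say, 750 bytes (a hypothetical 4:1 ratio), the flushed column would be about 1024 + 1100 + 750 = 2874 bytes, which is why the estimate tends to run high.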

func (*ColumnBuilder) Flush ¶

func (cb *ColumnBuilder) Flush() (*MemColumn, error)

Flush converts data in cb into a MemColumn. Afterwards, cb is reset to a fresh state and can be reused.

func (*ColumnBuilder) Reset ¶

func (cb *ColumnBuilder) Reset()

Reset clears all data in cb and resets it to a fresh state.

type ColumnInfo ¶

type ColumnInfo struct {
	Name        string                    // Name of the column, if any.
	Type        datasetmd.ValueType       // Type of values in the column.
	Compression datasetmd.CompressionType // Compression used for the column.

	RowsCount        int // Total number of rows in the column.
	ValuesCount      int // Total number of non-NULL values in the column.
	CompressedSize   int // Total size of all pages in the column after compression.
	UncompressedSize int // Total size of all pages in the column before compression.

	Statistics *datasetmd.Statistics // Optional statistics for the column.
}

ColumnInfo describes a column.

type CompressionOptions ¶

type CompressionOptions struct {
	// Zstd holds encoding options for Zstd compression. Only used for
	// [datasetmd.COMPRESSION_TYPE_ZSTD].
	Zstd []zstd.EOption
}

CompressionOptions customizes the compressor used when building pages.

type Dataset ¶

type Dataset interface {
	// ListColumns returns the set of [Column]s in the Dataset. The order of
	// Columns in the returned sequence must be consistent across calls.
	ListColumns(ctx context.Context) result.Seq[Column]

	// ListPages retrieves a set of [Pages] given a list of [Column]s.
	// Implementations of Dataset may use ListPages to optimize for batch reads.
	// The order of [Pages] in the returned sequence must match the order of the
	// columns argument.
	ListPages(ctx context.Context, columns []Column) result.Seq[Pages]

	// ReadPages returns the set of [PageData] for the specified slice of pages.
	// Implementations of Dataset may use ReadPages to optimize for batch reads.
	// The order of [PageData] in the returned sequence must match the order of
	// the pages argument.
	ReadPages(ctx context.Context, pages []Page) result.Seq[PageData]
}

A Dataset holds a collection of [Column]s, each of which is split into a set of [Page]s, which in turn hold a sequence of [Value]s.

Dataset is read-only; callers must not modify any of the values returned by methods in Dataset.

func FromMemory ¶

func FromMemory(columns []*MemColumn) Dataset

FromMemory returns an in-memory Dataset from the given list of [MemColumn]s.

func Sort ¶

func Sort(ctx context.Context, set Dataset, sortBy []Column, pageSizeHint int) (Dataset, error)

Sort returns a new Dataset with rows sorted by the given sortBy columns in ascending order. The order of columns in the new Dataset will match the order in set. pageSizeHint specifies the page size to target for newly created pages.

If sortBy is empty or if the columns in sortBy contain no rows, Sort returns set.
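Sorting by several columns amounts to comparing each row's sortBy values lexicographically, with NULLs ordered first as in CompareValues, and then reordering every column by the resulting permutation. The sketch below assumes columns are held as equal-length []*string slices (nil meaning NULL) and identifies sortBy columns by index; sortColumns, ptr, and show are hypothetical. It uses a stable sort, which the documentation above does not promise, and it does not model paging or pageSizeHint.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// compareNullable orders NULL (nil) before any non-NULL value.
func compareNullable(a, b *string) int {
	switch {
	case a == nil && b == nil:
		return 0
	case a == nil:
		return -1
	case b == nil:
		return 1
	}
	return strings.Compare(*a, *b)
}

// sortColumns returns new columns whose rows are ordered ascending by the
// columns at indices sortBy (first key first). The input is returned as-is
// when sortBy is empty or there are no rows.
func sortColumns(columns [][]*string, sortBy []int) [][]*string {
	if len(sortBy) == 0 || len(columns) == 0 || len(columns[0]) == 0 {
		return columns
	}
	order := make([]int, len(columns[0]))
	for i := range order {
		order[i] = i
	}
	sort.SliceStable(order, func(i, j int) bool {
		for _, col := range sortBy {
			if c := compareNullable(columns[col][order[i]], columns[col][order[j]]); c != 0 {
				return c < 0
			}
		}
		return false
	})
	// Apply the permutation to every column, not just the sort keys.
	out := make([][]*string, len(columns))
	for c, col := range columns {
		out[c] = make([]*string, len(order))
		for dst, src := range order {
			out[c][dst] = col[src]
		}
	}
	return out
}

// ptr returns a pointer to v, for building example columns.
func ptr(v string) *string { return &v }

// show renders a nullable string, printing NULL for nil.
func show(p *string) string {
	if p == nil {
		return "NULL"
	}
	return *p
}

func main() {
	service := []*string{ptr("b"), nil, ptr("a"), ptr("b")}
	msg := []*string{ptr("3"), ptr("1"), ptr("2"), ptr("0")}
	sorted := sortColumns([][]*string{service, msg}, []int{0, 1})
	for row := range sorted[0] {
		fmt.Println(show(sorted[0][row]), show(sorted[1][row]))
	}
	// Prints one pair per line: NULL 1, a 2, b 0, b 3.
}
```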

type MemColumn ¶

type MemColumn struct {
	Info  ColumnInfo // Information about the column.
	Pages []*MemPage // The set of pages in the column.
}

MemColumn holds a set of pages of a common type.

func (*MemColumn) ColumnInfo ¶

func (c *MemColumn) ColumnInfo() *ColumnInfo

ColumnInfo implements Column and returns c.Info.

func (*MemColumn) ListPages ¶

func (c *MemColumn) ListPages(_ context.Context) result.Seq[Page]

ListPages implements Column and iterates through c.Pages.

type MemPage ¶

type MemPage struct {
	Info PageInfo // Information about the page.
	Data PageData // Data for the page.
}

MemPage holds an encoded (and optionally compressed) sequence of Value entries of a common type. Use ColumnBuilder to construct sets of pages.

func (*MemPage) PageInfo ¶

func (p *MemPage) PageInfo() *PageInfo

PageInfo implements Page and returns p.Info.

func (*MemPage) ReadPage ¶

func (p *MemPage) ReadPage(_ context.Context) (PageData, error)

ReadPage implements Page and returns p.Data.

type Page ¶

type Page interface {
	// PageInfo returns the metadata for the Page.
	PageInfo() *PageInfo

	// ReadPage returns the [PageData] for the Page.
	ReadPage(ctx context.Context) (PageData, error)
}

A Page holds an encoded and optionally compressed sequence of [Value]s within a Column.

type PageData ¶

type PageData []byte

PageData holds the raw data for a page. Data is formatted as:

<uvarint(presence-bitmap-size)> <presence-bitmap> <values-data>

The presence-bitmap is a bitmap-encoded sequence of booleans, where values describe which rows are present (1) or nil (0). The presence bitmap is always stored uncompressed.

values-data is then the encoded and optionally compressed sequence of non-NULL values.
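The layout can be illustrated with a round-trip sketch. It assumes the presence bitmap is a plain bitset (bit i in byte i/8, least significant bit first) and treats values-data as opaque bytes; the package's actual bitmap encoding and value encodings may differ. encodePage and decodePage are hypothetical. Since a plain bitset does not record how many rows it covers, the decoder takes the row count as input, as it could from PageInfo.RowCount.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encodePage lays out <uvarint(presence-bitmap-size)> <presence-bitmap> <values-data>.
// Assumption: the bitmap is a plain LSB-first bitset, not the package's encoding.
func encodePage(present []bool, valuesData []byte) []byte {
	bitmap := make([]byte, (len(present)+7)/8)
	for i, p := range present {
		if p {
			bitmap[i/8] |= 1 << (i % 8)
		}
	}
	var hdr [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(hdr[:], uint64(len(bitmap)))
	page := make([]byte, 0, n+len(bitmap)+len(valuesData))
	page = append(page, hdr[:n]...)
	page = append(page, bitmap...)
	return append(page, valuesData...)
}

// decodePage splits a page back into per-row presence flags and values-data.
func decodePage(data []byte, rows int) (present []bool, valuesData []byte, err error) {
	size, n := binary.Uvarint(data)
	if n <= 0 {
		return nil, nil, errors.New("invalid presence-bitmap size")
	}
	data = data[n:]
	if size > uint64(len(data)) {
		return nil, nil, errors.New("truncated presence bitmap")
	}
	bitmap := data[:size]
	valuesData = data[size:]
	if rows > len(bitmap)*8 {
		return nil, nil, errors.New("presence bitmap shorter than row count")
	}
	present = make([]bool, rows)
	for i := range present {
		present[i] = bitmap[i/8]&(1<<(i%8)) != 0
	}
	return present, valuesData, nil
}

func main() {
	present := []bool{true, false, true, true}
	page := encodePage(present, []byte("abc")) // "abc" stands in for 3 encoded values
	fmt.Printf("% x\n", page)                  // 01 0d 61 62 63
	got, values, err := decodePage(page, len(present))
	fmt.Println(got, string(values), err) // [true false true true] abc <nil>
}
```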

type PageInfo ¶

type PageInfo struct {
	UncompressedSize int    // UncompressedSize is the size of a page before compression.
	CompressedSize   int    // CompressedSize is the size of a page after compression.
	CRC32            uint32 // CRC32 checksum of the page after encoding and compression.
	RowCount         int    // RowCount is the number of rows in the page, including NULLs.
	ValuesCount      int    // ValuesCount is the number of non-NULL values in the page.

	Encoding datasetmd.EncodingType // Encoding used for values in the page.
	Stats    *datasetmd.Statistics  // Optional statistics for the page.
}

PageInfo describes a page.

type Pages ¶

type Pages []Page

Pages is a set of [Page]s.

type Row ¶

type Row struct {
	Index  int     // Index of the row in the dataset.
	Values []Value // Values for the row, one per [Column].
}

A Row in a Dataset is a set of values across multiple columns with the same row number.

type Value ¶

type Value struct {
	// contains filtered or unexported fields
}

A Value represents a single value within a dataset. Unlike [any], Values can be constructed without allocations. The zero Value corresponds to nil.
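A sketch of how a Value-like type can avoid allocations: a small struct with a type tag and payload fields is copied by value rather than stored in an interface (which can require an allocation), and its zero value naturally represents nil. The value type and its constructors below are hypothetical and mirror only the documented behavior (nil zero value, Int64 panicking on a type mismatch, and String falling back to a VALUE_TYPE_ name); the package's real layout is not shown here.

```go
package main

import "fmt"

// valueType is a hypothetical stand-in for datasetmd.ValueType.
type valueType uint8

const (
	typeNil valueType = iota // zero value; plays the role of VALUE_TYPE_UNSPECIFIED
	typeInt64
	typeUint64
	typeString
)

var typeNames = [...]string{"UNSPECIFIED", "INT64", "UINT64", "STRING"}

// value is a hypothetical tagged union; its zero value is nil.
type value struct {
	typ valueType
	num uint64 // payload for int64 and uint64 values
	str string // payload for string values
}

func int64Value(v int64) value   { return value{typ: typeInt64, num: uint64(v)} }
func uint64Value(v uint64) value { return value{typ: typeUint64, num: v} }
func stringValue(v string) value { return value{typ: typeString, str: v} }

// IsNil reports whether v is the zero (nil) value.
func (v value) IsNil() bool { return v.typ == typeNil }

// Int64 returns the int64 payload and panics if v holds another type.
func (v value) Int64() int64 {
	if v.typ != typeInt64 {
		panic("value.Int64: value is VALUE_TYPE_" + typeNames[v.typ])
	}
	return int64(v.num)
}

// String returns the string payload, or "VALUE_TYPE_T" for non-string values.
func (v value) String() string {
	if v.typ == typeString {
		return v.str
	}
	return "VALUE_TYPE_" + typeNames[v.typ]
}

func main() {
	var zero value
	fmt.Println(zero.IsNil())           // true
	fmt.Println(int64Value(-5).Int64()) // -5
	fmt.Println(stringValue("loki"))    // loki
	fmt.Println(uint64Value(7))         // VALUE_TYPE_UINT64
}
```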

func Int64Value ¶

func Int64Value(v int64) Value

Int64Value returns a Value for an int64.

func StringValue ¶

func StringValue(v string) Value

StringValue returns a Value for a string.

func Uint64Value ¶

func Uint64Value(v uint64) Value

Uint64Value returns a Value for a uint64.

func (Value) Int64 ¶

func (v Value) Int64() int64

Int64 returns v's value as an int64. It panics if v is not a datasetmd.VALUE_TYPE_INT64.

func (Value) IsNil ¶

func (v Value) IsNil() bool

IsNil returns whether v is nil.

func (Value) IsZero ¶

func (v Value) IsZero() bool

IsZero reports whether v is the zero value.

func (Value) String ¶

func (v Value) String() string

String returns v's value as a string. Because of Go's String method convention, if v is not a string, String returns a string of the form "VALUE_TYPE_T", where T is the underlying type of v.

func (Value) Type ¶

func (v Value) Type() datasetmd.ValueType

Type returns the datasetmd.ValueType of v. If v is nil, Type returns datasetmd.VALUE_TYPE_UNSPECIFIED.

func (Value) Uint64 ¶

func (v Value) Uint64() uint64

Uint64 returns v's value as a uint64. It panics if v is not a datasetmd.VALUE_TYPE_UINT64.
