tessera

package module
v0.1.0
Published: Dec 5, 2024 License: Apache-2.0 Imports: 17 Imported by: 3

README

Trillian Tessera


Trillian Tessera is a Go library for building tile-based transparency logs (tlogs). It is the logical successor to the approach Trillian v1 takes in building and operating logs.

The implementation and its APIs bake in current best practices based on the lessons learned over the past decade of building and operating transparency logs in production environments and at scale.

Tessera was introduced at the Transparency.Dev summit in October 2024. Watch Introducing Trillian Tessera for all the details, but here's a summary of the high level goals:

  • tlog-tiles API and storage
  • Support for both cloud and on-premises infrastructure
  • Make it easy to build and deploy new transparency logs on supported infrastructure
    • Library instead of microservice architecture
    • No additional services to manage
    • Lower TCO for operators compared with Trillian v1
  • Fast sequencing and integration of entries
  • Optional functionality which can be enabled for those ecosystems/logs which need it (only pay the cost for what you need):
    • "Best-effort" de-duplication of entries
    • Synchronous integration
  • Broadly similar write-throughput and write-availability, and potentially far higher read-throughput and read-availability compared to Trillian v1 (dependent on underlying infrastructure)
  • Enable building of arbitrary log personalities, including support for the peculiarities of a Static CT API compliant log.

The main non-goal is to support transparency logs using anything other than the tlog-tiles API. While it is possible to deploy a custom personality in front of Tessera that adapts the tlog-tiles API into any other API, this strategy will lose a lot of the read scaling that Tessera is designed for.

Status

Tessera is under active development, with the alpha milestone coming soon. Users of GCP, MySQL, and POSIX are welcome to try the relevant Getting Started guide.

Roadmap

Alpha is expected by Q4 2024, with production readiness in the first half of 2025.

What’s happening to Trillian v1?

Trillian v1 is still in use in production environments by multiple organisations in multiple ecosystems, and is likely to remain so for the mid-term.

New ecosystems, or existing ecosystems looking to evolve, should strongly consider planning a migration to Tessera and adopting the patterns it encourages. Note that to achieve the full benefits of Tessera, logs must use the tlog-tiles API.

Concepts

This section introduces concepts and terms that will be used throughout the user guide.

Sequencing

When data is added to a log, it is first stored in memory for some period (this can be controlled via the batching options). If the process dies in this state, the entry will be lost.

Once a batch of entries is processed by the sequencer, the new data will transition from a volatile state to one where it is durably assigned an index. If the process dies in this state, the entry will be safe, though it will not be available through the read API of the log until the leaf has been Integrated. Once an index number has been issued to a leaf, no other data will ever be issued the same index number. All index numbers are contiguous and start from 0.

[!IMPORTANT] Within a batch, there is no guarantee about which order index numbers will be assigned. The only way to ensure that sequential calls to Add are given sequential indices is by blocking until a sequencing batch is completed. This can be achieved by configuring a batch size of 1, though this will make sequencing expensive!

Integration

Integration is a background process that happens when a Tessera storage implementation has been created. This process takes sequenced entries and merges them into the log. Once this process has been completed, a new entry will:

  • Be available via the read API at the index that was returned from sequencing
  • Have Merkle tree hashes that commit to this data being included in the tree
  • Be committed to by the latest Checkpoint (and any Checkpoints issued after this point)

[!IMPORTANT] There is currently no easy way to determine that integration has completed. This isn't an issue if the personality process is continually running. For personalities that require periods of downtime, #341 tracks adding an API to allow for clean shutdown.

Usage

Getting Started

The best place to start is the codelab. This will walk you through setting up your first log, writing some entries to it via HTTP, and inspecting the contents.

Take a look at the example personalities in the /cmd/ directory:

  • posix: example of operating a log backed by a local filesystem
    • This example runs an HTTP web server that takes arbitrary data and adds it to a file-based log.
  • mysql: example of operating a log that uses MySQL
    • This example is easiest deployed via docker compose, which allows for easy setup and teardown.
  • gcp: example of operating a log running in GCP.
  • aws: example of operating a log running on AWS.
  • posix-oneshot: example of a command line tool to add entries to a log stored on the local filesystem
    • This example is not a long-lived process; running the command integrates entries into the log which lives only as files.

The main.go files for each of these example personalities try to strike a balance between simplicity and demonstrating best practices when showcasing Tessera's features. Please raise issues against the repo, or chat to us in Slack, if you have ideas for making the examples more accessible!

Writing Personalities
Introduction

Tessera is a library written in Go. It is designed to efficiently serve logs that allow read access via the tlog-tiles API. The code you write that calls Tessera is referred to as a personality, because it tailors the generic library to your ecosystem.

Before starting to write your own personality, it is strongly recommended that you familiarize yourself with the provided personalities referenced in Getting Started. When writing your Tessera personality, the biggest decision you need to make first is which of the native implementations to use: GCP, AWS, MySQL, or POSIX.

Each of these implementations has a very similar API, but they have different characteristics.

The easiest implementations to operate and to scale are the cloud implementations: GCP and AWS. These are the recommended choice for the majority of users running in production.

If you aren't using a cloud provider, then your options are MySQL and POSIX:

  • POSIX is the simplest to get started with as it needs little in the way of extra infrastructure, and if you already serve static files as part of your business/project this could be a good fit.
  • Alternatively, if you are used to operating user-facing applications backed by an RDBMS, then MySQL could be a natural fit.

To get a sense of the rough performance you can expect from the different backends, take a look at docs/performance.md.

Setup

Once you've picked a storage implementation, you can start writing your personality! You'll need to import the Tessera library:

# This imports the library at main.
# For a stable release, pin this to the latest release version instead.
go get github.com/transparency-dev/trillian-tessera@main
Constructing the Storage Object

Now you'll need to instantiate the storage object for the native implementation you are using:

import (
    "context"

    tessera "github.com/transparency-dev/trillian-tessera"
    "github.com/transparency-dev/trillian-tessera/storage/aws"
    "github.com/transparency-dev/trillian-tessera/storage/gcp"
    "github.com/transparency-dev/trillian-tessera/storage/mysql"
    "github.com/transparency-dev/trillian-tessera/storage/posix"
)

func main() {
    ctx := context.Background()

    // Choose one! (awsConfig, gcpConfig, db, dir, and doCreate are
    // placeholders for your own configuration values.)
    storage, err := aws.New(ctx, awsConfig)
    storage, err := gcp.New(ctx, gcpConfig)
    storage, err := mysql.New(ctx, db)
    storage, err := posix.New(ctx, dir, doCreate)
}

See the documentation for each storage implementation to understand the parameters that each takes. Each of these New calls is variadic, which is to say it takes any number of trailing arguments. The optional arguments that can be passed in allow Tessera behaviour to be tuned. Take a look at the functions in the trillian-tessera root package named With*, e.g. WithBatching, to see the available options and how they should be used.

The final part of configuring this storage object is to set up the mix-ins that you want to use. Mix-ins are optional libraries you can use to provide common log behaviours without writing them yourself. The currently supported mix-ins are:

  • Deduplication
    • In-memory (cheap, but very limited deduplication behaviour)
    • Persistent (expensive, but can strongly ensure the log contains no duplicates)
      • TODO(mhutchinson): link to these implementations when they are written
  • Synchronous Integration

See Mix-ins after reading the rest of this section for more details.

Writing to the Log

Now you should have a storage object configured for your environment, and the correct mix-ins set up. On to the fun part: writing to the log!

func main() {
    storage, err := ...
    idx, err := storage.Add(ctx, tessera.NewEntry(data))()
}

Whichever storage option you use, writing to the log follows the same pattern: simply call Add with a new entry created with the data to be added as a leaf in the log. This method returns a future of the form func() (idx uint64, err error). When called, this future function will block until the data passed into Add has been sequenced and an index number is assigned (or until failure, in which case an error is returned). Once this index has been returned, the new data is sequenced, but not necessarily integrated into the log.

As discussed above in Integration, sequenced entries will be asynchronously integrated into the log and be made available via the read API. Some personalities may need to block until this has been performed, e.g. because they will provide the requester with an inclusion proof, which requires integration. Such personalities are recommended to use Synchronous Integration to perform this blocking.
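The future returned by Add can be understood as a plain Go closure over a channel. The following is a minimal stand-in for illustration only, not Tessera's actual implementation:

```go
package main

import "fmt"

// indexFuture matches the shape of tessera.IndexFuture: calling it blocks
// until an index has been assigned (or an error occurred).
type indexFuture func() (uint64, error)

// sequence simulates handing an entry to a batching sequencer: the returned
// future blocks until the sequencer sends the assigned index on the channel.
func sequence(assigned <-chan uint64) indexFuture {
	return func() (uint64, error) {
		idx, ok := <-assigned
		if !ok {
			return 0, fmt.Errorf("sequencer shut down before assigning an index")
		}
		return idx, nil
	}
}

func main() {
	ch := make(chan uint64, 1)
	f := sequence(ch)

	// Elsewhere, the sequencer processes the batch and assigns index 42.
	ch <- 42

	// Calling the future blocks until the index is available.
	idx, err := f()
	fmt.Println(idx, err) // 42 <nil>
}
```

This is why a personality can call Add cheaply and defer the blocking call to the future until it actually needs the index.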

Reading from the Log

Data that has been written to the log needs to be made available for clients and verifiers. Tessera makes the log readable via the tlog-tiles API. In the case of AWS and GCP, the data to be served is written to object storage and served directly by the cloud provider. The log operator only needs to ensure that these object storage instances are publicly readable, and set up a URL to point to them.

In the case of MySQL and POSIX, the log operator will need to take more steps to make the data available. POSIX writes out the files exactly as per the API spec, so the log operator can serve these via an HTTP File Server.

MySQL is the odd one out in that it requires personality code to handle read traffic. See the example personalities written for MySQL to see how this Go web server should be configured.

Mix-ins

Deduplication

Deduplicating entries means that the log will only store each unique entry once. Deduplication is recommended for logs that take public submissions, such as CT. While checking for duplicates is somewhat expensive, it protects the log from a type of DoS attack where users can feed the log back into itself, causing it to grow out of control. It also protects against clients that may send the same request for logging multiple times (perhaps as a programming error, or by design for reliability).

Logs that do not allow public submissions directly to the log may want to operate without deduplication, instead relying on the personality to never generate duplicates. This can allow for significantly cheaper operation and faster write throughput.

Synchronous Integration

Synchronous Integration is provided by tessera.IntegrationAwaiter. This allows personality calls to Add to block until the new leaf is integrated into the tree.

Contributing

See CONTRIBUTING.md for details.

License

This repo is licensed under the Apache 2.0 license; see LICENSE for details.

Contact

Acknowledgements

Tessera builds upon the hard work, experience, and lessons from many many folks involved in transparency ecosystems over the years.

Documentation

Overview

Package tessera provides an implementation of a tile-based logging framework.

Index

Constants

const (
	// DefaultBatchMaxSize is used by storage implementations if no WithBatching option is provided when instantiating it.
	DefaultBatchMaxSize = 256
	// DefaultBatchMaxAge is used by storage implementations if no WithBatching option is provided when instantiating it.
	DefaultBatchMaxAge = 250 * time.Millisecond
	// DefaultCheckpointInterval is used by storage implementations if no WithCheckpointInterval option is provided when instantiating it.
	DefaultCheckpointInterval = 10 * time.Second
)

Variables

var ErrPushback = errors.New("too many unintegrated entries")

ErrPushback is returned by underlying storage implementations when there are too many entries with indices assigned but which have not yet been integrated into the tree.

Personalities encountering this error should apply back-pressure to the source of new entries in an appropriate manner (e.g. for HTTP services, return a 503 with a Retry-After header).

Functions

func InMemoryDedupe

func InMemoryDedupe(delegate func(ctx context.Context, e *Entry) IndexFuture, size uint) func(context.Context, *Entry) IndexFuture

InMemoryDedupe wraps an Add function to prevent duplicate entries being written to the underlying storage by keeping an in-memory cache of recently seen entries. Where an existing entry has already been `Add`ed, the previous `IndexFuture` will be returned. When no entry is found in the cache, the delegate method will be called to store the entry, and the result will be registered in the cache.

Internally this uses a cache with a max size configured by the size parameter. If the entry being `Add`ed is not found in the cache, then it calls the delegate.

This object can be used in isolation, or in conjunction with a persistent dedupe implementation. When using this with a persistent dedupe, the persistent layer should be the delegate of this InMemoryDedupe. This allows recent duplicates to be deduplicated in memory, reducing the need to make calls to a persistent storage.

func NewCertificateTransparencySequencedWriter

func NewCertificateTransparencySequencedWriter(s Storage) func(context.Context, *ctonly.Entry) IndexFuture

NewCertificateTransparencySequencedWriter returns a function which knows how to add a CT-specific entry type to the log.

This entry point MUST ONLY be used for CT logs participating in the CT ecosystem. It should not be used as the basis for any other/new transparency application, as this protocol: a) embodies some techniques which are not considered to be best practice (it does this to retain backwards compatibility with RFC6962); b) is not compatible with the https://c2sp.org/tlog-tiles API, which we _very strongly_ encourage you to use instead.

Users of this MUST NOT call `Add` on the underlying storage directly.

Returns a future, which resolves to the assigned index in the log, or an error.

func WithBatching

func WithBatching(maxSize uint, maxAge time.Duration) func(*options.StorageOptions)

WithBatching configures the batching behaviour of leaves being sequenced. A batch will be allowed to grow in memory until either:

  • the number of entries in the batch reaches maxSize
  • the first entry in the batch has reached maxAge

At this point the batch will be sent to the sequencer.

Configuring these parameters allows the personality to tune for the desired balance of sequencing latency and cost. In general, larger batches allow for a lower cost of operation, while more frequent batches reduce the amount of time required for entries to be included in the log.

If this option isn't provided, storage implementations will use the DefaultBatchMaxSize and DefaultBatchMaxAge consts above.

func WithCTLayout

func WithCTLayout() func(*options.StorageOptions)

WithCTLayout instructs the underlying storage to use a Static CT API compatible scheme for layout.

func WithCheckpointInterval

func WithCheckpointInterval(interval time.Duration) func(*options.StorageOptions)

WithCheckpointInterval configures the frequency at which Tessera will attempt to create & publish a new checkpoint.

Well behaved clients of the log will only "see" newly sequenced entries once a new checkpoint is published, so it's important to set that value such that it works well with your ecosystem.

Regularly publishing new checkpoints:

  • helps show that the log is "live", even if no entries are being added.
  • enables clients of the log to reason about how frequently they need to have their view of the log refreshed, which in turn helps reduce work/load across the ecosystem.

Note that this option probably only makes sense for long-lived applications (e.g. HTTP servers).

If this option isn't provided, storage implementations will use the DefaultCheckpointInterval const above.

func WithCheckpointSigner

func WithCheckpointSigner(s note.Signer, additionalSigners ...note.Signer) func(*options.StorageOptions)

WithCheckpointSigner is an option for setting the note signer and verifier to use when creating and parsing checkpoints.

A primary signer must be provided; this is the "canonical" signing identity which should be used when creating new checkpoints.

Zero or more additional signers may also be provided. This enables cases like:

  • a rolling key rotation, where checkpoints are signed by both the old and new keys for some period of time,
  • using different signature schemes for different audiences, etc.

When providing additional signers, their names MUST be identical to the primary signer name, and this name will be used as the checkpoint Origin line.

Checkpoints signed by these signer(s) will be standard checkpoints as defined by https://c2sp.org/tlog-checkpoint.

func WithPushback

func WithPushback(maxOutstanding uint) func(*options.StorageOptions)

WithPushback allows configuration of when the storage should start pushing back on add requests.

maxOutstanding is the number of "in-flight" add requests - i.e. the number of entries with sequence numbers assigned, but which are not yet integrated into the log.

Types

type Entry

type Entry struct {
	// contains filtered or unexported fields
}

Entry represents an entry in a log.

func NewEntry

func NewEntry(data []byte) *Entry

NewEntry creates a new Entry object with leaf data.

func (Entry) Data

func (e Entry) Data() []byte

Data returns the raw entry bytes which will form the entry in the log.

func (Entry) Identity

func (e Entry) Identity() []byte

Identity returns an identity which may be used to de-duplicate entries as they are being added to the log.

func (Entry) Index

func (e Entry) Index() *uint64

Index returns the index assigned to the entry in the log, or nil if no index has been assigned.

func (Entry) LeafHash

func (e Entry) LeafHash() []byte

LeafHash is the Merkle leaf hash which will be used for this entry in the log. Note that in almost all cases, this should be the RFC6962 definition of a leaf hash.

func (*Entry) MarshalBundleData

func (e *Entry) MarshalBundleData(index uint64) []byte

MarshalBundleData returns this entry's data in a format ready to be appended to an EntryBundle.

Note that MarshalBundleData _may_ be called multiple times, potentially with different values for index (e.g. if there's a failure in the storage when trying to persist the assignment), so index should not be considered final until the storage Add method has returned successfully with the durably assigned index.

type IndexFuture

type IndexFuture func() (uint64, error)

IndexFuture is the signature of a function which can return an assigned index or error.

Implementations of this func are likely to be "futures", or a promise to return this data at some point in the future, and as such will block when called if the data isn't yet available.

type IntegrationAwaiter

type IntegrationAwaiter struct {
	// contains filtered or unexported fields
}

IntegrationAwaiter allows client threads to block until a leaf is both sequenced and integrated. A single long-lived IntegrationAwaiter instance should be reused for all requests in the application code as there is some overhead to each one; the core of an IntegrationAwaiter is a poll loop that will fetch checkpoints whenever it has clients waiting.

The expected call pattern is:

i, cp, err := awaiter.Await(ctx, storage.Add(myLeaf))

When used this way, it requires very little code at the point of use to block until the new leaf is integrated into the tree.

func NewIntegrationAwaiter

func NewIntegrationAwaiter(ctx context.Context, readCheckpoint func(ctx context.Context) ([]byte, error), pollPeriod time.Duration) *IntegrationAwaiter

NewIntegrationAwaiter provides an IntegrationAwaiter that can be cancelled using the provided context. The IntegrationAwaiter will poll every `pollPeriod` to fetch checkpoints using the `readCheckpoint` function.

func (*IntegrationAwaiter) Await

func (a *IntegrationAwaiter) Await(ctx context.Context, future IndexFuture) (uint64, []byte, error)

Await blocks until the IndexFuture is resolved, and this new index has been integrated into the log, i.e. the log has made a checkpoint available that commits to this new index. When this happens, Await returns the index at which the leaf has been added, and a checkpoint that commits to this index.

This operation can be aborted early by cancelling the context. In this event, or in the event that there is an error getting a valid checkpoint, an error will be returned from this method.

type Storage

type Storage interface {
	// Add should durably assign an index to the provided Entry, returning a future to access that value.
	//
	// Implementations MUST call MarshalBundleData method on the entry before persisting/integrating it.
	Add(context.Context, *Entry) IndexFuture
}

Storage describes the expected functions from Tessera storage implementations.

Directories

Path Synopsis
api
Package api contains the tiles definitions from the [tlog-tiles API].
layout
Package layout contains routines for specifying the path layout of Tessera logs, which is really to say that it provides functions to calculate paths used by the [tlog-tiles API].
client
Package client provides client support for interacting with logs that use the [tlog-tiles API].
cmd
conformance/aws
aws is a simple personality allowing to run conformance/compliance/performance tests and showing how to use the Tessera AWS storage implementation.
conformance/gcp
gcp is a simple personality allowing to run conformance/compliance/performance tests and showing how to use the Tessera GCP storage implementation.
conformance/mysql
mysql is a simple personality allowing to run conformance/compliance/performance tests and showing how to use the Tessera MySQL storage implementation.
conformance/posix
posix runs a web server that allows new entries to be POSTed to a tlog-tiles log stored on a posix filesystem.
examples/posix-oneshot
posix-oneshot is a command line tool for adding entries to a local tlog-tiles log stored on a posix filesystem.
ctonly
Package ctonly has support for the CT Tiles API.
internal
hammer
hammer is a tool to load test a Tessera log.
parse
Package parse contains internal methods for parsing data structures quickly, if unsafely.
storage
aws
Package aws contains an AWS-based storage implementation for Tessera.
gcp
Package gcp contains a GCP-based storage implementation for Tessera.
internal
Package storage provides implementations and shared components for tessera storage backends.
mysql
Package mysql contains a MySQL-based storage implementation for Tessera.
