tensorfs

package
v0.1.0 Latest
Warning

This package is not in the latest version of its module.

Published: Feb 15, 2025 License: BSD-3-Clause Imports: 19 Imported by: 22

README

tensorfs: a virtual filesystem for tensor data

tensorfs is a virtual file system that implements the Go fs interface, and can be accessed using fs-general tools, including the cogent core filetree and the goal shell.

Values are represented using the [tensor] package universal data type: the tensor.Tensor, which can represent everything from a single scalar value up to n-dimensional collections of patterns, in a range of data types.

A given Node in the file system is either:

  • A Value, with a tensor encoding its value. These are terminal "leaves" in the hierarchical data tree, equivalent to "files" in a standard filesystem.
  • A Directory, with an ordered map of other Node nodes under it.

Each Node has a name which must be unique within the directory. The nodes in a directory are processed in the order of its ordered map list, which initially reflects the order added, and can be re-ordered as needed. An alphabetical sort is also available with the Alpha versions of methods, and is the default sort for standard FS operations.
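The ordering behavior described above can be modeled with a small toy type. This is only a sketch of the idea (unique names, insertion order by default, alphabetical order on request); the orderedDir type and its methods are hypothetical and are not the data structure tensorfs actually uses.

```go
package main

import (
	"fmt"
	"sort"
)

// orderedDir is a toy stand-in for a directory: names are unique,
// and insertion order is remembered alongside the lookup map.
type orderedDir struct {
	order []string
	items map[string]int
}

func newOrderedDir() *orderedDir {
	return &orderedDir{items: map[string]int{}}
}

// add inserts a named value, rejecting duplicate names.
func (d *orderedDir) add(name string, val int) error {
	if _, ok := d.items[name]; ok {
		return fmt.Errorf("name %q already exists", name)
	}
	d.order = append(d.order, name)
	d.items[name] = val
	return nil
}

// names returns a copy of the names in the order they were added.
func (d *orderedDir) names() []string {
	return append([]string(nil), d.order...)
}

// namesAlpha returns the names sorted alphabetically,
// leaving the stored order untouched.
func (d *orderedDir) namesAlpha() []string {
	out := d.names()
	sort.Strings(out)
	return out
}

func main() {
	d := newOrderedDir()
	for i, n := range []string{"trial", "epoch", "run"} {
		if err := d.add(n, i); err != nil {
			fmt.Println(err)
		}
	}
	fmt.Println(d.names())      // [trial epoch run]
	fmt.Println(d.namesAlpha()) // [epoch run trial]
}
```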

The hierarchical structure of a filesystem naturally supports several use cases, such as logging at multiple time scales, with lower-level data aggregated into upper levels, or hierarchical splits for a pivot-table effect.

Usage

There are two main APIs, one for direct usage within Go, and another that is used by the goal framework for interactive shell-based access, which always operates relative to a current working directory.

Go API

The primary Go access function is the generic Value:

tsr := tensorfs.Value[float64](dir, "filename", 5, 5)

This returns a tensor.Values for the node "filename" in the directory Node dir with the tensor shape size of 5x5, and float64 values.

If the tensor was previously created, then it is returned, and otherwise it is created. This provides a robust single-function API for access and creation, and it doesn't return any errors, so the return value can be used directly, in inline expressions etc.

For efficiency, there are no checks on the existing value relative to the arguments passed, so if you end up using the same name for two different things, that will cause problems that will hopefully become evident. If you want to ensure that the size is correct, you should use an explicit tensor.SetShapeSizes call, which is still quite efficient if the size is the same. You can also have an initial call to Value that has no size args, and then set the size later -- that works fine.
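The consequence of skipping checks on an existing value can be seen in a toy model of the get-or-create pattern. The store type and its value method below are hypothetical, built on plain slices instead of tensors; they only mimic the documented behavior and say nothing about how tensorfs is implemented.

```go
package main

import "fmt"

// store is a toy directory mapping names to float64 buffers.
type store map[string][]float64

// value returns the buffer for name, creating it with n elements only if
// it is absent. An existing buffer is returned without comparing its
// length to n, mirroring the "no checks" behavior described above.
func (s store) value(name string, n int) []float64 {
	if v, ok := s[name]; ok {
		return v
	}
	v := make([]float64, n)
	s[name] = v
	return v
}

func main() {
	s := store{}
	a := s.value("act", 25)
	b := s.value("act", 10)     // recycled: the size argument is ignored
	fmt.Println(len(a), len(b)) // 25 25

	// An explicit size check, playing the role of a SetShapeSizes call.
	if len(b) != 10 {
		s["act"] = make([]float64, 10)
	}
	fmt.Println(len(s.value("act", 10))) // 10
}
```

The second call silently returns the 25-element buffer, which is the kind of name collision the paragraph above warns about.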

There are also functions for high-frequency types, defined on the Node: Float64, Float32, Int, and StringValue (String is taken by fmt.Stringer, and StringValue is the name used in tensor), e.g.:

tsr := dir.Float64("filename", 5, 5)

There are also a few other variants of the Value functionality:

  • Scalar calls Value with a size of 1.
  • Values makes multiple tensor values of the same shape, with a final variadic list of names.
  • ValueType takes a reflect.Kind arg for the data type, which can then be a variable.
  • SetTensor sets a tensor to a node of given name, creating the node if needed. This is also available as the Set method on a directory node.

DirTable returns a table.Table with all the tensors under a given directory node, which can then be used for making plots or doing other forms of data analysis. This works best when each tensor has the same outer-most row dimension. The table is persistent and very efficient, using direct pointers to the underlying tensor values.

Directories

Directories are Node elements that have a nodes value (ordered map of named nodes) instead of a tensor value.

The primary way to make / access a subdirectory is the Dir method:

subdir := dir.Dir("subdir")

If the subdirectory doesn't exist yet, it will be made, and otherwise it is returned. Any errors are logged and nil is returned, which will likely cause a panic in later code unless you expect the call to fail and check for nil.

There are parallel Node and Value access methods for directory nodes, with the Value ones being:

  • tsr := dir.Value("name") returns the tensor directly, and panics if no node of that name exists.
  • tsrs, err := dir.Values("name1", "name2") returns a slice of tensor values within the directory by name. A plain .Values() returns all values.
  • tsrs := dir.ValuesFunc(<filter func>) walks down directories (unless filtered) and returns a flat list of all tensors found. Traversal follows "directory order", i.e., the order in which nodes were added.
  • tsrs := dir.ValuesAlphaFunc(<filter func>) is like ValuesFunc but traverses in alpha order at each node.
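
The filtered traversal that ValuesFunc performs can be sketched with a toy tree. The node type and leavesFunc function here are hypothetical stand-ins that return leaf names instead of tensors; the sketch assumes, as the API documentation states, that a nil filter accepts everything and that rejecting a directory prunes its subtree.

```go
package main

import "fmt"

// node is a toy tree element: a directory if kids is non-nil, otherwise a leaf.
type node struct {
	name string
	kids []*node // children in the order added
}

func (n *node) isDir() bool { return n.kids != nil }

// leavesFunc returns the names of all leaves under dir in directory order.
// A nil filter accepts everything; returning false for a directory prunes
// its whole subtree, and returning false for a leaf skips that leaf.
func leavesFunc(dir *node, filter func(n *node) bool) []string {
	var out []string
	for _, k := range dir.kids {
		if filter != nil && !filter(k) {
			continue
		}
		if k.isDir() {
			out = append(out, leavesFunc(k, filter)...)
		} else {
			out = append(out, k.name)
		}
	}
	return out
}

func main() {
	root := &node{name: "root", kids: []*node{
		{name: "loss"},
		{name: "debug", kids: []*node{{name: "grad"}}},
		{name: "stats", kids: []*node{{name: "mean"}, {name: "var"}}},
	}}
	fmt.Println(leavesFunc(root, nil)) // [loss grad mean var]

	// Prune the "debug" directory and everything under it.
	noDebug := func(n *node) bool { return !(n.isDir() && n.name == "debug") }
	fmt.Println(leavesFunc(root, noDebug)) // [loss mean var]
}
```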

Existing items and unique names

As in a real filesystem, names must be unique within each directory, which creates issues for how to manage conflicts between existing and new items. To make the overall framework maximally robust and eliminate the need for a controlled initialization-then-access ordering, we generally adopt the "Recycle" logic:

  • Return an existing item of the same name, or make a new one.

In addition, if you really need to know if there is an existing item, you can use the Node method to check for yourself -- it will return nil if no node of that name exists. Furthermore, the global NewDir function returns an fs.ErrExist error for existing items (e.g., use errors.Is(err, fs.ErrExist)), as used in various os package functions.

goal Command API

The following shell command style functions always operate relative to the global CurDir current directory and CurRoot root, and goal in math mode exposes these methods directly. Goal always operates on tensor-valued variables.

  • Chdir("subdir") change current directory to subdir.
  • Mkdir("subdir") make a new directory.
  • List() print a list of nodes.
  • tsr := Get("mydata") get tensor value at "mydata" node.
  • Set("mydata", tsr) set tensor to "mydata" node.

Documentation

Index

Constants

const (
	// Preserve is used for Overwrite flag, indicating to not overwrite and preserve existing.
	Preserve = false

	// Overwrite is used for Overwrite flag, indicating to overwrite existing.
	Overwrite = true
)
const (
	Short = false
	Long  = true

	DirOnly   = false
	Recursive = true
)
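
These constants appear intended as readable names for boolean arguments, such as the long and recursive parameters of the List method or the overwrite parameter of Copy. The sketch below redeclares two of the pairs locally and uses a hypothetical describe helper, so it runs without tensorfs.

```go
package main

import "fmt"

// Local copies of the named boolean values, so the sketch is self-contained.
const (
	Short = false
	Long  = true

	DirOnly   = false
	Recursive = true
)

// describe is a hypothetical helper that reports what a listing call
// with the given flags would cover.
func describe(long, recursive bool) string {
	format, scope := "names only", "this directory"
	if long {
		format = "detailed"
	}
	if recursive {
		scope = "all subdirectories"
	}
	return format + ", " + scope
}

func main() {
	fmt.Println(describe(true, false))      // bare literals are hard to read at the call site
	fmt.Println(describe(Long, DirOnly))    // same call, self-describing
	fmt.Println(describe(Short, Recursive)) // names only, all subdirectories
}
```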

Functions

func Chdir

func Chdir(dir string) error

Chdir changes the current working tensorfs directory to the named directory.

func DirFromTable

func DirFromTable(dir *Node, dt *table.Table)

DirFromTable sets tensor values under given directory node to the columns of the given table.Table. Also sets the DirTable to this table.

func DirTable

func DirTable(dir *Node, fun func(node *Node) bool) *table.Table

DirTable returns a table.Table with all of the tensor values under the given directory, with one column per tensor Value in the directory and any subdirectories, as selected by the given filter function. This is a convenient mechanism for creating a plot of all the data in a given directory. If a table was previously constructed, it is returned from the directory's DirTable field, where it is stored for later use. The row count is updated to the current max row. Set DirTable = nil to regenerate.

func Get

func Get(name string) tensor.Tensor

Get returns the tensor value at given path relative to the current working directory. This is the direct pointer to the node, so changes to it will change the node. Clone the tensor to make a new copy disconnected from the original.
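The difference between working on the returned value and working on a clone can be illustrated with plain slices, which share their backing storage in the same way. The get and cloneOf helpers are hypothetical and only model the aliasing behavior described above.

```go
package main

import "fmt"

// get returns the stored slice itself, so callers share its backing array.
func get(store map[string][]float64, name string) []float64 {
	return store[name]
}

// cloneOf returns an independent copy of the stored slice.
func cloneOf(store map[string][]float64, name string) []float64 {
	src := store[name]
	dst := make([]float64, len(src))
	copy(dst, src)
	return dst
}

func main() {
	store := map[string][]float64{"mydata": {1, 2, 3}}

	shared := get(store, "mydata")
	shared[0] = 42 // writes through to the stored data

	private := cloneOf(store, "mydata")
	private[1] = -1 // affects only the copy

	fmt.Println(store["mydata"]) // [42 2 3]
	fmt.Println(private)         // [42 -1 3]
}
```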

func List

func List(opts ...string) error

List lists files using arguments (options and path) from the current directory.

func NewValues

func NewValues[T tensor.DataTypes](dir *Node, shape []int, names ...string)

NewValues makes new tensor Node value(s) (as a tensor.Tensor) of given data type and shape sizes, in given directory. Any existing nodes with the same names are recycled without checking or updating the data type or sizes. See the Value documentation for more info.

func Record

func Record(tsr tensor.Tensor, name string)

Record saves given tensor to current directory with given name.

func Scalar

func Scalar[T tensor.DataTypes](dir *Node, name string) tensor.Values

Scalar returns a scalar Node value (as a tensor.Tensor) of given data type, in given directory and name. If it already exists, it is returned without checking against args, else a new one is made. See the Value documentation for more info.

func Set

func Set(name string, tsr tensor.Tensor) error

Set sets tensor to given name or path relative to the current working directory. If the node already exists, its previous tensor is updated to the given one; if it doesn't, then a new node is created.

func Value

func Value[T tensor.DataTypes](dir *Node, name string, sizes ...int) tensor.Values

Value creates / returns a Node with given name as a tensor.Tensor of given data type and shape sizes, in given directory Node. If it already exists, it is returned as-is (no checking against the type or sizes provided, for efficiency -- if there is doubt, check!), otherwise a new tensor is created. It is fine to not pass any sizes and use `SetShapeSizes` method later to set the size.

func ValueType

func ValueType(dir *Node, name string, typ reflect.Kind, sizes ...int) tensor.Values

ValueType creates / returns a Node with given name as a tensor.Tensor of given data type specified as a reflect.Kind, with shape sizes, in given directory Node. Supported types are string, bool (for [Bool]), float32, float64, int, int32, and byte. If it already exists, it is returned as-is (no checking against the type or sizes provided, for efficiency -- if there is doubt, check!), otherwise a new tensor is created. It is fine to not pass any sizes and use `SetShapeSizes` method later to set the size.

Types

type DirFile

type DirFile struct {
	File
	// contains filtered or unexported fields
}

DirFile represents a directory data item for reading, as fs.ReadDirFile.

func (*DirFile) Close

func (f *DirFile) Close() error

func (*DirFile) ReadDir

func (f *DirFile) ReadDir(n int) ([]fs.DirEntry, error)

type File

type File struct {
	bytes.Reader
	Node *Node
	// contains filtered or unexported fields
}

File represents a data item for reading, as an fs.File. All io functionality is handled by bytes.Reader.

func (*File) Close

func (f *File) Close() error

func (*File) Stat

func (f *File) Stat() (fs.FileInfo, error)

type Node

type Node struct {
	// Parent is the parent data directory.
	Parent *Node

	// Tensor is the tensor value for a file or leaf Node in the FS,
	// represented using the universal [tensor] data type of
	// [tensor.Tensor], which can represent anything from a scalar
	// to n-dimensional data, in a range of data types.
	Tensor tensor.Tensor

	// DirTable is a summary [table.Table] with columns comprised of Value
	// nodes in the directory, which can be used for plotting or other operations.
	DirTable *table.Table
	// contains filtered or unexported fields
}

Node is the element type for the filesystem, which can represent either a tensor Value as a "file" equivalent, or a "directory" containing other Nodes. The tensor.Tensor can represent everything from a single scalar value up to n-dimensional collections of patterns, in a range of data types. Directories have an ordered map of nodes.

var (
	// CurDir is the current working directory.
	CurDir *Node

	// CurRoot is the current root tensorfs system.
	// A default root tensorfs is created at startup.
	CurRoot *Node
)

func Mkdir

func Mkdir(dir string) *Node

Mkdir creates a new directory with the specified name in the current directory. It returns an existing directory of the same name without error.

func NewDir

func NewDir(name string, parent ...*Node) (*Node, error)

NewDir returns a new tensorfs directory with the given name. If parent != nil and a directory, this dir is added to it. If the parent already has a node of that name, it is returned, with an fs.ErrExist error. If the name is empty, then it is set to "root", the root directory. Note that "/" is not allowed for the root directory in Go fs. If no parent (i.e., a new root) and CurRoot is nil, then it is set to this.

func SetTensor

func SetTensor(dir *Node, tsr tensor.Tensor, name string) *Node

SetTensor creates / recycles a node and sets to given existing tensor with given name.

func (*Node) Add

func (dir *Node) Add(it *Node) error

Add adds a node to this directory data node. The only errors are if this node is not a directory, or the name already exists, in which case an fs.ErrExist is returned. Names must be unique within a directory.

func (*Node) Bytes

func (nd *Node) Bytes() []byte

Bytes returns the byte-wise representation of the data Value. This is the actual underlying data, so make a copy if it can be unintentionally modified or retained more than for immediate use.

func (*Node) CalcAll

func (d *Node) CalcAll() error

CalcAll calls the function set by [Node.SetCalcFunc] for all items in this directory and all of its subdirectories. Calls Calc on items from ValuesFunc(nil).

func (*Node) Clone

func (nd *Node) Clone() *Node

Clone returns a copy of this node, recursively cloning directory nodes if it is a directory.

func (*Node) Copy

func (dir *Node) Copy(overwrite bool, to string, from ...string) error

Copy copies node(s) from given paths to given path or directory. If there are multiple from nodes, then to must be a directory. Copy must be called on a directory node.

func (*Node) CopyFromValue

func (d *Node) CopyFromValue(frd *Node)

CopyFromValue copies value from given source node, cloning it.

func (*Node) Dir

func (dir *Node) Dir(name string) *Node

Dir creates a new directory under given dir with the specified name if it doesn't already exist, otherwise returns the existing one. Path / slash separators can be used to make a path of multiple directories. It logs an error and returns nil if this dir node is not a directory.

func (*Node) DirAtPath

func (dir *Node) DirAtPath(dirPath string) (*Node, error)

DirAtPath returns directory at given relative path from this starting dir.

func (*Node) Float32

func (dir *Node) Float32(name string, sizes ...int) *tensor.Float32

Float32 creates / returns a Node with given name as a tensor.Float32 for given shape sizes, in given directory Node. See [Values] function for more info.

func (*Node) Float64

func (dir *Node) Float64(name string, sizes ...int) *tensor.Float64

Float64 creates / returns a Node with given name as a tensor.Float64 for given shape sizes, in given directory Node. See [Values] function for more info.

func (*Node) Info

func (nd *Node) Info() (fs.FileInfo, error)

func (*Node) Int

func (dir *Node) Int(name string, sizes ...int) *tensor.Int

Int creates / returns a Node with given name as a tensor.Int for given shape sizes, in given directory Node. See [Values] function for more info.

func (*Node) IsDir

func (nd *Node) IsDir() bool

func (*Node) KnownFileInfo

func (nd *Node) KnownFileInfo() fileinfo.Known

func (*Node) List

func (dir *Node) List(long, recursive bool) string

List returns a listing of nodes in the given directory.

  • long = include detailed information about each node, vs just the name.
  • recursive = descend into subdirectories.

func (*Node) ListLong

func (dir *Node) ListLong(recursive bool, ident int) string

ListLong returns a detailed listing of given directory.

func (*Node) ListShort

func (dir *Node) ListShort(recursive bool, ident int) string

ListShort returns a name-only listing of given directory.

func (*Node) ModTime

func (nd *Node) ModTime() time.Time

func (*Node) Mode

func (nd *Node) Mode() fs.FileMode

func (*Node) Name

func (nd *Node) Name() string

func (*Node) Node

func (dir *Node) Node(name string) *Node

Node returns a Node in given directory by name. This is for fast access and direct usage of known nodes, and it will panic if this node is not a directory. Returns nil if no node of given name exists.

func (*Node) NodeAtPath

func (dir *Node) NodeAtPath(name string) (*Node, error)

NodeAtPath returns node at given relative path from this starting dir.

func (*Node) Nodes

func (dir *Node) Nodes(names ...string) ([]*Node, error)

Nodes returns a slice of Nodes in given directory by names variadic list. If list is empty, then all nodes in the directory are returned. The returned error reports any nodes not found, or if this node is not a directory.

func (*Node) NodesAlphaFunc

func (dir *Node) NodesAlphaFunc(fun func(nd *Node) bool) []*Node

NodesAlphaFunc returns leaf Nodes under given directory, filtered by given function, recursively descending into directories to return a flat list of the entire subtree, with nodes at each directory level traversed in alphabetical order. The function can filter out directories to prune the tree. If func is nil, all leaf Nodes are returned.

func (*Node) NodesFunc

func (dir *Node) NodesFunc(fun func(nd *Node) bool) []*Node

NodesFunc returns leaf Nodes under given directory, filtered by given function, recursively descending into directories to return a flat list of the entire subtree, in directory order (e.g., order added). The function can filter out directories to prune the tree. If func is nil, all leaf Nodes are returned.

func (*Node) Open

func (nd *Node) Open(name string) (fs.File, error)

Open opens the given node at given path within this tensorfs filesystem.

func (*Node) Path

func (dir *Node) Path() string

Path returns the full path to this data node.

func (*Node) ReadDir

func (nd *Node) ReadDir(dir string) ([]fs.DirEntry, error)

ReadDir returns the contents of the given directory within this filesystem. Use "." (or "") to refer to the current directory.

func (*Node) ReadFile

func (nd *Node) ReadFile(name string) ([]byte, error)

ReadFile reads the named file and returns its contents. A successful call returns a nil error, not io.EOF. (Because ReadFile reads the whole file, the expected EOF from the final Read is not treated as an error to be reported.)

The caller is permitted to modify the returned byte slice. This method should return a copy of the underlying data.

func (*Node) Set

func (dir *Node) Set(name string, tsr tensor.Tensor) *Node

Set creates / returns a Node with given name setting value to given Tensor, in given directory Node. Calls SetTensor.

func (*Node) SetMetaItems

func (d *Node) SetMetaItems(key string, value any, names ...string) error

SetMetaItems sets given metadata for Value items in given directory with given names. Returns error for any items not found.

func (*Node) Size

func (nd *Node) Size() int64

Size returns the size of known data Values, or it uses the Sizer interface, otherwise returns 0.

func (*Node) Stat

func (nd *Node) Stat(name string) (fs.FileInfo, error)

Stat returns a FileInfo describing the file. If there is an error, it should be of type *PathError.

func (*Node) String

func (nd *Node) String() string

func (*Node) StringValue

func (dir *Node) StringValue(name string, sizes ...int) *tensor.String

StringValue creates / returns a Node with given name as a tensor.String for given shape sizes, in given directory Node. See [Values] function for more info.

func (*Node) Sub

func (nd *Node) Sub(dir string) (fs.FS, error)

Sub returns a data FS corresponding to the subtree rooted at dir.

func (*Node) Sys

func (nd *Node) Sys() any

Sys returns the Dir or Value.

func (*Node) Type

func (nd *Node) Type() fs.FileMode

func (*Node) Value

func (dir *Node) Value(name string) tensor.Tensor

Value returns the tensor.Tensor value for given node within this directory. This will panic if node is not found, and will return nil if it is not a Value (i.e., it is a directory).

func (*Node) Values

func (dir *Node) Values(names ...string) ([]tensor.Tensor, error)

Values returns a slice of tensor values in the given directory, by names variadic list. If list is empty, then all value nodes in the directory are returned. The returned error reports any nodes not found, or if this node is not a directory.

func (*Node) ValuesAlphaFunc

func (dir *Node) ValuesAlphaFunc(fun func(nd *Node) bool) []tensor.Tensor

ValuesAlphaFunc returns all Value nodes (tensors) in given directory, recursively descending into directories to return a flat list of the entire subtree, filtered by given function, with nodes at each directory level traversed in alphabetical order. The function can filter out directories to prune the tree. If func is nil, all Values are returned.

func (*Node) ValuesFunc

func (dir *Node) ValuesFunc(fun func(nd *Node) bool) []tensor.Tensor

ValuesFunc returns all tensor Values under given directory, filtered by given function, in directory order (e.g., order added), recursively descending into directories to return a flat list of the entire subtree. The function can filter out directories to prune the tree, e.g., using `IsDir` method. If func is nil, all Value nodes are returned.

type Nodes

type Nodes = keylist.List[string, *Node]

Nodes is a map of directory entry names to Nodes. It retains the order that nodes were added in, which is the natural order nodes are processed in.
