Documentation
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CleanString ¶
Types ¶
type Basic ¶
type Basic struct {
// contains filtered or unexported fields
}
Basic is the simplest Extractor implementation.
func NewBasicExtractor ¶
NewBasicExtractor creates a new Basic instance.
func (Basic) Extract ¶
func (b Basic) Extract(_ context.Context, ri *storageProvider.ResourceInfo) (Document, error)
Extract literally just rearranges the inputs and processes them into a Document.
type Document ¶
type Document struct {
	Title    string
	Name     string
	Content  string
	Size     uint64
	Mtime    string
	MimeType string
	Tags     []string
	Audio    *libregraph.Audio          `json:"audio,omitempty"`
	Image    *libregraph.Image          `json:"image,omitempty"`
	Location *libregraph.GeoCoordinates `json:"location,omitempty"`
	Photo    *libregraph.Photo          `json:"photo,omitempty"`
}
Document wraps all resource meta fields. It is used as the result of a content extraction.
type Extractor ¶
type Extractor interface {
Extract(ctx context.Context, ri *provider.ResourceInfo) (Document, error)
}
Extractor is responsible for extracting content and meta information from documents.
type Retriever ¶
type Retriever interface {
Retrieve(ctx context.Context, rID *provider.ResourceId) (io.ReadCloser, error)
}
Retriever is the interface that wraps the basic Retrieve method. 🐕 It requests and then returns a resource from the underlying storage.
type Tika ¶
type Tika struct {
	*Basic
	Retriever
	ContentExtractionSizeLimit uint64
	CleanStopWords             bool
	// contains filtered or unexported fields
}
Tika extracts content from a resource. It uses Apache Tika to retrieve all the data.
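This page does not say how `ContentExtractionSizeLimit` is applied. One plausible reading, sketched below purely as an assumption, is that resources larger than the limit keep their metadata but skip content extraction. `document`, `extractWithLimit`, and the `contentFn` callback (standing in for the request to Apache Tika) are hypothetical names for this example only.

```go
package main

import "fmt"

// document holds the few fields this sketch needs; it is not the real Document.
type document struct {
	Name    string
	Size    uint64
	Content string
}

// extractWithLimit fills doc.Content only when the resource fits within limit.
// contentFn is a hypothetical callback standing in for the Tika request.
// Skipping oversized resources is an assumption about the limit's meaning.
func extractWithLimit(doc document, limit uint64, contentFn func() (string, error)) (document, error) {
	if doc.Size > limit {
		// Too large: return the metadata-only document unchanged.
		return doc, nil
	}
	content, err := contentFn()
	if err != nil {
		return doc, err
	}
	doc.Content = content
	return doc, nil
}

func main() {
	fetch := func() (string, error) { return "extracted text", nil }

	small, _ := extractWithLimit(document{Name: "small.txt", Size: 10}, 100, fetch)
	large, _ := extractWithLimit(document{Name: "large.bin", Size: 1000}, 100, fetch)

	fmt.Printf("%q %q\n", small.Content, large.Content)
}
```

Checking the size before calling the callback means an oversized resource never triggers a download or a Tika request in this sketch.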
func NewTikaExtractor ¶
func NewTikaExtractor(gatewaySelector pool.Selectable[gateway.GatewayAPIClient], logger log.Logger, cfg *config.Config) (*Tika, error)
NewTikaExtractor creates a new Tika instance.