search

package
v0.2.1
Published: Feb 15, 2025 License: AGPL-3.0 Imports: 21 Imported by: 0

Documentation

Overview

Package search implements an interface for searching search engines.

Constants

const DefaultTimeout = time.Second * 10

Default timeout setting.

const DefaultUserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.3"

DefaultUserAgent is the user agent that is used when the UserAgent field in Config is left empty.

Variables

var DefaultEngines = sync.OnceValue(func() []string {
	engines := []string{}
	for name := range defaultEngines {
		engines = append(engines, name)
	}
	return engines
})
var (
	// The engine returned a captcha and the request was left unfulfilled.
	// This may either be due to your instance being temporarily banned or
	// due to changes in the search engine itself which will require
	// changes in the engine's code.
	ErrCaptcha = errors.New("engine returned captcha response")
)

Well-defined errors.

var Supported = sync.OnceValue(func() []string {
	supportedEngines := []string{}
	for name := range engines {
		supportedEngines = append(supportedEngines, name)
	}
	return supportedEngines
})
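
Both DefaultEngines and Supported build their list lazily with sync.OnceValue. The following sketch shows the same pattern in isolation, assuming a hypothetical registry map in place of the package's unexported maps. Since Go map iteration order is unspecified, the sketch sorts a copy before printing; the listings above do not show any sorting.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// registry is a hypothetical stand-in for the package's unexported engine map.
var registry = map[string]bool{"alpha": true, "beta": false, "gamma": true}

// names computes the key list on the first call and returns the cached
// slice on every later call.
var names = sync.OnceValue(func() []string {
	out := make([]string, 0, len(registry))
	for name := range registry {
		out = append(out, name)
	}
	return out
})

// sortedNames returns a sorted copy, leaving the cached slice untouched.
func sortedNames() []string {
	out := append([]string(nil), names()...)
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(sortedNames())
}
```

One consequence of this pattern is that anything added to the map after the first call is not reflected in the cached list.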

Functions

func Add

func Add(name string, isDefault bool, fn Initializer)

Add adds a search engine to the list of supported engines.

If a name is already in use, Add panics.
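
As a rough illustration of the panic-on-duplicate behaviour, here is a self-contained sketch of a registry. The types engine and initializer, the map registered, and the helpers add and tryAdd are all simplified, hypothetical stand-ins, not the package's implementation.

```go
package main

import "fmt"

// engine and initializer are simplified stand-ins for search.Engine and
// search.Initializer.
type engine interface{ Name() string }
type initializer func() (engine, error)

// registered maps engine names to their initializers (hypothetical).
var registered = map[string]initializer{}

// add registers fn under name and panics if the name is already taken.
func add(name string, fn initializer) {
	if _, exists := registered[name]; exists {
		panic(fmt.Sprintf("engine %q is already registered", name))
	}
	registered[name] = fn
}

// tryAdd wraps add and converts a panic into a boolean for demonstration.
func tryAdd(name string, fn initializer) (ok bool) {
	defer func() {
		if recover() != nil {
			ok = false
		}
	}()
	add(name, fn)
	return true
}

func main() {
	noop := func() (engine, error) { return nil, nil }
	fmt.Println(tryAdd("example", noop)) // first registration succeeds
	fmt.Println(tryAdd("example", noop)) // duplicate name panics inside add
}
```

Panicking on duplicates suits registration done from init functions, where a name clash is a programming error rather than a runtime condition.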

func CleanURL

func CleanURL(url string) string

Removes tracking parameters from URLs.
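
The set of parameters that CleanURL removes is not listed here. The sketch below shows one plausible approach using net/url, assuming that utm_* parameters plus fbclid and gclid count as tracking; the function name stripTracking and that parameter list are illustrative assumptions.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// trackingParams lists exact parameter names to drop. The list is an
// assumption made for illustration.
var trackingParams = map[string]bool{"fbclid": true, "gclid": true}

// stripTracking removes utm_* and the listed parameters from rawURL.
// Input that fails to parse is returned unchanged.
func stripTracking(rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		return rawURL
	}
	q := u.Query()
	for key := range q {
		if strings.HasPrefix(key, "utm_") || trackingParams[key] {
			q.Del(key)
		}
	}
	u.RawQuery = q.Encode()
	return u.String()
}

func main() {
	fmt.Println(stripTracking("https://example.com/page?id=7&utm_source=news&fbclid=abc"))
}
```

Note that url.Values.Encode sorts the remaining parameters by key, so this approach can reorder a query string even when nothing is removed.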

func DocumentFromHttpResponse added in v0.2.0

func DocumentFromHttpResponse(res *http.Response) (*goquery.Document, error)

DocumentFromHttpResponse parses an HTTP response into a goquery document. It is the code shared between HtmlGet and HtmlPost.

Types

type Config

type Config struct {
	// Type determines what backend to use for this engine.
	// For example, if you wanted to use the "example" engine for
	// example.com, then you would put "example" into this field.
	//
	// If you leave this blank, then it defaults to Name.
	// An empty Name and Type is an error.
	Type string `yaml:"type,omitempty"`

	// Name of the engine; when retrieving search results, this is the
	// string that would be put in the "sources" field.
	//
	// If left blank, then it defaults to Type.
	// An empty Name and Type is an error.
	Name string `yaml:"name,omitempty"`

	// Specifies the user agent that is used when making requests to the engine.
	//
	// srchd tries to mock a Chrome browser and as such uses a Chrome user
	// agent by default; see [DefaultUserAgent].
	// You should not change this value unless you have a reason to.
	UserAgent string `yaml:"user_agent,omitempty"`

	// Timeout is the total amount of time an engine will wait to retrieve
	// a full HTTP response.
	//
	// If set to 0, then [DefaultTimeout] is used.
	Timeout stringDuration `yaml:"timeout"`

	// Weight determines the order in which results are ranked on srchd's
	// frontend.
	//
	// An engine with a higher weight will have its results placed higher
	// than those of lower weight.
	//
	// Note that results are combined with the weight taken into
	// consideration and have their score recalculated, so if multiple
	// search engines return the same result then it will likely be your
	// top search result.
	//
	// A zero weight is analogous to a weight of 1.0.
	//
	// Note that this field *should not* affect the engines themselves;
	// this field exists here solely for ranking in srchd.
	Weight float64 `yaml:"weight"`

	// Enable HTTP request logging and possibly extra debugging settings in
	// the engine itself.
	//
	// You should always leave this at false unless you are debugging an
	// engine, because it reveals information about searches.
	Debug bool `yaml:"debug"`

	// Configures an HTTP proxy to be used by this engine.
	// Useful if you want to pipe requests elsewhere, such as to another
	// country or through something like Tor.
	//
	// If this value is not set, then it falls back to the HTTP_PROXY
	// environment variable.
	// If this value is set to "-", then no proxy will be used regardless
	// of what HTTP_PROXY is set to.
	HttpProxy string `yaml:"http_proxy"`

	// Enable HTTP/3 using quic-go.
	QUIC bool `yaml:"quic"`

	// Enable zero roundtrip time for a performance boost on subsequent
	// connections.
	// Requires quic to be true.
	//
	// Note that using 0RTT can have implications on the security of your
	// connections as it becomes possible to replay the data you send to
	// the server so generally it is only safe to use it if the requests
	// you are doing are idempotent.
	// For srchd, this is always the case as of writing.
	//
	// For more information, refer to section 8 of RFC 8446:
	// https://datatracker.ietf.org/doc/html/rfc8446#section-8
	QUIC_0RTT bool `yaml:"quic-0rtt"`

	// Extra contains extra settings that have no corresponding field in
	// this struct.
	//
	// The info contained within is generally [Engine] specific, and may or
	// may not be optional.
	// Refer to your [Engine] for possible/necessary configuration values.
	Extra map[string]any `yaml:"-"`

	// Provide an existing HTTP client instead of creating one from the
	// settings; it is recommended that you still create it using
	// NewHttpClient, but if this field is filled then NewHttpClient will
	// return this regardless of the configuration.
	//
	// This field exists primarily for mocking HTTP responses when
	// performing testing.
	HttpClient *HttpClient `yaml:"-"`
}

Engine configuration. Specifies settings that control how the engine behaves.

This struct should not be modified once passed to an engine.

The zero-value is safe to use, and the struct itself may be unmarshaled in YAML configuration files.
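
The field comments describe several fallbacks: Type and Name default to each other, a zero Timeout means DefaultTimeout, and a zero Weight behaves like 1.0. The sketch below gathers those rules into one hypothetical helper, withDefaults, operating on a trimmed local copy of the struct. Where and how the package applies these rules internally is not stated, so treat this as an illustration of the documented behaviour only.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// config mirrors a few fields of search.Config for illustration only.
type config struct {
	Type    string
	Name    string
	Timeout time.Duration
	Weight  float64
}

// defaultTimeout matches the documented DefaultTimeout value.
const defaultTimeout = 10 * time.Second

// withDefaults returns a copy of c with the documented fallbacks applied.
func withDefaults(c config) (config, error) {
	if c.Type == "" && c.Name == "" {
		return c, errors.New("config needs a Type or a Name")
	}
	if c.Type == "" {
		c.Type = c.Name
	}
	if c.Name == "" {
		c.Name = c.Type
	}
	if c.Timeout == 0 {
		c.Timeout = defaultTimeout
	}
	if c.Weight == 0 {
		c.Weight = 1.0 // a zero weight is treated like 1.0
	}
	return c, nil
}

func main() {
	c, err := withDefaults(config{Type: "example"})
	fmt.Println(c.Name, c.Timeout, c.Weight, err)
}
```

The helper takes and returns the struct by value, which fits the note that a Config should not be modified once it has been passed to an engine.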

func (Config) MustNew

func (c Config) MustNew() Engine

MustNew attempts to initialize an Engine from the configuration, but panics if it fails to do so.

func (Config) New

func (c Config) New() (Engine, error)

New initializes the specified engine from the struct values.

func (Config) NewHttpClient

func (c Config) NewHttpClient() *HttpClient

NewHttpClient creates an HttpClient according to the values set in the configuration.

Note that if the HttpClient field is specified in the Config struct, then its value will be returned.
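
The Config.HttpProxy comment describes three cases: an explicit proxy, a fallback to HTTP_PROXY when the field is empty, and "-" to disable proxying entirely. HttpClient.HttpProxy, by contrast, is documented as not reading the environment, so the fallback presumably happens while the client is built from the Config. The sketch below expresses that decision as a hypothetical resolveProxy function; the name and the injected getenv parameter are assumptions made to keep the example testable.

```go
package main

import "fmt"

// resolveProxy returns the proxy URL to use, or "" for no proxy.
// getenv is injected so the function can be exercised without touching
// the real environment; os.Getenv would be passed in real code.
func resolveProxy(configured string, getenv func(string) string) string {
	switch configured {
	case "-":
		return "" // proxying explicitly disabled
	case "":
		return getenv("HTTP_PROXY") // fall back to the environment
	default:
		return configured
	}
}

func main() {
	env := func(key string) string {
		if key == "HTTP_PROXY" {
			return "http://127.0.0.1:8118"
		}
		return ""
	}
	fmt.Println(resolveProxy("", env))
	fmt.Println(resolveProxy("-", env) == "")
	fmt.Println(resolveProxy("http://proxy.example:3128", env))
}
```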

func (*Config) UnmarshalYAML

func (c *Config) UnmarshalYAML(data *yaml.Node) error

UnmarshalYAML parses a YAML configuration.

This is required so we can use extra keys.
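
The general idea of a custom unmarshaler that keeps unknown keys can be sketched with the standard library alone. The example below deliberately swaps in encoding/json for YAML so that it stays self-contained; the package itself works on a yaml.Node, and its actual implementation is not shown here. The config type and parseConfig function are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// config mirrors two fields of search.Config plus the Extra map.
type config struct {
	Type  string         `json:"type"`
	Name  string         `json:"name"`
	Extra map[string]any `json:"-"`
}

// parseConfig decodes the known fields, then collects every other
// top-level key into Extra.
func parseConfig(data []byte) (config, error) {
	var c config
	if err := json.Unmarshal(data, &c); err != nil {
		return c, err
	}
	var all map[string]any
	if err := json.Unmarshal(data, &all); err != nil {
		return c, err
	}
	// Drop the keys that already have a dedicated struct field.
	delete(all, "type")
	delete(all, "name")
	c.Extra = all
	return c, nil
}

func main() {
	c, err := parseConfig([]byte(`{"type": "example", "api_key": "secret"}`))
	fmt.Println(c.Type, c.Extra["api_key"], err)
}
```

Decoding twice is simple but not the cheapest option; it is used here only to keep the sketch short.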

type Engine

type Engine interface {
	// Ping checks to see if the engine is reachable.
	Ping(ctx context.Context) error

	// Search attempts to query the engine and returns a number of results.
	Search(ctx context.Context, query string, page int) ([]Result, error)
}

Engine is an interface that implements the bare essentials for doing web searches.

type HttpClient

type HttpClient struct {
	// Timeout is the maximum amount of time to wait for the request to
	// complete.
	Timeout time.Duration

	// UserAgent holds the value of the User-Agent header of HTTP requests.
	//
	// If UserAgent is empty, then [DefaultUserAgent] is used.
	UserAgent string

	// Debug logs all HTTP requests sent through this HttpClient if it is
	// true before the first request is made.
	Debug bool

	// Send requests using this HTTP proxy.
	//
	// This does not default to the HTTP_PROXY environment variable and
	// must be explicitly set to use a proxy for all HTTP requests.
	HttpProxy string

	// Enable HTTP/3 using quic-go.
	QUIC bool

	// Enable zero roundtrip time for a performance boost on subsequent
	// connections.
	// Requires QUIC to be true.
	//
	// Using 0-RTT can have implications on the security of your connections as it
	// becomes possible to replay the data you send to the server.
	// Generally it is only safe to use it if the requests you are doing are
	// idempotent.
	// For srchd, this is always the case as of writing.
	//
	// For more information, refer to section 8 of RFC 8446:
	// https://datatracker.ietf.org/doc/html/rfc8446#section-8
	QUIC_0RTT bool

	// Specify a cookie jar to use.
	//
	// If left nil, no cookies will be saved.
	CookieJar http.CookieJar
	// contains filtered or unexported fields
}

HttpClient is a helpful wrapper around net/http.Client that does useful things to HTTP requests and responses you would've had to write anyway.

The zero value is ready to use.

func (*HttpClient) Client added in v0.2.0

func (h *HttpClient) Client() *http.Client

Client fetches the net/http.Client for this specific HTTP client.

Do not change fields of the returned Client struct once you have performed a request.

func (*HttpClient) Context

Creates a new context from a parent context.

func (*HttpClient) Do

func (h *HttpClient) Do(req *http.Request) (*http.Response, error)

func (*HttpClient) Get

func (h *HttpClient) Get(ctx context.Context, url string) (*http.Response, error)

Get performs a GET request on a given URL.

If the server responds with a non-200 status code, then the returned response will be nil and err will be of type HttpError.

func (*HttpClient) HtmlGet added in v0.2.0

func (h *HttpClient) HtmlGet(ctx context.Context, url string) (*goquery.Document, error)

Helper function to fetch HTML using a GET request and automatically parse it.

If the server responds with a non-200 status code, then the returned document will be nil and err will be of type HttpError.

func (*HttpClient) HtmlPost added in v0.2.0

func (h *HttpClient) HtmlPost(ctx context.Context, url string, contentType string, body []byte) (*goquery.Document, error)

Helper function to fetch HTML using a POST request and automatically parse it.

If the server responds with a non-200 status code, then the returned document will be nil and err will be of type HttpError.

func (*HttpClient) New

func (h *HttpClient) New(ctx context.Context, method, url string, body []byte, contentType ...string) (*http.Request, error)

New creates a new HTTP request.

func (*HttpClient) Post

func (h *HttpClient) Post(ctx context.Context, url string, contentType string, body []byte) (*http.Response, error)

Post performs a POST request on a given URL.

If the server responds with a non-200 status code, then the returned response will be nil and err will be of type HttpError.

type HttpError

type HttpError struct {
	// Status code of response.
	Status int

	// URL of request.
	URL string

	// Method of request.
	Method string
}

HttpError represents a generic HTTP error.

func (HttpError) Error

func (h HttpError) Error() string

type Initializer

type Initializer func(config Config) (Engine, error)

An Initializer is a function that initializes an engine from a config.

type Result

type Result struct {
	// Title is the title of the webpage for this result.
	Title string `json:"title,omitempty"`

	// Description is a small snippet of text from the webpage for this
	// result, usually containing a portion or all of the query.
	Description string `json:"description,omitempty"`

	// Link is the URL of this result.
	Link string `json:"link,omitempty"`

	// Sources holds all engine names that had this result.
	//
	// Engines must only populate this with their name.
	// Results are merged and this field will be populated based upon what
	// engines return a result similar to this one.
	Sources []string `json:"sources,omitempty"`

	// Score holds the score for this result.
	//
	// Engines should not fill this value.
	Score float64 `json:"score,omitempty"`
}

Result represents a single search result from an Engine.
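
The Sources and Score comments, together with the Weight comment on Config, imply that results from several engines are merged and rescored. The exact formula is not given in this documentation. The sketch below assumes the simplest possible scheme, in which each engine adds its weight to the score of every link it returned; merge and the local result type are hypothetical and are not srchd's algorithm.

```go
package main

import (
	"fmt"
	"sort"
)

// result is a trimmed local copy of search.Result.
type result struct {
	Title   string
	Link    string
	Sources []string
	Score   float64
}

// merge combines per-engine results keyed by Link. Each engine adds its
// weight to the score of every result it returned; this formula is an
// assumption made for illustration.
func merge(perEngine map[string][]result, weights map[string]float64) []result {
	byLink := map[string]*result{}
	for engine, results := range perEngine {
		w := weights[engine]
		if w == 0 {
			w = 1.0 // a zero weight is treated like 1.0
		}
		for _, r := range results {
			m, ok := byLink[r.Link]
			if !ok {
				m = &result{Title: r.Title, Link: r.Link}
				byLink[r.Link] = m
			}
			m.Sources = append(m.Sources, engine)
			m.Score += w
		}
	}
	merged := make([]result, 0, len(byLink))
	for _, m := range byLink {
		sort.Strings(m.Sources)
		merged = append(merged, *m)
	}
	// Highest score first; ties are broken by link for stable output.
	sort.Slice(merged, func(i, j int) bool {
		if merged[i].Score != merged[j].Score {
			return merged[i].Score > merged[j].Score
		}
		return merged[i].Link < merged[j].Link
	})
	return merged
}

func main() {
	out := merge(map[string][]result{
		"a": {{Title: "Go", Link: "https://go.dev"}, {Title: "Only A", Link: "https://a.example"}},
		"b": {{Title: "Go", Link: "https://go.dev"}},
	}, map[string]float64{"a": 1.5})
	for _, r := range out {
		fmt.Println(r.Link, r.Score, r.Sources)
	}
}
```

Under this assumed scheme, a link returned by both engines outranks one returned by a single engine, which matches the documented expectation that shared results tend to rise to the top.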

func (*Result) FancyURL

func (r *Result) FancyURL() string

FancyURL strips the leading http:// or https:// from the link.
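
A minimal sketch of that behaviour, written as a standalone function rather than a method on Result; fancyURL is a hypothetical name, and leaving other schemes untouched is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// fancyURL drops a leading "https://" or "http://" and returns the rest
// of the link unchanged. Links with any other prefix are returned as-is.
func fancyURL(link string) string {
	for _, scheme := range []string{"https://", "http://"} {
		if strings.HasPrefix(link, scheme) {
			return strings.TrimPrefix(link, scheme)
		}
	}
	return link
}

func main() {
	fmt.Println(fancyURL("https://example.com/docs"))
}
```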
