Documentation
Overview
Package search implements a common interface for querying search engines.
Index
- Constants
- Variables
- func Add(name string, isDefault bool, fn Initializer)
- func CleanURL(url string) string
- func DocumentFromHttpResponse(res *http.Response) (*goquery.Document, error)
- type Config
- type Engine
- type HttpClient
- func (h *HttpClient) Client() *http.Client
- func (h *HttpClient) Context(ctx context.Context) (context.Context, context.CancelFunc)
- func (h *HttpClient) Do(req *http.Request) (*http.Response, error)
- func (h *HttpClient) Get(ctx context.Context, url string) (*http.Response, error)
- func (h *HttpClient) HtmlGet(ctx context.Context, url string) (*goquery.Document, error)
- func (h *HttpClient) HtmlPost(ctx context.Context, url string, contentType string, body []byte) (*goquery.Document, error)
- func (h *HttpClient) New(ctx context.Context, method, url string, body []byte, contentType ...string) (*http.Request, error)
- func (h *HttpClient) Post(ctx context.Context, url string, contentType string, body []byte) (*http.Response, error)
- type HttpError
- type Initializer
- type Result
Constants
const DefaultTimeout = time.Second * 10
DefaultTimeout is the timeout that is used when the Timeout field in Config is left at 0.
const DefaultUserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.3"
DefaultUserAgent is the user agent that is used when the UserAgent field in Config is left empty.
Variables
var DefaultEngines = sync.OnceValue(func() []string {
	engines := []string{}
	for name := range defaultEngines {
		engines = append(engines, name)
	}
	return engines
})
var (
	// The engine returned a captcha and the request was left unfulfilled.
	// This may be due either to your instance being temporarily banned or
	// to changes in the search engine itself, which will require
	// changes in the engine's code.
	ErrCaptcha = errors.New("engine returned captcha response")
)
Well-defined errors that engines may return.
Functions
func Add
func Add(name string, isDefault bool, fn Initializer)
Add adds a search engine to the list of supported engines.
If a name is already in use, Add panics.
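A minimal sketch of the registry that Add and DefaultEngines imply, assuming Add stores engines in a map and records default engines in the defaultEngines map that DefaultEngines ranges over. Config, Engine and the Initializer signature here are hypothetical, trimmed-down stand-ins, since the real Initializer signature is not shown in this documentation. Unlike the original, this sketch sorts the default names so the output is deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Hypothetical stand-ins for the package's types.
type Config struct{ Name string }

type Engine interface{} // trimmed down; the real interface has Ping and Search

type Initializer func(cfg Config) (Engine, error)

var (
	engines        = map[string]Initializer{}
	defaultEngines = map[string]struct{}{}
)

// Add registers fn under name and panics if the name is already in use.
func Add(name string, isDefault bool, fn Initializer) {
	if _, ok := engines[name]; ok {
		panic(fmt.Sprintf("search: engine %q already registered", name))
	}
	engines[name] = fn
	if isDefault {
		defaultEngines[name] = struct{}{}
	}
}

// DefaultEngines is computed once, on its first call.
var DefaultEngines = sync.OnceValue(func() []string {
	names := []string{}
	for name := range defaultEngines {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order; the original leaves map order
	return names
})

// panics reports whether f panics.
func panics(f func()) (did bool) {
	defer func() { did = recover() != nil }()
	f()
	return false
}

func main() {
	noop := func(Config) (Engine, error) { return nil, nil }
	Add("example", true, noop)
	Add("other", false, noop)
	fmt.Println(DefaultEngines())
	fmt.Println(panics(func() { Add("example", false, noop) }))
}
```

One consequence of wrapping DefaultEngines in sync.OnceValue is that the list is built on the first call and then cached, so engines registered after that call (for example from a late init function) would not appear in it.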
Types
type Config
type Config struct {
	// Type determines what backend to use for this engine.
	// For example, if you wanted to use the "example" engine for
	// example.com, then you would put "example" into this field.
	//
	// If you leave this blank, then it defaults to Name.
	// An empty Name and Type is an error.
	Type string `yaml:"type,omitempty"`

	// Name of the engine; when retrieving search results, this is the
	// string that would be put in the "sources" field.
	//
	// If left blank, then it defaults to Type.
	// An empty Name and Type is an error.
	Name string `yaml:"name,omitempty"`

	// Specifies the user agent that is used when making requests to the engine.
	//
	// srchd tries to mock a Chrome browser and as such uses a Chrome user
	// agent by default; see [DefaultUserAgent].
	// You should not change this value unless you have a reason to.
	UserAgent string `yaml:"user_agent,omitempty"`

	// Timeout is the total amount of time an engine will wait to retrieve
	// a full HTTP response.
	//
	// If set to 0, then [DefaultTimeout] is used.
	Timeout stringDuration `yaml:"timeout"`

	// Weight determines the order in which results are ranked on srchd's
	// frontend.
	//
	// An engine with a higher weight will have its results placed higher
	// than those of lower weight.
	//
	// Note that results are combined with the weight taken into
	// consideration and have their score recalculated, so if multiple
	// search engines return the same result then it will likely be your
	// top search result.
	//
	// A zero weight is analogous to a weight of 1.0.
	//
	// Note that this field *should not* affect the engines themselves;
	// this field exists here solely for ranking in srchd.
	Weight float64 `yaml:"weight"`

	// Enable HTTP request logging and possibly extra debugging settings in
	// the engine itself.
	//
	// You should always leave this at false unless you are debugging an
	// engine, because it reveals information about searches.
	Debug bool `yaml:"debug"`

	// Configures an HTTP proxy to be used by this engine.
	// Useful if you want to pipe requests elsewhere, such as to another
	// country or through something like Tor.
	//
	// If this value is not set, then it falls back to the HTTP_PROXY
	// environment variable.
	// If this value is set to "-", then no proxy will be used regardless
	// of what HTTP_PROXY is set to.
	HttpProxy string `yaml:"http_proxy"`

	// Enable HTTP/3 using quic-go.
	QUIC bool `yaml:"quic"`

	// Enable zero round-trip time (0-RTT) for a performance boost on
	// subsequent connections.
	// Requires QUIC to be true.
	//
	// Note that using 0-RTT can have implications for the security of your
	// connections, as it becomes possible to replay the data you send to
	// the server. Generally it is only safe to use it if the requests you
	// are making are idempotent.
	// For srchd, this is always the case as of writing.
	//
	// For more information, refer to section 8 of RFC 8446:
	// https://datatracker.ietf.org/doc/html/rfc8446#section-8
	QUIC_0RTT bool `yaml:"quic-0rtt"`

	// Extra contains extra settings that have no corresponding field in
	// this struct.
	//
	// The info contained within is generally [Engine] specific, and may or
	// may not be optional.
	// Refer to your [Engine] for possible/necessary configuration values.
	Extra map[string]any `yaml:"-"`

	// Provide an existing HTTP client instead of creating one from the
	// settings; it is recommended that you still create it using
	// NewHttpClient, but if this field is filled then NewHttpClient will
	// return this regardless of the configuration.
	//
	// This field exists primarily for mocking HTTP responses when
	// performing testing.
	HttpClient *HttpClient `yaml:"-"`
}
Config is the engine configuration. It specifies settings that control how an engine behaves.
This struct should not be modified once it has been passed to an engine.
The zero value is safe to use, and the struct itself may be unmarshaled from YAML configuration files.
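The field comments above describe several fallbacks: Name and Type default to each other, a zero Timeout means DefaultTimeout, and a zero Weight behaves like 1.0. The sketch below applies those rules in one place. Config here is a hypothetical, trimmed-down stand-in (the real Timeout uses the unexported stringDuration type), and withDefaults is an illustrative helper, not part of the package.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// DefaultTimeout mirrors the package constant.
const DefaultTimeout = time.Second * 10

// Config is a hypothetical stand-in holding only fields with documented defaults.
type Config struct {
	Type    string
	Name    string
	Timeout time.Duration
	Weight  float64
}

// withDefaults applies the documented fallbacks to a copy of c.
func withDefaults(c Config) (Config, error) {
	switch {
	case c.Type == "" && c.Name == "":
		return c, errors.New("config: Name and Type are both empty")
	case c.Type == "":
		c.Type = c.Name
	case c.Name == "":
		c.Name = c.Type
	}
	if c.Timeout == 0 {
		c.Timeout = DefaultTimeout
	}
	if c.Weight == 0 {
		c.Weight = 1.0 // a zero weight is analogous to 1.0
	}
	return c, nil
}

func main() {
	c, err := withDefaults(Config{Type: "example"})
	fmt.Println(c.Name, c.Timeout, c.Weight, err) // example 10s 1 <nil>
}
```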
func (Config) MustNew
MustNew attempts to initialize an Engine from the configuration, but panics if it fails to do so.
func (Config) NewHttpClient
func (c Config) NewHttpClient() *HttpClient
NewHttpClient creates an HttpClient according to the values set in the configuration.
If the HttpClient field of the Config is non-nil, that value is returned instead.
func (*Config) UnmarshalYAML
UnmarshalYAML parses a YAML configuration.
It is required so that keys without a corresponding struct field can be collected into the Extra field.
type Engine
type Engine interface {
	// Ping checks to see if the engine is reachable.
	Ping(ctx context.Context) error

	// Search attempts to query the engine and returns a number of results.
	Search(ctx context.Context, query string, page int) ([]Result, error)
}
Engine is the interface that every search engine backend implements; it covers the bare essentials for performing web searches.
type HttpClient
type HttpClient struct {
	// Timeout is the maximum amount of time to wait for the request to
	// complete.
	Timeout time.Duration

	// UserAgent holds the value of the User-Agent header of HTTP requests.
	//
	// If UserAgent is empty, then [DefaultUserAgent] is used.
	UserAgent string

	// If Debug is true before the first request is made, then all HTTP
	// requests sent through this HttpClient are logged.
	Debug bool

	// Send requests using this HTTP proxy.
	//
	// This does not default to the HTTP_PROXY environment variable and
	// must be explicitly set to use a proxy for all HTTP requests.
	HttpProxy string

	// Enable HTTP/3 using quic-go.
	QUIC bool

	// Enable zero round-trip time (0-RTT) for a performance boost on
	// subsequent connections.
	// Requires QUIC to be true.
	//
	// Using 0-RTT can have implications for the security of your
	// connections, as it becomes possible to replay the data you send to
	// the server. Generally it is only safe to use it if the requests you
	// are making are idempotent.
	// For srchd, this is always the case as of writing.
	//
	// For more information, refer to section 8 of RFC 8446:
	// https://datatracker.ietf.org/doc/html/rfc8446#section-8
	QUIC_0RTT bool

	// Specify a cookie jar to use.
	//
	// If left nil, no cookies will be saved.
	CookieJar http.CookieJar

	// contains filtered or unexported fields
}
HttpClient is a wrapper around net/http.Client that takes care of chores you would otherwise write yourself for every HTTP request and response, such as setting the User-Agent header, applying the configured timeout and proxy, and reporting non-200 responses as an HttpError.
The zero value is ready to use.
func (*HttpClient) Client (added in v0.2.0)
func (h *HttpClient) Client() *http.Client
Client returns the underlying net/http.Client used by this HttpClient.
Do not change fields of the returned Client struct once you have performed a request.
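func (*HttpClient) Context
func (h *HttpClient) Context(ctx context.Context) (context.Context, context.CancelFunc)
Context creates a new context derived from the parent context ctx. Call the returned CancelFunc once the request is finished to release the context's resources.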
func (*HttpClient) Get
func (h *HttpClient) Get(ctx context.Context, url string) (*http.Response, error)
Get performs a GET request on a given URL.
If the server responds with a non-200 status code, then the returned response will be nil and err will be of type HttpError.
func (*HttpClient) HtmlGet (added in v0.2.0)
func (h *HttpClient) HtmlGet(ctx context.Context, url string) (*goquery.Document, error)
HtmlGet fetches HTML using a GET request and automatically parses it.
If the server responds with a non-200 status code, then the returned document will be nil and err will be of type HttpError.
func (*HttpClient) HtmlPost (added in v0.2.0)
func (h *HttpClient) HtmlPost(ctx context.Context, url string, contentType string, body []byte) (*goquery.Document, error)
HtmlPost fetches HTML using a POST request and automatically parses it.
If the server responds with a non-200 status code, then the returned document will be nil and err will be of type HttpError.
func (*HttpClient) New
func (h *HttpClient) New(ctx context.Context, method, url string, body []byte, contentType ...string) (*http.Request, error)
New creates a new HTTP request.
func (*HttpClient) Post
func (h *HttpClient) Post(ctx context.Context, url string, contentType string, body []byte) (*http.Response, error)
Post performs a POST request on a given URL.
If the server responds with a non-200 status code, then the returned response will be nil and err will be of type HttpError.
type HttpError
type HttpError struct {
	// Status code of response.
	Status int
	// URL of request.
	URL string
	// Method of request.
	Method string
}
HttpError represents a generic HTTP error.
type Initializer
An Initializer is a function that initializes an engine from a config.
type Result
type Result struct {
	// Title is the title of the webpage for this result.
	Title string `json:"title,omitempty"`

	// Description is a small snippet of text from the webpage for this
	// result, usually containing a portion or all of the query.
	Description string `json:"description,omitempty"`

	// Link is the URL of this result.
	Link string `json:"link,omitempty"`

	// Sources holds all engine names that had this result.
	//
	// Engines must only populate this with their name.
	// Results are merged and this field will be populated based upon what
	// engines return a result similar to this one.
	Sources []string `json:"sources,omitempty"`

	// Score holds the score for this result.
	//
	// Engines should not fill this value.
	Score float64 `json:"score,omitempty"`
}
Result represents a single search result from an Engine.