Documentation ¶
Overview ¶
Package unfurlist implements a service that unfurls URLs and provides more information about them.
The current version supports Open Graph and oEmbed formats; support for the Twitter card format is planned. If the URL does not support common formats, unfurlist falls back to looking at common HTML tags such as <title> and <meta name="description">.
The endpoint accepts GET and POST requests with `content` as the main argument. It then returns a JSON-encoded list of URLs that were parsed.
If a URL lacks an attribute (e.g. `image`), that attribute is omitted from the result.
Example:
?content=Check+this+out+https://www.youtube.com/watch?v=dQw4w9WgXcQ
Will return:
    Type: "application/json"

    [
        {
            "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
            "title": "Rick Astley - Never Gonna Give You Up (Video)",
            "url_type": "video.other",
            "description": "Rick Astley - Never Gonna Give You Up...",
            "site_name": "YouTube",
            "favicon": "https://www.youtube.com/yts/img/favicon_32-vflOogEID.png",
            "image": "https://i.ytimg.com/vi/dQw4w9WgXcQ/maxresdefault.jpg"
        }
    ]
If the handler was configured with FetchImageSize=true in its config, each object in the list may have additional fields `image_width` and `image_height` specifying the dimensions of the image given by the `image` attribute.
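A client-side view of the request and response described above can be sketched as follows. This is a minimal sketch, not part of the package: the `Result` type, the `requestURL` and `decodeResults` helpers, and the `http://localhost:8080/` address are all assumptions made for illustration. Only the JSON field names come from the sample response and the `image_width`/`image_height` description.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// Result is a hypothetical client-side type mirroring the fields of the
// sample response. Optional attributes simply stay at their zero value
// when the service omits them.
type Result struct {
	URL         string `json:"url"`
	Title       string `json:"title,omitempty"`
	Type        string `json:"url_type,omitempty"`
	Description string `json:"description,omitempty"`
	SiteName    string `json:"site_name,omitempty"`
	Favicon     string `json:"favicon,omitempty"`
	Image       string `json:"image,omitempty"`
	ImageWidth  int    `json:"image_width,omitempty"`
	ImageHeight int    `json:"image_height,omitempty"`
}

// requestURL appends the query-escaped `content` argument to base.
// base is an assumed address of a running unfurlist endpoint.
func requestURL(base, content string) string {
	q := url.Values{}
	q.Set("content", content)
	return base + "?" + q.Encode()
}

// decodeResults parses a JSON response body into a slice of Result values.
func decodeResults(body []byte) ([]Result, error) {
	var out []Result
	err := json.Unmarshal(body, &out)
	return out, err
}

func main() {
	fmt.Println(requestURL("http://localhost:8080/",
		"Check this out https://www.youtube.com/watch?v=dQw4w9WgXcQ"))

	// A shortened response body; the image attribute is absent on purpose.
	body := []byte(`[{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ", "site_name": "YouTube"}]`)
	results, err := decodeResults(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, r := range results {
		fmt.Println(r.URL, r.SiteName, r.Image == "")
	}
}
```

Because missing attributes are omitted rather than sent as null, a caller would check for empty strings (or zero dimensions) before using a field.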
Additionally, you can supply `callback` to wrap the result in a JavaScript callback (JSONP); the type of this response is "application/x-javascript".
If the optional `markdown` boolean argument is set (markdown=true), the provided content is parsed as markdown-formatted text and links are extracted in context-aware mode, i.e. preformatted text blocks are skipped.
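To illustrate what skipping preformatted blocks means in practice, here is a rough sketch that drops fenced code blocks from markdown text before any link extraction would take place. It is not the parser the service uses: `dropFencedBlocks` is a hypothetical helper, it handles only triple-backtick fences (not indented code blocks), and it is meant only to show the idea.

```go
package main

import (
	"fmt"
	"strings"
)

// fence is the triple-backtick marker that opens and closes a fenced block.
var fence = strings.Repeat("`", 3)

// dropFencedBlocks returns text without fenced code blocks, so that URLs
// appearing inside them are never seen by a later extraction step.
// Hypothetical helper; indented code blocks are not handled.
func dropFencedBlocks(text string) string {
	var kept []string
	inFence := false
	for _, line := range strings.Split(text, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), fence) {
			inFence = !inFence // a fence line toggles the state and is dropped
			continue
		}
		if !inFence {
			kept = append(kept, line)
		}
	}
	return strings.Join(kept, "\n")
}

func main() {
	text := "See https://example.com/docs\n" +
		fence + "\ncurl https://internal.example/api\n" + fence +
		"\nThanks"
	fmt.Println(dropFencedBlocks(text))
}
```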
Security ¶
Care should be taken when running this inside an internal network, since it may disclose internal endpoints. It is a good idea to run the service on a separate host in an isolated subnet.
Alternatively, access to internal resources may be limited with firewall rules. For example, if the service is running as the 'unfurlist' user on a Linux box, the following iptables rules can reduce the chances of it connecting to internal endpoints (note this example is for IPv4 only!):
    iptables -A OUTPUT -m owner --uid-owner unfurlist -p tcp --syn \
        -d 127/8,10/8,169.254/16,172.16/12,192.168/16 \
        -j REJECT --reject-with icmp-net-prohibited
    ip6tables -A OUTPUT -m owner --uid-owner unfurlist -p tcp --syn \
        -d ::1/128,fe80::/10 \
        -j REJECT --reject-with adm-prohibited
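A similar restriction can also be approximated inside the Go process, which may be useful where firewall rules are not an option. The sketch below is an assumption-laden illustration, not something the package provides: `internalAddress` and `restrictedClient` are hypothetical names, and the address classes checked (loopback, private, link-local, unspecified) are somewhat broader than the iptables rules above. It relies on the standard library's `net.Dialer.Control` hook, which runs after DNS resolution and before the connection is made. A client built this way could presumably be handed to the handler via WithHTTPClient.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"syscall"
	"time"
)

// internalAddress reports whether a resolved "ip:port" address should be
// refused. Addresses that cannot be parsed are refused as well.
func internalAddress(address string) bool {
	host, _, err := net.SplitHostPort(address)
	if err != nil {
		return true
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return true
	}
	return ip.IsLoopback() || ip.IsPrivate() ||
		ip.IsLinkLocalUnicast() || ip.IsUnspecified()
}

// restrictedClient returns an http.Client whose dialer checks every address
// just before connecting, so redirects and DNS tricks are covered too.
func restrictedClient() *http.Client {
	dialer := &net.Dialer{
		Timeout: 10 * time.Second,
		Control: func(network, address string, _ syscall.RawConn) error {
			if internalAddress(address) {
				return fmt.Errorf("connection to %s is not allowed", address)
			}
			return nil
		},
	}
	return &http.Client{
		Timeout:   30 * time.Second,
		Transport: &http.Transport{DialContext: dialer.DialContext},
	}
}

func main() {
	for _, a := range []string{"10.0.0.5:80", "[::1]:8080", "93.184.216.34:443"} {
		fmt.Println(a, internalAddress(a))
	}
	_ = restrictedClient() // would be passed on to the code making requests
}
```

This is a complement to network-level isolation rather than a replacement for it: a process-level check does nothing if another component on the same host makes requests of its own.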
Index ¶
- Constants
- func New(conf ...ConfFunc) http.Handler
- func ParseURLs(content string) []string
- type ConfFunc
- func WithBlocklistPrefixes(prefixes []string) ConfFunc
- func WithBlocklistTitles(substrings []string) ConfFunc
- func WithExtraHeaders(hdr map[string]string) ConfFunc
- func WithFetchers(fetchers ...FetchFunc) ConfFunc
- func WithHTTPClient(client *http.Client) ConfFunc
- func WithImageDimensions(enable bool) ConfFunc
- func WithLogger(l Logger) ConfFunc
- func WithMaxResults(n int) ConfFunc
- func WithMemcache(client *memcache.Client) ConfFunc
- func WithOembedLookupFunc(fn oembed.LookupFunc) ConfFunc
- type FetchFunc
- type Logger
- type Metadata
Constants ¶
const DefaultMaxResults = 20
DefaultMaxResults is the maximum number of URLs to process if not configured by the WithMaxResults function.
Variables ¶
This section is empty.
Functions ¶
func New ¶
New returns a newly initialized unfurl handler. If no configuration functions are provided, sane defaults are used.
func ParseURLs ¶
ParseURLs tries to extract unique URL-like (http/https scheme only) substrings from the given text. Results may not be proper URLs, since the function only searches for sequences of matching characters. This function is optimized for extracting URLs from plain text where they can be mixed with punctuation symbols: trailing symbols []()<>,;. are removed, but trailing >]) are left if any opening <[( is found inside the URL.
Example ¶
    text := `This text contains various urls mixed with different reserved per rfc3986 characters:
    http://google.com, https://doist.com/#about (also see https://todoist.com), <http://example.com/foo>,
    **[markdown](http://daringfireball.net/projects/markdown/)**,
    http://marvel-movies.wikia.com/wiki/The_Avengers_(film), https://pt.wikipedia.org/wiki/Mamão.
    https://docs.live.net/foo/?section-id={D7CEDACE-AEFB-4B61-9C63-BDE05EEBD80A},
    http://example.com/?param=foo;bar
    HTTPS://EXAMPLE.COM/UPPERCASE
    hTtP://example.com/mixedCase
    `
    for _, u := range ParseURLs(text) {
        fmt.Println(u)
    }

Output:

    http://google.com
    https://doist.com/#about
    https://todoist.com
    http://example.com/foo
    http://daringfireball.net/projects/markdown/
    http://marvel-movies.wikia.com/wiki/The_Avengers_(film)
    https://pt.wikipedia.org/wiki/Mamão
    https://docs.live.net/foo/?section-id={D7CEDACE-AEFB-4B61-9C63-BDE05EEBD80A}
    http://example.com/?param=foo;bar
    HTTPS://EXAMPLE.COM/UPPERCASE
    hTtP://example.com/mixedCase
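The trailing-punctuation rule described for ParseURLs can be sketched in isolation. The `trimTrailing` function below is a hypothetical helper written from the prose description only; it is not the package's implementation and does not do the URL matching itself. It reads "any opening <[(" literally: a trailing closing bracket is kept as soon as any opening bracket appears earlier in the candidate.

```go
package main

import (
	"fmt"
	"strings"
)

// trimTrailing strips trailing punctuation from a URL candidate.
// Closing brackets are kept when an opening bracket occurs earlier in the
// candidate, which preserves URLs such as .../The_Avengers_(film).
func trimTrailing(u string) string {
	for len(u) > 0 {
		rest := u[:len(u)-1]
		switch u[len(u)-1] {
		case '[', '(', '<', ',', ';', '.':
			// always dropped
		case ']', ')', '>':
			if strings.ContainsAny(rest, "<[(") {
				return u
			}
		default:
			return u
		}
		u = rest
	}
	return u
}

func main() {
	for _, c := range []string{
		"http://google.com,",
		"https://todoist.com),",
		"http://marvel-movies.wikia.com/wiki/The_Avengers_(film),",
	} {
		fmt.Println(trimTrailing(c))
	}
}
```

Working on bytes is safe here because every symbol being trimmed is ASCII, so a multi-byte character at the end of a URL (as in "Mamão") is never split.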
Types ¶
type ConfFunc ¶
type ConfFunc func(*unfurlHandler) *unfurlHandler
ConfFunc is used to configure a new unfurl handler; such functions should be used as arguments to the New function.
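The shape of ConfFunc follows the functional options pattern. Since the handler type is unexported, the sketch below rebuilds the pattern with stand-in names (`handler`, `confFunc`, `newHandler`, `withMaxResults`, `withImageDimensions`), all of which are hypothetical and only mirror the signatures listed in the index. How the real package treats a non-positive `n` is not documented here; ignoring it is an assumption of this sketch.

```go
package main

import "fmt"

// defaultMaxResults mirrors the documented DefaultMaxResults value.
const defaultMaxResults = 20

// handler is a stand-in for the package's unexported handler type.
type handler struct {
	maxResults      int
	imageDimensions bool
}

// confFunc has the same shape as ConfFunc: it receives the handler being
// built and returns it after applying one setting.
type confFunc func(*handler) *handler

// withMaxResults limits the number of processed URLs.
// Assumption: non-positive values are ignored and the default is kept.
func withMaxResults(n int) confFunc {
	return func(h *handler) *handler {
		if n > 0 {
			h.maxResults = n
		}
		return h
	}
}

// withImageDimensions toggles fetching of image dimensions.
func withImageDimensions(enable bool) confFunc {
	return func(h *handler) *handler {
		h.imageDimensions = enable
		return h
	}
}

// newHandler starts from defaults and applies each option in order.
func newHandler(conf ...confFunc) *handler {
	h := &handler{maxResults: defaultMaxResults}
	for _, f := range conf {
		h = f(h)
	}
	return h
}

func main() {
	h := newHandler(withMaxResults(5), withImageDimensions(true))
	fmt.Println(h.maxResults, h.imageDimensions)
}
```

The benefit of this design is that New can be called with no arguments for defaults, and new settings can be added later without changing its signature.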
func WithBlocklistPrefixes ¶
WithBlocklistPrefixes configures the unfurl handler to skip unfurling URLs matching any of the provided prefixes.
func WithBlocklistTitles ¶
WithBlocklistTitles configures the unfurl handler to skip unfurling URLs that return pages whose title contains one of the provided substrings.
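The two blocklist checks amount to a prefix match on the URL and a substring match on the page title. A minimal sketch of that logic follows; `blockedByPrefix` and `blockedByTitle` are hypothetical helpers, and treating the comparison as case-sensitive is an assumption, since the documentation does not say either way.

```go
package main

import (
	"fmt"
	"strings"
)

// blockedByPrefix reports whether u starts with any of the given prefixes.
// Assumption: the comparison is case-sensitive.
func blockedByPrefix(u string, prefixes []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(u, p) {
			return true
		}
	}
	return false
}

// blockedByTitle reports whether title contains any of the given substrings.
// Assumption: the comparison is case-sensitive.
func blockedByTitle(title string, substrings []string) bool {
	for _, s := range substrings {
		if strings.Contains(title, s) {
			return true
		}
	}
	return false
}

func main() {
	prefixes := []string{"https://example.com/private/"}
	titles := []string{"Sign in", "Access denied"}

	fmt.Println(blockedByPrefix("https://example.com/private/report", prefixes))
	fmt.Println(blockedByTitle("Sign in to continue", titles))
}
```

Note the difference in cost: a prefix check can reject a URL before any request is made, while a title check is only possible after the page has been fetched.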
func WithExtraHeaders ¶
WithExtraHeaders configures the unfurl handler to add extra headers to each outgoing HTTP request.
func WithFetchers ¶
WithFetchers attaches custom fetchers to unfurl handler created by New().
func WithHTTPClient ¶
WithHTTPClient configures the unfurl handler to use the provided http.Client for outgoing requests.
func WithImageDimensions ¶
WithImageDimensions configures whether the unfurl handler fetches image dimensions.
func WithLogger ¶
WithLogger configures the unfurl handler to use the provided logger.
func WithMaxResults ¶
WithMaxResults configures the unfurl handler to process only the first n URLs it finds. n must be positive.
func WithMemcache ¶
WithMemcache configures the unfurl handler to cache metadata in memcached.
func WithOembedLookupFunc ¶
func WithOembedLookupFunc(fn oembed.LookupFunc) ConfFunc
WithOembedLookupFunc configures unfurl handler to use custom oembed.LookupFunc for oembed lookups.
type FetchFunc ¶
FetchFunc defines custom metadata fetchers that can be attached to the unfurl handler.
func GoogleMapsFetcher ¶
GoogleMapsFetcher returns FetchFunc that recognizes some Google Maps urls and constructs metadata for them containing preview image from Google Static Maps API. The only argument is the API key to create image links with.
type Logger ¶
Logger describes the set of methods used by the unfurl handler for logging; the standard library's *log.Logger implements this interface.
type Metadata ¶
    type Metadata struct {
        Title       string
        Type        string // TODO: make this int8 w/enum constants
        Description string
        Image       string // image/thumbnail url
        ImageWidth  int
        ImageHeight int
    }
Metadata represents metadata retrieved by a FetchFunc. At least one of the Title, Description or Image attributes is expected to be non-empty.
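The non-empty expectation can be expressed as a small check that a custom fetcher might run before returning its result. The sketch uses a local `metadata` type that copies the fields of Metadata so it stays self-contained; both that type and the `usable` helper are hypothetical and not part of the package.

```go
package main

import "fmt"

// metadata is a local copy of the documented Metadata fields.
type metadata struct {
	Title       string
	Type        string
	Description string
	Image       string // image/thumbnail url
	ImageWidth  int
	ImageHeight int
}

// usable reports whether m carries at least one of Title, Description or
// Image, as the documentation expects of fetcher results.
func usable(m *metadata) bool {
	if m == nil {
		return false
	}
	return m.Title != "" || m.Description != "" || m.Image != ""
}

func main() {
	fmt.Println(usable(&metadata{Type: "website"}))
	fmt.Println(usable(&metadata{Title: "Example Domain"}))
}
```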
Directories ¶
Path | Synopsis |
---|---|
cmd/unfurlist | Command unfurlist implements http server exposing API endpoint |
internal/useragent | This is a vendored copy of https://github.com/artyom/useragent |