Documentation
Index ¶
- Constants
- func JaccardDistance(left, right *WordSet) float64
- func JaccardSimilarity(left, right string) float64
- func MinHashSimilar(left, right string) bool
- func Similar(left, right string) bool
- func SimilarWordSets(left, right *WordSet) bool
- func StringsSimilar(left, right string) bool
- type MinHash
- type WordSet
Constants ¶
const (
SimilarityThreshold = 0.799
)
Two jobs are considered too similar if they share 79.9% of their unique words.
Variables ¶
This section is empty.
Functions ¶
func JaccardDistance ¶
JaccardDistance calculates the similarity between two WordSets. Useful for determining whether two blocks of text are similar.
Explanation: https://en.wikipedia.org/wiki/Jaccard_index
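As a rough illustration of the Jaccard index linked above, the following sketch computes the intersection-over-union of two sets of unique words and compares the result against a threshold. It is not this package's implementation: `wordSet`, `newWordSet`, `jaccardIndex`, and `tooSimilar` are hypothetical names, the whitespace tokenization and lowercasing are assumptions, and so is the use of `>=` for the threshold comparison. Only the value 0.799 is taken from the Constants section.

```go
package main

import (
	"fmt"
	"strings"
)

// similarityThreshold copies the documented SimilarityThreshold value.
const similarityThreshold = 0.799

// wordSet is a hypothetical stand-in for the package's WordSet type,
// whose fields are unexported and not shown in this documentation.
type wordSet map[string]struct{}

// newWordSet splits text on whitespace and collects the unique,
// lowercased words (an assumed tokenization).
func newWordSet(text string) wordSet {
	set := make(wordSet)
	for _, w := range strings.Fields(strings.ToLower(text)) {
		set[w] = struct{}{}
	}
	return set
}

// jaccardIndex returns |A ∩ B| / |A ∪ B|.
// Two empty sets are treated as identical (1.0) by assumption.
func jaccardIndex(a, b wordSet) float64 {
	if len(a) == 0 && len(b) == 0 {
		return 1.0
	}
	intersection := 0
	for w := range a {
		if _, ok := b[w]; ok {
			intersection++
		}
	}
	union := len(a) + len(b) - intersection
	return float64(intersection) / float64(union)
}

// tooSimilar reports whether the index reaches the threshold.
// Whether the real package uses >= or > is not documented; >= is assumed.
func tooSimilar(a, b wordSet) bool {
	return jaccardIndex(a, b) >= similarityThreshold
}

func main() {
	a := newWordSet("senior go developer remote")
	b := newWordSet("junior go developer remote")
	// 3 shared words out of 5 unique words overall.
	fmt.Printf("%.2f %v\n", jaccardIndex(a, b), tooSimilar(a, b))
}
```

In this example the two inputs share three of five unique words, so the index is 0.6 and the pair falls below the threshold.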
func JaccardSimilarity ¶
JaccardSimilarity calculates the similarity between two arbitrary strings.
func MinHashSimilar ¶
MinHashSimilar takes two min hash strings as input and returns whether they are similar.
func Similar ¶
Similar is based on the Jaccard index. If the similarity of the two inputs is too high, it returns true.
func SimilarWordSets ¶
SimilarWordSets is the same as Similar, but for word sets.
func StringsSimilar ¶
StringsSimilar compares two strings to see if they are similar.
Types ¶
type MinHash ¶
type MinHash []int
func GenerateMinHash ¶
GenerateMinHash generates a MinHash from a document string. This is used to create a MinHash from scratch.
func MinHashFromStr ¶
MinHashFromStr takes a min hash string and converts it into a MinHash object
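To give a sense of how a MinHash signature can stand in for a full word set, here is a minimal sketch of the general technique. It does not reflect how GenerateMinHash or MinHashSimilar are actually implemented: `signature`, `numHashes`, `hashWord`, `generate`, and `estimate` are hypothetical names, and the signature length of 64, the seeded FNV-1a hashing, and the whitespace tokenization are all assumptions. Only the `[]int` shape mirrors the documented MinHash type.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"strings"
)

// signature mirrors the documented `type MinHash []int`; the name is hypothetical.
type signature []int

// numHashes is an assumed signature length; the package's value is not documented.
const numHashes = 64

// hashWord hashes a word with FNV-1a, mixing in a seed so that each
// seed behaves like a different hash function.
func hashWord(seed int, word string) int {
	h := fnv.New32a()
	fmt.Fprintf(h, "%d:%s", seed, word)
	return int(h.Sum32())
}

// generate builds a signature: for each seeded hash function, it keeps
// the minimum hash value over all words in the document.
func generate(doc string) signature {
	words := strings.Fields(strings.ToLower(doc))
	sig := make(signature, numHashes)
	for i := range sig {
		sig[i] = math.MaxInt
		for _, w := range words {
			if h := hashWord(i, w); h < sig[i] {
				sig[i] = h
			}
		}
	}
	return sig
}

// estimate returns the fraction of positions where two signatures agree,
// which approximates the Jaccard index of the underlying word sets.
// Signatures of different lengths are treated as not comparable (0).
func estimate(a, b signature) float64 {
	if len(a) == 0 || len(a) != len(b) {
		return 0
	}
	matches := 0
	for i := range a {
		if a[i] == b[i] {
			matches++
		}
	}
	return float64(matches) / float64(len(a))
}

func main() {
	a := generate("senior go developer remote")
	b := generate("remote senior go developer")
	// Same words in a different order yield the same signature.
	fmt.Println(estimate(a, b))
}
```

Because each position keeps only a minimum, the signature depends on which words appear and not on their order or frequency. A longer signature gives a tighter estimate at the cost of more hashing.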
type WordSet ¶
type WordSet struct {
// contains filtered or unexported fields
}
func NewWordSet ¶
func NewWordSet() *WordSet