5.

Finding near-duplicates with Jaccard similarity and MinHash - Made of Bugs

blog.nelhage.com/post/fuzzy-dedup