Locality Sensitive Hashing in Fourier Frequency Domain for Soft Set Containment Search

# 210






Abstract

In many search applications related to passage retrieval, text entailment, and subgraph search, the query and each ‘document’ is a set of elements, with a document being relevant if it contains the query. These elements are not represented by atomic IDs, but by embedded representations, thereby extending set containment to soft set containment. Recent applications address soft set containment by encoding sets into fixed-size vectors and checking for element wise vector dominance. This 0/1 property can be relaxed to an asymmetric hinge distance for scoring and ranking candidate documents. Here we focus on data-sensitive, trainable indices for fast retrieval of relevant documents. Existing LSH methods are designed for mostly symmetric or few simple asymmetric distance functions, which are not suitable for hinge distance. Instead, we transform hinge distance into a proposed dominance similarity measure, to which we then apply a Fourier transform, thereby expressing dominance similarity as an expectation of inner products of functions in the frequency domain. Next, we approximate the expectation with an importance-sampled estimate. The overall consequence is that now we can use a traditional LSH, but in the frequency domain. To ensure that the LSH uses hash bits efficiently, we learn hash functions that are sensitive to both corpus and query distributions, mapped to the frequency domain. Our experiments show that the proposed asymmetric dominance similarity is critical to the targeted applications, and that our LSH, which we call FOURIER HASHNET, provides a better query time vs. retrieval quality trade-off, compared to several baselines. Both the Fourier transform and the trainable hash codes contribute to performance gains.

Abir De, Assistant Professor, IIT Bombay

Abir De is an Assistant Professor in the Department of Computer Science and Engineering at IIT Bombay. His current research focus is designing machine learning models and methods for structured objects, e.g., graphs and sets. He is a recipient of various research awards including Indian National Academy of Engineering Young Engineer Award, Prof. Krithi Ramamritham Award for Creative Research, and Google India PhD Fellowship. He did his Ph.D in in Computer Science from the Indian Institute of Technology Kharagpur, and was a Postdoctoral researcher at MPI-SWS.