Studies in algorithms
Abstract
This work consists of three separate parts: (1) Lower bounds for linear degeneracy testing (based on joint work with Bernard Chazelle [5]); (2) Aggregating inconsistent information (based on joint work with Moses Charikar and Alantha Newman [3]); and (3) The fast Johnson-Lindenstrauss transform and approximate nearest neighbor searching (based on joint work with Bernard Chazelle [6]).

The first part discusses a fundamental computational geometric problem called rSUM: given n real numbers, do any r of them sum up to 0? It is the simplest case of a more general family of problems called degeneracy testing. This seemingly naive problem lies at the core of the difficulty of designing algorithms for more interesting problems in computational geometry. The computational model assumed is the linear decision tree, a model that has been used successfully in many other results on the computational complexity of geometric problems. This work extends and improves a seminal result by Erickson [46], and sheds light not only on the complexity of rSUM as a computational problem but also on the combinatorial structure (known as the linear arrangement) attached to it.

The second part studies optimization algorithms designed for integrating information coming from different sources. This framework includes the well-known voting problem from the classical theory of social choice. It has been known for centuries that voting and collaborative decision making are difficult (and interesting) due to certain inherent paradoxes that arise. More recently, the computational aspects of these problems have been studied, and several hardness results were proved. The recent interest in voting and social choice theory in the context of computer science was motivated by more "modern" problems related to the age of information: if several algorithms are used for approximately solving a problem using different heuristics, how do we aggregate the corresponding outputs into one single output? In some cases there are reasons to believe that an aggregate output is better than each one of the individual outputs (voters). We design improved algorithms for two important problems known as rank aggregation and consensus clustering. In our analysis we prove new results on optimization over binary relations (in particular, order and equivalence relations).

The third part revisits the computational aspects of a well-known lemma by Johnson and Lindenstrauss from the mid-1980s. The Johnson-Lindenstrauss lemma states the surprising fact that any finite subset of a Euclidean space can be almost isometrically embedded in a space of dimension at most logarithmic in the size of the subset. In fact, a suitably chosen random linear transformation does the trick. The algorithmic consequences were quickly reaped by researchers interested in improving algorithms whose running time and/or space depend heavily on the dimensionality of the problem, most notably proximity-based problems such as clustering and nearest neighbor searching. Many new computationally friendly versions of the original J-L lemma were proved. These versions simplified the distribution from which the random linear transformation is chosen, but did not yield better than a constant-factor improvement in its running time. In this work we define a new distribution on linear transformations that yields a significant computational improvement. We call the resulting construction the Fast Johnson-Lindenstrauss Transform (FJLT), and show how to apply it to nearest neighbor searching in Euclidean space.
In the last chapter of this part we propose a different approach (unrelated to the FJLT) for improving nearest neighbor searching in the Hamming cube. Interestingly, we achieved this latter improvement before working on the FJLT, and it provided evidence and motivation for an FJLT-type result.
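To make the problem of Part 1 concrete, the following minimal Python sketch decides rSUM by brute-force enumeration. The function name and the O(n^r) approach are ours, for illustration only; the thesis proves lower bounds in the linear decision tree model and does not concern this naive method.

    from itertools import combinations

    def rsum_naive(xs, r):
        # Return True if some r of the given reals sum to 0.
        # Brute-force O(n^r) enumeration, for illustration only; a
        # linear decision tree may instead test arbitrary linear
        # combinations of the inputs at each node.
        return any(sum(c) == 0 for c in combinations(xs, r))

    # A 3SUM instance containing the degenerate triple (1, 2, -3):
    print(rsum_naive([1.0, 2.0, -3.0, 7.0], 3))  # True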
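For Part 2, one standard way to formalize rank aggregation (our formulation; the abstract itself fixes no notation) is the Kemeny objective: find a ranking minimizing the total number of pairwise disagreements, i.e. the sum of Kendall tau distances to the input rankings. The exhaustive sketch below only illustrates the objective; the algorithms of Part 2 are polynomial-time approximations, not this exponential search.

    from itertools import combinations, permutations

    def kendall_tau(pi, sigma):
        # Number of item pairs ordered differently by rankings pi and
        # sigma (each given as a list of items, most preferred first).
        pos_pi = {x: i for i, x in enumerate(pi)}
        pos_sigma = {x: i for i, x in enumerate(sigma)}
        return sum(
            1
            for a, b in combinations(pi, 2)
            if (pos_pi[a] < pos_pi[b]) != (pos_sigma[a] < pos_sigma[b])
        )

    def aggregate_exhaustive(rankings):
        # Try every permutation; exponential time, illustration only.
        return min(
            (list(p) for p in permutations(rankings[0])),
            key=lambda p: sum(kendall_tau(p, s) for s in rankings),
        )

    votes = [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]
    print(aggregate_exhaustive(votes))  # ['a', 'b', 'c']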
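For Part 3, here is a minimal sketch of the classical (dense, and therefore slow) form of the Johnson-Lindenstrauss embedding via a random Gaussian matrix; the target dimension k = O(eps^-2 log n) is from the lemma, while the constant 4 below is an illustrative choice of ours.

    import numpy as np

    def jl_embed(points, eps, seed=None):
        # Project n points in R^d down to k = O(eps^-2 log n)
        # dimensions with a dense random Gaussian matrix. Applying
        # it costs O(dk) per point; the FJLT replaces it with a
        # structured transform that is significantly cheaper.
        rng = np.random.default_rng(seed)
        n, d = points.shape
        k = int(np.ceil(4 * np.log(n) / eps ** 2))  # illustrative constant
        proj = rng.normal(size=(d, k)) / np.sqrt(k)
        return points @ proj

    pts = np.random.default_rng(0).normal(size=(100, 1000))
    print(jl_embed(pts, eps=0.5, seed=1).shape)  # (100, 74)

With this normalization, each pairwise squared distance is preserved up to a factor of 1 ± eps with high probability.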
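The FJLT of [6] preconditions the input with a random sign flip and a fast Walsh-Hadamard transform before applying a sparse random projection. The sketch below follows that P-H-D structure but simplifies the normalization and the choice of the sparsity parameter q, so it should be read as a rough illustration rather than the precise distribution of [6].

    import numpy as np

    def fwht(x):
        # Iterative unnormalized fast Walsh-Hadamard transform,
        # O(d log d) time; len(x) must be a power of 2.
        d = len(x)
        h = 1
        while h < d:
            for i in range(0, d, 2 * h):
                a = x[i:i + h].copy()
                b = x[i + h:i + 2 * h].copy()
                x[i:i + h] = a + b
                x[i + h:i + 2 * h] = a - b
            h *= 2
        return x

    def fjlt_sketch(v, k, q, rng):
        # Rough FJLT-style embedding of one vector v (len a power
        # of 2): random signs (D), normalized Hadamard (H), then a
        # q-sparse Gaussian projection (P). Constants simplified.
        d = len(v)
        signs = rng.choice([-1.0, 1.0], size=d)        # D
        u = fwht(signs * v) / np.sqrt(d)               # H (orthonormal)
        mask = rng.random(size=(k, d)) < q             # sparsity of P
        P = np.where(mask, rng.normal(size=(k, d)), 0.0) / np.sqrt(q * k)
        return P @ u

    rng = np.random.default_rng(0)
    v = rng.normal(size=1024)
    y = fjlt_sketch(v, k=64, q=0.1, rng=rng)
    print(np.linalg.norm(v), np.linalg.norm(y))  # norms roughly agree

The point of the Hadamard preconditioning is that it spreads the mass of any input vector across all coordinates, which is what makes a very sparse projection matrix safe to apply afterwards.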