TY - GEN
T1 - Strong lower bounds for approximating distribution support size and the distinct elements problem
AU - Raskhodnikova, Sofya
AU - Ron, Dana
AU - Shpilka, Amir
AU - Smith, Adam
PY - 2007
Y1 - 2007
N2 - We consider the problem of approximating the support size of a distribution from, a small number of samples, when each element in the distribution appears with probability at least 1/n. This problem is closely related to the problem of approximating the number of distinct elements in a sequence of length n. For both problems, we prove a nearly linear in n lower bound on the query complexity, applicable even for approximation with additive error. At the heart of the lower bound is a construction of two positive integer random variables, X1 and X2, with very different expectations and the following condition on the first k moments: E[X1]/E[X2] =E[X12]/ E[X22] = ... = E[X 1k]/ E[X2k]. Our lower bound method is also applicable to other problems. In particular, it gives new lower bounds for the sample complexity of (1) approximating the entropy of a distribution and (2) approximating how well a given string is compressed by the Lempel-Ziv scheme.
AB - We consider the problem of approximating the support size of a distribution from, a small number of samples, when each element in the distribution appears with probability at least 1/n. This problem is closely related to the problem of approximating the number of distinct elements in a sequence of length n. For both problems, we prove a nearly linear in n lower bound on the query complexity, applicable even for approximation with additive error. At the heart of the lower bound is a construction of two positive integer random variables, X1 and X2, with very different expectations and the following condition on the first k moments: E[X1]/E[X2] =E[X12]/ E[X22] = ... = E[X 1k]/ E[X2k]. Our lower bound method is also applicable to other problems. In particular, it gives new lower bounds for the sample complexity of (1) approximating the entropy of a distribution and (2) approximating how well a given string is compressed by the Lempel-Ziv scheme.
UR - http://www.scopus.com/inward/record.url?scp=46749118848&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=46749118848&partnerID=8YFLogxK
U2 - 10.1109/FOCS.2007.4389525
DO - 10.1109/FOCS.2007.4389525
M3 - Conference contribution
AN - SCOPUS:46749118848
SN - 0769530109
SN - 9780769530109
T3 - Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS
SP - 559
EP - 569
BT - Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2007
T2 - 48th Annual Symposium on Foundations of Computer Science, FOCS 2007
Y2 - 20 October 2007 through 23 October 2007
ER -