Lab Homepage for Constantinos Daskalakis

Studying computational theory and its interplay with Game Theory, Machine Learning, Statistics and Economics

News

[7/31/2018] Prof. Daskalakis is awarded the Nevalinna Prize!
[7/5/2018] Prof. Daskalakis is named a Simons Foundation Investigator in Theoretical CS
[5/16/2018] Andrew has won the MIT SuperUROP award for his work
[4/20/2018] Gautam will be joining the University of Waterloo’s Cheriton School of Computer Science in July 2019
[4/1/2018] Manolis has received the Google PhD Fellowship

Members

Selected Publications

A selected group of publications (in reverse chronological order) ranging across several subfields written by members of the Daskalakis group.

$\newcommand{\eps}{\varepsilon}$ $\newcommand{\poly}{\text{poly}}$
ML Black-box Adversarial Attacks with Limited Queries and Information (ICML 2018)
https//arxiv.org/abs/1804.08598.html
Andrew Ilyas, Logan Engstrom, Anish Athalye, Jessy Lin

Current neural network-based classifiers are susceptible to adversarial examples even in the black-box setting, where the attacker only has query access to the model. In practice, the threat model for real-world systems is often more restrictive than the typical black-box model where the adversary can observe the full output of the network on arbitrarily many chosen inputs. We define three realistic threat models that more accurately characterize many real-world classifiers: the query-limited setting, the partial-information setting, and the label-only setting. We develop new attacks that fool classifiers under these more restrictive threat models, where previous methods would be impractical or ineffective. We demonstrate that our methods are effective against an ImageNet classifier under our proposed threat models. We also demonstrate a targeted black-box attack against a commercial classifier, overcoming the challenges of limited query access, partial information, and other practical issues to break the Google Cloud Vision API.

GT ML Training GANs with Optimism (ICLR 2018)
https//arxiv.org/abs/1711.00141.html
Constantinos Daskalakis, Andrew Ilyas, Vasilis Syrgkanis, Haoyang Zeng

We address the issue of limit cycling behavior in training Generative Adversarial Networks and propose the use of Optimistic Mirror Decent (OMD) for training Wasserstein GANs. Recent theoretical results have shown that optimistic mirror decent (OMD) can enjoy faster regret rates in the context of zero-sum games. WGANs is exactly a context of solving a zero-sum game with simultaneous no-regret dynamics. Moreover, we show that optimistic mirror decent addresses the limit cycling problem in training WGANs. We formally show that in the case of bi-linear zero-sum games the last iterate of OMD dynamics converges to an equilibrium, in contrast to GD dynamics which are bound to cycle. We also portray the huge qualitative difference between GD and OMD dynamics with toy examples, even when GD is modified with many adaptations proposed in the recent literature, such as gradient penalty or momentum. We apply OMD WGAN training to a bioinformatics problem of generating DNA sequences. We observe that models trained with OMD achieve consistently smaller KL divergence with respect to the true underlying distribution, than models trained with GD variants. Finally, we introduce a new algorithm, Optimistic Adam, which is an optimistic variant of Adam. We apply it to WGAN training on CIFAR10 and observe improved performance in terms of inception score as compared to Adam.

ST Concentration of Multilinear Functions of the Ising Model with Applications to Network Data (STOC 2018)
https//arxiv.org/abs/1710.04170.html
Constantinos Daskalakis, Nishanth Dikkala, Gautam Kamath

We prove near-tight concentration of measure for polynomial functions of the Ising model under high temperature. For any degree $d$, we show that a degree-$d$ polynomial of a $n$-spin Ising model exhibits exponential tails that scale as $\exp(-r^{2/d})$ at radius $r=\tilde{\Omega}_d(n^{d/2})$. Our concentration radius is optimal up to logarithmic factors for constant $d$, improving known results by polynomial factors in the number of spins. We demonstrate the efficacy of polynomial functions as statistics for testing the strength of interactions in social networks in both synthetic and real world data.

ML TCS Certified Computation from Unreliable Datasets (COLT 2018)
https//arxiv.org/abs/1709.03926.html
Themis Gouleakis, Christos Tzamos, Manolis Zampetakis

A wide range of learning tasks require human input in labeling massive data. The collected data though are usually low quality and contain inaccuracies and errors. As a result, modern science and business face the problem of learning from unreliable data sets. In this work, we provide a generic approach that is based on \textit{verification} of only few records of the data set to guarantee high quality learning outcomes for various optimization objectives. Our method, identifies small sets of critical records and verifies their validity. We show that many problems only need $\text{poly}(1/\varepsilon)$ verifications, to ensure that the output of the computation is at most a factor of $(1 \pm \varepsilon)$ away from the truth. For any given instance, we provide an \textit{instance optimal} solution that verifies the minimum possible number of records to approximately certify correctness. Then using this instance optimal formulation of the problem we prove our main result: “every function that satisfies some Lipschitz continuity condition can be certified with a small number of verifications”. We show that the required Lipschitz continuity condition is satisfied even by some NP-complete problems, which illustrates the generality and importance of this theorem. In case this certification step fails, an invalid record will be identified. Removing these records and repeating until success, guarantees that the result will be accurate and will depend only on the verified records. Surprisingly, as we show, for several computation tasks more efficient methods are possible. These methods always guarantee that the produced result is not affected by the invalid records, since any invalid record that affects the output will be detected and verified.

ST TCS Which Distribution Distances are Sublinearly Testable? (SODA 2018)
https//arxiv.org/abs/1708.00002.html
Constantinos Daskalakis, Gautam Kamath, John Wright

Given samples from an unknown distribution $p$ and a description of a distribution $q$, are $p$ and $q$ close or far? This question of “identity testing” has received significant attention in the case of testing whether $p$ and $q$ are equal or far in total variation distance. However, in recent work, the following questions have been been critical to solving problems at the frontiers of distribution testing: -Alternative Distances: Can we test whether $p$ and $q$ are far in other distances, say Hellinger? -Tolerance: Can we test when $p$ and $q$ are close, rather than equal? And if so, close in which distances? Motivated by these questions, we characterize the complexity of distribution testing under a variety of distances, including total variation, $\ell_2$, Hellinger, Kullback-Leibler, and $\chi^2$. For each pair of distances $d_1$ and $d_2$, we study the complexity of testing if $p$ and $q$ are close in $d_1$ versus far in $d_2$, with a focus on identifying which problems allow strongly sublinear testers (i.e., those with complexity $O(n^{1 - \gamma})$ for some $\gamma > 0$ where $n$ is the size of the support of the distributions $p$ and $q$). We provide matching upper and lower bounds for each case. We also study these questions in the case where we only have samples from $q$ (equivalence testing), showing qualitative differences from identity testing in terms of when tolerance can be achieved. Our algorithms fall into the classical paradigm of $\chi^2$-statistics, but require crucial changes to handle the challenges introduced by each distance we consider. Finally, we survey other recent results in an attempt to serve as a reference for the complexity of various distribution testing problems.

ML Synthesizing Robust Adversarial Examples (ICML 2018)
https//arxiv.org/abs/1707.07397.html
Anish Athalye, Logan Engstrom, Andrew Ilyas, Kevin Kwok

Standard methods for generating adversarial examples for neural networks do not consistently fool neural network classifiers in the physical world due to a combination of viewpoint shifts, camera noise, and other natural transformations, limiting their relevance to real-world systems. We demonstrate the existence of robust 3D adversarial objects, and we present the first algorithm for synthesizing examples that are adversarial over a chosen distribution of transformations. We synthesize two-dimensional adversarial images that are robust to noise, distortion, and affine transformation. We apply our algorithm to complex three-dimensional objects, using 3D-printing to manufacture the first physical adversarial objects. Our results demonstrate the existence of 3D adversarial objects in the physical world.

ML TCS A Converse to Banach's Fixed Point Theorem and its CLS Completeness (STOC 2018)
https//arxiv.org/abs/1702.07339.html
Constantinos Daskalakis, Christos Tzamos, Manolis Zampetakis

Banach’s fixed point theorem for contraction maps has been widely used to analyze the convergence of iterative methods in non-convex problems. It is a common experience, however, that iterative maps fail to be globally contracting under the natural metric in their domain, making the applicability of Banach’s theorem limited. We explore how generally we can apply Banach’s fixed point theorem to establish the convergence of iterative methods when pairing it with carefully designed metrics. Our first result is a strong converse of Banach’s theorem, showing that it is a universal analysis tool for establishing global convergence of iterative methods to unique fixed points, and for bounding their convergence rate. In other words, we show that, whenever an iterative map globally converges to a unique fixed point, there exists a metric under which the iterative map is contracting and which can be used to bound the number of iterations until convergence. We illustrate our approach in the widely used power method, providing a new way of bounding its convergence rate through contraction arguments. We next consider the computational complexity of Banach’s fixed point theorem. Making the proof of our converse theorem constructive, we show that computing a fixed point whose existence is guaranteed by Banach’s fixed point theorem is CLS-complete. We thus provide the first natural complete problem for the class CLS, which was defined in [Daskalakis, Papadimitriou 2011] to capture the complexity of problems such as P-matrix LCP, computing KKT-points, and finding mixed Nash equilibria in congestion and network coordination games.

ST Testing Ising Models (SODA 2018)
https//arxiv.org/abs/1612.03147.html
Constantinos Daskalakis, Nishanth Dikkala, Gautam Kamath

Given samples from an unknown multivariate distribution $p$, is it possible to distinguish whether $p$ is the product of its marginals versus $p$ being far from every product distribution? Similarly, is it possible to distinguish whether $p$ equals a given distribution $q$ versus $p$ and $q$ being far from each other? These problems of testing independence and goodness-of-fit have received enormous attention in statistics, information theory, and theoretical computer science, with sample-optimal algorithms known in several interesting regimes of parameters. Unfortunately, it has also been understood that these problems become intractable in large dimensions, necessitating exponential sample complexity. Motivated by the exponential lower bounds for general distributions as well as the ubiquity of Markov Random Fields (MRFs) in the modeling of high-dimensional distributions, we initiate the study of distribution testing on structured multivariate distributions, and in particular the prototypical example of MRFs: the Ising Model. We demonstrate that, in this structured setting, we can avoid the curse of dimensionality, obtaining sample and time efficient testers for independence and goodness-of-fit. One of the key technical challenges we face along the way is bounding the variance of functions of the Ising model.

GT ML Learning Multi-item Auctions with (or without) Samples (FOCS 2017)
https//arxiv.org/abs/1709.00228.html
Yang Cai, Constantinos Daskalakis

We provide algorithms that learn simple auctions whose revenue is approximately optimal in multi-item multi-bidder settings, for a wide range of valuations including unit-demand, additive, constrained additive, XOS, and subadditive. We obtain our learning results in two settings. The first is the commonly studied setting where sample access to the bidders’ distributions over valuations is given, for both regular distributions and arbitrary distributions with bounded support. Our algorithms require polynomially many samples in the number of items and bidders. The second is a more general max-min learning setting that we introduce, where we are given “approximate distributions,” and we seek to compute an auction whose revenue is approximately optimal simultaneously for all “true distributions” that are close to the given ones. These results are more general in that they imply the sample-based results, and are also applicable in settings where we have no sample access to the underlying distributions but have estimated them indirectly via market research or by observation of previously run, potentially non-truthful auctions. Our results hold for valuation distributions satisfying the standard (and necessary) independence-across-items property. They also generalize and improve upon recent works, which have provided algorithms that learn approximately optimal auctions in more restricted settings with additive, subadditive and unit-demand valuations using sample access to distributions. We generalize these results to the complete unit-demand, additive, and XOS setting, to i.i.d. subadditive bidders, and to the max-min setting. Our results are enabled by new uniform convergence bounds for hypotheses classes under product measures. Our bounds result in exponential savings in sample complexity compared to bounds derived by bounding the VC dimension, and are of independent interest.

ST Priv'IT: Private and Sample Efficient Identity Testing (ICML 2017)
https//arxiv.org/abs/1703.10127.html
Bryan Cai, Constantinos Daskalakis, Gautam Kamath

We develop differentially private hypothesis testing methods for the small sample regime. Given a sample $\cal D$ from a categorical distribution $p$ over some domain $\Sigma$, an explicitly described distribution $q$ over $\Sigma$, some privacy parameter $\varepsilon$, accuracy parameter $\alpha$, and requirements $\beta_{\rm I}$ and $\beta_{\rm II}$ for the type I and type II errors of our test, the goal is to distinguish between $p=q$ and $d_{\rm{TV}}(p,q) \geq \alpha$. We provide theoretical bounds for the sample size $|{\cal D}|$ so that our method both satisfies $(\varepsilon,0)$-differential privacy, and guarantees $\beta_{\rm I}$ and $\beta_{\rm II}$ type I and type II errors. We show that differential privacy may come for free in some regimes of parameters, and we always beat the sample complexity resulting from running the $\chi^2$-test with noisy counts, or standard approaches such as repetition for endowing non-private $\chi^2$-style statistics with differential privacy guarantees. We experimentally compare the sample complexity of our method to that of recently proposed methods for private hypothesis testing.

ST TCS Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing (COLT 2017)
https//arxiv.org/abs/1612.03164.html
Constantinos Daskalakis, Qinxuan Pan

We show that the square Hellinger distance between two Bayesian networks on the same directed graph, $G$, is subadditive with respect to the neighborhoods of $G$. Namely, if $P$ and $Q$ are the probability distributions defined by two Bayesian networks on the same DAG, our inequality states that the square Hellinger distance, $H^2(P,Q)$, between $P$ and $Q$ is upper bounded by the sum, $\sum_v H^2(P_{{v} \cup \Pi_v}, Q_{{v} \cup \Pi_v})$, of the square Hellinger distances between the marginals of $P$ and $Q$ on every node $v$ and its parents $\Pi_v$ in the DAG. Importantly, our bound does not involve the conditionals but the marginals of $P$ and $Q$. We derive a similar inequality for more general Markov Random Fields. As an application of our inequality, we show that distinguishing whether two Bayesian networks $P$ and $Q$ on the same (but potentially unknown) DAG satisfy $P=Q$ vs $d_{\rm TV}(P,Q)>\epsilon$ can be performed from $\tilde{O}(|\Sigma|^{3/4(d+1)} \cdot n/\epsilon^2)$ samples, where $d$ is the maximum in-degree of the DAG and $\Sigma$ the domain of each variable of the Bayesian networks. If $P$ and $Q$ are defined on potentially different and potentially unknown trees, the sample complexity becomes $\tilde{O}(|\Sigma|^{4.5} n/\epsilon^2)$, whose dependence on $n, \epsilon$ is optimal up to logarithmic factors. Lastly, if $P$ and $Q$ are product distributions over ${0,1}^n$ and $Q$ is known, the sample complexity becomes $O(\sqrt{n}/\epsilon^2)$, which is optimal up to constant factors.

ML ST Ten Steps of EM Suffice for Mixtures of Two Gaussians (COLT 2017)
https//arxiv.org/abs/1609.00368.html
Constantinos Daskalakis, Christos Tzamos, Manolis Zampetakis

The Expectation-Maximization (EM) algorithm is a widely used method for maximum likelihood estimation in models with latent variables. For estimating mixtures of Gaussians, its iteration can be viewed as a soft version of the k-means clustering algorithm. Despite its wide use and applications, there are essentially no known convergence guarantees for this method. We provide global convergence guarantees for mixtures of two Gaussians with known covariance matrices. We show that the population version of EM, where the algorithm is given access to infinitely many samples from the mixture, converges geometrically to the correct mean vectors, and provide simple, closed-form expressions for the convergence rate. As a simple illustration, we show that, in one dimension, ten steps of the EM algorithm initialized at infinity result in less than 1\% error estimation of the means. In the finite sample regime, we show that, under a random initialization, $\tilde{O}(d/\epsilon^2)$ samples suffice to compute the unknown vectors to within $\epsilon$ in Mahalanobis distance, where $d$ is the dimension. In particular, the error rate of the EM based estimator is $\tilde{O}\left(\sqrt{d \over n}\right)$ where $n$ is the number of samples, which is optimal up to logarithmic factors.

TCS Faster Sublinear Algorithms using Conditional Sampling (SODA 2017)
https//arxiv.org/abs/1608.04759.html
Themistoklis Gouleakis, Christos Tzamos, Manolis Zampetakis

A conditional sampling oracle for a probability distribution D returns samples from the conditional distribution of D restricted to a specified subset of the domain. A recent line of work (Chakraborty et al. 2013 and Cannone et al. 2014) has shown that having access to such a conditional sampling oracle requires only polylogarithmic or even constant number of samples to solve distribution testing problems like identity and uniformity. This significantly improves over the standard sampling model where polynomially many samples are necessary. Inspired by these results, we introduce a computational model based on conditional sampling to develop sublinear algorithms with exponentially faster runtimes compared to standard sublinear algorithms. We focus on geometric optimization problems over points in high dimensional Euclidean space. Access to these points is provided via a conditional sampling oracle that takes as input a succinct representation of a subset of the domain and outputs a uniformly random point in that subset. We study two well studied problems: k-means clustering and estimating the weight of the minimum spanning tree. In contrast to prior algorithms for the classic model, our algorithms have time, space and sample complexity that is polynomial in the dimension and polylogarithmic in the number of points. Finally, we comment on the applicability of the model and compare with existing ones like streaming, parallel and distributed computational models.

PR A Size-Free CLT for Poisson Multinomials and its Applications (STOC 2016)
https//arxiv.org/abs/1511.03641.html
Constantinos Daskalakis, Anindya De, Gautam Kamath, Christos Tzamos

An $(n,k)$-Poisson Multinomial Distribution (PMD) is the distribution of the sum of $n$ independent random vectors supported on the set ${\cal B}_k={e_1,\ldots,e_k}$ of standard basis vectors in $\mathbb{R}^k$. We show that any $(n,k)$-PMD is ${\rm poly}\left({k\over \sigma}\right)$-close in total variation distance to the (appropriately discretized) multi-dimensional Gaussian with the same first two moments, removing the dependence on $n$ from the Central Limit Theorem of Valiant and Valiant. Interestingly, our CLT is obtained by bootstrapping the Valiant-Valiant CLT itself through the structural characterization of PMDs shown in recent work by Daskalakis, Kamath, and Tzamos. In turn, our stronger CLT can be leveraged to obtain an efficient PTAS for approximate Nash equilibria in anonymous games, significantly improving the state of the art, and matching qualitatively the running time dependence on $n$ and $1/\varepsilon$ of the best known algorithm for two-strategy anonymous games. Our new CLT also enables the construction of covers for the set of $(n,k)$-PMDs, which are proper and whose size is shown to be essentially optimal. Our cover construction combines our CLT with the Shapley-Folkman theorem and recent sparsification results for Laplacian matrices by Batson, Spielman, and Srivastava. Our cover size lower bound is based on an algebraic geometric construction. Finally, leveraging the structural properties of the Fourier spectrum of PMDs we show that these distributions can be learned from $O_k(1/\varepsilon^2)$ samples in ${\rm poly}_k(1/\varepsilon)$-time, removing the quasi-polynomial dependence of the running time on $1/\varepsilon$ from the algorithm of Daskalakis, Kamath, and Tzamos.

GT ML Learning in Auctions: Regret is Hard, Envy is Easy (FOCS 2016)
https//arxiv.org/abs/1511.01411.html
Constantinos Daskalakis, Vasilis Syrgkanis

A line of recent work provides welfare guarantees of simple combinatorial auction formats, such as selling m items via simultaneous second price auctions (SiSPAs) (Christodoulou et al. 2008, Bhawalkar and Roughgarden 2011, Feldman et al. 2013). These guarantees hold even when the auctions are repeatedly executed and players use no-regret learning algorithms. Unfortunately, off-the-shelf no-regret algorithms for these auctions are computationally inefficient as the number of actions is exponential. We show that this obstacle is insurmountable: there are no polynomial-time no-regret algorithms for SiSPAs, unless RP$\supseteq$ NP, even when the bidders are unit-demand. Our lower bound raises the question of how good outcomes polynomially-bounded bidders may discover in such auctions. To answer this question, we propose a novel concept of learning in auctions, termed “no-envy learning.” This notion is founded upon Walrasian equilibrium, and we show that it is both efficiently implementable and results in approximately optimal welfare, even when the bidders have fractionally subadditive (XOS) valuations (assuming demand oracles) or coverage valuations (without demand oracles). No-envy learning outcomes are a relaxation of no-regret outcomes, which maintain their approximate welfare optimality while endowing them with computational tractability. Our results extend to other auction formats that have been studied in the literature via the smoothness paradigm. Our results for XOS valuations are enabled by a novel Follow-The-Perturbed-Leader algorithm for settings where the number of experts is infinite, and the payoff function of the learner is non-linear. This algorithm has applications outside of auction settings, such as in security games. Our result for coverage valuations is based on a novel use of convex rounding schemes and a reduction to online convex optimization.

GT TCS Who to Trust for Truthfully Maximizing Welfare? (EC 2016)
https//arxiv.org/abs/1507.02301.html
Dimitris Fotakis, Christos Tzamos, Emmanouil Zampetakis

We introduce a general approach based on \emph{selective verification} and obtain approximate mechanisms without money for maximizing the social welfare in the general domain of utilitarian voting. Having a good allocation in mind, a mechanism with verification selects few critical agents and detects, using a verification oracle, whether they have reported truthfully. If yes, the mechanism produces the desired allocation. Otherwise, the mechanism ignores any misreports and proceeds with the remaining agents. We obtain randomized truthful (or almost truthful) mechanisms without money that verify only $O(\ln m / \epsilon)$ agents, where $m$ is the number of outcomes, independently of the total number of agents, and are $(1-\epsilon)$-approximate for the social welfare. We also show that any truthful mechanism with a constant approximation ratio needs to verify $\Omega(\log m)$ agents. A remarkable property of our mechanisms is \emph{robustness}, namely that their outcome depends only on the reports of the truthful agents.

ST TCS Optimal Testing for Properties of Distributions (NIPS 2015)
https//arxiv.org/abs/1507.05952.html
Jayadev Acharya, Constantinos Daskalakis, Gautam Kamath

Given samples from an unknown distribution $p$, is it possible to distinguish whether $p$ belongs to some class of distributions $\mathcal{C}$ versus $p$ being far from every distribution in $\mathcal{C}$? This fundamental question has received tremendous attention in statistics, focusing primarily on asymptotic analysis, and more recently in information theory and theoretical computer science, where the emphasis has been on small sample size and computational complexity. Nevertheless, even for basic properties of distributions such as monotonicity, log-concavity, unimodality, independence, and monotone-hazard rate, the optimal sample complexity is unknown. We provide a general approach via which we obtain sample-optimal and computationally efficient testers for all these distribution families. At the core of our approach is an algorithm which solves the following problem: Given samples from an unknown distribution $p$, and a known distribution $q$, are $p$ and $q$ close in $\chi^2$-distance, or far in total variation distance? The optimality of our testers is established by providing matching lower bounds with respect to both $n$ and $\varepsilon$. Finally, a necessary building block for our testers and an important byproduct of our work are the first known computationally efficient proper learners for discrete log-concave and monotone hazard rate distributions.

GT Strong Duality for a Multiple-Good Monopolist (EC 2015)
https//arxiv.org/abs/1409.4150.html
Constantinos Daskalakis, Alan Deckelbaum, Christos Tzamos

We characterize optimal mechanisms for the multiple-good monopoly problem and provide a framework to find them. We show that a mechanism is optimal if and only if a measure $\mu$ derived from the buyer’s type distribution satisfies certain stochastic dominance conditions. This measure expresses the marginal change in the seller’s revenue under marginal changes in the rent paid to subsets of buyer types. As a corollary, we characterize the optimality of grand-bundling mechanisms, strengthening several results in the literature, where only sufficient optimality conditions have been derived. As an application, we show that the optimal mechanism for $n$ independent uniform items each supported on $[c,c+1]$ is a grand-bundling mechanism, as long as $c$ is sufficiently large, extending Pavlov’s result for $2$ items [Pavlov’11]. At the same time, our characterization also implies that, for all $c$ and for all sufficiently large $n$, the optimal mechanism for $n$ independent uniform items supported on $[c,c+1]$ is not a grand bundling mechanism.

ML ST Optimum Statistical Estimation with Strategic Data Sources (COLT 2015)
https//arxiv.org/abs/1408.2539.html
Yang Cai, Constantinos Daskalakis, Christos H. Papadimitriou

We propose an optimum mechanism for providing monetary incentives to the data sources of a statistical estimator such as linear regression, so that high quality data is provided at low cost, in the sense that the sum of payments and estimation error is minimized. The mechanism applies to a broad range of estimators, including linear and polynomial regression, kernel regression, and, under some additional assumptions, ridge regression. It also generalizes to several objectives, including minimizing estimation error subject to budget constraints. Besides our concrete results for regression problems, we contribute a mechanism design framework through which to design and analyze statistical estimators whose examples are supplied by workers with cost for labeling said examples.

GT A Counter-Example to Karlin's Strong Conjecture for Fictitious Play (FOCS 2014)
https//arxiv.org/abs/1412.4840.html
Constantinos Daskalakis, Qinxuan Pan

Fictitious play is a natural dynamic for equilibrium play in zero-sum games, proposed by [Brown 1949], and shown to converge by [Robinson 1951]. Samuel Karlin conjectured in 1959 that fictitious play converges at rate $O(1/\sqrt{t})$ with the number of steps $t$. We disprove this conjecture showing that, when the payoff matrix of the row player is the $n \times n$ identity matrix, fictitious play may converge with rate as slow as $\Omega(t^{-1/n})$.

GT Understanding Incentives: Mechanism Design becomes Algorithm Design (FOCS 2013)
https//arxiv.org/abs/1305.4002.html
Yang Cai, Constantinos Daskalakis, S. Matthew Weinberg

We provide a computationally efficient black-box reduction from mechanism design to algorithm design in very general settings. Specifically, we give an approximation-preserving reduction from truthfully maximizing \emph{any} objective under \emph{arbitrary} feasibility constraints with \emph{arbitrary} bidder types to (not necessarily truthfully) maximizing the same objective plus virtual welfare (under the same feasibility constraints). Our reduction is based on a fundamentally new approach: we describe a mechanism’s behavior indirectly only in terms of the expected value it awards bidders for certain behavior, and never directly access the allocation rule at all. Applying our new approach to revenue, we exhibit settings where our reduction holds \emph{both ways}. That is, we also provide an approximation-sensitive reduction from (non-truthfully) maximizing virtual welfare to (truthfully) maximizing revenue, and therefore the two problems are computationally equivalent. With this equivalence in hand, we show that both problems are NP-hard to approximate within any polynomial factor, even for a single monotone submodular bidder. We further demonstrate the applicability of our reduction by providing a truthful mechanism maximizing fractional max-min fairness. This is the first instance of a truthful mechanism that optimizes a non-linear objective.

PR Learning Poisson Binomial Distributions (STOC 2012)
https//arxiv.org/abs/1107.2702.html
Constantinos Daskalakis, Ilias Diakonikolas, Rocco A. Servedio

We consider a basic problem in unsupervised learning: learning an unknown \emph{Poisson Binomial Distribution}. A Poisson Binomial Distribution (PBD) over ${0,1,\dots,n}$ is the distribution of a sum of $n$ independent Bernoulli random variables which may have arbitrary, potentially non-equal, expectations. These distributions were first studied by S. Poisson in 1837 \cite{Poisson:37} and are a natural $n$-parameter generalization of the familiar Binomial Distribution. Surprisingly, prior to our work this basic learning problem was poorly understood, and known results for it were far from optimal. We essentially settle the complexity of the learning problem for this basic class of distributions. As our first main result we give a highly efficient algorithm which learns to $\eps$-accuracy (with respect to the total variation distance) using $\tilde{O}(1/\eps^3)$ samples \emph{independent of $n$}. The running time of the algorithm is \emph{quasilinear} in the size of its input data, i.e., $\tilde{O}(\log(n)/\eps^3)$ bit-operations. (Observe that each draw from the distribution is a $\log(n)$-bit string.) Our second main result is a {\em proper} learning algorithm that learns to $\eps$-accuracy using $\tilde{O}(1/\eps^2)$ samples, and runs in time $(1/\eps)^{\poly (\log (1/\eps))} \cdot \log n$. This is nearly optimal, since any algorithm {for this problem} must use $\Omega(1/\eps^2)$ samples. We also give positive and negative results for some extensions of this learning problem to weighted sums of independent Bernoulli random variables.

Daskalakis Group Homepage ()
/

## News [7/31/2018] Prof. Daskalakis is awarded the Nevalinna Prize!
[7/5/2018] Prof. Daskalakis is named a Simons Foundation Investigator in Theoretical CS
[5/16/2018] [Andrew](http://andrewilyas.com) has won the MIT SuperUROP award for his work
[4/20/2018] [Gautam](http://www.gautamkamath.com) will be joining the University of Waterloo's Cheriton School of Computer Science in July 2019
[4/1/2018] [Manolis](http://www.mit.edu/~mzampet) has received the Google PhD Fellowship ## Members - Yuval Dagan - [Nishanth Dikkala](http://people.csail.mit.edu/nishanthd/) - [Andrew Ilyas](http://andrewilyas.com) - Siddarth Jayanti - Sujit Rao - [Manolis Zampetakis](http://www.mit.edu/~mzampet/) - **PhD Alumni**: [Gautam Kamath](http://www.gautamkamath.com) (Simons Institute (postdoc) -> UWaterloo CS Asst. Prof), [Matt Weinberg](https://www.cs.princeton.edu/~smattw/) (Princeton CS Asst. Prof), [Yang Cai](https://www.cs.mcgill.ca/~cai/) (McGill CS Asst. Prof), Alan Deckelbaum (Renaissance Technologies), [Christos Tzamos](http://christos.me) (MSR (postdoc) -> UW-Madison Asst. Prof.) - **Postdoctoral Alumni**: [Ioannis Panageas](https://sites.google.com/site/panageasj/home) (SUTD CS Asst. Prof), [Nick Gravin](https://logic.pdmi.ras.ru/~gravin/) (Shanghai University of Finance and Economics Assc. Prof), [Nima Haghpanah](http://personal.psu.edu/nuh47) (Penn State Economics Asst. Prof) {% include papers.html %} MIT Accessibility

MIT Accessibility