I'm a PhD candidate in Computer Science at Princeton University, advised by Elad Hazan. My research is focused on reinforcement learning, and seeks to address the challenges that accompany partially observable and large (or continuous) state spaces. I'm also invested in issues concerning privacy and fairness in machine learning.

I did my bachelors at IIT Kanpur, where I worked on sketch-based algorithms for machine learning, and space lower bounds in the streaming model. I've also spent time at Microsoft Research in Redmond working on program synthesis.

We give a polynomial-time algorithm for learning latent-state linear dynamical systems without system identification, and without assumptions on the spectral radius of the system's transition matrix. The algorithm extends the recently introduced technique of spectral filtering, previously applied only to systems with a symmetric transition matrix, using a novel convex relaxation to allow for the efficient identification of phases.

We present an efficient and practical algorithm for the online prediction of discrete-time linear dynamical systems. Despite the non-convex optimization problem, our algorithm comes with provable guarantees: it has near-optimal regret bounds compared to the best LDS in hindsight, while overparameterizing the model by a small logarithmic factor. Our analysis brings together ideas from improper learning through convex relaxations, online regret minimization, and the spectral theory of Hankel matrices.

We design differentially private algorithms for the problem of online linear optimization in the full information and bandit settings with optimal $O(\sqrt{T})$
regret bounds. In the full-information setting, our results demonstrate that $\epsilon$-differential privacy may be ensured for free. In particular, the regret bounds scale as $O\left(\sqrt{T}+\frac{1}{\epsilon}\right)$. For bandit linear optimization, and as a special case, for non-stochastic multi-armed bandits, the proposed algorithm achieves a regret of $O\left(\frac{1}{\epsilon}\sqrt{T}\right)$ while the previous best regret bound was $O\left(\frac{1}{\epsilon}T^{\frac{2}{3}}\right)$.

We consider regret minimization in repeated games with non-convex loss functions. Minimizing the standard notion of regret is computationally intractable. Thus, we define a natural notion of regret which permits efficient optimization and generalizes offline guarantees for convergence to an approximate local optimum. We give gradient-based methods that achieve optimal regret, which in turn guarantee convergence to equilibrium in this framework.

Manuscripts The Case for Full-Matrix Adaptive Regularization
with Naman Agarwal, Brian Bullins, Xinyi Chen, Elad Hazan, Cyril Zhang, Yi Zhang Preprint, 2018 Towards Provable Control for Unknown Linear Dynamical Systems
with Sanjeev Arora, Elad Hazan, Holden Lee, Cyril Zhang, Yi Zhang
Prelim at ICLR Workshop, 2018 Dynamic Task Allocation for Crowdsourcing
with Irineo Cabreros, Angela Zhou
Prelim at ICML Workshop on Data Efficient Machine Learning, 2016