# Conference on Learning Theory — June 25–June 27, 2012 — Edinburgh, Scotland

## Machine learning and AI via large scale brain simulations

### Andrew Ng

Stanford University

http://ai.stanford.edu/~ang/

### Abstract

By building large-scale simulations of cortical (brain) computations, can we enable revolutionary progress in AI and machine learning?

Machine learning often works very well, but can be a lot of work to apply because it requires spending a long time engineering the input representation (or "features") for each specific problem. This is true for machine learning applications in vision, audio, text/NLP and other problems.

To address this, researchers have recently developed "unsupervised feature learning" and "deep learning" algorithms that can automatically learn feature representations from unlabeled data, thus bypassing much of this time-consuming engineering. Many of these algorithms are developed using simple simulations of cortical (brain) computations, and build on such ideas as sparse coding and deep belief networks. By doing so, they exploit large amounts of unlabeled data (which is cheap and easy to obtain) to learn a good feature representation. These methods have also surpassed the previous state-of-the-art on a number of problems in vision, audio, and text. In this talk, I describe some of the key ideas behind unsupervised feature learning and deep learning, and present a few algorithms. I also describe some of the open theoretical problems that pertain to unsupervised feature learning and deep learning, and speculate on how large-scale brain simulations may enable us to make significant progress in machine learning and AI, especially computer perception.

Andrew Ng received his PhD from Berkeley, and is now an Associate Professor of Computer Science at Stanford University, where he works on machine learning and AI. He is also Director of the Stanford AI Lab, which is home to about 15 professors and 150 PhD students and post docs. His previous work includes autonomous helicopters, the STanford AI Robot (STAIR) project, and ROS (probably the most widely used open-source robotics software platform today). He current work focuses on neuroscience-informed deep learning and unsupervised feature learning algorithms. His group has won best paper/best student paper awards at ICML, ACL, CEAS, 3DRR. He is a recipient of the Alfred P. Sloan Fellowship, and the 2009 IJCAI Computers and Thought award. He also works on free online education, and recently taught a machine learning class ([ ml-class.org ]) to over 100,000 students. He is also a co-founder of Coursera, which is working with top universities to offer online courses to anyone in the world, for free.

## Mirror Descent Algorithms for Large-Scale Convex Optimization

(Invited Tutorial Lecture)

Georgia Institute of Technology

http://www.isye.gatech.edu/~nemirovs

### Abstract

Mirror Descent is a technique for solving nonsmooth problems with convex structure, primarily, convex minimization and convex-concave saddle point problems. Mirror Descent utilizes first order information on the problem and is a far-reaching extension of the classical Subgradient Descent algorithm (N. Shor, 1967). This technique allows to adjust, to some extent, the algorithms to the geometry of the problem at hand and under favorable circumstances results in nearly dimension-independent and unimprovable in the large scale case convergence rates. As a result, in some important cases (e.g., when solving large-scale deterministic and stochastic convex problems on the domains like Euclidean/$\ell_1$/nuclear norm balls), Mirror Descent algorithms become the methods of choice when low and medium accuracy solutions are sought.

In the tutorial, we outline the basic Mirror Descent theory for deterministic and stochastic convex minimization and convex-concave saddle point problems, including recent developments aimed at accelerating MD algorithms by utilizing problem's structure.

COLT Mirror Descent Tutorial

Dr. Arkadi Nemirovski gor his Ph.D. (Math) in 1974 from Moscow State University. He is a professor in ISyE and holds the John Hunter Chair. Dr. Nemirovski has made fundamental contributions in continuous optimization in the last thirty years that have significantly shaped the field. In recognition of his contributions to convex optimization, Nemirovski was awarded the 1982 Fulkerson Prize from the Mathematical Programming Society and the American Mathematical Society (joint with L. Khachiyan and D. Yudin), the Dantzig Prize from the Mathematical Programming Society and the Society for Industrial and Applied Mathematics in 1991 (joint with M. Grotschel). In recognition of his seminal and profound contributions to continuous optimization, Nemirovski was awarded the 2003 John von Neumann Theory Prize by the Institute for Operations Research and the Management Sciences (along with Michael Todd). He continues to make significant contributions in almost all aspects of continuous optimization: complexity, numerical methods, stochastic optimization, and non-parametric statistics.

## Phase Transitions, Algorithmic Barriers, and Data Clustering

### Dimitris Achlioptas

CTI & UC Santa Cruz

http://users.soe.ucsc.edu/~optas/

### Abstract

Constraint Satisfaction Problems (CSPs) are the common abstraction of real-life problems ranging from air-traffic control to protein folding. Their ubiquity presses forward a fundamental question: why are certain CSP instances exceptionally hard while other, seemingly similar, instances are quite easy? To study this phenomenon we consider probability distributions over CSP instances formed by adding constraints one-by-one, uniformly and independently. We will see that for many random CSPs there exists a broad regime in which exponentially many solutions provably exist, yet all known efficient algorithms fail to find even one solution. To understand the origin of this algorithmic barrier we examine how the solution-space geometry evolves as constraints are added. We prove in a precise mathematical sense that the barrier faced by algorithms corresponds to a phase transition in solution-space geometry: at some point, the set of solutions shatters and goes from resembling a single giant ball to exponentially many clusters of well-separated solutions. More speculatively, we will also discuss how this shattering phenomenon can perhaps be used to reframe the clustering problem in machine learning using ideas from statistical physics.

Dimitris Achlioptas joined the Department of Computer Science of UC Santa Cruz in 2005 after being with Microsoft Research, Redmond from 1998. In theory, his expertise lies in the interaction between randomness and computation and his work on that topic has appeared in journals including Nature, Science, and the Annals of Mathematics. For that work he has received an NSF CAREER award, a Sloan Fellowship, and an IDEAS Starting Grant from the European Research Council. In practice, he likes to think about scalability questions and holds 18 US Patents on topics ranging from load balancing and cache optimization to web search personalization. In his free time he enjoys overworking.