## Tenth Anniversary Symposium

## September 27, 2013

## Schedule

- 8:30 - Coffee and Bagels
- 8:55-10:30 Session 1 (45-minute talks)
  - Welcome: David McAllester
  - Michael Jordan
  - Michael Collins
- 10:30-11:00 Break - Coffee etc.
- 11:00-12:45 Session 2 (30-minute talks)
  - John Lafferty
  - Shai Shalev-Shwartz
  - Adam Kalai
  - Graduation + a few words from Sadaoki Furui
- 12:45-1:30 Lunch
- 1:30-3:00 Session 3 (30-minute talks)
- 3:00-4:00 Posters + Coffee
- 4:00-6:00 Session 4 (30-minute talks)

## Speakers

### Julia Chuzhoy

**Bio:** Julia Chuzhoy is an Associate Professor at TTIC. She received her Ph.D. in Computer Science from the Technion, Israel, and spent several years as a postdoc at MIT, the University of Pennsylvania, and the Institute for Advanced Study, before joining TTIC in 2007. Chuzhoy's research area is theoretical computer science, with the main emphasis on the design and analysis of algorithms, approximation algorithms for NP-hard problems, and hardness of approximation proofs.

**Home Page:** http://ttic.uchicago.edu/~cjulia/

**Title:** Grid Minors, Routing Problems, and Connections between them

**Abstract:** One of the key results in Robertson and Seymour's seminal work on graph minors is the Grid-Minor Theorem. The theorem states that every graph of large treewidth contains a large grid minor. For a graph of treewidth k, the best current bound on the size of the grid minor is \Omega(\sqrt{\log k/\log\log k}).

In a typical routing problem, we are given a graph G and a set {(s_1,t_1), ..., (s_k,t_k)} of demand pairs. The goal is to route as many of the demand pairs as possible, by connecting each such pair with a path, so that the load on each edge of G is small. When the number of demand pairs k is bounded by a constant, the problem can be solved efficiently by using the Grid-Minor Theorem, along with other tools from Graph Minor Theory. However, when k is not bounded, the problem is NP-hard.

In this talk we present new approximation algorithms for routing problems that achieve a polylogarithmic approximation when the allowed load on every edge is 2. These algorithms lead to new structural results about graphs, which, in turn, allow us to improve the best current bounds for the Grid-Minor Theorem: if k denotes the graph treewidth and f(k) the size of the largest grid minor, then we show that f(k) is at least polynomial in k.

Based on joint work with Shi Li and Chandra Chekuri.

### Michael Collins

**Bio:** My research interests are in natural language processing, and machine learning. I completed a PhD in computer science from the University of Pennsylvania in December 1998. From January 1999 to November 2002 I was a researcher at AT&T Labs-Research, and from January 2003 until December 2010 I was an assistant/associate professor at MIT. I joined Columbia University in January 2011.

**Home Page:** http://www.cs.columbia.edu/~mcollins/

**Title:** Spectral Learning of Latent-Variable PCFGs

**Abstract:** Statistical models with hidden or latent variables are of great importance in natural language processing, speech, and many other fields. The EM algorithm is a remarkably successful method for parameter estimation within these models: it is simple, it is often relatively efficient, and it has well understood formal properties. It does, however, have a major limitation: it has no guarantee of finding the global optimum of the likelihood function. From a theoretical perspective, this means that the EM algorithm is not guaranteed to give consistent parameter estimates. From a practical perspective, problems with local optima can be difficult to deal with.
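To make the local-optima issue concrete, here is a small illustrative sketch (not from the talk): EM for a two-component 1-D Gaussian mixture with fixed variances and mixing weight, estimating only the means. A reasonable initialization recovers both means, while a perfectly symmetric initialization sits at a non-global stationary point that EM never leaves.

```python
import numpy as np

def em_gmm_1d(x, mu_init, n_iters=100, sigma=1.0, pi=0.5):
    """EM for a two-component 1-D Gaussian mixture, estimating only the
    means (variances and mixing weight held fixed for simplicity)."""
    mu = np.array(mu_init, dtype=float)
    for _ in range(n_iters):
        # E-step: posterior responsibility of component 0 for each point
        d0 = np.exp(-0.5 * ((x - mu[0]) / sigma) ** 2)
        d1 = np.exp(-0.5 * ((x - mu[1]) / sigma) ** 2)
        r0 = pi * d0 / (pi * d0 + (1 - pi) * d1)
        # M-step: responsibility-weighted means
        mu[0] = np.sum(r0 * x) / np.sum(r0)
        mu[1] = np.sum((1 - r0) * x) / np.sum(1 - r0)
    return mu

rng = np.random.default_rng(0)
# data drawn from an equal mixture of N(-3, 1) and N(3, 1)
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
good = em_gmm_1d(x, [-1.0, 1.0])  # converges near the true means (-3, 3)
stuck = em_gmm_1d(x, [0.0, 0.0])  # symmetric start: both means collapse to the sample mean and stay there
```

The second run illustrates the point in the abstract: EM monotonically improves the likelihood, but it can converge to a stationary point far from the global optimum.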

In this talk I'll describe a spectral algorithm for learning latent-variable probabilistic context-free grammars (L-PCFGs). L-PCFGs have been shown to be effective models for natural language parsing. Under a separation (singular value) condition, our algorithm provides consistent parameter estimates; this is in contrast with previous work, which has used the EM algorithm for parameter estimation, with the usual problems of local optima. Experiments show that the algorithm gives the same level of accuracy as EM, but is an order of magnitude more efficient. The method involves a generalization of previous work on spectral learning of hidden Markov models (Hsu et al., 2009).

This is joint work with Shay Cohen, Karl Stratos, Dean Foster, and Lyle Ungar.

### Michael Jordan

**Bio:** Michael I. Jordan is the Pehong Chen Distinguished Professor in the Department of Electrical Engineering and Computer Science and the Department of Statistics at the University of California, Berkeley.

His research in recent years has focused on Bayesian nonparametric analysis, probabilistic graphical models, spectral methods, kernel machines and applications to problems in signal processing, statistical genetics, computational biology, information retrieval and natural language processing. Prof. Jordan was elected a member of the National Academy of Sciences (NAS) in 2010, of the National Academy of Engineering (NAE) in 2010, and of the American Academy of Arts and Sciences in 2011. He is a Fellow of the American Association for the Advancement of Science (AAAS). He has been named a Neyman Lecturer and a Medallion Lecturer by the Institute of Mathematical Statistics (IMS). He is a Fellow of the IMS, a Fellow of the IEEE, a Fellow of the AAAI, and a Fellow of the ASA.

**Home Page:** http://www.cs.berkeley.edu/~jordan/

**Title:** On the Computational and Statistical Interface and "Big Data"

**Abstract:** The rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis that blend the statistical and computational sciences. That classical perspectives from these fields are not adequate to address emerging problems in "Big Data" is apparent from their sharply divergent nature at an elementary level---in computer science, the growth of the number of data points is a source of "complexity" that must be tamed via algorithms or hardware, whereas in statistics, the growth of the number of data points is a source of "simplicity" in that inferences are generally stronger and asymptotic results can be invoked. I present three research vignettes that pursue this theme, the first involving the deployment of resampling methods such as the bootstrap on parallel and distributed computing platforms, the second involving large-scale matrix completion, and the third introducing a methodology of "algorithmic weakening," whereby hierarchies of convex relaxations are used to control statistical risk as data accrue.

[Joint work with Venkat Chandrasekaran, Ariel Kleiner, Lester Mackey, Purna Sarkar, and Ameet Talwalkar].
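A minimal single-machine sketch in the spirit of the first vignette, resampling on distributed platforms via the Bag of Little Bootstraps (the function name and parameter defaults here are illustrative): each small subset generates bootstrap replicates of the full size n via multinomial counts, so replicates retain the statistical strength of n points while each subset is small enough to live on one worker.

```python
import numpy as np

def blb_stderr(data, estimator, n_subsets=5, gamma=0.6, n_boot=50, seed=0):
    """Bag-of-Little-Bootstraps-style standard-error estimate: build
    full-size bootstrap replicates from small subsets via multinomial counts."""
    rng = np.random.default_rng(seed)
    n = len(data)
    b = int(n ** gamma)  # subset size b = n^gamma, gamma in (0.5, 1)
    subset_ses = []
    for _ in range(n_subsets):  # in a distributed run: one subset per worker
        subset = rng.choice(data, size=b, replace=False)
        stats = []
        for _ in range(n_boot):
            # how many times each of the b points appears in a
            # bootstrap replicate of the *full* size n
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            stats.append(estimator(np.repeat(subset, counts)))
        subset_ses.append(np.std(stats))
    return float(np.mean(subset_ses))  # average the per-subset estimates

data = np.random.default_rng(1).normal(0.0, 1.0, 10_000)
se = blb_stderr(data, np.mean)  # true standard error of the mean is 0.01
```

Because each inner loop only ever touches b = n^gamma distinct points, the expensive part parallelizes naturally across subsets.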

### Sham Kakade

**Bio:** I am a senior research scientist at Microsoft Research, New England, a relatively new lab in Cambridge, MA. Previously, I was an associate professor at the Department of Statistics, Wharton, University of Pennsylvania (from 2010-2012), and I was an assistant professor at the Toyota Technological Institute at Chicago. Before this, I did a postdoc in the Computer and Information Science department at the University of Pennsylvania under the supervision of Michael Kearns. I completed my PhD at the Gatsby Unit where my advisor was Peter Dayan. Before Gatsby, I was an undergraduate at Caltech where I did my BS in physics.

**Home Page:** http://research.microsoft.com/en-us/um/people/skakade/

**Title:** Tensor Decompositions for Learning Latent Variable Models

**Abstract:** In many applications, we face the challenge of modeling the interactions between multiple observations. A popular and successful approach in machine learning and AI is to hypothesize the existence of certain latent (or hidden) causes which help to explain the correlations in the observed data. The (unsupervised) learning problem is to accurately estimate a model with only samples of the observed variables. For example, in document modeling, we may wish to characterize the correlational structure of the "bag of words" in documents. Here, a standard model is to posit that documents are about a few topics (the hidden variables) and that each active topic determines the occurrence of words in the document. The learning problem is, using only the observed words in the documents (and not the hidden topics), to estimate the topic probability vectors (i.e. discover the strength by which words tend to appear under different topics). In practice, a broad class of latent variable models is most often fit with either local search (such as the EM algorithm) or sampling-based approaches.

This talk will discuss a general and (computationally and statistically) efficient parameter estimation method for a wide class of latent variable models---including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation---by exploiting a certain tensor structure in their low-order observable moments. Specifically, parameter estimation is reduced to the problem of extracting a certain decomposition of a tensor derived from the (typically second- and third-order) moments; this particular tensor decomposition can be viewed as a natural generalization of the singular value decomposition for matrices.

Furthermore, we will also discuss some of the difficulties in the analysis of the EM algorithm and some related conjectures.
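To make the tensor decomposition concrete, here is an illustrative sketch (not the talk's full algorithm) of the tensor power method, the third-order analogue of matrix power iteration: for a symmetric tensor with an orthogonal decomposition, repeated contraction v <- T(I, v, v) followed by normalization converges to a component, and random restarts pick out the dominant one.

```python
import numpy as np

def tensor_power_iteration(T, n_restarts=10, n_iters=100, seed=0):
    """Recover the top component of a symmetric 3-tensor via the power
    method: v <- T(I, v, v) / ||T(I, v, v)||, with random restarts."""
    rng = np.random.default_rng(seed)
    d = T.shape[0]
    best_lam, best_v = -np.inf, None
    for _ in range(n_restarts):  # restarts guard against bad initializations
        v = rng.normal(size=d)
        v /= np.linalg.norm(v)
        for _ in range(n_iters):
            v = np.einsum('ijk,j,k->i', T, v, v)  # the contraction T(I, v, v)
            v /= np.linalg.norm(v)
        lam = np.einsum('ijk,i,j,k->', T, v, v, v)  # "eigenvalue" T(v, v, v)
        if lam > best_lam:
            best_lam, best_v = lam, v
    return best_lam, best_v

# build a tensor with a known orthogonal decomposition, then recover the top term
e1, e2 = np.eye(3)[0], np.eye(3)[1]
T = 5.0 * np.einsum('i,j,k->ijk', e1, e1, e1) + 2.0 * np.einsum('i,j,k->ijk', e2, e2, e2)
lam, v = tensor_power_iteration(T)
```

In the moment-based algorithms the talk describes, T is not given exactly but is estimated from (whitened) second- and third-order data moments, and further components are extracted by deflation.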

### Adam Kalai

**Bio:** Adam Tauman Kalai received his BA (1996) from Harvard, and MA (1998) and PhD (2001) under the supervision of Avrim Blum from CMU. After an NSF postdoctoral fellowship at M.I.T. with Santosh Vempala, he served as an assistant professor at the Toyota Technological Institute at Chicago and then at Georgia Tech. He is now a Senior Researcher at Microsoft Research New England. His honors include an NSF CAREER award and an Alfred P. Sloan fellowship. His research focuses on human computation, machine learning, and algorithms.

**Home Page:** http://research.microsoft.com/en-us/um/people/adum/

**Title:** Learning to Program by Example

**Abstract:** An old dream is that people will teach computers to do certain types of repetitive tasks by giving examples rather than writing programs. We'll discuss the difficulties in programming by example, as well as overview recent approaches in machine learning and human-computer interaction that help tackle this problem in the domain of text processing by example.

Joint work with Sumit Gulwani, Butler Lampson, Aditya Menon, Rob Miller, Omer Tamuz, Shubham Tulsiani, and Kuat Yessenov

### John Lafferty

**Bio:** I am a professor at the University of Chicago, with a joint appointment in Statistics (primary) and Computer Science, and a faculty member of the College of the University of Chicago. My main research area is statistical machine learning, with a focus on computational and statistical aspects of nonparametric methods, high-dimensional data, graphical models, and applications. Prior to joining the University of Chicago in 2011, I was Professor of Computer Science, Machine Learning, and Statistics at Carnegie Mellon University, where I am currently an Adjunct Professor.

**Home Page:** http://galton.uchicago.edu/~lafferty/

**Title:** Information, Optimization, and Estimation

**Abstract:** We present recent research that revisits classical nonparametric estimation theory from a computational perspective, and classical optimization theory from a statistical perspective. In particular, we formulate an extension to Pinsker's theorem in the setting of rate distortion theory, and study stochastic convex optimization in terms of local minimax complexity, establishing information theoretic lower bounds in both settings. Joint work with Yuancheng Zhu of the University of Chicago Department of Statistics.

### John Langford

**Bio:** John Langford is the author of the weblog hunch.net and the principal developer of Vowpal Wabbit. John works at Microsoft Research New York, of which he was one of the founding members, and was previously affiliated with Yahoo! Research, Toyota Technological Institute, and IBM's Watson Research Center. He studied Physics and Computer Science at the California Institute of Technology, earning a double bachelor's degree in 1997, and received his Ph.D. in Computer Science from Carnegie Mellon University in 2002. He was the program co-chair for the 2012 International Conference on Machine Learning.

**Home Page:** http://www.hunch.net/

**Title:** Extreme Multiclass

**Abstract:** Standard multiclass classification techniques require O(k) computation during training and testing, where k is the number of classes, yet the information-theoretic lower bound is O(log k). This gap matters little when k is on the order of 10 or 100, but at 10^4 or 10^6 it matters a great deal. I will discuss the theory of extreme multiclass classification, including consistency, robustness, efficiency, and structure learning.
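A toy sketch of the logarithmic-time idea (this is not the talk's actual method, which also has to learn the tree structure): place the k classes at the leaves of a balanced binary tree and predict by descending it, so test time touches O(log k) routers rather than all k class scores. The routers below are simple nearest-centroid rules over a fixed class ordering, chosen only so the example is self-contained.

```python
import numpy as np

class ClassTree:
    """Logarithmic-time multiclass prediction: a balanced binary tree over
    the k classes with a nearest-centroid router at each internal node."""
    def __init__(self, X, y, classes=None):
        if classes is None:
            classes = sorted(set(y))
        if len(classes) == 1:
            self.label, self.children = classes[0], None  # leaf: one class
            return
        mid = len(classes) // 2
        left, right = classes[:mid], classes[mid:]
        mask = np.isin(y, left)
        # router parameters: mean of the training points sent to each side
        self.left_centroid = X[mask].mean(axis=0)
        self.right_centroid = X[~mask].mean(axis=0)
        self.children = (ClassTree(X[mask], y[mask], left),
                         ClassTree(X[~mask], y[~mask], right))

    def predict(self, x):
        node = self
        while node.children is not None:  # O(log k) routing decisions
            go_left = (np.linalg.norm(x - node.left_centroid)
                       <= np.linalg.norm(x - node.right_centroid))
            node = node.children[0 if go_left else 1]
        return node.label

# demo: 8 well-separated Gaussian classes in the plane
rng = np.random.default_rng(0)
k = 8
means = np.array([[float(i), 0.0] for i in range(k)])
X = np.vstack([rng.normal(m, 0.1, size=(30, 2)) for m in means])
y = np.repeat(np.arange(k), 30)
tree = ClassTree(X, y)
```

Each prediction makes log2(k) = 3 routing decisions here instead of scoring all 8 classes; the talk's concern is making such schemes consistent and robust when the routers are learned.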

### Seiichi Mita

**Bio:** Dr. Mita received the B.S., M.S., and Ph.D. degrees in electrical engineering from Kyoto University in 1969, 1971, and 1989, respectively. From 1971 to 1991 he worked on signal processing and coding methods at Hitachi Central Research Laboratory, Kokubunji, Japan. In 1991 he moved to the Data Storage & Retrieval Systems Division of Hitachi, Ltd., where he developed channel coding methods and their LSI chips for magnetic disk drives. He has been a professor at the Toyota Technological Institute (TTI) in Nagoya since 1999 and is the director of the Research Center for Smart Vehicles at TTI. He is currently interested in autonomous vehicles and sensing systems. He is a member of the Institute of Electronics, Information and Communication Engineers and the Institute of Image Information and Television Engineers in Japan, and a member of the IEEE. He received the best paper award of the IEEE Consumer Electronics Society in 1986, and the best paper and author awards of the Institute of Television Engineers in Japan in 1987 and 1992, respectively.

**Home Page:** http://www.toyota-ti.ac.jp/english/research/staff/prof/post-61.html

**Title:** Collaborative Work between TTI and TTIC during the Past 10 Years

**Abstract:** I would like to briefly introduce our institution and our educational collaborations of the past 10 years, including joint TTI seminars, machine learning lectures by TTIC faculty, and visiting studentships. Next, I would like to describe some research achievements from our collaborative work. The Research Center for Smart Vehicles was established in April 2010, building on these research results and on expectations for the future development of our collaboration. I will introduce the center's latest research activities in autonomous vehicle development, as well as other research collaborations.

### Sasha Razborov

**Bio:** Alexander Razborov received his B.S. degree from Moscow State University in 1985 and his PhD and Doctoral degrees from the Steklov Mathematical Institute in Moscow (1987 and 1991, respectively). He has been on the Steklov faculty since then. From 2000 to 2008 Dr. Razborov held a visiting position at the Institute for Advanced Study, and in 2008 he joined the CS Department at the University of Chicago as a Professor.

Dr. Razborov's primary research area is complexity theory, and he is specifically interested in circuit complexity, proof complexity, quantum computations and communication complexity. Previously he made contributions to combinatorial group theory, and currently he is also exploring some areas in discrete mathematics like extremal combinatorics.

**Literature:**

- L. Lovász, *Large Networks and Graph Limits*, American Mathematical Society, 2012.
- A. Razborov, "Flag Algebras: An Interim Report", in *The Mathematics of Paul Erdős II*, Springer, 2013.
- A. Razborov, "What is a Flag Algebra?", to appear in the Notices of the AMS (http://people.cs.uchicago.edu/~razborov/files/flag_ams.pdf)

**Home Page:** http://people.cs.uchicago.edu/~razborov/

**Title:** Continuous Combinatorics

**Abstract:** Combinatorics was conceived, and then developed over centuries as a discipline about finite structures. In the modern world, however, its applications increasingly pertain to structures that, although finite, are extremely large: the Internet network, social networks, statistical physics, to name just a few. And the numerical characteristics researchers are normally interested in are "continuous" in the sense that small perturbations in the structure do not change the output very much. This makes it very natural to try to think of the "limit theory" of such objects by pretending that "very large" actually means "infinite". It turns out that this mathematical abstraction is very useful and instructive and leads to unexpected connections with many other things, both in mathematics and computer science.

Two complementary approaches to constructing such a theory and applying it elsewhere are known as graph limits and flag algebras, and in our talk we review as much of this material as time permits.

### Shai Shalev-Shwartz

**Bio:** I am a faculty member at the School of Computer Science and Engineering at the Hebrew University of Jerusalem, Israel. I received my PhD from the Hebrew University in 2007, and was a research assistant professor at the Toyota Technological Institute at Chicago until June 2009.

**Home Page:** http://www.cs.huji.ac.il/~shais/

**Title:** What is learnable and how to learn?

**Abstract:** The fundamental theorem of statistical learning states that learnability is equivalent to uniform convergence of the empirical risk to the population risk, which is equivalent to finiteness of the VC dimension. This equivalence also tells us how to learn, since if a problem is learnable at all, it is learnable via empirical risk minimization over the hypothesis class. We show that the equivalence breaks down in multiclass categorization problems. Furthermore, we show that a successful multiclass learner must be very different from an empirical risk minimizer over the hypothesis class. This theoretical study has some interesting practical implications.

### Masashi Sugiyama

**Bio:** Masashi Sugiyama was born in Osaka, Japan, in 1974. He received the degrees of Bachelor of Engineering, Master of Engineering, and Doctor of Engineering in Computer Science from the Tokyo Institute of Technology, Japan, in 1997, 1999, and 2001, respectively. In 2001 he was appointed Assistant Professor at the same institute, and since 2003 he has been an Associate Professor. He received an Alexander von Humboldt Foundation Research Fellowship and stayed at the Fraunhofer Institute, Berlin, Germany, from 2003 to 2004. In 2006 he received a European Commission Erasmus Mundus Scholarship and stayed at the University of Edinburgh, UK. He was awarded the IBM Faculty Award in 2007 for his contribution to machine learning under non-stationarity, and the Nagao Special Researcher Award from the IPSJ in 2011 for his contribution to the density-ratio paradigm of machine learning. His research interests include theories and algorithms of machine learning and data mining, and a wide range of applications such as signal processing, image processing, and robot control.

**Home Page:** http://sugiyama-www.cs.titech.ac.jp/~sugi/

**Title:** Machine Learning with Density Ratio Estimation

**Abstract:** In statistical machine learning, avoiding density estimation is essential because it is often more difficult than solving the target machine learning problem itself. This general idea is often referred to as Vapnik's principle, and the support vector machine is one of the successful realizations of this principle. Following this spirit, we developed a new machine learning framework based on the ratio of probability density functions. This density-ratio framework includes various important machine learning tasks such as transfer learning, outlier detection, feature selection, clustering, and conditional density estimation, and all these tasks can be effectively and efficiently solved in a unified manner by directly estimating the density ratio without going through density estimation. In this talk, I give an overview of the theory, algorithms, and applications of density ratio estimation.
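A small illustration of the direct-estimation idea, in the spirit of the least-squares approach to density-ratio fitting (the function name, kernel width, and regularization constant below are illustrative choices, not the talk's exact algorithm): model the ratio r(x) = p_nu(x)/p_de(x) as a Gaussian-kernel expansion and solve for the weights in closed form, never estimating either density.

```python
import numpy as np

def fit_density_ratio(x_nu, x_de, sigma=0.5, lam=1e-3, n_basis=50, seed=0):
    """Fit r(x) ~= p_nu(x) / p_de(x) directly as a Gaussian-kernel
    expansion with a ridge-regularized closed-form solution."""
    rng = np.random.default_rng(seed)
    centers = x_nu[rng.choice(len(x_nu), size=min(n_basis, len(x_nu)), replace=False)]
    def phi(x):
        # one Gaussian kernel feature per basis center
        return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * sigma ** 2))
    H = phi(x_de).T @ phi(x_de) / len(x_de)  # approximates E_de[phi phi^T]
    h = phi(x_nu).mean(axis=0)               # approximates E_nu[phi]
    theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return lambda x: np.maximum(phi(x) @ theta, 0.0)  # true ratios are nonnegative

rng = np.random.default_rng(1)
ratio = fit_density_ratio(rng.normal(0, 1, 2000), rng.normal(0, 2, 2000))
# the true ratio of the N(0,1) and N(0,2) densities is 2 at x = 0
# and falls below 1 once |x| exceeds roughly 1.36
```

The same fitted ratio can then serve as importance weights for transfer learning, or as an outlier score, which is what makes the framework unified.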

### Raquel Urtasun

**Bio:** Since September 2009, I'm an Assistant Professor at TTI-Chicago, a philanthropically endowed academic institute located on the campus of the University of Chicago. I was a visiting professor at ETH Zurich during the spring semester of 2010, working with Prof. Marc Pollefeys. Previously I was a postdoctoral research scientist at UC Berkeley EECS and ICSI. Before that, I was a postdoctoral associate at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT, where I worked with Prof. Trevor Darrell. I completed my PhD at the Computer Vision Laboratory at EPFL, Switzerland, in June 2006, under the supervision of Prof. Pascal Fua and Prof. David Fleet of the University of Toronto. I previously worked as a research assistant at the Ecole Nationale Superieure des Telecommunications (ENST) in Paris, in the Image Processing Department. I graduated as an Electrical Engineer from the Universidad Publica de Navarra, Pamplona, Spain, and did my Master's Thesis at Eurecom, France. My major interests are statistical learning and computer vision, with a particular interest in non-parametric Bayesian statistics, latent variable models, structured prediction, and their application to 3D scene understanding and human pose estimation. I also like to do computer graphics from time to time.

**Home Page:** http://ttic.uchicago.edu/~rurtasun/

**Title:** Visual Scene Understanding for Autonomous Systems

**Abstract:** Developing autonomous systems that are able to assist humans in everyday tasks is one of the grand challenges of modern computer science. Notable examples are personal robotics for the elderly and people with disabilities, as well as autonomous driving systems, which can help decrease fatalities caused by traffic accidents. In order to perform tasks such as navigation, recognition, and manipulation of objects, these systems should be able to efficiently extract 3D knowledge of their environment. In this talk, I'll show how graphical models provide a great mathematical formalism to extract this knowledge. In particular, I'll focus on a few examples, including 3D reconstruction, 3D object and layout estimation, and self-localization.