**Geoffrey Hinton**, Distinguished Professor, CS Department, University of Toronto, and Distinguished Researcher, Google.

**Title:** Dark Knowledge

**Abstract:** A simple way to improve classification performance is to average the predictions of a large ensemble of different classifiers. This is great for winning competitions but requires too much computation at test time for practical applications such as speech recognition. In a widely ignored paper in 2006, Caruana and his collaborators showed that the knowledge in the ensemble could be transferred to a single, efficient model by training the single model to mimic the log probabilities of the ensemble average. This technique works because most of the knowledge in the learned ensemble is in the relative probabilities of extremely improbable wrong answers. For example, the ensemble may give a BMW a probability of one in a billion of being a garbage truck, but this is still far greater (in the log domain) than its probability of being a carrot. This “dark knowledge”, which is practically invisible in the class probabilities, defines a similarity metric over the classes that makes it much easier to learn a good classifier. I will describe a new variation of this technique called “distillation” and will show some surprising examples in which good classifiers over all of the classes can be learned from data in which some of the classes are entirely absent, provided the targets come from an ensemble that has been trained on all of the classes. I will also show how this technique can be used to improve a state-of-the-art acoustic model and will discuss its application to learning large sets of specialist models without overfitting. This is joint work with Oriol Vinyals and Jeff Dean.
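The temperature-softened softmax at the heart of distillation can be sketched in a few lines. This is an illustrative sketch, not the talk's actual recipe: the function names and the default temperature are my own choices, and the T² scaling follows the common distillation convention for keeping soft-target gradients comparable across temperatures.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Softmax with temperature T. Raising T flattens the distribution,
    exposing the 'dark knowledge' in the relative probabilities of
    improbable wrong answers."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Cross-entropy between the teacher's and student's softened
    distributions, scaled by T**2 (a common convention so that gradient
    magnitudes stay comparable as T varies). T=4.0 is illustrative."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    return -(T ** 2) * np.sum(p_teacher * log_p_student, axis=-1).mean()
```

At T=1 the garbage-truck and carrot probabilities are both negligible; at higher T their ratio, which carries the similarity information, dominates the target distribution the student mimics.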

**Bio:** Geoffrey Hinton received his BA in experimental psychology from Cambridge in 1970 and his PhD in Artificial Intelligence from Edinburgh in 1978. He did postdoctoral work at Sussex University and the University of California San Diego and spent five years as a faculty member in the Computer Science department at Carnegie-Mellon University. He then became a fellow of the Canadian Institute for Advanced Research and moved to the Department of Computer Science at the University of Toronto. He spent three years from 1998 until 2001 setting up the Gatsby Computational Neuroscience Unit at University College London and then returned to the University of Toronto where he is a University Professor. He is the director of the program on “Neural Computation and Adaptive Perception” which is funded by the Canadian Institute for Advanced Research.

Geoffrey Hinton is a fellow of the Royal Society, the Royal Society of Canada, and the Association for the Advancement of Artificial Intelligence. He is an honorary foreign member of the American Academy of Arts and Sciences, and a former president of the Cognitive Science Society. He has received honorary doctorates from the University of Edinburgh and the University of Sussex. He was awarded the first David E. Rumelhart prize (2001), the IJCAI award for research excellence (2005), the IEEE Neural Network Pioneer award (1998), the ITAC/NSERC award for contributions to information technology (1992), the Killam prize for Engineering (2012), and the NSERC Herzberg Gold Medal (2010), which is Canada’s top award in Science and Engineering.

Geoffrey Hinton designs machine learning algorithms. His aim is to discover a learning procedure that is efficient at finding complex structure in large, high-dimensional datasets and to show that this is how the brain learns to see. He was one of the researchers who introduced the back-propagation algorithm that has been widely used for practical applications. His other contributions to neural network research include Boltzmann machines, distributed representations, time-delay neural nets, mixtures of experts, variational learning, products of experts and deep belief nets. His current main interest is in unsupervised learning procedures for multi-layer neural networks with rich sensory input.

**Trevor Darrell**, Professor, CS Division, University of California, Berkeley, and the International Computer Science Institute, Berkeley. Director, Berkeley Vision and Learning Center.

**Title:** Large-scale detector adaptation and other recent results

**Abstract:** In this talk I’ll review recent progress towards robust and effective perceptual representation learning. I’ll describe new methods for large-scale detection, whereby robust detectors can be learned from weakly labeled training data, following paradigms of domain adaptation and multiple instance learning. I’ll discuss how such models can be used not only for detection but also for pose prediction and further for effective fine-grained recognition, extending traditional convolutional neural network models to include explicit pose-normalized descriptors. Finally, and time permitting (pardon the pun), I’ll review our recent work on anytime recognition, which provides methods that strive to provide the best answer possible, even with a limited (and unknown) time budget.

**Bio:** Prof. Trevor Darrell’s group is co-located at the University of California, Berkeley, and the UCB-affiliated International Computer Science Institute (ICSI), also located in Berkeley, CA. Prof. Darrell is on the faculty of the CS Division of the EECS Department at UCB and is the vision group lead at ICSI. Darrell’s group develops algorithms for large-scale perceptual learning, including object and activity recognition and detection, for a variety of applications including multimodal interaction with robots and mobile devices. His interests include computer vision, machine learning, computer graphics, and perception-based human computer interfaces. Prof. Darrell was previously on the faculty of the MIT EECS department from 1999-2008, where he directed the Vision Interface Group. He was a member of the research staff at Interval Research Corporation from 1996-1999, and received the S.M. and Ph.D. degrees from MIT in 1992 and 1996, respectively. He obtained the B.S.E. degree from the University of Pennsylvania in 1988, having started his career in computer vision as an undergraduate researcher in Ruzena Bajcsy’s GRASP lab.

**Luca Trevisan**, Professor, Computer Science Division and Simons Institute for the Theory of Computing, U.C. Berkeley

**Title:** Graph Partitioning Algorithms and Laplacian Eigenvalues

**Abstract:** Spectral graph theory studies applications of linear algebra to graph theory and to the design and analysis of graph algorithms. “Spectral” graph algorithms are algorithms that exploit properties of the eigenvalues and eigenvectors of matrices associated with a graph, such as the Laplacian matrix. Spectral partitioning and clustering algorithms usually work well in practice, but the theory still gives only an incomplete rigorous understanding of their performance. We report on some progress in this direction.

The Cheeger inequality is a classical result in spectral graph theory which states that the second Laplacian eigenvalue of a graph is small if and only if the graph has a sparse cut. The proof of the Cheeger inequality also gives a worst-case analysis of the “sweep” spectral partitioning algorithm of Fiedler as an approximation algorithm for the sparsest cut problem.
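Fiedler's sweep algorithm can be sketched concisely: sort the vertices by their values in the second Laplacian eigenvector, then take the best prefix cut. The sketch below is illustrative, assuming the unnormalized Laplacian, conductance cut(S)/min(vol(S), vol(V∖S)) as the sparsity measure, and a zero-diagonal adjacency matrix; the function name is mine.

```python
import numpy as np

def sweep_cut(adj):
    """Fiedler's 'sweep' partitioning, sketched: order vertices by the
    second Laplacian eigenvector (the Fiedler vector), then return the
    vertex prefix whose cut has minimum conductance."""
    deg = adj.sum(axis=1)
    L = np.diag(deg) - adj                 # unnormalized Laplacian
    _, vecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    order = np.argsort(vecs[:, 1])         # sweep order from the Fiedler vector
    total_vol = deg.sum()
    in_S = np.zeros(len(adj), dtype=bool)
    cut = vol = 0.0
    best_phi, best_set = np.inf, None
    for v in order[:-1]:                   # every nontrivial prefix S
        # Adding v gains its edges to the outside, loses its edges into S.
        cut += adj[v, ~in_S].sum() - adj[v, in_S].sum()
        in_S[v] = True
        vol += deg[v]
        phi = cut / min(vol, total_vol - vol)
        if phi < best_phi:
            best_phi, best_set = phi, in_S.copy()
    return best_set, best_phi
```

The Cheeger inequality guarantees that the conductance this sweep finds is within a quadratic factor of the optimum, which is the worst-case analysis the abstract refers to.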

We discuss three generalizations of this result:

(i) the k-th Laplacian eigenvalue is small if and only if the vertices can be partitioned into k subsets, each defining a sparse cut.

(ii) if the k-th Laplacian eigenvalue is large, then Fiedler’s sweep algorithm performs better than the worst-case bounds implied by Cheeger’s inequality. This gives an explanation for the good performance of Fiedler’s algorithm for some types of graphs.

(iii) if the k-th Laplacian eigenvalue is small and the (k+1)-st is large, then the vertices can be partitioned into k subsets such that each subset defines a sparse cut and each subset induces an expanding subgraph. This points to a rigorous justification for the good performance of spectral clustering algorithms.
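The spectral clustering algorithms that (iii) helps justify typically work in two steps: embed each vertex as a point whose coordinates come from the eigenvectors of the k smallest Laplacian eigenvalues, then cluster the embedded points, usually with k-means. A minimal sketch of the embedding step, assuming the unnormalized Laplacian (normalized variants are also common; the function name is mine):

```python
import numpy as np

def spectral_embed(adj, k):
    """Map each vertex to a k-dimensional point using the eigenvectors
    of the k smallest Laplacian eigenvalues; a clustering algorithm
    such as k-means is then run on the rows of the result."""
    L = np.diag(adj.sum(axis=1)) - adj     # unnormalized Laplacian
    _, vecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    return vecs[:, :k]                     # one row per vertex
```

When the graph has k sparse, internally expanding pieces as in (iii), vertices in the same piece land close together in this embedding, which is why the subsequent k-means step recovers the partition.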

**Bio:** Luca Trevisan is a professor of electrical engineering and computer science at U.C. Berkeley and a senior scientist at the Simons Institute for the Theory of Computing. Luca received his Dottorato (PhD) in 1997 from the Sapienza University of Rome, working with Pierluigi Crescenzi. After graduating, Luca was a post-doc at MIT and at DIMACS, and he was on the faculty of Columbia University, U.C. Berkeley, and Stanford, before returning to Berkeley in 2014.

Luca’s research is in theoretical computer science, and most of his work has been in two areas: (i) pseudo-randomness and its relation to average-case complexity and derandomization; and (ii) the theory of probabilistically checkable proofs and its relation to the approximability of combinatorial optimization problems. In the past three years he has been working on spectral graph theory and its applications to graph algorithms.

Luca received the STOC’97 Danny Lewin (best student paper) award, the 2000 Oberwolfach Prize, and the 2000 Sloan Fellowship. He was an invited speaker at the 2006 International Congress of Mathematicians in Madrid.

**Stephen Boyd**, Professor, Electrical Engineering, Stanford University.

**Title:** Convex Optimization: From embedded real-time to large-scale distributed

**Abstract:** Convex optimization has emerged as a useful tool for applications that include data analysis and model fitting, resource allocation, engineering design, network design and optimization, finance, and control and signal processing. After an overview, the talk will focus on two extremes: real-time embedded convex optimization and distributed convex optimization. Code generation can be used to produce extremely efficient and reliable solvers for small problems that execute in milliseconds or microseconds and are ideal for embedding in real-time systems. At the other extreme, we describe methods for large-scale distributed optimization, which coordinate many solvers to solve enormous problems.
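One standard pattern for coordinating many solvers is consensus ADMM: each worker repeatedly solves a small local subproblem, and a coordinator averages the local variables into a global iterate. The sketch below is illustrative only, not the talk's actual method; it solves a least-squares problem split across workers, with variable names following common ADMM conventions (x local, z global, u scaled dual).

```python
import numpy as np

def consensus_admm(As, bs, rho=1.0, iters=500):
    """Consensus ADMM sketch for minimizing sum_i ||A_i x - b_i||^2.
    Worker i holds only (A_i, b_i); the coordinator only averages.
    rho is the (illustrative) augmented-Lagrangian penalty parameter."""
    n, N = As[0].shape[1], len(As)
    xs = [np.zeros(n) for _ in range(N)]   # local primal variables
    us = [np.zeros(n) for _ in range(N)]   # scaled dual variables
    z = np.zeros(n)                        # global consensus variable
    for _ in range(iters):
        for i in range(N):
            # x-update: argmin_x ||A_i x - b_i||^2 + (rho/2)||x - z + u_i||^2
            xs[i] = np.linalg.solve(
                2 * As[i].T @ As[i] + rho * np.eye(n),
                2 * As[i].T @ bs[i] + rho * (z - us[i]),
            )
        # z-update: average of the local variables (plus duals)
        z = np.mean([x + u for x, u in zip(xs, us)], axis=0)
        for i in range(N):
            us[i] += xs[i] - z             # dual update
    return z
```

Each x-update touches only that worker's data, so the inner loop parallelizes across machines; only the n-vector z is shared, which is what makes the pattern viable for enormous problems.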

**Bio:** Stephen Boyd is the Samsung Professor of Engineering and Professor of Electrical Engineering in the Information Systems Laboratory at Stanford University. He received the A.B. degree in Mathematics from Harvard University in 1980 and the Ph.D. in Electrical Engineering and Computer Science from the University of California, Berkeley, in 1985, and then joined the faculty at Stanford. His current research focus is on convex optimization applications in control, signal processing, and circuit design.

All talks will be held at TTIC in room #526, located at 6045 South Kenwood Avenue (intersection of 61st Street and Kenwood Avenue).