The Machine Learning Seminar Series features weekly talks on recent research in machine learning.

The seminar is typically held Fridays 10-11am at TTIC

To receive announcements regarding the talks please subscribe to the mailing list..

Date: November 18, 2016 10-11am

Venue: Room 530, 6045 S Kenwood Ave, Chicago, IL 60637

Speaker: Weiran Wang, TTIC

Abstract: We study the sample and time complexity of canonical correlation analysis (CCA) in the population setting. With mild assumptions on the data distribution, we show that in order to achieve $\epsilon$-suboptimality in a properly defined measure of alignment between the estimated canonical directions and the population solution, we can solve the empirical objective exactly with $O(\frac{1}{\epsilon^? \Delta^2 \gamma^2}$ samples, where $\Delta$ is the singular value gap of the whitened cross-covariance matrix and $\frac{1}{\gamma}$ is an upper bound of the condition numbers of the auto-covariance matrices. Moreover, we can achieve the same learning accuracy by drawing the same level of samples and solving the empirical objective approximately with a stochastic optimization algorithm; this algorithm is based on the shift-and-invert power iterations and only needs to process the dataset for $O(log(1/\epsilon)$ passes. Finally, we show that, given an estimate of the canonical correlation, the streaming version of shift-and-invert power iterations achieves the same learning accuracy with the same level of sample complexity.

This is joint work with Jialei Wang, Dan Garber, and Nathan Srebro.

Date: November 11, 2016 10-11am

Venue: Room 530, 6045 S Kenwood Ave, Chicago, IL 60637

Speaker: Mark Hallen, TTIC

Abstract: Computational protein design algorithms search for modifications in protein systems that will optimize a desired function. Usually, this function involves chemical binding of a protein to another protein or to a drug. Binding is exquisitely sensitive to the 3-D geometry adopted by the system, with low-energy geometries preferred, so to optimize binding we must be able to optimize energy with respect to molecular geometry. Thus, our optimization—both over chemical modifications and over 3-D geometries—must take as input an “energy function,” which maps atomic coordinates to energy. In this work, we present methods to represent the energy function that are amenable to more efficient and more accurate protein design calculations than were previously possible. We pose the calculation of this representation as a machine learning problem. Two relatively simple models are found to be helpful: a continuous polynomial representation and a discrete expansion in local terms. These models open the possibility of modeling binding much more realistically in protein design calculations, while performing optimization efficiently.

Date: November 4, 2016 10-11am

Venue: Room 526, 6045 S Kenwood Ave, Chicago, IL 60637

Speaker: Eva Dyer Northwestern University

Abstract: As the size and complexity of neural datasets continue to grow, there is an increasing need for scalable and black-box approaches to extract knowledge from these data. In the first half of this talk, I will discuss my recent work in developing methods to uncover the structure and function of neural circuits in large-scale neural recordings. My aim is to provide an introduction to modern neural datasets for a ML audience, and discuss some of the challenges that we now face in developing learning algorithms for these data. In the second half of the talk, I will discuss my recent work in global black-box optimization (at UAI 2016). The idea behind our approach is to learn a convex relaxation of a (possibly) non-convex function by estimating its convex envelope (i.e., tight convex lower bound). We show that our approach can lead to provable convergence to the global optimum for classes of smooth functions.

This is work with Mohammad Gheshlaghi Azar (Google DeepMind), Konrad Kording (Northwestern), and Bobby Kasthuri (University of Chicago, Argonne National Laboratory).

Date: October 28, 2016 10-11am

Venue: Room 526, 6045 S Kenwood Ave, Chicago, IL 60637

Abstract: Extracting knowledge and providing insights into the complex mechanisms underlying noisy high-dimensional data sets is of utmost importance in many scientific domains. Networks are an example of simple, yet powerful tools for capturing relationships among entities over time. For example, in social media, networks represent connections between different individuals and the type of interaction that two individuals have. In systems biology, networks can represent the complex regulatory circuitry that controls cell behavior. Unfortunately the relationships between entities are not always observable and need to be inferred from nodal measurements.

I will present a line of work that deals with the estimation and inference in high-dimensional semi-parametric elliptical copula models. I will explain why these models are useful in exploring complex systems, how to efficiently estimate parameters of the model, and also provide theoretical guarantees that justify usage of the models in scenarios where more rigid Gaussian graphical models are commonly used.

Joint work with Rina Foygel Barber, Junwei Lu, and Han Liu.

Date: October 21, 2016 10-11am

Venue: Room 530, 6045 S Kenwood Ave, Chicago, IL 60637

Abstract: In this work, we show that there are no spurious local minima in the non-convex factorized parametrization of low-rank matrix recovery from incoherent linear measurements. With noisy measurements we show all local minima are very close to a global optimum. Together with a curvature bound at saddle points, this yields a polynomial time global convergence guarantee for stochastic gradient descent from random initialization.

(Joint work with Behnam Neyshabur, Nathan Srebro)

Date: October 14, 2016 10-11am

Venue: Room 526, 6045 S Kenwood Ave, Chicago, IL 60637

Speaker: Blake Woodworth, TTIC

Abstract: We provide tight upper and lower bounds on the complexity of minimizing the average of m convex functions using gradient and prox oracles of the component functions. We show a significant gap between the complexity of deterministic vs randomized optimization. For smooth functions, we show that accelerated gradient descent (AGD) and an accelerated variant of SVRG are optimal in the deterministic and randomized settings respectively, and that a gradient oracle is sufficient for the optimal rate. For non-smooth functions, having access to prox oracles reduces the complexity and we present optimal methods based on smooth- ing AGD that improve over methods using just gradient accesses.

Date: October 7, 2016 10-11am

Venue: Room 530, 6045 S Kenwood Ave, Chicago, IL 60637

Speaker: Nathan Srebro, TTIC

Abstract: As machine learning increasingly replaces human judgment in decisions protected by anti discrimination law, the problem of algorithmicly measuring and ensuring fairness in machine learning is pressing. What does it mean for a predictor to not discriminate with respect to protected group (e.g. according to race, gender, etc)?

We propose a notion of non-discrimination that can be measured statistically, used algorithmicly, and avoids many of the pitfalls of previous definitions. We further study what type of discrimination and non-discrimination can be identified with oblivious tests, which treat the predictor as an opaque black-box, and what different oblivious tests tell us about possible discrimination.

(Joint work with Mortiz Hardt and Eric Price)

Date: September 30, 2016 10-11am

Venue: Room 526, 6045 S Kenwood Ave, Chicago, IL 60637

Speaker: Hao Tang, TTIC

Abstract: Recent work on discriminative segmental models has shown that they can achieve competitive speech recognition performance, using features based on deep neural frame classifiers. However, segmental models can be more challenging to train than standard frame-based approaches. While some segmental models have been successfully trained end-to-end, there is a lack of understanding of their training under different settings and with different losses.

We investigate a model class based on recent successful approaches, consisting of a linear model that combines segmental features based on an LSTM frame classifier. Similarly to hybrid HMM-neural network models, segmental models of this class can be trained in two stages (frame classifier training followed by linear segmental model weight training), end-toend (joint training of both frame classifier and linear weights), or with end-to-end fine-tuning after two-stage training.

In this work we present several findings. First, models in this class can be trained with various kinds of losses. We present segmental models trained end-to-end with hinge loss, log loss, latent hinge loss, and marginal log loss. We compare several losses for the case where training alignments are available as well as where they are not. We find that in general, marginal log loss provides the most consistent strong performance without requiring ground-truth alignments. We also find that training with dropout is very important in obtaining good performance with end-to-end training. Finally, the best results are often obtained by a combination of two-stage training and fine-tuning.