Held in cooperation with the University of Chicago Department of Computer Science

- TTIC 31020 - Introduction to Statistical Machine Learning (100 units)

Greg Shakhnarovich - **TTIC Room 526 TR 1:30-2:50pm** *Final Exam Dec 6, 1:30-3:30pm*

- TTIC 31150 - Mathematical Toolkit (CMSC 31150) (100 units)

Madhur Tulsiani - **TTIC Room 530 TR 9:00-10:20am** *Final Exam Dec 6, 8:00-10:00am*

- TTIC 31120 - Statistical and Computational Learning Theory (100 units)

Nathan Srebro - **TTIC Room 530 TR 10:30-11:50am** *Final Exam Dec 6, 10:30am-12:30pm*

- TTIC 31010 - Algorithms (100 units)

Julia Chuzhoy - **TTIC Room 530 TR 9:00-10:20am** *Final Exam Mar 14, 8:00-10:00am*

Tutorial meetings - **TTIC Room 526 W 3:00-4:00pm**

- TTIC 31230 - Fundamentals of Deep Learning (CMSC 31230) (100 units)

David McAllester - **TTIC Room 526 TR 1:30-2:50pm**

- TTIC 31060 - Computability and Complexity Theory (CMSC 38500, MATH 30500) (100 units)

Alexander Razborov - **TR 10:30-11:50am Ryerson 277**

- TTIC 31100 - Computational and Metric Geometry (CMSC 39010) (100 units)

Yury Makarychev - **TTIC Room 530 MW 1:30-2:50pm**

- TTIC 31160 - Topics in Bioinformatics (100 units)

Jinbo Xu - **TTIC Room 526 MW 9:00-10:20am**

- TTIC 31220 - Unsupervised Learning and Data Analysis (100 units)

Karen Livescu - **TTIC Room 526 TR 10:30-11:50am**

- TTIC 31170 - Planning, Learning, and Estimation for Robotics and Artificial Intelligence (100 units)

Matthew Walter - **TTIC Room 530 TR 9:00-10:20am**

- TTIC 31210 - Advanced Natural Language Processing (100 units)

Kevin Gimpel - **TTIC Room 530 MW 1:30-2:50pm**

100 units

TBD

Weekly lectures and discussions by TTIC researchers introducing their research and research problems. Provides a broad view of research carried out at TTIC. The course is graded pass/fail. Satisfies one quarter of credit (of the three required) toward the Research at TTIC Series Requirement. (See the Academic Program Guide for details.)

100 units

Chuzhoy, Julia, and Makarychev, Yury

This is a graduate level course on algorithms with the emphasis on central combinatorial optimization problems and advanced methods for algorithm design and analysis. Topics covered include asymptotic analysis, greedy algorithms, dynamic programming, amortized analysis, randomized algorithms and probabilistic methods, combinatorial optimization and approximation algorithms, linear programming, and advanced data structures.

The course textbook is “Algorithm Design” by Kleinberg and Tardos.

1. Greedy algorithms (1 week)

2. Dynamic programming (1 week)

3. Amortized analysis (1 week)

4-6. Max flow, min-cut, bipartite matching and their applications (3 weeks)

7. Linear programming, LP-duality (1 week)

8. NP-hardness (1 week)

9. Approximation algorithms (1 week)

10. Randomized algorithms (1 week)

Expected outcomes:

- Ability to design and rigorously analyze algorithms using paradigms such as greedy or dynamic programming.
- Understand the use of linear programming in optimization. Be able to formulate problems as linear programs.
- Understand linear programming duality and applications to problems such as max-flow/min-cut. Be able to write duals for linear programs.

Prerequisites: Assumes familiarity with proofs and asymptotic notation. Some basic knowledge of the notion of NP-hardness is also required.

100 units

Shakhnarovich, Greg

A systematic introduction to machine learning, covering theoretical as well as practical aspects of the use of statistical methods. Topics include linear models for classification and regression, support vector machines, regularization and model selection, and an introduction to structured prediction and deep learning. Application examples are taken from areas such as information retrieval, natural language processing, and computer vision.

Prerequisites: Probability, Linear Algebra, Undergraduate Algorithms.

I will focus on supervised learning and only talk about unsupervised settings when necessary (e.g., mixture models and density estimation for generative methods for classification). So, no clustering. There will be twenty 1.5-hour lectures; the numbers in parentheses are the estimated number of lectures per topic.

- intro to ML, motivation etc. (1)
- refresher on probability and algebra (1)
- statistical framework for learning; loss/risk; least squares regression (1)
- noise models; error decomposition; bias/variance and overfitting (1)
- estimation theory; ML/MAP, overfitting, bias/variance (1)
- model complexity, sparsity (L1/L2) in regression; stepwise methods for L0 sparsity (1)
- classification; Fisher’s LDA, logistic regression and softmax (1)
- ensemble methods, boosting (1)
- generative models, Naive Bayes, multivariate Gaussians (1)
- mixture models; EM (2)
- SVM and kernels (2)
- nonparametric methods; nearest neighbors, density estimation (1)
- multilayer neural networks and deep learning (1)
- information theory and learning; information criteria, MDL and their connections to regularization (1)
- experiment design and evaluation in ML (1)
- advanced topics TBD (1)
- wrap-up and review (1)

Prerequisites: knowledge of basic linear algebra, probability and calculus.

Expected outcomes:

- Understand the notion of fitting a model to data and concepts such as model complexity, overfitting and generalization, and bias-variance tradeoff in estimation.
- Learn and be able to apply some of the fundamental learning methods, such as logistic regression, support vector machines, boosting, decision trees, neural networks.
- Learn the basics of optimization techniques such as gradient descent and the general EM algorithm.
- Familiarity with multivariate Gaussians and mixtures of Gaussians.
- Understand fundamental concepts in information theory (entropy, KL-divergence) and their relationship to machine learning.

100 units

McAllester, David

This course covers the foundations of mathematics from a classical (nonconstructive) type-theoretic perspective, the general notion of a mathematical structure, the general notion of isomorphism, and the role of axiomatizations and constructions in mathematical definitions. The definition of the real numbers is used as a fundamental example. The course also covers the notion of definability in well-typed formalisms. A primary example is the non-definability of a linear bijection between a vector space and its dual. Ontologies (types) relevant to machine learning are emphasized, such as the type theory of PCA, CCA and Banach spaces (norms and dual norms).

There are again two lectures per week. Introduction and course outline.

Part I: Logic with Abstraction Barriers.

Sequent inference rules and proofs. Types and variable declarations. Free and bound variables. Structures and Isomorphism.

Isomorphism as equivalence under an abstract interface.

Part II: Case Studies in Abstraction

The natural numbers, integers, rationals and reals. Vector spaces. A formal treatment of the non-existence of a canonical basis (coordinate system), canonical inner product, or canonical isomorphism with the dual space. Coordinate-free treatment of matrix algebra.

Equivalences between matrix (operator) types. The fact that least squares regression does not require an ambient inner product but regularization does. Gradient descent. Gradient descent requires an ambient inner product. Newton’s method does not. The covariance matrix of a probability distribution on a vector space. The fact that the multivariate central limit theorem does not require an ambient inner product. Canonical correlation analysis also does not require an ambient inner product. PCA requires an ambient inner product.

Norms and Banach spaces. Dual norms. Measure spaces. Hilbert space. Differentiable manifolds. Information geometry. Jeffreys prior. Natural Gradient Descent.

Expected outcomes:

- Be able to write rigorous proofs and reason formally about various mathematical notions.
- Understand basics of linear algebra, eigenvalues and eigenvectors from the viewpoint of operators.
- Understand various algorithms such as SVMs, PCA, CCA and gradient descent from the operator viewpoint.

100 units

McAllester, David

Introduction to deep learning for computer vision. Although deep learning based computer vision systems are evolving rapidly, this course attempts to teach material that will remain relevant and useful as the field changes. The course begins with general deep learning methods relevant to many applications and gradually focuses to a greater extent on computer vision. The course will emphasize theoretical and intuitive understanding to the extent possible.

Topics:

- Introduction to multi-layer perceptrons and backpropagation.
- Convolutional neural networks (CNNs).
- Recurrent neural networks (RNNs).
- General frameworks and mathematical notations for expressing neural networks.
- Implementing a framework: CUDA.
- The theory and practice of stochastic gradient descent.
  - Theoretical guarantees for SGD.
  - First order vs. second order methods.
  - Adagrad, RMSprop, Adam
  - Vector-Hessian products
  - Learning rates and momentum.
- Vanishing gradients and exploding gradients and methods to counter these problems.
  - LSTMs, GRUs, highway networks, and ResNet.
- Regularization
  - L2 regularization
  - Dropout
  - Early stopping
- A survey of vision tasks and deep approaches to solving them.
  - Image classification
  - Object detection
  - Semantic segmentation
  - Image captioning
  - Visual question answering

Expected outcomes:

- Ability to apply the methods for designing and training general deep learning architectures.
- An understanding of the general issues and phenomena sufficient to guide architecture design.
- An understanding of the basic vision tasks to which deep methods are being applied and how deep methods approach these tasks.

Prerequisites: Introduction to machine learning.

100 units

Xu, Jinbo

This course will focus on the application of mathematical models and computer algorithms to studying molecular biology. In particular, this course will cover the following topics.

- homology search (1 week)
- sequence alignment and motif discovery (1 week)
- next-gen sequencing and genome assembly (1 week)
- protein sequence/structure analysis including alignment, classification, structure and function prediction (2 weeks)
- RNA sequence/structure analysis including alignment, classification and prediction (1 week)
- gene expression analysis (1 week)
- biological network analysis (1 week)
- phylogeny (1 week)

Expected outcomes:

- Ability to use popular bioinformatics tools to generate biologically meaningful results
- Ability to interpret biological results generated by a bioinformatics tool
- Ability to implement basic machine learning and optimization algorithms for important biology problems
- Application of some basic models and algorithms, such as Support Vector Machines, Hidden Markov Models, and Dynamic Programming, to solve biology problems

Prerequisites: None

100 units

Razborov, Alexander

Part one consists of models for defining computable functions: primitive recursive functions, (general) recursive functions, and Turing machines; the Church-Turing Thesis; unsolvable problems; diagonalization; and properties of computably enumerable sets. Part two deals with Kolmogorov (resource bounded) complexity: the quantity of information in individual objects. Part three covers functions computable with time and space bounds of the Turing machine: polynomial time computability, the classes P and NP, NP-complete problems, polynomial time hierarchy, and PSPACE-complete problems.

The tentative plan for this course is to do the following three parts.

Part I. Models for defining computable functions: primitive recursive functions, (general) recursive functions, and Turing machines; the Church-Turing Thesis; unsolvable problems; diagonalization; and properties of computably enumerable sets.

Part II. Kolmogorov (resource bounded) complexity: the quantity of information in individual objects.

Part III. Functions computable with time and space bounds of the Turing machine: polynomial time computability, the classes P and NP, NP-complete problems, polynomial time hierarchy, and PSPACE-complete problems.

Here is the Web page of the previous version of this course: http://people.cs.uchicago.edu/~razborov/teaching/spring14.html. But I will be very willing to adjust the speed, style and content depending on the background of the students enrolled in the course.

Expected outcomes:

- Ability to identify algorithmic problems arising in various contexts.
- Application of classical recursion theory, Kolmogorov complexity and basic computational complexity to the analysis of problems arising in mathematics and beyond.
- Recognize and identify solvable and unsolvable problems, problems with high/low Kolmogorov complexity, as well as problems complete for basic complexity classes.

100 units

Srebro, Nati

The course will cover techniques in unconstrained and constrained convex optimization and a practical introduction to convex duality. The course will focus on (1) formulating and understanding convex optimization problems and studying their properties; (2) understanding and using the dual; and (3) presenting and understanding optimization approaches, including interior point methods and first order methods for non-smooth problems. Examples will be mostly from data fitting, statistics and machine learning.

Prerequisites: Linear Algebra, Multidimensional Calculus, Undergraduate Algorithms

100 units

Chuzhoy, Julia

This is a basic course on approximation algorithms, with the main focus on approximation algorithms for central combinatorial optimization problems. We will mostly focus on classical algorithmic results, but will also present some state-of-the-art results and challenges in the area of approximation. The course will cover major algorithmic techniques, including LP-rounding, primal-dual schema, metric methods, and SDP rounding. While the main focus of the course is on algorithms, we will also discuss lower bounds on approximation and connections between algorithm design and lower bound proofs.

Assumes knowledge of the material covered in the Algorithms course.

Expected outcomes:

- Understand concepts such as approximation factor, polynomial time approximation schemes and hardness of approximation.
- Understand applications of linear programs (LPs) to design of approximation algorithms. Learn to analyze rounding algorithms for LPs and understand integrality gaps. Be able to apply LP duality.
- Understand semi-definite programming and its applications to approximation.

Prerequisites: Algorithms (TTIC 31010 or CMSC 37000)

100 units

Livescu, Karen

Introduction to analysis of signals and linear time-invariant systems at a graduate level. Topics include: Continuous and discrete-time transforms (Fourier and others); linear filtering; sampling and aliasing; random processes and their interaction with linear systems; applications in areas such as speech and image processing and robotics.

Prerequisites: familiarity with basic linear algebra, notions of eigenvalues and eigenvectors, and (multivariate) Gaussian distributions.

Expected outcomes:

- Understand the properties and eigenfunctions of linear time-invariant (LTI) systems.
- Understand Fourier Series (discrete- and continuous-time), Fourier transform, and convolutions.
- Be able to analyze random processes, understand their stationarity properties and their interaction with LTI systems.
- Learn about sampling and aliasing, signal estimation minimizing mean square error and parameter estimation for random processes. Understand the Karhunen-Loeve transform.

100 units

Makarychev, Yury

The course covers fundamental concepts, algorithms and techniques in computational and metric geometry. Topics covered include: convex hulls, polygon triangulations, range searching, segment intersection, Voronoi diagrams, Delaunay triangulations, metric and normed spaces, low-distortion metric embeddings and their applications in approximation algorithms, padded decomposition of metric spaces, Johnson-Lindenstrauss transform and dimension reduction, approximate nearest neighbor search and locality-sensitive hashing.

The course textbook is “Computational Geometry” by M. de Berg, O. Cheong, M. van Kreveld, M. Overmars.

- Convexity: convex sets, convex hulls, vertices, supporting lines, edges, different definitions and basic properties, Caratheodory’s theorem
- Convex Hulls and Line Segment Intersections: Jarvis March, Andrew’s algorithm, sweep line algorithms, line segment intersection, Bentley-Ottmann algorithm
- Planar Graphs and Overlays: graphs, graph drawings, plane and planar graphs, Euler’s formula, data structure for plane graphs, computing overlays
- Orthogonal Range Searching (2 lectures): binary search, kd-trees, range trees
- Point Location: trapezoidal maps, randomized algorithm
- Voronoi Diagrams: Voronoi diagrams, Fortune’s algorithm
- Delaunay Triangulations (1.5 lectures): triangulations, Delaunay and locally Delaunay triangulations: definitions, existence and equivalence, duality between Delaunay triangulations and Voronoi diagrams, angle optimality
- Metric Spaces, Normed Spaces, Low Distortion Metric Embeddings (1.5 lectures): metric and normed spaces, Lipschitz maps, distortion, embeddings into L_p and ℓ_p
- Bourgain’s Theorem
- Sparsest Cut: approximation algorithm for Sparsest Cut
- Minimum Balanced Cut, Minimum Linear Arrangement, Sparsest Cut with Non-Uniform Demands, Expanders: polylog approximation algorithms for Balanced Cut and Minimum Linear Arrangement, expander graphs, integrality gap for Sparsest Cut, Sparsest Cut with non-uniform demands
- Minimum Multiway Cut, Minimum Multicut: approximation algorithms for Minimum Multiway Cut and Minimum Multicut
- Padded Decomposition, Hierarchically Separated Trees, Applications (2 lectures)
- Semidefinite Programming, Algorithm of Arora, Rao and Vazirani: semidefinite programming, ARV (high-level overview), delta separated sets, matching covers
- Dimension Reduction, Nearest Neighbor Search: dimension reduction, approximate nearest neighbor search, locality sensitive hashing
- Locality Sensitive Hashing, p-Stable Random Variables: locality sensitive hashing, p-stable random variables

Expected outcomes:

- Know standard algorithms and data structures for solving geometric problems
- Be able to design efficient algorithms and data structures for solving geometric problems
- Understand basic concepts of metric geometry such as metric and normed space, low distortion embedding, dimension reduction, nearest neighbor search.
- Understand applications of metric geometry to the field of approximation algorithms and other areas of computer science.

Prerequisites: undergraduate-level algorithms, linear algebra and probability classes; a good background in mathematical analysis/calculus

100 units

Livescu, Karen

This course will introduce techniques used in speech technologies, mainly focusing on speech recognition. Speech recognition is one of the oldest and most complex structured sequence prediction tasks receiving significant research and commercial attention, and therefore provides a good case study for many of the techniques that are used in other areas of artificial intelligence involving sequence modeling. It is also a good example of the effectiveness of combining statistics and learning with domain knowledge. The course will include practical homework exercises using Matlab and speech toolkits.

Prerequisites: a good background in basic probability.

Expected outcomes:

- Understand and apply tools for analyzing speech time series such as Fourier analysis and dynamic time warping.
- Understand and apply hidden Markov models, Gaussian mixtures, and the EM algorithm for speech problems.
- Understand and apply n-gram language models, smoothing techniques, and their application to speech recognition.
- Understand generative and discriminative structured prediction approaches for speech problems.

100 units

Srebro, Nati

We will discuss classic results and recent advances in statistical learning theory (mostly under the agnostic PAC model), touch on computational learning theory, and also explore the relationship with stochastic optimization and online regret analysis. Our emphasis will be on concept development and on obtaining a rigorous quantitative understanding of machine learning. We will also study techniques for analyzing and proving performance guarantees for learning methods.

Prerequisites: Mathematical maturity, as obtained, e.g., in a rigorous analysis course; Discrete Math (specifically combinatorics and asymptotic notation); Probability Theory; Introduction to Machine Learning; Algorithms; Basic Complexity Theory (NP-Hardness). Familiarity with Convex Optimization, Computational Complexity and background in Statistics can be helpful, but is not required.

Specific Topics:

- The Statistical Model (Learning Based on an IID Sample):
  - The PAC (Probably Approximately Correct) and Agnostic PAC models.
  - Stochastic Optimization
  - Cardinality Bounds
  - Description Length Bounds
  - PAC-Bayes
  - Compression Bounds
  - The Growth Function and VC Dimension
  - VC Subgraph Dimension and Fat Shattering Dimension
  - Tight Characterization of Learning in terms of the VC and Fat Shattering Dimensions
  - Covering Numbers
  - Rademacher Averages, including Local Rademacher Analysis
- Uniform Learning and No-Free Lunch Theorems
- Online Learning, Online Optimization and Online Regret
  - The Perceptron Rule and Online Gradient Descent
  - Experts and the Winnow Rule
  - Bregman Divergence and Online Mirror Descent
  - Online to Batch Conversion
- Computational Lower Bounds:
  - Computational Hardness of Proper Learning
  - Cryptographic Hardness of Learning
- Additional Topics
  - Stability Based Analysis
  - Boosting: Weak Learning and the Margin Interpretation of Boosting.

Expected outcomes:

- Ability to recognize different learning models and make rigorous statements about learning methods
- Ability to use standard techniques to prove learning guarantees
- Ability to prove lower bounds for learning problems

100 units

Shakhnarovich, Greg

Developing autonomous systems that are able to assist us in everyday tasks is one of the grand challenges in modern computer science. While a variety of novel sensors have been developed in the past few years, in this class we will focus on the extraction of knowledge from visual information alone. One of the most remarkable examples of successful recognition systems is our visual system, which is able to extract high-level information from very noisy and ambiguous data. Unfortunately, despite decades of research efforts, machines are still far below human performance. In this class we will study why this is the case.

The goal of this graduate class is to understand the different visual recognition tasks as well as the techniques employed to solve them. A strong component of the course will be statistical learning as it plays a key role in almost every modern visual recognition system. We will cover all stages of the recognition pipeline: low-level (e.g., features), mid-level (e.g., segmentation) as well as high-level reasoning (e.g., scene understanding).

Knowledge of machine learning and computer vision is not required, but highly recommended. The theoretical aspects of visual recognition will be covered during the lectures. The class will have a strong practical component, as the students will build the different recognition components during the homework sessions. A list of topics includes:

- Classification: features, bag of words (BOW), similarity between images, learning features as well as hashing schemes and retrieval.
- Detection: sliding window approaches, branch and bound, structured prediction, Hough voting and NN approaches, hierarchical models.
- Segmentation: classical approaches (e.g., watershed) as well as modern structured prediction approaches including message passing and graph cuts for inference, and CRFs and structured-SVMs for learning.
- Modern 3D geometry and 3D scene understanding: stereo, scene layout (e.g., 3D box for indoor scenes, road layout for outdoor scenes).
- Pose estimation: pictorial structures (2D) as well as 3D pose estimation including particle filter-based approaches.

100 units

Urtasun, Raquel

A graphical model is a probabilistic model in which the conditional dependencies between the random variables are specified via a graph. Graphical models provide a flexible framework for modeling large collections of variables with complex interactions, as evidenced by their wide domain of application, including for example machine learning, computer vision, speech and computational biology. This course will provide a comprehensive survey of learning and inference methods in graphical models, including variational methods, primal-dual methods and sampling techniques.

100 units

Tulsiani, Madhur

The course is aimed at first-year graduate students and advanced undergraduates. The goal of the course is to collect and present important mathematical tools used in different areas of computer science. The course will mostly focus on linear algebra and probability.

We intend to cover the following topics and examples:

- Abstract linear algebra: vector spaces, linear transformations, Hilbert spaces, inner product, Gram-Schmidt orthogonalization, eigenvalues and eigenvectors, SVD, least squares (under/over-constrained)
- Discrete probability: random variables, Markov, Chebyshev and Chernoff bounds.
- Gaussian variables, concentration inequalities, dimension reduction
- Martingales (time permitting)
- Stochastic Processes (time permitting)

Expected outcomes:

- Ability to write correctly typed rigorous proofs.
- Understanding of various notions of linear algebra in the context of abstract vector spaces.
- Ability to understand and analyze stochastic processes. Familiarity with discrete and continuous random variables and various concentration bounds.

Prerequisites: None

100 units

Xu, Jinbo

TTIC 31160 will focus on the application of mathematical models and computer algorithms to studying structural biology, in particular, protein, RNA and DNA molecule structures.

Here is a list of topics that I am going to cover in this course.

- Introduction to molecular structures (1 week)
- Bioinformatics for biological sequence analysis (1 week)
- Algorithms for molecule structure comparison and alignment (1 week)
- Algorithms for protein secondary structure prediction (0.5 week)
- Algorithms for protein tertiary structure prediction (1 week)
- Algorithms for RNA secondary structure prediction (1 week)
- Algorithms for RNA tertiary structure prediction (0.5 week)
- Algorithms for protein docking (1 week)
- Algorithms for protein-protein and protein-RNA interaction prediction (1 week)
- Algorithms for chromatin structure determination (1 week)

There will be both homework and final projects.

Expected outcomes:

- Ability to formulate structural biology problems as mathematical problems
- Application of advanced optimization algorithms (linear programming, semidefinite programming and graph algorithms) and machine learning models (probabilistic graphical models) to solve important problems in structural bioinformatics
- Mastery of current hot topics in structural bioinformatics
- Ability to conduct semi-independent research in structural bioinformatics

Prerequisites: None

100 units

Walter, Matthew

This course is concerned with fundamental techniques in robotics and artificial intelligence (AI), with an emphasis on probabilistic inference, learning, and planning under uncertainty. The course will investigate the theoretical foundations underlying these topics as rigorous mathematical tools that enable solutions to real-world problems drawn broadly from robotics and AI. The course will cover topics that include: Bayesian filtering (Kalman filtering, particle filtering, and dynamic Bayesian networks), simultaneous localization and mapping, planning, Markov decision processes, partially observable Markov decision processes, reinforcement learning, and graphical models.

Expected outcomes:

- Understand the role of probabilistic techniques in modeling and mitigating uncertainty in dynamic systems
- Demonstrate the ability to derive analytical and nonparametric solutions for recursive Bayesian estimation
- Formulate probabilistic models that represent the problem of robot localization and mapping, and show how these models afford the use of techniques from recursive estimation
- Understand algorithms for planning/search and decision making within deterministic and stochastic domains
- Demonstrate the ability to implement state-of-the-art algorithms for uncertainty mitigation and to apply these techniques to new problems and domains

Prerequisites: Familiarity with basic linear algebra; background in probability theory; basic programming experience

100 units

Walter, Matthew

Many problems in machine learning, computer vision, natural language processing, robotics, computational biology, and beyond require modeling complex interactions between large, heterogeneous collections of random variables. Graphical models combine probability theory and graph theory to provide a unifying framework for representing these relationships in a compact, structured form. Probabilistic graphical models decompose multivariate joint distributions into a set of local relationships among small subsets of random variables via a graph. These local interactions result in conditional independencies that afford efficient learning and inference algorithms. Moreover, their modular structure provides an intuitive language for expressing domain-specific knowledge, and facilitates the transfer of modeling advances to new applications.

This graduate-level course will provide a strong foundation for learning and inference with probabilistic graphical models. The course will first introduce the underlying representational power of graphical models, including Bayesian and Markov networks, and dynamic Bayesian networks. Next, the course will investigate contemporary approaches to statistical inference, both exact and approximate. The course will then survey state-of-the-art methods for learning the structure and parameters of graphical models.

Expected outcomes:

- Understand the representation of graphical models, including directed, undirected, and factor graph representations; factorization and Markov properties; and common spatial, temporal, hierarchical, and relational models.
- Develop a solid understanding of exponential families and the related issues of conjugate priors, ML estimation, and parameter estimation in directed and undirected graphical models.
- Demonstrate a familiarity with Gaussian graphical models, including Bayesian networks, Markov random fields, and inference algorithms under these models.
- Understand methods for exact inference, including variable elimination, belief propagation (message passing), and the junction tree algorithm.
- Understand techniques for approximate inference, particularly variational methods that frame inference as an optimization problem.
- Develop a familiarity with Markov Chain Monte Carlo (MCMC) techniques for inference, including Gibbs sampling, Rao-Blackwellization, and Metropolis-Hastings.
- Understand techniques for learning the structure and parameters of different families of graphical models both from observed and latent data.

Prerequisites: TTIC 31020 (or equivalent)

100 units

Gimpel, Kevin

This course will introduce fundamental concepts in natural language processing (NLP). NLP includes a range of research problems that involve computing with natural language. Some are user-facing applications, such as spam classification, question answering, summarization, and machine translation. Others serve supporting roles, such as part-of-speech tagging and syntactic parsing. Solutions draw from statistical machine learning, algorithms, and linguistics. There is particular interest in structured prediction in which the output structure is a sequence, tree, or sentence.

Topics include:

- text classification
- language modeling and smoothing
- words: lexical semantics, distributional representations, clusters, and embeddings
- sequence labeling tasks: part-of-speech tagging and named-entity recognition
- hidden Markov models and chain conditional random fields
- statistical syntactic parsing, including constituency parsing and dependency parsing
- parsing algorithms (CKY)
- unsupervised learning in NLP: syntactic analysis, topic models
- computational compositional semantics
- word alignment and machine translation
- neural network methods in NLP

Assignments include formal exercises as well as practical exercises involving implementing algorithms and using NLP toolkits.

Expected outcomes:

- Understand why computing with natural language is difficult (ambiguity and variability of linguistic expression)
- Understand and apply solutions to standard NLP tasks, such as naive Bayes for text classification and hidden Markov models / conditional random fields for sequence labeling
- Be able to derive dynamic programming algorithms to perform inference in structured output spaces, and to analyze their computational properties
- Understand common types of syntactic and semantic analysis, and how they are used in downstream applications
- Recognize and characterize the errors made by NLP systems

Prerequisites: basic knowledge of probability and calculus; basic programming experience.

100 units

Tulsiani, Madhur

This course is meant to serve as an introduction to some basic concepts in information theory and error-correcting codes, and some of their applications in computer science and statistics. We plan to cover the following topics:

- Introduction to entropy and source coding. Some applications of entropy to counting problems.
- Mutual information and KL-divergence. Method of types and hypothesis testing. I-projections and applications.
- Introduction to error-correcting codes. Unique and list decoding of Reed-Solomon and Reed-Muller codes.
- Applications of information theory to lower bounds in computational complexity and communication complexity.

Expected outcomes:

- Familiarity with concepts such as entropy, mutual information, and KL-divergence.
- Familiarity with source and channel coding.
- Understanding of the method of types and ability to derive large-deviation bounds using information-theoretic concepts.
- Understanding of the notions of unique and list decoding for various codes.

Prerequisites: Discrete probability. Some knowledge of finite-field algebra is required for the part on error-correcting codes, but the required basics are reviewed in class.

100 units

Gimpel, Kevin

This course is a follow-up to TTIC 31190. It will go into more depth on the fundamentals of natural language processing (NLP) and cover a broader range of applications. Some of the class meetings will be hands-on, guided laboratory-style meetings; a laptop is strongly recommended for these class meetings, but not strictly required.

Topics include:

- grammatical formalisms (CFG, TSG, TAG, and CCG)
- exact and approximate parsing algorithms (CKY, shift-reduce, k-best parsing, cube pruning, etc.)
- logical semantics and semantic parsing
- semantic formalisms (abstract meaning representation, etc.)
- training and decoding criteria for NLP (e.g., minimum Bayes risk)
- unsupervised learning in NLP (EM for HMMs and PCFGs, topic models, Bayesian nonparametrics)
- advanced neural network methods in NLP, including recurrent, recursive, and convolutional networks, encoder-decoder architectures, and attention-based models
- the application of these techniques to important NLP applications, including textual entailment, dialogue systems, machine translation, question answering, automatic summarization, and coreference resolution

Assignments include formal exercises as well as practical exercises involving implementing algorithms and using NLP and deep learning toolkits.

Expected outcomes:

- Be able to derive dynamic programming algorithms for inference with grammatical formalisms and other structured output spaces, and to analyze their computational properties
- Understand trade-offs of approximate inference algorithms used in NLP and be able to choose algorithms appropriately
- Be able to design generative models for textual data and derive statistical inference algorithms for quantities of interest
- Understand state-of-the-art solutions to key NLP applications, including approaches based on deep learning

Prerequisites: TTIC 31190 or permission of the instructor.

100 units

Livescu, Karen

This course will introduce concepts and techniques for analyzing and learning from unlabeled data. Unsupervised methods are used, for example, for visualization, data generation, and representation learning. The course will motivate settings in which unsupervised methods are needed or useful, and discuss relationships between unsupervised and supervised methods. Topics will include fixed feature representations such as Fourier methods and count-based features; linear and nonlinear dimensionality reduction; clustering; density estimation, mixture models, and expectation maximization; and semi-supervised/distant-supervision settings. Linear, kernel, and deep neural network-based methods will be included. Assignments will include theoretical and practical exercises.

Prerequisites: a good background in basic probability, linear algebra, TTIC 31020, and familiarity and comfort with programming; or permission of the instructor.

Expected outcomes:

- Understand typical settings where unsupervised methods are used, including visualization, representation, analysis, and generation, and be able to choose relevant methods for a given situation.
- Understand how supervised and unsupervised data and methods can be combined.
- Be able to analyze and visualize data using relevant fixed feature representations.
- Understand the motivation and application of dimensionality reduction techniques.
- Understand and be able to apply clustering and density estimation techniques.
- Understand the current state of the art and research landscape in selected areas.
- Develop proficiency in applying relevant techniques to real data in practical settings.

100 units

McAllester, David

Introduction to fundamental principles of deep learning. Although deep learning systems are evolving rapidly, this course attempts to teach material that will remain relevant and useful as the field changes. The course will emphasize theoretical and intuitive understanding to the extent possible.

Topics:

- Introduction to multi-layer perceptrons and backpropagation.
- Convolutional neural networks (CNNs) and Recurrent neural networks (RNNs).
- General frameworks and mathematical notations for expressing neural networks.
- Implementing a framework — CUDA.
- The theory and practice of stochastic gradient descent.
- Basic theoretical guarantees for non-convex SGD.
- Learning rates and momentum.
- First order vs. second order methods, Mirror Descent
- Adagrad, RMSprop, ADAM
- Vector-Hessian products

- Automated Differentiation.
- Symbolically differentiating arbitrary computer code.
- Checkpointing algorithms

- Batch Normalization
- Vanishing and exploding gradients, and methods to counter these problems.
- LSTMs, GRUs, highway networks, and Resnet.

- Regularization
- L2 regularization
- Dropout
- Early Stopping
- Generalization Bounds

- Sparsification
- Transfer Learning
- Attention and Memory
- Stack Architectures
- Generative Adversarial Networks
- Autoencoding

Expected outcomes:

- Ability to design and train novel deep learning architectures.
- An understanding of the general issues and phenomena sufficient to guide architecture design and training.

Prerequisites: Introduction to machine learning.

100 units

Original academic research conducted under the guidance of an instructor (normally the student’s PhD advisor), directed at advancing the state of the art in the chosen area.

Expected outcomes:

- Familiarity with peer-reviewed literature on the chosen topic of focus representing both current state of the art and historical developments.
- Ability to develop a short-, medium- and long-term research plan.
- Improved technical skills relevant to the research topic (examples include: using relevant mathematical tools, developing, documenting and using advanced software, designing and executing user studies, designing and conducting experiments).
- Ability to communicate research progress in regular meetings with advisor and colleagues.
- Ability to describe research results in a technical document at a level and in a format appropriate for submission to a peer-reviewed venue.

100 units

Original academic research conducted under the guidance of an instructor (normally the student’s PhD advisor), directed at advancing the state of the art in the chosen area.

Expected outcomes:

- Familiarity with peer-reviewed literature on the chosen topic of focus representing both current state of the art and historical developments.
- Ability to develop a short-, medium- and long-term research plan.
- Improved technical skills relevant to the research topic (examples include: using relevant mathematical tools, developing, documenting and using advanced software, designing and executing user studies, designing and conducting experiments).
- Ability to communicate research progress in regular meetings with advisor and colleagues.
- Ability to describe research results in a technical document at a level and in a format appropriate for submission to a peer-reviewed venue.

100 units

Advisor

In-depth involvement in areas of computer science in a research lab, university, or business setting. Internship activities and objectives must be related to the student’s program objectives. Enrollment is required for an F-1 CPT internship. Advisor’s consent required.