Held in cooperation with the University of Chicago Department of Computer Science

TTIC 31020 - Introduction to Machine Learning (100 units)

Instructor: Greg Shakhnarovich

Location: **TTIC Room 530**

Time: **Tuesday and Thursday - 11:00am-12:20pm | Tutorial: Friday - 10:00am-11:00pm**

TTIC 31230 - Fundamentals of Deep Learning (CMSC 31230) (100 units)

Instructor: David McAllester

Location: **TTIC Room 530**

Time: **Monday and Wednesday - 1:30-2:50pm**

TTIC 31200 - Information and Coding Theory (CMSC 37220) (100 units)

Instructor: Madhur Tulsiani

Location: **TTIC Room 530**

Time: **Tuesday and Thursday - 9:30-10:50am**

TTIC 31010 Algorithms (CMSC 37000) + Tutorial (100 units)

Instructor: Julia Chuzhoy

Location: **TTIC Room 530**

Time: **Tuesday and Thursday - 9:30-10:50am | Tutorial: Wednesday - 4:30-5:20pm (TTIC Room 529)**

TTIC 31120 - Statistical and Computational Learning Theory (100 units)

Instructor: Nati Srebro

Location: **TTIC Room 530**

Time: **Tuesday and Thursday - 11:00am-12:20pm**

TTIC 31100 Computational and Metric Geometry (CMSC 39010) (100 units)

Instructor: Yury Makarychev

Location: **TTIC Room 530**

Time: **Tuesday and Thursday - 11:00am-12:20pm**

TTIC 31150 - Mathematical Toolkit (CMSC 31150) (100 units)

Instructor: Avrim Blum

Location: **TTIC Room 530**

Time: **Monday and Wednesday - 1:30-2:50pm**

TTIC 31220 - Unsupervised Learning and Data Analysis (100 units)

Instructor: Karen Livescu

Location: **TTIC Room 530**

Time: **Monday and Wednesday - 3:00-4:20pm**

TTIC 31170: Planning, Learning, and Estimation for Robotics and Artificial Intelligence

Instructor: Matt Walter

Location: **TTIC Room 530**

Time: **Tuesday and Thursday - 9:30-10:50am**

Weekly lectures and discussions by TTIC researchers introducing their research and research problems. Provides a broad view of research carried out at TTIC. Course is pass/fail credit. Satisfies one quarter of credit (of the three required) to fulfill the Research at TTIC Series Requirement. (See Academic Program Guide for details)

100 units

This is a graduate level course on algorithms with the emphasis on central combinatorial optimization problems and methods for algorithm design and analysis. Topics covered include greedy algorithms, dynamic programming, algorithms for maximum flow and minimum cut, applications of linear programming, randomized algorithms, combinatorial optimization, and approximation algorithms. Time permitting, additional topics, such as online algorithms and probabilistic method will be covered.

The course textbook is “Algorithm Design” by Kleinberg and Tardos

Specific topics covered:

- Greedy algorithms
- Dynamic programming
- Max flow, min cut, bipartite matching, and their applications
- Linear programming, LP-duality
- NP-hardness
- Approximation algorithms
- Randomized algorithms
- (optional): online algorithms; probabilistic method

Expected outcomes:

- Ability to design and rigorously analyze algorithms using paradigms such as greedy or dynamic programming.
- Understand the use of linear programming in optimization. Be able to formulate problems as linear programs.
- Knowledge of maximum flow / minimum cut problems and algorithms for solving them, and ability to apply this knowledge to suitable graph optimization problems.
- Understand the notion of NP-hardness and ability to prove NP-hardness of problems via reductions.
- Understand linear programming duality and applications to problems such as max- flow/min-cut. Be able to write duals for linear programs.

Prerequisites: Assumes familiarity with proofs and formal reasoning, knowledge of basic graph notions (such as graphs, trees, paths etc) and algorithms (such as BFS, DFS, Minimum Spanning Tree etc). Also assumes familiarity with asymptotic notation and running time analysis of algorithms, as well as basic knowledge of the notion of NP-hardness.

100 units

A systematic introduction to machine learning, covering theoretical as well as practical aspects of the use of statistical methods. Topics include linear models for classification and regression, support vector machines, regularization and model selection, and introduction to structured prediction and deep learning. Application examples are taken from areas like information retrieval, natural language processing, computer vision and others.

Prerequisites: Probability, Linear Algebra, Undergraduate Algorithms.

I will focus on supervised learning and only talk about unsupervised settings when necessary (e.g., mixture models and density estimation for generative methods for classification). So, no clustering. There will be twenty 1.5 hour lectures; numbers in parentheses are estimated # of lectures per topic.

- intro to ML, motivation etc. (1)
- refresher on probability and algebra (1)
- statistical framework for learning; loss/risk; least squares regression (1)
- noise models; error decomposition; bias/variance and overfitting (1)
- estimation theory; ML/MAP, overfitting, bias/variance (1)
- model complexity, sparsity (L1/L2) in regression; stepwise methods for L0 sparsity (1)
- classification; Fisher’s LDA, logistic regression and softmax (1)
- ensemble methods, boosting (1)
- generative models, Naive Bayes, multivariate Gaussians (1)
- mixture models; EM (2)
- SVM and kernels (2)
- nonparametric methods; nearest neighbors, density esimation (1)
- multilayer neural networks and deep learning (1)
- information theory and learning; information criteria, MDL and their connections to regularization (1)
- experiment design and evaluation in ML (1)
- advanced topics TBD (1)
- wrap-up and review (1)

Prerequisites: knowledge of basic linear algebra, probability and calculus.

Expected outcomes:

- Understand the notion of fitting a model to data and concepts such as model complexity, overfitting and generalization, and bias-variance tradeoff in estimation.
- Learn and be able to apply some of the fundamental learning methods, such as logistic regression, support vector machines, boosting, decision trees, neural networks.
- Learn the basics of optimization techniques such as gradient descent and the general EM algorithm.
- Familiarity with multivariate Gaussians and mixtures of Gaussians.
- Understand fundamental concepts in information theory (entropy, KL-divergence) and their relationship to machine learning.

Course Webpage: https://canvas.uchicago.edu/courses/29361/assignments/syllabus

100 units

Introduction to the principles and practice of computer vision. This course will provide in-depth survey of topics involved in major tasks in moden vision, and offer hands-on experience in implementing some of them.

Prerequisites: a good background in linear algebra and probability (undergrad level) and background in machine learning (TTIC 31020 or equivalent) required. CMSC 25040 is recommended.

Topics:

- Image formation, representation, and compression
- Representation and perception of color
- Filtering and edge detection
- Image features, detectors, and interest point operators
- Model fitting, RANSAC and Hough transform
- Stereo and multi-view geometry
- Camera calibration
- Representation and perception of motion
- Representation and modeling of edges and regions
- Semantic vision: recognition, detection, and related tasks

Expected outcomes:

- Familiarity with models for of image formation and image analysis.
- Familiarity with major methods for image processing, alignment, and matching.
- Knowledge of principles and methods of 3D reconstruction from images and videos, and ability to build and diagnose implementations of these methods.
- Grasp of modern methods for object and scene categorization from images, and ability to build and diagnose implementations of such methods.

100 units

This course will focus on the application of mathematical models and computer algorithms to studying molecular biology. In particular, this course will cover the following topics.

- homology search ( 1 week)
- sequence alignment and motif discovery ( 1 week)
- next-gen sequencing and genome assembly ( 1 week)
- protein sequence/structure analysis including alignment, classification, structure and function prediction ( 2 weeks).
- RNA sequence/structure analysis including alignment, classification and prediction ( 1 week)
- gene expression analysis (1 week)
- biological network analysis ( 1 week)
- phylogeny ( 1 week)

Expected outcomes:

- Ability to use popular bioinformatics tools to generate biologically meaningful results
- Ability to interpret biological results generated by a bioinformatics tool
- Ability to implement basic machine learning and optimization algorithms for important biology problems
- Application of some basic models and algorithms such as Support Vector Machines, Hidden Markov Model, Dynamic Programming to solve biology problems

Prerequisites: None

100 units

Part one consists of models for defining computable functions: primitive recursive functions, (general) recursive functions, and Turing machines; the Church-Turing Thesis; unsolvable problems; diagonalization; and properties of computably enumerable sets. Part two deals with Kolmogorov (resource bounded) complexity: the quantity of information in individual objects. Part three covers functions computable with time and space bounds of the Turing machine: polynomial time computability, the classes P and NP, NP- complete problems, polynomial time hierarchy, and P-space complete problems.

The tentative plan for this course is to do the following three parts.

Part I. Models for defining computable functions: primitive recursive functions, (general) recursive functions, and Turing machines; the Church-Turing Thesis; unsolvable problems; diagonalization; and properties of computably enumerable sets.

Part II. Kolmogorov (resource bounded) complexity: the quantity of information in individual objects.

Part III. Functions computable with time and space bounds of the Turing machine: polynomial time computability, the classes P and NP, NP-complete problems, polynomial time hierarchy, and P-space complete problems.

Here is the Web page of the previous version of this course: http://people.cs.uchicago.edu/~razborov/teaching/spring14.html. But I will be very willing to adjust the speed, style and content depending on the background of the students enrolled in the course.

Expected outcomes:

- Ability to identify algorithmic problems arising in various contexts.
- Application of classical recursion theory, Kolmogorov complexity and basic computational complexity to the analysis of problems arising in mathematics and beyond.
- Recognize and identify solvable and unsolvable problems, problems with high/low Kolmogorov complexity, as well as problems complete for basic complexity classes.

100 units

The course will cover techniques in unconstrained and constrained convex optimization and a practical introduction to convex duality. The course will focus on (1) formulating and understanding convex optimization problems and studying their properties; (2) understanding and using the dual; and (3) presenting and understanding optimization approaches, including interior point methods and first order methods for non-smooth problems. Examples will be mostly from data fitting, statistics and machine learning.

Prerequisites: Linear Algebra, Multidimensional Calculus, Undergraduate Algorithms

Specific Topics:

- Formalization of optimization problems
- Smooth unconstrained optimization: Gradient Descent, Conjugate Gradient Descent, Newton and Quasi-Newton methods; Line Search methods
- Standard formulations of constrained optimization: Linear, Quadratic, Conic and Semi-Definite Programming
- KKT optimality conditions
- Lagrangian duality, constraint qualification, weak and strong duality
- Fenchel conjugacy and its relationship to Lagrangian duality
- Multi-objective optimization
- Equality constrained Newton method
- Log Barrier (Central Path) methods, and Primal-Dual optimization methods
- Overview of the simplex method as an example of an active set method.
- Overview of the first order oracle model, subgradient and separation oracles, and the Ellipsoid algorithm.
- Large-scale subgradient methods: subgradient descent, mirror descent, stochastic subgradient descent.

Expected outcomes:

- Ability to discuss and understand optimization problems in terms of required information/access, assumptions, iteration complexity and runtime
- Ability to identify convex and non-convex optimization problems
- Ability to make informed choices about the choice of optimization algorithm
- Ability to derive the dual problem, and use the dual and KKT conditions to reason about optimal solutions
- Familiarity with unconstrained optimization methods, including Gradient Descent, Conjugate Gradient Descent, Newton’s Method and Quasi-Newton Methods
- Familiarity with Interior Point methods for constrained optimization
- Familiarity with standard formulations including Linear Programming, Quadratic Programming and Semidefinite Programming
- Ability to cast abstract problems as constrained optimization problems, and in terms of standard formulations

100 units

This course studies approximation algorithms - algorithms that solve a given combinatorial optimization problem approximately. One major reason to study approximation algorithms is that many central combinatorial optimization problems are NP-hard, and so they are unlikely to have efficient algorithms that solve them exactly. Approximation algorithms provide a way to get around this intractability and to still efficiently obtain solutions that, while not being optimal, are close to optimal. There are however many other settings in which approximation algorithms may be desirable. For example, the running time of the algorithm or its simplicity may be more important in some settings than obtaining the best possible solution. Some other setting, such as streaming algorithms and sublinear algorithms also naturally lead to approximation.

The main focus of this course is on the design of approximation algorithms for combinatorial optimization problems. On the one hand, we will cover central tools and methods that are used in the design of approximation algorithms (and beyond). On the other hand, we will explore central combinatorial optimization problems that are used as building blocks in many different algorithms, in a variety of settings.

Much of this course is dedicated to approximation algorithms for NP-hard problems. Time permitting, we will also explore some additional settings where approximation takes place (such as very fast algorithms, streaming and/or sublinear algorithms). We will also explore the question of why some problems have good approximation algorithms while other do not, via hardness of approximation, or inapproximability, proofs.

Assumes the knowledge of material covered in the Algorithms course.

Expected outcomes:

- Understand concepts such as approximation factor, polynomial time approximation schemes and hardness of approximation.
- Familiarity with basic combinatorial optimization problems, such as Vertex Cover, Set Cover, Traveling Salesman, Steiner Tree, Sparsest Cut, and so on, and algorithmic techniques for solving them approximately.
- Understand applications of linear programs (LPs) to the design of approximation algorithms. Learn to analyze rounding algorithms for LPs and understand integrality gaps. Be able to apply LP duality.
- Understand semi-definite programming and its applications to approximation.

Prerequisites: Algorithms (TTIC31010 or CMSC 37000)

Course Webpage: https://canvas.uchicago.edu/courses/37476

100 units

Introduction to analysis of signals and linear time-invariant systems at a graduate level. Topics include: Continuous and discrete-time transforms (Fourier and others); linear filtering; sampling and aliasing; random processes and their interaction with linear systems; applications in areas such as speech and image processing and robotics.

Prerequisites: familiarity with basic linear algebra, notions of eigenvalues and eigenvectors, and (multivariate) Gaussian distributions.

Expected outcomes:

- Understand the properties and eigenfunctions of linear time-invariant (LTI) systems.
- Understand Fourier Series (discrete- and continuous-time), Fourier transform, and convolutions.
- Be able to analyze random processes, understand their stationarity properties and their interaction with LTI systems.
- Learn about sampling and aliasing, signal estimation minimizing mean square error and parameter estimation for random processes. Understand the Karhunen-Loeve transform.

100 units

The course covers fundamental concepts, algorithms and techniques in computational and metric geometry. Topics covered include: convexity and convex hulls, range searching, segment intersection, Voronoi diagrams, Delaunay triangulations, metric and normed spaces, low–distortion metric embeddings and their applications in approximation algorithms, stochastic decompositions of metric spaces, dimensionality reduction, approximate nearest neighbor search and locality–sensitive hashing.

The course textbook is “Computational Geometry” by M. de Berg, O. Cheong, M. van Kreveld, M. Overmars.

Specific topics covered:

- Convexity: convex sets, convex hulls, extreme points, different definitions and basic properties, Caratheodory’s theorem
- Convex Hulls and Line Segment Intersections: Jarvis March, Andrew’s algorithm, sweep line algorithms, line segment intersection, Bentley—Ottmann algorithm
- Orthogonal Range Searching: binary search, kd-trees, range trees
- Voronoi diagrams, Fortune’s algorithm
- Delaunay Triangulations: triangulations, Delaunay and locally Delaunay triangulations: definitions, existence and equivalence, duality between Delaunay triangulations and Voronoi diagrams, angle optimality
- Metric Spaces, Normed Spaces, Low Distortion Metric Embeddings: metric and normed spaces, Lipschitz maps, distortion, embeddings into Lp and lp
- Bourgain’s Theorem
- LP-based approximation algorithm for Sparsest Cut
- Minimum Balanced Cut, Minimum Linear Arrangement, Sparsest Cut with Non-Uniform Demands, Expanders: polylog approximation algorithms for Balanced Cut and Minimum Linear Arrangement, expander graphs, integrality gap for Sparsest Cut, Sparsest Cut with non-uniform demands
- Minimum Multiway Cut, Minimum Multicut: approximation algorithms for Minimum Multiway Cut and Minimum Multicut
- Stochastic decompositions of metric spaces, Hierarchically Separated Trees, Applications (2 lectures)
- Dimension Reduction, Nearest Neighbor Search: dimension reduction, approximate nearest neighbor search, locality sensitive hashing
- Locality Sensitive Hashing, p–Stable Random Variables: locality sensitive hashing, p–stable random variables

Expected outcomes:

- Know standard algorithms and data structures for solving geometric problems
- Be able to design efficient algorithms and data structures for solving geometric problems
- Understand basic concepts of metric geometry such as metric and normed space, low distortion embedding, dimension reduction, nearest neighbor search.
- Understand applications of metric geometry to the field of approximation algorithms and other areas of computer science.

Prerequisites: undergraduate-level algorithms, linear algebra, and probability classes; a solid background in mathematical analysis/calculus

100 units

This course will introduce techniques used in speech technologies, mainly focusing on speech recognition and related tasks. Speech recognition is one of the oldest and most complex structured sequence prediction tasks receiving significant research and commercial attention, and therefore provides a good case study for many of the techniques that are used in other areas of artificial intelligence involving sequence modeling. The course will cover core techniques in detail, including hidden Markov models, recurrent neural networks, and conditional random fields. The course will include practical homework exercises where we will build and experiment with speech processing models. Finally, it will include a sampling of the connections between speech and other modalities like images and text.

Prerequisites: a good background in basic probability, preferably also introductory machine learning.

Expected outcomes:

- Understand and apply tools for analyzing speech time series such as Fourier analysis and dynamic time warping.
- Understand and apply hidden Markov models, Gaussian mixtures, and the EM algorithm for speech problems.
- Understand and apply language models, smoothing techniques, and their application to speech recognition.
- Understand generative and discriminative structured prediction approaches for speech problems.
- Understand and apply modern deep learning tools for speech processing tasks.

100 units

We will discuss classic results and recent advances in statistical learning theory (mostly under the agnostic PAC model), touch on computational learning theory, and also explore the relationship with stochastic optimization and online regret analysis. Our emphasis will be on concept development and on obtaining a rigorous quantitative understanding of machine learning. We will also study techniques for analyzing and proving performance guarantees for learning methods.

Pre-Requisites: Mathematical maturity, as obtain, e.g., in a rigorous analysis course. Discrete Math (specifically combinatorics and asymptotic notation) Probability Theory Introduction to Machine Learning Algorithms; Basic Complexity Theory (NP-Hardness) Familiarity with Convex Optimization, Computational Complexity and background in Statistics can be helpful, but is not required.

Specific Topics:

- The Statistical Model (Learning Based on an IID Sample):
- The PAC (Probably Approximately Correct) and Agnostic PAC models.
- Stochastic Optimization
- Cardinality Bounds
- Description Length Bounds
- PAC-Bayes
- Compression Bounds
- The Growth Function and VC Dimension
- VC Subgraph Dimension and Fat Shattering Dimension
- Tight Characterization of Learning in terms of the VC and Fat Shattering Dimensions
- Covering Numbers
- Rademacher Averages, including Local Rademacher Analysis

- Uniform Learning and No-Free Lunch Theorems
- Online Learning, Online Optimization and Online Regret
- The Perceptron Rule and Online Gradient Descent
- Experts and the Winnow Rule
- Bregman Divergence and Online Mirror Descent
- Online to Batch Conversion

- Computational Lower Bounds:
- Computational Hardness of Proper Learning
- Cryptographic Hardness of Learning

- Additional Topics
- Stability Based Analysis
- Boosting: Weak Learning and the Margin Interpretation of Boosting.

Expected outcomes:

- Ability to recognize different learning models and make rigorous statements about learning methods
- Ability to use standard techniques to prove learning guarantees
- Ability to prove lower bounds for learning problems

Course Webpage: https://canvas.uchicago.edu/courses/29926/assignments/syllabus

100 units

The course is aimed at first-year graduate students and advanced undergraduates. The goal of the course is to collect and present important mathematical tools used in different areas of computer science. The course will mostly focus on linear algebra and probability.

We intend to cover the following topics and examples:

- Abstract linear algebra: vector spaces, linear transformations, Hilbert spaces, inner product, Gram-Schmidt orthogonalization, Eigenvalues and eigenvectors, SVD, least squares (under/over-constrained)
- Discrete probability: random variables, Markov, Chebyshev and Chernoff bounds.
- Gaussian variables, concentration inequalities, dimension reduction
- Martingales (time permitting)
- Stochastic Processes (time permitting)

Expected outcomes:

- Ability to write correctly typed rigorous proofs.
- Understanding of various notions of linear algebra in the context of abstract vector spaces.
- Ability to understand and analyze stochastic processes. Familiarity with discrete and continuous random variables and various concentration bounds.

Prerequisites: None

100 units

TTIC 31160 will focus on the application of mathematical models and computer algorithms to studying structure biology, in particular, protein, RNA and DNA molecule structures.

Here is a list of topics that I am going to cover in this course.

- Introduction to molecular structures (1 week)
- Bioinformatics for biological sequence analysis (1 week)
- Algorithms for molecule structure comparison and alignment (1 week)
- Algorithms for protein secondary structure prediction (0.5 week)
- Algorithms for protein tertiary structure prediction (1 week)
- Algorithms for RNA secondary structure prediction (1 week)
- Algorithms for RNA tertiary structure prediction (0.5 week)
- Algorithms for protein docking (1 week)
- Algorithms for protein-protein and protein-RNA interaction prediction (1 week)
- Algorithms for Chromatin structure determination ( 1 week)

There will be both homework and final projects.

Expected outcomes:

- Ability to formulate structure biology problems into a mathematical problem
- Application of advanced optimization algorithms (linear programming, semidefinite programming and graph algorithms) and machine learning models (probabilistic graphical models) to solve important problems in structure bioinformatics
- Mastery of current hot topics in structure bioinformatics
- Ability to conduct semi-independent research in structure bioinformatics

Prerequisites: None

100 units

This course concerned with fundamental techniques in robotics and artificial intelligence (AI), with an emphasis on probabilistic inference, learning, and planning under uncertainty. The course will investigate the theoretical foundations underlying these topics as rigorous mathematical tools that enable solutions to real-world problems drawn broadly from robotics and AI. The course will cover topics that include: Bayesian filtering (Kalman filtering, particle filtering, and dynamic Bayesian networks), simultaneous localization and mapping, planning, Markov decision processes, partially observable Markov decision processes, reinforcement learning, and graphical models.

Expected outcomes:

- Understand the role of and probabilistic techniques for modeling and mitigating uncertainty in dynamic systems
- Demonstrate the ability to derive analytical and nonparametric solutions for recursive Bayesian estimation
- Formulate probabilistic models that represent the problem of robot localization and mapping, and show how these models afford the use of techniques from recursive estimation
- Understand algorithms for planning/search and decision making within deterministic and stochastic domains
- Demonstrate the ability to implement state-of-the-art algorithms for uncertainty mitigation and to apply these techniques to new problems and domains

Prerequisites: Basic familiarity with basic linear algebra; background in probability theory; basic programming experience

100 units

Many problems in machine learning, computer vision, natural language processing, robotics, computational biology, and beyond require modeling complex interactions between large, heterogeneous collections of random variables. Graphical models combine probability theory and graph theory to provide a unifying framework for representing these relationships in a compact, structured form. Probabilistic graphical models decompose multivariate joint distributions into a set of local relationships among small subsets of random variables via a graph. These local interactions result in conditional independencies that afford efficient learning and inference algorithms. Moreover, their modular structure provides an intuitive language for expressing domain-specific knowledge, and facilitates the transfer of modeling advances to new applications.

This graduate-level course will provide a strong foundation for learning and inference with probabilistic graphical models. The course will first introduce the underlying representational power of graphical models, including Bayesian and Markov networks, and dynamic Bayesian networks. Next, the course will investigate contemporary approaches to statistical inference, both exact and approximate. The course will then survey state-of-the-art methods for learning the structure and parameters of graphical models.

- Review of probability theory
- Directed graphical models
- Undirected graphical models
- Conditional random fields
- Temporal models (hidden Markov models)
- Exponential families
- Variable elimination, sum-product
- Belief propagation
- MAP inference
- Variational inference
- Sampling-based inference
- Dual decomposition
- Multivariate Gaussians: Kalman filters and smoothing
- Learning for directed graphical models
- Learning for undirected graphical models
- Learning given unobserved data
- Learning for Gaussian models

Expected outcomes:

- Understand the representation of graphical models, including directed, undirected, and factor graph representations; factorization and Markov properties; and common spatial, temporal, hierarchical, and relational models.
- Develop a solid understanding of exponential families and the related issues of conjugate priors, ML estimation, and parameter estimation in directed and undirected graphical models.
- Demonstrate a familiarity with Gaussian graphical models, including Bayesian networks, Markov random fields, and inference algorithms under these models.
- Understand methods for exact inference, including variable elimination, belief propagation (message passing), and the junction tree algorithm.
- Understand techniques for approximate inference, including variational and Monte Carlo methods (e.g., Gibbs sampling, Rao-Blackwellization, and Metropolis-Hastings).
- Understand techniques for learning the structure and parameters of different families of graphical models both from observed and latent data.

Prerequisites: TTIC 31020 (or equivalent)

100 units

This course will introduce fundamental concepts in natural language processing (NLP). NLP includes a range of research problems that involve computing with natural language. Some are user-facing applications, such as spam classification, question answering, summarization, and machine translation. Others serve supporting roles, such as part-of-speech tagging and syntactic parsing. Solutions draw from machine learning (especially deep learning), algorithms, and linguistics. There is particular interest in structured prediction in which the output structure is a sequence, tree, or sentence.

Topics include:

- words: lexical semantics, distributional representations, clusters, and word embeddings
- sequences: language modeling and smoothing, tasks such as part-of-speech tagging and named-entity recognition, model families like hidden Markov models and conditional random fields, neural sequence encoders such as recurrent and convolutional neural networks
- trees: syntactic parsing, including constituency parsing and dependency parsing, context-free grammars, parsing algorithms
- computational semantics, including compositionality, reference, and shallow semantic parsing
- word alignment and machine translation

Assignments include formal exercises as well as practical exercises involving implementing algorithms and using NLP toolkits.

Expected outcomes:

- Understand key challenges of computing with natural language
- Understand and apply solutions to standard NLP tasks, such as hidden Markov models, conditional random fields, and bidirectional LSTM taggers for sequence labeling
- Be able to implement basic neural network architectures for core NLP tasks using deep learning toolkits
- Be able to derive dynamic programming algorithms to perform inference in structured output spaces, and to analyze their computational properties
- Understand common types of syntactic and semantic analysis, and how they are used in downstream applications
- Recognize and characterize the errors made by NLP systems

Prerequisites: basic knowledge of calculus, linear algebra, and probability; basic programming experience; a machine learning course is recommended but not required.

Course Webpage: https://home.ttic.edu/~kgimpel/teaching/31190-a20/index.html

100 units

This course is meant to serve as an introduction to some basic concepts in information theory and error-correcting codes, and some of their applications in computer science and statistics. We plan to cover the following topics:

- Introduction to entropy and source coding. Some applications of entropy to counting problems.
- Mutual information and KL-divergence. Method of types and hypothesis testing. I-projections and applications.
- Introduction to error-correcting codes. Unique and list decoding of Reed-Solomon and Reed-Muller codes.
- Applications of information theory to lower bounds in computational complexity and communication complexity.

Expected outcomes:

- Familiarity with concepts such as Entropy, Mutual information and KL-divergence.
- Familiarity with source and channel coding.
- Understanding of the method of types and ability to derive large-deviation bounds using information-theoretic concepts.
- Understanding of the notions of unique and list decoding for various codes.

Prerequisites: Discrete probability. Some knowledge of finite-field algebra is required for the part on error-correcting codes but required basics are reviewed in class.

100 units

This course will cover some of the basic theory underlying machine learning and the process of generalizing from data. We will talk about both the PAC model for batch learning (learning over one set of data with the intent of producing a predictor that performs well on new data) and models for learning from feedback over time (online learning). We will discuss important fundamental concepts including overfitting, uniform convergence, formal notions of Occam’s razor, VC-dimension, and regularization, as well as several classic algorithms including the Perceptron algorithm, SVMs, algorithms for combining expert advice, and boosting. We will also discuss limited-feedback (bandit) algorithms, reinforcement learning, connections between learning and game theory, and formal guarantees on privacy. This will be a proof-oriented course: our focus will be on proving performance guarantees for algorithms that aim to generalize from data as well as understanding what kinds of performance guarantees we can hope to prove.

Pre-Requisites:

The main pre-requisite is comfort with a proof-oriented course, having taken some algorithms class, and comfort with basic probabilistic modeling and reasoning. For example, 1000 programs are submitted to a stock-market prediction challenge, and we find that one of those programs has correctly predicted whether the market will go up or down the next week for 10 weeks in a row; should we feel confident that the program is a good one? Comfort with thinking about points and vectors in high-dimensional space is a plus.

Specific Topics Include:

- The PAC (Probably Approximately Correct) batch learning model
- Overfitting and generalization
- Cardinality and description-length bounds: Occam’s razor
- Regularization
- The online mistake bound model
- The Perceptron algorithm
- Hinge-loss and inseparable data
- Kernel functions, SVMs
- Online to batch conversion
- VC-dimension, the growth function, and Sauer’s lemma
- Uniform convergence proofs using ghost samples
- Boosting
- Learning and Game Theory
- Bandit algorithms
- Reinforcement learning
- Differential Privacy
- Additional possible topics: Semi-Supervised Learning, Rademacher Bounds, Limitations on Learning

Expected outcomes:

- Ability to recognize different learning models and make rigorous statements about learning methods
- Ability to use standard techniques to prove learning guarantees
- Ability to think critically about new learning paradigms

100 units

This course is a follow-up to TTIC 31190. It will go into more depth of the fundamentals of natural language processing (NLP) and cover a broader range of applications. Some of the class meetings will be hands-on, guided laboratory-style meetings; a laptop is strongly recommended for these class meetings, but not strictly required.

Topics include:

- grammatical formalisms (CFG, TSG, TAG, and CCG)
- exact and approximate parsing algorithms (CKY, shift-reduce, k-best parsing, cube pruning, etc.)
- logical semantics and semantic parsing
- semantic formalisms (abstract meaning representation, etc.)
- training and decoding criteria for NLP (e.g., minimum Bayes risk)
- unsupervised learning in NLP (EM for HMMs and PCFGs, topic models, Bayesian nonparametrics)
- advanced neural network methods in NLP, including recurrent, recursive, and convolutional networks, encoder-decoder architectures, and attention-based models
- the application of these techniques to important NLP applications, including: textual entailment, dialogue systems, machine translation, question answering, automatic summarization, and coreference resolution

Assignments include formal exercises as well as practical exercises involving implementing algorithms and using NLP and deep learning toolkits.

Expected outcomes:

- Be able to derive dynamic programming algorithms for inference with grammatical formalisms and other structured output spaces, and to analyze their computational properties
- Understand trade-offs of approximate inference algorithms used in NLP and be able to choose algorithms appropriately
- Be able to design generative models for textual data and derive statistical inference algorithms for quantities of interest
- Understand state-of-the-art solutions to key NLP applications, including approaches based on deep learning

Prerequisites: TTIC 31190 or permission of the instructor.

100 units

This course will introduce concepts and techniques for analyzing and learning from unlabeled data. Unsupervised methods are used, for example, for visualization, data generation, and representation learning. The course will motivate settings in which unsupervised methods are needed or useful, and discuss relationships between unsupervised and supervised methods. Topics will include fixed feature representations such as Fourier methods and count-based features; linear and nonlinear dimensionality reduction; clustering; density estimation, mixture models, and expectation maximization; and semi-supervised/ distant-supervision settings. Linear, kernel, and deep neural network-based methods will be included. Assignments will include theoretical and practical exercises.

Prerequisites: a good background in basic probability, linear algebra, TTIC 31020, and familiarity and comfort with programming; or permission of the instructor.

Expected outcomes:

- Understand typical settings where unsupervised methods are used, including visualization, representation, analysis, and generation, and be able to choose relevant methods for a given situation
- Understand how supervised and unsupervised data and methods can be combined.
- Be able to analyze and visualize data using relevant fixed feature representations.
- Understand the motivation and application of dimensionality reduction techniques.
- Understand and be able to apply clustering and density estimation techniques.
- Understand the current state of the art and research landscape in selected areas.
- Develop proficiency in applying relevant techniques to real data in practical settings.

100 units

Introduction to fundamental principles of deep learning. Deep learning systems are evolving rapidly and this course presents up to date material at a conceptual level. The course emphasizes theoretical and intuitive understanding rather than particular programming formalisms.

Topics:

- Information theory as an organizing principle for machine learning and deep learning in particular.
- Deep learning frameworks. The “educational framework” (EDF) witten in directly in NumPy.
- Deep networks for computer vision: Convolutional neural networks (CNNs) and Resnet and the general principles behind them.
- Deep networks for language processing: Recurrent neural networks (RNNs), the Transformer, their applications and the general principles behind them.
- The theory and practice of stochastic gradient descent.
- Regularization and Generalization.
- Generative Adversarial Networks (GANs)
- Variational Autoencoders (VAEs)
- Contrastive Predictive Coding (CPC)
- Energy Based Models
- Reinforcement learning and AlphaZero

Expected outcomes:

- An understanding of the general issues sufficient to guide architecture design and training.
- An ability to read and understand the current research literature in deep learning.

Prerequisites: linear algebra, vector calculus, familiarity with multivariate Gaussian probability distributions and Markov processes.

Course Website: https://mcallester.github.io/ttic-31230

100 units

This course considers problems in perception, navigation, and control, and their systems-level integration in the context of self-driving vehicles through an open-source curriculum for autonomy education that emphasizes hands-on experience. Integral to the course, students will collaborate to implement concepts covered in lecture on a low-cost autonomous vehicle with the goal of navigating a model town complete with roads, signage, traffic lights, obstacles, and citizens. The wheeled platform is equipped with a monocular camera and a performs all processing onboard with a Raspberry Pi 3, and must: follow lanes while avoiding obstacles, pedestrians and other robots; localize within a global map; navigate a city; and coordinate with other robots to avoid collisions. The platform and environment are carefully designed to allow a sliding scale of difficulty in perception, inference, and control tasks, making it usable in a wide range of applications, from undergraduate-level education to research-level problems. For example, one solitary robot can successfully wander the environment using only line detection and reactive control, while successful point-to-point navigation requires recognizing street signs. In turn, sign detections can be “simulated” either by using fiducials affixed to each sign, or it can be implemented using “real” object detection. Realizing more complex behaviors, such as vision-based decentralized multi-robot coordination, poses research-level challenges, especially considering resource constraints. In this manner, the course is well suited to facilitate undergraduate and graduate-level education in autonomy.

The course will be taught in concurrently and in conjunction with classes at the University of Montreal and ETH Zurich, which provides opportunities for interaction and collaboration across institutions.

Topics:

The course covers fundamental techniques in perception, planning, and control for autonomous agents and their integration as part of a complex system. Specific topics include:

- camera geometry, intrinsic/extrinsic calibration;
- minimal sufficient representations for visual tasks;
- nonlinear filtering, including robust localization and mapping (localize in a given map, or create your own map);
- shared control and level 2,3 autonomy;
- complex perception pipelines: (use of) object detection (reading traffic signs) and tracking;
- safety and correctness (navigate intersections); and
- signaling and coordination for distributed robotics (reason about the intent of the other drivers).

Expected outcomes:

- An understanding of fundamental techniques in perception, planning, and control for autonomous agents and their integration as part of a complex system
- Understand how to design these subsystems together, such that performance is maximized, while (shared) resource usage is minimized.
- Familiarity with basic practices of reliable system development, including test-driven and data-driven development.
- Understanding of the tools and the dynamics of software and hardware open-source development. All homework is shared on a central Git repository.
- Familiarity with the constraints related to co-design of integrated systems.

Prerequisites: There are no formal course requirements, though having taken TTIC 31170 – Planning, Learning, and Estimation for Robotics and Artificial Intelligence is desireable. Students must be familiar with the GNU/Linux development environment and are required to have access to a laptop with Ubuntu 16.04 installed.

100 units

Original academic research conducted under guidance of an instructor (normally student’s PhD advisor), directed at making progress towards advancing the state of the art in the chosen area.

Expected outcomes:

- Familiarity with peer-reviewed literature on the chosen topic of focus representing both current state of the art and historical developments.
- Ability to develop a short-, medium- and long-term research plan.
- Improved technical skills relevant to the research topic (examples include: using relevant mathematical tools, developing, documenting and using advanced software, designing and executing user studies, designing and conducting experiments).
- Ability to communicate research progress in regular meetings with advisor and colleagues.
- Ability to describe research results in a technical document at a level and in format appropriate for a submission to a peer-reviewed venue.

100 units

A structured study of literature on a topic chosen in consultation with an instructor (normally student’s PhD advisor)

Expected outcomes:

- Familiarity with peer-reviewed literature on the chosen topic of focus representing both current state of the art and historical developments.
- Ability to comprehend and summarize an article in a brief written form, and articulate its strengths, shortcomings, connections to other related work and to the student’s research agenda.
- Develop and refine the skills needed to conduct independent literature search using different available resources, following reference trees, and prioritizing reading.

Advisor

In-depth involvement in areas of computer science in a research lab, University or business setting. Internship activities and objectives must be related to the student’s program objectives. The Internship may require the student to be out of residence.

(Required enrollment for F-1 CPT internship for some international students).

Advisor’s Consent Required.

Internships do not award credit units, and do not fulfill set degree requirements.

Advisor

Part-Time Internship is an internship that requires minimal time of engagement: no more than 20 hours per week. The student remains in-residence at TTIC.

Advisor approval is required.

Internships do not award credit units, and do not fulfill set degree requirements.