
MEDS 6498: Special Topics in Biomedical Science, Spring 2016
Machine Learning for Genomics (2 credits)
Instructor
Michael Duff
Cell & Genome Science Bldg, 400 Farmington Ave
Office: R1260 (Genetics & Genome Sciences Dept cube)
Office hrs: after class or by arrangement
860.970.2283 (txt only please, voice doesn't work)
moduff@gmail.com (please use this rather than the UCHC address)
Time & Place
Tuesdays 10:00-11:50 am
Cell & Genome Science Bldg, 400 Farmington Ave, Rm R1390 (conference room adjacent to computer server room).
Topics
- General problems & methods
  - pattern classification: generative vs discriminative approaches
  - model selection: cross-validation, Occam's razor
  - Bayesian perspectives (naive Bayes, empirical Bayes)
  - mixture models, Hidden Markov Models—expectation maximization
- Techniques from computer science
  - dynamic programming
  - suffix trees
  - from hash tables to Bloom filters
- R/Python programming and packages
- Elements of information theory
- Specific example applications: genome assembly using cutting-edge long-read technology, metagenomic analysis, ...
- Experimental design and causal inference (advanced topic)
Format
Small class size (<=5), weekly readings, informal
discussions/presentations, suggested exercises (ungraded, solutions
provided/discussed), and final project.
Resources
 The key resource for the first part of the course is a website
associated with the text "Introduction to Statistical Learning" (ISL)
by James, Witten, Hastie & Tibshirani. A free pdf version of their text
(6th printing, 2015), along with lecture videos, is available here.
The textbook pdf and lecture videos are very strongly recommended.
 R programming
 Python programming
Meeting synopses

2016.01.19
Administrivia, establishing a sense of background/context/goals, preview of coming attractions
(syllabus). Statistical learning, supervised and unsupervised learning, regression vs classification,
curse of dimensionality, bias-variance trade-off, k-nearest neighbors.
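Here is a quick Python sketch of k-nearest neighbors, just to make the idea concrete (the toy data, the scikit-learn dependency, and k = 5 are my own illustrative choices, not part of the reading):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

np.random.seed(0)

# Two 2-D Gaussian clouds standing in for two classes.
X = np.vstack([np.random.normal(0.0, 1.0, size=(50, 2)),
               np.random.normal(2.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

knn = KNeighborsClassifier(n_neighbors=5)   # k = 5 neighbors
knn.fit(X, y)

print(knn.predict([[1.0, 1.0]]))   # class label predicted for a new point
print(knn.score(X, y))             # training-set accuracy (optimistic, of course)
```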
Reading:
 Machine learning
applications in genetics and genomics (Libbrecht & Noble, 2015) pdf
 Our course will begin by following the first four chapters of
"Introduction to Statistical Learning" (ISL) by James, Witten,
Hastie & Tibshirani. A free pdf version of their text (6th printing,
2015), along with lecture videos, is available here. Today's meeting
covers ISL Chapters 1 and 2.
 The Signal
and the Noise, by Nate Silver (for fun).
NFL playoff predictions
Suggested exercises:
Next: Linear regression (Chapter 3 of ISL).

2016.01.26
Other weird examples re geometry in high dimensions. Linear regression: simple linear regression (1 predictor), optimal
coefficients (in the sense of RSS) and their standard errors,
multiple linear regression (matrix formulation, not in the book;
coefficients as orthogonal projection onto the subspace spanned by the
data-matrix column vectors), model selection (forward selection),
qualitative predictors (coding trick), modeling predictor
interactions and nonlinearity.
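To connect the matrix formulation with code, here is a small numpy sketch of the normal equations; the toy data and true coefficients are made up for illustration:

```python
import numpy as np

np.random.seed(1)
n = 100
x1 = np.random.randn(n)
x2 = np.random.randn(n)
y = 2.0 + 3.0 * x1 - 1.5 * x2 + np.random.randn(n)   # true coefficients: 2, 3, -1.5

X = np.column_stack([np.ones(n), x1, x2])             # data matrix with intercept column

# Normal equations: beta_hat = (X'X)^{-1} X'y, i.e. project y onto col(X).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)

# Fitted values are the orthogonal projection of y onto the column space of X,
# so the residuals are orthogonal to every column of X.
y_hat = X @ beta_hat
print(np.allclose(X.T @ (y - y_hat), np.zeros(3)))
```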
Reading:
Read Chapter 4 (Classification) from ISL for next time (& the
lecture videos are strongly recommended).
Suggested exercises:
 ISL Problems 5,6,7, p. 121
 Problem 13 p. 124.
Next: Classification (Chapter 4 of
ISL).
***HEADS UP!!
I'll be out of town during next week's meeting, but hoping to have
my old colleague Mohan Bolisetty (now at JAX) provide a substitute
session in which he describes his (and Gopi's) paper on nanopore
technology and analysis (gauging mutually exclusive splicing in fly),
with particular emphasis on fun with Python:
Determining
exon connectivity in complex mRNAs by nanopore sequencing

2016.02.02
Guest lecturer: Mohan Bolisetty, PhD (JAX)
Varieties of sequencing methodologies and instrumentation. Details of
sequencing, pros/cons of each approach—"in the near future we
should be able to choose the sequencer for the question and no longer
have to fit a question to a sequencer." iPython notebook, reproducible
code for data analysis.
Beaker notebook for coding in multiple languages.

2016.02.09
Classification (ISL Chapter 4). Logistic Regression & Linear Discriminant
Analysis (LDA). Maximum Likelihood for determining Logistic Regression
coefficients, Bayes' rule for LDA. "Bayes-balance" linear system for transcript
localization probabilities given fractionated RNA-seq data. A proposed
application of Multinomial Logistic Regression for predicting
transcript localization.
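A minimal sketch contrasting the two classifiers on the same synthetic data (scikit-learn assumed; nothing here is specific to the transcript-localization application):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

np.random.seed(2)
X = np.vstack([np.random.normal(-1.0, 1.0, size=(100, 2)),
               np.random.normal(1.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

logreg = LogisticRegression().fit(X, y)        # coefficients chosen by maximum likelihood
lda = LinearDiscriminantAnalysis().fit(X, y)   # Gaussian class densities + Bayes' rule

print(logreg.coef_, logreg.intercept_)
print(lda.predict_proba(X[:3]))                         # posterior class probabilities
print(np.mean(logreg.predict(X) == lda.predict(X)))     # the two usually agree closely here
```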
Reading:
Read Chapter 5 (Resampling Methods) and begin Chapter 6 on Model
Selection and Regularization from ISL for next time (& as always the
lecture videos are strongly recommended).
Suggested exercises from Chapter 4 Classification:
 ISL Problems 1,2 p 168, and Problem 6 p. 170
 Problem 10 p. 171.
Next: Cross-validation & the Bootstrap (Chapter 5 of
ISL, some of this should be a review). Model Selection and
Regularization (Chapter 6 of ISL).

2016.02.16
Homework solution for exercise 4.2 (LDA). Naive Bayes. ISL Chapter 5: Cross-validation
(validation set, leave-one-out, k-fold), bias/variance trade-off (variance
of a sum of correlated random variables), cross-validation for
classification problems. Bootstrap (minimum-variance investment
example).
ISL Chapter 6: Subset selection (best, forward & backward greedy
selection), Shrinkage (ridge regression, Lasso).
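A short sketch of k-fold cross-validation applied to ridge and lasso fits (synthetic data and scikit-learn in Python rather than ISL's R labs; the penalty values are arbitrary, not tuned):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import cross_val_score

np.random.seed(3)
n, p = 100, 20
X = np.random.randn(n, p)
beta = np.zeros(p)
beta[:3] = [4.0, -2.0, 1.0]          # only 3 of the 20 predictors truly matter
y = X @ beta + np.random.randn(n)

for model in [Ridge(alpha=1.0), Lasso(alpha=0.1)]:
    scores = cross_val_score(model, X, y, cv=5)    # 5-fold CV, default R^2 scoring
    print(type(model).__name__, scores.mean().round(3))
```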
Reading (for next week):
Chapter 8 (Decision Trees)—lecture videos strongly recommended.
Suggested exercises:
 ISL Problems 5.1 (did this in class), 5.2
 Mimic the explicit steps of the Lab in Section 5.3 in detail using RStudio
 Mimic the explicit steps of Lab 6.5 in detail
Next:
Chapter 8 of ISL—Decision trees, bagging, boosting, and random
forests.
Tentative plan for the rest of the term (many topics lie outside ISL):
 Feb 23—Decision trees, bagging/boosting, random forests
 Mar 01—Mixture models; Hidden Markov Models (HMMs)
 Mar 08—HMMs part 2, Expectation Maximization (EM)
 Mar 15—Spring Break!
 Mar 22—Perceptrons, Neural Networks
 Mar 29—Support Vector Machines (SVMs)
 Apr 05—Deep Learning architectures and training
 Apr 12—Information Theory
 Apr 19—TBD
 Apr 26—Project presentations

2016.02.23
Decision Trees and Ensemble methods (ISL Chapter 8).
(Regression) Recursive binary splitting, and tree pruning.
(Classification) Gini Index, entropy, node purity.
Bagging (OOB error estimation). Proof that only about 2/3 of the
data appears in a bootstrapped sample. Random Forests.
Boosting. AdaBoost algorithm.
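As a quick numerical companion to the bootstrap result above, here is a simulation showing that roughly 2/3 (more precisely 1 - 1/e, about 0.632) of the observations appear in a bootstrap sample; the sample size and number of trials are arbitrary:

```python
import numpy as np

np.random.seed(4)
n, trials = 1000, 200
fractions = []
for _ in range(trials):
    idx = np.random.randint(0, n, size=n)        # draw n indices with replacement
    fractions.append(len(np.unique(idx)) / n)    # fraction of distinct observations drawn

print(np.mean(fractions))        # close to 0.632 in practice
print(1 - (1 - 1 / n) ** n)      # exact probability an observation appears, finite n
print(1 - np.exp(-1))            # limiting value, 1 - 1/e
```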
Further resources for boosting:
Reading (for next week):
Next:
Hidden Markov Models (HMMs).

2016.03.01
Dynamic programming examples: (1) number of paths from
start to goal in a gridworld, (2) shortest such path computed
using DP in a forward sense from the start, (3) shortest path
computed using DP in a backward sense starting from the goal state.
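A small Python sketch of the first example, counting paths in a grid by dynamic programming (the grid size and the right/down move set are my own illustrative choices):

```python
# Count monotone paths (moves only right or down) from the start corner
# to the goal corner of a rows x cols grid of positions.
def count_paths(rows, cols):
    # paths[r][c] = number of paths from (0, 0) to (r, c)
    paths = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if r == 0 or c == 0:
                paths[r][c] = 1                       # only one way along an edge
            else:
                paths[r][c] = paths[r - 1][c] + paths[r][c - 1]
    return paths[rows - 1][cols - 1]

print(count_paths(4, 4))   # 20 paths on a 4 x 4 grid, i.e. C(6, 3)
```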
Durbin Chap 3:
Markov chains. Alternative Markov chain models for CpG-island regions
and non-island regions. Log-odds ratio. Hidden Markov Models for the CpG island
problem. Viterbi algorithm for most-probable hidden state sequence
given an emission sequence. Forward algorithm for probability of an
emission sequence. Backward algorithm and the posterior probability
distribution for a particular hidden state.
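Here is a bare-bones Viterbi sketch for a toy two-state HMM in the spirit of the CpG-island example; the transition and emission probabilities are invented for illustration and are not the values from Durbin or from class:

```python
import numpy as np

states = ["island", "background"]
start = np.log([0.5, 0.5])
trans = np.log([[0.8, 0.2],                    # P(next state | island)
                [0.1, 0.9]])                   # P(next state | background)
emit = np.log([[0.1, 0.4, 0.4, 0.1],           # P(A, C, G, T | island): CG-rich
               [0.25, 0.25, 0.25, 0.25]])      # P(A, C, G, T | background): uniform
alphabet = {"A": 0, "C": 1, "G": 2, "T": 3}

def viterbi(seq):
    obs = [alphabet[ch] for ch in seq]
    n_states, T = len(states), len(obs)
    V = np.full((n_states, T), -np.inf)        # V[k, t] = best log-prob ending in state k at t
    back = np.zeros((n_states, T), dtype=int)
    V[:, 0] = start + emit[:, obs[0]]
    for t in range(1, T):
        for k in range(n_states):
            scores = V[:, t - 1] + trans[:, k]
            back[k, t] = np.argmax(scores)
            V[k, t] = scores[back[k, t]] + emit[k, obs[t]]
    # Trace back the most probable hidden-state path.
    path = [int(np.argmax(V[:, -1]))]
    for t in range(T - 1, 0, -1):
        path.append(back[path[-1], t])
    return [states[k] for k in reversed(path)]

print(viterbi("ATATCGCGCGCGATAT"))
```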
Reading: For next time, read this excerpt about EM, from Tom Mitchell's
book "Machine Learning" (1997).
Next: Review of HMMs. The Expectation Maximization (EM) algorithm.

2016.03.08
HMM review in the context of the simplest example possible:
3 questions, dynamic programming for answering them.
Expectation Maximization (EM): simple case of a two-Gaussian
mixture model. Deriving the E-step, stating the M-step.
EM general theory. How this dictates the E- and M-steps
for the two-Gaussian example.
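A bare-bones EM sketch for a two-component, one-dimensional Gaussian mixture (the data, the starting values, and the fixed number of iterations are all illustrative choices):

```python
import numpy as np

np.random.seed(5)
data = np.concatenate([np.random.normal(-2.0, 1.0, 150),
                       np.random.normal(3.0, 1.0, 100)])

# Initial guesses: mixing weight of component 1, the two means, the two variances.
w, mu, var = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])

def normal_pdf(x, mean, variance):
    return np.exp(-(x - mean) ** 2 / (2 * variance)) / np.sqrt(2 * np.pi * variance)

for _ in range(50):
    # E-step: responsibility of component 1 for each data point.
    p0 = (1 - w) * normal_pdf(data, mu[0], var[0])
    p1 = w * normal_pdf(data, mu[1], var[1])
    gamma = p1 / (p0 + p1)
    # M-step: re-estimate the parameters from the responsibilities.
    w = gamma.mean()
    mu = np.array([np.sum((1 - gamma) * data) / np.sum(1 - gamma),
                   np.sum(gamma * data) / np.sum(gamma)])
    var = np.array([np.sum((1 - gamma) * (data - mu[0]) ** 2) / np.sum(1 - gamma),
                    np.sum(gamma * (data - mu[1]) ** 2) / np.sum(gamma)])

print(mu, var, w)   # should land near means -2 and 3, variances near 1, weight near 0.4
```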
Reading (for next time):
Next: Perceptrons, neural nets,
backpropagation.

2016.03.22
Biological neurons and networks in the brain. The perceptron.
Single-unit limitations (XOR). Arbitrary Boolean functions can
be realized with 2 layers. Perceptron training rule.
Delta rule, gradient descent
in weight space on the error over the training set. Chain rule.
Stochastic gradient descent. Multilayer networks. The sigmoid
function and its derivative. The main ideas behind the
backpropagation algorithm and its derivation.
Regularization methods to control for overfitting.
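A tiny sketch of the perceptron training rule on a linearly separable toy problem (learning OR rather than XOR; the learning rate and number of epochs are arbitrary):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])            # OR is linearly separable (unlike XOR)

w = np.zeros(2)
b = 0.0
eta = 0.1                             # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        pred = 1 if np.dot(w, xi) + b > 0 else 0
        # Perceptron rule: w <- w + eta * (target - prediction) * x
        w += eta * (target - pred) * xi
        b += eta * (target - pred)

print(w, b)
print([1 if np.dot(w, xi) + b > 0 else 0 for xi in X])   # reproduces OR
```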
Reading for next week:
Support Vector Machines,
Chapter 9 from Introduction to Statistical Learning (2015)
Next: Support Vector Machines

2016.03.29
Hyperplane math review. Separating hyperplanes. Max-margin classifiers
(for linearly separable classes). Support vector classifiers (soft
margins, for non-separable classes) via solving a convex quadratic
programming problem. Support vector machines = support vector
classifiers + kernels (produces nonlinear separating boundaries).
SVM hacks for more than two classes.
Connection between logistic regression and support vector machines.
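A hedged scikit-learn sketch contrasting a linear support vector classifier with an RBF-kernel SVM on synthetic "two moons" data (the C and gamma values are arbitrary, not tuned):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

linear_svc = SVC(kernel="linear", C=1.0).fit(X, y)        # soft-margin support vector classifier
rbf_svm = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)   # + kernel trick = SVM, nonlinear boundary

print("linear accuracy:", linear_svc.score(X, y))
print("RBF accuracy:   ", rbf_svm.score(X, y))
print("support vectors:", rbf_svm.n_support_)             # counts per class
```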
Reading for next week:
 I'm still trying to find one really good survey paper... For now, I'm suggesting this
Review paper,
which appeared in Nature in 2015.
 Also, you might try poking around this site, which includes a
reading list and tutorials.
 (For fun) Google DeepMind and publications.

AlphaGo.
Next: Deep Learning

2016.04.05
Review of multilayer neural networks and weight training
via stochastic gradient descent. Autoencoder example,
representation learning, implicit assumption that natural
signals are compositional hierarchies. Deep learning history and
visual neuroscience inspiration. Convolutional Networks (ConvNets):
local feature maps ("filterbanks") with shared weights and
convolutional tilings, max pooling (downsampling), Rectified
Linear Units (ReLU). GPU computing (NVIDIA Tesla K80, CUDA
software, deep-learning libraries). Recurrent neural networks
(and their unwinding).
DeepBind paper—beyond PWMs for
finding transcription-factor and RBP binding sites.
Oxford Nanopore, basic technology and recent update of
the basecaller from an HMM to a deep-learning architecture
that includes LSTM recurrent layers.
My slides for today's lecture.
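For reference, a minimal ConvNet sketch in Keras showing the filter-bank / ReLU / max-pooling pattern described above (Keras is an assumed dependency; the input shape and layer sizes are arbitrary and unrelated to DeepBind or the nanopore basecaller):

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),                       # e.g. small grayscale images
    layers.Conv2D(16, kernel_size=3, activation="relu"),   # local shared-weight filter bank
    layers.MaxPooling2D(pool_size=2),                      # downsampling by max pooling
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),                # class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()   # training would proceed with model.fit(X, y, ...)
```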
Reading for next week: TBD
 Here's an excerpt
from Claude Shannon's famous 1948 paper. (Astounding biography).
 A much shorter, more readable Introduction by Zoubin
Ghahramani (Cambridge Machine Learning Group).
Next: Information Theory

2016.04.12
Examples of noisy communication channels. Binary symmetric channel
model. Repetition codes. R_3 example. Optimal decoder is majority
vote (proof). Reduced information transfer rate. Block codes, the
(7,4) Hamming code. Parity bits (Venn circles). Generator matrix
form. Syndrome decoding for the Hamming code. Matrix form of
syndrome decoding. Higher rate than for R_3. What is achievable
in the rate/bit-error plane? Shannon's noisy channel coding theorem.
Entropy, joint entropy, conditional entropy, mutual information.
Shannon's source coding theorem. Kullback-Leibler divergence
(not a valid metric, but frequently used in machine learning to
gauge distance between two probability distributions). Jensen-Shannon
(symmetrized) divergence, whose square root is a valid metric.
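A small numpy sketch of the (7,4) Hamming code with syndrome decoding; this systematic generator matrix is one common choice and may differ from the form used in class:

```python
import numpy as np

P = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]])
G = np.hstack([np.eye(4, dtype=int), P])          # generator matrix, 4 x 7
H = np.hstack([P.T, np.eye(3, dtype=int)])        # parity-check matrix, 3 x 7

def encode(message4):
    return (message4 @ G) % 2                     # 4 data bits -> 7-bit codeword

def decode(received7):
    syndrome = (H @ received7) % 2                # nonzero syndrome pinpoints the error
    if syndrome.any():
        # The syndrome equals the column of H at the error position; flip that bit.
        err_pos = int(np.argmax((H.T == syndrome).all(axis=1)))
        received7 = received7.copy()
        received7[err_pos] ^= 1
    return received7[:4]                          # systematic code: data bits come first

msg = np.array([1, 0, 1, 1])
cw = encode(msg)
cw_noisy = cw.copy()
cw_noisy[5] ^= 1                                  # flip one bit in the channel
print(decode(cw_noisy))                           # recovers [1 0 1 1] despite the error
```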
Next: No class meeting next week (April
19). Instead, I'll be available throughout the week to provide
feedback on your ideas for presentations. I should be generally available if
you catch me in my cube, but email me to arrange a proper meeting
time if you prefer.
***HEADS UP!!
Presentation day (20-25 minutes for each of you) is April 26, two weeks from today.
