Practicum Ratiocinativae

MEDS 6498: Special Topics in Biomedical Science---Spring 2016

Machine Learning for Genomics (2 credits)


Michael Duff
Cell & Genome Science Bldg, 400 Farmington Ave
Office: R1260 (Genetics & Genome Sciences Dept cube)
Office hrs: after class or by arrangement
860.970.2283 (txt only please, voice doesn't work) (please use this rather than uchc address)

Time & Place

Tuesdays 10am-11:50
Cell & Genome Science Bldg, 400 Farmington Ave, Rm R1390 (conference room adjacent to computer server room).


  • General problems & methods
    • pattern classification: generative vs discriminative approaches
    • model selection: cross validation, Occam's razor
    • Bayesian perspectives (naive Bayes, empirical Bayes)
    • Mixture models, Hidden Markov Models—expectation maximization
  • Techniques from computer science
    • dynamic programming
    • suffix trees
    • from hash tables to Bloom filters
    • R/python programming and packages
  • Elements of Information theory
  • Specific example applications: genome assembly using cutting-edge long-read technology, metagenomic analysis,...
  • Experimental design and causal inference (advanced topic)


Small class size (<=5), weekly readings, informal discussions/presentations, suggested exercises (ungraded, solutions provided/discussed), and final project.


Meeting synopses

  • 2016.01.19

    Administrivia, establishing a sense of background/context/goals, preview of coming attractions (syllabus). Statistical learning, supervised and unsupervised learning, regression vs classification, curse of dimensionality, bias-variance trade off, k-nearest neighbors.


    • Machine learning applications in genetics and genomics (Libbrecht & Noble, 2015) pdf

    • Our course will begin by following the first four chapters of "Introduction to Statistical Learning" (ISL) by Hastie & Tibshirani. A free pdf version of their text (6th printing, 2015), along with lecture videos, is available here. Today's meeting covers ISL Chapters 1 and 2.

    • The Signal and the Noise, by Nate Silver (for fun). NFL playoff predictions

    Suggested exercises:

    Next: Linear regression (Chapter 3 of ISL).

  • 2016.01.26

    Other weird examples re geometry in high-dimensions. Linear regression: simple linear regression (1 predictor), optimal coefficients (in the sense of RSS) and their standard errors, multiple linear regression (matrix formulation---not in the book, coefficients as orthogonal projection onto subspace spanned by data-matrix column vectors), model selection (forward selection), qualitative predictors (coding trick), modeling predictor interactions and nonlinearity.


    Read Chapter 4 (Classification) from ISL for next time (& the lecture videos are strongly recommended).

    Suggested exercises:

    • ISL Problems 5,6,7, p. 121
    • Problem 13 p. 124.

    Next: Classification (Chapter 4 of ISL).

    ***HEADS UP!!

    I'll be out of town during next week's meeting, but hoping to have my old colleague Mohan Bolisetty (now at JAX) provide a substitute session in which he describes his (and Gopi's) paper on nanopore technology and analysis (gauging mutually exclusive splicing in fly), with particular emphasis on fun with Python:

    Determining exon connectivity in complex mRNAs by nanopore sequencing

  • 2016.02.02

    Guest lecturer: Mohan Bolisetty, PhD (JAX)

    Varieties of sequencing methodologies and instrumentation. Details of sequencing, pros/cons of each approach—"in the near future we should be able choose the sequencer for the question and no longer have to fit a question to a sequencer." iPython notebook, reproducible code for data analysis. Beaker notebook for coding in multiple languages.

  • 2016.02.09

    Classification (ISL Chapter 4). Logistic Regression & Linear Discriminant Analysis (LDA). Maximum Likelihood for determining Logistic Regression coefficients, Bayes rule for LDA. "Bayes-balance" linear system for transcript localization probabilities given fractionated RNAseq data. A proposed application of Multinomial Logistic Regression for predicting transcript localization.


    Read Chapter 5 (Resampling Methods) and begin Chapter 6 on Model Selection and Regularization from ISL for next time (& as always the lecture videos are strongly recommended).

    Suggested exercises from Chapter 4 Classification:

    • ISL Problems 1,2 p 168, and Problem 6 p. 170
    • Problem 10 p. 171.

    Next: Cross-validation & the Bootstrap (Chapter 5 of ISL, some of this should be a review). Model Selection and Regularization (Chapter 6 of ISL).

  • 2016.02.16

    Homework solution for exercise 4.2 (LDA). Naive Bayes. ISL Chapter 5: Cross-validation (validation set, leave-one-out, k-fold), bias/variance tradeoff (variance of a sum of correlated random variables), cross-validation for classsification problems. Bootstrap (min-variance investment example). ISL Chapter 6: Subset selection (best, forward & backward greedy selection), Shrinkage (ridge regression, Lasso).

    Reading (for next week):

    Chapter 8 (Decision Trees)—lecture videos strongly recommended.

    Suggested exercises:

    • ISL Problems 5.1 (did this in class), 5.2
    • Mimic the explicit steps of the Lab in Section 5.3 in detail using RStudio
    • Mimic the explicit steps of Lab 6.5 in detail

    Next: Chapter 8 of ISL—Decision trees, bagging, boosting, and random forests.

    Tentative plan for the rest of the term (many topics lie outside ISL):

    • Feb 23—Decision trees, bagging/boosting, random forests
    • Mar 01—Mixture models; Hidden Markov Models (HMMs)
    • Mar 08—HMMs part 2, Expectation Maximization (EM)
    • Mar 15—Spring Break!
    • Mar 22—Perceptrons, Neural Networks
    • Mar 29—Support Vector Machines (SVMs)
    • Apr 05—Deep-Learning architectures and training
    • Apr 12—Information Theory
    • Apr 19—TBD
    • Apr 26—Project presentations

  • 2016.02.23

    Decision Trees and Ensemble methods (ISL Chapter 8). (Regression) Recursive binary splitting, and tree pruning. (Classification) Gini Index, entropy, node purity. Bagging (OOB error estimation). Proof that only 2/3 of data appears in boostrapped sample. Random Forests. Boosting. AdaBoost algorithm.

    Further resources for boosting:

    Reading (for next week):

    Next: Hidden Markov Models (HMMs).

  • 2016.03.01

    Dynamic programming examples: (1) number of paths from start to goal in a gridworld, (2) shortest such path computed using DP in a forward sense from the start, (3) shortest path computed using DP in a backward sense starting from the goal state. Durbin Chap 3: Markov chains. Alternative Markov chain models for CpG island regions and non-regions. Log-odds ratio. Hidden Markov Models for the CpG island problem. Viterbi algorithm for most-probable hidden state sequence given an emission sequence. Forward algorithm for probability of an emission sequence. Backward algorithm and the posterior probability distribution for a particular hidden state.

    Reading: For next time, read this excerpt about EM, from Tom Mitchell's book "Machine Learning" (1997).

    Next: Review of HMMs. The Expectation Maximization (EM) algorithm.

  • 2016.03.08

    HMM review in context of simplest example possible: 3 questions, dynamic programming for answering them. Expectation Maximization (EM): simple case of 2-Gaussians mixture model. Deriving the E-step, stating the M-step. EM general theory. How this dictates the E&M steps for the 2-Gaussians example.

    Reading (for next time):

    Next: Perceptrons, Neural-nets, backpropagation.

  • 2016.03.22

    Biological neurons and networks in the brain. The perceptron. Single-unit limitations (XOR). Arbitary Boolean functions can be realized with 2 layers. Perceptron training rule. Delta rule, gradient descent in weight space on the error over the training set. Chain rule. Stochastic gradient ascent. Multi-layer networks. The sigmoid function and its derivative. The main ideas behind the backpropagation algorithm and its derivation. Regularization methods to control for overfitting.

    Reading for next week: Support Vector Machines, Chapter 9 from Introduction to Statistical Learning (2015)

    Next: Support Vector Machines

  • 2016.03.29

    Hyperplane math review. Separating hyperplanes. Max-margin classifiers (for linearly-separable classes). Support vector classifiers (soft margins---for non separable classes) via solving a convex quadratic programming problem. Support vector machines = support vector classifiers + kernels (produces nonlinear separating boundaries). SVM hacks for more than two classes. Connection between logistic regression and support vector machines.

    Reading for next week:

    • I'm still trying to find one really good survey paper... For now, I'm suggesting this Review paper, which appeared in Nature in 2015.
    • Also, you might try poking around this site, which includes a reading list and tutorials.
    • (For fun) Google DeepMind and publications.
    • AlphaGo.

    Next: Deep Learning

  • 2016.04.05

    Review of multi-layer neural networks and weight-training via stochastic gradient descent. Autoencoder example, representation learning, implicit assumption that natural signals are compositional hierarchies. Deep learning history and visual neuroscience inspiration. Convolutional Networks (ConvNets): local feature maps ("filter-banks") with shared weights and convolutional tilings, max pooling (downsampling), Rectified Linear Units (ReLU). GPU computing (NVidia Tesla k80, CUDA software, deep-learning libraries). Recurrent neural networks (and their unwinding). DeepBind paper—beyond PWMs for finding transcription-factor and RBP binding sites. Oxford Nanopore, basic technology and recent update of the basecaller from HMM to a deep-learning architecture that includes LSTM recurrent layers.

    My slides for today's lecture.

    Reading for next week: TBD

    • Here's an excerpt from Claude Shannon's famous 1948 paper. (Astounding biography).
    • A much-shorter, more-readable Introduction by Zoubin Ghahramani (Cambridge Machine Learning Group).

    Next: Information Theory

  • 2016.04.12

    Examples of noisy communication channels. Binary symmetric channel model. Repetition codes. R_3 example. Optimal decoder is majority vote (proof). Reduced information transfer rate. Block codes, the (7,4) Hamming code. Parity bits (Venn circles). Generator matrix form. Syndrome decoding for the Hamming code. Matrix form of syndrome decoding. Higher rate than for R_3. What is achievable in the rate/bit-error plane? Shannon's noisy channel coding theorem. Entropy, joint entropy, conditional entropy, mutual entropy. Shannon's source coding theorem. Kullback Leibler Divergence (not a vailid metric, but frequently used in machine learning to gauge distance between two probability distributions). Jensen-Shannon (symmetrized) Divergence and its metric.

    Next: No class meeting next week (April 19). Instead, I'll be available throughout the week to provide feedback on your ideas for presentation. I should be generally available if you catch me in my cube, but email me to ensure a proper meeting time if you prefer.

    ***HEADS UP!!

    Presentation day (20-25 minutes for each of you) is April 26— 2 Weeks from today.