
MEDS 5325: Computational Genomics Practicum (2 Credits) Spring 2016
A practical introduction to computational genomics focussing on methods for processing/analyzing
Next Generation Sequencing (NGS) data. Programming: Introduction to the Linux command line,
elements of Python and R programming. Genomics software tools for performing sequence read alignments,
transcript expression profiling, and robust procedures for gauging differential gene expression.
Methods for genome assembly, genome variation detection, motif finding, and data visualization.
Statistical topics include: probability distributions, central limit theorem, hypothesis testing,
linear models, and dimensionality reduction.
Instructor
Michael Duff
Cell & Genome Science Bldg, 400 Farmington Ave
Office: R1260 (Genetics & Genome Sciences Dept cube)
Office hrs: after class or by arrangement
860.970.2283 (txt only please, voice doesn't work)
moduff@gmail.com (please use this rather than uchc address)
Time & Place
Tuesdays 10am-11:50
Cell & Genome Science Bldg, 400 Farmington Ave, Rm R1390 (conference
room adjacent to computer server room).
Syllabus
We will be focussing on how to process/analyze Next Generation Sequencing
(NGS) data, using real data sets (drawn from students' own projects
if possible), and learning what we need to learn as we go in order to get things done!
Our meetings will be primarily devoted to discussion and working through
computational examples/case studies, though at least part of the course
will include lectures on practical probability and statistics.
We may also draw upon selected video lectures from Coursera/edX,
especially for aspects relating to programming languages.
There will be graded homework assignments and exams, and a final project.
Jumpstart topics:
 getting set up on our laptops (Macs)
 terminal
 text editor (Smultron/Aquamacs)
 command line basics
 R programming
 Python programming
 file formats
     FASTA
     BED
     GTF
     FASTQ
     SAM, BAM
 BedTools
 Tuxedo tools
     bowtie, bowtie2, tophat
     cufflinks, cuffdiff
Some other topics that we could consider:
 R graphics
 gene clustering & heatmaps: Gene Cluster and TreeView
 HPC cluster accounts; maneuvering on a cluster
 how to make UCSC browser tracks
 BWA
 genome assembly: Velvet, Oases
 variant analysis: freeBayes, VCF, Variant Annotation Integrator
 structural variation: DELLY
 statistical topics
     Binomial, Normal, and Poisson distributions
     Law of Large Numbers & Central Limit Theorem
     p-values & multiple testing
     Bayes' theorem, likelihood function (cufflinks revisited)
     linear models
     principal components analysis
 batch effects, SVAseq
 FPKM vs TPM
 kallisto & sleuth
 Bioconductor
 GO analysis: DAVID, FuncAssociate
 motif finding: MEME, HOMER
 ChIP-seq peak calling
 ...more topics TBD
