Practicum Ratiocinativae

MEDS 5325: Computational Genomics Practicum (2 Credits), Spring 2016




A practical introduction to computational genomics focussing on methods for processing and analyzing Next Generation Sequencing (NGS) data. Programming topics include an introduction to the Linux command line and elements of Python and R programming. Genomics software tools are covered for sequence read alignment, transcript-expression profiling, and robust assessment of differential gene expression, along with methods for genome assembly, genome variation detection, motif finding, and data visualization. Statistical topics include probability distributions, the central limit theorem, hypothesis testing, linear models, and dimensionality reduction.


Instructor

Michael Duff
Cell & Genome Science Bldg, 400 Farmington Ave
Office: R1260 (Genetics & Genome Sciences Dept cube)
Office hrs: after class or by arrangement
860.970.2283 (txt only please, voice doesn't work)
moduff@gmail.com (please use this rather than uchc address)


Time & Place

Tuesdays 10am-11:50
Cell & Genome Science Bldg, 400 Farmington Ave, Rm R1390 (conference room adjacent to computer server room).


Syllabus

We will be focussing on how to process/analyze Next Generation Sequencing (NGS) data, using real data sets (drawn from students' own projects if possible), and learning what we need to learn as we go in order to get things done!

Our meetings will be primarily devoted to discussion and working through computational examples/case-studies, though at least part of the course will include lectures on practical probability and statistics. We may also draw upon selected video-lectures from Coursera/edX, especially for aspects relating to programming languages. There will be graded homework assignments and exams, and a final project.

Jump-start topics:

  • getting set up on our laptops (macs)
    • terminal
    • text-editor (smultron/aquamacs)
  • command line basics
  • R programming
  • Python programming
  • file formats
    • FASTA
    • BED
    • GTF
    • FASTQ
    • SAM, BAM
  • BedTools
  • Tuxedo tools
    • bowtie, bowtie2, tophat
    • cufflinks, cuffdiff

Some other topics that we could consider:

  • R graphics
  • gene clustering & heatmaps: Gene Cluster and TreeView
  • HPC cluster accounts; maneuvering on a cluster
  • how to make UCSC browser tracks
  • BWA
  • genome assembly: Velvet, Oases
  • variant analysis: freebayes, VCF, Variant Annotation Integrator
  • structural variation: DELLY
  • statistical topics
    • Binomial, Normal, and Poisson distributions
    • Law of Large Numbers & Central Limit Theorem
    • p-values & multiple testing
    • Bayes' theorem, likelihood function (cufflinks revisited)
    • linear models
    • principal components analysis
    • batch effects, SVAseq
  • FPKM vs TPM
  • kallisto & sleuth
  • Bioconductor
  • GO analysis: DAVID, FuncAssociate
  • motif finding: MEME, HOMER
  • ChIP-seq peak calling
  • ...more topics TBD