Fall 2012, Analyzing High Throughput Sequencing Data


The purpose of this course is to introduce students to the various applications of high-throughput sequencing, including ChIP-Seq, RNA-Seq, SNP calling, metagenomics, de novo assembly and others. The course material will concentrate on presenting complete data analysis scenarios for each of these application domains and will introduce students to a wide variety of existing tools and techniques. We expect that by the end of the course students will:

  • understand common bioinformatics data formats and standards
  • become familiar with the practice of analyzing short-read sequencing data from various instruments:
    • Illumina HiSeq sequencer
    • ABI SOLiD sequencer
    • Roche 454 platforms
  • develop the computationally oriented thinking necessary to take on large-scale data analysis projects
  • understand the data analysis principles behind methodologies such as:
    • short read and long read alignments
    • ChIP-Seq analysis and peak calling
    • interval query and manipulation
    • SNP calling and genomic variation detection
    • genome assembly with open source tools
    • metagenomics analysis
  • filter, extract and combine data with scripting languages
  • automate tasks with shell scripts to create reusable data pipelines
  • plot and visualize results with R and other packages

A laptop with enough battery power for 25 minutes of work may be required to perform data analysis tasks in class. Only Mac OS X (Tiger/Leopard) and Linux operating systems are supported.


Practical data analysis for life scientists 
BMMB 597D - Bio Data Analysis (2 cr.)
Schedule #398704
Tuesday/Thursday 2:30-3:20 in 120 Thomas Building
Limit of 25 students.   
Office hours: MW 2-3pm 502B Wartik

Lecture Notes

Lectures will appear below as they are presented. Homework assignments are included in the handouts.

Grading and Homework

The final grade will be an average of the grades obtained on homework and a project. Please refer to the information in the first lecture. Homework will be handed out during each lecture in the form of exercises that will need to be turned in at the beginning of each week.

We want to emphasize that the primary goal of this course is to improve students' ability to handle and interpret data sets. Therefore the evaluation process is relative to each student's initial aptitude. We aim to focus on developing permanent skills and talents that are not just immediately useful but also provide the foundation for a further, more in-depth understanding of informatics in general.

Created by Istvan Albert • Last updated on Tuesday, March 31, 2015 • Site powered by PyBlue