Welcome to Yu Zhang's Home Page

Welcome to my Site

Hi, my name is Yu Zhang. I am an associate professor of Statistics at Penn State University, University Park, right in the middle of Pennsylvania.

I received my Ph.D. from the University of Southern California, where I worked with Michael S. Waterman from 2000 to 2004. After graduation, I spent two years at Harvard University as a postdoctoral researcher with Jun S. Liu. My hometown is Beijing, China, a city that you should visit sometime.


My Research

My current research focuses on developing computational methods to solve problems in molecular biology. I have been working on DNA sequence analysis, gene evolution, and genetics.

Some of my publications and software packages are listed below; a complete list of publications can be found in my Curriculum Vitae:

Zhang Y (2011) A Novel Bayesian Graphical Model for Genome-Wide Multi-SNP Association Mapping. Genet Epi, 36:36-37. Software: BEAM3

Zhang Y, Zhang J, Liu JS (2011) Block-based Bayesian epistasis association mapping with application to WTCCC type 1 diabetes data. Ann Appl Stat, 5:2052-2077. Software: BEAM2

Zhang Y and Liu JS (2011) Fast and accurate approximation to significance tests in genome-wide association studies. J Am Stat Assoc, 106:846-857. Software: GPASS

Zhang Y (2011) Bayesian epistasis association mapping via SNP imputation. Biostatistics, 12:211-222. Software: BEAMimpute

Chen KB and Zhang Y (2010) A varying threshold method for ChIP peak-calling using multiple sources of information. Bioinformatics, 26:i504-i510. Software: PASS2 (32bit) (64bit) (Source)

Zhang Y, Song GT, Vinar T, Green ED, Siepel A, Miller W (2009) Evolutionary history reconstruction for mammalian complex gene clusters. J Comp Biol, 16:1-20.

Zhang Y (2008) Poisson approximation for significance in genome-wide ChIP-chip tiling arrays. Bioinformatics. Software: PASS

Zhang Y, Song GT, Vinar T, Green ED, Siepel A, Miller W (2008) Reconstructing the evolutionary history of complex human gene clusters. RECOMB08.

Zhang Y (2008) Tree-guided Bayesian inference of population structures. Bioinformatics 24:965-971. Software: TIPS

Zhang Y, Liu J (2007) Bayesian inference of epistatic interactions in case-control studies. Nat Genet 39:1167-1173. Software: BEAM (source)

Zhang Y, Niu T, Liu J (2006) A coalescence-guided hierarchical Bayesian method for haplotype inference. Am J Hum Genet 79:313-322. Software: CHB

Valouev A*, Zhang Y*, Schwartz DC, Waterman MS (2006) Refinement of optical map assemblies. Bioinformatics 22:1217-1224.
(* equal contribution).

Zhang Y, Waterman MS (2005) An Eulerian path approach to local multiple alignment for DNA sequences. PNAS 102:1285-1290. Software: EulerAlign


List of Software:

  • A set of software for disease association mapping:

  •    BEAM SNP-SNP interaction association mapping; assumes independence or first-order dependence between SNPs. Source code.
  •    BEAM2 SNP-SNP interaction association mapping based on SNP-block models; infers both SNP associations and SNP block structures. Source code.
  •    BEAM3 SNP-SNP interaction association mapping based on graphical models; infers the disease-SNP graph and automatically accounts for linkage disequilibrium. This package also implements the method in Zhang et al. (2014) for testing rare variants. Source code (compiling requires the GNU Scientific Library).
  •    BEAMimpute Improves upon BEAM2 by imputing untyped SNPs from a reference sample and testing for marginal and interaction associations.
  • A set of software for ChIP peak calling:

  •    PASS Peak calling in ChIP data based on Poisson de-clumping; controls FWER and FDR.
  •    PASS2 Improves upon PASS by letting the user provide a prior distribution over where they believe the protein may bind, which can improve power while still maintaining a desired FDR or FWER. Source code is available here.
  •    GPASS Detects SNP-disease associations in case-control studies; controls FWER and FDR, adjusting for dependence/linkage disequilibrium.
  •    dCaP Joint peak caller and differential binding detector for ChIP-Seq data in multiple samples.
  •    IDEAS 2D genome segmentation for detecting functional elements and epigenomic variation/conservation across many cell types.

  • More software for genetic data analysis:

  •    DBM Dynamic Bayesian Markov model for genotype calling, haplotype inference, de novo inference of population structure and local admixture for next-gen sequencing data.
  •    TIPS Tree-based Bayesian method for detecting subtle population structures.
  •    CHB Coalescence-guided Bayesian inference of haplotypes from genotype data.
  • Software for multiple sequence alignment:

  •    EulerAlign Alignment of DNA sequences using Eulerian graphs. The method is ideal for large numbers of short sequences, such as many thousands of sequences about 1 kb long, and performs both global and local alignments.