Venue: Seoul National University College of Medicine
Daehwan Kim, Ph.D.
Center for Computational Biology
McKusick-Nathans Institute of Genetic Medicine
Johns Hopkins University
Baltimore, MD, USA
Daehwan Kim is a postdoctoral research fellow in the McKusick-Nathans Institute of Genetic Medicine at Johns Hopkins University. He received his Ph.D. in Computer Science from the University of Maryland, College Park, in 2013 under the supervision of Steven Salzberg, now Director of the Center for Computational Biology at JHU. Dr. Kim's expertise is in algorithms and high-performance computing approaches for next-generation sequencing (NGS) analysis, with a focus on sequence alignment and fusion gene/transcript discovery. He is the main developer of several popular software tools, including TopHat2, HISAT, HISAT2, and TopHat-Fusion. He has also been involved in the development of Bowtie2 and Cufflinks.
Homepage: http://www.ccb.jhu.edu/people/infphilo/
Abstract of the Special Lecture
Graph-based alignment of NGS reads to a population of human genomes
In his presentation, he will first describe two popular alignment programs: (1) Bowtie2 for DNA-seq reads and (2) TopHat2 for RNA-seq reads.
He will then focus on his newer alignment programs, HISAT and its successor, HISAT2.
Since the introduction of next-generation sequencing (NGS) technologies, multiple large-scale human sequencing projects have been launched, including the 1000 Genomes Project, GTEx, and GEUVADIS. These projects have already yielded a large and growing body of information about human genetic variation, including more than 110 million SNPs catalogued in dbSNP. Dr. Kim has developed a novel indexing scheme that captures a wide representation of the human population by incorporating these variants into the reference genome. Using an index that incorporates ~12.3 million common SNPs from dbSNP, he has built a new alignment system, HISAT2. The system is promising: its index requires only 6.2 GB, it is among the fastest alignment programs available, and it aligns reads containing SNPs with greater accuracy. HISAT2 also has the potential to genotype essentially all genes in the human genome on a desktop computer within a few hours.
To demonstrate this initial genotyping work, he chose one gene family, the Human Leukocyte Antigen (HLA) genes, which are among the most diverse genes in humans. He incorporated known HLA alleles and variants into the index of the human genome, requiring only a small increase in computational resources. Tests on Illumina's Platinum Genomes data show that his method correctly identifies all 204 alleles of the six HLA genes across the 17 genomes, at a speed surpassing other currently available methods.