Da-Wei Huang talks with
ScienceWatch.com and answers a few questions about
this month's Fast Breaking Paper in the field of Biology
& Biochemistry.
Article Title: Systematic and integrative analysis of
large gene lists using DAVID bioinformatics
resources
Authors: Huang, DW; Sherman, BT; Lempicki, RA
Journal: NAT PROTOC, Volume: 4, Issue: 1, Page: 44-57, Year:
2009
* NCI, Lab Immunopathogenesis & Bioinformat, Clin Serv
Program, SAIC Frederick Inc, Frederick, MD 21702 USA.
Why do you think your paper is highly
cited?
The DAVID Bioinformatics Resources
(DAVID) was one of the earliest bioinformatics tools
built to meet the emerging need for high-throughput functional annotation
analysis of large gene lists, which are usually derived from genome-wide
biological studies (such as microarray or proteomics studies).
After continuous development and improvement for several years, DAVID's
quality and comprehensiveness place it in a leading position among other
similar tools, as evidenced by the fact that DAVID has been used
effectively in approximately 1,600 genomic publications according to Google
Scholar in January 2010.
This Nature Protocols paper summarizes the latest functions of
DAVID so that users can carry out their data analysis tasks more
efficiently and smoothly.
Does it describe a new discovery, methodology, or
synthesis of knowledge?
The higher quality and comprehensiveness of DAVID come from several novel
methods and ideas that other tools have not been able to extensively
address.
1) With DAVID, we developed a novel single-linkage-based agglomeration
method to construct one of the largest bio-knowledge databases of this
kind, the DAVID Knowledgebase. This work integrated more than 40 well-known
functional annotation categories from dozens of public databases, thereby
providing a solid foundation for DAVID data-mining algorithms.
2) We developed a set of novel fuzzy-logic algorithms to address the
redundant many-genes-to-many-terms relationships. This work makes the
annotation results cleaner and more focused by reducing redundancy and
repetition.
3) We created a unique memory-based data IO method that makes calculations
more than 10 times faster than the method based on database queries.
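As a rough illustration of the single-linkage agglomeration idea in point 1, the sketch below groups gene identifiers from different databases into one cluster whenever any chain of cross-references connects them. It is not DAVID's actual implementation: the function name, the union-find approach, and every identifier in the example are hypothetical and chosen only to show the principle.

```python
from collections import defaultdict


def single_linkage_clusters(pairs):
    """Group identifiers so that any two IDs joined by a chain of
    cross-reference links end up in the same cluster (single linkage)."""
    parent = {}

    def find(x):
        # Register unseen IDs as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # one shared link is enough to merge

    clusters = defaultdict(set)
    for identifier in list(parent):
        clusters[find(identifier)].add(identifier)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical cross-references between made-up identifiers.
links = [
    ("REFSEQ:NM_000001", "UNIPROT:P00001"),
    ("UNIPROT:P00001", "ENTREZ:1001"),
    ("REFSEQ:NM_000002", "ENTREZ:1002"),
]
clusters = single_linkage_clusters(links)
for members in clusters:
    print(members)
```

In this toy input, the first three identifiers merge into one cluster because a single shared link (the made-up UniProt ID) connects them, while the last pair forms its own cluster. Annotations attached to any member of a cluster could then be pooled under one gene.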
Would you summarize the significance of your paper
in layman's terms?
Modern high-throughput technologies, such as gene chips, allow
investigators to simultaneously measure changes in genes across the entire
genome under certain disease conditions (e.g., cancer vs. normal).
Once hundreds or even thousands of changed genes have been identified,
interpreting them biologically is an important and necessary
downstream step for investigators to understand the biological mechanisms
of the diseases they are studying (e.g., cancer).
The Nature Protocols paper describes both underlying analytical
principles and step-by-step procedures to interpret biological mechanisms
from the results derived from the high-throughput technologies.
How did you become involved in this research, and
were there any problems along the way?
I had obtained extensive hands-on bioinformatics experience, particularly
in the areas of high-throughput DNA sequencing and microarray, through a
couple of jobs in biotech companies since 2000. I had almost decided my
next move would be to a pharmaceutical company, Eli Lilly, in the summer of
2004.
In the end, I decided to move back to an academic R&D environment to
lead the DAVID Bioinformatics Lab at NIH/SAIC-Frederick, the operations and
technical support contractor for NCI. After joining and leading the DAVID
Bioinformatics lab, I realized that I had made the right decision because I
completely fell in love with the DAVID project as proposed and focused by
the group.
The DAVID project allows talented team members to fully combine their
skills and knowledge of molecular biology, statistics, and computer
science. The team's efforts eventually paid off, with the current
version of DAVID becoming quite popular in the genomic community.
During the R&D course, we had many scientific challenges, which I
actually enjoyed very much rather than thinking of them as problems. I
found that it is indeed a challenging and difficult task to coordinate
activities, goals, and projects across inter-disciplinary lines, with
molecular biologists, computer programmers, and statisticians all as
members of the same team. Since bioinformatics represents a joining of
these areas, it seems that I will have to face such challenges during
day-to-day project management for the rest of my professional life.
My special thanks go to my bosses (Dr. Richard A. Lempicki, Dr. Michael W.
Baseler, Dr. H. Clifford Lane) and my team members (Mr. Brad T. Sherman,
Dr. Xin Zheng, Dr. Xiaojun Hu, et al.). Our science could not have gone
this far without them.
Where do you see your research leading in the
future?
The next generation of high-throughput technologies is moving beyond the
gene intensity level toward measurements of finer biological elements, such
as ChIP-on-chip to measure promoter activities, SNP microarrays to measure
genetic landmarks, and exon microarrays to measure alternative splicing
events.
We will expand DAVID functions to meet the analytical needs of these
newer technologies, needs that are still unmet in the genomic community.
Do you foresee any social or political
implications for your research?
Although my work has no political impact, some of its social
implications are of immediate application. For example, to the best of my
knowledge, the paper provides the clearest principles and guidelines for
genomics researchers to conduct data analysis of this kind.
Da-Wei Huang, M.D.
Supervisor, DAVID Bioinformatics Lab
Laboratory of Immunopathogenesis and Bioinformatics
Clinical Services Program
SAIC-Frederick, Inc.
National Cancer Institute (NCI) at Frederick
Frederick, MD, USA
KEYWORDS: FUNCTIONAL-ANALYSIS; ONTOLOGY; TOOL; DAVID bioinformatics
resources.