Wednesday, June 23, 2010

Semi-automated ontology generation within OBO-Edit

Ontologies and taxonomies have proven highly beneficial for biocuration. The Open Biomedical Ontologies (OBO) Foundry alone lists over 90 ontologies, mainly built with OBO-Edit. Creating and maintaining such ontologies is a labour-intensive, difficult, manual process. Automating parts of it is of great importance for the further development of ontologies and for biocuration.
Results: We have developed the Dresden Ontology Generator for Directed Acyclic Graphs (DOG4DAG), a system which supports the creation and extension of OBO ontologies by semi-automatically generating terms, definitions and parent–child relations from text in PubMed, the web and PDF repositories. DOG4DAG is seamlessly integrated into OBO-Edit. It generates terms by identifying statistically significant noun phrases in text. For definitions and parent–child relations it employs pattern-based web searches. We systematically evaluate each generation step using manually validated benchmarks. The term generation leads to high-quality terms also found in manually created ontologies. Up to 78% of definitions are valid and up to 54% of child–ancestor relations can be retrieved. There is no other validated system that achieves comparable results.
By combining the prediction of high-quality terms, definitions and parent–child relations with the ontology editor OBO-Edit we contribute a thoroughly validated tool for all OBO ontology engineers.
Availability: DOG4DAG is available within OBO-Edit 2.1 at http://www.oboedit.org
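
To make the idea of pattern-based generation concrete, here is a minimal sketch of how a single "X is a Y that ..." pattern can yield both a candidate definition and a candidate parent term. The pattern, the function name `extract_candidates` and the example sentences are assumptions made for illustration; the abstract does not list the patterns DOG4DAG actually uses.

```python
import re

# Hypothetical pattern "X is a Y that ...": X is the candidate term,
# Y the candidate parent, and the whole sentence the candidate definition.
# This is NOT DOG4DAG's pattern set, which the abstract does not describe.
DEFINITION_PATTERN = re.compile(
    r"^(?:the |an? )?(?P<term>[a-z][a-z -]*?) (?:is|are) "
    r"(?:an? |the )?(?P<parent>[a-z][a-z -]*?) (?:that|which) (?P<rest>.+?)\.?$"
)


def extract_candidates(sentences):
    """Return (term, parent, definition) triples for sentences matching the pattern."""
    candidates = []
    for sentence in sentences:
        original = sentence.strip()
        match = DEFINITION_PATTERN.match(original.lower())
        if match:
            candidates.append((match.group("term"), match.group("parent"), original))
    return candidates
```

A real system would run such patterns over search results and then rank or filter the candidates; this sketch only shows the matching step, and a curator would still need to validate every suggestion.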

Monday, June 21, 2010

How Lie Detectors Work

A polygraph instrument is basically a combination of medical devices that are used to monitor changes occurring in the body. As a person is questioned about a certain event or incident, the examiner looks to see how the person's heart rate, blood pressure, respiratory rate and electrodermal activity (sweatiness, in this case of the fingers) change in comparison to normal levels. Fluctuations may indicate that the person is being deceptive, but exam results are open to interpretation by the examiner.

Polygraph exams are most often associated with criminal investigations, but there are other instances in which they are used. You may one day be subject to a polygraph exam before being hired for a job: Many government entities, and some private-sector employers, will require or ask you to undergo a polygraph exam prior to employment.

Polygraph examinations are designed to look for significant involuntary responses going on in a person's body when that person is subjected to stress, such as the stress associated with deception. The exams are not able to specifically detect if a person is lying, according to polygrapher Dr. Bob Lee, former executive director of operations at Axciton Systems, a manufacturer of polygraph instruments. But there are certain physiological responses that most of us undergo when attempting to deceive another person. By asking questions about a particular issue under investigation and examining a subject's physiological reactions to those questions, a polygraph examiner can determine if deceptive behavior is being demonstrated.

SCOPE: a probabilistic model for scoring tandem mass spectra against a peptide database

Proteomics, or the direct analysis of the expressed protein components of a cell, is critical to our understanding of cellular biological processes in normal and diseased tissue. A key requirement for its success is the ability to identify proteins in complex mixtures. Recent technological advances in tandem mass spectrometry have made it the method of choice for high-throughput identification of proteins. Unfortunately, the software for unambiguously identifying peptide sequences has not kept pace with the recent hardware improvements in mass spectrometry instruments. Scoring functions, which evaluate the quality of a match between an experimental spectrum and a database peptide, are critical for reliable high-throughput protein identification. Current scoring function technology relies heavily on ad hoc parameterization and manual curation by experienced mass spectrometrists. In this work, a two-stage stochastic model for the observed MS/MS spectrum, given a peptide, is proposed. The model explicitly incorporates fragment ion probabilities, noisy spectra, and instrument measurement error. The work also describes how to compute this probability-based score efficiently, using a dynamic programming technique. A prototype implementation demonstrates the effectiveness of the model.
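
As a rough illustration of what a probabilistic scoring function looks like, the sketch below scores a peptide against a list of observed peaks. It is a simplified, likelihood-ratio-style score and not the two-stage SCOPE model; it uses no dynamic programming. The function names and the parameters `p_frag`, `sigma` and `mz_range` are assumptions chosen for illustration. Each theoretical b- or y-ion is treated as either present with Gaussian measurement error or absent, relative to a uniform noise background.

```python
import math

# Monoisotopic residue masses (Da) for a handful of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "K": 128.09496, "E": 129.04259,
    "R": 156.10111,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_mz(peptide):
    """Singly charged b- and y-ion m/z values for every cleavage site."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion: N-terminal prefix
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y ion: C-terminal suffix
    return ions


def score(peptide, peaks, p_frag=0.5, sigma=0.5, mz_range=2000.0):
    """Sum of per-fragment log likelihood ratios (illustrative assumptions only).

    p_frag:   assumed probability that a fragment ion is observed
    sigma:    assumed standard deviation of the m/z measurement error
    mz_range: width of the m/z window over which noise peaks are uniform
    """
    noise_density = 1.0 / mz_range
    total = 0.0
    for mz in fragment_mz(peptide):
        if peaks:
            delta = min(abs(mz - peak) for peak in peaks)
            gauss = math.exp(-0.5 * (delta / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        else:
            gauss = 0.0
        # Fragment absent (1 - p_frag) or present and explaining the nearest peak.
        total += math.log((1 - p_frag) + p_frag * gauss / noise_density)
    return total
```

Under these assumptions, a spectrum whose peaks lie close to the theoretical fragments receives a positive score, while a spectrum with no nearby peaks is penalised by `log(1 - p_frag)` per fragment.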

Saturday, June 19, 2010


The Turritopsis nutricula species of jellyfish has been discovered to be the first, and possibly only, immortal creature. Once the creature reaches its adult form (pictured) it can, apparently, use transdifferentiation to transform its cells backwards to the polyp stage of its life and begin the whole cycle again. There's not much more to say beyond that; consider me officially in awe.

Friday, June 18, 2010

Inferring combined CNV/SNP haplotypes from genotype data

Copy number variations (CNVs) are increasingly recognized as a substantial source of individual genetic variation, and hence there is a growing interest in investigating the evolutionary history of CNVs as well as their impact on complex disease susceptibility. CNV/SNP haplotypes are critical for this research, but although many methods have been proposed for inferring integer copy number, few have been designed for inferring CNV haplotypic phase and none of these are applicable at genome-wide scale. Here, we present a method for inferring missing CNV genotypes, predicting CNV allelic configuration and inferring CNV haplotypic phase from SNP/CNV genotype data. Our method, implemented in the software polyHap v2.0, is based on a hidden Markov model, which models the joint haplotype structure between CNVs and SNPs. Thus, the haplotypic phases of CNVs and SNPs are inferred simultaneously. A sampling algorithm is employed to obtain a measure of confidence/credibility of each estimate.

Results: We generated diploid phase-known CNV–SNP genotype datasets by pairing male X chromosome CNV–SNP haplotypes. We show that polyHap provides accurate estimates of missing CNV genotypes, allelic configuration and CNV haplotypic phase on these datasets. We applied our method to a non-simulated dataset—a region on Chromosome 2 encompassing a short deletion. The results confirm that polyHap's accuracy extends to real-life datasets.

Availability: Our method is implemented in version 2.0 of the polyHap software package and can be downloaded from http://www.imperial.ac.uk/medicine/people/l.coin
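
The benchmark construction described in the results, pairing haploid male X chromosome haplotypes to obtain diploid data with known phase, can be sketched as follows. This is a minimal illustration under stated assumptions and is not polyHap code: the function names are hypothetical, each haplotype is assumed to be a tuple of per-site counts (0/1 for a SNP alternate allele, the copy number on that chromosome for a CNV), and consecutive haplotypes are assumed to be paired.

```python
def pair_haplotypes(haplotypes):
    """Pair consecutive haploid haplotypes into diploid samples with known phase.

    Returns one dict per pair holding the true phase (the two haplotypes)
    and the unphased genotype (per-site sum), which is what a phasing
    method would be given as input.
    """
    samples = []
    for h1, h2 in zip(haplotypes[0::2], haplotypes[1::2]):
        genotype = tuple(a + b for a, b in zip(h1, h2))
        samples.append({"phase": (h1, h2), "genotype": genotype})
    return samples


def phase_matches(truth, predicted):
    """True if the predicted haplotype pair equals the true pair, ignoring order."""
    return sorted(truth) == sorted(predicted)
```

Because the true haplotypes are known for every constructed sample, a predicted phasing can be compared against them directly, which is what makes this kind of dataset useful for measuring accuracy.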

Tuesday, June 15, 2010

Why is it so difficult to find cancer cells?

Imagine you're standing in front of a sandbox. You've just been told that you'll receive a million dollars if you can find a certain grain of sand that's remarkable because it has a dot of black ink on it. How would you go about finding it? Do you even think you could?

Finding a single cancer cell in the human body is like looking for one grain of sand in a sandbox. You may think that it would be easier to find the cancer than the grain of sand; after all, hospitals are full of sophisticated diagnostic equipment that should help with the search. But there's no scan or test that can detect a single cancer cell. The cell is simply too small, just one cell amidst the trillions that make up the human body. Even small groups of cancerous cells are too small to see on test results, and sometimes larger groups of cells are hidden behind bodily organs so that they don't show up, either.

The undetected cancer cells have the opportunity to group together and form tumors. These tumors and cells also have a chance to spread throughout the body, a process known as metastasis. Large tumors and metastasized tumors are difficult to treat, which is why we hear so much about the importance of early detection. Unfortunately, it may be only after the cancer has grown or spread that symptoms start to occur. A tumor can grow so large that it affects organ function or causes bleeding or pain.

Metastasis can also cause the kind of symptoms that might send someone to the doctor; for example, cancer that has spread to the lungs can cause a persistent cough and chest infections, cancer that has spread to the liver can cause jaundice, and cancer that has spread to the lymph nodes can cause swelling. But the cancer may remain asymptomatic even then, which means it may do even more damage before it's found.

The difficulty of finding cancer cells not only delays diagnosis; it can also complicate treatment. Cancer therapies such as radiation and chemotherapy are designed to target and kill cancerous cells in the midst of division. However, some cancerous cells lie dormant for periods of time, allowing them to survive several rounds of treatment. That's why cancer can come back years after a struggle with the disease.