DNALC Blogs » Sequencing

What is Bioinformatics?

Mohammed Khalfan — Tue, 08 May 2012 16:38:33 +0000

Bioinformatics is a relatively new field, and as a result many people aren’t exactly sure what “bioinformatics” really is.

The NIH Biomedical Information Science and Technology Initiative defines bioinformatics as:

“Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.”

Still confused? Don’t fret; most people are when they hear that definition. I usually like to tell people:

“Bioinformatics combines the latest technology with biological research.”

Over the past decade or so, and even prior, computers have become an integral part of every industry. Biological research is no different. Computer technology has dramatically accelerated the rate at which scientists are able to acquire and analyze biological data. The vast amount of data that is produced more rapidly each day has introduced new challenges to the field, involving storing, organizing and archiving this data. The sharp increase in volume of data has also brought about the need for faster and better analysis and visualization tools. Each area of bioinformatics, from acquiring to storing to analyzing the data, has challenges of its own, and it is not uncommon for advancements in one area to drive advancements in another.

To gain a better understanding of the diversity of bioinformatics, let’s invent a hypothetical yet interesting problem that we want to tackle using bioinformatics:

Let’s assume we have a species of bacteria that is part of the normal millions of ‘good’ bacteria living on and inside healthy human beings; we’ll call this Bacteria X.0. One day Bacteria X started making people very ill. What happened to Bacteria X.0 to make it become the harmful Bacteria X.1? Let’s see how we could answer this question using bioinformatics, along the way gaining insight into the wonderful world of bioinformatics.

Using traditional molecular biology techniques, we isolate Bacteria X and extract its DNA. Then we “sequence” this DNA. Cue the first link in the bioinformatics chain: acquiring data! Acquiring data is the process of generating usable data from a biological sample. In our case, that means determining the DNA sequence of the Bacteria X genome.

The next link in the chain is storing this sequence data. While bacterial genomes are typically small, other genomes, such as those of human beings, can produce terabytes of data (1 terabyte is 1,000 gigabytes).

Now we analyze this sequence data. Some people specialize in developing computational tools to analyze and visualize data, while others use those tools to actually analyze the information. A typical analysis for our sample case might be to first graphically visualize and compare the genome of the original, harmless Bacteria X.0 with the genome of the new, harmful Bacteria X.1. A scientist might observe a segment of DNA in Bacteria X.1 that is not present in the original Bacteria X.0. This new region of DNA may be responsible for the harmful effects, so the next analysis steps might be to drill down deeper into this region and see what genes lie there, what the functions of those genes are, where they may have come from, etc.
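
To make the comparison step a little more concrete, here is a minimal Python sketch that looks for a stretch of DNA present in one sequence but absent from another. Everything in it is an assumption made for illustration: the two short sequences are invented stand-ins for Bacteria X.0 and X.1, and find_inserted_segments is a hypothetical helper built on the standard library's difflib. Real genomes are millions of bases long and are compared with dedicated alignment tools, not with this approach.

```python
from difflib import SequenceMatcher


def find_inserted_segments(reference, sample, min_length=5):
    """Return (position_in_sample, segment) for stretches found only in sample."""
    # autojunk=False turns off difflib's heuristic that ignores very common
    # characters, which would otherwise misfire on a four-letter alphabet.
    matcher = SequenceMatcher(None, reference, sample, autojunk=False)
    inserted = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # An "insert" opcode marks text in sample with no counterpart in reference.
        if tag == "insert" and (j2 - j1) >= min_length:
            inserted.append((j1, sample[j1:j2]))
    return inserted


# Invented toy sequences (not real bacterial DNA).
genome_x0 = "ATGGCTAGCTAGG" + "ATCCGATCGTACGATCG"
genome_x1 = "ATGGCTAGCTAGG" + "TTGACCTGAACGTTC" + "ATCCGATCGTACGATCG"

segments = find_inserted_segments(genome_x0, genome_x1)
for position, segment in segments:
    print(f"New segment at position {position}: {segment}")
```

In the hypothetical scenario, a segment reported this way would be the region a scientist drills into next.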

[Remember: all assumptions made and conclusions drawn in this example are hypothetical and for illustrative purposes only.]

In this example, we encountered at least 4 different specialized areas within the field of bioinformatics:

1) Acquiring data (working with machines and equipment, sequencing DNA)
2) Storing data (typically working with databases)
3) Developing tools to analyze and visualize data (programming)
4) Analyzing data (statistics, analysis)

Typically, individuals will specialize in one particular area rather than working simultaneously across all these fields. That, combined with all the different applications of bioinformatics, means you could ask 100 different “bioinformaticians” what they do and get 100 very different answers!

Bioinformatics techniques are now employed in every area of biology and research, including cancer research, crop yield optimization studies, medical genomics, ecology and evolution. The emerging field of DNA barcoding combines laboratory and bioinformatics techniques to catalogue all living species as well as identify new species. Since DNA is the blueprint of life, bioinformatics can be applied to any research involving living organisms (or organisms which once lived; see Ötzi the Iceman).

One thing to remember: the four areas described above are not as simple as I’ve portrayed them to be. For example:

- When sequencing a sample, you might be interested in sequencing RNA as opposed to DNA.
- Before analyzing sequence data, the quality of this data must be validated. Sometimes large chunks of sequences need to be ‘put together’ (e.g., ‘genome assembly’). Both of these areas (quality analysis and genome assembly) are highly sought-after areas of specialization.
- In addition to sequencing, data analysis can also generate vast amounts of new data.
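
As a rough illustration of the quality validation step mentioned above, the Python sketch below filters sequencing reads by their average quality score. It assumes quality strings in the common Phred+33 text encoding, where each character's ASCII code minus 33 gives the score. The reads, the helper names and the cutoff of 20 are all invented for this example; real quality-control pipelines do considerably more.

```python
def mean_phred_quality(quality_string, offset=33):
    """Average Phred score of one read, assuming Phred+33 encoding."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def filter_reads(reads, min_mean_quality=20):
    """Keep (name, sequence) for reads whose mean quality meets the cutoff."""
    return [
        (name, seq)
        for name, seq, qual in reads
        if mean_phred_quality(qual) >= min_mean_quality
    ]


# Invented reads: (name, sequence, quality string).
reads = [
    ("read1", "ACGTACGT", "IIIIIIII"),  # every base has score 40
    ("read2", "ACGTACGT", "########"),  # every base has score 2
    ("read3", "ACGTACGT", "IIII5555"),  # scores 40 and 20, mean 30
]

kept = filter_reads(reads)
print([name for name, _ in kept])
```

With the assumed cutoff of 20, read2 is dropped and the other two reads are kept.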

The field of bioinformatics is ever changing and rapidly evolving. Techniques that were new 2–3 years ago might be outdated today; conversely, techniques that were impractical 2–3 years ago might be invaluable today, thanks to advances in computational processing capabilities, for instance.

So, whether you’re interested in plants, animals, bacteria, fungi, virology, genetics, developing databases, writing code, statistics, engineering, computer hardware, or web technologies, there may be a spot waiting for you in the field of bioinformatics.

Hope to see you on the inside!

DNA sequencing helps discover cavemen’s tools and diet

Oscar Pineda-Catalan — Fri, 17 Feb 2012 20:57:58 +0000

In the 1970s a team of archaeologists led by Carl Gustafson unearthed the remains of a single, 3-ton, male mastodon (Mammut americanum, a close relative of mammoths and elephants), hunted and butchered by a group of men at the Manis site in the state of Washington, USA (Gustafson 1979). Among the mastodon remains they found a spear point that pierced a rib bone. Luckily for us the hunters did not recover the projectile weapon. We thus have evidence of the technology that cavemen in the Americas used to secure their food.

Originally, Gustafson and his colleagues dated the mastodon hunting at Manis to more than 13,500 years ago. This was nearly 1,000 years before the Clovis culture, long considered to be the first culture in the New World. Their research was heavily criticized because of limitations in the radiocarbon methodology used for dating the archaeological findings. However, a recent publication supported their finding: an international group of researchers led by Michael Waters of Texas A&M University used a refined radiocarbon dating methodology and DNA analyses to demonstrate that the projectile found at the site came from a mastodon bone shaped as a spear point, handcrafted 13,800 years ago.

After careful DNA extractions from the hunted mastodon's rib and from the bone projectile, the researchers successfully amplified a 69 base pair DNA fragment from the mitochondrial control region. Both samples produced sequences identical to previously obtained mastodon DNA, but distinct from those of other proboscideans (mammoth or elephant) at nine single nucleotide polymorphisms (SNPs).
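
For readers curious what counting SNPs between sequences looks like in practice, here is a small Python sketch. The three short sequences are invented for illustration (they are not the 69 base pair fragment from the study), and find_snps is a hypothetical helper that assumes the sequences are already aligned and of equal length.

```python
def find_snps(seq_a, seq_b):
    """List (position, base_a, base_b) where two aligned sequences differ.

    Positions are 1-based. Assumes the sequences are already aligned.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


# Invented toy sequences standing in for the real fragments.
mastodon_reference = "ACGTTAGCCATGA"
rib_sample = "ACGTTAGCCATGA"
mammoth_reference = "ACGATAGCTATGG"

print(find_snps(mastodon_reference, rib_sample))
print(find_snps(mastodon_reference, mammoth_reference))
```

An empty list for the first comparison and a list of differences for the second mirrors the logic of the study: the sample matches mastodon and differs from its relatives.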

These findings support the hypothesis that humans had permanent settlements in the Americas earlier than the Clovis culture (11,500 years ago). The bone projectile also shows that humans actively hunted megafauna (i.e., animals bigger than 50 kg) in this region. In addition, it suggests that the slow process of extinction of the biggest mammals inhabiting the Americas after the last glacial period (approximately 15,000 years ago), such as mammoths and mastodons, may have begun earlier than the time of the Clovis people.

Find out more about all these fascinating discoveries:

Gustafson, C. E., et al. (1979). The Manis mastodon site: early man on the Olympic Peninsula. Canadian Journal of Archaeology, 3: 157-164.
Radiocarbon dating methodology:
- www.c14dating.com
- http://en.wikipedia.org/wiki/Radiocarbon_dating
Waters, M. R., et al. (2011). Pre-Clovis mastodon hunting 13,800 years ago at the Manis Site, Washington. Science, 334(6054): 351-353.
Waters, M. R., et al. (2011). The Buttermilk Creek Complex and the Origins of Clovis at the Debra L. Friedkin Site, Texas. Science, 331(6024): 1599-1603.

Cancer Genomics: so many mutations!

Bruce Nash — Fri, 18 Feb 2011 16:14:43 +0000

The human genome is the complete collection of over three billion bases in each of our cells. Cancers accumulate multiple changes, or mutations, in their DNA that contribute to the disease by changing how cells behave. For instance, cancers need nutrients to grow. Very often, they get these nutrients by producing signals that encourage new blood vessel formation. Finding the mutations that lead to cancer is very difficult. For one thing, even for cancers that affect the same tissue and look similar, the mutations can be very different. Also, one of the hallmarks of cancer is an increased rate of mutation. This means that cancer cells have many mutations, and most don’t contribute to the disease. For example, a lung cancer genome that was sequenced this year had nearly 23,000 mutations. Finding a mutation that contributes to cancer is like finding the right needle from a collection of needles in a haystack.

To find the mutations that do contribute to the disease, often called driver mutations, scientists look for the ones that occur frequently. Until recently, this was very difficult to do. However, new sequencing technologies now let scientists look for mutations in genes at an incredible rate. The cost of sequencing is dropping so dramatically that, in the near future, the DNA from a cancer may be sequenced as a diagnostic. Soon, it may be the cost of computing that limits our sequencing efforts.
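
The idea of looking for mutations that occur frequently can be sketched in a few lines of Python. This is only a toy illustration: the tumor names, gene names and the 50% threshold are all invented, and recurrently_mutated_genes is a hypothetical helper. Real studies also have to account for factors such as gene length and background mutation rate, which this sketch ignores.

```python
from collections import Counter


def recurrently_mutated_genes(tumors, min_fraction=0.5):
    """Return {gene: fraction of tumors mutated} for frequently hit genes."""
    counts = Counter()
    for mutated_genes in tumors.values():
        # Use a set so a gene mutated twice in one tumor is counted once.
        counts.update(set(mutated_genes))
    total = len(tumors)
    return {
        gene: count / total
        for gene, count in counts.items()
        if count / total >= min_fraction
    }


# Invented data: which genes carry a mutation in each tumor.
tumors = {
    "tumor1": ["GENE_A", "GENE_B", "GENE_C"],
    "tumor2": ["GENE_A", "GENE_D"],
    "tumor3": ["GENE_A", "GENE_B", "GENE_E"],
    "tumor4": ["GENE_F", "GENE_A", "GENE_A"],
}

candidates = recurrently_mutated_genes(tumors)
print(candidates)
```

Genes mutated in only one tumor fall below the assumed threshold and drop out, leaving the recurrent ones as candidates for closer study.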

Improvements in technology allow scientists to look at the genomes of many tumors, and there is an international effort to look at 25,000 cancer genomes. This will provide the data that will let scientists find the mutations that lead to cancer, even if they occur in a small proportion of tumors of a particular kind. Already, hundreds of tumors have been studied in detail, which is giving scientists a good feel for the patterns of mutations that happen in cancer cells. So far, over 400 genes directly linked to cancer have been identified in this and other studies. Figuring out how these many genes contribute to cancer is likely to lead to huge advances in diagnosis and treatment, although the task remains gargantuan.

Copy number variation in Schizophrenia

Jason Williams — Tue, 01 Sep 2009 20:33:50 +0000

Ever had the feeling you have lost your marbles? According to the Phrase Finder, that expression has conveyed a sense of loss, anger, and more recently a lack of common sense or sanity. As it turns out, it may be the loss of certain segments of DNA (rather than simple mutations like SNPs) that has a lot to do with the origins of mental illnesses like schizophrenia.

Now, before you start thinking that schizophrenics are the only ones to lose their marbles (or large sections of their genomes), note that work like that of Jonathan Sebat of Cold Spring Harbor Laboratory has previously shown that deletions, including those larger than a kilobase, are common within all of our genomes. Obviously, or not so obviously, most humans seem to get along quite fine with these deletions. However, it has recently been appreciated that many psychiatric disorders seem to be influenced by this genomic structural variation.

The PSYCH-CNV project aims to look at how copy number variation (CNV) contributes to the development of schizophrenia, among other illnesses. A recent paper entitled “Large recurrent microdeletions associated with schizophrenia” (Stefansson et al., Nature 2008) hypothesized that rare copy number variations might carry the bulk of the risk. The paper went on to describe three regions where deletions were associated with schizophrenia and related psychoses. PSYCH-CNV will focus on rare and de novo variations in order to work out how these unique changes in genome arrangement and organization can explain disorders.

Interestingly, a new method developed by the Sebat lab also aims to increase our ability to detect these copy number changes and explore their meaning for schizophrenia. Instead of relying on microarrays, which often provide insufficient resolution to detect small CNVs, it takes a next-generation sequencing approach. Next-generation sequencers, such as the Illumina platform used in the paper referenced above, sequence short reads of DNA, which are then assembled into a genome. Sebat and his collaborators at Albert Einstein looked at a parameter called depth of read (how many reads cover each stretch of the genome) instead of just analyzing paired-end reads, the previous approach to detecting CNVs within a genome.
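
To give a feel for why depth of read can reveal copy number changes, here is a minimal Python sketch. It is not the Sebat lab's method, just the general intuition under stated assumptions: reads are counted in fixed-size windows along an invented 500-base "genome", and windows whose depth is far below or above the typical (median) depth are flagged. The function names, window size and the 0.6 and 1.4 thresholds are all made up for this example.

```python
from statistics import median


def window_depths(read_starts, genome_length, window_size):
    """Count how many reads start in each fixed-size window."""
    n_windows = genome_length // window_size
    counts = [0] * n_windows
    for start in read_starts:
        index = start // window_size
        if index < n_windows:
            counts[index] += 1
    return counts


def call_copy_number_changes(depths, low=0.6, high=1.4):
    """Flag windows whose depth ratio to the median suggests a loss or gain."""
    typical = median(depths)
    if typical == 0:
        raise ValueError("median depth is zero; not enough reads to compare")
    calls = []
    for index, depth in enumerate(depths):
        ratio = depth / typical
        if ratio <= low:
            calls.append((index, "loss", ratio))
        elif ratio >= high:
            calls.append((index, "gain", ratio))
    return calls


# Invented read start positions on a 500-base toy genome.
read_starts = (
    [i * 10 for i in range(10)]          # window 0: 10 reads
    + [100 + i * 10 for i in range(10)]  # window 1: 10 reads
    + [200 + i * 20 for i in range(5)]   # window 2: 5 reads (possible deletion)
    + [300 + i * 10 for i in range(10)]  # window 3: 10 reads
    + [400 + i * 5 for i in range(20)]   # window 4: 20 reads (possible duplication)
)

depths = window_depths(read_starts, genome_length=500, window_size=100)
calls = call_copy_number_changes(depths)
print(depths)
print(calls)
```

A region with roughly half the typical depth hints at a deleted copy, and one with roughly double hints at a duplication; real methods add statistical corrections that this sketch leaves out.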