Bioinformatics is a relatively new field, and as such, many people aren’t sure what “bioinformatics” actually is.

The NIH Biomedical Information Science and Technology Initiative defines bioinformatics as:

“Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.”

Still confused? Don’t fret; most people are when they first hear that definition. I usually like to tell people:

“Bioinformatics combines the latest technology with biological research.”

Over the past decade or so, and even earlier, computers have become an integral part of nearly every industry, and biological research is no exception. Computer technology has dramatically accelerated the rate at which scientists can acquire and analyze biological data. The ever-growing volume of data produced each day has introduced new challenges in storing, organizing and archiving it, and has created a need for faster and better analysis and visualization tools. Each area of bioinformatics, from acquiring to storing to analyzing the data, has its own challenges, and advances in one area often drive advances in another.

To gain a better understanding of the diversity of bioinformatics, let’s invent a hypothetical yet interesting problem that we want to tackle using bioinformatics:

Let’s assume that one of the many species of ‘good’ bacteria that normally live on and inside healthy human beings is a species we’ll call Bacteria X.0. One day, Bacteria X started making people very ill. What happened to Bacteria X.0 to turn it into the harmful Bacteria X.1? Let’s see how we could answer this question using bioinformatics, gaining insight into the wonderful world of bioinformatics along the way.

Using traditional molecular biology techniques, we isolate Bacteria X and extract its DNA. Then we “sequence” this DNA, that is, determine the order of its nucleotide bases (A, C, G and T). Cue the first link in the bioinformatics chain: acquiring data! Acquiring data is the process of generating usable data from a biological sample; in our case, that means determining the DNA sequence of the Bacteria X genome.

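The next link in the chain is storing this sequence data. While bacterial genomes are typically small (a few million base pairs), sequencing larger genomes, such as the roughly 3-billion-base-pair human genome, can generate terabytes (1 terabyte = 1,000 gigabytes) of data, especially across many samples.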

Now we analyze this sequence data. Some people specialize in developing computational tools to analyze and visualize data, while others specialize in using those tools to analyze the information itself. A typical analysis for our example might be to first visualize and compare the genome of the original, harmless Bacteria X.0 with that of the new, harmful Bacteria X.1. A scientist might observe a segment of DNA in Bacteria X.1 that is not present in the original Bacteria X.0. This new region of DNA may be responsible for the harmful effects, so the next steps might be to drill deeper into this region to see which genes lie there, what those genes do, where they may have come from, and so on.
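To make the comparison step a little more concrete, here is a minimal Python sketch of one simple way to flag DNA that is present in one genome but absent from another: break the sequences into short overlapping substrings of length k (“k-mers”) and mark which positions of the new genome are covered by k-mers shared with the old one. The function names, the choice of k = 8 and the tiny toy sequences are all hypothetical; real genome comparisons rely on dedicated alignment tools that cope with sequencing errors, point mutations and genomes millions of bases long.

```python
# Minimal sketch: flag regions of a "query" genome whose k-mers never occur
# in a "reference" genome. Function names, sequences and k are hypothetical.


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def novel_regions(reference: str, query: str, k: int = 8) -> list[tuple[int, int]]:
    """Return half-open (start, end) spans of query not covered by any shared k-mer."""
    ref_kmers = kmers(reference, k)
    covered = [False] * len(query)
    # Mark every query position that lies inside a k-mer also found in the reference.
    for i in range(len(query) - k + 1):
        if query[i:i + k] in ref_kmers:
            for j in range(i, i + k):
                covered[j] = True
    # Collect maximal runs of uncovered positions.
    spans: list[tuple[int, int]] = []
    start = None
    for pos, is_covered in enumerate(covered):
        if not is_covered and start is None:
            start = pos
        elif is_covered and start is not None:
            spans.append((start, pos))
            start = None
    if start is not None:
        spans.append((start, len(query)))
    return spans


# Hypothetical toy genomes: X.1 is X.0 with a 10-base segment inserted.
left = "ATGGCTAGCTTACGGATCCA"
right = "TTGACCGTAGGCATCGATCA"
bacteria_x0 = left + right
bacteria_x1 = left + "GGGGGGGGGG" + right

for start, end in novel_regions(bacteria_x0, bacteria_x1):
    print(f"Novel segment at {start}-{end}: {bacteria_x1[start:end]}")
# prints: Novel segment at 20-30: GGGGGGGGGG
```

A segment is reported only where none of the overlapping k-mers occur in the reference, so an insertion whose edges happen to recreate existing k-mers would be reported slightly shorter. A larger k makes such chance matches less likely but also makes the method less tolerant of small differences between the genomes.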

[Remember: all assumptions made and conclusions drawn in this example are hypothetical and for illustrative purposes only.]

In this example, we encountered at least four distinct specialized areas within the field of bioinformatics:

1) Acquiring data (working with machines and equipment, sequencing DNA)
2) Storing data (typically working with databases)
3) Developing tools to analyze and visualize data (programming)
4) Analyzing data (statistics, interpreting results)

Typically, individuals will specialize in one particular area rather than working simultaneously across all these fields. That, combined with all the different applications of bioinformatics, means you could ask 100 different “bioinformaticians” what they do and get 100 very different answers!

Bioinformatics techniques are now employed in nearly every area of biological research, including cancer research, crop yield optimization, medical genomics, ecology and evolution. The emerging field of DNA barcoding combines laboratory and bioinformatics techniques with the aim of cataloguing all living species and identifying new ones. Since DNA is the blueprint of life, bioinformatics can be applied to any research involving living organisms (or organisms that once lived; see Otzi the Iceman).

One thing to remember: the four areas described above are not as simple as I’ve portrayed them to be. For example:

  • When sequencing a sample, you might be interested in sequencing RNA as opposed to DNA.
  • Before sequence data can be analyzed, its quality must be validated. Many sequencers also read DNA as large numbers of short fragments, which need to be ‘put together’ into longer sequences (i.e., ‘genome assembly’). Both of these areas (quality analysis and genome assembly) are highly sought-after specializations.
  • In addition to sequencing, data analysis can also generate vast amounts of new data.
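Quality validation often starts from the per-base quality scores that sequencers report. In the widely used FASTQ format, each base carries a Phred quality score Q, where the estimated probability that the base call is wrong is 10^(-Q/10): Q = 20 means a 1 in 100 chance of error, and Q = 30 a 1 in 1,000 chance. The Python sketch below assumes the common Phred+33 text encoding (older Illumina data used Phred+64) and applies a hypothetical rule that discards reads whose mean score falls below 20; real quality-control tools use more nuanced checks, such as trimming low-quality read ends.

```python
# Minimal sketch of read-level quality filtering. Assumes Phred+33 quality
# strings; the threshold of 20 and all read names are hypothetical.


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Estimated probability that a base with Phred score q was called wrongly."""
    return 10 ** (-q / 10)


def passes_quality(quality: str, min_mean_q: float = 20.0) -> bool:
    """Keep a read only if its mean Phred score reaches min_mean_q."""
    scores = phred_scores(quality)
    if not scores:
        return False  # an empty read carries no usable information
    return sum(scores) / len(scores) >= min_mean_q


# Hypothetical reads: name -> (sequence, quality string).
reads = {
    "read1": ("ACGTACGTAC", "IIIIIIIIII"),  # 'I' decodes to Q40
    "read2": ("ACGTACGTAC", "##########"),  # '#' decodes to Q2
}
kept = [name for name, (_seq, qual) in reads.items() if passes_quality(qual)]
print(kept)  # prints: ['read1']
```

Filtering on the mean score is the simplest possible rule; it can keep a read whose few very poor bases are hidden by many good ones, which is one reason dedicated tools look at per-position quality as well.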

The field of bioinformatics is ever-changing and rapidly evolving. Techniques that were new 2–3 years ago might be outdated today; conversely, techniques that were impractical 2–3 years ago might be invaluable today, thanks, for instance, to advances in computing power.

So, whether you’re interested in plants, animals, bacteria, fungi, virology, genetics, developing databases, writing code, statistics, engineering, computer hardware, or web technologies, there may be a spot waiting for you in the field of bioinformatics.

Hope to see you on the inside!