Updated: Apr 17
New technologies produce massive amounts of biological data. Bioinformatics gives scientists the ability to process these large amounts of data quickly.
Combing through this data by hand would be time-consuming and expensive. To analyze it faster, scientists use computers. They write algorithms that recognize patterns in biological data, so the same work takes a fraction of the time.
So bioinformaticians work at the intersection of computer science and biology.
DNA stores our genetic information. Your hair color, skin color, height, and more are all shaped in large part by your DNA. Your DNA is like a large book about you. It is made of base pairs: two molecules, called bases, that bond together.
If DNA is a book, base pairs are letters. But instead of twenty-six letters like our alphabet has, DNA only has four: adenine, guanine, cytosine, and thymine. In DNA, adenine pairs with thymine, and cytosine pairs with guanine.
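The pairing rule above is simple enough to write down as a lookup table. Here is a minimal Python sketch; the names PAIRS and complement are made up for this illustration.

```python
# Pairing rules from the text: adenine (A) pairs with thymine (T),
# and cytosine (C) pairs with guanine (G).
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the partner base for every base in the strand."""
    return "".join(PAIRS[base] for base in strand.upper())


print(complement("ACGT"))  # TGCA
```

Give it one side of each base pair and it returns the other side.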
Words, Words, Words
DNA is a book of instructions about your physical features. Your cells can read this book. They can follow the instructions to make you . . . you! Bases are letters in this book.
Imagine reading a book with just a random collection of letters. You wouldn’t be able to read any useful information! For the letters to have meaning, they must be organized into words.
Similarly, base pairs are organized into codons. A codon is a pattern of three base pairs. Each codon codes for an amino acid, and amino acids are the basic units of proteins. For example, if your cells read the bases ACG, they know the next link in the amino acid chain is threonine.
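To see the "letters into words" idea in action, here is a small Python sketch that cuts a string of bases into three-letter codons and looks one up. The lookup table holds only three entries from the standard genetic code, and the example sequence is made up for illustration.

```python
# A deliberately tiny slice of the standard genetic code (illustration only).
CODON_TO_AMINO_ACID = {
    "ATG": "methionine",
    "ACG": "threonine",
    "GGC": "glycine",
}


def split_into_codons(dna):
    """Break a DNA string into consecutive three-letter codons."""
    dna = dna.upper()
    usable_length = len(dna) - len(dna) % 3  # ignore leftover bases at the end
    return [dna[i:i + 3] for i in range(0, usable_length, 3)]


codons = split_into_codons("ATGACGGGC")  # made-up example sequence
print(codons)                      # ['ATG', 'ACG', 'GGC']
print(CODON_TO_AMINO_ACID["ACG"])  # threonine
```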
Many amino acids link together to form a protein. Similar to how words link together to form a sentence, codons chain together to form a gene. A gene is all of the codons needed to form a complete protein (or another biological unit).
Your cells keep reading your DNA three base pairs at a time until they reach a “stop” codon. A stop codon is a set of three base pairs that signals to the cell it’s time to stop making the chain of amino acids. A stop codon is like a period in a sentence; it signals the end.
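Reading three bases at a time until a stop codon can be sketched in a few lines of Python. This is a simplified illustration, not a full translation tool. The table lists only a handful of codons plus the three stop codons of the standard genetic code, and the function name translate is made up here.

```python
# Partial codon table (illustration only). TAA, TAG, and TGA are stop codons.
CODON_TABLE = {
    "ATG": "Met", "ACG": "Thr", "GGC": "Gly", "AAA": "Lys", "TTT": "Phe",
    "TAA": "STOP", "TAG": "STOP", "TGA": "STOP",
}


def translate(dna):
    """Read codons left to right and stop at the first stop codon."""
    chain = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3].upper()
        amino_acid = CODON_TABLE.get(codon, "???")  # "???" = not in this small table
        if amino_acid == "STOP":
            break  # the period at the end of the sentence
        chain.append(amino_acid)
    return chain


print(translate("ATGACGGGCTAAAAA"))  # ['Met', 'Thr', 'Gly']
```

The final AAA in the example is never read, because the stop codon TAA comes first.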
We’re Made of Libraries
So far we’ve covered that you can think of DNA as a book, codons as words, and base pairs as letters. Now imagine I asked you to count how many times a character casts a spell in the Harry Potter series. Reading all seven books would take some time, but it would be possible.
Next, imagine I asked you to identify every time a character casts a spell in every fantasy book published in the last century, and then to compare how often characters cast spells in fantasy books versus science fiction books. You would need to read every Harry Potter book, countless other fantasy books, and every science fiction book written in the last century. That is thousands of books, far too many for one person to get through.
The human genome has upward of three billion base pairs. Reading the genome manually would be like a scientist reading multiple massive novels for every test subject. Oftentimes, scientists have dozens, hundreds, or even thousands of subjects in their studies. When scientists look at a data set, they may be interested in the number of times a certain set of codons appears in the genome. It would take decades or longer to count the codons by hand.
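A quick back-of-the-envelope calculation shows where "decades" comes from. The sketch below treats one genome as a string of three-letter chunks and assumes a person could check one chunk per second without ever stopping. That speed is a made-up assumption for illustration, not a measured figure.

```python
GENOME_BASE_PAIRS = 3_000_000_000  # "upward of three billion" from the text
BASES_PER_CODON = 3
CODONS_PER_SECOND = 1              # assumed hand-counting speed (hypothetical)
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

codons = GENOME_BASE_PAIRS // BASES_PER_CODON
years = codons / CODONS_PER_SECOND / SECONDS_PER_YEAR

print(f"{codons:,} codon-sized chunks")               # 1,000,000,000 codon-sized chunks
print(f"about {years:.1f} years of nonstop counting")  # about 31.7 years of nonstop counting
```

That is for a single genome, with no breaks for sleeping or eating.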
Computer Science Meets Biology
This is where bioinformatics comes in! While one scientist couldn’t count how many times a sequence of codons appears in a genome, a computer can. Scientists can program computers to examine huge amounts of biological data. Computers can identify patterns in the human genome.
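Counting how often a short sequence appears is a task computers handle easily. Here is a minimal Python sketch using a tiny made-up sequence. Real genomes are billions of letters long and are usually handled with specialized tools, so treat this as an illustration of the idea only.

```python
def count_occurrences(genome, pattern):
    """Count every place the pattern appears, including overlapping matches."""
    count = 0
    position = genome.find(pattern)
    while position != -1:
        count += 1
        # Resume one letter past the last match so overlaps are not missed.
        position = genome.find(pattern, position + 1)
    return count


toy_genome = "ATGACGACGTTACG"  # made-up sequence for illustration
print(count_occurrences(toy_genome, "ACG"))  # 3
```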
For example, scientists could compare the genomes of healthy subjects and subjects with cancer. A particular set of codons may appear more frequently in people with cancer. This could mean that people with this set of codons have a higher risk of developing this type of cancer. With further research, scientists might then be able to develop a screening test for it to help patients understand their cancer risk.
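The comparison described above can be sketched in Python as "what fraction of each group carries the pattern?" Every sequence below is invented for illustration. A real study would use far more subjects and proper statistical tests before drawing any conclusion.

```python
def carrier_fraction(genomes, pattern):
    """Fraction of genomes in a group that contain the pattern at least once."""
    carriers = sum(1 for genome in genomes if pattern in genome)
    return carriers / len(genomes)


# Invented toy data: four "subjects" per group.
healthy = ["ATGACGTTT", "ATGGGCTTT", "ATGAAATTT", "ATGGGCAAA"]
affected = ["ATGACGACG", "ACGTTTGGC", "ATGAAAACG", "ATGGGCTTT"]

print(carrier_fraction(healthy, "ACG"))   # 0.25
print(carrier_fraction(affected, "ACG"))  # 0.75
```

A gap like this one would be a hint worth investigating further, not proof that the pattern causes disease.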
Bioinformaticians use many tools. Different tools help them process, analyze, and store data.
For example, next-generation sequencing (NGS) is a technology that reads the order of bases in DNA quickly and at large scale.
Sometimes scientists want to compare a new DNA sequence to one that has already been sequenced. To keep track of previously sequenced DNA, they use DNA databases and libraries. To check how closely two sequences line up, they use pairwise alignment tools.
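One classic way to check how well two sequences line up is the Needleman-Wunsch algorithm, which scores the best end-to-end alignment of two strings. The article does not say which method any particular tool uses, so the Python sketch below is just one textbook approach. The scoring values (+1 for a match, -1 for a mismatch or a gap) are assumptions chosen for illustration.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score of the best global alignment of a and b (Needleman-Wunsch)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Lining a sequence up against nothing costs one gap per letter.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            gap_in_b = score[i - 1][j] + gap  # letter of a lined up with a gap
            gap_in_a = score[i][j - 1] + gap  # letter of b lined up with a gap
            score[i][j] = max(diagonal, gap_in_b, gap_in_a)

    return score[-1][-1]


print(alignment_score("ACGT", "ACGT"))  # 4
print(alignment_score("ACGT", "AGT"))   # 2
```

Higher scores mean the two sequences line up more closely. In the second example, three letters match and one gap is needed, giving 3 - 1 = 2.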
There are countless other tools bioinformaticians use, and more are created every year.
Bioinformatics has many uses beyond DNA. The -informatics ending of the word basically just means “looking at a lot of data with a computer.” The type of bioinformatics discussed above is one specialty: genomics. There are also proteomics (analyzing proteins), lipidomics (analyzing lipids), and many more.
Bioinformatics is even being used to fight climate change. The possibilities for this growing field are endless!
“Human Genome.” Encyclopædia Britannica (online), Feb. 10, 2023. https://www.britannica.com/science/human-genome.
Iberdrola. “¿Qué Es La Bioinformática y Cuál Es Su Impacto Sobre La Salud?” Iberdrola, Apr. 22, 2021. https://www.iberdrola.com/innovation/bioinformatics.
“Next-Generation Sequencing (NGS).” Illumina.com, 2023. https://www.illumina.com/science/technology/next-generation-sequencing.html.
Erin Kelley has degrees in computational biology, bioinformatics, and molecular biology. As part of the Working Fires social media team, Erin makes TikTok videos and Instagram reels that share the wonders of science. Her day job finds her in the lab, where she works as a research scientist.