Introduction: Explore Your Genome!

Curious about what lies within your 23 pairs of chromosomes? Interested in what's encoded in your 3.3 billion* basepairs? Intrigued by what your more than 21,000 genes do? Then join me in exploring the human genome!

I am a senior scientist with a bachelor's in genetics and a master's in biotechnology, specializing in oncogenetics (the genetics of cancer) and blood group genetics. I'd like to take you all on a cursory tour of the human genome and show you how you can explore the wonders packed within (almost) every cell in your body.

As an introduction, consider this your first, or maybe next, step in the growing DIY Bio movement. The growth of DIY Bio is not going unnoticed either: there are articles from sources such as Popular Science, Slate, Medium, The Scientist, Nature News, h+, MIT Technology Review, Singularity Hub, and EMBO Reports, to name just a few. Not only that, DIY biologists are making a real contribution to science in the process!

*This represents a human haploid genome, or half the total basepairs within most of your body's tissue.

EnsEMBL Release 68. July 2012.

Step 1: Intro to Genetic Analysis

It would be impossible for me to distill my decade-plus worth of experience in genetic analysis down to one tidy instructable, so I'm not even going to try. Instead I'm going to introduce you to some of the easiest and most powerful methods I am familiar with that are within the purview of the DIY geneticist. I am also going to try to steer you away from potentially harmful pseudoscience while keeping the politics and divisiveness out. This means we will not be covering (nor will I address in the comments) topics such as eugenics, stem cells, grant funding, genetic engineering, GMOs, designer babies, anti-vaxxers, etc. So please, just don't ask. I do not profess to know everything there is to know about genetics, and there are more appropriate venues for that exploration (e.g. iflscience).

Step 2: The Blueprint Analogy

As you probably already know, all analogies are flawed. The best model of an object is the object itself. But that doesn't do us much good when trying to understand something as complex as the human genome.

I, like probably countless others, was introduced to our genomic DNA as the blueprint for how we are made. While this is true to an extent, it leaves out layers of complexity. How do certain cells know to read only parts of the blueprint? What machinery gets used? Which materials are appropriate? How many copies are made? Who does the shipping and receiving?

In reality, the human genome is like a choose-your-own-adventure blueprint of epic proportions. Individuals (transcription factors) select the sections to read based upon which sections are highlighted or crossed out (epigenetics), then different readers (RNA polymerases) tell different stories (mRNA) depending upon which versions (transcripts) of the whole story (genes) they tell. Depending upon who hears the stories (post-transcription factors), other stories may be told (alternative splicing), or they may be ignored (siRNA & miRNAs), told more/less (lncRNAs), and interpreted (tRNA) then made (ribosomes) into a number of different objects (isoforms), all from the same section of blueprint (genomic DNA). These objects (proteins) are then used directly, broadcast within the cell (signal transduction) and/or used to send messages outside the cell (intercellular communication).

Please keep in mind that this is all a gross oversimplification and it ignores the majority of what goes on inside the cell including maintenance, communication, import, export, mobility, motility, location, mortality, replication, senescence, transformation, information storage, alternate forms of transcription, alternate forms of transcription regulation (snoRNA, piRNA, etc.), many ncRNAs, defense, offense, metabolism, glycosylation, shuttling, recycling, post-translational modification...

Each cell, in many ways, is like an individual person with their own occupation: separate, but part of a [dystopian] society (tissues) where each communicates with neighbors and with others far away within the world (your body). Each has the ability to grow, build, go rogue (cancer), die (apoptosis, cell death), interact with its environment, kill, and be killed. It's no wonder that scientific discoveries in medicine and genetics are slow and methodical; not only are researchers dealing with a compound and complicated system, but they are doing so on a microscopic level.

Step 3: Where to Start

Exploring the human genome is a daunting task. If you were to print the whole genome out in size 12 font it would stretch across the continental United States, or about 5,000 km. Typically, however, researchers are not interested in the whole genome; they are interested in a gene or group of genes. If that is your case, you can skip down to the "Advanced DIYBio Genetic Analysis" steps. If, however, you are trying to figure out where to start, the folks at 23andMe have collected thousands of potential targets of interest in one inexpensive assay.

This is not the only place to start; however, it is the one most accessible to DIY geneticists interested in their own genomes, and it is the method I cover in this instructable.

Please note that the 23andMe link above goes through the referral program; you may also reach the site simply by going to 23andme.com.

Step 4: 23andMe - Introduction

23andMe can do a better job at telling you what they do and how they do it, so I'll leave that up to them. Suffice it to say, they use a microarray genechip to genotype thousands of SNPs in your genome. This is different than what the Human Genome Project did back in 2001, and many since then, in that it looks only at polymorphisms of interest instead of every single nucleotide in your genome. This is how they can manage to get you results for $99 (USD).

Take any results you get with a grain of salt, however. Actually, no, take them with the entire salt shaker. I say this for three main reasons.

First, the method used by 23andMe is high throughput and not clinically validated. That means the method is focused mainly on quantity, not quality, and that what you get as a result could be incomplete or altogether wrong. That's not to say that the method isn't precise, it is, but it's geared towards mass data sets that usually average out error over many samples and is more appropriate as a screen for areas of further investigation.

Second, in our cause equals effect world we like to think that just because we have a SNP that correlates with blue eyes, we'll have blue eyes, but that's not necessarily the case. Most conditions are polygenic, meaning they are informed by more than one gene, so it usually takes the combination of several genes or alleles to enact a change. Also, genetics plays only one role in most health conditions; personal choices and environment often play an equal or sometimes more important role in the onset of disease or health-related issues.

Third, many of the health-related aspects are based on correlation (remember, correlation does not imply causation). For health reporting there are two kinds of correlation at work: one is based on the genetic concept of linkage disequilibrium and the other relies on assumed cause-and-effect. Linkage disequilibrium basically states that when two genetic variants sit near each other on a chromosome, their inheritance is correlated, allowing us to assume that when one is inherited the others around it probably are too. In the second kind, which equates correlation with cause-and-effect, we make some assumptions that may or may not be true. Take for example the false assumption that because it only seems to rain when you have the day off from work or school, your being off makes it rain; logically you know that can't be true. Contrast this with the observation that rain correlates with more traffic accidents, a logical cause-and-effect. When viewing health reports built on the data reported by 23andMe you will see instances where one or both types of correlation are used.
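To make linkage disequilibrium a little more concrete, here is a minimal sketch of the two textbook statistics geneticists use to measure it, D and r². The haplotype counts are made up purely for illustration, and the function name is my own; this is not how 23andMe or any reporting tool computes anything.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic sites.

    p_ab: frequency of chromosomes carrying allele A at site 1 AND allele B at site 2
    p_a:  overall frequency of allele A at site 1
    p_b:  overall frequency of allele B at site 2
    """
    # D is how far the observed haplotype frequency departs from what
    # independent inheritance (p_a * p_b) would predict.
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both sites must be polymorphic")
    return d, d * d / denominator


# Hypothetical counts of the four possible haplotypes in 100 chromosomes.
counts = {"AB": 45, "Ab": 5, "aB": 5, "ab": 45}
total = sum(counts.values())
p_ab = counts["AB"] / total
p_a = (counts["AB"] + counts["Ab"]) / total
p_b = (counts["AB"] + counts["aB"]) / total

d, r_squared = linkage_disequilibrium(p_ab, p_a, p_b)
print(f"D = {d:.2f}, r^2 = {r_squared:.2f}")
```

An r² near 1 means that knowing the allele at one site tells you almost everything about the other; an r² near 0 means the two are inherited essentially independently.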

Step 5: 23andMe - Navigating Your Results

If you do decide to go forward with 23andMe, it'll be a few weeks after you register and send your kit back before your results are ready. Once they're ready, you'll most likely be limited to only Ancestry-related applications with your data, which are accessible from the 'My Results' and 'Family & Friends' tabs. Once registered, everyone has access to the 'Community' tab.

For now, the 'Health Reports' aspect of 23andMe is in limbo with the FDA. Don't fret though, if you want access to that type of analysis there are other options (see the Promethease step below).

For ancestry related content, you'll be able to do some straightforward stuff like finding relatives who are genetically related to you or build a family tree through MyHeritage. A fun function they also have is 'Ancestry Composition' which provides a map highlighted to where your DNA originates with varying degrees of confidence (from conservative to speculative). Most of this analysis, including the tracing of the maternal and paternal (males only) line, utilizes known haplotypes which follow the genetic law of segregation.

"Say what now???"

Glad you asked. Take a look at the next step, Haplotypes.

Step 6: Haplotypes

We were made to believe that in the genetic transaction that initiates life, we get half our DNA from our father and half our DNA from our mother. This isn't 100% accurate.

Everyone alive today (which may change with the advent of Three-Parent IVF to rectify mitochondrial diseases, see additional review by Amato et al. 2014) is genetically slightly more their mother than their father; even more so if they are male (quantitatively). This is because we inherit our mitochondrial DNA from our mother through the ovum. Using this fact, and the fact that males receive their Y-chromosome from their fathers, we can do some interesting lineage analysis.

This lineage analysis relies on the fact that every woman who has preceded you in your genetic line (your maternal lineage) has passed down basically the same mitochondrial DNA (mtDNA) for generation after generation (albeit with a few acquired mutations). Couple this with the fact that every male who has preceded you in your genetic line (your paternal lineage) has passed down basically the same Y-chromosome (again, albeit with a few acquired mutations). Understanding this, you can basically go back through time to where your maternal and paternal lines originated, since neither the mtDNA nor the Y-chromosome can be mixed with DNA from your father or mother, respectively. This is the fundamental definition of a haplotype, which is basically a set of traits that are linked generation after generation.
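To put rough numbers on the "slightly more your mother" claim, here is a back-of-the-envelope sketch. The chromosome sizes are rounded approximations that I am assuming for illustration (only the 16,569 bp mtDNA length is a standard reference figure), and the sketch counts one copy of the mtDNA even though each cell carries many.

```python
# Approximate sizes in basepairs; rounded assumptions for illustration only.
AUTOSOMES_BP = 2_875_000_000  # one haploid set of chromosomes 1-22 (approximate)
X_BP = 156_000_000            # X chromosome (approximate)
Y_BP = 57_000_000             # Y chromosome (approximate)
MT_BP = 16_569                # mitochondrial genome, counted once


def parental_contribution(sex):
    """Return (maternal_bp, paternal_bp, maternal_fraction) for 'male' or 'female'."""
    # Mother always contributes one autosome set, one X and the mtDNA.
    maternal = AUTOSOMES_BP + X_BP + MT_BP
    # Father contributes one autosome set plus either a Y (sons) or an X (daughters).
    paternal = AUTOSOMES_BP + (Y_BP if sex == "male" else X_BP)
    return maternal, paternal, maternal / (maternal + paternal)


for sex in ("female", "male"):
    maternal, paternal, fraction = parental_contribution(sex)
    print(f"{sex}: mother minus father = {maternal - paternal:,} bp "
          f"({fraction:.4%} maternal)")
```

Under these assumptions a daughter's excess maternal DNA is just the mitochondrial genome, while a son's is larger because the X he gets from his mother is much bigger than the Y he gets from his father.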

Step 7: 23andMe - Getting More From Your Raw Data

So your curiosity is whetted and now you want to know more?

In all honesty, I did 23andMe so that I could get to my genetic data. Not that ancestry doesn't interest me, but I'm a geneticist and rarely do you get the chance to investigate your own genome and especially not for less than $100 USD.

So hop down the rabbit hole by clicking on your name in the top right of the screen and select 'Browse Raw Data' then click on 'Download'. You'll have to enter some of your account info then select "All DNA" as your data set. Save the file somewhere accessible then go to Promethease. After agreeing to the initial statements regarding responsible use of the data, Privacy Policy and T&C you'll get the option to upload your data and process it for $5 USD.
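If you'd like to peek at the file yourself before uploading it anywhere, here is a minimal sketch of reading it. It assumes the raw data is a tab-separated text file with comment lines starting with '#' and four columns (rsid, chromosome, position, genotype), with no-calls written as '--'. The rows below are made up, and the file name in the usage note is hypothetical.

```python
import csv
import io

# Made-up excerpt in the assumed layout; real files hold hundreds of thousands of rows.
SAMPLE = (
    "# made-up excerpt for illustration\n"
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs111\t1\t1000\tAA\n"
    "rs222\t11\t2000\tCT\n"
    "rs333\tX\t3000\t--\n"
    "rs444\tMT\t4000\tG\n"
)


def parse_raw_data(handle):
    """Map each rsid to (chromosome, position, genotype), skipping comments."""
    rows = (line for line in handle
            if line.strip() and not line.startswith("#"))
    genotypes = {}
    for rsid, chromosome, position, genotype in csv.reader(rows, delimiter="\t"):
        genotypes[rsid] = (chromosome, int(position), genotype)
    return genotypes


genotypes = parse_raw_data(io.StringIO(SAMPLE))
# Drop no-calls ('--'), where the chip could not read a genotype.
called = {rsid: row for rsid, row in genotypes.items() if "-" not in row[2]}

print(genotypes["rs222"])
print(f"{len(called)} of {len(genotypes)} SNPs have a genotype call")
```

To run it on your own download, swap the `io.StringIO(SAMPLE)` for something like `open("my_raw_data.txt")` (a hypothetical file name) and look up whichever rs number you are curious about.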

Step 8: Promethease Overview

So, eight steps in and we finally get to do some genetic analysis! Woo-hoo!

One big thing to get out of the way first, though: caveat scientist! Be objective when reading your reports. If you are a hypochondriac or prone to worry you may want to forgo your analysis, because everyone's (read that, everyone's) report will make you think that death is imminent. Why? Because the list of "bad genotypes" is long and scary. But remember that these are correlations; some will be or are relevant, but most are just genetic noise (remember that salt shaker and the lecture on correlation?). My results say that I am more likely to be lactose intolerant and develop Alzheimer's; I am not lactose intolerant, nor is Alzheimer's prevalent in my family. But my results also say I am a beta thalassemia carrier (true) and I have an elevated risk of glaucoma (which does run in my family).

It's better to view your results as a backdrop in which you begin to piece together parts of your life and lineage into an intelligible story to share with your physician. If you're at higher risk of COPD, put down that cigarette; if you're a carrier for a disease, check with your partner before procreating; if you have the genotype for soapy tasting cilantro, get your salsa on the side. Information is power and with power comes responsibility (thanks Spidey).

Now on to the reports.

When you open up the folder for your Promethease reports you'll find an html file aptly named 'report'. This is what I like to think of as the main report; it breaks things up into easily discernible categories. Under each category you'll find genotypes that associate with a given trait. These genotypes may be either causative (like my beta thalassemia carrier status) or correlative (like my glaucoma risk), and it's important to know the difference (we'll get to this).

Some reports, presentations and layouts may change with the version of Promethease reports used. All references to Promethease in this instructable use version 0.1.16 with results generated on February 5, 2014.

Step 9: Promethease Medical Condition Report

This is another one of the three reports I find most useful from Promethease; it's also a good place to start if you don't know exactly what you are looking for. The Medical Condition Report breaks the collected genotypes into categories and, using overlay data from SNPedia, gives a little infographic displaying what proportion of the SNPs in your set are good/bad/unknown. The infographic is not there to give you a risk status, per se; it's there to help you browse the categories.

When browsing the categories you can expand each condition by clicking on the "...more..." link. This lists each SNP separately, gives you additional information for each, and color-codes each according to its purported repute. The Promethease site has a comprehensive help page on how to read your genotype results, so I won't clutter up this 'ible with the same information.

Step 10: Promethease UI2 Report

Of the three reports I find most useful, this is the most fun to browse, the most powerful and the most comprehensive. It's hard to put a name to this report, which is probably why it was given the rather ambiguous title of UI2. This is the report I use almost exclusively due to its powerful filtering ability and the short synopses associated with each SNP. It lists all the SNPs in your genotype panel ranked by 'Magnitude', or impact on your phenotype, ranging from 0-4. For instance, my having a Y-chromosome has a Magnitude of 4 because it has a large phenotypic impact; it's not necessarily good or bad. Ergo, you will also find alleles associated with eye color, taste receptors, skin color and other clinically irrelevant traits towards the top.
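The ranking and filtering idea is simple enough to sketch in a few lines. The records, field names and values below are all hypothetical stand-ins that I made up to show the logic; they are not the actual Promethease data format.

```python
# Hypothetical records; names, magnitudes and repute labels are invented.
records = [
    {"name": "example-eye-color", "magnitude": 2.5, "repute": "neutral"},
    {"name": "example-carrier-status", "magnitude": 3.5, "repute": "bad"},
    {"name": "example-y-chromosome", "magnitude": 4.0, "repute": "neutral"},
    {"name": "example-resistance", "magnitude": 3.5, "repute": "good"},
    {"name": "example-common-variant", "magnitude": 0.0, "repute": "neutral"},
]


def top_hits(records, min_magnitude=3.0, repute=None):
    """Keep records at or above min_magnitude (optionally one repute), highest first."""
    hits = [
        record for record in records
        if record["magnitude"] >= min_magnitude
        and (repute is None or record["repute"] == repute)
    ]
    return sorted(hits, key=lambda record: record["magnitude"], reverse=True)


for record in top_hits(records):
    print(f'{record["magnitude"]:.1f}  {record["repute"]:8}  {record["name"]}')
```

Note that a high magnitude only says "worth a look"; the repute label, and the evidence behind it, is what tells you whether a hit is good, bad or just interesting.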

Mixed in with those otherwise clinically irrelevant alleles* you will find clinically relevant alleles. Their clinical relevance is either empirically confirmed (via ClinVar) or inferred; this distinction is important because it carries with it the added benefit of well vetted supporting evidence. You will notice that in the image above, listed after my gender, are two negatively associated alleles and one positively associated allele all with magnitudes 3.5+. You will also notice that the "good" allele, a rarer mutation in TIRAP that confers disease resistance by altering the way my cells signal for invasive antigens, is in ClinVar but my beta thalassemia carrier status isn't. While not all SNPs that have a well validated clinical component are in ClinVar, we simply need to click into the beta thalassemia "genoset" to see the individual SNPs, one of which is the specific SNP causing my beta thalassemia carrier status (rs11549407) and it is in ClinVar.

*Please note I alternate between SNP/polymorphism/allele/trait/genotype regularly, while these words are not directly interchangeable they can be synonymous depending upon their use or confer a certain context. "Traits" confer a phenotype, a physical manifestation of a genotype. Genotypes can be an allele or collection of alleles which can be SNPs or polymorphisms (of which a SNP is a specific kind of polymorphism). While the lexicon of genetics is not immutable, these terms are fairly static and well defined. Their distinctions, however, are beyond the scope of most individuals' necessity to understand genetics at a DIY level.

Step 11: Resources

So now that you have found a few SNPs, traits or genes you want to know more about, what do you do?

Well, luckily there is no shortage of resources when it comes to exploring the human genome. While there are many options out there, it's prudent to only utilize curated resources that are scientifically vetted. Below is a short list of pertinent resources followed by a few resources that are not as academically vetted (think Wikipedia versus Encyclopedia Britannica):

Entrez:

This is a good place to start when you don't know where to start. Simply type your query into the Entrez search function and it will find all applicable records across the pertinent databases within NCBI. From here you can find related publications in PubMed and PubMed Central (free), the clinical significance of variants in ClinVar, and tens of other important databases.

OMIM (Online Mendelian Inheritance in Man):

This is the canonical reference when looking up gene function and history. Some of the more well-researched genes will also have a list of known variants which is really helpful when understanding what certain variants/alleles do. You should start here when investigating a particular gene or variant.

HGVS (Human Genome Variation Society):

HGVS is sort of like the MLA for genomics. They are the organization, in conjunction with HUGO, IFHGS and HGNC, that creates standards in how to read and write variations in the genome against the recognized reference sequences (more about this later). HGVS is an important resource when trying to understand the syntax used when specifying variations in DNA, RNA or proteins. This will come in handy when trying to match up variants you find with what's in literature.
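As a taste of what that syntax looks like, here is a minimal sketch that pulls apart the simplest kind of HGVS-style description, a single-base substitution such as "c.118C>T" (coding-sequence position 118, reference base C, changed to T). The regular expression is my own simplification that covers only this one pattern, and the reference accession in the example is a placeholder, not a real record.

```python
import re

# Simplified pattern: optional "reference:" prefix, a coordinate type
# (c = coding, g = genomic, m = mitochondrial, n = non-coding),
# a position, then REF>ALT. Real HGVS descriptions cover far more cases.
SUBSTITUTION = re.compile(
    r"^(?:(?P<reference>[A-Za-z0-9_.]+):)?"
    r"(?P<kind>[cgmn])\.(?P<position>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_substitution(description):
    """Return the parts of a simple substitution, or None if it doesn't match."""
    match = SUBSTITUTION.match(description)
    if match is None:
        return None
    parts = match.groupdict()
    parts["position"] = int(parts["position"])
    return parts


# "NM_000000.0" is a placeholder accession used only for illustration.
print(parse_substitution("NM_000000.0:c.118C>T"))
print(parse_substitution("c.118del"))  # deletions are outside this sketch
```

Anything the pattern does not recognize comes back as None, which is a reminder to go read the actual HGVS recommendations rather than guess.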

e-PCR:

This site, and more importantly its Reverse e-PCR site, enables you to test primers against sequence tagged sites (STS) which are basically unique sections of DNA in the genome. This will help once you design your primers to check and see if they are amplifying what you want them to and nothing else.
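The idea behind checking primers in silico can be sketched in a few lines: find where the forward primer matches the template, find where the reverse primer's binding site (its reverse complement) matches downstream, and report the product size. This toy version assumes exact matches on one strand of a made-up template; it is not how the e-PCR tool itself is implemented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse):
    """List (start, end, length) for every exact-match product on the given strand."""
    template, forward = template.upper(), forward.upper()
    # The reverse primer anneals to the opposite strand, so on this strand
    # we look for its reverse complement.
    reverse_site = reverse_complement(reverse)
    products = []
    start = template.find(forward)
    while start != -1:
        site = template.find(reverse_site, start + len(forward))
        while site != -1:
            end = site + len(reverse_site)
            products.append((start, end, end - start))
            site = template.find(reverse_site, site + 1)
        start = template.find(forward, start + 1)
    return products


# Made-up template and primers, far shorter than anything you'd really use.
template = "TTTTATGCGTACGGGGCCCCAAAACTTGACGATTTT"
print(in_silico_pcr(template, "ATGCGTAC", "TCGTCAAG"))
```

More than one product in the list, or none at all, is exactly the kind of warning you want before you spend reagents on a real reaction.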

PubMed:

Think of PubMed as the card catalog for scientific publications. Most publication entries will have a short abstract, a link to the full publication as well as links to related publications and reviews (publications summarizing other salient publications). There is also a less comprehensive version of PubMed featuring free papers, PubMed Central; linking only to papers published in public access journals, free versions of previously paid publications and publications made free by agreements within government grants.

BLAST:

The Basic Local Alignment Search Tool (BLAST) allows users to input their own sequence and search it against a variety of genomes. The input can be of various forms (various nucleotide and protein) and allows users to compare within species and between (depending upon which BLAST used). Outputs are aligned to the reference sequence, ranked and scored. This tool is indispensable for identifying unknown sequences or for determining whether or not a polymorphism is deleterious by seeing how conserved that region is in other organisms.
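To give a feel for what "local alignment" means, here is a small sketch of the classic Smith-Waterman algorithm. To be clear, this is not what BLAST runs (BLAST uses faster heuristics to search whole databases); it is the exact textbook method for the same underlying problem, with scoring values I picked arbitrarily.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; a cell never drops below zero, which is
    # what lets an alignment start and stop anywhere (i.e. be "local").
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score falls to zero.
    i, j = best_pos
    aligned_a, aligned_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diagonal:
            aligned_a.append(a[i - 1]); aligned_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1]); aligned_b.append("-"); i -= 1
        else:
            aligned_a.append("-"); aligned_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


# Two made-up sequences that share a stretch in the middle.
print(smith_waterman("TTTACGTACGTTT", "GGACGTACGGG"))
```

The shared stretch is found even though the flanking bases differ, which is the same property that lets BLAST find a conserved region buried inside otherwise unrelated sequence.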

Gene:

I personally find the Gene database to be one of the most useful at NCBI for genomic analysis. It allows you to search for genes by gene name or symbol and by organism. Entries contain a wealth of information including a short gene summary, links to other relevant databases, chromosome location, an interactive map, a bibliography, associated conditions/diseases, pathways and interactions, ontology, and reference sequences. Perhaps one of the most interesting links is the Variation Viewer, which provides an easily searchable list of polymorphisms (identified by rs number from dbSNP) in graphical and tabular format, perfect for analysis.

dbSNP:

Here you will find all records of published SNPs within the SNP database. All entries have been confirmed through the user submission process. Some entries are just the SNP number, flanking sequence and little more, while others have linked information for publications, allele frequencies, studies, and a host of other interesting things. The Reference SNP (RefSNP or "rs") number is a unique identifier that allows for unambiguous identification of a (short) polymorphism, sort of like a barcode or serial number.

There are also a number of user-curated and DIY-centric resources:

SNPedia:

Think of this as sort of a user-curated version of the SNP database listed above but with a different feel and functionality. It is mostly centered around the SNPs identified through 23andMe, which makes sense since it was developed by the same team as Promethease, which also directly reports to SNPedia. Most of the entries are mapped back to their SNP database entries and just like the Wikipedia versus Encyclopedia Britannica analogy above, expect more varied content but at a more accessible level.

DIYbio:

This site contains a decent amount of information for the budding DIY biologist who is looking to start a lab or join one. It's infrequently updated; however, it's valuable just for its list of local DIY bio labs/groups.

OpenWetWare:

This wiki is a good resource for DIY biologists looking to build their own lab equipment and develop their own studies/experiments. There is a wealth of protocols on everything from DNA extraction to yeast propagation, and there are even more resources on sequence analysis if this instructable really piqued your interest!

scistarter:

A general science repository for both DIY and sponsored studies in all realms of science. Broadly coined Citizen Science, there are studies there where you can contribute your 23andMe data through DIYgenomics and there are also studies there for the gut microbiome (thought of as one of the next -omics frontiers in health) through both ubiome and American Gut.

Please note, all logos are properties of their respective organizations

Step 12: My Recommended Path

Personally, whenever I have a SNP or gene I want to know more about, I follow a loose procedure to determine if it's something I would like to research more.

First I typically start by going to OMIM and typing in the gene that I am interested in (or the gene the SNP is a part of). If it's a gene that is well studied you will find a wealth of information about what it is associated with, the research history, where it's located, its function, possibly some orthologues, and any common gene variants. If I'm lucky, the SNP that I am interested in has already been defined as a known variant (in my case 141900.0312, which is also linked out to the OMIM number for the beta thalassemia phenotype, OMIM # 613985).

After I read up on the OMIM entries I note any cited publications that I would like to read later, then click through the Ensembl link to the actual SNP record in the dbSNP database. In dbSNP you can find out almost everything you might want to know about the SNP. The dbSNP records are dynamic in that they are not only interactive but constantly updated. For instance, the record I am interested in, rs11549407, still doesn't have the mutation I have (C>T) or its population frequency defined; however, it is linked to ClinVar and the OMIM entries.

Finally, from here I'll look up the publications I noted in PubMed or PubMed Central, as well as any new ones I might find by using the information I learned in the dbSNP record as search terms. After poking around some more in the 'Additional Links' on the HBB gene page, I found out that hemoglobin mutations (which cause beta thalassemia) have their own database called HbVar, and I was able to uncover additional information that way. If you're still unable to find information regarding your SNP or gene you can try going to less vetted resources such as SNPedia and WikiGenes (just make sure you choose genes for Homo sapiens).

Now you don't have to stop there; however, this is probably as far as you can go without getting into some really advanced DIY-biologist techniques. The next few steps will open the door to genetic analysis for the DIY-biologist, but be warned: DIY-bio is not for the faint of heart. To be a DIY-biologist you need to adopt all the same techniques and rigor as a laboratory biologist, as well as all the creativity you can muster to recreate laboratory equipment, reagents and results without breaking the bank. In my opinion, as a DIY-biologist, it's easier and more efficient to become a member of a local DIY-bio lab... but more on that in the next step.

Step 13: DIY-Bio Labs

All DIY-biologists are probably going to want to start their wet lab (working in a lab with reagents, chemicals, etc.) experience at a DIY-Bio lab. In fact, there's little reason to venture out of one since they offer much more than what any one DIY-biologist can accomplish on their own in their own space. In DIY-Bio labs you can conduct most in vitro and some in vivo studies of interest to the DIY-biologist. To find one, check out the link I mentioned before for DIYbio's Local Groups list. Each lab will have its own rules, dues, membership requirements and limitations; however, they aren't too different from your local makerspace.

Some labs conduct their own courses on laboratory techniques and use of equipment. Additional information and experience can usually be gleaned from existing members and by volunteering with ongoing studies. From them you'll learn important techniques on reagent formulation and equipment resourcing on a budget not to mention you will be able to get reagents and consumables much easier through the lab instead of trying to ship to your home address. Not only that, but many of these are real labs doing real science!

If you're looking to build a better foundation in biology or genetics but can't afford tuition at your local university, you can always enroll in courses from one of the popular MOOC (Massive Open Online Course) providers like edX, Udacity or Coursera. I've tried at least one of these and I was impressed with what was offered considering the price tag (FREE!). They are not the only MOOCs around, but they are the largest.

Step 14: Programs

The following are free programs that you can use to view, edit, manipulate, and design your projects. I will be reviewing some of them in the following steps, others I will leave to you to explore:

NCBI Genome Workbench:

This is a function-rich program designed to explore, align and assemble from the genome to gene scale. It's a little awkward for the first-time user; however, it has excellent tutorials and regular updates since it is sponsored by the NIH.

ABI Sequence Scanner:

Not much, if anything, has changed since ABI (now Life Technologies) released its Sequence Scanner software, but why fix what isn't broken? This simple, user-friendly tool allows users to quickly view their chromatograms (waveform files that are the product of Sanger sequencing reactions) and the data associated with them, including quality scores. Their product bulletin sums it up nicely; however, you will have to register to download.
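Those quality scores are usually reported on the Phred scale, where a score Q corresponds to an estimated probability of a wrong base call of 10^(-Q/10). A minimal sketch of the conversion, with made-up scores, looks like this (the function names are mine):

```python
import math


def phred_to_error_probability(quality):
    """Estimated probability that a base call with this Phred score is wrong."""
    return 10 ** (-quality / 10)


def error_probability_to_phred(probability):
    """Phred score corresponding to an error probability."""
    return -10 * math.log10(probability)


def expected_errors(qualities):
    """Expected number of wrong base calls across a read."""
    return sum(phred_to_error_probability(q) for q in qualities)


for quality in (10, 20, 30, 40):
    print(f"Q{quality}: 1 error in {round(1 / phred_to_error_probability(quality)):,} calls")

# Made-up per-base scores for a very short read.
print(f"expected errors: {expected_errors([40, 30, 20, 10]):.4f}")
```

So a Q20 base is wrong about 1 time in 100 and a Q30 base about 1 time in 1,000, which is why low-quality stretches at the ends of a chromatogram are worth trimming before you trust a variant call.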

BioEdit:

BioEdit may not win points on style (its interface looks like it's right out of VB), but it's rich in content and available tools. Perhaps one of the best free program suites for genetics, it rivals even the commercial products. The feature list is long and is probably the most comprehensive of all the programs I've evaluated.

FastPCR:

The FastPCR program is an excellent streamlined program for designing primers, probes and PCR reactions; it even has a feature to test your primers in silico against your reference sequences. This program would be great for the DIY biologist to design at-home SSP tests to detect specific alleles or to determine the primer selection for a sequencing project. I haven't used it since it went to a trial-to-license approach, but you should still be able to get a trial version to try out for a while.

MEGA:

I haven't used MEGA since undergrad, not because it's a substandard program in fact it's great, but because I no longer do phylogenetics. If you're interested in evolutionary biology and tracing the origins of genes, this is the program for you. From what I understand, it's only gotten better since my undergrad phylogenetic analysis on the evolution of hare mitochondrial DNA.

Chromas Lite:

Easily one of the most used DNA sequence viewers around, if only because of its simplicity and ease of use, Chromas Lite gives you only what you need to view DNA chromatograms and no fluff. This is a favorite for one-off views that do not require manipulation or assembly, and it should be where most DIY scientists start their foray into genetic analysis.

Primer3 and Primer3Plus:

Primer3 has long been the standard bearer in free, open primer design tools. It is widely used throughout the research community and is even incorporated into several software packages (paid and free). The interface allows for fine-tuning of your primer design parameters; however, it is as involved or hands-free as you make it, so both novice and experienced users can make use of it. For more information, read the following publication: Untergasser et al. 2012.
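Two of the parameters you will meet in any primer design tool are GC content and melting temperature (Tm). As a rough sketch of what they mean, the snippet below uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius, which is only a rule of thumb for short oligos. Primer3 itself uses more sophisticated thermodynamic models, so expect its numbers to differ. The primer sequence is made up.

```python
def gc_content(primer):
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rule-of-thumb melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


primer = "ATGCGTACGTTAGCCTAGGA"  # made-up 20-mer
print(f"length {len(primer)}, GC {gc_content(primer):.0%}, Tm ~{wallace_tm(primer)} C")
```

A common design goal is to give the forward and reverse primers similar melting temperatures so that both anneal well at the same step of the cycle.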

Other programs besides these exist, however I am not as familiar with them. That is not to say that they aren't good, I just don't want to make this 'ible one giant review of software. If you're still curious, check out Sequencher, GeneMark, GeneStudio, Softgenetics (trial), GenBeans, UGene, SEQtools (license-based), and GENtle. I'm leaving out software more geared towards cloning in bacteria, yeast and other model systems however there is also some excellent freeware out for that too (e.g. Serial Cloner). Personally, now I only work with a paid software suite called Geneious since we standardized it in my lab years ago, it's slick and powerful but not affordable to the DIY geneticist.

Please note, all logos and screenshots are properties of their respective organizations

Step 15: Advanced DIYBio Genetic Analysis: DNA Extraction

Most professional geneticists, me included, have the luxury of having a research-grade and/or GLP facility to do lab work in. Though I do not do any DIYBio projects at work, it does afford me the ability to familiarize myself with molecular biology techniques before I attempt to recreate them at home.

By far, two of the hardest techniques to do at home are also among the most basic for our understanding of genetics: DNA extraction and the Polymerase Chain Reaction (PCR).

Several Instructables users have already posted about DIY DNA extraction. Each method has its pros and cons, and none is ultimately as efficient or contaminant-free as standard laboratory techniques utilizing silica columns or magnetic beads. The easiest method for DIY-biologists uses paper punches, similar to blood spot cards, and relies on heat (from a thermal cycler in PCR) as the main DNA extraction method; however, it is also the least efficient due to the introduction of PCR inhibitors (for human DNA, blood or cheek swabs can be applied to the paper). The other method popular with DIY biologists utilizes detergents and alcohol, a more difficult process because the two liquids are miscible, but two 'ibles here and here do an excellent job of explaining it. You can also experiment with other detergents/proteases; I've seen dish soap, shampoo and meat tenderizer all used.

Personally I'm partial to the alcohol and detergent/protease method for its higher efficiency and yield, though great care must be taken to minimize contamination and carryover of alcohol and detergents/proteases. The effects of carryover can be minimized by using sterilized transfer tools (15 minutes in a pressure cooker with distilled water) and fully drying the DNA pellet in a sterile environment (google sterile technique; an alcohol lamp can be substituted for a bunsen burner in most applications).

Super DIY-biologists can up their game by building a spectrophotometer to quantitate their DNA. I won't cover this here, but you can either look into modifying one from Public Labs or by following one of the several instructables on it. Either way, you'll have to build it so that it will accept cuvettes with liquid in them.
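
If you do get a spectrophotometer working, the arithmetic behind quantitation is simple. Here is a minimal Python sketch of it, assuming the commonly used conversion that an absorbance of 1.0 at 260 nm in a 1 cm cuvette corresponds to roughly 50 ng/µL of double-stranded DNA, and that an A260/A280 ratio near 1.8 suggests reasonably clean DNA. The function names and the example readings are made up for illustration.

```python
# Rough dsDNA quantitation from absorbance readings.
# Assumption: A260 of 1.0 (1 cm path length) ~ 50 ng/uL of double-stranded DNA.
NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration(a260, dilution_factor=1.0, path_length_cm=1.0):
    """Estimate dsDNA concentration in ng/uL from absorbance at 260 nm."""
    return a260 / path_length_cm * NG_PER_UL_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio; values well below ~1.8 hint at protein carryover."""
    return a260 / a280


# Made-up readings for a sample diluted 1:10 before measuring.
a260, a280 = 0.25, 0.14
print(f"Concentration: {dsdna_concentration(a260, dilution_factor=10):.1f} ng/uL")
print(f"A260/A280: {purity_ratio(a260, a280):.2f}")
```

A homebuilt instrument will not be as accurate as a lab spectrophotometer, so treat the result as a ballpark figure rather than an exact concentration.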

Step 16: Advanced DIYBio Genetic Analysis: PCR and Gel Analysis

The Polymerase Chain Reaction (PCR) method is one of the most powerful tools in the geneticist's toolbox. PCR is an enzymatic method of exponentially amplifying targeted regions of DNA for further analysis using a series of heating and cooling steps. Long gone are the days of interns dipping vials of samples into water baths. Nowadays we use automated thermal cyclers with a Peltier element to cycle temperatures, and a few citizen scientists have even engineered DIY versions for us DIY-biologists to use.
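
To get a feel for how quickly PCR builds up product, here is a small Python sketch of the idealized math: each cycle can at best double the number of copies of the target. The `efficiency` parameter is my own addition for illustration (1.0 means perfect doubling); real reactions fall short of that and eventually plateau, which this sketch does not model.

```python
def pcr_copies(starting_copies, cycles, efficiency=1.0):
    """Idealized copy number after a given number of PCR cycles.

    efficiency=1.0 means every template is copied each cycle (perfect doubling).
    """
    return starting_copies * (1 + efficiency) ** cycles


# Growth from a single template molecule under ideal and less-than-ideal conditions.
for n in (10, 20, 30):
    ideal = pcr_copies(1, n)
    realistic = pcr_copies(1, n, efficiency=0.9)
    print(f"{n} cycles: ideal {ideal:,.0f} copies, 90% efficiency {realistic:,.0f} copies")
```

Even with imperfect efficiency, a few dozen cycles turn a handful of template molecules into enough DNA to see on a gel.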

By far the easiest DIY thermal cycler I've seen is the Gene Machine posted on Popular Science. It utilizes a light bulb, a sort-of Easy-Bake-Oven approach to a standard thermal cycler. Another DIY thermal cycler utilizes resistors in this instructable for the Arduino PCR. Yet another uses a similar approach with a heating element in the Coffee Cup Thermocycler.

A gel electrophoresis machine may be easier for the DIY biologist to make and run. Designs range in complexity from the relatively novice (major emphasis on the relatively) setup posted on Make to the more advanced instructables for the GellS or the mini gel system. For running a gel, I would advise using the method described in the aforementioned Make post, but more advanced users can follow the instructable for gel preparation or the protocol on Open Wet Ware. Of course, you will also need to buy or make a pipette to perform the liquid transfers, and make a transilluminator (like this or this) to view the finished gels.

Of course, to perform PCR you'll need to design primers to amplify your sequence of interest. Primers are simply short single-stranded DNA sequences that are complementary to areas flanking your sequence of interest. They are necessary for DNA sequencing as well as the more DIY-Bio-friendly sequence-specific PCR (SSP), whose products can then be differentiated on a gel. There are several companies to order primers from; by searching the internet for "primer synthesis companies" you may be able to find one that will ship to individuals or DIY labs. For either sequencing or SSP it is important to perform an initial PCR to amplify and enrich your target region. Some tips for a successful PCR include:

  • Target PCR lengths should be around 500bp but rarely should they exceed 1000bp
  • Allow approximately 30 seconds of extension time per 500bp
  • Keep forward and reverse primer melting temperatures within ±1°C if possible
  • Avoid targeting regions with repeating nucleotides (poly-tracts, e.g. AAAAA... or GCGCGCGC...)
  • Avoid putting your primers in regions with common SNPs
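
A few of the tips above can be turned into quick sanity checks. The Python sketch below is only a rough aid, not a replacement for a proper design tool. It estimates melting temperature with the simple Wallace rule (2°C per A/T, 4°C per G/C), which is my substitution for illustration and much cruder than the thermodynamic calculations used by primer design programs. The primer sequences and function names are made up.

```python
import math
import re


def wallace_tm(primer):
    """Very rough melting temperature (deg C) using the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def extension_seconds(amplicon_bp, seconds_per_500bp=30):
    """Extension time scaled from the ~30 seconds per 500bp rule of thumb."""
    return math.ceil(amplicon_bp * seconds_per_500bp / 500)


def has_poly_tract(seq, min_run=5):
    """True if one base repeats min_run+ times or a dinucleotide repeats 4+ times."""
    s = seq.upper()
    single = re.search(r"([ACGT])\1{%d,}" % (min_run - 1), s)
    dinucleotide = re.search(r"([ACGT]{2})\1{3,}", s)
    return bool(single or dinucleotide)


# Hypothetical primer pair and amplicon length, for illustration only.
forward = "AGCTGACCTGAAGGCTCATG"
reverse = "TCCAGGTCAGTGCCTTCAAG"
amplicon_bp = 600

print("Forward Tm:", wallace_tm(forward))
print("Reverse Tm:", wallace_tm(reverse))
print("Tm difference:", abs(wallace_tm(forward) - wallace_tm(reverse)))
print("Extension time (s):", extension_seconds(amplicon_bp))
print("Poly-tract in forward primer:", has_poly_tract(forward))
print("Amplicon longer than 1000bp:", amplicon_bp > 1000)
```

Because the Wallace rule only produces even numbers, treat its Tm difference as a coarse warning sign and rely on your primer design software for the real values.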

You can see in the picture for this step that I am using Primer3 to find primers to target the beta thalassemia SNP for sequencing. Using programs like Primer3 for primer design is a good idea because they will give you primer combinations with the best thermodynamics (see here for primer characteristics). You will then want to check your primers in reverse e-PCR (see second image) to make sure they are specific for only your target sequence. For more information on designing a good PCR reaction click here.

Step 17: Advanced DIYBio Genetic Analysis: Sequence Alignment and Analysis

So you have your PCR product and now you want to see what it says?

As a DIYbiologist it may be difficult to find a core DNA sequencing facility, but it's worth poking around your local university or searching the internet for "dna sequencing services". It may be difficult to find a place that is willing to take DNA from independent people, especially those who purify it through "homebrew" methods, so this is where partnering up with a local DIYBio lab is probably best. In either case you will need to send the sequencing facility your DNA along with the primers (or primer sequences) you want them to sequence with. You can either use your PCR primers or design additional "nested" primers within your PCR amplicon to sequence with (recommended).

Once you're able to find a sequencing facility and you receive your sequencing results you will need to analyze them. I have yet to find the perfect free sequence alignment and editing software. For most you can either edit or align sequencing results (chromatograms), not both, and it's a real pain to edit each chromatogram separately then align the basecalls (text format). This is where the commercial products have the definite upper hand.

The images above are the HBB chromatograms I generated and the HBB reference sequence in BioEdit, which allows for many of the same functions as the commercial products but only allows for pairwise alignment. Unipro's UGENE also looks good, but I haven't used it. Alternatively, you could edit and convert all the chromatograms to text files and align them using a program like Clustal or any number of the programs in this list.

Once you load your chromatograms you'll want to edit the sequences to make sure the automated program made the right base calls by clicking over to the editable sequence (second picture). Once you have all the sequences edited you can align them to their complementary sequence and resolve any discrepancies or verify heterozygous nucleotides. You can also prepare the reference sequence by displaying features (third and fourth picture) so that you can see the polymorphisms (variations, fifth picture) in the sequence and, after aligning, see if your sample has any polymorphisms.
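
If you end up working with your edited basecalls as plain text, the comparison itself can be sketched in a few lines of Python. This is a minimal illustration, assuming the reads are already trimmed to the same region and length as the reference (it does not perform a real alignment, so insertions or deletions will throw it off). The sequences and function names are made up and are not real HBB data.

```python
# Translation table for complementing bases. Characters not listed here
# (for example IUPAC ambiguity codes) are passed through unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(reference, read):
    """List (position, reference_base, read_base) for each difference.

    Positions are 1-based. Both sequences must be the same length.
    """
    if len(reference) != len(read):
        raise ValueError("sequences must be the same length for this simple check")
    return [
        (i + 1, ref_base, read_base)
        for i, (ref_base, read_base) in enumerate(zip(reference, read))
        if ref_base != read_base
    ]


# Made-up example: the forward read carries one variant, and the reverse
# read (as it comes off the sequencer) covers the same region on the other strand.
reference = "ACGTTGCAAGGCTTAC"
forward_read = "ACGTTGCATGGCTTAC"
reverse_read = reverse_complement(forward_read)

print("Forward vs reference:", mismatches(reference, forward_read))
print("Reverse vs reference:", mismatches(reference, reverse_complement(reverse_read)))
```

A variant that shows up in both the forward and the reverse read at the same position is much more believable than one seen in only one direction.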

In order to download the reference sequence (either to design primers against it for the previous steps or for aligning in this step), find the gene you are looking for in the Gene database at NCBI. Upon clicking into the gene record, scroll down to the section "NCBI Reference Sequences (RefSeq)" and select the 'GenBank' link under the 'Genomic' RefSeqGene. This will take you to the GenBank sequence (sixth picture) where you can download the reference sequence by clicking on the dropdown menu next to 'Send' (seventh picture).
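
If you save the reference as a FASTA text file, you can read it into a script with a few lines of Python. This is a minimal sketch, assuming you picked the FASTA format when downloading; the record header and sequence below are placeholders, not a real reference, and `reference.fasta` is a hypothetical filename.

```python
def parse_fasta(text):
    """Return a dict of {header: sequence} from FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; everything after ">" is the header.
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


# Placeholder record for illustration. To read a downloaded file instead:
#     with open("reference.fasta") as handle:
#         sequences = parse_fasta(handle.read())
example = """>example_record placeholder header
ACGTTGCAAG
GCTTAC
"""

sequences = parse_fasta(example)
for name, seq in sequences.items():
    print(name, len(seq), "bp")
```

From there the sequence is an ordinary string that you can paste into your primer design tool or compare against your basecalls.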

Step 18: Wrap Up

I've attempted to condense the process into the shortest and most coherent 'ible possible, but I acknowledge that many of you will still have questions. Feel free to send a PM to me with any questions you may still have (following the rules in Step 1: Intro to Genetic Analysis, of course) and I will try to help steer you in the right direction. Granted, there are still volumes I could write in this instructable, but in the interest of readability and conciseness I've included only the most relevant information.

Thank you for taking the time to read through the whole instructable. Like many things DIY, writing this 'ible has been a huge investment in time (months!!) and dedication. But hey, I love DIY almost as much as I love being a geneticist so hopefully I've informed and inspired a few of you out there!

Second Prize in the Explore Science Contest