Procedure: Genetic Mapping by Bulk Segregation Analysis
Posted On: 01/04/2011 4:29 PM
Author: Bruce Beutler, Yu Xia
Science Writer: Eva Marie Y. Moresco

Forward genetic analysis begins with mutagenesis using the alkylating agent N-ethyl-N-nitrosourea (ENU), which is administered to male C57BL/6J mice to create germline point mutations (1).  ENU eliminates most spermatogonia in these Generation 0 (G0) animals, causing transient sterility.  Over the 12 weeks following ENU administration, the testis is repopulated by 10 to 100 surviving precursors, each harboring approximately 6,000 point mutations.  Because each gamete receives only one copy of each chromosome, about 3,000 of these mutations are incorporated into each gamete.  The mutations are transmitted to G1 offspring in the heterozygous state and are brought to homozygosity in G3 mice using one of two inbreeding strategies (Figure 1).  An estimated nine coding changes are bred to homozygosity in each G3 mouse.  Using a variety of phenotypic screens, G3 mice are tested for recessive phenotypes and, occasionally, G1 mice are tested for dominant phenotypes.

Although most ENU-induced mutations can quickly be identified by whole genome sequencing, genetic mapping (the process of confining a mutation to a circumscribed region of the genome) remains an essential step in proving that a particular mutation is responsible for a given phenotype.  Genetic mapping relies on the existence of polymorphisms, typically millions of them, between the genomes of different mouse strains.  These genetic differences serve as markers of strain origin and allow one to infer which parental strain contributed the genetic material at a defined site in the genome of a mouse of mixed ancestry.  Ideally, closely related strains or substrains are used for mapping, because subtle phenotypes can be affected by modifier loci, whose number increases roughly in proportion to the genetic distance between the strains.

Traditionally, a recessive mutation occurring on a defined genetic background is mapped by crossing homozygotes to animals of a different strain, backcrossing or intercrossing the F1 hybrids, and then measuring, in F2 animals, the concordance between mutant or normal phenotype and homozygosity or heterozygosity at each informative marker (see Genetic Mapping: Whole Genome Mapping and Fine Mapping).  Closely linked loci tend to be inherited together through meiosis, whereas loci on separate chromosomes, or far apart on the same chromosome, assort independently.  Therefore, a marker of C57BL/6J origin that is associated with the mutant phenotype more often than expected by chance is likely to be located physically close to (i.e., linked to) the mutation conferring the phenotype.  The likelihood of linkage between the mutation and an individual marker is expressed as the logarithm of the odds (LOD) score, defined as log10[p(linkage)/p(non-linkage)], where p(linkage) = 1 - p(non-linkage) and p(non-linkage) is calculated from the binomial distribution.
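One way to write this definition out explicitly is shown below. It reads p(non-linkage) as the binomial probability of observing at least k concordant mice out of n by chance, when each mouse is concordant with probability c in the absence of linkage. That reading is an interpretation not spelled out in the text, but it reproduces the synthetic LOD scores of 2.55 and 0.39 given in the worked examples of step 5.

```latex
\mathrm{LOD} = \log_{10}\frac{p(\text{linkage})}{p(\text{non-linkage})}
             = \log_{10}\frac{1 - p_{\mathrm{nl}}}{p_{\mathrm{nl}}},
\qquad
p_{\mathrm{nl}} = \sum_{i=k}^{n} \binom{n}{i}\, c^{i}\,(1-c)^{\,n-i}
```

For example, with n = 12 mice, k = 8 concordant mice, and c = 0.25, p_nl is about 0.00278 and the LOD is about 2.55.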

Bulk segregation analysis (BSA) is a more efficient and cost-effective method for detecting genetic linkage than traditional genetic mapping, in which every informative marker is genotyped in every F2 mouse.  In BSA, allele frequency is measured at each informative locus in two pools of DNA from F2 animals grouped by phenotype: one pool is derived from F2 mice with the mutant phenotype, and the other from F2 mice with the normal phenotype.  To create the pools, genomic DNA from each F2 mouse is quantified by qPCR at a single-copy locus, and equimolar quantities of DNA from each individual are then mixed.  At each informative locus, enrichment of the C57BL/6J allele in the mutant-phenotype pool and depletion of the C57BL/6J allele in the normal-phenotype pool are used to establish linkage.  Currently, mapping by BSA costs approximately $200 in reagents for each DNA pool subjected to genotypic analysis by capillary sequencing.

This protocol describes the use of BSA to map mutations induced on a C57BL/6J background, by outcrossing to the closely related C57BL/10J strain followed by backcrossing or intercrossing (2).  Our marker panel consists of 127 single nucleotide polymorphisms (SNPs), 124 spaced at ~20 Mb intervals across the 19 autosomes and 3 on the X chromosome, which were identified by whole genome sequencing on the Applied Biosystems SOLiD platform and validated by capillary sequencing (B6:B10 SNP Mapping Panel) (2).  Allele frequencies at each marker are computed by software from capillary sequencing chromatogram peak heights, which reflect the quantity of a given nucleotide at each position in the DNA sequence.

Because only about nine homozygous coding changes are induced by ENU in each G3 mouse, and because genetic mapping can usually exclude more than 99.9% of the genome, one may be relatively confident that a coding change found within the critical region is in fact responsible for the phenotype in question.  If the other coding changes are distributed randomly, the expected number of them falling within the remaining 0.1% of the genome by chance is only about 8 × 0.001 ≈ 0.008.  Final confirmation of causality may depend on transgenesis or transfection studies.


1. Transmissibility of the phenotype is confirmed, and the inheritance mode of the phenotype is determined (Figure 2).

Typically, the index mouse is backcrossed to C57BL/6J, and the F1 progeny are tested for the phenotype; expression of the phenotype in F1 animals indicates a dominant mutation.  If no F1 animals show the phenotype, the mutation is probably recessive.  F1 animals are also mated to the index mouse, and their F2 progeny are tested for the phenotype.  In the case of a recessive mutation, F2 animals expressing the phenotype are presumed to be homozygotes, and those that do not are presumed to be heterozygotes.  F2 homozygotes are intercrossed to establish a homozygous stock.

2. Generation of F2 mice.

Assuming a recessive mutation is being mapped, a known homozygous mutant on a pure C57BL/6J background is outcrossed to C57BL/10J.  The F1 hybrid progeny from this cross are either backcrossed or intercrossed to produce F2 progeny.  Intercross mapping is more powerful than backcross mapping: at loci unlinked to the mutation, F2 intercross progeny carry on average 50% genetic material from the mutant strain (C57BL/6J), compared with 75% for F2 backcross progeny, so more of the genome can be ruled out as contributing to the mutant phenotype.  As few as 25 mice (12 with the mutant phenotype and 13 with the normal phenotype) have been used to map a mutation successfully by BSA.
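To illustrate why the intercross excludes more of the genome, the minimal sketch below computes the probability that a locus unlinked to the mutation is homozygous C57BL/6J (B6/B6) in every mutant-phenotype mouse purely by chance. It assumes simple Mendelian segregation; the dictionary and function names are hypothetical.

```python
# Sketch: how often does a locus UNLINKED to the mutation look homozygous
# C57BL/6J (B6/B6) in every mutant-phenotype F2 mouse purely by chance?
# Assumes Mendelian segregation; names are hypothetical.

# Probability that a single F2 mouse is B6/B6 at an unlinked locus.
P_HOM_B6 = {
    "intercross": 0.25,  # F1 x F1: 1/4 B6/B6, 1/2 B6/B10, 1/4 B10/B10
    "backcross": 0.50,   # F1 x homozygous mutant (B6): 1/2 B6/B6, 1/2 B6/B10
}


def chance_all_hom_b6(cross: str, n_mutant: int) -> float:
    """Probability that all n_mutant mice are B6/B6 at an unlinked locus."""
    return P_HOM_B6[cross] ** n_mutant


for cross in ("intercross", "backcross"):
    print(cross, f"{chance_all_hom_b6(cross, 12):.2e}")
```

With 12 mutant mice, an unlinked locus mimics complete linkage with probability of about 6e-08 in an intercross but about 2.4e-04 in a backcross, so false signals are far rarer with intercross progeny.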

3. Preparation and pooling of F2 DNA.

The concentration of genomic DNA from each F2 mouse is measured by qPCR of a single-copy locus (we use Chr12:25631411-25631567 within the Mboat2 gene).  Forward and reverse primers are diluted to 100 μM, and 0.2 μl of each is used per reaction.  The concentration of each genomic DNA sample is first estimated by measuring absorbance at λ = 260 nm with a spectrophotometer, and the DNA is then serially diluted.  Diluted genomic DNA with a concentration between 10 pg/μl and 10 ng/μl (the concentration range of the DNA standards used for qPCR) is used as the template in the following qPCR reaction mix:

qPCR Reaction Mix

Forward primer (100 μM stock): 0.2 μl
Reverse primer (100 μM stock): 0.2 μl
Genomic DNA (diluted to between 10 pg/μl and 10 ng/μl): 5 μl
2X qPCR Master Mix (Biopioneer, San Diego, CA): 10 μl
Nuclease-free water: 4.6 μl
Total volume: 20 μl
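The per-reaction recipe can be scaled to a batch of samples, and the arithmetic also confirms that each primer ends up at 1 μM (0.2 μl of a 100 μM stock in 20 μl). The sketch below is a hypothetical helper; the component keys are shortened labels, and the optional pipetting overage is an assumption that is not part of the protocol.

```python
# Sketch: scale the per-reaction qPCR recipe to a batch of reactions and
# check the final primer concentration. Component keys are shortened
# labels; the overage parameter is an assumption, not part of the protocol.

RECIPE_UL = {  # shared components, ul per reaction
    "forward_primer_100uM": 0.2,
    "reverse_primer_100uM": 0.2,
    "master_mix_2x": 10.0,
    "water": 4.6,
}
TEMPLATE_UL = 5.0        # diluted genomic DNA, added to each well separately
PRIMER_STOCK_UM = 100.0  # primer stock concentration


def reaction_volume_ul() -> float:
    """Total volume of one reaction (shared components plus template)."""
    return sum(RECIPE_UL.values()) + TEMPLATE_UL


def final_primer_um() -> float:
    """Final concentration of each primer in the reaction (uM)."""
    return PRIMER_STOCK_UM * RECIPE_UL["forward_primer_100uM"] / reaction_volume_ul()


def batch_volumes(n_reactions: int, overage: float = 0.0) -> dict[str, float]:
    """Volume (ul) of each shared component for n_reactions, plus optional overage."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in RECIPE_UL.items()}


print(reaction_volume_ul(), final_primer_um())
print(batch_volumes(48, overage=0.1))  # 10% extra for pipetting loss (assumed)
```

The template is kept out of the batch because each well receives a different DNA sample.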

The qPCR reactions are run on the ABI 7300 Real Time PCR System.  Genomic DNA sample concentrations are determined from a standard curve generated using DNA samples with known concentrations.
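Standard-curve quantification amounts to a linear fit of threshold cycle (Ct) against log10 concentration, inverted for each unknown sample. The sketch below shows that calculation in plain Python; the standard concentrations, Ct values, and dilution factor are hypothetical, and this is not a description of the instrument software's own algorithm.

```python
# Sketch: quantify genomic DNA from a qPCR standard curve, i.e. a linear fit
# of Ct against log10(concentration) for standards of known concentration.
import math


def fit_standard_curve(concs, cts):
    """Least-squares fit of Ct = slope * log10(conc) + intercept."""
    xs = [math.log10(c) for c in concs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def conc_from_ct(ct, slope, intercept, dilution_factor=1.0):
    """Invert the curve; multiply by the dilution factor to get stock concentration."""
    return dilution_factor * 10 ** ((ct - intercept) / slope)


# Hypothetical standards spanning 10 ng/ul down to 10 pg/ul (0.01 ng/ul).
standards_ng_per_ul = [10.0, 1.0, 0.1, 0.01]
standard_cts = [20.0, 23.32, 26.64, 29.96]
slope, intercept = fit_standard_curve(standards_ng_per_ul, standard_cts)

# A sample diluted 1:100 before qPCR that gave a (hypothetical) Ct of 24.98.
print(conc_from_ct(24.98, slope, intercept, dilution_factor=100))
```

Here the fitted slope is -3.32 (roughly one doubling per cycle), and the sample works out to about 31.6 ng/μl in the undiluted stock.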

Two pools of DNA are then made: one from F2 mice with the mutant phenotype, and one from F2 mice with the normal phenotype.  Equimolar quantities of DNA from each F2 mouse are combined, such that the final concentration of DNA from each mouse in the pool is 30 ng/μl.  These pooled samples are used for capillary sequencing.
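Once each sample's concentration is known, equimolar pooling reduces to adding the volume that delivers the same mass of DNA from every mouse (equal mass corresponds to roughly equal genome copies, since all mice have the same genome size). The sketch below assumes that equal-mass rule; mouse IDs, concentrations, and the per-mouse amount are hypothetical.

```python
# Sketch: volume of each F2 DNA sample to add so that every mouse contributes
# the same mass of DNA (equal mass ~ equal genome copies, since all mice
# share the same genome size). Mouse IDs and amounts are hypothetical.


def pooling_volumes(concs_ng_per_ul: dict[str, float], ng_per_mouse: float) -> dict[str, float]:
    """Map each mouse to the volume (ul) that delivers ng_per_mouse of DNA."""
    for mouse, conc in concs_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"non-positive concentration for {mouse}")
    return {mouse: ng_per_mouse / conc for mouse, conc in concs_ng_per_ul.items()}


# qPCR-determined stock concentrations (ng/ul) for a hypothetical mutant pool.
mutant_pool = {"F2-01": 120.0, "F2-02": 85.0, "F2-03": 240.0}
volumes = pooling_volumes(mutant_pool, ng_per_mouse=600.0)
for mouse, vol in volumes.items():
    print(mouse, round(vol, 2), "ul")
```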

4. Genotyping informative markers from each DNA pool.

Each SNP is amplified by PCR from the two pooled F2 DNA samples, and the PCR products are purified with AMPure beads (Agencourt) and then sequenced on an ABI 3730 XL capillary sequencer.

5. Determination of allele frequency, calculation of P value and linkage score, and calculation of “synthetic LOD score”.

These steps are performed using custom software; the calculations are explained below.

Determination of allele frequency

DNA sequencing trace peak heights are used to estimate C57BL/6J (B6) and C57BL/10J (B10) allele frequencies at each SNP site in the pooled F2 samples.  The software interpolates B6 and B10 allele frequencies from standard curves relating normalized peak height to allele frequency, generated using DNA samples containing known ratios of B6:B10 DNA.  To generate the standard curves, each SNP was amplified by PCR from four DNA samples with B6:B10 contribution ratios of 100:0, 75:25, 50:50, and 0:100, and the PCR products were sequenced by capillary electrophoresis.  For each SNP site, B6 or B10 allele percentage was plotted against normalized trace peak height, defined as [SNP peak height / trace basal signal level], where the trace basal signal level is the average peak height of the ten nucleotides flanking the SNP site (5 nt upstream and 5 nt downstream).  This normalization corrects for differences in overall efficiency between individual sequencing runs.  The data points were then fitted to a line by linear regression.

To determine allele frequencies in the pooled F2 DNA samples, normalized peak height is calculated for the B6 and B10 alleles at each SNP site as described above, and the B6 and B10 allele percentages are interpolated from the standard curve for the corresponding SNP.  Because each SNP has its own standard curve, this method also corrects for differences in nucleotide incorporation (A vs. T vs. C vs. G) within a sequencing run.
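The normalization, standard-curve fit, and interpolation described above can be sketched as follows for a single allele at a single SNP. The peak heights and standard-curve values are hypothetical, and clamping the result to 0 to 100% is an added assumption rather than part of the described software.

```python
# Sketch: allele percentage at one SNP from capillary trace peak heights,
# using a per-SNP standard curve. Peak values below are hypothetical.


def normalized_peak_height(snp_peak: float, flanking_peaks: list[float]) -> float:
    """SNP peak height / mean height of the 10 flanking peaks (5 up, 5 down)."""
    if len(flanking_peaks) != 10:
        raise ValueError("expected 5 upstream + 5 downstream peak heights")
    return snp_peak / (sum(flanking_peaks) / len(flanking_peaks))


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def allele_percent(norm_height: float, slope: float, intercept: float) -> float:
    """Interpolate allele % from the curve; clamping to 0-100 is an added assumption."""
    return min(100.0, max(0.0, slope * norm_height + intercept))


# Standard curve for the B6 allele at one SNP: B6 % in the 100:0, 75:25,
# 50:50 and 0:100 standards versus their (hypothetical) normalized peak heights.
b6_percent_standards = [100.0, 75.0, 50.0, 0.0]
b6_norm_heights = [2.0, 1.5, 1.0, 0.0]
slope, intercept = fit_line(b6_norm_heights, b6_percent_standards)

# Pooled F2 sample: B6 peak of 960 against flanking peaks averaging 600.
h = normalized_peak_height(960.0, [600.0] * 10)
print(round(allele_percent(h, slope, intercept), 1))
```

With these illustrative numbers the pooled sample interpolates to 80.0% B6; the same procedure, with its own curve, gives the B10 percentage.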

A good indication of linkage can be obtained by comparing the calculated B6 and B10 allele frequencies for each SNP with the expected frequencies, which differ depending on whether the F2 mice were generated by intercross or backcross (Table 1).

Table 1. B6:B10 Allele Frequencies in F2 Mice with a Recessive Mutation

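The expected B6 frequencies in each pool follow from Mendelian segregation. The sketch below derives them for an idealized marker that is either fully linked to the mutation (no recombination) or unlinked to it; the function names are hypothetical, and real markers near the mutation will fall between the two extremes.

```python
# Sketch: expected B6 allele frequency in each F2 pool for a recessive
# mutation induced on C57BL/6J, from Mendelian segregation. "Linked" means a
# marker with no recombination between it and the mutation (an idealization).
from fractions import Fraction

ZERO = Fraction(0)

# Genotype probabilities (B6/B6, B6/B10, B10/B10) at any locus before selection.
GENOTYPES = {
    "intercross": (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)),
    "backcross": (Fraction(1, 2), Fraction(1, 2), ZERO),
}


def b6_frequency(probs: tuple[Fraction, Fraction, Fraction]) -> Fraction:
    """B6 allele frequency among mice with the given genotype weights."""
    hom_b6, het, _hom_b10 = probs
    return (hom_b6 + het / 2) / sum(probs)


def expected_b6(cross: str, pool: str, linked: bool) -> Fraction:
    """Expected B6 frequency in the 'mutant' or 'normal' pool at a marker."""
    hom_b6, het, hom_b10 = GENOTYPES[cross]
    if not linked:
        # Unlinked loci segregate independently of phenotype.
        return b6_frequency((hom_b6, het, hom_b10))
    if pool == "mutant":
        # Only mice homozygous for the (B6-derived) mutation show the phenotype.
        return b6_frequency((hom_b6, ZERO, ZERO))
    # Normal pool: every genotype except mutant homozygotes.
    return b6_frequency((ZERO, het, hom_b10))


for cross in ("intercross", "backcross"):
    for pool in ("mutant", "normal"):
        linked = expected_b6(cross, pool, linked=True)
        unlinked = expected_b6(cross, pool, linked=False)
        print(f"{cross:10} {pool:6} linked B6={linked}  unlinked B6={unlinked}")
```

Under these assumptions a fully linked marker gives B6 frequencies of 1 (mutant pool) and 1/3 (normal pool) in an intercross, and 1 and 1/2 in a backcross, compared with 1/2 and 3/4, respectively, at unlinked markers.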

Calculation of P value and linkage score

Because BSA measures allele frequency in pooled samples rather than determining the genotypes of individual mice, LOD scores cannot be formally calculated for each marker.  Instead, the χ² distribution is used to calculate the significance of departure from the expected B6:B10 allele frequencies at unlinked loci, which are approximately 50:50 for intercross progeny and 75:25 for backcross progeny.  In these calculations, the total allele number (N) at every autosomal locus is taken to be twice the number of mice.  The estimated B6 and B10 allele numbers at each SNP site are calculated as [estimated allele percentage × N] and are used to calculate P values separately for the mutant-phenotype and normal-phenotype pools.  These P values are then combined using Fisher's method to give a single P value for linkage that reflects the data from both pools.  A linkage score, defined as -log10(P), is then calculated for each marker.
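A minimal sketch of this calculation using only the Python standard library is shown below. It assumes a 1-degree-of-freedom χ² goodness-of-fit test per pool (two allele classes) and does not round the estimated allele counts; both are assumptions about details the text does not specify. The function names and the example percentages are hypothetical.

```python
# Sketch: per-marker linkage score from the two BSA pools, using a 1-df
# chi-square goodness-of-fit test per pool and Fisher's method to combine.
import math

# Expected B6 allele fraction at unlinked autosomal loci.
EXPECTED_B6 = {"intercross": 0.50, "backcross": 0.75}


def chi2_p_value(b6_percent: float, n_mice: int, cross: str) -> float:
    """P value for departure of one pool's B6:B10 counts from expectation."""
    n_alleles = 2 * n_mice  # autosomal loci: N = twice the number of mice
    obs_b6 = b6_percent / 100 * n_alleles  # not rounded (an assumption)
    obs_b10 = n_alleles - obs_b6
    exp_b6 = EXPECTED_B6[cross] * n_alleles
    exp_b10 = n_alleles - exp_b6
    chi2 = (obs_b6 - exp_b6) ** 2 / exp_b6 + (obs_b10 - exp_b10) ** 2 / exp_b10
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def fisher_combine(p1: float, p2: float) -> float:
    """Fisher's method for two P values.

    X = -2 ln(p1 * p2) follows chi-square with 4 df, whose upper tail has the
    closed form exp(-X/2) * (1 + X/2) = p1*p2 * (1 - ln(p1*p2)).
    """
    product = p1 * p2
    return product * (1 - math.log(product))


def linkage_score(mutant_b6_pct: float, n_mutant: int,
                  normal_b6_pct: float, n_normal: int, cross: str) -> float:
    """Linkage score -log10(P) combining the mutant and normal pools."""
    p = fisher_combine(
        chi2_p_value(mutant_b6_pct, n_mutant, cross),
        chi2_p_value(normal_b6_pct, n_normal, cross),
    )
    return -math.log10(p)


# Hypothetical intercross marker: 80% B6 in 12 mutant mice, 35% B6 in 13 normal mice.
print(round(linkage_score(80.0, 12, 35.0, 13, "intercross"), 2))
```

The closed form for two P values avoids needing a statistics library; with more pools, a general χ² survival function would be required.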

Calculation of “synthetic LOD score”

For each marker, the synthetic LOD score is based on the closest estimate, derived from the measured allele frequencies, of the number of mice with concordant phenotype and genotype.  Different assumptions are used to estimate the number of concordant mice depending on whether a backcross or an intercross was performed.  We consider the synthetic LOD score to be less reliable than the P values calculated from the χ² distribution, because BSA does not directly assess the number of mice with each genotype.

Example for intercross:

Given 12 F2 mice with the mutant phenotype and an estimated allele frequency of 80% B6 (= P) and 20% B10 (= Q) at a particular marker, we assume that genotype frequencies correspond to P² (B6/B6), 2PQ (B6/B10), and Q² (B10/B10), or 0.64, 0.32, and 0.04, respectively.  The most probable numbers of mice of each genotype, rounded to the nearest integer, are then 0.64 × 12 ≈ 8 (B6/B6), 0.32 × 12 ≈ 4 (B6/B10), and 0.04 × 12 ≈ 0 (B10/B10) mice (8:4:0).  The synthetic LOD score is then calculated from 8 instances of concordance (B6/B6 mice) out of 12, with an expected concordance of 25% for each event.  This gives a synthetic LOD score of 2.55 for the marker in question, based on mice with the mutant phenotype.

Example for backcross:

Given 13 F2 mice with the mutant phenotype and an estimated allele frequency of 80% B6 and 20% B10 at a particular marker, 26 alleles are assumed to be present in the DNA pool.  This corresponds to a closest estimate of 21 B6 alleles and 5 B10 alleles.  Because B10 alleles are expected to occur only in heterozygous form in backcross progeny, we assume 5 heterozygous calls and 13 - 5 = 8 B6 homozygous calls at the locus.  The synthetic LOD score is then based on 8 instances of concordance out of 13 events, with an expected concordance of 50% for each event, giving a synthetic LOD score of 0.39 for the marker in question.
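Both worked examples can be reproduced with the sketch below. It takes p(non-linkage) to be the binomial upper-tail probability of observing at least as many concordant mice by chance, an interpretation that matches the 2.55 and 0.39 values given above but is not stated explicitly in the text. Function names are hypothetical.

```python
# Sketch of the synthetic LOD calculation in the two worked examples above.
# p(non-linkage) is taken as the binomial upper tail P(X >= k), an
# interpretation that reproduces the 2.55 and 0.39 values in the text.
import math


def binomial_upper_tail(k: int, n: int, q: float) -> float:
    """P(X >= k) for X ~ Binomial(n, q): k or more concordant mice by chance."""
    return sum(math.comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k, n + 1))


def lod(k: int, n: int, q: float) -> float:
    """log10[p(linkage) / p(non-linkage)] with p(linkage) = 1 - p(non-linkage)."""
    p_nl = binomial_upper_tail(k, n, q)
    if p_nl >= 1.0:  # p(linkage) = 0, e.g. when k = 0
        return float("-inf")
    return math.log10((1 - p_nl) / p_nl)


def concordant_intercross(b6: float, n_mice: int) -> int:
    """Estimated B6/B6 mice from genotype frequency P^2, rounded."""
    return round(b6**2 * n_mice)


def concordant_backcross(b6: float, n_mice: int) -> int:
    """B6/B6 mice = all mice minus heterozygotes (one B10 allele each)."""
    b10_alleles = round((1 - b6) * 2 * n_mice)
    return max(0, n_mice - b10_alleles)


def synthetic_lod(b6: float, n_mice: int, cross: str) -> float:
    """Synthetic LOD for one pool; expected concordance 25% (intercross) or 50% (backcross)."""
    if cross == "intercross":
        return lod(concordant_intercross(b6, n_mice), n_mice, 0.25)
    return lod(concordant_backcross(b6, n_mice), n_mice, 0.50)


print(round(synthetic_lod(0.80, 12, "intercross"), 2))  # 2.55
print(round(synthetic_lod(0.80, 13, "backcross"), 2))   # 0.39
```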

6. An example of mapping by BSA: the aoba phenotype.

Aoba homozygotes housed under specific pathogen-free (SPF) conditions died spontaneously between six and seven months of age (3).  Dying animals had end-stage renal disease with kidney failure.  Proteinuria, hematuria, and leukocytes in the urine were detectable in aoba homozygotes by two months of age.

The aoba mutation was mapped by BSA on the basis of proteinuria (3).  Homozygous males were outcrossed to C57BL/10J females, and the resulting F1 hybrids were intercrossed.  At four to five months of age, DNA from 17 offspring with proteinuria and from 28 offspring with normal urine protein levels was pooled by phenotype and sequenced across the 127 markers.  Linkage and synthetic LOD scores were calculated as described above.  A peak synthetic LOD score of 10 was calculated for marker 118,565,405 on Chromosome 1; the synthetic LOD scores for all markers are graphed in Figure 3A.  Genotyping of individual F2 mice at the markers with the three highest synthetic LOD scores gave actual LOD scores of 5.9, 1.2, and 10.2, respectively, confirming the BSA results.  Whole genome SOLiD sequencing of a homozygous aoba mouse identified a G to A transition at nucleotide 82,532,315 of Chromosome 1 (Figure 3B).  The mutation lies in Col4a4, a gene in which mutations are known to cause an inherited progressive kidney disease in humans (Alport syndrome).


7. Fine Mapping.

To define the proximal and distal boundaries of the critical region, the genotypes of individual mice from the mutant-phenotype pool may be examined at markers near the marker with peak linkage, to identify crossover events that separate proximal and distal markers from the mutation (see Genetic Mapping: Whole Genome Mapping and Fine Mapping).  Coding regions are fully sequenced once the critical region has been narrowed to 1,000 or fewer coding exons.
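The boundary search can be sketched as a walk outward from the peak marker in each mutant-phenotype mouse, stopping at the first marker that is not homozygous B6; the closest such markers on each side across all mice bound the critical region. The data layout and function name below are hypothetical, and the sketch assumes the peak marker itself is B6/B6 in every mutant mouse.

```python
# Sketch: narrow the critical region from genotypes of mutant-phenotype mice
# at markers flanking the peak. Data layout and names are hypothetical.


def critical_region(positions, mutant_calls, peak):
    """Return (proximal, distal) marker positions bounding the critical region.

    positions: sorted marker positions on one chromosome.
    mutant_calls: one list of calls ("B6/B6", "B6/B10", "B10/B10") per mutant
        mouse, in marker order.
    peak: index of the marker with peak linkage (assumed B6/B6 in all mice).

    A recessive mutation induced on C57BL/6J must be B6/B6 in every mutant
    mouse, so a marker where any mutant mouse is not B6/B6 is separated from
    the mutation by a crossover. None means no crossover was seen on that side.
    """
    proximal = distal = None
    for calls in mutant_calls:
        for i in range(peak - 1, -1, -1):  # walk toward the proximal end
            if calls[i] != "B6/B6":
                proximal = i if proximal is None else max(proximal, i)
                break
        for i in range(peak + 1, len(calls)):  # walk toward the distal end
            if calls[i] != "B6/B6":
                distal = i if distal is None else min(distal, i)
                break
    return (
        positions[proximal] if proximal is not None else None,
        positions[distal] if distal is not None else None,
    )


positions = [10, 30, 50, 70, 90]  # Mb, hypothetical
mice = [
    ["B6/B10", "B6/B6", "B6/B6", "B6/B6", "B6/B6"],
    ["B6/B6", "B6/B6", "B6/B6", "B6/B6", "B6/B10"],
    ["B6/B6", "B6/B10", "B6/B6", "B6/B6", "B6/B6"],
]
print(critical_region(positions, mice, peak=2))  # (30, 90)
```

In this illustrative data set the mutation must lie between the markers at 30 and 90 Mb, since a crossover in at least one mutant mouse excludes each of those markers.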

Alternatively, if the mutant mouse in question has already been subjected to whole genome sequencing, genes with coding changes close to the marker with peak linkage may be sequenced in the individual F2 mice.  A causative mutation is expected to be homozygous in 100% of mice with the mutant phenotype.
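The 100% homozygosity criterion translates into a simple filter over candidate coding changes. In the sketch below, the gene names and the "hom"/"het"/"wt" call labels are hypothetical placeholders for whatever the sequencing results provide.

```python
# Sketch: keep candidate coding changes that are homozygous in every mouse
# with the mutant phenotype. Gene names and calls are hypothetical.


def fully_concordant(candidates: dict[str, list[str]]) -> list[str]:
    """Genes whose coding change is called "hom" in all mutant-phenotype mice."""
    return [
        gene
        for gene, calls in candidates.items()
        if calls and all(call == "hom" for call in calls)
    ]


candidates = {
    "geneA": ["hom", "hom", "hom", "hom"],
    "geneB": ["hom", "het", "hom", "hom"],  # excluded: one mutant mouse is het
}
print(fully_concordant(candidates))  # ['geneA']
```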

Critical Parameters and Troubleshooting

Just as in traditional genetic mapping, success in mapping by BSA depends critically on correctly assigning mice to the two phenotype pools, mutant and normal.

"Dirty" sequencing runs will impair the ability to map using this protocol, because allele frequency is estimated from chromatogram peak heights and background noise in the chromatogram may be interpreted as an allele that is not actually present.  In addition, sequencing behavior has not been examined for every marker in the panel, so some markers may yet prove unusable for accurate estimation of allele frequency (e.g., because the trace peak is always shifted).  Manual examination of trace files may be necessary when a clear-cut phenotype shows no linkage by BSA.

For X-linked phenotypes, both DNA pools must be made exclusively from female mice or exclusively from male mice; DNA from males and females cannot be mixed.