|Procedure||Genetic Mapping: Whole Genome Mapping and Fine Mapping|
|Posted On||05/05/2010 9:53 AM|
|Author||Xin Du, Yu Xia, Bruce Beutler|
|Science Writer||Eva Marie Y. Moresco|
Forward genetic analysis begins with mutagenesis using the alkylating agent N-ethyl-N-nitrosourea (ENU), which is administered to male C57BL/6J mice to create germline point mutations (1). ENU eliminates most spermatogonia in these Generation 0 (G0) animals, causing transient sterility. Between 10 and 100 precursors, each harboring approximately 6,000 point mutations, repopulate the testis over a period of 12 weeks following ENU administration. About 3,000 mutations are incorporated into each gamete. These mutations are transmitted to G1 offspring in a heterozygous state, and brought to homozygosity in G3 mice using one of two different inbreeding strategies (Figure 1). It is estimated that about four to six coding changes are bred to homozygosity in each G3 mouse, and based on broad experience in the ENU field, a single coding change nearly always accounts for a defined phenotype.
G3 mice, and occasionally G1 mice, are tested for recessive and dominant phenotypes, respectively, using about ten different phenotypic screens currently employed in the lab. Once a phenotype of interest has been identified and confirmed, genetic mapping, the process of confining a mutation to a circumscribed region of the genome, begins. Mapping generally proceeds through two stages: whole genome mapping and fine mapping. Whole genome mapping locates the causative mutation within a relatively large region on one of 20 mouse chromosomes (1-19 and X; the Y chromosome is not covered). Fine mapping narrows the size of the defined region within which the mutation exists, called the critical region (see below). The protein-encoding portion of the critical region is finally sequenced to identify the causative mutation. Alternatively, candidate genes are selected for direct sequencing based on the phenotypes of known mutants (and in some cases, this is done without mapping the mutation at all). Since ENU induces only about four to six homozygous coding changes across the genome, and since it is usually possible to exclude more than 99.9% of the genome by genetic mapping, one may be relatively confident that a coding change found within a critical region is in fact responsible for the phenotype in question. Final confirmation of causality may depend on transgenesis or transfection studies.
Genetic polymorphisms between different mouse strains, used as markers for the strain of origin of DNA in a given animal, form the basis of the mapping strategy. Typically, two different Mus musculus strains will differ at millions of sites within the genome. PCR primers are chosen so as to amplify segments of DNA that are informative with regard to the strain of origin of a chromosome. Genetic differences thus constitute “markers” of strain origin, and permit inference as to which parental strain contributed genetic material to a given mouse of mixed ancestry at a defined site in the genome.
Phenotypically mutant and normal animals of mixed genetic background (incorporating DNA from the mutant strain C57BL/6J and one other strain) are generated using one of several mating strategies (as described below), and are designated F2 mice, since they are two generations removed from the homozygous mutant stock. For each F2 mouse, the genotype of about 120 markers (simple sequence length polymorphisms, SSLPs; or single nucleotide polymorphisms, SNPs), chosen so they are spaced on average 20 Mb apart on each chromosome, is determined by PCR followed by fragment length analysis or DNA sequencing. Since closely linked genetic loci assort together during meiosis, but loci on separate chromosomes or far apart on the same chromosome assort independently, a marker of C57BL/6J origin that is associated with the mutant phenotype more often than expected by chance is likely to be located physically close (i.e., linked) to the mutation conferring the phenotype. The likelihood of linkage between the mutation conferring the phenotype and an individual marker is indicated by the LOD score (see below).
Establishing linkage is not sufficient to find a mutation. It is necessary to prove that the mutation in question lies between two limiting markers on a chromosome. The interval thus defined is called the critical region. Fine mapping is the process used to establish a critical region small enough to begin mass sequencing (Figure 2). It again involves determining the genotype of F2 mice that have been scored for phenotype, with attention given only to markers in close linkage to the mutation. Markers chosen for analysis are typically spaced approximately 2-5 Mb apart on either side of the marker with the peak LOD score, or between markers that are known from whole genome mapping analysis to delimit the location of the mutation, but are widely separated on the chromosome. A critical region is established when, within a group of F2 mice that have inherited chromosomes that have undergone crossover events during meiosis, at least one instance of meiotic recombination (a crossover) separates a proximal marker from the mutation, and at least one instance of meiotic recombination separates a distal marker from the mutation. The existence of crossovers is inferred from discordance in genotype between adjacent markers, or from discordance between a marker that is known to be linked to the phenotype and the phenotype itself. The location of the crossover and which particular chromatid is inherited determines whether an F2 mouse will have the mutant or normal phenotype, with the marker on one side of the crossover defining the boundary of the critical region.
A critical region usually must contain fewer than 100 annotated genes before it can be approached by mass sequencing to find the causative mutation. On average, 18 crossover events occur per meiosis in female mice; slightly fewer in male mice. By examining a panel of 60 meioses (60 F2 animals born from backcrossing or 30 animals born from intercrossing hybrid parents), one can divide the genome into 1000 parts. This means that the critical region, on average, will be less than 3 Mb in size, and in the typical case will contain fewer than 50 annotated genes.
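The arithmetic behind these estimates can be sketched in a few lines of Python. The genome size of roughly 2,700 Mb is an assumption that is not stated in this protocol; the gene density of about 12 genes per Mb is the average quoted in the Fine Mapping step.

```python
# Rough estimate of mapping resolution from a panel of meioses.
CROSSOVERS_PER_MEIOSIS = 18   # average quoted in the text (female mice)
MEIOSES = 60                  # 60 backcross or 30 intercross F2 animals
GENOME_MB = 2700              # ASSUMPTION: approximate mouse genome size in Mb
GENES_PER_MB = 12             # average quoted in the Fine Mapping step

# Each crossover adds one breakpoint, so the panel cuts the genome
# into roughly this many intervals.
total_crossovers = CROSSOVERS_PER_MEIOSIS * MEIOSES
mean_interval_mb = GENOME_MB / total_crossovers
genes_per_interval = mean_interval_mb * GENES_PER_MB

print(total_crossovers)               # number of intervals (about 1000)
print(round(mean_interval_mb, 1))     # mean interval size in Mb
print(round(genes_per_interval))      # approximate genes per interval
```

Under these assumptions the mean interval is about 2.5 Mb and holds about 30 genes, in line with the figures given above. Crossovers are not evenly distributed, so any individual critical region may be considerably larger or smaller.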
The LOD score is a measure of the probability of linkage between markers, or between a marker and a phenotype. The ratio of the probability of linkage to the probability of non-linkage gives the odds of linkage, and the logarithm of odds is the LOD score:
LOD = log10 [p(linkage)/p(non-linkage)]
By convention, a LOD score greater than 3.0 is considered good evidence for linkage, which corresponds to 1000:1 odds of linkage (log10(1000) = 3). Even adjusted for the fact that 120 markers are surveyed, a peak LOD score >3.0 usually points to the location of the mutation. A LOD score of 4.0 or greater is considered extremely strong evidence of linkage. A LOD score of 5.0 or greater gives overwhelming evidence of linkage.
The probability of linkage is calculated from the probability of non-linkage as
p(linkage) = 1 - p(non-linkage).
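As a minimal sketch, the two formulas above combine into a single function. The function name `lod_score` is illustrative and is not part of any published tool.

```python
import math

def lod_score(p_non_linkage: float) -> float:
    """LOD = log10[p(linkage) / p(non-linkage)], with p(linkage) = 1 - p(non-linkage)."""
    if not 0.0 < p_non_linkage < 1.0:
        raise ValueError("p_non_linkage must lie strictly between 0 and 1")
    return math.log10((1.0 - p_non_linkage) / p_non_linkage)

# A p(non-linkage) of 0.001 corresponds to odds of 999:1, i.e. a LOD just under 3.
print(lod_score(0.001))
```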
The probability of non-linkage is calculated from a binomial distribution, defined as a distribution giving the probability of obtaining a specified number of successes in a finite set of independent trials in which the probability of a success remains the same from trial to trial. Each trial or observation must be independent, and represent one of only two outcomes (e.g., success or failure, heads or tails, etc.). In the case of genetic mapping, the genotype at a particular marker of each F2 mouse from a mapping backcross or intercross represents the trial. Genotype is interpreted in light of the phenotype of the mouse, and two outcomes are possible: concordance or discordance.
For example, if hybrid mice heterozygous for a recessive mutation are backcrossed to the mutant stock, about half the F2 mice will normally show the mutant phenotype and about half will not (see Recessive Phenotype Mapping, Backcross in Method section). For each mouse that shows the mutant phenotype, each marker is analyzed in turn. If the marker genotype is homozygous C57BL/6J, a concordant score is registered. If the marker genotype is heterozygous for C57BL/6J and the marker strain, a non-concordant score is registered. Similarly, for each mouse that doesn’t show the phenotype, heterozygosity for a particular marker is scored as concordant; a homozygous C57BL/6J genotype is scored as non-concordant.
The probability of concordance between genotype and phenotype given non-linkage differs depending upon the type of cross that has been made (see Tables for individual crosses in Method section).
Binomial calculations are used to determine the probability of non-linkage for each marker in the mapping panel. If a backcross is used, a single binomial calculation may be applied for mice that show the mutant phenotype and mice that don’t (all scores are pooled for each marker). If an intercross has been used, two data sets (from mice with the mutant phenotype and mice with the normal phenotype) must be tested independently, and probability values combined using Fisher’s method:
χ² = -2[ln(p1) + ln(p2)].
This quantity has a chi-squared distribution with 4 degrees of freedom.
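Fisher's method for two p-values can be sketched without a statistics library, because the upper tail of a chi-squared distribution with 4 degrees of freedom has the closed form exp(-x/2)(1 + x/2). The function name `fisher_combine` is illustrative, and p1 and p2 are assumed to come from independent tests.

```python
import math

def fisher_combine(p1: float, p2: float) -> tuple[float, float]:
    """Return (chi_squared, combined_p) for two independent p-values."""
    chi_sq = -2.0 * (math.log(p1) + math.log(p2))
    # Upper-tail probability of a chi-squared variable with 4 degrees of freedom.
    combined_p = math.exp(-chi_sq / 2.0) * (1.0 + chi_sq / 2.0)
    return chi_sq, combined_p

chi_sq, combined_p = fisher_combine(0.05, 0.05)
print(chi_sq, combined_p)
```

The combined value can then be used as p(non-linkage) when calculating the LOD score.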
1. Transmissibility of the phenotype is confirmed, and the inheritance mode of the phenotype is determined (Figure 3).
Typically, the index mouse is backcrossed to C57BL/6J. F1 mice can be tested for the expression of the phenotype, which indicates a dominant mutation. If no F1 animals have the phenotype, the mutation is probably recessive. F1 animals also are mated to the index mouse, and their F2 progeny tested for the phenotype. In the case of a recessive mutation, F2 animals expressing the phenotype are presumed homozygotes; those that do not show the phenotype are presumed heterozygotes. F2 homozygotes are intercrossed to establish a homozygous stock.
2. Whole genome mapping.
Breeding mice for mapping purposes always begins with an outcross of a known mutant on a pure C57BL/6J background (assumed for this discussion to be a male homozygote) to a “mapping strain,” either C3H/HeN, 129x1/SvJ, BALB/c, or C57BL/10J. The latter strain is reserved for cases in which phenotype is subtle, or is not reliably expressed when genetic contributions from more distant strains are present. C57BL/10J is not used as the standard mapping partner for mutants with a C57BL/6J background because it is so closely related to C57BL/6J that relatively few markers are available for analysis. This situation may change with the availability of whole genome sequence data for C57BL/10J. In most cases, the C3H/HeN strain is used for mapping.
Recessive phenotype mapping
Offspring from the initial outcross, called F1 hybrids, are either backcrossed or intercrossed to produce F2 progeny whose DNA is analyzed for markers across the whole genome. The assumption at the outset is that mice showing the phenotype are homozygous for a mutant allele of the causative gene. Mutant alleles are generated on the C57BL/6J background; thus, linked markers will also be of the C57BL/6J genotype. Given an autosomal mutation, a particular autosomal marker, M, with C57BL/6J allele B, and C3H/HeN allele C, a concordant genotype for a mouse with the mutant phenotype is MB/MB (hereafter B/B). Concordant genotypes for a mouse with the normal phenotype are MB/MC (B/C) and MC/MC (C/C). The C/C genotype is only observed when an intercross is performed.
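The concordance rules for a recessive autosomal mutation can be expressed as a small scoring function. This is a sketch only: the function name, the string labels, and the example panel are hypothetical and do not come from the laboratory's own software.

```python
def is_concordant_recessive(phenotype: str, genotype: str) -> bool:
    """Score one marker in one F2 mouse for a recessive, autosomal mutation.

    phenotype: "mutant" or "normal"
    genotype:  "B/B", "B/C", or "C/C" (B = C57BL/6J allele, C = mapping strain)
    """
    if phenotype == "mutant":
        return genotype == "B/B"
    if phenotype == "normal":
        return genotype in ("B/C", "C/C")
    raise ValueError(f"unknown phenotype: {phenotype!r}")

# Hypothetical panel for one marker: (phenotype, genotype) for each F2 mouse.
panel = [("mutant", "B/B"), ("mutant", "B/B"), ("mutant", "B/C"),
         ("normal", "B/C"), ("normal", "C/C"), ("normal", "B/B")]
n_concordant = sum(is_concordant_recessive(p, g) for p, g in panel)
print(n_concordant, "of", len(panel), "mice are concordant at this marker")
```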
Backcross (Figure 4)
F1 hybrids are crossed to the index mouse or to another mouse that is known to be homozygous for the mutation. F2 offspring can have the possible genotypes B/B or B/C for any marker in the panel. The binomial distribution used to calculate p(non-linkage) for each marker is based on the null hypothesis that the marker is unlinked, and therefore, that genotype and phenotype show only random concordance in each F2 mouse. For a particular autosomal marker, there are two binomial distributions to describe the proportion of either phenotypically mutant or normal mice with concordant genotypes. The binomial distributions are defined by the number of F2 mice with mutant or normal phenotype, and the probability of a concordant genotype:
Rejection of the null hypothesis depends upon a significant departure from the expectation of 50% concordance, in the direction of higher concordance: i.e., a significant excess of mice which in aggregate show the mutant phenotype with the B/B genotype, and the normal phenotype with the B/C genotype.
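A minimal sketch of the backcross calculation follows, with mutant and normal mice pooled for one marker. It assumes that p(non-linkage) is taken as the one-sided upper-tail probability of seeing at least the observed number of concordant mice when the chance of concordance is 0.5; the protocol does not spell out this detail, and the counts used are hypothetical.

```python
from math import comb, log10

def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical marker: 18 of 20 backcross F2 mice have concordant genotypes.
p_non_linkage = binomial_upper_tail(n=20, k=18, p=0.5)
lod = log10((1 - p_non_linkage) / p_non_linkage)
print(p_non_linkage, lod)
```

With these hypothetical counts the LOD score exceeds 3.0, so the marker would be considered linked by the convention described earlier.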
Intercross (Figure 5)
F1 hybrids are crossed to each other in sib-matings. F2 offspring can have the possible genotypes B/B, B/C, or C/C for any marker in the panel. Intercross has the advantage over backcross of allowing more F1 matings to be set up sooner and therefore having more “mappable” F2 offspring sooner, since one does not rely on a single mouse (e.g., the index mouse) to mate with F1 hybrids and produce F2 mice. Also, hybrid mice are usually more fertile than inbred mice of pure C57BL/6J background. And finally, each mouse born to an intercross represents two meiotic events; hence each mouse has 36 crossovers in its genome rather than the 18 represented in each backcross animal. With only 5 or 6 mice showing a recessive phenotype, chromosome location may be assigned with fairly good confidence. The disadvantages of intercross mapping include:
1. Possible ambiguity of phenotypes on a mixed background in which homozygosity of mapping strain alleles may occur at many different loci.
2. For some strain combinations the density of informative markers is insufficient to allow all parts of the genome to be in linkage with at least one marker, since (as just noted) more crossovers occur per F2 mouse.
For a particular autosomal marker, the binomial distributions describing the number of mice with concordant genotypes are defined by:
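Because the expected frequency of concordant genotypes differs for the mutant and normal phenotypes, p(non-linkage) is calculated separately for mice with the mutant phenotype (p1) and mice with the normal phenotype (p2), and the two values are combined using Fisher’s method:
χ² = -2[ln(p1) + ln(p2)], with the composite p(non-linkage) obtained from the chi-squared distribution with 4 degrees of freedom.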
When intercross mapping is performed, mice with the mutant phenotype are far more informative than mice with the normal phenotype in assigning linkage. Linkage can nevertheless be established using only mice with the normal phenotype, but many more of them are required than would be the case for backcross mapping or for mice derived from an intercross that express the phenotype of interest.
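The difference in information content can be illustrated with a best-case calculation. The sketch below assumes Mendelian 1:2:1 segregation at an unlinked marker, so that a mutant mouse is concordant (B/B) with probability 0.25 and a normal mouse is concordant (B/C or C/C) with probability 0.75. These values are derived from Mendelian ratios for this illustration. The threshold of 0.001, roughly a LOD score of 3, and the function name are also assumptions.

```python
def min_mice_for_threshold(p_concordant_if_unlinked: float,
                           threshold: float = 0.001) -> int:
    """Smallest n for which n fully concordant mice have probability below threshold."""
    n = 1
    while p_concordant_if_unlinked ** n >= threshold:
        n += 1
    return n

# Recessive phenotype, intercross, unlinked autosomal marker.
mutant_n = min_mice_for_threshold(0.25)   # mutant mice are concordant only if B/B
normal_n = min_mice_for_threshold(0.75)   # normal mice are concordant if B/C or C/C
print(mutant_n, normal_n)
```

Under these assumptions, 5 fully concordant mutant mice reach the threshold, whereas 25 fully concordant normal mice are needed.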
Dominant phenotype mapping
F1 hybrids from the initial outcross are either outcrossed or intercrossed to produce F2 progeny whose DNA is analyzed for markers across the whole genome. It is assumed that F1 hybrids have one mutant allele and show the mutant phenotype. Linked markers will be from C57BL/6J. For a given marker, concordant genotypes for a mouse with the mutant phenotype are B/B and B/C. For a mouse with the normal phenotype, the only concordant genotype is C/C.
Outcross (Figure 6)
Mapping may be started with a homozygous stock, or if the mutation is lethal or not yet fixed, with heterozygotes on a pure C57BL/6J background. F1 hybrids showing the phenotype are outcrossed a second time to wild type C3H/HeN. F2 offspring can have the possible genotypes B/C or C/C for any marker in the panel. The binomial distributions describing the number of mice with concordant genotypes are defined by:
Intercross (Figure 7)
F1 hybrids (showing the mutant phenotype) are crossed to each other in sib-matings. F2 offspring can have the possible genotypes B/B, B/C, or C/C. The binomial distributions describing the number of mice with concordant genotypes are defined by:
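As for the recessive intercross mapping, p(non-linkage) values for mice with the mutant phenotype (p1) and mice with the normal phenotype (p2) are combined using Fisher’s method:
χ² = -2[ln(p1) + ln(p2)], with the composite p(non-linkage) obtained from the chi-squared distribution with 4 degrees of freedom.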
X-linked phenotype mapping
This section describes mapping in the special case of X linkage. About 7% of all genes are X linked. Using the breeding scheme depicted in Figure 1, an estimated 3.5% of coding changes will be X linked in G3 mice. Phenotypes caused by such mutations are more likely to be observed in males than in females.
Recessive, Backcross (Figure 8)
An F1 hybrid female (XB/XC) is mated back to the male index mouse or mutant stock (XB/Y). For a marker on the X chromosome, the binomial distributions describing the number of mice with concordant genotypes are defined by:
XB/Y (male) x XB/XC (female)
A mating of XB/Y (male) x XB/XB (female) cannot be used for linkage mapping, since all F2 offspring will only inherit X chromosomes from C57BL/6J and show the mutant phenotype. If the index mouse is female, an intercross must be used for mapping an X-linked recessive phenotype.
Recessive, Intercross (Figure 9)
Depending on whether the index mutant was male or female, there are two possible F1 hybrid intercrosses when the mutation is X-linked: male (XC/Y) x female (XB/XC) (Figure 9A), and male (XB/Y) x female (XB/XC) (Figure 9B). For each cross, the binomial distributions describing the number of mice with concordant genotypes are defined by:
XC/Y (male) x XB/XC (female)
XB/Y (male) x XB/XC (female)
Dominant, Outcross (Figure 10)
An F1 hybrid female (XB/XC) is mated to a male of C3H/HeN (or other strain) background (XC/Y). The binomial distributions describing the number of mice with concordant genotypes are defined by:
XC/Y (male) x XB/XC (female)
A mating of XC/Y (male) x XC/XC (female) cannot be used for linkage mapping, since all F2 offspring will only inherit X chromosomes from the mapping strain and show the normal phenotype.
Dominant, Intercross (Figure 11)
Again, depending on whether the index mutant was male or female, there are two possible F1 hybrid intercrosses when the mutation is X-linked: male (XC/Y) x female (XB/XC) (Figure 11A), and male (XB/Y) x female (XB/XC) (Figure 11B). For each cross, the binomial distributions describing the number of mice with concordant genotypes are defined by:
XC/Y (male) x XB/XC (female)
XB/Y (male) x XB/XC (female)
Each mapping strain has its own set of markers, consisting of simple sequence length polymorphisms (SSLPs) and/or single nucleotide polymorphisms (SNPs).
Markers for Mapping Strains
SSLPs, short repeated DNA sequences (e.g., GATA, CA) present in varying numbers of repeats in the genomes of different mouse strains, are PCR-amplified using flanking primers and can be distinguished based on strain-specific size differences of the amplified products. PCR primers are labeled with one of three fluorescent dyes, providing a means of distinguishing between markers with similar sizes in multiplex analyses. High-throughput analysis of SSLP markers is facilitated by automated preparation of PCR reactions in 384-well plate format using robotic technology. Following PCR, products for 5-9 markers, distinguishable from each other by size and/or color, are combined into one mixture (called a “panel”) for separation by electrophoresis in a single capillary followed by analysis in the ABI 3730 DNA Analyzer. The DNA Analyzer identifies the presence of C57BL/6J- or other strain-specific markers based on product size and/or color. For each mapping strain, markers are organized into 16 panels with 5-9 markers each. The total number of markers fills one-third of a 384-well plate, and therefore 3 complete genomes can be analyzed in one plate (prior to their combination in multiplex for analysis in the DNA Analyzer). Mapping panels are available for C3H/HeN, 129x1/SvJ, and BALB/c.
SNPs are single nucleotide differences in genomic DNA sequences that distinguish strains, and are detected by PCR amplification of the region containing the SNP followed by DNA sequencing. 127 SNPs, identified by whole genome sequencing of a wild type C57BL/10J mouse using SOLiD technology and validated by Sanger sequencing, are used as markers for mapping traits using the C57BL/10J strain (2). A list of these markers is available for C57BL/10J.
More recent SOLiD sequencing of the genome of another wild type C57BL/10J mouse yielded ≥1X, ≥2X, and ≥3X sequencing coverage of 90.1%, 88.4%, and 87.1%, respectively. For the subset of nucleotides corresponding to coding/splice junction nucleotides, ≥1X, ≥2X, and ≥3X coverage was 95.8%, 94.6%, and 93.4%, respectively. A total of 598,241 SNPs were identified for C57BL/10J compared to the C57BL/6J reference sequence (NCBI v37.1). Note that none of these SNPs has been validated by Sanger sequencing. However, the number of reads and a quality value (QV) provide an indication of the reliability of each SNP. A list of the C57BL/10J SNPs is available.
3. Fine Mapping.
The strain of origin of additional markers (not used in the standard whole genome mapping panels) within the critical region is determined for F2 offspring from the mapping crosses. Markers are selected for the particular critical region, typically spaced about 5 Mb apart. When possible, publicly available markers are used (see http://snp.gnf.org/ and http://www.informatics.jax.org/strains_SNPs.shtml). Otherwise, it may be necessary to find informative markers by amplifying and sequencing random unique intervals of DNA (to look for SNPs) or amplifying and sequencing GATA and CA repeats (to look for SSLPs). In our experience, GATA repeats are more polymorphic than all others; CA repeats are the next most commonly polymorphic. Other SSLPs (CT, CG, AT, or single base pair repeats) are rarely polymorphic. Narrowing the critical region relies on the inheritance of chromosomes in which crossover events within the critical region occurred during meiosis (Figure 2). Full sequencing of all known coding regions is undertaken once the critical region has been reduced to a size that encompasses 1000 or fewer coding exons. On average, there are about 12 genes and about 100 coding exons per Mb of DNA.
|Critical Parameters and Troubleshooting|
A double crossover within a short distance (≤40 Mb) is an unlikely event, so an apparent double crossover is sometimes the result of a genotyping error. When a double crossover is observed within 40 Mb, the genotypes of the markers in question should be carefully re-checked for accuracy.
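One way to flag such cases automatically is sketched below. It infers a crossover wherever adjacent markers on a chromosome differ in genotype, then reports any two consecutive crossovers whose flanking markers lie within 40 Mb of each other. The function name, the data layout, and the choice to measure the span between the outer flanking markers are assumptions made for illustration.

```python
def find_close_double_crossovers(positions_mb, genotypes, max_span_mb=40.0):
    """Flag pairs of consecutive crossovers on one chromosome within max_span_mb.

    positions_mb: marker positions in Mb, sorted along the chromosome
    genotypes:    genotype call at each marker for one mouse (e.g. "B/B", "B/C")
    Returns a list of (start_mb, end_mb) spans whose genotypes should be re-checked.
    """
    # A crossover is inferred wherever adjacent markers have different genotypes.
    crossovers = [
        (positions_mb[i], positions_mb[i + 1])
        for i in range(len(genotypes) - 1)
        if genotypes[i] != genotypes[i + 1]
    ]
    suspects = []
    # Compare each crossover with the next one along the chromosome.
    for (left_start, _), (_, right_end) in zip(crossovers, crossovers[1:]):
        if right_end - left_start <= max_span_mb:
            suspects.append((left_start, right_end))
    return suspects

# Hypothetical mouse: a single discordant call at 50 Mb looks like a double crossover.
positions = [10, 30, 50, 70, 90, 110]
calls = ["B/B", "B/B", "B/C", "B/B", "B/B", "B/C"]
suspects = find_close_double_crossovers(positions, calls)
print(suspects)
```

In this hypothetical example the isolated B/C call at 50 Mb is flagged for re-checking, while the crossover between 90 and 110 Mb is not.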
Deviation from the expected ratio of genotypes for a particular marker can also be caused by genotyping errors (or by linkage of the marker to the phenotype). Therefore, genotype ratios should be monitored for all markers; genotyping should be checked for possible errors when deviations from expected ratios occur. Allele frequencies for each marker can also be monitored for deviation from expected frequencies caused by genotyping errors.
1. Hoebe, K., Jiang, Z., Tabeta, K., Du, X., Georgel, P., Crozat, K., and Beutler, B. (2006) Genetic Analysis of Innate Immunity. Adv. Immunol. 91, 175-226.