Molecular Cell Biology Chapter 9. Molecular Structure of Genes and Chromosomes

9.4. Functional Rearrangements in Chromosomal DNA

In contrast to the transposition of mobile elements in genomic DNA, which appears to serve no direct, immediate function for the organism, there are several types of rearrangements of DNA regions that are beneficial to the organism. These functional rearrangements, which have been identified in both prokaryotes and eukaryotes, occur by inversion and deletion of DNA segments and by DNA amplification. Examples of each of these mechanisms are described in this section. Although functional DNA rearrangements play a role in regulating a few selected genes, the most frequent mechanism of gene control involves the regulation of transcription, discussed in Chapters 10 and 14.

Inversion of a Transcription-Control Region Switches Salmonella Flagellar Antigens

When Salmonella typhimurium, a type of bacterium closely related to E. coli, is ingested, it produces the nausea and diarrhea of common "food poisoning." The protein from which the Salmonella flagellum is constructed is one of the major Salmonella antigens to which the human immune system responds in eliminating this pathogenic bacterium. However, Salmonella cells can express two types of flagellar proteins called H1 and H2, which are encoded at distant sites on the Salmonella chromosome. Any one Salmonella cell expresses only one type of flagellar protein, but as a clone of cells grows, some progeny spontaneously switch to expression of the other flagellar protein, a process known as phase variation. As a result, when individuals respond by making antibody against the major flagellar protein expressed by the Salmonella cells that have infected them, a small fraction of the bacterial cells are resistant to the antibody because they express the alternative type of flagellar protein. These cells can then proliferate until the immune system responds a second time to the second flagellar antigen.

The mechanism of Salmonella phase variation has been studied by cloning and sequencing the genes involved, by analyzing mutants defective in the switching mechanism, and finally by developing an in vitro switching system using purified proteins that direct the process. The mechanism, outlined in Figure 9-20, involves inversion of a DNA segment that is located adjacent to the H2 operon and functions as a promoter. When this promoter region is in one 5′→3′ orientation (phase I), the two proteins encoded by the H2 operon are expressed: the H2 flagellar protein and rH1, a specific repressor that inhibits transcription of the H1 gene, which is located in a different region of the Salmonella genome. Approximately once every thousand cell divisions, the segment containing the H2 promoter is inverted; as a result, neither H2 nor rH1 is expressed (phase II). In the absence of the repressor rH1, the H1 gene is transcribed and the H1 protein is expressed.

The protein that catalyzes inversion of the H2 promoter is a site-specific recombinase encoded by the Hin gene, which is completely contained within the inverting segment of DNA. Hin protein binds to a specific recognition sequence located at two sites in the H2 promoter region and catalyzes recombination, that is, an exchange of DNA strands, at the center of the binding sites. This results in an inversion of the DNA between the two Hin recognition sites upstream of the H2 and rH1 genes (Figure 9-21). The Hin protein is expressed only very rarely, so inversion and the resulting phase variation are infrequent. Nonetheless, this process is important to the viability of the Salmonella species, because it extends the period of infection and, consequently, increases the number of new hosts that are exposed to Salmonella.
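The switching logic described above can be sketched as a two-state model. This is a minimal illustration, not a model of the biochemistry: the class name `SalmonellaCell` and its methods are hypothetical, and the only rules encoded are the ones stated in the text (the H2 operon is transcribed in one promoter orientation, and rH1 represses H1).

```python
from dataclasses import dataclass


@dataclass
class SalmonellaCell:
    # True  = promoter segment points toward the H2 operon (phase I in the text)
    # False = promoter segment is inverted (phase II in the text)
    promoter_toward_h2: bool = True

    def hin_inversion(self) -> None:
        """Flip the promoter segment, as Hin-catalyzed recombination does."""
        self.promoter_toward_h2 = not self.promoter_toward_h2

    def expressed_flagellin(self) -> str:
        # The H2 operon (H2 flagellin plus the rH1 repressor) is transcribed
        # only when the promoter points toward it.
        h2_operon_on = self.promoter_toward_h2
        # rH1 represses H1, so H1 is transcribed only when rH1 is absent.
        rh1_present = h2_operon_on
        h1_on = not rh1_present
        return "H1" if h1_on else "H2"


cell = SalmonellaCell()
history = [cell.expressed_flagellin()]
for _ in range(2):
    cell.hin_inversion()
    history.append(cell.expressed_flagellin())
print(history)  # ['H2', 'H1', 'H2']
```

Because the switch is a simple inversion, a second inversion restores the original orientation, so a lineage can move back and forth between the two antigens.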

Antibody Genes Are Assembled by Rearrangements of Germ-Line DNA

One of the most remarkable aspects of the immune response is the vast diversity of antibodies that can be elicited against the many possible antigens an individual encounters. It is estimated that an individual can potentially produce many millions of types of antibodies with specificities for as many different antigens. A million different antibody molecules cannot be encoded directly in the human genome, since the genome only contains on the order of one hundred thousand genes. The molecular mechanisms for generating such remarkable diversity from a limited amount of DNA are now understood. Regulated DNA inversions and deletions figure importantly in the process. To understand how this is accomplished, we must first consider the general structure of antibodies.

Antibody Domain Structure

Antibodies belong to a class of proteins called immunoglobulins, which constitute about 20 percent of the proteins in the blood. The most abundant type of antibody, immunoglobulin G (IgG), is a symmetrical molecule composed of four polypeptide chains: two identical heavy (H) chains of ≈55 kDa, and two identical light (L) chains of ≈23 kDa. The light chains are composed of two domains: an N-terminal domain called VL, because it is slightly different or "variable" in sequence and structure in different antibody molecules, and a C-terminal domain called CL, which has an identical or "constant" sequence in all IgG molecules. Similarly, the heavy chains are composed of four domains: an N-terminal variable VH domain and three constant domains designated CH1, CH2, and CH3. The domain structure of IgG is illustrated in Figure 9-22. An antibody molecule binds an antigen molecule through the surfaces created at the interfaces of the VL and VH domains at the tips of the Y-shaped molecule (see Figures 3-21 and 3-22). Consequently, the antigen-binding specificity of an antibody is determined by the sequence of its VL and VH domains.

Organization of Light-Chain DNA

Antibodies are produced by a class of leukocytes (white blood cells) called B lymphocytes, or B cells. The genes encoding antibodies with different binding specificities are not directly inherited from the fertilized egg. Rather, they are assembled from a number of separated gene segments present in germ-line DNA; this process occurs during the development of B cells from stem cells in the bone marrow. For example, a functional rearranged gene encoding the κ light chain (the major type of light chain in mice and humans) contains three segments. At the 5′ end is the Lκ segment; it encodes a leader or signal peptide that directs the newly translated protein into the endoplasmic reticulum in preparation for secretion from the cell (Chapter 17). The signal peptide is removed during post-translational processing of the light chain and is not present in the mature antibody molecule. The second segment encodes the VL domain of the light chain, and the third segment, at the 3′ end, encodes the CL domain.

As shown in Figure 9-23, the DNA of germ cells (i.e., sperm and egg cells) and all other cells except mature B lymphocytes contains a κ locus that has a variable region at its 5′ end. This region consists of a library of leader (Lκ) and variable (Vκ) segments containing ≈100 Lκ + Vκ units in humans; these units are arrayed in tandem along one long stretch of DNA. (The Lκ segment corresponds to the leader exon in the final gene; the Vκ segment makes up most, but not all, of the final VL exon, encoding the variable region of the light chain.) Each of the Lκ + Vκ units is about 400 nucleotides long, and they are separated by about 7 kb; thus 100 Lκ + Vκ units would cover about 740 kb of DNA. The variable region of the κ locus is followed by five joining (Jκ) segments in human germ-line DNA and then by the one constant (Cκ) segment. The five Jκ segments are tandemly arranged and are separated by about 20 kb from the 3′ end of the variable region. Each of the Jκ segments is about 30 nucleotides long, and they are spread over 1.4 kb of DNA. Between the 3′ Jκ segment and the single Cκ segment lies 2.4 kb of intervening DNA. The number of Vκ and Jκ segments varies with the species of mammal, although there are always many more Vκ than Jκ segments.
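The 740-kb estimate follows directly from the unit length and spacing quoted above. The short calculation below reproduces it; the constant and function names are illustrative, and the model assumes one spacer per unit, which is how the text's figure appears to be derived.

```python
# Approximate values quoted in the text for the human kappa locus.
N_UNITS = 100          # number of leader + variable units
UNIT_LENGTH_BP = 400   # length of one leader + variable unit
SPACER_BP = 7_000      # spacing between neighboring units (about 7 kb)


def variable_region_span_bp(n_units: int, unit_bp: int, spacer_bp: int) -> int:
    """Total span, counting one unit plus one spacer per repeat."""
    return n_units * (unit_bp + spacer_bp)


span_bp = variable_region_span_bp(N_UNITS, UNIT_LENGTH_BP, SPACER_BP)
print(span_bp // 1000, "kb")  # 740 kb
```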

Rearrangement of Light-Chain DNA

When DNA reorganizes to make a functional κ gene, one Vκ segment joins to one Jκ segment. This joining is performed by a site-specific recombinase that recognizes sequences at the 3′ end of each Vκ segment and the 5′ end of each Jκ segment. Recombination between these sequences results in a deletion or inversion of the intervening sequence, depending on whether the Lκ + Vκ unit has the same or opposite transcriptional orientation as the Jκ segment (Figure 9-24). This recombination forms the completed variable region. So far as is known, any Vκ can join to any Jκ, and the choice is random. Once a Vκ and Jκ are joined, the variable and constant regions are transcribed together into a primary RNA transcript. The intervening sequences between Lκ and Vκ and between Jκ and Cκ (including any remaining Jκ regions) then are removed by RNA splicing to produce the mature mRNA for the κ light-chain protein.

Since each Vκ and Jκ segment has a unique nucleotide sequence, Vκ-Jκ joining of 100 Vκ segments and 5 Jκ segments in human germ-line DNA can produce 500 different possible chains. But Vκ-Jκ joining generates even more sequence variability than this calculation would suggest because a small, variable number of nucleotides are lost from the Vκ and Jκ segments when they are joined. The imprecision of the joining process greatly increases the diversity of possible VL amino acid sequences encoded by the Vκ-Jκ joint region. Significantly, this region encodes many of the amino acids in the antigen-binding site at the tip of the VL domain (see Figure 9-22).

The random loss of nucleotides at the joining site generates significant diversity at that point, but the system pays for its diversity. The cost is evident if we remember the constraints on a coding sequence. Recall that a coding sequence in DNA must be read in three-base (triplet) codons, and that the initiation codon, AUG (methionine), defines a reading frame that groups the rest of the coding region into triplets. Reading DNA in one of the two other reading frames would generate a meaningless string of amino acids. Thus, the joining of two pieces of coding DNA, such as a Vκ and Jκ segment, can produce an in-phase joint, which maintains a sensible reading frame, or an out-of-phase joint, which encodes a nonsense protein. Because the Vκ-Jκ joining process is a random one, two out of every three joinings result in joints that make no sense. Thus the increased diversity permitted by imprecise joining is obtained at the expense of formation of two nonproductive joints for each productive joint.
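The two-in-three figure can be checked by enumeration. The sketch below rests on simplifying assumptions that are not in the text: a joint with no nucleotide loss is taken to be in frame, and each segment independently loses between 0 and `max_loss` nucleotides with equal probability. Under those assumptions a joint stays in frame only when the total loss is a multiple of three.

```python
from fractions import Fraction
from itertools import product


def in_frame_fraction(max_loss: int) -> Fraction:
    """Fraction of joints that keep the reading frame.

    Assumes a joint with no loss is in frame, and that the V and J segments
    each lose 0..max_loss nucleotides, every value being equally likely.
    """
    losses = range(max_loss + 1)
    outcomes = list(product(losses, losses))  # (loss from V, loss from J)
    in_frame = [pair for pair in outcomes if sum(pair) % 3 == 0]
    return Fraction(len(in_frame), len(outcomes))


print(in_frame_fraction(5))  # 1/3
```

The result is exactly one third when the number of possible loss values per segment is a multiple of three, and close to one third otherwise, which matches the statement that about two of every three joints are nonproductive.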

We have described three sources of diversity in antibody κ light chains: variability in the sequence of the many Vκ segments in the germ-line κ locus, variability in the sequences of a small number of Jκ segments, and variability in the number of nucleotides deleted at Vκ-Jκ joints. (The actual number of Vκ and Jκ segments varies among species; in the mouse, for instance, there are ≈300 Vκ segments and four functional Jκ segments.) Other processes beyond the scope of our discussion generate even more antibody diversity by randomly altering the DNA sequence in the joint between the Vκ and Jκ segments.

Organization and Rearrangement of Heavy-Chain DNA

Functional genes encoding antibody heavy chains are formed by processes similar to those just described for light-chain genes. Analysis of the antigen-binding sites of numerous antibodies suggests that the heavy-chain contribution to contacts with antigen is even greater than the light-chain contribution. Consistent with a need for greater diversity in heavy chains, three libraries of gene segments contribute to the variable region of functional heavy-chain genes, rather than just the two (V and J) that make up light-chain genes. The third library consists of diversity (D) segments, which are located between the other two libraries, whose segments are called VH and JH. Thus two joining reactions, VH to D and D to JH, are required to assemble the region of a heavy-chain gene encoding the variable region (Figure 9-25). Clearly, having three segments, rather than two, greatly increases the possible combinatorial diversity. Human germ-line DNA contains an estimated 100 VH segments, 30 D segments, and six functional JH segments. In addition to the diversity due to random joining of VH, D, and JH segments, further diversity is created by loss of nucleotides at the VH-D and D-JH joints, as occurs at the Vκ-Jκ joint. The variable domains of heavy chains are diversified even further by the random addition of up to 15 nucleotides when a D segment joins to a JH, or when a VH joins to a D. The junctions where this occurs encode most of the amino acids that form the antigen-binding tip of the VH domain in an antibody molecule (see Figure 9-22). Thus maximum diversity is generated in the portion of the antibody molecule that interacts with antigen.
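The combinatorial gain from a third segment library can be made concrete with the approximate human segment counts given in this section. The sketch below counts segment combinations only and ignores junctional loss and addition of nucleotides. The final figure also assumes that any κ light chain can pair with any heavy chain, which is an assumption of this example rather than a statement from the text.

```python
from math import prod

# Approximate human segment counts quoted in the text.
KAPPA_LIGHT_SEGMENTS = {"V": 100, "J": 5}
HEAVY_SEGMENTS = {"V": 100, "D": 30, "J": 6}


def segment_combinations(library_sizes: dict[str, int]) -> int:
    """Number of ways to pick one segment from each library."""
    return prod(library_sizes.values())


light = segment_combinations(KAPPA_LIGHT_SEGMENTS)
heavy = segment_combinations(HEAVY_SEGMENTS)
# Assumption: every light chain can pair with every heavy chain.
pairs = light * heavy

print(light)  # 500
print(heavy)  # 18000
print(pairs)  # 9000000
```

Even before junctional diversity is counted, adding the D library raises the heavy-chain count 30-fold over what V-J joining alone would give.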

Heavy-chain germ-line DNA also contains multiple C segments encoding the constant-region domains of the various Ig classes. A combination of alternative RNA processing and additional DNA rearrangements (class switching) determines which Ig class is expressed by a particular B cell.

Generalized DNA Amplification Produces Polytene Chromosomes

All of the DNA rearrangements discussed so far, both functional and nonfunctional, involve changes in the position of sequences within the genome. Another type of rearrangement involves generalized amplification of DNA sequences, or polytenization.

The salivary glands of Drosophila species contain enlarged interphase chromosomes. When fixed and stained, these chromosomes are characterized by a large number of well-demarcated bands, which can be used to establish the position and order of genes on Drosophila chromosomes (see Figure 8-22). The enlargement of chromosomes in the salivary glands, and in some cells in other Drosophila tissues as well, occurs when the DNA repeatedly replicates but the daughter chromosomes do not separate. The result is a polytene chromosome composed of many parallel copies of itself. The amplification of chromosomal DNA greatly increases gene copy number, presumably to supply sufficient mRNA for protein synthesis in the massive salivary gland cells.

Although most of a chromosome participates in polytenization, certain sequences, such as the simple-sequence DNAs near the centromere and telomeres, are not amplified. Furthermore, the ribosomal genes tend to be amplified less than other sequences during polytenization (Figure 9-26); as discussed previously, multiple copies of these genes already are present in tandem arrays. The molecular basis for the varying extent of replication along presumably linear chromosomal DNA molecules remains unknown, but the unreplicated simple-sequence DNA probably contributes to alignment of the amplified DNA along the length of polytene chromosomes.

SUMMARY


* Although gene expression is most frequently regulated by DNA-binding proteins that control the initiation of transcription, some cells use specific rearrangements of the DNA sequence to control expression of certain genes.

* Salmonella typhimurium controls expression of the H1 or H2 flagellar antigen by rare site-specific recombination between two repeated sequences flanking the promoter of the H2 operon (see Figure 9-20). The resulting inversion of the promoter causes the cell to shift from expressing one flagellar antigen to expressing the other.

* The remarkable diversity of antibody molecules is achieved by assembly of functional genes, encoding the antibody heavy and light chains, from multiple gene segments with unique sequences present in germ-line DNA. The segments encoding the variable domains of heavy and light chains are assembled by recombining two or three gene segments from libraries of alternative short coding regions (see Figures 9-24 and 9-25).

* The large number of possible combinations in which light-chain and heavy-chain gene segments can be combined is one mechanism for generating variation in the amino acid sequence, and therefore binding specificity, of antibodies produced by individual B lymphocytes. Additional diversity results from the random loss and addition of nucleotides at the joints between gene segments during the joining process.

* In some organisms, certain specialized cells grow to much larger size than other cells and amplify their chromosomal DNA, producing polytene chromosomes such as those in the larval salivary glands of Drosophila species (see Figure 9-26). The chromosomes of these giant cells result from about 10 replications without cell division and without replication of the associated simple-sequence DNA at the centromere and telomeres.
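As a rough check on the last point, about 10 rounds of replication without separation imply roughly a thousand parallel copies. The helper below is illustrative and assumes that every round doubles every amplified sequence; it does not apply to the simple-sequence DNA that is not replicated or to the ribosomal genes that are amplified less.

```python
def copies_after(rounds: int) -> int:
    """Copies of a fully amplified sequence after `rounds` doublings."""
    return 2 ** rounds


print(copies_after(10))  # 1024
```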



© 2000 by W. H. Freeman and Company. All rights reserved.