A human-specific allelic group of the MHC DRB1 gene in primates
© Yasukochi and Satta; licensee BioMed Central Ltd. 2014
Received: 24 December 2013
Accepted: 27 May 2014
Published: 13 June 2014
Diversity among human leukocyte antigen (HLA) molecules has been maintained by host-pathogen coevolution over a long period of time. Reflecting this diversity, the HLA loci are the most polymorphic in the human genome. One characteristic of HLA diversity is long-term persistence of allelic lineages, which causes trans-species polymorphisms to be shared among closely related species. Modern humans have disseminated across the world after their exodus from Africa, while chimpanzees have remained in Africa since the speciation event between humans and chimpanzees. It is thought that modern humans have recently acquired resistance to novel pathogens outside Africa. In the present study, we investigated HLA alleles that could contribute to this local adaptation in humans and also studied the contribution of natural selection to human evolution by using molecular data.
Phylogenetic analysis of HLA-DRB1 genes identified two major groups, HLA Groups A and B. Group A formed a monophyletic clade distinct from DRB1 alleles in other Catarrhini, suggesting that Group A is a human-specific allelic group. Our estimates of divergence time suggested that seven HLA-DRB1 Group A allelic lineages in humans have been maintained since before the speciation event between humans and chimpanzees, while chimpanzees possess only one DRB1 allelic lineage (Patr-DRB1*03), which is a sister group to Group A. Experimental data showed that some Group A alleles bound to peptides derived from human-specific pathogens. Of the Group A alleles, three exist at high frequencies in several local populations outside Africa.
HLA Group A alleles are likely to have been retained in human lineages for a long period of time and have not expanded since the divergence of humans and chimpanzees. On the other hand, most orthologs of HLA Group A alleles may have been lost in the chimpanzee due to differences in selective pressures. The presence of alleles with high frequency outside of Africa suggests these HLA molecules result from the local adaptations of humans. Our study helps elucidate the mechanism by which the human adaptive immune system has coevolved with pathogens over a long period of time.
Modern humans (Homo sapiens) live in a wide variety of environments, ranging from polar to tropical regions. Physiological anthropologists have long addressed the issue of ‘human adaptation’ to a variety of environments (that is, the ability of humans to survive in a changing environment). Molecular evolution and population genetics also focus on the adaptation of humans to environmental changes. The approach of physiological anthropology is mainly to investigate differences in physiological modifications among individuals or ethnic groups in various environments (‘physiological polymorphism’) in order to understand human adaptation. On the other hand, molecular evolution and population genetics seek indications of natural selection by comparing nucleotide sequences of a target gene. If a new mutation at a target locus confers a fitness advantage in a certain environment, such a mutation is expected to spread rapidly throughout a population because of positive natural selection. Methods to detect such a signal of natural selection have been developed. For instance, in a protein-coding gene, an excess of non-synonymous substitutions (which change the amino acid sequence) over synonymous substitutions (which are assumed to be neutral) suggests that positive selection or balancing selection has occurred during the evolution of the target gene. In addition, the relationship between the frequency of an allele and the extent of linkage disequilibrium (LD) around the selected mutation helps us to find an allele that has rapidly spread in a population. The advantageous allele is expected to increase its frequency so quickly that recombination does not substantially break down the LD around the selected site.
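The comparison of non-synonymous and synonymous substitution rates described above can be illustrated with a minimal sketch. It assumes that the numbers of differences and of sites in each class have already been counted for a pair of sequences by some method not shown here, and it applies the Jukes-Cantor correction to each proportion. The function names and the example counts are hypothetical and are not taken from the study.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float):
    """Return (dN, dS, dN/dS) from pre-counted differences and sites."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    ratio = dn / ds if ds > 0 else float("inf")
    return dn, ds, ratio


# Hypothetical counts for a short region (illustration only).
dn, ds, ratio = dn_ds(nonsyn_diffs=30, nonsyn_sites=60,
                      syn_diffs=5, syn_sites=20)
print(f"dN = {dn:.3f}, dS = {ds:.3f}, dN/dS = {ratio:.2f}")
```

A ratio above one in a region such as the PBR is the kind of excess that is usually read as a signal of positive or balancing selection.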
Humans live in various environments around the world. The endemic pathogens that humans are infected by in these areas differ and humans have evolved to deal with these pathogens. In the present study, we focus on polymorphisms in the major histocompatibility complex (MHC), which plays an important role in triggering immune reactions in response to pathogens, and we discuss the possibility that a human-specific MHC allele is involved in the immunological adaptation to a human-specific pathogen.
The MHC is a set of cell-surface molecules that are responsible for presenting antigens from pathogens to lymphocytes in jawed vertebrates. As such, it is an important genetic system for protection against infectious disease. In humans, the MHC is termed human leukocyte antigen (HLA). The HLA genomic region is located on the short arm of chromosome 6 at 6p21.3, spanning approximately 4 Mbp and comprising 224 genes. The region is classified into three subregions: class I, class II, and class III regions. Among HLA molecules, six class I and II molecules (HLA-A, B, and C of class I and HLA-DR, DQ, and DP of class II) are important for antigen presentation to T lymphocytes. Class I molecules mainly bind to peptides from cytosolic proteins and the HLA-peptide complex is recognized by CD8+ T cells. Class II molecules present extracellular antigens to CD4+ T cells. Class I molecules consist of two polypeptide chains, an α heavy chain encoded in the class I region, and a β2-microglobulin light chain encoded on chromosome 15. Class II molecules are composed of two polypeptide chains, α and β chains, encoded in the class II region. For instance, the DRA and DRB1 genes in the class II region encode the α and β chains, respectively, of the DR molecule. A peptide-binding region (PBR) was characterized with crystallography by Bjorkman et al. for class I HLA-A and by Brown et al. for class II HLA-DR. Molecular evolutionary studies of this region have revealed an enhancement of non-synonymous substitutions in the PBR, suggesting that the PBR is a target for balancing selection, which is responsible for the maintenance of HLA polymorphisms [6–10].
Polymorphisms in HLA genes have three unique features: (1) a large number of alleles, (2) a high degree of heterozygosity, and (3) remarkably long persistence time of the allelic lineage. These features are maintained by balancing selection but not by an increased mutation rate [11, 12].
The chimpanzee (Pan troglodytes) is the closest extant relative of humans. Interestingly, chimpanzees appear to have resistance to several pathogens to which humans are susceptible, including HIV type 1 and human hepatitis B virus. This indicates that the two species differ in their immune responses to these pathogens, and that the pathogen recognition repertoire of the MHC possibly differs between the two species. Chimpanzees share some class II DRB1 allelic lineages with humans [14–16]. In humans, genetic variation and selective intensity are greatest at DRB1 among the class II genes. In humans, there are 13 DRB1 allelic lineages (HLA-DRB1*01, *03, *04, *07, *08, *09, *10, *11, *12, *13, *14, *15 and *16), while there are only four allelic lineages (Patr-DRB1*02, *03, *07 and *10) in chimpanzees [14–16].
Chimpanzees have stayed in Africa since their divergence from humans approximately six million years ago (MYA). On the other hand, modern humans dispersed across the world from Africa between 100,000 and 50,000 years ago and have adapted to regions with various exogenous pathogens. This raises the question of how modern humans have acquired resistance to a variety of pathogens in different environments. Therefore, the present study investigated the evolution of HLA-DRB1 alleles that confer resistance to novel pathogens in humans. For this purpose, we studied nucleotide sequences of HLA genes using the IMGT/HLA database (http://www.ebi.ac.uk/imgt/hla/).
Nucleotide sequences of humans, chimpanzees, rhesus monkeys (Macaca mulatta), and crab-eating macaques (Macaca fascicularis) were used for phylogenetic analyses. A dataset of human DRB allele sequences, including DRB1 and other functional DRB (DRB3, DRB4, and DRB5), was obtained from the IMGT/HLA database. The dataset of non-human primate DRB1 alleles was obtained from the IPD MHC NHP database (http://www.ebi.ac.uk/ipd/mhc/nhp/). In the database, there were many partial coding sequences (CDS) (mainly exon 2 sequences). Using incomplete sequences is likely to be misleading in analysis of the phylogenetic relationships among sequences; therefore, we performed phylogenetic analyses only for full-length DRB1 CDS. Because only partial sequences were available, we also excluded sequence data of the gorilla (Gorilla gorilla) and orangutan (Pongo pygmaeus) from the present analysis. We used two HLA-DQB1 alleles as outgroup sequences. Next, we removed sequences of potential recombinant alleles according to a method that assumes a binomial distribution of the ratio of substitutions in a particular region to that in the entire region [17, 20–22]. For phylogenetic analyses, we used 104 complete CDS: 56 HLA-DRB1, 6 HLA-DRB3, 4 HLA-DRB4, 2 HLA-DRB5, 11 chimpanzee Patr-DRB1, 22 rhesus monkey Mamu-DRB1, and 3 crab-eating macaque Mafa-DRB1 alleles.
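The idea behind the binomial screen for recombinant alleles can be sketched as follows. This is a generic illustration and not the exact procedure of the cited method, whose details are not given in this section. It assumes that, without recombination, each of the n substitutions between two alleles falls in a chosen region with probability q equal to the fraction of the sequence that the region occupies. The function names, the example counts, and the region fraction are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, q: float) -> float:
    """Probability of exactly k successes in n trials with success probability q."""
    return comb(n, k) * q**k * (1.0 - q) ** (n - k)


def region_tail_probs(k: int, n: int, q: float):
    """Return (P[X <= k], P[X >= k]) for X ~ Binomial(n, q).

    k: substitutions observed inside the region
    n: substitutions observed over the whole sequence
    q: fraction of the sequence covered by the region (assumed uniform rate)
    """
    lower = sum(binom_pmf(i, n, q) for i in range(0, k + 1))
    upper = sum(binom_pmf(i, n, q) for i in range(k, n + 1))
    return lower, upper


# Hypothetical pair of alleles: 40 substitutions overall, only 1 of them
# inside a region that covers 30% of the coding sequence.
lower, upper = region_tail_probs(k=1, n=40, q=0.30)
print(f"P(X <= 1) = {lower:.2e}, P(X >= 1) = {upper:.4f}")
```

A very small tail probability in either direction would mark the pair as a candidate for recombination or gene conversion in that region. The significance threshold is a choice left to the analyst.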
Brown et al. identified 24 amino acids in the PBR of HLA-DRB1 genes. In addition to the defined PBR, we included three amino acid sites (positions 57, 67, and 90), for a total of 27 amino acids, because Brown and collaborators subsequently showed that these three sites are involved in the formation of the peptide-binding groove and in peptide binding.
The time to the most recent common ancestor (TMRCA) was estimated from the maximum genetic distance at synonymous sites (dSmax) as TMRCA = dSmax / (2μ), where μ is the neutral substitution rate of 10⁻⁹ per site per year at the MHC loci. Pathogens recognized by HLA-DRB1 molecules were examined using the Immune Epitope Database (IEDB) (http://www.immuneepitope.org). Information about HLA-DRB1 allele frequency among different human populations was collected from the NCBI dbMHC database (http://www.ncbi.nlm.nih.gov/gv/mhc).
In the ML tree, the Group B alleles showed trans-species evolution of polymorphisms with those in the chimpanzee (Patr-DRB1*02 and *07). Interestingly, 31 Group A alleles formed a monophyletic clade distinct from other primate DRB1 alleles, suggesting that the Group A alleles are human-specific, although the bootstrap support for this cluster was not particularly high. Previous studies [14–16] did not identify this DRB1 monophyletic group in humans, because the nucleotide sequences used in those studies were limited to exon 2.
The divergence time of the two HLA groups, HLA Group A and HLA Group B
The phylogeny showed a difference in divergence time between Groups A and B. The mean divergence times for Groups A and B were approximately 9 and 21 MYA, respectively, and TMRCAs were approximately 29 and 41 MYA, respectively (Table 1). These values suggest the presence of specific trans-species polymorphisms [10, 30, 31] in both groups, because the mean divergence time exceeded the speciation time of humans and chimpanzees [32–34]. Based on this result, we rejected the hypothesis that the HLA Group A allelic lineages specifically expanded in humans. However, the tree revealed that alleles in Group A did not intermingle with other non-human primate DRB1 alleles (Figure 1). The closest was the Patr-DRB1*03 lineage cluster (indicated by an arrow in Figure 1).
Although alleles in Group A formed a single clade in the ML tree of primate DRB alleles, the TMRCA was 29 MYA, which is significantly older than six MYA (that is the speciation time of humans and chimpanzees). Thus, the molecular clock for DRB1 alleles may have been skewed by various factors, such as back or parallel mutations (multiple mutations) or recombination/gene conversion. Indeed, in the Group A allele sequences, there was segregation of 21 synonymous sites. Among them, ten were singletons with a unique nucleotide observed only once in the sampled alleles, and 11 were phylogenetically informative sites. Among 55 pairs of 11 informative sites, 13 pairs were phylogenetically incompatible with each other. This incompatibility was likely the result of either recombination/gene conversion or multiple mutations at a single site. In the event of recombination/gene conversion, however, double recombination in a relatively small region or a conversion tract with a small size should be considered. Multiple mutations are a more likely cause of this incompatibility. To examine whether the presence of multiple substitutions masked an accurate estimate of the TMRCA, we tested the accuracy of the correction for multiple substitutions in the calculation of dSmax.
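The pairwise check of informative sites can be sketched with the four-gamete criterion, a common test of phylogenetic compatibility: two biallelic sites are incompatible with a single tree without recurrent mutation or recombination when all four combinations of their states occur among the sequences. Whether the study used exactly this criterion is an assumption. The function name and the toy alignment are hypothetical.

```python
from itertools import combinations
from math import comb


def incompatible_pairs(alignment, sites):
    """Return pairs of site indices that fail the four-gamete test.

    alignment: list of equal-length sequence strings
    sites: column indices of the phylogenetically informative sites
    """
    failing = []
    for i, j in combinations(sites, 2):
        states_i = {seq[i] for seq in alignment}
        states_j = {seq[j] for seq in alignment}
        haplotypes = {(seq[i], seq[j]) for seq in alignment}
        # Only biallelic sites are tested; four distinct two-site
        # haplotypes cannot arise on one tree with one mutation per site.
        if len(states_i) == 2 and len(states_j) == 2 and len(haplotypes) == 4:
            failing.append((i, j))
    return failing


# Toy alignment with three informative columns (illustration only).
toy = ["AAA", "AGA", "GAG", "GGG"]
print(incompatible_pairs(toy, [0, 1, 2]))

# Eleven informative sites give C(11, 2) pairs to test.
print(comb(11, 2))
```

In the toy alignment, columns 0 and 2 always change together and are therefore compatible, while column 1 conflicts with both of them.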
For this purpose, we estimated the maximum number of synonymous substitutions in a different way. First, we placed synonymous substitutions observed in the Group A alleles on each branch of the ML tree parsimoniously (Figure 1 and Additional file 1: Figure S1) and re-counted the number of synonymous substitutions (KS) in each pair of Group A alleles. The maximum KS was thirteen (KSmax = 13). TMRCA was calculated from this KSmax divided by the mean number of synonymous sites (LS = 223). As a result, the TMRCA of the Group A alleles was estimated to be 29 MYA. This showed good agreement with the TMRCA estimated by the Jukes-Cantor correction (29 MYA). Because there was no bias in our method of estimating TMRCA, we considered it to be reliable.
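The arithmetic behind this estimate can be checked with a short sketch. It assumes the relation TMRCA = d / (2μ), where d is the number of synonymous substitutions per synonymous site between the two most divergent alleles and the factor of two reflects substitutions accumulating along both lineages. With the values quoted in the text (KSmax = 13, LS = 223, μ = 10⁻⁹ per site per year) this relation reproduces the reported figure. The function and variable names are hypothetical.

```python
MU = 1e-9  # neutral substitution rate per site per year, as given in the text


def tmrca_years(ks_max: float, ls: float, mu: float = MU) -> float:
    """TMRCA in years from a maximum pairwise count of synonymous substitutions.

    ks_max: maximum number of synonymous substitutions between two alleles
    ls: mean number of synonymous sites
    mu: neutral substitution rate per site per year
    """
    distance = ks_max / ls        # substitutions per synonymous site
    return distance / (2.0 * mu)  # two lineages diverge from the ancestor


t = tmrca_years(ks_max=13, ls=223)
print(f"TMRCA = {t / 1e6:.1f} MYA")
```

No correction for multiple substitutions is applied here, because the parsimony count on the tree is used directly.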
A method for calculating the probability, g_nk(t), that there were k allelic lineages among n extant lineages for t in N generations under balancing selection is available. In the present study, we tried to calculate the probability g_nk(t) for seven ancestral allelic lineages being maintained since approximately six MYA among a sample of 31 Group A alleles (n = 31). However, because HLA-DRB1 also contains the 25 Group B alleles, the 31 Group A sequences are only a part of the samples in the entire HLA-DRB1. There were no means to determine the effective population size (Ne) of these subpopulations, which was required for the calculation of g_nk(t); therefore, we could not calculate the probability of maintaining the current Group A alleles for six million years.
The effective population size Ne of modern humans is smaller than that of chimpanzees [36–38], and the eight allelic lineages in the ancestral population have likely been lost more frequently from the human lineage than the chimpanzee lineage. Nevertheless, the number of allelic lineages in humans is seven times larger than that in chimpanzees. This supports the hypothesis that natural selection selectively maintained Group A alleles in humans. It is important to understand the biological reasons why these seven lineages have been maintained only in humans.
Comparison of specific pathogens bound by HLA-DRB1 molecules between Group A and Group B
Source organism ID
Source organism name
Hepatitis B virus subtype adw2
Human papillomavirus type 11
Influenza B virus [B/Hong Kong/330/2001]
Influenza A virus [A/Bangkok/1/1979(H3N2)]
Influenza A virus [A/Japan/305/1957(H2N2)]
HLA-DRB1*11:01, 11:02, 11:03, 11:04
Streptococcus mutans GS-5
HLA-DRB1*04:01, 07:01, 15:01
Vaccinia virus WR
Herpes simplex virus (type 1/strain SC16)
Human rotavirus strain P
HLA-DRB1*04:01, 04:04, 15:01
Unidentified influenza virus
HIV type 1 (CLONE 12)
HIV virus type 1 (JH3 ISOLATE)
Influenza A virus [A/Hong Kong/156/97(H5N1)]
Influenza virus A
Influenza A virus [A/Puerto Rico/8/1934(H1N1)]
Burkholderia mallei ATCC 23344
Influenza A virus [A/New Caledonia/20/1999(H1N1)]
Influenza A virus [A/Panama/2007/1999(H3N2)]
Influenza A virus [A/Singapore/1/1957(H2N2)]
Influenza A virus [A/California/04/2009(H1N1)]
Influenza A virus [A/chicken/Uchal/8293/2006(H9N2)]
Influenza A virus [A/chicken/Uchal/8286/2006(H9N2)]
Influenza A virus [A/swine/Hong Kong/71/2009(H1N1)]
In HLA Group B, although some pathogens infect not only humans but also other animals (for example, Brucella ovis and Burkholderia mallei), candidates for human-specific pathogens (for example, Helicobacter pylori) were included. This suggests that some Group B alleles might be also involved in local adaptation in humans.
The frequency distributions of eight HLA-DRB1 alleles (HLA-DRB1*03:01, *08:02, *11:01, *11:02, *11:03, *11:04, *12:01, and *14:01) that recognize Group A-specific pathogens were investigated using information in the NCBI dbMHC database (Additional file 2: Figure S2). The frequencies of HLA-DRB1*08:02, *12:01, and *14:01 were high outside Africa, suggesting that the frequencies of these alleles might have increased since humans dispersed out of Africa.
Chimpanzees appear to have lost a relatively large number of alleles from the Group A allelic lineage, while humans have maintained several allelic lineages since their speciation. The examination of genetic variation in MHC class I Patr-A, Patr-B, and Patr-C loci suggested that the genetic variation in chimpanzees has been severely reduced. In this previous study, it was hypothesized that a selective sweep caused the loss of genetic diversity at MHC loci in chimpanzees in order to avoid widespread viral infection, such as that with chimpanzee-derived simian immunodeficiency virus, prior to the subspeciation of the common chimpanzee and bonobo (Pan paniscus) approximately two MYA. Although it is not known whether such a selective sweep resulted in the loss of some DRB1 allelic lineages in chimpanzees, reduced genetic variation at the three class I loci in chimpanzees may have been linked to the relatively small number of Patr-DRB1 allelic lineages.
A phylogenetic analysis of the HLA-DRB1 gene identified two major groups of alleles, Groups A and B. Our findings suggest that Group A is human-specific and has been maintained by balancing selection in humans, while chimpanzees may have lost their counterparts to these allelic lineages due to different selective pressures. Some Group A alleles can bind to peptides derived from human-specific pathogens, and these alleles showed a high frequency in populations outside Africa. Therefore, these alleles may have increased in frequency after the Out-of-Africa event. Our results imply that some HLA Group A alleles may have contributed to local adaptation of humans.
In the present study, we identified a candidate human-specific HLA-DRB1 allelic group. However, the sample size of chimpanzees was smaller than that of humans. Specifically, there were at least 88 chimpanzees used in published studies [14, 15, 43–45], while the HLA-DRB1 alleles were detected in thousands of human individuals. Therefore, there is possible sampling bias among chimpanzees. The common chimpanzee is classified into at least four subspecies in Mammal Species of the World: Pan troglodytes troglodytes, P. t. verus, P. t. ellioti, and P. t. schweinfurthii. In addition to the common chimpanzees, bonobo samples should also be included in the phylogenetic analyses of DRB1 alleles. To exclude the possibility that our finding is an artifact of sampling bias, we plan to increase the sample size of chimpanzees in future studies, which will help validate the present estimates.
In the present study, DRB1 alleles of rhesus monkeys and crab-eating macaques formed a taxon-specific clade with the exception of HLA-DRB4*01 sequences. All sampled alleles in the two macaques formed a sister clade with HLA Group A alleles in the ML tree but not with HLA Group B alleles (Figure 1). In the future, the reason why the DRB1 alleles of macaques formed a large monophyletic group should be investigated.
It is difficult to verify that a molecule in HLA Group A can recognize human-specific pathogens. In recent years, there has been increasing information on peptide-HLA binding. Future studies must examine the relationships among HLA alleles, binding peptides, and pathogens in order to elucidate the mechanisms by which modern humans have adapted to a variety of environments around the world.
The contribution of natural selection to local adaptation in humans was evaluated from genomic data. The genomic data provide a universal framework for understanding human evolution and enable quantitative analysis of the operation of natural selection. We believe that molecular genetics techniques can shed light on some important issues in physiological anthropology.
dN: number of non-synonymous substitutions per non-synonymous site
dS: number of synonymous substitutions per synonymous site
dSmax: maximum genetic distance at synonymous sites
HLA: human leukocyte antigen
human papillomavirus type 11
influenza B virus
IEDB: Immune Epitope Database
KS: number of synonymous substitutions
KSmax: maximum number of synonymous substitutions
LS: mean number of synonymous sites
MHC: major histocompatibility complex
MYA: million years ago
Ne: effective population size
TMRCA: time to most recent common ancestor.
This work was supported by Grant-in-Aid for Scientific Research on Innovative Areas from the Ministry of Education, Culture, Sports, Science and Technology of Japan (22133007). We owe special thanks to Drs. Naoyuki Takahata and Jun Gojobori for providing valuable comments.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.