Homolog, Ortholog and Paralog - Bioinformatics
Both concepts are illustrated by the gene trees above (a gene tree is a type of phylogenetic tree depicting the evolutionary history of genes through time). Initially, there is a single gene (black) in a single lineage, the last common ancestor of species 1 and 2.
In the scenario on the left (1), this gene first undergoes a gene duplication event, and both copies (A and B in red and blue, respectively), while gradually evolving apart, persist in the last common ancestor until a subsequent speciation event splits the lineage into two new species. Both species inherit both copies of the duplicated gene, where they continue to diverge until the present day. Genes of different colors are paralogs because they are related through the initial gene duplication, regardless of whether they are found in the same (like 1A and 1B) or different species (like 1B and 2A). In contrast, genes sharing a color are more closely related through the speciation event and are therefore orthologs. By definition, orthologs can never be found in the same species.
The scenario on the right (2) shows what happens if the order of events is reversed: first, the ancestral lineage harboring the gene splits by speciation into two new species, both of which inherit the gene. Then independent gene duplication events create copies of the gene in each species. In this case, the genes found within each species are paralogous to each other. Between species, however, they are orthologous to both copies in the other species, because both are related through speciation first and then through gene duplication.
Whether two genes are orthologs or paralogs has important implications. In phylogenetics, orthologs can be used to trace the relatedness of organisms because gene trees of orthologous genes reflect the species tree. Orthologous genes are also often assumed to fulfill similar or identical roles in two organisms. While this is not necessarily true, establishing orthology can often provide a first hint at the function of a newly discovered gene by comparing it to its orthologs from well-studied species.
Homolog or homologue
A gene related to a second gene by descent from a common ancestral DNA sequence. The term homolog may apply to the relationship between genes separated by the event of speciation (see ortholog) or to the relationship between genes separated by the event of genetic duplication (see paralog).
Ortholog or orthologue
Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologs retain the same function in the course of evolution. Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes.
Orthologues
Homologues which diverged by a speciation event. There are four types of orthologues:
1-to-1 orthologues (ortholog_one2one)
1-to-many orthologues (ortholog_one2many)
many-to-many orthologues (ortholog_many2many)
between-species paralogues – only as exceptions
Genes in different species and related by a speciation event are defined as orthologues. Depending on the number of genes found in each species, we differentiate among 1:1, 1:many and many:many relationships. Please, refer to the figure where there are examples of the three kinds.
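As a rough illustration of how the 1:1, 1:many and many:many labels follow from gene counts, the sketch below maps the number of orthologous genes found in each of two species to the tags listed above. The function name and its arguments are assumptions made for this example; only the tag strings come from the list.

```python
def orthology_type(n_genes_species_a: int, n_genes_species_b: int) -> str:
    """Map the gene counts of one orthologous group in two species to a tag."""
    if n_genes_species_a < 1 or n_genes_species_b < 1:
        raise ValueError("each species must contribute at least one gene")
    if n_genes_species_a == 1 and n_genes_species_b == 1:
        return "ortholog_one2one"
    if n_genes_species_a == 1 or n_genes_species_b == 1:
        return "ortholog_one2many"
    return "ortholog_many2many"


print(orthology_type(1, 1))
print(orthology_type(1, 3))
print(orthology_type(2, 2))
```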
Speciation
Speciation is the origin of a new species capable of making a living in a new way from the species from which it arose. As part of this process, the new species has also acquired some barrier to genetic exchange with the parent species.
Paralog or paralogue
Paralogs are genes related by duplication within a genome. Orthologs typically retain the same function in the course of evolution, whereas paralogs tend to evolve new functions, even if these are related to the original one.
Paralogues
Homologues which diverged by a duplication event. There are two types of paralogues:
same-species paralogies (within_species_paralog)
fragments of the same ‐predicted‐ gene (gene_split)
Genes of the same species and related by a duplication event are defined as paralogues. In the previous figure, Hsap2 and Hsap2′, and Mmus2 and Mmus2′ are two examples of within species paralogues. The duplication event relating the paralogues does not need to affect this species only. For example, Mmus2′ and Mmus3′ are also within species paralogues but the duplication event has occurred in the common ancestor between species Hsap (human) and species Mmus (mouse). The taxonomy level “times” the duplication event to the ancestor of Euarchontoglires.
Between species paralogues
A between species paralogue corresponds to a relation between genes of different species where the ancestor node has been labelled as a duplication node e.g. Mmus1:Hsap2 or Mmus1:Hsap3. Currently, we only annotate between species paralogue when there is no better match for any of the genes, and the duplication is weakly-supported (duplication confidence score ≤ 0.25).
Such cases can be the result of real duplications followed by gene losses (as shown in the picture below), but most of the time they occur as the result of a wrong gene tree topology with a spurious duplication node. Often assembly errors are behind these problems. It is not clear whether these genes are real orthologues or not, but they are the best available candidates (given the data), and we bend the definition of orthology to tag them as orthologues. They are flagged as “non-compliant with the gene tree”. People interested in phylogenetic analysis mixing the orthologies and the trees should probably use the set of tree-compliant orthologies.
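The annotation rule for between-species paralogues can be written as a small decision function. This is a sketch of the rule as stated in the text, not the actual annotation pipeline: the function name, argument names and returned label strings are hypothetical, and only the 0.25 cutoff is taken from the description above.

```python
from typing import Optional

DUPLICATION_CONFIDENCE_CUTOFF = 0.25  # threshold quoted in the text


def annotate_cross_species_pair(ancestor_event: str,
                                duplication_confidence: Optional[float] = None,
                                has_better_match: bool = False) -> Optional[str]:
    """Label a pair of genes from different species (hypothetical labels).

    ancestor_event is the label of the pair's ancestor node in the gene tree.
    Returns None when the pair would not be annotated at all.
    """
    if ancestor_event == "speciation":
        return "orthologue"
    if ancestor_event != "duplication":
        raise ValueError("ancestor_event must be 'speciation' or 'duplication'")
    if duplication_confidence is None:
        raise ValueError("a duplication node needs a confidence score")
    weakly_supported = duplication_confidence <= DUPLICATION_CONFIDENCE_CUTOFF
    if weakly_supported and not has_better_match:
        # Best available candidate; flagged as not compliant with the gene tree.
        return "between-species paralogue"
    return None


print(annotate_cross_species_pair("duplication", 0.2))
print(annotate_cross_species_pair("duplication", 0.9))
```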
Identifying homologs (paralogs and orthologs) through a bioinformatics approach
The most commonly used method to establish homology is through sequence similarity (sometimes, though not quite accurately, called sequence homology). In the absence of gene duplication events, it is reasonable to assume that orthologous genes fill an equivalent functional niche, and are more similar in sequence to each other than to any other gene. In a pairwise comparison, orthologs are therefore each other’s best match. This method, called the reciprocal best hit method, is easily implemented using BLAST. While one-to-one orthologs are expected to be reciprocal best BLAST hits, the reverse is not necessarily true: gene duplications and gene loss can lead to scenarios in which reciprocal best BLAST hits are actually paralogs. However, in simple cases, this method is very accurate, and it remains useful for acquiring a set of candidate orthologs in more complex cases.
To see how it works, consider the following exercise:
- Starting with the D. melanogaster Cytoplasmic Ribosomal Protein RpL30, find the best match in P. barbatus, using the BLAST tool on HGD’s Ant Portal as described in the chapter on BLAST searches (blastp against the Official Gene Set).
- Take the best hit (PB22887; you can acquire the protein sequence by clicking on the gene identifier) and BLAST it against the annotated protein dataset of D. melanogaster on FlyBase.
- Coming full circle, the best reciprocal hit turns out to be RpL30 (note that there are several alternative transcripts in the fly). There seems to be only a single gene for RpL30 in both species, and they can be assumed to be orthologous to each other.
In the example above, the lack of paralogs is evident from the fact that neither genome harbors another gene that comes even close in similarity. The following examples show what happens if there is one. First, repeat the exercise above with RpL10Aa:
While P. barbatus provides only one single significant BLAST hit, PB25666, running the reciprocal BLAST search in D. melanogaster nets two genes, RpL10Aa and RpL10Ab. Moreover, the best reciprocal BLAST hit is RpL10Ab, not our starting gene RpL10Aa. This suggests that there are two paralogs in the fruit fly genome, and that RpL10Aa has diverged more from the ancestral gene than RpL10Ab. While it is possible that there were originally two copies in P. barbatus as well, one of which was subsequently lost, it is more parsimonious to assume that the fly paralogs came into being after the speciation event that split flies from ants. Both fly genes are therefore most likely orthologous to the P. barbatus gene.
The opposite case, a recent gene duplication leading to two paralogs in P. barbatus, both of which are orthologs to a single fly gene, is exemplified by RpLP0: There are two significant BLAST hits in P. barbatus, PB13254 and PB16486, both of which match the same gene in the fly.
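The three exercises above can be mimicked with a small Python sketch of the reciprocal best hit method. It does not run BLAST; it assumes the search results have already been collected into dictionaries of the form query -> {hit: score}. The gene identifiers are those named in the text, but every score is an invented placeholder chosen only to reproduce the qualitative outcome described above (including the arbitrary choice that PB13254 scores higher than PB16486), and the function names are hypothetical.

```python
def best_hit(scores, query):
    """Return the highest-scoring hit for a query, or None if it has no hits."""
    hits = scores.get(query, {})
    return max(hits, key=hits.get) if hits else None


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (gene_a, gene_b) that are each other's best hit."""
    pairs = []
    for gene_a in a_vs_b:
        gene_b = best_hit(a_vs_b, gene_a)
        if gene_b is not None and best_hit(b_vs_a, gene_b) == gene_a:
            pairs.append((gene_a, gene_b))
    return pairs


# Invented bit scores (placeholders, not real BLAST output).
fly_vs_ant = {
    "RpL30":   {"PB22887": 200},
    "RpL10Aa": {"PB25666": 250},
    "RpL10Ab": {"PB25666": 300},
    "RpLP0":   {"PB13254": 400, "PB16486": 380},
}
ant_vs_fly = {
    "PB22887": {"RpL30": 200},
    "PB25666": {"RpL10Ab": 300, "RpL10Aa": 250},
    "PB13254": {"RpLP0": 400},
    "PB16486": {"RpLP0": 380},
}

for fly_gene, ant_gene in reciprocal_best_hits(fly_vs_ant, ant_vs_fly):
    print(fly_gene, "<->", ant_gene)
```

With these placeholder scores, RpL10Aa and PB16486 are not reported even though the text argues that they are orthologous to PB25666 and RpLP0, respectively. This is the limitation of the method whenever a duplication has occurred in one lineage: it returns at most one partner per gene.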
Tree-based homology assessment
Multiple instances of speciation and gene duplication in several species can create a complex web of homology relations that is impossible to resolve with the reciprocal best hit method, and requires the reconstruction of a gene tree. After all, orthology and paralogy are defined in phylogenetic terms. We will discuss phylogenetic reconstruction methods in the next chapter, but the following figure may illustrate the principle:
In this gene tree of RpS7 from select Hymenopteran species (bees, wasps and ants), some species are represented by a single gene, others by two genes (species can be distinguished by the four-letter prefixes of the gene identifiers). The duplicate genes fall into two groups, each forming a subtree with almost identical topology (or shape), which mostly reflects the known evolutionary relationships of the represented ant species (Aech, Acep, Pbar, Sinv, Cflo).
This pattern suggests that a gene duplication occurred in the lineage leading to the last common ancestor of these five species, all of which therefore inherited two copies. Since the gene in Hsal is more closely related to one set of genes, the duplication might be even older, and the Hsal copy belonging to the other set was since deleted from its genome.
Gene trees are useful to visualize the homology relations of a large number of genes, but are time-consuming to reconstruct and can be prone to artifacts in the tree reconstruction process. Phylogenies are not definite results of an infallible method, but hypotheses. For example, if the position of the Hsal gene in the gene tree is inaccurate, we would misinterpret the timing of the gene duplication event. Finally, to make sense of gene trees, the underlying species tree has to be known, which is not always the case.
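One simple way to read duplication events off a gene tree, sketched below, is a species-overlap heuristic: an internal node is called a duplication if the same species appears on both sides of it, and a speciation otherwise. This heuristic is a swapped-in simplification for illustration, not the method used to build the RpS7 figure, and unlike a full reconciliation it does not consult the species tree. The nested-tuple tree, the gene identifiers (Hsal_x, Pbar_a and so on) and the function names are all hypothetical; only the four-letter species prefix convention comes from the text.

```python
def species_of(gene_id):
    """Species code = four-letter prefix of the gene identifier."""
    return gene_id[:4]


def species_set(tree):
    """Return the set of species represented below a node."""
    if isinstance(tree, str):
        return {species_of(tree)}
    left, right = tree
    return species_set(left) | species_set(right)


def label_events(tree):
    """Turn (left, right) nodes into (event, left, right) nodes."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    overlap = species_set(left) & species_set(right)
    event = "duplication" if overlap else "speciation"
    return (event, label_events(left), label_events(right))


# Toy gene tree: one Hsal gene groups with the "a" copies of two ant species,
# while the "b" copies form a separate subtree.
toy_tree = (("Hsal_x", ("Pbar_a", "Sinv_a")), ("Pbar_b", "Sinv_b"))

print(label_events(toy_tree))
```

In the toy tree, the root is labelled a duplication because Pbar and Sinv occur on both sides, which places the duplication before the split from Hsal. If the Hsal gene were misplaced in the tree, the inferred timing would change accordingly, which is the caveat raised above.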
FAQs
What is the difference between homologs, orthologs and paralogs?
Here, orthologs are defined as homologs in different species that catalyze the same reaction, and paralogs are defined as homologs in the same species that do not catalyze the same reaction.
What is the difference between orthologs and paralogs in bioinformatics?
Briefly, orthologs are genes in different organisms which are direct evolutionary counterparts of each other. Orthologs were inherited through speciation, as opposed to paralogs, which are genes in the same organism that evolved by gene duplication [6, 3, 2].
What is the relationship between homolog, ortholog and paralog?
Homolog is the umbrella term for genes that share an origin. Orthologs are two genes in two different species that share a common ancestor, while paralogs are two genes in the same genome that are a product of a gene duplication event of the original gene. In all cases, the genes can be dissimilar in sequence.
What is the difference between homologue and orthologue?
Homology is a relation between a pair of genes that share a common ancestor. All pairs of genes in the below figure are homologous to each other. Orthology is a relation defined over a pair of homologous genes, where the two genes have emerged through a speciation event [4].
What are homologous, orthologous and paralogous genes?
"By definition, orthologs are genes that are related by vertical descent from a common ancestor and encode proteins with the same function in different species. By contrast, paralogs are homologous genes that have evolved by duplication and code for protein with similar, but not identical functions."
What is the definition of homologues in bioinformatics?
A homolog (or homologue) is a gene related to a second gene by descent from a common ancestral DNA sequence. The term may apply to the relationship between genes separated by the event of speciation (see ortholog) or to the relationship between genes separated by the event of genetic duplication (see paralog).
Orthologs are homologous genes in different species that diverged from a single ancestral gene after a speciation event, and paralogs are homologous genes that originate from the intragenomic duplication of an ancestral gene.
What are examples of orthologs and paralogs?
Orthologs are genes related by common descent, i.e., "true" homologs. The copies are generated by speciation, not by gene duplication. An example would be the beta-hemoglobin genes of human and chimpanzee. Paralogs are genes related by gene duplication.
How do you identify orthologs and paralogs?
How can we know whether two genes are orthologs or paralogs? Orthologs are found in different organisms, whereas paralogs typically occur within the same organism. You will have to run a phylogenetic analysis and also check the sequence identity (>= 70%).
What is a paralog in bioinformatics?
Definition. One of a set of homologous genes that have diverged from each other as a consequence of genetic duplication. For example, the mouse alpha globin and beta globin genes are paralogs. The relationship between mouse alpha globin and chick beta globin is also considered paralogous.
What is an example of a homolog?
An example of homologous characters is the four limbs of tetrapods. Birds, bats, mice, and crocodiles all have four limbs. Sharks and bony fish do not. The ancestor of tetrapods evolved four limbs, and its descendants have inherited that feature, so the presence of four limbs is a homology.
Why are paralogs considered homologous?
Paralogous genes are another type of homologous genes that can occur in the same or different species. They arise via small-scale duplication events. These genes can have slightly or vastly different functions when they occur in different species altogether.
Why do we need the terms orthologue and paralogue?
Orthologous genes diverged after a speciation event, while paralogous genes diverge from one another within a species. Put another way, the terms orthologous and paralogous describe the relationships between genetic sequence divergence and gene products associated with speciation or genetic duplication.
What are the 2 types of homologous genes?
Orthologs and paralogs are two fundamentally different types of homologous genes that evolved, respectively, by vertical descent from a single ancestral gene and by duplication.
What are the two types of homologous sequences?
Homology among nucleic acids is of two major types: orthologous and paralogous. Homologous sequences are said to be orthologous if they were separated by a speciation event. Orthologous genes are found in different species but are similar to each other because they originate from the same common ancestor.
How do you know if two sequences are homologous?
Two sequences are said to be homologous if they are both derived from a common ancestral sequence. Homologous genes may have appeared by speciation (A and B) or by gene duplication within one species (A and A'). Duplicated genes usually diverge in terms of function.
What are the three types of homology?
In general terms, homology refers to a similarity in genetics or structure between two species that implies a common ancestor. There are three main categories of homologies: structural, developmental, and molecular.
What is a paralogue?
paralogue (plural paralogues) (genetics) Either of a pair of genes that derive from the same ancestral gene and now reside at different locations within the same genome.
Are all homologs paralogs?
Multiple homologs in the same genome will always be paralogous, but this does not mean that paralogs will always be restricted to the same genome as evolution progresses.
How do you tell if a gene is an ortholog or not?
The basic procedure entails collecting all the genes in two species and comparing them all to one another. If genes from two species identify each other as their closest partners, then they are considered orthologs.
How are paralogs and orthologs produced?
Orthologs are genes related via speciation (vertical descent), whereas paralogs are genes related via duplication (23). The combination of speciation and duplication events, along with HGT, gene loss, and gene rearrangements, entangles orthologs and paralogs into complex webs of relationships.
How do you identify a paralog?
Another possible fate of a paralog is to become a pseudogene (a nonfunctional gene). Paralogs are normally identified by sequence similarity searches (e.g. BLAST) of a query protein against the rest of the same genome.
What are 4 examples of homologous structures?
A dolphin's flipper, a bird's wing, a cat's leg, and a human arm are considered homologous structures.
What is the difference between homolog and homeolog?
Homologous and homeologous chromosomes are two types of chromosomes based on the homology. Homologous chromosomes show complete homology between chromosomes while homeologous chromosomes show partial homology between two chromosomes. This is the key difference between homologous and homeologous chromosomes.
What is the difference between orthologs, paralogs and xenologs?
Homologous genes share a common evolutionary ancestor and can be orthologs (derived from speciation events), paralogs (derived from gene duplication events) or xenologs (derived from horizontal transfer or lineage fusion).
What are paralogs in bioinformatics?
Paralogs are homologous gene sequences found within the same species that often exhibit different functions. Paralogs are usually the product of gene duplication, which can be caused by any number of mechanisms such as transposons or unequal crossovers.