Search box

RCPedia usage revolves around its querying system, so getting the right answers depends on asking the right questions. We provide four ways of querying for retrocopy events:

- Coordinates (chr18 or chr18 23747811 23751321 or chr18:23747811-23751321);
- Parental gene (DHFR, RPL21, GAPDH);
- Target gene (TF, ERBB2);
- Less specific keywords, such as "kinase" or "transcription factor".
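
As an illustration of the three coordinate forms listed above, the following minimal sketch normalises them into a single (chromosome, start, end) tuple. The function name `parse_coordinates` and the regular expression are assumptions made for this example; they are not taken from RCPedia's own code, which may parse queries differently.

```python
import re

# Hypothetical helper, not part of RCPedia: accepts "chr18",
# "chr18 23747811 23751321" and "chr18:23747811-23751321".
_COORD = re.compile(r"^(chr\w+)(?:[:\s]+(\d+)[-\s]+(\d+))?$")


def parse_coordinates(query):
    """Return (chromosome, start, end); start and end are None for a whole chromosome."""
    match = _COORD.match(query.strip())
    if match is None:
        raise ValueError(f"not a coordinate query: {query!r}")
    chrom, start, end = match.groups()
    if start is None:
        return chrom, None, None
    start, end = int(start), int(end)
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, start, end


print(parse_coordinates("chr18:23747811-23751321"))
```

A query that does not match any of the three forms (a gene name, for instance) raises `ValueError`, so a caller could fall back to a gene or keyword search.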

Human is the most thoroughly curated and annotated genome, but the queried organism can also be changed. We make available queries for retrocopies in six primate genomes: Human, Chimpanzee, Gorilla, Orangutan, Rhesus and Marmoset.

We also implemented boolean searches. Users can build searches with the following operators:

  • +

    A leading plus sign indicates that this word must be present in each row that is returned.

  • -

    A leading minus sign indicates that this word must not be present in any of the rows that are returned.

    Note: The - operator acts only to exclude rows that are otherwise matched by other search terms. Thus, a boolean-mode search that contains only terms preceded by - returns an empty result. It does not return all rows except those containing any of the excluded terms.

  • ~

    A leading tilde indicates that rows containing this word are ranked lower than rows that do not contain it; unlike -, it does not exclude them.

  • (no operator)

    By default (when neither + nor - is specified) the word is optional.

  • *

    The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it should be appended to the word to be affected. Words match if they begin with the word preceding the * operator.

  • "

    A phrase that is enclosed within double quote (") characters matches only rows that contain the phrase literally, as it was typed.

The following examples demonstrate some search strings that use boolean full-text operators:

  • 'tumor suppressor'

    Find rows that contain at least one of the two words.

  • '+tumor +suppressor'

    Find rows that contain both words.

  • '+tumor liver'

    Find rows that contain the word tumor, but rank rows higher if they also contain liver.

  • '+tumor -liver'

    Find rows that contain the word tumor but not liver.

  • '+tumor ~liver'

    Find rows that contain the word tumor, but if the row also contains the word liver, rank it lower than if the row does not. This is softer than a search for '+tumor -liver', for which the presence of liver causes the row not to be returned at all.

  • 'tumor*'

    Find rows that contain words such as tumor, tumors, tumorigenic, or tumoral.

  • '"some words"'

    Find rows that contain the exact phrase some words (for example, rows that contain some words of wisdom but not some noise words). Note that the " characters that enclose the phrase are operator characters that delimit the phrase. They are not the quotation marks that enclose the search string itself.
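
The operator rules above can be sketched as a small matcher. This is a toy model of the semantics described on this page, not RCPedia's search engine: the names `TOKEN`, `compile_term` and `row_matches` are invented for the example, matching is done with regular expressions on a single string, and the ~ operator is parsed but ignored because the sketch only decides whether a row matches, not how it is ranked.

```python
import re

# One token = optional operator, then a quoted phrase or a bare word.
TOKEN = re.compile(r'([+\-~]?)(?:"([^"]*)"|([^\s"]+))')


def compile_term(phrase, word):
    """Turn one search term into a case-insensitive regular expression."""
    if phrase:
        return re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    if word.endswith("*"):
        # Truncation operator: match any word that starts with the prefix.
        return re.compile(r"\b" + re.escape(word[:-1]), re.IGNORECASE)
    return re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)


def row_matches(query, row_text):
    """Return True if row_text satisfies the boolean query."""
    required, excluded, optional = [], [], []
    for op, phrase, word in TOKEN.findall(query):
        if not (phrase or word):
            continue  # empty quoted phrase
        term = compile_term(phrase, word)
        if op == "+":
            required.append(term)
        elif op == "-":
            excluded.append(term)
        elif op == "":
            optional.append(term)
        # "~" only affects ranking, which this sketch does not model.
    if any(t.search(row_text) for t in excluded):
        return False
    if required:
        return all(t.search(row_text) for t in required)
    # No required terms: at least one optional term must be present,
    # so a query made only of "-" terms matches nothing.
    return any(t.search(row_text) for t in optional)


print(row_matches("+tumor -liver", "tumor protein p53"))
```

Note how the last line of `row_matches` reproduces the behaviour described in the note on the - operator: a query containing only excluded terms returns nothing.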

Interpreting the results

Search results

Entering your gene of interest in the search box and hitting submit takes you to a list of all retrocopies matching the search term, for example the results of searching for "DHFR" in Human.

From the results list you have a few options:
  • Go to the parental perspective by clicking on the gene name (DHFR in this case), which gathers all information about the retrocopies of a parental gene;

  • Go to the retrocopy perspective by clicking on "Details", which shows all information about a single genomic locus;

  • See the genomic context of the retrocopy by clicking on the link "UCSC GB", which stands for UCSC Genome Browser.

Parental perspective

    Parental perspective is one of the many ways to investigate retrocopies. In this view you can see a compilation of all data gathered about the retrocopies of a specific protein-coding gene.

    The Summary block compiles information such as Full Name, Genomic coordinate, Strand and a Summary of the gene's function.

    A graphical representation of the movements of the duplications is also available. The outermost blocks represent the organism's chromosomes, and the lines in the center represent the movements of the sequence. For DHFR, for example, there are six duplications, shown as links leaving the coordinate on chromosome 5 and arriving at the insertion point where each retrocopy is now located.

    As in the search results, we also provide a compilation of all retrocopies from a given parental gene.

    The "NCBI Reference Sequence" shows all transcripts used in our analysis.

    Finally, a landscape of the retrocopy sequences is provided by the multiple alignment of all retrocopies against the transcript of the parental gene. This overview enables the user to distinguish, for example, old from recent retrocopies by their similarity to the parental sequence.

Retrocopy perspective

    Retrocopy perspective is the main source of information for a specific genomic locus annotated as a retrocopy. Here we present all available information for a recently described retrocopy with putative function: DHFRL1.

    The Summary block is a compilation of the information for a genomic locus annotated as a retrocopy by RCPedia. Here we make available the genomic coordinate, the identity of the retrocopy sequence compared to the parental sequence, the percentage of the parental sequence that was duplicated, putative direct repeats flanking the insertion, the genomic context and, finally, the putative parental transcript that was used as template during the reverse transcription process.

    Genomic context is important for retrocopies, since a retrocopy may take advantage of neighboring promoters to be expressed. Here we make available a browsable window showing the genes near the retrocopy locus.

    The Parental block compiles information about the gene that was duplicated by the retrotransposition event.

    The Interspecies conservation block shows whether the event has an orthologous sequence in other organisms. All orthologous events are clickable and forward the user to the retrocopy event in the respective organism.

    Based on publicly available RNA-seq data, we were able to detect the expression of retrocopies in Human, Chimpanzee, Gorilla, Orangutan and Rhesus in six tissues: Brain, Cerebellum, Heart, Kidney, Liver and Testis. This block shows the expression level of the genomic locus annotated as a retrocopy, represented by the number of reads supporting expression of the genomic region.

    Alignment Retrocopy x Parental gene shows the similarity of the sequences of the retrocopy and the putative transcript used as template during the reverse transcription.
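
To make the identity and overlap figures more concrete, the sketch below computes both from a gapped pairwise alignment. It rests on assumptions that this page does not confirm: identity is taken as the percentage of matching bases among columns where neither sequence has a gap, and overlap as the percentage of the parental transcript covered by those columns. RCPedia's exact definitions may differ, and the function name and the short sequences are invented for the example.

```python
def alignment_stats(retro_aln, parental_aln, parental_length):
    """Return (identity %, overlap %) for two gapped, equal-length alignment rows.

    Assumed definitions (not confirmed by RCPedia):
      identity = matching columns / columns without a gap in either row
      overlap  = columns without a gap in either row / parental transcript length
    """
    if len(retro_aln) != len(parental_aln):
        raise ValueError("alignment rows must have the same length")
    aligned = 0
    matches = 0
    for r, p in zip(retro_aln.upper(), parental_aln.upper()):
        if r == "-" or p == "-":
            continue  # skip columns with a gap in either sequence
        aligned += 1
        if r == p:
            matches += 1
    identity = 100.0 * matches / aligned if aligned else 0.0
    overlap = 100.0 * aligned / parental_length
    return identity, overlap


# Invented toy alignment: 8 gap-free columns, 7 of them identical,
# against a parental transcript assumed to be 12 bases long.
identity, overlap = alignment_stats("ACGT-ACGA", "ACGTTACGT", 12)
print(f"identity={identity:.1f}% overlap={overlap:.1f}%")
```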

    Finally, we also make available the sequences cited above for users who want to investigate the retrocopy and parental sequences further.

Advanced Search

    To let users run more specific searches, we also developed an advanced search feature. Here, the user can fine-tune searches over the available retrocopy features. Currently we make available the following filters:

    Filter                 Operator           Value                                 Example
    Parental gene          equal / not equal  Gene name                             RPL21
    Host gene              equal / not equal  Gene name                             DTL
    Identity               >= / <=            Identity percentage                   80
    Parental seq. overlap  >= / <=            Parental sequence overlap percentage  90
    Species                equal / not equal  Species name                          Human
    Full Name / Summary    having             Boolean keywords                      +ribosomal +60S

    With the parameters shown here, the search would return only one retrocopy of the RPL21 parental gene, RC137; however, users can run broader searches to retrieve longer lists of retrocopies with similar features. For example, one could search for Human retrocopies with Identity higher than 98%, or for nearly complete reverse transcription events by requiring Overlap higher than 95%.
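
The way these filters combine can be sketched as a list of (field, operator, value) conditions that must all hold for a record to be returned. Everything in the example is hypothetical: the field names, the `advanced_search` function and the three toy records are invented for illustration and do not reflect RCPedia's schema or data. The boolean keyword filter on Full Name / Summary is left out for brevity.

```python
import operator

# Operators offered by the filters above, mapped to Python comparisons.
OPS = {
    "equal": operator.eq,
    "notequal": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
}


def advanced_search(records, filters):
    """Keep the records that satisfy every (field, op, value) condition."""
    return [
        rec
        for rec in records
        if all(OPS[op](rec[field], value) for field, op, value in filters)
    ]


# Invented toy records; the values are not real RCPedia entries.
records = [
    {"id": "toy-1", "parental_gene": "RPL21", "host_gene": "DTL",
     "identity": 93.0, "overlap": 97.0, "species": "Human"},
    {"id": "toy-2", "parental_gene": "RPL21", "host_gene": None,
     "identity": 75.0, "overlap": 60.0, "species": "Human"},
    {"id": "toy-3", "parental_gene": "GAPDH", "host_gene": None,
     "identity": 99.0, "overlap": 96.0, "species": "Chimpanzee"},
]

filters = [
    ("parental_gene", "equal", "RPL21"),
    ("host_gene", "equal", "DTL"),
    ("identity", ">=", 80),
    ("overlap", ">=", 90),
    ("species", "equal", "Human"),
]

hits = advanced_search(records, filters)
print([rec["id"] for rec in hits])
```

Dropping conditions from `filters` broadens the search, for instance keeping only `("overlap", ">=", 95)` to list nearly complete reverse transcription events across all species in the toy data.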