Quick Start Guide
  • Introduction
  • How to search
    A. Quick Search
    B. Gene information search
    C. ncRNA information search
    D. Function classification retrieval
  • How to use the tools
    A. Blast
    B. Gbrowse
    C. Muscle
    D. Genewise
    E. Codeml
    F. Positive selection analysis pipeline
    G. ExtractSeq
    H. ReverseSeq
    I. TranslateCDS
    1. Introduction

    Treeshrew Database is an integrated database for the genome biology of the tree shrew. It provides genomic data, including sequences, genes and functional annotation, as well as biological information such as expression data and the corresponding references. In addition, previously and newly developed bioinformatics tools provided in the database help users exploit the information they need. Treeshrew Database will be useful for research on tree shrew functional and comparative genomics.

    2. How to search
    A. Quick Search

    The quick search tool on the home page lets users retrieve gene information quickly by gene symbol.

    B. Gene information search

    On the search page, we provide a “one-stop” retrieval system for viewing gene information by gene symbol, gene ID or gene full name.

    Step1: Type the gene symbol, gene ID or gene full name

    Step2: Select the gene you want

    Step3: View the gene information

    In the basic information table, users can view basic information about the gene of interest, including gene symbol, RefSeq ID, location, genomic map, and further details from other databases. In the location region, users can obtain the genomic DNA sequence of the queried gene, with or without its flanking sequence. In the genomic map region, an embedded graphic generated with GBrowse shows the gene structure and gene mapping; the graphic links to the genome browser page.

    The sequences table links to the corresponding CDS, UTR, exon and intron sequences, as well as the deduced protein sequence.

    The ortholog table displays all identified orthologous genes with links to the Ensembl database (http://www.ensembl.org). To make it easier to compare orthologous gene sequences across species, we provide a button that quickly aligns the selected sequences.

    The function annotation table displays the functional annotation of the gene. Users can find the gene function classification from the GO annotation, pathway information from the KEGG annotation and domain information from the InterPro annotation.

    The expression information table contains RPKM values and a histogram showing the expression level in seven tissues of the Chinese tree shrew.
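
    For readers unfamiliar with the unit, RPKM (reads per kilobase of transcript per million mapped reads) normalizes a raw read count by gene length and sequencing depth. The sketch below shows the standard formula only; the function name and the numbers are illustrative assumptions and are not taken from the database.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical numbers for illustration only.
example_value = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
print(example_value)
```

    Because the value is already length- and depth-normalized, it can be compared across the tissues shown in the histogram.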

    C. ncRNA information search

    The ncRNA search lets users view information about four types of small RNA in the Chinese tree shrew.

    Step1: Type the ncRNA name

    Step2: Select the ncRNA you want

    Step3: View the ncRNA information

    D. Function classification retrieval

    Users can search genes by GO term, domain information or pathway information, and download gene sequences in batch by function classification.

    Step1: Type the KEGG pathway name

    Step2: Select the pathway you want

    Step3: View the gene information, or select the genes whose CDS and/or protein sequences you want to download in batch

    3. How to use the tools
    A. Blast

    The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences (Altschul et al. 1997). The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as to help identify members of gene families.

    Step1: Select the program and database and paste or upload sequences

    Step2: View the BLAST results and download the sequences you want

    ......

    B. Gbrowse

    GBrowse is a popular genome browser that displays genome annotations by combining a database with interactive web pages (Donlin 2007). In TreeshrewDB, we integrated GBrowse to display Chinese tree shrew gene models and annotations, and connected it to the gene retrieval system, so gene information can be viewed by following the links in GBrowse.

    Step1: Type the gene ID or scaffold region to view the gene models

    Step2: Click the gene ID and jump to gene information page

    C. Muscle

    MUSCLE (Multiple Sequence Comparison by Log-Expectation), an accurate Multiple Sequence Alignment (MSA) tool, especially good with proteins, is claimed to achieve better average accuracy and better speed than ClustalW2 or T-Coffee, depending on the options (Edgar 2004).

    Step1: Paste or upload the sequences in fasta format as follows and select the output format
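
    Several tools in this guide take FASTA input: a header line starting with ">" followed by one or more sequence lines. The sketch below is a minimal reader that can be used to check a file before pasting it; the function name, record IDs and residues are hypothetical, and the web form may accept variants this reader rejects.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record ID (first word after '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without an identifier")
            current_id = fields[0]
            if current_id in records:
                raise ValueError(f"duplicate record ID: {current_id}")
            records[current_id] = ""
        elif current_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[current_id] += line  # join wrapped sequence lines
    return records


# Hypothetical input: the IDs and residues are made up for illustration.
example_text = """>seq1 first protein
MKTAYIAK
QRQISFVK
>seq2 second protein
MKTAYLAKQR
"""
parsed = parse_fasta(example_text.splitlines())
print(parsed)
```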

    D. Genewise

    GeneWise, a pairwise sequence alignment tool, compares a protein sequence to a genomic DNA sequence, allowing for introns and frameshift errors (Birney et al. 2004). GeneWise can provide highly accurate and sensitive predictions of gene structures.

    Step1: Paste or upload the protein and corresponding genomic DNA sequences in fasta format as follows to predict gene structure

    E. Codeml

    Codeml is a part of the PAML package, which is a suite of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood (ML) (Yang 2007).

    Step1: Paste or upload the aligned sequences of the species you want to analyze in PHYLIP format as follows, together with the phylogenetic tree of these species, then select the model to run the computation with codeml.
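
    As an illustration of the input format, the sketch below writes an alignment in sequential PHYLIP form: a header with the number of sequences and the alignment length, then one line per species. It assumes the PAML convention that two consecutive spaces end the species name; the function name and the toy sequences are hypothetical, and the exact layout the web form expects should be checked against its own example.

```python
from typing import Dict


def to_phylip(alignment: Dict[str, str]) -> str:
    """Format aligned sequences as sequential PHYLIP text."""
    if not alignment:
        raise ValueError("alignment is empty")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (length,) = lengths
    lines = [f"{len(alignment)} {length}"]
    for name, seq in alignment.items():
        if "  " in name:
            raise ValueError("species names must not contain two consecutive spaces")
        # Two spaces separate the name from the sequence (assumed PAML convention).
        lines.append(f"{name}  {seq}")
    return "\n".join(lines) + "\n"


# Toy codon alignment, made up for illustration.
phylip_text = to_phylip({"human": "ATGAAA", "mouse": "ATGAAG"})
print(phylip_text)
```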

    F. Positive selection analysis pipeline

    The positive selection analysis pipeline (PSAP) is built from Perl script, MUSCLE and codeml from the PAML package (Yang 2007). It supports positive selection analysis across 10 species: human, chimpanzee, gorilla, rhesus monkey, Chinese tree shrew, mouse, rat, rabbit, dog and cow. When users type the gene and select the species and model, the pipeline extracts the related sequences according to the ortholog relationship list and chooses the phylogenetic tree from the phylogenetic tree list. The extracted CDS sequences are then aligned by MUSCLE 3.7 under the guidance of the aligned protein sequences, and the positive selection analysis is completed by codeml from the PAML package.

    Step1: Type the gene symbol and foreground branch, then select the species and model to run the computation with codeml.
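
    Aligning CDS sequences "with the guidance of aligned protein sequences" is commonly done by threading each unaligned CDS back onto its aligned protein, so that gaps always fall between codons. The sketch below illustrates that general idea only; it is not the pipeline's actual Perl code, and the function name and toy sequences are hypothetical.

```python
def thread_cds_onto_protein(aligned_protein: str, cds: str) -> str:
    """Build a codon alignment: '---' for each protein gap, else the next codon."""
    residue_count = len(aligned_protein.replace("-", ""))
    if len(cds) < 3 * residue_count:
        raise ValueError("CDS is too short for the aligned protein")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")  # a protein gap becomes a three-base gap
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    # Any bases left after the last residue (e.g. a stop codon) are dropped.
    return "".join(codons)


# Toy example: two proteins aligned as "M-KT" and "MGKT".
codon_aln_a = thread_cds_onto_protein("M-KT", "ATGAAAACC")
codon_aln_b = thread_cds_onto_protein("MGKT", "ATGGGAAAAACC")
print(codon_aln_a)
print(codon_aln_b)
```

    Both outputs have the same length, which is what a codon-based analysis requires of its input alignment.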

    G. ExtractSeq

    ExtractSeq extracts a sequence from the Chinese tree shrew genome by scaffold position information.

    Step1: Type the position information as follows to extract the corresponding sequences.
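
    Conceptually, extraction by position is a substring lookup on the named scaffold. The sketch below assumes 1-based, inclusive coordinates, which is a common convention but is not confirmed for this tool; the function name, scaffold name and sequence are hypothetical.

```python
from typing import Dict


def extract_region(genome: Dict[str, str], scaffold: str, start: int, end: int) -> str:
    """Return the bases from start to end on a scaffold (1-based, inclusive; assumed)."""
    if scaffold not in genome:
        raise KeyError(f"unknown scaffold: {scaffold}")
    seq = genome[scaffold]
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates must satisfy 1 <= start <= end <= scaffold length")
    # Convert the 1-based inclusive range to Python's 0-based half-open slice.
    return seq[start - 1:end]


# Toy genome, made up for illustration.
toy_genome = {"scaffold_1": "ACGTACGTAC"}
region = extract_region(toy_genome, "scaffold_1", 3, 6)
print(region)
```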

    H. ReverseSeq

    ReverseSeq can convert a DNA sequence into its reverse, complement, or reverse-complement counterpart. You may want to work with the reverse-complement of a sequence if it contains an ORF on the reverse strand.

    Step1: Paste sequences in fasta format as follows to convert into its reverse, complement, or reverse-complement counterpart.
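
    The three conversions can be sketched as follows. This is a minimal illustration that handles only A, C, G, T and N; the function names and the toy sequence are hypothetical, and the tool itself may treat other symbols differently.

```python
# Translation table for complementing bases; case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse(seq: str) -> str:
    """Read the sequence backwards without changing any base."""
    return seq[::-1]


def complement(seq: str) -> str:
    """Swap each base for its pairing partner, keeping the order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Give the opposite strand, read 5' to 3'."""
    return complement(seq)[::-1]


# Toy sequence whose ORF lies on the reverse strand.
minus_strand_view = "TTACGGCAT"
print(reverse(minus_strand_view))
print(complement(minus_strand_view))
print(reverse_complement(minus_strand_view))
```

    Here the reverse-complement starts with ATG and ends with TAA, which is why this form is the one to use when the ORF is on the reverse strand.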

    I. TranslateCDS

    TranslateCDS translates a coding DNA sequence into a protein sequence.

    Step1: Paste CDS sequence in fasta format as follows to translate the corresponding protein sequence
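
    Translation reads the CDS three bases at a time and looks each codon up in a genetic code table. The sketch below assumes the standard genetic code, stops at the first stop codon and writes "X" for codons it cannot resolve; these choices, the function name and the toy sequences are assumptions and may differ from what the tool does.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate a CDS in frame 1, stopping at the first stop codon."""
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("CDS length must be a multiple of three")
    protein = []
    for i in range(0, len(seq), 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks an unresolved codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy sequences, made up for illustration.
print(translate_cds("ATGCCGTAA"))
print(translate_cds("ATGAAAACCTGA"))
```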

    References:

    Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 25(17): 3389-3402.

    Birney E, Clamp M, Durbin R. 2004. GeneWise and Genomewise. Genome Res, 14(5): 988-995.

    Donlin MJ. 2007. Using the Generic Genome Browser (GBrowse). Curr Protoc Bioinformatics, Chapter 9: Unit 9.9.

    Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res, 32(5): 1792-1797.

    Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol, 24(8): 1586-1591.