Question: Importing Gene Symbols with makeTxDbFromGFF
Dario Strbenac wrote:

I'd like to import the GENCODE Genes GFF3 file together with its gene symbols. Calling columns on the TxDb object shows that only the gene_id field is imported, which has entries such as ENSG00000000003.14. How can I also import the gene_name column, which has values like TSPAN6?

modified 4 days ago by Valerie Obenchain • written 4 days ago by Dario Strbenac
Valerie Obenchain wrote:

The decision was made to not include a gene_name column in the TxDbs. This is explained on the ?transcripts man page:

    Finally, \code{use.names=TRUE} cannot be used when grouping
    by gene \code{by="gene"}. This is because, unlike for the
    other features, the gene ids are external ids (e.g. Entrez
    Gene or Ensembl ids) so the db doesn't have a \code{"gene_name"}
    column for storing alternate gene names.

You can convert from Entrez or Ensembl ids to gene name with an OrgDb package:

> columns(org.Hs.eg.db)
 [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
 [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"    
[11] "GO"           "GOALL"        "IPI"          "MAP"          "OMIM"        
[16] "ONTOLOGY"     "ONTOLOGYALL"  "PATH"         "PFAM"         "PMID"        
[21] "PROSITE"      "REFSEQ"       "SYMBOL"       "UCSCKG"       "UNIGENE"     
[26] "UNIPROT" 
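
As a rough sketch of that conversion (not taken from the thread): GENCODE gene IDs carry a version suffix such as .14, while the ENSEMBL keys in org.Hs.eg.db are unversioned, so the suffix has to be removed before the lookup. The helper name strip_version is hypothetical, and the lookup step is skipped if org.Hs.eg.db is not installed.

```r
# Hypothetical helper: drop everything from the first dot onwards,
# e.g. "ENSG00000000003.14" becomes "ENSG00000000003".
strip_version <- function(ids) sub("\\..*$", "", ids)

# Example ID taken from the question above.
gene_ids <- c("ENSG00000000003.14")
ensembl_ids <- strip_version(gene_ids)

# Look up gene symbols only when the annotation package is available.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  symbols <- AnnotationDbi::mapIds(
    org.Hs.eg.db::org.Hs.eg.db,
    keys = ensembl_ids,
    column = "SYMBOL",
    keytype = "ENSEMBL",
    multiVals = "first"  # keep one symbol when an ID maps to several
  )
  print(symbols)
}
```

IDs that the OrgDb package does not know about come back as NA, so it is worth checking how many genes are left without a symbol.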

Valerie

written 4 days ago by Valerie Obenchain

It's obviously unfortunate that the user starts with the gene names but then is forced to discard them and get them back. It would be nice if TxDb supported arbitrary meta columns, e.g., through a NoSQL or EAV approach.

written 3 days ago by Michael Lawrence

Another solution is to read the file twice, once with makeTxDbFromGFF and a second time with import.gff3. Matching the IDs is then easy, and it doesn't miss newly discovered genes that GENCODE has already annotated with symbols.
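
A minimal sketch of the two-pass approach, under some assumptions: the file path is hypothetical, match_symbols is a hypothetical helper name, and the GFF3 gene records are assumed to expose gene_id and gene_name metadata columns after import, as the GENCODE file described above does. The Bioconductor part only runs if the file and the packages are present.

```r
# Hypothetical helper: for each gene ID in the TxDb, pick the symbol of the
# GFF3 gene record with the same ID (NA when there is no match).
match_symbols <- function(txdb_gene_ids, gff_gene_ids, gff_gene_names) {
  gff_gene_names[match(txdb_gene_ids, gff_gene_ids)]
}

gff_file <- "gencode.annotation.gff3"  # hypothetical path to the GENCODE file

if (file.exists(gff_file) &&
    requireNamespace("GenomicFeatures", quietly = TRUE) &&
    requireNamespace("rtracklayer", quietly = TRUE)) {
  # First pass: build the TxDb (gene symbols are not stored in it).
  txdb <- GenomicFeatures::makeTxDbFromGFF(gff_file, format = "gff3")

  # Second pass: read the raw records and keep the gene-level rows.
  gff <- rtracklayer::import.gff3(gff_file)
  gene_records <- gff[gff$type == "gene"]

  # Attach the symbols to the gene ranges extracted from the TxDb.
  gene_ranges <- GenomicFeatures::genes(txdb)
  gene_ranges$gene_name <- match_symbols(
    gene_ranges$gene_id, gene_records$gene_id, gene_records$gene_name
  )
}
```

Because both passes read the same file, the versioned IDs match exactly and no suffix stripping is needed.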

written 3 days ago by Dario Strbenac