I have a database like this with 6876 obs.
                     baseMean log2FoldChange      lfcSE       pvalue         padj
ENST00000229416.10 850.753253      0.2068899 0.05536265 6.804183e-05 3.349652e-03
ENST00000509541.5  288.905762      0.1790877 0.06232428 1.752013e-03 3.600306e-02
ENST00000487168.1   15.855337      0.2610650 0.07884873 2.235875e-04 8.202311e-03
ENST00000381177.6   57.890737     -0.4073535 0.05360163 2.962595e-15 1.696065e-12
ENST00000381180.8    2.851529     -0.3272547 0.11009314 4.038313e-04 1.277395e-02
ENST00000381184.6   35.539393     -0.5825259 0.08344768 1.771709e-13 9.039495e-11
I would like to get the following information for each transcript: Ensembl gene ID (ENSG), gene name and gene symbol.
What should I do?
Adding to Guido's answer:
Point 2 is crucial: always make sure you know exactly which annotation version was used in your analysis.
For point 3, assuming you've Ensembl 101:
With the EnsDb database you could either use the AnnotationDbi solution from Guido's comment, or you could use the functions provided by ensembldb: 1) get a data.frame with all transcript annotations including some gene annotations, 2) match your IDs to the IDs in the table.
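The two ensembldb steps could look roughly like the sketch below. It assumes `edb` is an already loaded EnsDb object (for example from AnnotationHub) and that the transcript IDs are the row names of the results table. The helper names `fetch_tx_annotation()` and `annotate_results()` are made up for this illustration, and the small `tx_annot` table uses placeholder gene values so that the matching step runs without a database.

```r
# Step 1 (needs the ensembldb package and a loaded EnsDb object `edb`):
# get all transcripts plus some gene-level columns as a data.frame.
# This function is only defined here, not called.
fetch_tx_annotation <- function(edb) {
  ensembldb::transcripts(
    edb,
    columns = c("tx_id", "gene_id", "gene_name"),
    return.type = "data.frame"
  )
}

# Step 2 (base R): match the row names of the results to the annotation.
# The version suffix (".10", ".5", ...) is stripped before matching on tx_id.
annotate_results <- function(res, tx_annot) {
  ids <- sub("\\.[0-9]+$", "", rownames(res))
  idx <- match(ids, tx_annot$tx_id)
  res$gene_id <- tx_annot$gene_id[idx]
  res$gene_name <- tx_annot$gene_name[idx]
  res
}

# Toy stand-ins: the gene values are placeholders, not real annotations.
res <- data.frame(
  log2FoldChange = c(0.2068899, -0.4073535),
  row.names = c("ENST00000229416.10", "ENST00000381177.6")
)
tx_annot <- data.frame(
  tx_id = c("ENST00000381177", "ENST00000229416"),
  gene_id = c("GENE_ID_B", "GENE_ID_A"),
  gene_name = c("SYMBOL_B", "SYMBOL_A"),
  stringsAsFactors = FALSE
)
annotated <- annotate_results(res, tx_annot)
```

With a real database, `tx_annot <- fetch_tx_annotation(edb)` would replace the toy table. Stripping the version makes the match independent of the Ensembl release, at the cost of not noticing when the release differs.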
There is now also the column "tx_id_version" available in the EnsDb databases that you could use to match your versioned Ensembl IDs against, e.g.:
NA in here (i.e. the IDs that could not be mapped to tx IDs in the EnsDb) tells us that your IDs come from a different Ensembl release than Ensembl 101 - otherwise all versioned transcript identifiers would match.
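A minimal sketch of matching on the versioned IDs and using the NAs as a release check. With ensembldb, the annotation table would come from something like `transcripts(edb, columns = c("tx_id_version", "gene_id", "gene_name"), return.type = "data.frame")`, and whether "tx_id_version" is present in a given EnsDb can be checked with `listColumns(edb)`. Here a small hand-made table stands in for it: the gene values are placeholders and the ".99" version is invented purely to produce a mismatch.

```r
# Versioned transcript IDs as they appear in the results table.
my_ids <- c("ENST00000229416.10", "ENST00000509541.5", "ENST00000487168.1")

# Stand-in annotation table (placeholder gene values, invented ".99" version).
tx_annot <- data.frame(
  tx_id_version = c("ENST00000229416.10", "ENST00000509541.99"),
  gene_name = c("SYMBOL_A", "SYMBOL_B"),
  stringsAsFactors = FALSE
)

# Match on the full versioned ID; unmatched IDs give NA.
idx <- match(my_ids, tx_annot$tx_id_version)
mapped <- data.frame(
  tx_id_version = my_ids,
  gene_name = tx_annot$gene_name[idx],
  stringsAsFactors = FALSE
)

# IDs without a match hint at a different Ensembl release
# (or at transcripts missing from the annotation).
n_unmapped <- sum(is.na(idx))
unmapped_ids <- my_ids[is.na(idx)]
```

If `n_unmapped` is large relative to the number of IDs, it is worth double-checking which Ensembl release the IDs were generated with before trusting any of the mapped annotations.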
Thanks Johannes for this very useful addition, especially regarding the column
Thank you all. I extracted these transcripts from a DESeq2 dataset. I used the GTEx release V8, so I suppose I should use Ensembl 26, is that correct? This is based on https://gtexportal.org/home/releaseInfoPage
No, a quick search on Google found this info / track (at the UCSC Genome browser):
In other words, GENCODE V26 corresponds to Ensembl version 88, and not 26.
First of all, many many thanks Guido and Johannes.
Regarding my script, I don't understand how to compare my data frame against the annotation.
This is my script
But these are the errors:
Why doesn't it find "EnsDb.Hsapiens.v88"? From the console I can see that it was loaded.
Thanks again for the help
With edb <- ah[["AH53715"]] you assign the EnsDb database to the variable edb, so you would have to use edb instead of EnsDb.Hsapiens.v88 in your code above - or alternatively use EnsDb.Hsapiens.v88 <- ah[["AH53715"]].
I was blind, I'm sorry... Thanks so much!
Thanks Johannes, and the way to find the chromosome is deprecated, is it right?
What do you mean with "find the chromosome"? To get the chromosome name you can query the "seq_name" column (i.e. include that in the list of requested column names with parameter columns) - or you get the annotations as a GRanges and you can call
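As a hedged sketch of the "seq_name" option: the helper below only wraps the ensembldb call and needs a loaded EnsDb object `edb` to actually run. The helper name `tx_with_chromosome()` and the variable `annotation_columns` are made up for this illustration, and the choice of the other columns is an assumption based on what was asked for in this thread.

```r
# Columns to request; "seq_name" holds the chromosome (sequence) name.
annotation_columns <- c("tx_id_version", "gene_id", "gene_name", "seq_name")

# Needs the ensembldb package and a loaded EnsDb object `edb`.
# This function is only defined here, not called.
tx_with_chromosome <- function(edb, columns = annotation_columns) {
  ensembldb::transcripts(
    edb,
    columns = columns,
    return.type = "data.frame"
  )
}

# Usage, once an EnsDb object is available:
# tx_annot <- tx_with_chromosome(edb)
# table(tx_annot$seq_name)
```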
Ok thanks, so I'll use ensembldb not annotationdbi
Ciao suxxa, please use 'Add Reply' when responding (not '