Hi there,
I have a data frame like this with 6876 observations:
                     baseMean log2FoldChange      lfcSE       pvalue         padj
ENST00000229416.10 850.753253      0.2068899 0.05536265 6.804183e-05 3.349652e-03
ENST00000509541.5  288.905762      0.1790877 0.06232428 1.752013e-03 3.600306e-02
ENST00000487168.1   15.855337      0.2610650 0.07884873 2.235875e-04 8.202311e-03
ENST00000381177.6   57.890737     -0.4073535 0.05360163 2.962595e-15 1.696065e-12
ENST00000381180.8    2.851529     -0.3272547 0.11009314 4.038313e-04 1.277395e-02
ENST00000381184.6   35.539393     -0.5825259 0.08344768 1.771709e-13 9.039495e-11
I would like to get the following information for each transcript: Ensembl gene ID (ENSG), gene name and gene symbol.
What should I do?
Many thanks

Adding to Guido's answer:
Point 2 is crucial: always make sure you know exactly which annotation version was used in your analysis.
For point 3, assuming you are using Ensembl 101:
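A minimal sketch of loading such an EnsDb through AnnotationHub, assuming the AnnotationHub and ensembldb Bioconductor packages are installed. The helper name `get_human_ensdb` and the query terms are assumptions for illustration, not the code from this answer; always check which record was actually picked.

```r
# Hypothetical helper: look up and load the human EnsDb for one Ensembl
# release from AnnotationHub. Needs the AnnotationHub and ensembldb
# packages; the resource is downloaded and cached on first use.
get_human_ensdb <- function(release) {
  ah <- AnnotationHub::AnnotationHub()
  # query() keeps records whose metadata match all of the patterns,
  # so a bare release number could also hit unrelated metadata fields.
  hits <- AnnotationHub::query(ah, c("EnsDb", "Homo sapiens", as.character(release)))
  if (length(hits) == 0) {
    stop("no human EnsDb found for Ensembl release ", release)
  }
  if (length(hits) > 1) {
    warning("several records match; using ", names(hits)[1],
            ", check hits$title before relying on it")
  }
  hits[[1]]  # downloads (or reads from cache) and returns the EnsDb
}
```

Usage would be along the lines of `edb <- get_human_ensdb(101)`; the returned object is what the rest of this answer refers to as the EnsDb database.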
With that EnsDb database you could either use the AnnotationDbi solution from Guido's comment, or you could use the functions provided by ensembldb:
1) get a data.frame with all transcript annotations, including some gene annotations;
2) match your IDs to the IDs in the table.
There is now also the column "tx_id_version" available in the EnsDb databases that you could use to match your versioned Ensembl IDs against, e.g.:
Actually, the NA in here (i.e. the IDs that could not be mapped to tx ids in the EnsDb) tells us that your IDs come from a different Ensembl release than Ensembl 101; otherwise all versioned transcript identifiers would match.
Thanks Johannes for this very useful addition, especially regarding the column "tx_id_version"!
Thank you all. I extracted these transcripts from a DESeq2 dataset. I used GTEx release V8, so I suppose I should use Ensembl 26, is that correct? As stated here: https://gtexportal.org/home/releaseInfoPage
No. A quick Google search turned up this info/track (at the UCSC Genome Browser):
In other words, GENCODE V26 corresponds to Ensembl version 88, and not 26.
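The thread's reasoning is that versioned IDs which fail to match point to a different Ensembl release. A quick sanity check that Ensembl 88 fits the data is therefore to measure how many versioned IDs match exactly. This is a minimal base-R sketch: `match_rate` and the reference IDs below are hypothetical, and whether a particular (older) EnsDb carries the "tx_id_version" column is an assumption worth checking with `listColumns(edb)` first.

```r
# Hypothetical helper: fraction of versioned query IDs (e.g.
# "ENST00000381177.6") that occur exactly in a reference vector of
# versioned IDs. A rate well below 1 suggests a release mismatch.
match_rate <- function(query_ids, reference_ids) {
  found <- query_ids %in% reference_ids
  list(rate = mean(found), unmatched = query_ids[!found])
}

# Hypothetical stand-in for the reference; in practice, assuming the
# EnsDb has the column, it could come from (with library(ensembldb)):
#   transcripts(edb, columns = "tx_id_version",
#               return.type = "data.frame")$tx_id_version
ref <- c("ENST00000229416.10", "ENST00000381177.6", "ENST00000509541.4")
qry <- c("ENST00000229416.10", "ENST00000509541.5", "ENST00000381177.6")

chk <- match_rate(qry, ref)
chk$rate       # 2 of 3 versioned IDs match
chk$unmatched  # "ENST00000509541.5": same transcript, different version
```

Exact matching on the versioned ID is deliberate here: stripping the version would hide exactly the release differences this check is meant to reveal.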
First of all, many, many thanks to Guido and Johannes.
About my script: I don't understand how to compare my data frame with the annotation.
This is my script:
But these are the errors:
Why doesn't it find "EnsDb.Hsapiens.v88"? From the console I can see that it was loaded.
Thanks again for the help
With edb <- ah[["AH53715"]] you assign the EnsDb database to the variable edb, so you would have to use edb instead of EnsDb.Hsapiens.v88 in your code above, or alternatively use EnsDb.Hsapiens.v88 <- ah[["AH53715"]].
I was blind, I'm sorry... Thanks so much!
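For the original question (gene ID and gene name per transcript), the two ensembldb steps described earlier (get a table of transcript annotations, then match your IDs against it) can be sketched as below. `annotate_tx` is a hypothetical helper and the annotation rows are placeholders, not real mappings. Matching on the unversioned ID tolerates version differences, but the gene assignment still comes from whichever EnsDb release you query, so using the matching release (Ensembl 88 here) still matters. EnsDb also offers a "symbol" column, which to my knowledge mirrors gene_name.

```r
# Hypothetical helper: append gene annotations to a DESeq2-style results
# table whose row names are versioned Ensembl transcript IDs.
# 'annot' needs the columns tx_id, gene_id and gene_name.
annotate_tx <- function(res, annot) {
  # Drop the ".N" version suffix to match on the stable transcript ID.
  tx_id <- sub("\\.[0-9]+$", "", rownames(res))
  idx <- match(tx_id, annot$tx_id)  # NA where no transcript matches
  res$tx_id <- tx_id
  res$gene_id <- annot$gene_id[idx]
  res$gene_name <- annot$gene_name[idx]
  res
}

# With the EnsDb from this thread (assumed, after library(ensembldb)):
#   edb <- ah[["AH53715"]]
#   annot <- transcripts(edb, columns = c("tx_id", "gene_id", "gene_name"),
#                        return.type = "data.frame")
# Offline stand-ins: two rows from the question, one placeholder mapping.
res <- data.frame(
  log2FoldChange = c(0.2068899, -0.4073535),
  padj = c(3.349652e-03, 1.696065e-12),
  row.names = c("ENST00000229416.10", "ENST00000381177.6")
)
annot <- data.frame(
  tx_id = "ENST00000381177",
  gene_id = "ENSG_EXAMPLE",      # placeholder, not a real mapping
  gene_name = "GENE_EXAMPLE",    # placeholder, not a real mapping
  stringsAsFactors = FALSE
)

out <- annotate_tx(res, annot)
out[, c("tx_id", "gene_id", "gene_name")]
```

Rows that come back with NA gene annotations are the ones worth inspecting, since they may point to a release mismatch.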
Thanks Johannes. Also, the way to find the chromosome is deprecated, right?
What do you mean by "find the chromosome"? To get the chromosome name you can query the "seq_name" column (i.e. include it in the list of requested column names with the parameter columns), or you can get the annotations as a GRanges and call seqnames on that.
Ok thanks, so I'll use ensembldb, not AnnotationDbi.
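The "seq_name" route described above can be sketched as a small helper. `tx_chromosome` is hypothetical; it assumes `library(ensembldb)` has been run (which also attaches AnnotationFilter for `TxIdFilter`) and that the IDs are unversioned. With the default GRanges return type, calling `seqnames()` on the result would give the same information.

```r
# Hypothetical helper: chromosome (seq_name) for each transcript ID.
# 'edb' is an EnsDb; 'tx_ids' are unversioned Ensembl transcript IDs.
# Assumes library(ensembldb) is attached.
tx_chromosome <- function(edb, tx_ids) {
  tx <- transcripts(
    edb,
    columns = c("tx_id", "seq_name"),
    filter = TxIdFilter(tx_ids),   # restrict the query to these IDs
    return.type = "data.frame"
  )
  # Return in input order; NA for IDs not present in this EnsDb.
  tx$seq_name[match(tx_ids, tx$tx_id)]
}
```

Filtering inside `transcripts()` keeps the database query small compared with pulling every transcript and subsetting afterwards.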
Ciao suxxa, please use 'Add Reply' when responding (not 'Add Answer').