Hi there,
I have a database like this with 6876 obs.
                      baseMean  log2FoldChange       lfcSE        pvalue          padj
ENST00000229416.10  850.753253       0.2068899  0.05536265  6.804183e-05  3.349652e-03
ENST00000509541.5   288.905762       0.1790877  0.06232428  1.752013e-03  3.600306e-02
ENST00000487168.1    15.855337       0.2610650  0.07884873  2.235875e-04  8.202311e-03
ENST00000381177.6    57.890737      -0.4073535  0.05360163  2.962595e-15  1.696065e-12
ENST00000381180.8     2.851529      -0.3272547  0.11009314  4.038313e-04  1.277395e-02
ENST00000381184.6    35.539393      -0.5825259  0.08344768  1.771709e-13  9.039495e-11
I would like to get the following information for each transcript: Ensembl gene ID (ENSG), gene name and gene symbol.
What should I do?
Many thanks
Adding to Guido's answer:
Point 2 is crucial: always make sure you know exactly which annotation version was used in your analysis.
For point 3, assuming you have Ensembl 101:
With that EnsDb database you could either use the AnnotationDbi solution from Guido's comment, or you could use the functions provided by ensembldb: 1) get a data.frame with all transcript annotations, including some gene annotations, and 2) match your IDs to the IDs in that table.
There is now also the column "tx_id_version" available in the EnsDb databases that you could use to match your versioned Ensembl IDs against, e.g.:
Actually, the NA in here (i.e. the IDs that could not be mapped to tx ids in the EnsDb) tells us that your IDs come from another Ensembl release than Ensembl 101; otherwise all versioned transcript identifiers would match.
Thanks Johannes for this very useful addition, especially regarding the column "tx_id_version"!
Thank you all. I extracted these transcripts from a DESeq2 dataset. I used GTEx release V8, so I suppose I should use Ensembl 26, is that correct? I am going by https://gtexportal.org/home/releaseInfoPage
No, a quick search on Google found this info / track (at the UCSC Genome browser):
In other words, GENCODE V26 corresponds to Ensembl version 88, and not 26.
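To check whether a candidate release really fits your IDs, the matching step described earlier in the thread can be sketched in base R. This is only an illustration with toy data: `res_ids`, `tx_annot` and the `ENSG_TOY_*` / `GENE_*` values are made-up placeholders, not real annotations. In practice the annotation table would come from the EnsDb (for example via ensembldb's `transcripts()` with `return.type = "data.frame"`), and the column names below are assumed to follow that table.

```r
# Hypothetical stand-in for the row names of the DESeq2 results table
res_ids <- c("ENST00000229416.10", "ENST00000509541.5", "ENST00000381177.6")

# Hypothetical stand-in for the transcript annotation table of one release;
# the second transcript deliberately carries a different version (.4, not .5)
tx_annot <- data.frame(
  tx_id_version = c("ENST00000229416.10", "ENST00000509541.4", "ENST00000381177.6"),
  tx_id         = c("ENST00000229416", "ENST00000509541", "ENST00000381177"),
  gene_id       = c("ENSG_TOY_A", "ENSG_TOY_B", "ENSG_TOY_C"),
  gene_name     = c("GENE_A", "GENE_B", "GENE_C"),
  stringsAsFactors = FALSE
)

# Exact match on the versioned ID; NA marks IDs not present in this release
idx <- match(res_ids, tx_annot$tx_id_version)
annotated <- data.frame(
  tx        = res_ids,
  gene_id   = tx_annot$gene_id[idx],
  gene_name = tx_annot$gene_name[idx],
  stringsAsFactors = FALSE
)

# Any unmatched ID hints that the release does not fit the data
n_unmatched <- sum(is.na(idx))

# Fallback: drop the ".<version>" suffix and match on the unversioned ID
res_ids_unversioned <- sub("\\.[0-9]+$", "", res_ids)
idx_unversioned <- match(res_ids_unversioned, tx_annot$tx_id)
```

If `n_unmatched` is zero for a given release, that release is at least consistent with your versioned IDs. Matching on unversioned IDs will usually succeed across releases, but it can hide the fact that the annotation differs from the one used in the analysis.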
First of all, many many thanks Guido and Johannes.
About my script: I don't understand how to match my data frame against the annotation.
This is my script:
But these are the errors:
Why doesn't it find "EnsDb.Hsapiens.v88"? From the console I can see that it was loaded.
Thanks again for the help
With edb <- ah[["AH53715"]] you assign the EnsDb database to the variable edb, so you would have to use edb instead of EnsDb.Hsapiens.v88 in your code above, or alternatively use EnsDb.Hsapiens.v88 <- ah[["AH53715"]].
I was blind, I'm sorry... Thanks so much!
Thanks Johannes. And the way to find the chromosome is deprecated, is that right?
What do you mean by "find the chromosome"? To get the chromosome name you can query the "seq_name" column (i.e. include it in the list of requested column names with the parameter columns), or you can get the annotations as a GRanges and call seqnames on that.
OK thanks, so I'll use ensembldb, not AnnotationDbi.
Ciao suxxa, please use 'Add Reply' when responding (not 'Add Answer').