Correct annotation of DGEList object from edgeR package regarding an RNA-Seq dataset
svlachavas ▴ 840

Dear Community,

I would like to annotate a DGEList object, created with the DGEList function from the edgeR package, with unique gene symbols for its Ensembl identifiers. My approach is the following:

y <- DGEList(counts=assay(coad_clear), group=colData(coad_clear)$definition)

 head(y$counts[1:3,1:3])
                TCGA-3L-AA1B-01A-11R-A37K-07 TCGA-DM-A1D8-01A-11R-A155-07
ENSG00000000003                         7280                        10395
ENSG00000000005                           23                            1
ENSG00000000419                         2065                         4158
                TCGA-AU-6004-01A-11R-1723-07
ENSG00000000003                         2547
ENSG00000000005                           27
ENSG00000000419                         1465

 head(y$samples)
                                           group lib.size norm.factors
TCGA-3L-AA1B-01A-11R-A37K-07 Primary solid Tumor 42553617            1
TCGA-DM-A1D8-01A-11R-A155-07 Primary solid Tumor 60377942            1
TCGA-AU-6004-01A-11R-1723-07 Primary solid Tumor 47402733            1
TCGA-T9-A92H-01A-11R-A37K-07 Primary solid Tumor 46429596            1
TCGA-AA-3663-11A-01R-1723-07 Solid Tissue Normal 35484802            1
TCGA-AA-A01T-01A-21R-A16W-07 Primary solid Tumor 15405325            1

# The approach I followed:

dim(y)
[1] 56963   497

gene.ids <- select(org.Hs.eg.db, keys=rownames(y), keytype="ENSEMBL", columns="SYMBOL")
'select()' returned 1:many mapping between keys and columns

 dim(gene.ids)
[1] 57310     2

head(gene.ids)
          ENSEMBL   SYMBOL
1 ENSG00000000003   TSPAN6
2 ENSG00000000005     TNMD
3 ENSG00000000419     DPM1
4 ENSG00000000457    SCYL3
5 ENSG00000000460 C1orf112
6 ENSG00000000938      FGR

sum(duplicated(gene.ids$ENSEMBL))
[1] 347

gene.ids <- gene.ids[!duplicated(gene.ids$ENSEMBL),] 

identical(gene.ids$ENSEMBL, rownames(y))
[1] TRUE

y$genes <- gene.ids

head(y$genes)
          ENSEMBL   SYMBOL
1 ENSG00000000003   TSPAN6
2 ENSG00000000005     TNMD
3 ENSG00000000419     DPM1
4 ENSG00000000457    SCYL3
5 ENSG00000000460 C1orf112
6 ENSG00000000938      FGR

y2 <- y[!duplicated(y$genes$SYMBOL),]

dim(y2)

[1] 25214   497
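One behaviour worth checking in the last filtering step: base R's `duplicated()` treats `NA` like any other value, so `!duplicated(SYMBOL)` keeps the first Ensembl ID that has no symbol and drops the rest, and for a repeated symbol it keeps whichever row happens to come first. A minimal sketch on made-up data (the IDs, symbols and counts below are hypothetical, not taken from the TCGA object) compares this with an alternative that drops unmapped rows and keeps the highest-count row per symbol:

```r
# Toy annotation table: two unmapped IDs (NA) and one repeated symbol (GENE2).
# All identifiers and counts here are hypothetical illustration data.
genes <- data.frame(
  ENSEMBL = paste0("ENSG", 1:6),
  SYMBOL  = c("GENE1", NA, "GENE2", NA, "GENE2", "GENE3"),
  stringsAsFactors = FALSE
)
counts <- matrix(c(10, 0, 5, 7, 50, 3), ncol = 1,
                 dimnames = list(genes$ENSEMBL, "sample1"))

# Filter as in the post: duplicated() flags the second NA as a duplicate,
# so exactly one NA-symbol row survives, and GENE2 keeps its first row.
keep_naive <- !duplicated(genes$SYMBOL)

# Alternative: sort rows by total count (largest first), then drop rows
# without a symbol and all but the first (highest-count) row per symbol.
o        <- order(rowSums(counts), decreasing = TRUE)
genes_o  <- genes[o, ]
counts_o <- counts[o, , drop = FALSE]
keep     <- !is.na(genes_o$SYMBOL) & !duplicated(genes_o$SYMBOL)

genes_kept  <- genes_o[keep, ]
counts_kept <- counts_o[keep, , drop = FALSE]
```

Whether unmapped Ensembl IDs should be dropped at all depends on the analysis; for differential expression they can also simply be kept under their Ensembl ID.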

I wanted to ask whether there is a more straightforward or more accurate approach to perform the above annotation, or whether my implementation has any pitfalls. I have also checked the alternative function mapIds, but it returns a vector rather than a data frame. My aim is to perform downstream DE gene analysis.

Thank you in advance,

Efstathios

Aaron Lun ★ 28k

Just use mapIds and save the output in a data.frame:

gene.ids <- mapIds(org.Hs.eg.db, keys=rownames(y),
                   keytype="ENSEMBL", column="SYMBOL")
y$genes <- data.frame(ENSEMBL=rownames(y), SYMBOL=gene.ids)

Done.
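A self-contained sketch of the same idea, assuming edgeR and org.Hs.eg.db are installed. The count matrix is a toy built from the first three rows shown in the question, and the sample names are placeholders. `mapIds` returns one value per key (with `multiVals="first"`, its default, it takes the first match), so the result already lines up with `rownames(y)` and no de-duplication of Ensembl IDs is needed:

```r
library(edgeR)
library(org.Hs.eg.db)  # attaches AnnotationDbi, which provides mapIds()

# Toy count matrix: three Ensembl IDs from the question, placeholder sample names.
counts <- matrix(
  c(7280L, 23L, 2065L, 10395L, 1L, 4158L), nrow = 3,
  dimnames = list(
    c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
    c("sampleA", "sampleB")
  )
)
y <- DGEList(counts = counts)

# One symbol per Ensembl ID; IDs without a symbol come back as NA.
symbols <- mapIds(org.Hs.eg.db, keys = rownames(y),
                  keytype = "ENSEMBL", column = "SYMBOL",
                  multiVals = "first")

# Sanity check: the returned vector is named by the keys, in the same order.
stopifnot(identical(names(symbols), rownames(y)))

y$genes <- data.frame(ENSEMBL = rownames(y),
                      SYMBOL  = unname(symbols),
                      row.names = rownames(y))
```

The annotation then travels with the object, so gene symbols appear alongside the statistics in downstream tables such as those from `topTags`.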


Thanks Aaron for the update. The simpler the better.


What is "coad_clear" in this case? What data have you assigned to it? Can you please share the whole code? Or can anyone else answer this question?
