Question: Correct annotation of DGEList object from edgeR package regarding an RNA-Seq dataset
svlachavas (Greece/Athens/National Hellenic Research Foundation) wrote, 7 months ago:

Dear Community,

I would like to annotate a DGEList object, created with the DGEList function from the edgeR package, with unique gene symbols for its Ensembl identifiers. My approach is the following:

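library(edgeR)                 # DGEList
library(SummarizedExperiment)  # assay, colData
library(org.Hs.eg.db)          # Ensembl-to-symbol annotation

y <- DGEList(counts=assay(coad_clear), group=colData(coad_clear)$definition)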

head(y$counts[1:3,1:3])
                TCGA-3L-AA1B-01A-11R-A37K-07 TCGA-DM-A1D8-01A-11R-A155-07
ENSG00000000003                         7280                        10395
ENSG00000000005                           23                            1
ENSG00000000419                         2065                         4158
                TCGA-AU-6004-01A-11R-1723-07
ENSG00000000003                         2547
ENSG00000000005                           27
ENSG00000000419                         1465

head(y$samples)
                                           group lib.size norm.factors
TCGA-3L-AA1B-01A-11R-A37K-07 Primary solid Tumor 42553617            1
TCGA-DM-A1D8-01A-11R-A155-07 Primary solid Tumor 60377942            1
TCGA-AU-6004-01A-11R-1723-07 Primary solid Tumor 47402733            1
TCGA-T9-A92H-01A-11R-A37K-07 Primary solid Tumor 46429596            1
TCGA-AA-3663-11A-01R-1723-07 Solid Tissue Normal 35484802            1
TCGA-AA-A01T-01A-21R-A16W-07 Primary solid Tumor 15405325            1

# The approach I followed:

dim(y)
[1] 56963   497

gene.ids <- select(org.Hs.eg.db, keys=rownames(y), keytype="ENSEMBL", columns="SYMBOL")
'select()' returned 1:many mapping between keys and columns

dim(gene.ids)
[1] 57310     2

head(gene.ids)
          ENSEMBL   SYMBOL
1 ENSG00000000003   TSPAN6
2 ENSG00000000005     TNMD
3 ENSG00000000419     DPM1
4 ENSG00000000457    SCYL3
5 ENSG00000000460 C1orf112
6 ENSG00000000938      FGR

sum(duplicated(gene.ids$ENSEMBL))
[1] 347

gene.ids <- gene.ids[!duplicated(gene.ids$ENSEMBL),] 

identical(gene.ids$ENSEMBL,rownames(y))
[1] TRUE

y$genes <- gene.ids

head(y$genes)
          ENSEMBL   SYMBOL
1 ENSG00000000003   TSPAN6
2 ENSG00000000005     TNMD
3 ENSG00000000419     DPM1
4 ENSG00000000457    SCYL3
5 ENSG00000000460 C1orf112
6 ENSG00000000938      FGR

y2 <- y[!duplicated(y$genes$SYMBOL),]

dim(y2)
[1] 25214   497

Is there a more straightforward or more accurate way to perform the above annotation, or does my implementation have any pitfalls? I have also looked at the alternative function mapIds, but it returns a vector rather than a data frame. My aim is to perform downstream differential expression (DE) analysis.

Thank you in advance,

Efstathios
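
A note on the last filtering step in the code above, as a minimal sketch rather than a verdict on the real data: in base R, duplicated() treats NA as a repeated value, so y[!duplicated(y$genes$SYMBOL),] would keep only the first gene without a symbol and drop every other unannotated gene. The identifiers and symbols below are made up for illustration, and the toy table only stands in for the output of select().

```r
# Toy stand-in for the 1:many table returned by select() (hypothetical values)
tab <- data.frame(
  ENSEMBL = c("E1", "E1", "E2", "E3", "E4", "E5"),
  SYMBOL  = c("A",  "B",  "C",  NA,   NA,   "A"),
  stringsAsFactors = FALSE
)

# Step 1: keep the first symbol for each Ensembl ID (as in the post)
first <- tab[!duplicated(tab$ENSEMBL), ]

# Step 2 (as in the post): drop repeated symbols.
# NA counts as a repeat of NA, so E4 is dropped along with E5.
keep_naive <- !duplicated(first$SYMBOL)

# Alternative: drop only repeated non-missing symbols, keep all NA rows
keep_safe <- is.na(first$SYMBOL) | !duplicated(first$SYMBOL)

first[keep_naive, ]
first[keep_safe, ]
```

Whether unannotated genes should be kept for the DE analysis is a separate decision; the point is only that the naive filter removes them silently.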

 

Aaron Lun (Cambridge, United Kingdom) wrote, 7 months ago:

Just use mapIds and save the output in a data.frame:

gene.ids <- mapIds(org.Hs.eg.db, keys=rownames(y),
                   keytype="ENSEMBL", column="SYMBOL")
y$genes <- data.frame(ENSEMBL=rownames(y), SYMBOL=gene.ids)

Done.
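
For readers who want to check the result, here is a small sketch of what the two lines above rely on. As far as I know, mapIds returns a named character vector in the same order as the keys, with NA for keys that have no match and, under the default multiVals="first", the first hit when there are several. The vector below is a hand-written stand-in for that output with made-up identifiers, not real annotation.

```r
# Hypothetical row names of the DGEList
keys <- c("ENSG_A", "ENSG_B", "ENSG_C")

# Stand-in for the named vector mapIds() would return (made-up symbols)
gene.ids <- c(ENSG_A = "SYM1", ENSG_B = NA, ENSG_C = "SYM3")

# Names follow the keys, so the vector lines up row by row with the counts
stopifnot(identical(names(gene.ids), keys))

# Build the annotation table; unname() keeps the symbols as a plain column
genes <- data.frame(ENSEMBL = keys, SYMBOL = unname(gene.ids),
                    stringsAsFactors = FALSE)

# Number of identifiers left without a symbol
n_unmapped <- sum(is.na(genes$SYMBOL))
```

Because there is exactly one entry per key, no separate de-duplication on the Ensembl column is needed before assigning to y$genes.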


svlachavas replied, 7 months ago:

Thanks Aaron for the update. The simpler the better.