Adding a "symbol" column to my differentially expressed genes in edgeR
@alihakimzadeh73-20840

Hi,

I am trying to do a differential expression analysis with edgeR. I have a "counts.csv" file produced by HTSeq. I want to add a "symbol" column, holding the gene symbol that corresponds to each gene ID, to the data frame in edgeR, so that the symbols also appear in the tables of up- and down-regulated genes. Here is the code I used for the differential expression analysis:

library(edgeR)
library(org.Hs.eg.db)
x<-read.csv("counts.csv")
y<-DGEList(counts=x[,2:51], genes = x[,1]) 
y <- calcNormFactors(y)
group <- factor(c(rep("high",30),rep("low",20)))
time<-factor(c("pre","post","pre","post","pre","post","pre","post","pre","post",
           "pre","post","pre","post","pre","post","pre","post","pre","post",
           "pre","post","pre","post","pre","post","pre","post","pre","post",
           "pre","post","pre","post","pre","post","pre","post","pre","post",
           "pre","post","pre","post","pre","post","pre","post","pre","post"))
data.frame(sample=colnames(y),group,time) #data frame
design<-model.matrix(~group+time) 
y<-estimateDisp(y,design)
fit<-glmFit(y,design)
lrt<-glmLRT(fit,coef = 2)
deg <-topTags(lrt, n = Inf , p= 0.05)$table
up <-deg[deg$logFC > 0,]
down <-deg[deg$logFC < 0,]
write.csv(up, file="up.csv")
write.csv(down, file="down.csv")
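As a side note, the 50-element group and time vectors above can be generated with rep() rather than typed out. A minimal sketch, assuming the sample layout implied by the code above (30 "high" samples followed by 20 "low" samples, alternating pre/post throughout):

```r
# 30 "high" samples followed by 20 "low" samples, as in the typed-out vector
group <- factor(rep(c("high", "low"), times = c(30, 20)))
# samples alternate pre, post, pre, post, ... across all 50 columns
time <- factor(rep(c("pre", "post"), times = 25))
# factor() sorts levels alphabetically, so the levels are "post" and "pre",
# the same as for the typed-out vector
levels(time)
table(group, time)
```

Because the factor levels come out the same, model.matrix(~group+time) would produce the same design matrix as before.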

I tried to use this code to add the symbols to my data frame, but it doesn't work. Can anyone help me work through it?

mp=gsub("\\..*","",row.names(y))
y$symbol<- mapIds(org.Hs.eg.db, keys= row.names(mp),
                  keytype="ENSEMBL", column="SYMBOL")

This is the results table I got:

head(up)

                genes    logFC     logCPM        LR        PValue           FDR
5659  ENSG00000125207 4.383522  0.5658056 1139.1891 1.003436e-249 5.797650e-245
25588 ENSG00000222057 4.589772 -0.2246701  980.9821 2.444029e-215 7.060555e-211
50136 ENSG00000261177 5.207807 -0.1559902  810.7058 2.537797e-178 4.887627e-174
35996 ENSG00000236941 2.595311  2.0293394  790.5476 6.126318e-174 8.849161e-170
29288 ENSG00000227615 5.668960  0.4194818  767.6254 5.902889e-169 6.821142e-165
17466 ENSG00000196564 4.778656 -0.8531067  715.6777 1.165581e-157 1.122415e-153

The error message I received is:

Error in mapIds_base(x, keys, column, keytype, ..., multiVals = multiVals) : mapIds must have at least one key to match against.

Thanks

@james-w-macdonald-5106

You have an error in your code, and you are overlooking something. First the error. This code won't do what you apparently think it does:

y$symbol <- mapIds(org.Hs.eg.db, keys= row.names(mp),
                  keytype="ENSEMBL", column="SYMBOL")

That's because a DGEList doesn't have a 'symbol' list item. Well, a DGEList is just a list, and you can add any list item you like to it, but it won't be used for anything: none of the code in edgeR expects a 'symbol' list item, so it will be ignored.

> class(d)
[1] "DGEList"
attr(,"package")
[1] "edgeR"
> d$symbol <- "HERESASYMBOLFORYA"
> d
An object of class "DGEList"
$counts
       x0 x1 x2
gene.1  2  8 58
gene.2  3  2  5
gene.3  7  2  4
gene.4  5  3  4
gene.5  2  1  1
95 more rows ...

$samples
   group lib.size norm.factors
x0     1      379    0.9840312
x1     1      384    1.0295185
x2     1      473    0.9870905

$common.dispersion
[1] 0.09547519

$AveLogCPM
[1] 15.76788 13.64082 13.90450 13.82198 12.97364
95 more elements ...

$symbol
[1] "HERESASYMBOLFORYA"

> topTags(glmLRT(glmFit(d, design, dispersion = dispersion.true), 2))
Coefficient:  x 
             logFC   logCPM        LR       PValue        FDR
gene.1   1.3630814 16.06110 11.868112 0.0005710326 0.05710326
gene.66 -1.7069713 14.11364  8.224193 0.0041335577 0.20667789
gene.5  -1.1682427 14.17484  4.740671 0.0294575861 0.69947257
gene.13 -1.1283106 14.23087  4.523089 0.0334403966 0.69947257
gene.11  1.4918801 13.53773  4.090369 0.0431282295 0.69947257
gene.77 -1.3691516 13.65080  3.995847 0.0456125163 0.69947257
gene.19  1.0616309 14.19212  3.876618 0.0489630801 0.69947257
gene.50 -1.1771528 13.66889  3.502580 0.0612733098 0.71180393
gene.39  0.9668069 14.01184  2.988546 0.0838554094 0.71180393
gene.91  1.1008979 13.71190  2.849609 0.0913961719 0.71180393

## put some random stuff in the 'genes' list item
> d$genes$whatevs <- sample(letters, 100, TRUE)
> topTags(glmLRT(glmFit(d, design, dispersion = dispersion.true), 2))
Coefficient:  x 
        whatevs      logFC   logCPM        LR       PValue        FDR
gene.1        x  1.3630814 16.06110 11.868112 0.0005710326 0.05710326
gene.66       z -1.7069713 14.11364  8.224193 0.0041335577 0.20667789
gene.5        e -1.1682427 14.17484  4.740671 0.0294575861 0.69947257
gene.13       a -1.1283106 14.23087  4.523089 0.0334403966 0.69947257
gene.11       m  1.4918801 13.53773  4.090369 0.0431282295 0.69947257
gene.77       b -1.3691516 13.65080  3.995847 0.0456125163 0.69947257
gene.19       o  1.0616309 14.19212  3.876618 0.0489630801 0.69947257
gene.50       g -1.1771528 13.66889  3.502580 0.0612733098 0.71180393
gene.39       o  0.9668069 14.01184  2.988546 0.0838554094 0.71180393
gene.91       c  1.1008979 13.71190  2.849609 0.0913961719 0.71180393

## now we get the annotations added from the `genes` list item

From your topTags output we can infer that you do have a 'genes' list item in your DGEList. You can just add whatever extras you need to that, and you can use the 'genes' column of the existing genes list item as your keys, because those are obviously already Ensembl IDs.

So what you really want to do is:

```
y$genes$symbol <- mapIds(org.Hs.eg.db, y$genes$genes, "SYMBOL", "ENSEMBL")
```
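To see the difference between y$symbol and y$genes$symbol without needing edgeR installed, here is a sketch that uses a plain R list as a stand-in for a DGEList. The IDs are taken from the table in the question; the symbols are placeholders, not real annotations.

```r
# A plain list standing in for a DGEList; as in the question, 'genes' is a
# data frame whose single column is also called 'genes'
d <- list(
  counts = matrix(c(2L, 3L, 7L, 8L, 2L, 2L), nrow = 3),
  genes  = data.frame(genes = c("ENSG00000125207", "ENSG00000222057", "ENSG00000261177"))
)

# Adds a new top-level list item; nothing in edgeR reads an item called 'symbol'
d$symbol <- "HERESASYMBOLFORYA"

# Adds a column to the 'genes' data frame instead, which is where annotation belongs
# (placeholder values; in practice these would come from mapIds())
d$genes$symbol <- c("SYMBOL_A", "SYMBOL_B", "SYMBOL_C")

names(d)        # "counts" "genes" "symbol"
names(d$genes)  # "genes" "symbol"
```

As the 'whatevs' demonstration above shows, topTags() reports every column of the genes data frame next to the statistics, so a symbol column added this way would carry through to the up and down tables.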

Thank you so much! I appreciate your explanation, and I understand it now. It was very comprehensive. Unfortunately, though, I now get a new error:

Error in .testForValidKeys(x, keys, keytype, fks) : 'keys' must be a character vector

The error message is supposed to be self-explanatory: it is telling you that the keys have to be a character vector, which almost surely means your keys are a factor. So you need to coerce them to character first. If you don't understand what I am telling you, then you need to read An Introduction to R.
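A minimal base-R illustration of what that error means, assuming (as is likely) that read.csv() returned the ID column as a factor, which was its default behaviour before R 4.0.0 (stringsAsFactors = TRUE). The IDs are from the table in the question:

```r
# How the Ensembl ID column looks if read.csv() converted it to a factor
ids <- factor(c("ENSG00000125207", "ENSG00000222057", "ENSG00000261177"))
is.character(ids)   # FALSE: this is the kind of input behind "'keys' must be a character vector"

# as.character() returns the labels (the IDs themselves), not the integer codes
keys <- as.character(ids)
is.character(keys)  # TRUE
keys
```

Applied to the DGEList from the question, the call would presumably become y$genes$symbol <- mapIds(org.Hs.eg.db, as.character(y$genes$genes), "SYMBOL", "ENSEMBL"). Alternatively, reading the file with read.csv("counts.csv", stringsAsFactors = FALSE) avoids creating the factor in the first place.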

I reckon I will need that, since I am not very familiar with R. Thanks again, James; I appreciate it.

Yunshun Chen
@yunshun-chen-5451

You didn't show what your mp is. You set keys=row.names(mp) in mapIds(), but are those Ensembl IDs, as required?
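A quick base-R check shows why that call fails with "mapIds must have at least one key to match against". Here rn is a hypothetical stand-in for row.names(y); judging by the topTags output in the question (rows labelled 5659, 25588, ...), the row names are row numbers rather than Ensembl IDs:

```r
# Hypothetical stand-in for row.names(y): row numbers, not Ensembl IDs
rn <- c("5659", "25588", "50136")

# The question's gsub() strips anything after a "."; there is nothing to strip here
mp <- gsub("\\..*", "", rn)
mp                    # still "5659" "25588" "50136"

# mp is a plain character vector, which has no row names at all
row.names(mp)         # NULL, so keys = row.names(mp) passes no keys to mapIds()
```

So the call fails twice over: row.names(mp) is empty, and even mp itself would not contain Ensembl IDs.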

Note that mapIds() is not an edgeR function. This question is more about how to use annotation packages such as org.Hs.eg.db.

mp is row.names(y). Indeed, my main problem is the annotation: adding symbols so that I can report the genes that are up-regulated and down-regulated in the experiment. Unfortunately, as you can see, I couldn't make it work.

I can see how you defined mp. But have you ever checked what your mp actually is? You need to extract the Ensembl IDs from the genes column of y$genes and use them as keys in mapIds().
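Following that suggestion, the keys come from the genes column rather than from the row names. The IDs in the question's output carry no version suffix, but if HTSeq had been run with a GTF that uses versioned IDs, the suffix would have to be stripped before org.Hs.eg.db could match them, which is presumably what the gsub() in the question was meant for. A sketch with made-up input (the versioned IDs below are hypothetical):

```r
# Hypothetical values standing in for y$genes$genes; two carry a version suffix
ids <- c("ENSG00000125207.14", "ENSG00000222057", "ENSG00000261177.2")

# as.character() guards against the factor problem; then remove everything
# from the first "." onward (unversioned IDs are left unchanged)
keys <- gsub("\\..*", "", as.character(ids))
keys
```

The cleaned vector would then be passed as keys, e.g. y$genes$symbol <- mapIds(org.Hs.eg.db, keys = keys, column = "SYMBOL", keytype = "ENSEMBL"). IDs with no match in org.Hs.eg.db come back as NA, and where one ID maps to several symbols, mapIds() keeps the first by default (multiVals = "first").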

Thanks, Yunshun. I now understand why it doesn't work.
