Question: adding "symbol" column in my differential expressed genes in EdgeR
26 days ago, alihakimzadeh73 wrote:

Hi,

I am trying to do a differential expression analysis with "EdgeR". I have a "counts.csv" file that was produced by "HTSeq". I want to add a "symbol" column, holding the gene symbol that corresponds to each gene ID, to a data frame in EdgeR, so that the tables of up- and down-regulated genes also show the symbols. Here is the code I used for the differential expression analysis:

library(edgeR)
library(org.Hs.eg.db)
x<-read.csv("counts.csv")
y<-DGEList(counts=x[,2:51], genes = x[,1]) 
y <- calcNormFactors(y)
group <- factor(c(rep("high",30),rep("low",20)))
time<-factor(c("pre","post","pre","post","pre","post","pre","post","pre","post",
           "pre","post","pre","post","pre","post","pre","post","pre","post",
           "pre","post","pre","post","pre","post","pre","post","pre","post",
           "pre","post","pre","post","pre","post","pre","post","pre","post",
           "pre","post","pre","post","pre","post","pre","post","pre","post"))
data.frame(sample=colnames(y),group,time) #data frame
design<-model.matrix(~group+time) 
y<-estimateDisp(y,design)
fit<-glmFit(y,design)
lrt<-glmLRT(fit,coef = 2)
deg <-topTags(lrt, n = Inf , p= 0.05)$table
up <-deg[deg$logFC > 0,]
down <-deg[deg$logFC < 0,]
write.csv(up, file="up.csv")
write.csv(down, file="down.csv")

I tried to use this code to add the symbols to my data frame, but it doesn't work. Can anyone help me work through it?

mp=gsub("\\..*","",row.names(y))
y$symbol<- mapIds(org.Hs.eg.db, keys= row.names(mp),
                  keytype="ENSEMBL", column="SYMBOL")

This is the results table that I got:

head(up)

              genes    logFC     logCPM         LR        PValue           FDR
5659  ENSG00000125207   4.383522   0.5658056 1139.1891   1.003436e-249     5.797650e-245
25588 ENSG00000222057  4.589772   -0.2246701  980.9821 2.444029e-215   7.060555e-211
50136 ENSG00000261177  5.207807    -0.1559902  810.7058 2.537797e-178   4.887627e-174
35996 ENSG00000236941  2.595311    2.0293394  790.5476 6.126318e-174   8.849161e-170
29288 ENSG00000227615  5.668960    0.4194818  767.6254 5.902889e-169    6.821142e-165
17466 ENSG00000196564  4.778656    -0.8531067  715.6777 1.165581e-157   1.122415e-153

The error message that I received is:

Error in mapIds_base(x, keys, column, keytype, ..., multiVals = multiVals) : mapIds must have at least one key to match against.

Thanks

Answer: adding "symbol" column in my differential expressed genes in EdgeR
25 days ago, James W. MacDonald (United States) wrote:

You have an error in your code, and you are overlooking something. First the error. This code won't do what you apparently think it does:

y$symbol <- mapIds(org.Hs.eg.db, keys= row.names(mp),
                  keytype="ENSEMBL", column="SYMBOL")

That is because a DGEList doesn't have a 'symbol' list item. A DGEList is just a list, so you can add any list item to it that you like, but none of the code in edgeR expects a 'symbol' list item, so it will be ignored.

> class(d)
[1] "DGEList"
attr(,"package")
[1] "edgeR"
> d$symbol <- "HERESASYMBOLFORYA"
> d
An object of class "DGEList"
$counts
       x0 x1 x2
gene.1  2  8 58
gene.2  3  2  5
gene.3  7  2  4
gene.4  5  3  4
gene.5  2  1  1
95 more rows ...

$samples
   group lib.size norm.factors
x0     1      379    0.9840312
x1     1      384    1.0295185
x2     1      473    0.9870905

$common.dispersion
[1] 0.09547519

$AveLogCPM
[1] 15.76788 13.64082 13.90450 13.82198 12.97364
95 more elements ...

$symbol
[1] "HERESASYMBOLFORYA"

> topTags(glmLRT(glmFit(d, design, dispersion = dispersion.true), 2))
Coefficient:  x 
             logFC   logCPM        LR       PValue        FDR
gene.1   1.3630814 16.06110 11.868112 0.0005710326 0.05710326
gene.66 -1.7069713 14.11364  8.224193 0.0041335577 0.20667789
gene.5  -1.1682427 14.17484  4.740671 0.0294575861 0.69947257
gene.13 -1.1283106 14.23087  4.523089 0.0334403966 0.69947257
gene.11  1.4918801 13.53773  4.090369 0.0431282295 0.69947257
gene.77 -1.3691516 13.65080  3.995847 0.0456125163 0.69947257
gene.19  1.0616309 14.19212  3.876618 0.0489630801 0.69947257
gene.50 -1.1771528 13.66889  3.502580 0.0612733098 0.71180393
gene.39  0.9668069 14.01184  2.988546 0.0838554094 0.71180393
gene.91  1.1008979 13.71190  2.849609 0.0913961719 0.71180393

## put some random stuff in the 'genes' list item
> d$genes$whatevs <- sample(letters, 100, TRUE)
> topTags(glmLRT(glmFit(d, design, dispersion = dispersion.true), 2))
Coefficient:  x 
        whatevs      logFC   logCPM        LR       PValue        FDR
gene.1        x  1.3630814 16.06110 11.868112 0.0005710326 0.05710326
gene.66       z -1.7069713 14.11364  8.224193 0.0041335577 0.20667789
gene.5        e -1.1682427 14.17484  4.740671 0.0294575861 0.69947257
gene.13       a -1.1283106 14.23087  4.523089 0.0334403966 0.69947257
gene.11       m  1.4918801 13.53773  4.090369 0.0431282295 0.69947257
gene.77       b -1.3691516 13.65080  3.995847 0.0456125163 0.69947257
gene.19       o  1.0616309 14.19212  3.876618 0.0489630801 0.69947257
gene.50       g -1.1771528 13.66889  3.502580 0.0612733098 0.71180393
gene.39       o  0.9668069 14.01184  2.988546 0.0838554094 0.71180393
gene.91       c  1.1008979 13.71190  2.849609 0.0913961719 0.71180393

## now we get the annotations added from the `genes` list item

From your topTags output we can infer that you do have a 'genes' list item in your DGEList, so you can just add whatever extras you need to that. You can also use the genes column of the existing genes list item as your keys, because those are obviously already Ensembl IDs.

So what you really want to do is

y$genes$symbol <- mapIds(org.Hs.eg.db, y$genes$genes, "SYMBOL","ENSEMBL")


Thank you so much! I appreciate your thorough explanation, and I understand it now. Unfortunately, I get a new error:

Error in .testForValidKeys(x, keys, keytype, fks) : 'keys' must be a character vector

Reply written 25 days ago by alihakimzadeh73

So the error message is supposed to be self-explanatory. It is telling you that the keys have to be a character vector. Which almost surely means the keys are a factor. So you need to coerce to character first. If you don't understand what I am telling you, then you need to read An Intro to R.
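
A minimal sketch of that coercion, using a toy data frame in place of y$genes (the toy object and the name keys are illustrative assumptions, not taken from the original data). A factor column can arise, for example, when read.csv() is run with stringsAsFactors = TRUE, which was the default before R 4.0:

```r
# Toy stand-in for y$genes (hypothetical; the real one is built by DGEList()).
genes <- data.frame(genes = factor(c("ENSG00000125207", "ENSG00000222057")))

is.character(genes$genes)   # FALSE: the column is a factor, not a character vector

# Coerce to character before handing the IDs to mapIds()
keys <- as.character(genes$genes)
is.character(keys)          # TRUE

# With org.Hs.eg.db loaded, the same coercion would apply to the real object:
# y$genes$symbol <- mapIds(org.Hs.eg.db, keys = as.character(y$genes$genes),
#                          column = "SYMBOL", keytype = "ENSEMBL")
```

Running class(y$genes$genes) on the real object should show whether the coercion is actually needed.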

Reply written 24 days ago by James W. MacDonald

I reckon I will need that, since I am not experienced with R. Thanks again, James. I appreciate it.

Reply written 17 days ago by alihakimzadeh73
Answer: adding "symbol" column in my differential expressed genes in EdgeR
26 days ago, Yunshun Chen (Australia) wrote:

You didn't show what your mp is. You set keys=row.names(mp) in mapIds(). But are they Ensembl IDs as required?

Note that mapIds() is not an edgeR function. This question is more related to how to use annotation packages such as org.Hs.eg.db.


mp is derived from row.names(y). Indeed, my main problem is the annotation: adding symbols so I can report which genes are upregulated and downregulated in the experiment. Unfortunately, as you can see, I couldn't make it work.

Reply written 25 days ago by alihakimzadeh73

I can see how you defined mp. But have you ever checked what your mp actually is? You need to extract the Ensembl IDs from the genes column of y$genes and use them as keys in mapIds().
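
One quick way to do that check, sketched with made-up row names standing in for row.names(y) (the vector ids is hypothetical; the numeric row labels in the posted table suggest the real row names look similar). It also hints at why mapIds() complained about having no keys, since row.names() applied to a plain character vector returns NULL:

```r
# Hypothetical row names, mimicking default labels such as "1", "2", "3"
ids <- c("1", "2", "3")
mp <- gsub("\\..*", "", ids)   # gsub() returns a plain character vector

print(mp)                      # inspect the values: these are not Ensembl IDs
is.null(row.names(mp))         # TRUE: a plain vector has no row names
length(row.names(mp))          # 0, so no keys at all would be passed on
```

If the real mp behaves the same way, keys = row.names(mp) hands NULL to mapIds(), which would be consistent with the "at least one key" error. The Ensembl IDs themselves sit in y$genes$genes.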

Reply written 25 days ago by Yunshun Chen

Thanks, Yunshun. I understand why it doesn't work correctly.

Reply written 25 days ago by alihakimzadeh73