Search
Question: edgeR
0
7.0 years ago by
puvan001@umn.edu20 wrote:
I am new to edgeR and I am not a statistician. I analyzed my RNA seq data, I got the results as 200 up regulated and 97 down regulated genes. My questions are- 1. What is the cut off value used by edgeR to say upregulated versus down regulated? I am little bit confused here. When I checked the log fold changes I couldn't come to a conclusion. When I counted the number of genes <0, the number is more than 97. Am I doing some thing wrong? 2. If I want to do some pathway analysis, I am wondering which value I have to use. When I use log fold change values, some genes are not identified as differentially expressed though I gave different cut off values. Can somebody help me? Sumathy
modified 6.9 years ago by Gordon Smyth33k • written 7.0 years ago by puvan001@umn.edu20
0
6.9 years ago by
Gordon Smyth33k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth33k wrote:

Dear Sumathy,

I don't know whether you're doing anything wrong from a code point of view, but you're certainly not giving us enough information in your email. We don't know how you've got your results or how you've counted genes <0. Basically you need to show us the code you've used.

What genes are you refering to when you say they are "not identified"? Why should they have been?

Reading the help page for any edgeR function that performs statistical testing, such as decideTestsDGE, will tell you what cutoff values have been used.

Best wishes
Gordon

ADD COMMENTlink modified 13 months ago • written 6.9 years ago by Gordon Smyth33k
I am sorry. I?ll describe what I have done. I am looking for differentially expressed genes in two groups and each group has three replicates. First I used common dispersion (counts.csv is my data file and totalreads.csv is about my library size). counts <-read.csv("counts.csv",row.names=1) libsize <- read.csv("totalreads.csv", row.names=1) d <-DGEList (counts, group=c(2,2,2,1,1,1), lib.size=libsize$total d <- DGEList(counts, group=c(2,2,2,1,1,1), lib.size=libsize$Total) I removed reads < than 5 ( according to the manual ) d <- d[rowSums(d$counts) >= 5, ] d <- estimateCommonDisp(d) > d$common.dispersion [1] 0.008390605 de.com <- exactTest(d) summarydecideTestsDGEde.com, p.value=0.01)) -1 92 0 12357 1 190 I decided to use moderated tagwise dispersion and prior. n 10 (according to the manual I thought this would be better ) summary(d$tagwise.dispersion) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.006051 0.006051 0.011150 0.008987 0.011150 0.037460 summary(decideTestsDGE(de.tgw,p.value=0.01)) [,1] -1 96 0 12343 1 200 (What my understanding is 96 genes are down regulated and 200 genes are up regulated and I wanted to get the all these differentially expressed genes) detop.tgw <-topTags (de.tgw, n=296) topTags(de.tgw) my results-only a part ID logConc logFC P.Value FDR >ENSSSCT00000011438 -15.33546 4.312736 1.806225e-97 2.282888e-93 >ENSSSCT00000004165 -15.34398 2.862672 1.236263e-88 7.812563e-85 >ENSSSCT00000013211 -14.81470 3.003443 1.145003e-77 4.823900e-74 >ENSSSCT00000011437 -16.88133 4.625023 4.776290e-73 1.509188e-69 >ENSSSCT00000010657 -18.66400 6.789652 1.482137e-71 3.746546e-68 >ENSSSCT00000013209 -18.85435 7.246780 1.757241e-67 3.701628e-64 >ENSSSCT00000019271 -15.67690 3.069440 7.803066e-66 1.408899e-62 >ENSSSCT00000017966 -15.86827 3.089246 7.932011e-63 1.253159e-59 >ENSSSCT00000013215 -16.27135 3.161484 2.973959e-58 4.176430e-55 >ENSSSCT00000011439 -16.20229 3.461734 2.624576e-55 3.317202e-52 >ENSSSCT00000010656 -34.07557 31.8809589 1.646600e-16 4.729859e-14 >ENSSSCT00000010084 -15.37232 1.3166928 3.997218e-16 1.122685e-13 >ENSSSCT00000003998 -15.32474 -1.3062885 4.289687e-16 1.178638e-13 >ENSSSCT00000018978 -14.38557 0.9368015 1.582580e-15 4.255793e-13 >ENSSSCT00000010832 -21.11786 5.8784200 7.617618e-15 2.005814e-12 >ENSSSCT00000012919 -19.87289 3.7289682 2.159284e-14 5.569629e-12 >ENSSSCT00000002141 -15.34515 1.2304344 2.586146e-14 6.537259e-12 >ENSSSCT00000015009 -16.73944 1.3602630 1.537101e-13 3.809299e-11 >ENSSSCT00000007972 -13.99426 -0.8061797 4.105738e-13 9.934890e-11 >ENSSSCT00000009832 -20.70634 4.7030671 4.166067e-13 9.934890e-11 >ENSSSCT00000017656 -15.16996 1.1164872 8.232265e-13 1.926807e-10 >ENSSSCT00000003967 -12.27699 -0.9214116 1.414101e-12 3.249604e-10 >ENSSSCT00000017482 -13.37026 0.7404062 2.139773e-12 4.829390e-10 >ENSSSCT00000019546 -13.99564 0.9811760 2.192554e-12 4.861701e-10 My confusion starts here. I thought the fold change for down regulated genes are lower than 0. But when I counted after changing into csv. format I got more than 96 genes (According to the summary, -1 is 96). Am I doing something wrong? Generally, more than 2 fold difference is considered as differential expression. I am little bit confused about the value used by edgeR for differential expression. These questions may be too simple but my background is not statistics and I don't have a very good understanding. I'd really appreciate your comments. Thanks Sumathy ADD REPLYlink written 6.9 years ago by puvan001@umn.edu20 Dear Sumathy, Thanks for giving your code. I can now see what you are referring to. You are quite correct that after your code sequence summary(decideTestsDGE(de.tgw,p.value=0.01)) [,1] -1 96 0 12343 1 200 detop.tgw <- topTags (de.tgw, n=296) you should see exactly 96 negative and exactly 200 positive log-fold-changes in the data.frame detop.tgw. Try:  table(detop.tgw$table\$logFC > 0)

to see how many there are up and down.

In all my testing of decideTestsDGE() and topTags(), this is exactly how they behave.  If this isn't what you see, is it possible that you have simply made a mistake transferring to csv or with the counting?

edgeR uses statistical significance to decide on the number of differentially expressed genes, rather than arbitrary fold change cutoff. The approach is explained in the documentation and the cited papers.

Best wishes
Gordon

ADD REPLYlink modified 13 months ago • written 6.9 years ago by Gordon Smyth33k