edgeR and FDR values always equal 1
A Gossner, UK
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Simon Anders, Zentrum für Molekularbiologie, Universi…
Hi,

On 11/11/2010 11:44 AM, A Gossner wrote:
> While using edgeR to analyse my Tag-seq data, no matter which way I
> analyse the data, common or tagwise dispersion, the FDR value is always 1.
> Typical output is shown below;
[...]
> > d$common.dispersion
> [1] 4.884378

A common dispersion value of 4.88 means that your expression typically varies between replicates by about 220% (that's sqrt(4.88) ≈ 2.21). In other words, there is only noise and no signal in your data, or you are doing something fundamentally wrong.

Look at some scatter plots, plotting the count values of one sample against the count values of a replicate sample on a log-log scale (or better, plot asinh(count)). Do they seem to correlate?

Simon
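To see where the "220%" comes from, here is a minimal base-R sketch. It assumes the negative binomial parameterisation used by edgeR, Var(y) = mu + phi * mu^2, under which the coefficient of variation between replicates is sqrt(1/mu + phi) and therefore approaches sqrt(phi) for well-expressed tags. Only the dispersion 4.884378 comes from the thread; the expression levels and the simulation are illustrative, and nb_cv() is a hypothetical helper.

```r
# Negative binomial as parameterised in edgeR: Var(y) = mu + phi * mu^2
phi <- 4.884378  # common dispersion reported in this thread

# Coefficient of variation: CV^2 = 1/mu + phi, which tends to phi as mu grows
nb_cv <- function(mu, phi) sqrt(1 / mu + phi)

bcv <- sqrt(phi)
cat(sprintf("sqrt(dispersion) = %.2f, i.e. about %.0f%%\n", bcv, 100 * bcv))

# The Poisson term 1/mu only matters for small counts
for (mu in c(10, 100, 1000)) {
  cat(sprintf("mu = %g: CV = %.2f\n", mu, nb_cv(mu, phi)))
}

# Simulation check: rnbinom() with size = 1/phi has variance mu + phi * mu^2
set.seed(1)
y <- rnbinom(1e5, size = 1 / phi, mu = 1000)
sim_cv <- sd(y) / mean(y)
cat(sprintf("simulated CV at mu = 1000: %.2f\n", sim_cv))
```

The first line prints a value of about 2.21 (221%): at this dispersion the replicate-to-replicate standard deviation is more than twice the mean, whatever the expression level.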
Hi Simon,

I have plotted two samples as you suggested [plot(d$counts[,2], d$counts[,3], log="xy")] and saved the file on the FTP server:
ftp.ed.ac.uk
Log in using the username 'anonymous' and use your email address as the password.
get /edupload/logplot.eps as well as get /edupload/plot2.eps and get /edupload/plot2.eps

The latter are the asinh(count) plots you suggested, though I must confess I am not sure I plotted them correctly:
plot(d$counts[,2], asinh(d$counts[,2]))
plot(d$counts[,3], asinh(d$counts[,3]))
They are plots 1 and 2 respectively.

There does seem to be some correlation on the log-log plot.

Thanks
Anton
Hi Anton

On 11/12/2010 06:33 PM, A Gossner wrote:
> Have plotted two samples as you suggested [plot(d$counts[,2], d$counts[,3], log="xy")]
> and saved the file on the FTP server.
[...]
> There does seem to be some correlation on the log-log plot.

Yes, there is some correlation, but it's not great. If you have, say, 100 counts in one sample, you seem to get anything from 0 to 3000 counts in the other. Hence, even if you get 5-fold changes between treatment and control, they will not be significant.

> Which are plot asinh(count) graphs you suggested but must confess not
> sure if plotted correctly, plot(d$counts[,2], asinh(d$counts[,2]))
> plot(d$counts[,3], asinh(d$counts[,3]))
> but they are plots 1 and 2 respectively.

This is not what I meant. It was something rather trivial: if you just want to compare correlation, then instead of
plot(d$counts[,2], d$counts[,3], log="xy")
it can be nicer to use
plot(asinh(d$counts[,2]), asinh(d$counts[,3]))
because the zeroes don't get suppressed. (The area hyperbolic sine is linear for very low values and then quickly becomes indistinguishable from a log. Hence this is a cheap way to get a log-log-like plot with zeroes. The axis tick marks are completely misleading, though.)

Simon
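A minimal base-R sketch of the asinh trick, using simulated replicate counts (the gamma-distributed abundances and negative binomial noise are assumptions for illustration, not Anton's data). It first tabulates asinh(x) next to log(2x) to show where the two agree, then counts how many tags a log="xy" plot would omit because one replicate is zero.

```r
# asinh(x) = log(x + sqrt(x^2 + 1)): close to x near 0, close to log(2 * x) for large x
x <- c(0, 0.5, 1, 10, 100, 1000)
print(round(cbind(x, asinh = asinh(x), log_2x = log(2 * x)), 3))

# Hypothetical replicate counts: same underlying abundance, negative binomial noise
set.seed(42)
mu <- rgamma(2000, shape = 0.5, rate = 0.01)
rep1 <- rnbinom(length(mu), size = 10, mu = mu)
rep2 <- rnbinom(length(mu), size = 10, mu = mu)

# Tags that plot(..., log = "xy") would omit (a zero on either axis)
n_dropped <- sum(rep1 == 0 | rep2 == 0)
cat("tags lost on a log-log plot:", n_dropped, "\n")

# The asinh version keeps them; zero counts sit at 0 on each axis
plot(asinh(rep1), asinh(rep2), pch = 16, cex = 0.4,
     xlab = "asinh(count), replicate 1", ylab = "asinh(count), replicate 2")
```

As noted above, the tick labels are in asinh units, so they should be read only as a rough log scale (asinh(x) is about log(x) + 0.69 for large x).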
A Gossner, UK
Gordon K Smyth <smyth at ...> writes:

> Hi Anton,
>
> This is the way the software is designed to behave when there is no differential expression between your groups. The software is telling you that there is no statistically significant differential expression.
>
> The reason for this result seems to be the enormously high values for the dispersions. The values you have (3.5 up to 7) are an order of magnitude higher than my lab has ever seen for RNA-seq or SAGE-seq data. This represents enormous inconsistency between your replicate samples, and suggests something might be wrong with your data setup. Another curious fact is that all the putative differential expression is in one direction, down in the INF group.
>
> To examine your data setup, you might try an MDS plot (plotMDS). This would show you if you have one or more outlier libraries, or if one or more libraries are misclassified into groups. You might also explore your data using smear plots (plotSmear()) to get a better idea of what is happening. You must have some very unusual samples.
>
> To combat the fact that much of the differential expression is in one direction, you might try normalizing with calcNormFactors().
>
> Best wishes
> Gordon
>
> > From: A Gossner <a.gossner at ...>
> > Date: Thu, 11 Nov 2010 10:44:07 +0000
> > Subject: [BioC] edgeR and FDR values always equals 1
> >
> > While using edgeR to analyse my Tag-seq data, no matter which way I analyse the data, common or tagwise dispersion, the FDR value is always 1.
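The idea behind the MDS check can be sketched in base R with classical multidimensional scaling (cmdscale) on crude log counts-per-million. This is a swapped-in stand-in, not edgeR's plotMDS, which computes its distances differently. The design (three CTL and three INF libraries), the simulated counts and the deliberately scrambled sixth library are all hypothetical.

```r
set.seed(7)
n_tags <- 1000
group <- factor(rep(c("CTL", "INF"), each = 3))      # hypothetical design
mu <- rgamma(n_tags, shape = 1, rate = 0.02)          # hypothetical tag abundances
counts <- sapply(seq_along(group), function(j) rnbinom(n_tags, size = 5, mu = mu))
counts[, 6] <- rnbinom(n_tags, size = 5, mu = rev(mu))  # make library 6 an outlier
colnames(counts) <- paste0(as.character(group), c(1:3, 1:3))

# Crude log2 counts-per-million (adding 1 avoids taking the log of zero)
lib_size <- colSums(counts)
logcpm <- log2(t(t(counts) / lib_size) * 1e6 + 1)

# Classical MDS on Euclidean distances between libraries
dists <- dist(t(logcpm))
mds <- cmdscale(dists, k = 2)
plot(mds, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(mds, labels = colnames(counts), col = as.integer(group))

# The library with the largest total distance to the others is a candidate outlier
names(which.max(rowSums(as.matrix(dists))))
```

In this toy example the scrambled library stands apart from the other five on the plot, which is the kind of pattern Gordon suggests looking for (an outlier library, or one assigned to the wrong group).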
A Gossner, UK
Hi Gordon,

Gordon K Smyth <smyth at ...> writes:
> Another curious fact is that all the putative differential expression is in one
> direction, down in the INF group.

I have plotted the fold-change smear plots, and the top 500 DE tags seem to lie along a very strange line. I am not sure what is causing this.

> To examine your data setup, you might try an MDS plot (plotMDS).

I have plotted the MDS plots. There are obvious differences between the control samples (highlighted) and the infected samples, but also large differences within the groups, especially in dimension 1, if judged against the scale of the plots in the edgeR manual.

> You might also explore your data using smear plots plotSmear() to get a better idea of what is
> happening. You must have some very unusual samples.

All plots are in a PDF saved on the following FTP server:
ftp.ed.ac.uk
Log in using the username 'anonymous' and use your email address as the password.
get /edupload/edgerplots.pdf

> To combat the fact that much of the differential expression is in one
> direction, you might try normalizing, calcNormFactors().

I will try normalizing as you suggest.

Thanks for your help.
Anton
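Gordon's suggestion of calcNormFactors() targets composition bias: if a subset of tags is strongly up in one library, it takes a larger share of that library's reads, so every other tag looks down when libraries are scaled by raw size alone. Whether that is what is happening here is an assumption. Below is a simplified, unweighted base-R sketch of the TMM (trimmed mean of M-values) idea; tmm_factor() is a hypothetical helper, not edgeR's implementation (calcNormFactors() also applies precision weights). The trim fractions 0.3 and 0.05 mirror the logratioTrim and sumTrim defaults of calcNormFactors().

```r
# Hypothetical helper: unweighted TMM-style factor for library `obs` against `ref`
tmm_factor <- function(obs, ref, trim_m = 0.3, trim_a = 0.05) {
  keep <- obs > 0 & ref > 0                 # log ratios need non-zero counts
  p_obs <- obs[keep] / sum(obs)             # proportions within each library
  p_ref <- ref[keep] / sum(ref)
  m <- log2(p_obs / p_ref)                  # M: log fold change
  a <- 0.5 * log2(p_obs * p_ref)            # A: average log abundance
  m_ok <- m >= quantile(m, trim_m) & m <= quantile(m, 1 - trim_m)
  a_ok <- a >= quantile(a, trim_a) & a <= quantile(a, 1 - trim_a)
  2^mean(m[m_ok & a_ok])                    # back-transform the trimmed mean of M
}

set.seed(3)
ref <- rnbinom(1000, size = 20, mu = 200)
obs <- ref
obs[1:100] <- obs[1:100] * 20   # hypothetical: 100 tags strongly up in `obs`

f <- tmm_factor(obs, ref)
cat(sprintf("TMM-style factor: %.3f (ratio of library sizes: %.3f)\n",
            f, sum(ref) / sum(obs)))

# Effective library size = raw size * factor; the 900 unchanged tags now agree
eff_obs <- sum(obs) * f
summary(log2((obs[101:1000] / eff_obs) / (ref[101:1000] / sum(ref))))
```

Scaled by raw library size, the 900 untouched tags would each appear about 2.9-fold down in `obs`; with the effective library size their log-ratios sit at 0, so only the 100 genuinely changed tags remain candidates for differential expression.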
Hi Anton.

I've had a quick look at both batches of your plots. The replicates are quite variable, but something is indeed unusual. Can you post the full set of commands you used?

Cheers,
Mark

On 2010-11-13, at 9:51 PM, A Gossner wrote:
> [...]

Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
A Gossner, UK
Mark Robinson <mrobinson at ...> writes:
> I've had a quick look at both batches of your plots. The replicates are quite variable, but something is
> indeed unusual. Can you post your full set of commands used?

Hi Mark,

For this data I am typically just following the manual examples. The data files are tab-delimited, with one column of tag sequences and one of counts, plus a header for each column.

library(edgeR)
targets <- read.delim(file = "Targets.txt", stringsAsFactors = FALSE)
targets
d <- readDGE(targets, header=TRUE)
d
# Keep tags with more than 5 counts in total across libraries
d <- d[rowSums(d$counts) > 5, ]
d
#pdf(file="MDSLND10.pdf")
plotMDS.dge(d, main="MDS Plot for LND10 Data", xlim=c(-5,5))
#dev.off()
# Common dispersion and exact test
d <- estimateCommonDisp(d)
names(d)
d$samples$lib.size
d$common.lib.size
colSums(d$pseudo.alt)
d$common.dispersion
sqrt(d$common.dispersion)
de.common <- exactTest(d)
topTags(de.common)
detags.com <- rownames(topTags(de.common)$table)
d$counts[detags.com, ]
topTags(de.common, sort.by = "logFC")
sum(de.common$table$p.value < 0.01) #p-value <0.01
sum(de.common$table$p.value < 0.05) #p-value <0.05
top45 <- topTags(de.common, n = 45)
sum(top45$table$logFC > 0)
sum(top45$table$logFC < 0)
top751 <- topTags(de.common, n = 751)
sum(top751$table$logFC > 0)
sum(top751$table$logFC < 0)
sum(p.adjust(de.common$table$p.value, method = "BH") < 0.05)
sum(p.adjust(de.common$table$p.value, method = "BH") < 0.1)
detags500.com <- rownames(topTags(de.common, n = 500)$table)
plotSmear(d, de.tags = detags500.com, main = "FC plot using common dispersion")
abline(h = c(-2, 2), col = "dodgerblue", lwd = 2)
#Moderated tagwise dispersion
d <- estimateTagwiseDisp(d, prior.n = 10)
names(d)
d$prior.n
head(d$tagwise.dispersion)
summary(d$tagwise.dispersion)
d$common.dispersion
#Estimating smoothing using an approximate eBayes rule
prior.n <- estimateSmoothing(d)
prior.n
#exactTest
de.tagwise <- exactTest(d, common.disp = FALSE)
topTags(de.tagwise)
topTags(de.tagwise, n = 10, sort.by = "logFC")
sum(de.tagwise$table$p.value < 0.01) #p-value <0.01
sum(de.tagwise$table$p.value < 0.05) #p-value <0.05
top12 <- topTags(de.tagwise, n = 12)
sum(top12$table$logFC > 0)
sum(top12$table$logFC < 0)
top537 <- topTags(de.tagwise, n = 537)
sum(top537$table$logFC > 0)
sum(top537$table$logFC < 0)
# Compare the top tags from the common and tagwise analyses
detags.tgw <- rownames(topTags(de.tagwise)$table)
detags.com <- rownames(topTags(de.common)$table)
d$counts[detags.tgw, ]
d$counts[detags.com, ]
sum(rownames(topTags(de.tagwise)$table) %in% rownames(topTags(de.common)$table))
sum(rownames(topTags(de.tagwise, n = 100)$table) %in% rownames(topTags(de.common, n = 100)$table))
sum(rownames(topTags(de.tagwise, n = 1000)$table) %in% rownames(topTags(de.common, n = 1000)$table))/1000 * 100
sum(p.adjust(de.common$table$p.value, method = "BH") < 0.05)
sum(p.adjust(de.common$table$p.value, method = "BH") < 0.1)
sum(p.adjust(de.tagwise$table$p.value, method = "BH") < 0.05)
sum(p.adjust(de.tagwise$table$p.value, method = "BH") < 0.1)
detags500.com <- rownames(topTags(de.common, n = 500)$table)
detags500.tgw <- rownames(topTags(de.tagwise, n = 500)$table)
par(mfcol = c(2, 1))
plotSmear(d, de.tags = detags500.com, main = "Using common dispersion")
abline(h = c(-2, 2), col = "dodgerblue", lwd = 2)
plotSmear(d, de.tags = detags500.tgw, main = "Using tagwise dispersions")
abline(h = c(-2, 2), col = "dodgerblue", lwd = 2)
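By default, the FDR column from topTags() is the Benjamini-Hochberg adjusted p-value, the same quantity the script above computes with p.adjust(..., method = "BH"). The base-R sketch below writes that adjustment out to show why, when p-values look like draws from a uniform distribution (as they tend to when the dispersion swamps any treatment effect), the adjusted values pile up at or near 1 even though a few percent of raw p-values fall below 0.05. bh_adjust() is a hypothetical helper for illustration, and the simulated p-values are not from this data set.

```r
# Benjamini-Hochberg adjustment written out; should match p.adjust(p, method = "BH")
bh_adjust <- function(p) {
  n <- length(p)
  o <- order(p, decreasing = TRUE)          # largest p-value first
  adj <- pmin(1, cummin(n / (n:1) * p[o]))  # n/rank * p, made monotone from the top
  adj[order(o)]                             # back to the original order
}

# Evenly spread p-values, as expected when nothing is differentially expressed
p_even <- (1:100) / 100
range(bh_adjust(p_even))   # every adjusted value is 1 (up to rounding)

# Hypothetical null experiment: 20,000 tags with uniform p-values
set.seed(11)
p_null <- runif(20000)
sum(p_null < 0.05)               # roughly 1,000 nominal "hits"
sum(bh_adjust(p_null) < 0.05)    # usually none survive adjustment
all.equal(bh_adjust(p_null), p.adjust(p_null, method = "BH"))
```

For the evenly spaced case each term n/rank * p equals exactly 1, which is the extreme version of "FDR always equals 1": the raw p-values carry no more small values than chance alone would produce.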