edgeR and FDR values always equal 1

A Gossner (@a-gossner-4349), UK

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

Simon Anders (@simon-anders-3855), Zentrum für Molekularbiologie, Universi…

Hi,

On 11/11/2010 11:44 AM, A Gossner wrote:
> While using edgeR to analyse my Tag-seq data, no matter which way I
> analyse the data, common or tagwise dispersion, the FDR value is always 1.
> Typical output is shown below;
[...]
>> d$common.dispersion
> [1] 4.884378

A common dispersion value of 4.8 means that your expression typically varies between replicates by 220% (that's sqrt(4.8)). In other words, there is only noise and no signal in your data, or you are doing something fundamentally wrong.

Look at some scatter plots, plotting the count values of one sample against the count values of a replicate sample on a log-log scale (or better, plot asinh(count)). Do they seem to correlate?

Simon
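To make the 220% figure concrete: edgeR models counts as negative binomial with variance mu + phi * mu^2, so at high counts the replicate-to-replicate coefficient of variation approaches sqrt(phi) (current edgeR documentation calls this the biological coefficient of variation, BCV). The base-R sketch below just does that arithmetic for the dispersion quoted above; the `nb_cv` helper and the simulated replicates are illustrative assumptions, not edgeR code.

```r
# Negative binomial model: Var(y) = mu + phi * mu^2, with phi the dispersion.
phi <- 4.884378          # common dispersion reported in the thread
bcv <- sqrt(phi)         # relative spread between replicates at high counts

# Hypothetical helper: coefficient of variation of a count with mean mu
nb_cv <- function(mu, phi) sqrt(mu + phi * mu^2) / mu

# As mu grows, the Poisson term (mu) becomes negligible and the CV
# approaches sqrt(phi)
cv <- nb_cv(c(10, 100, 1000, 10000), phi)
round(bcv, 3)
round(cv, 3)

# Two simulated "replicates" with this dispersion (rnbinom uses size = 1 / phi)
set.seed(1)
rep1 <- rnbinom(1000, mu = 100, size = 1 / phi)
rep2 <- rnbinom(1000, mu = 100, size = 1 / phi)
quantile(rep1, c(0.05, 0.5, 0.95))
```

With a size parameter this small, many simulated counts are zero while others are in the hundreds or thousands, which is the kind of spread that swamps a fold change between groups.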

Hi Simon,

Have plotted two samples as you suggested [plot(d$counts[,2], d$counts[,3], log="xy")] and saved the file on the FTP server:

ftp.ed.ac.uk
Login using the username 'anonymous' and use your email address as password.
get /edupload/logplot.eps
as well as
get /edupload/plot2.eps and get /edupload/plot2.eps

These are the plot asinh(count) graphs you suggested, but I must confess I'm not sure I plotted them correctly:

plot(d$counts[,2], asinh(d$counts[,2]))
plot(d$counts[,3], asinh(d$counts[,3]))

They are plots 1 and 2 respectively. There does seem to be some correlation on the log-log plot.

Thanks
Anton

Reply posted by A Gossner

Hi Anton,

On 11/12/2010 06:33 PM, A Gossner wrote:
> Have plotted two samples as you suggested [ plot
> (d$counts[,2],d$counts[,3],log="xy")] and saved the file on the FTP server.
> ftp.ed.ac.uk
> Login using the username 'anonymous' and use your email address as
> password.
> get /edupload/logplot.eps as well as get /edupload/plot2.eps and get
> /edupload/plot2.eps
> There does seem to be some correlation on the log-log plot.

Yes, there is some correlation, but it's not great. If you have, say, 100 counts in one sample, you seem to get anything from 0 to 3000 counts in the other. Hence, even if you get 5-fold changes between treatment and control, they will not be significant.

> Which are plot asinh(count) graphs you suggested but must confess not
> sure if plotted correctly, plot(d$counts[,2],asinh(d$counts[,2]))
> plot(d$counts[,3],asinh(d$counts[,3]))
> but they are plots 1 and 2 respectively.

This is not what I meant. It was something rather trivial: if you just want to compare correlation, then instead of

plot(d$counts[,2], d$counts[,3], log="xy")

it can be nicer to use

plot(asinh(d$counts[,2]), asinh(d$counts[,3]))

because the zeroes don't get suppressed. (The inverse hyperbolic sine is linear for very low values and then quickly becomes indistinguishable from a log. Hence this is a cheap way to get a log-log-like plot with zeroes. The axis tick marks are completely misleading, though.)

Simon
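The asinh comparison can be sketched in base R. `asinh_scatter` is a hypothetical helper, not part of edgeR: it plots two count vectors on the asinh scale (zeros stay at 0 instead of being dropped, as they are with `log = "xy"`) and returns their correlation on that scale. The small table at the end shows asinh(x) tracking log(2x) once counts are moderately large, which is why the tick labels cannot be read as raw counts.

```r
# asinh(x) = log(x + sqrt(x^2 + 1)): roughly linear near 0, roughly log(2x)
# for large x, and finite at x = 0 (unlike log(0), which is -Inf).
asinh_scatter <- function(x, y, ...) {
  # Hypothetical helper: scatter two count vectors on the asinh scale and
  # return their Pearson correlation on that scale.
  ax <- asinh(x)
  ay <- asinh(y)
  plot(ax, ay,
       xlab = "asinh(count), sample A",
       ylab = "asinh(count), sample B", ...)
  abline(0, 1, col = "grey50")   # identity line for reference
  cor(ax, ay)
}

# How asinh compares with log(2x) across a range of counts
counts <- c(0, 1, 10, 100, 1000)
cbind(count = counts, asinh = asinh(counts), log2x = log(2 * counts))
```

With the count matrix from this thread it would be called as `asinh_scatter(d$counts[, 2], d$counts[, 3])`.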

Reply posted by Simon Anders

A Gossner (@a-gossner-4349), UK

Gordon K Smyth <smyth at ...> writes:

> Hi Anton,
>
> This is the way the software is designed to behave when there is no
> differential expression between your groups. The software is telling you
> that there is no statistically significant differential expression.
>
> The reason for this result seems to be the enormously high values for the
> dispersions. The values you have (3.5 up to 7) are an order of magnitude
> higher than my lab has ever seen for RNA-seq or SAGE-seq data. This
> represents enormous inconsistency between your replicate samples, and
> suggests something might be wrong with your data setup. Another curious fact
> is that all the putative differential expression is in one direction, down in
> the INF group.
>
> To examine your data setup, you might try an MDS plot (plotMDS). This
> would show you if you have one or more outlier libraries, or if one or
> more libraries are mis-classified into groups. You might also explore
> your data using smear plots (plotSmear()) to get a better idea of what is
> happening. You must have some very unusual samples.
>
> To combat the fact that much of the differential expression is in one
> direction, you might try normalizing with calcNormFactors().
>
> Best wishes
> Gordon
>
> > Date: Thu, 11 Nov 2010 10:44:07 +0000
> > From: A Gossner <a.gossner at ...>
> > To: bioconductor at ...
> > Subject: [BioC] edgeR and FDR values always equals 1
> >
> > Hi,
> >
> > While using edgeR to analyse my Tag-seq data, no matter which way I
> > analyse the data, common or tagwise dispersion, the FDR value is always 1.
> > Typical output is shown below;
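Gordon's first point, that FDR = 1 everywhere is the expected output when nothing is differentially expressed, follows from how the Benjamini-Hochberg adjustment works (BH is the default adjust.method in topTags, which reports it in the FDR column). Each p-value is multiplied by m / rank, the results are made monotone from the largest p-value down, and then capped at 1. The sketch below writes that out in base R; `bh_adjust` is a hypothetical name used for illustration, and in practice `p.adjust(p, method = "BH")` does the same job.

```r
# Benjamini-Hochberg adjusted p-values, written out step by step.
bh_adjust <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)   # indices from largest to smallest p-value
  ranks <- m:1                       # ascending rank of each p-value in that order
  adj <- cummin(p[o] * m / ranks)    # scale by m / rank, keep monotone from the top
  pmin(1, adj)[order(o)]             # cap at 1 and restore the original order
}

# Unremarkable p-values: every adjusted value ends up at 1
p_null <- c(0.2, 0.5, 0.8, 0.9, 1)
bh_adjust(p_null)

# One genuinely small p-value survives the adjustment
p_one <- c(0.001, 0.2, 0.5, 0.8, 0.9)
bh_adjust(p_one)

# Agrees with the built-in implementation
set.seed(42)
q <- runif(200)
all.equal(bh_adjust(q), p.adjust(q, method = "BH"))
```

With tens of thousands of tags and dispersions as large as those in this thread, even the smallest exact-test p-values are usually not small enough to pull their adjusted values below 1.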

A Gossner (@a-gossner-4349), UK

Hi Gordon,

Gordon K Smyth <smyth at ...> writes:
> Another curious fact is that all the putative differential expression is in one
> direction, down in the INF group.

Have plotted the fold-change smear plots, and the top 500 DE tags seem to lie along a very strange line. I'm not sure what is causing this.

> To examine your data setup, you might try an MDS plot (plotMDS).

Have plotted the MDS plots. There are obvious differences between the control samples (highlighted) and the infected samples, but also large differences within the groups, especially in dimension 1, particularly if judged by the scale in the edgeR manual PDF.

> You might also explore
> your data using smear plots plotSmear() to get a better idea of what is
> happening. You must have some very unusual samples.

All plots are in a PDF saved on the following FTP server:

ftp.ed.ac.uk
Login using the username 'anonymous' and use your email address as password.
get /edupload/edgerplots.pdf

> To combat the fact that much of the differential expression is in one
> direction, you might try normalizing, calcNormFactors().

Will try normalizing as you suggest.

Thanks for your help.
Anton
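The idea behind the MDS check can be approximated in base R: put the libraries on a log2 counts-per-million scale, compute distances between them, and place them in two dimensions with classical MDS (`cmdscale`). This is a simplified stand-in, not edgeR's plotMDS algorithm (which, in current versions, uses distances based on leading log-fold-changes over the most variable tags); `mds_libraries`, the prior count of 0.5 and the simulated matrix are assumptions for illustration.

```r
# Simplified MDS of libraries: classical MDS on Euclidean distances between
# samples, computed on a log2 counts-per-million scale over all tags.
mds_libraries <- function(counts, prior = 0.5) {
  lib_size <- colSums(counts)
  # log2 CPM; the small prior count keeps zero counts finite
  logcpm <- log2(sweep(counts + prior, 2, lib_size, "/") * 1e6)
  # dist() works on rows, so transpose to get one row per library
  cmdscale(dist(t(logcpm)), k = 2)
}

# Hypothetical data: 1000 tags, two control and two infected libraries
set.seed(2)
counts <- matrix(rnbinom(4000, mu = 50, size = 5), ncol = 4,
                 dimnames = list(NULL, c("C1", "C2", "I1", "I2")))
coords <- mds_libraries(counts)
coords
```

Plotting the two columns of `coords` (for example `plot(coords, type = "n")` followed by `text(coords, labels = rownames(coords))`) gives the usual MDS picture; libraries of the same group that sit far apart along dimension 1 point to the kind of within-group inconsistency described above.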

Hi Anton,

I've had a quick look at both batches of your plots. The replicates are quite variable, but something is indeed unusual. Can you post the full set of commands you used?

Cheers,
Mark

On 2010-11-13, at 9:51 PM, A Gossner wrote:
> [...]

------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
------------------------------

Reply posted by Mark Robinson

A Gossner (@a-gossner-4349), UK

Mark Robinson <mrobinson at ...> writes:
> Hi Anton.
>
> I've had a quick look at both batches of your plots. The replicates are quite variable, but something is
> indeed unusual. Can you post your full set of commands used?
>
> Cheers,
> Mark

Hi Mark,

Typically, for this data I just follow the manual examples. The data files are tab-delimited, with one column for the tag sequence and one for the counts, plus a header for each column.

    library(edgeR)

    # Read the sample sheet, then the per-sample tag count files it lists
    targets <- read.delim(file = "Targets.txt", stringsAsFactors = FALSE)
    targets
    d <- readDGE(targets, header = TRUE)
    d

    # Keep tags with a total count above 5 across all libraries
    d <- d[rowSums(d$counts) > 5, ]
    d

    #pdf(file="MDSLND10.pdf")
    plotMDS.dge(d, main = "MDS Plot for LND10 Data", xlim = c(-5, 5))
    #dev.off()

    # Common dispersion
    d <- estimateCommonDisp(d)
    names(d)
    d$samples$lib.size
    d$common.lib.size
    colSums(d$pseudo.alt)
    d$common.dispersion
    sqrt(d$common.dispersion)

    # Exact tests using the common dispersion
    de.common <- exactTest(d)
    topTags(de.common)
    detags.com <- rownames(topTags(de.common)$table)
    d$counts[detags.com, ]
    topTags(de.common, sort.by = "logFC")
    sum(de.common$table$p.value < 0.01) #p-value <0.01
    sum(de.common$table$p.value < 0.05) #p-value <0.05
    top45 <- topTags(de.common, n = 45)
    sum(top45$table$logFC > 0)
    sum(top45$table$logFC < 0)
    top751 <- topTags(de.common, n = 751)
    sum(top751$table$logFC > 0)
    sum(top751$table$logFC < 0)
    # Number of tags with BH-adjusted p-value below 0.05 and 0.1
    sum(p.adjust(de.common$table$p.value, method = "BH") < 0.05)
    sum(p.adjust(de.common$table$p.value, method = "BH") < 0.1)
    detags500.com <- rownames(topTags(de.common, n = 500)$table)
    plotSmear(d, de.tags = detags500.com, main = "FC plot using common dispersion")
    abline(h = c(-2, 2), col = "dodgerblue", lwd = 2)

    #Moderated tagwise dispersion
    d <- estimateTagwiseDisp(d, prior.n = 10)
    names(d)
    d$prior.n
    head(d$tagwise.dispersion)
    summary(d$tagwise.dispersion)
    d$common.dispersion

    #Estimating smoothing using an approximate eBayes rule
    prior.n <- estimateSmoothing(d)
    prior.n

    #exactTest using the tagwise dispersions
    de.tagwise <- exactTest(d, common.disp = FALSE)
    topTags(de.tagwise)
    topTags(de.tagwise, n = 10, sort.by = "logFC")
    sum(de.tagwise$table$p.value < 0.01) #p-value <0.01
    sum(de.tagwise$table$p.value < 0.05) #p-value <0.05
    top12 <- topTags(de.tagwise, n = 12)
    sum(top12$table$logFC > 0)
    sum(top12$table$logFC < 0)
    top537 <- topTags(de.tagwise, n = 537)
    sum(top537$table$logFC > 0)
    sum(top537$table$logFC < 0)
    detags.tgw <- rownames(topTags(de.tagwise)$table)
    detags.com <- rownames(topTags(de.common)$table)
    d$counts[detags.tgw, ]
    d$counts[detags.com, ]

    # Overlap between the tagwise and common top-tag lists
    sum(rownames(topTags(de.tagwise)$table) %in% rownames(topTags(de.common)$table))
    sum(rownames(topTags(de.tagwise, n = 100)$table) %in% rownames(topTags(de.common, n = 100)$table))
    sum(rownames(topTags(de.tagwise, n = 1000)$table) %in% rownames(topTags(de.common, n = 1000)$table))/1000 * 100

    # BH-adjusted counts for both analyses
    sum(p.adjust(de.common$table$p.value, method = "BH") < 0.05)
    sum(p.adjust(de.common$table$p.value, method = "BH") < 0.1)
    sum(p.adjust(de.tagwise$table$p.value, method = "BH") < 0.05)
    sum(p.adjust(de.tagwise$table$p.value, method = "BH") < 0.1)

    # Smear plots highlighting the top 500 tags from each analysis
    detags500.com <- rownames(topTags(de.common, n = 500)$table)
    detags500.tgw <- rownames(topTags(de.tagwise, n = 500)$table)
    par(mfcol = c(2, 1))
    plotSmear(d, de.tags = detags500.com, main = "Using common dispersion")
    abline(h = c(-2, 2), col = "dodgerblue", lwd = 2)
    plotSmear(d, de.tags = detags500.tgw, main = "Using tagwise dispersions")
    abline(h = c(-2, 2), col = "dodgerblue", lwd = 2)
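The commands above do not call calcNormFactors(), which Gordon suggested for the one-directional differential expression. As a rough illustration of what composition normalization does, the base-R sketch below computes upper-quartile scaling factors: each library's 75th-percentile tag proportion, rescaled so the factors have geometric mean 1. edgeR's calcNormFactors() defaults to TMM and also offers an upper-quartile method; `uq_norm_factors` is a simplified, hypothetical helper written for this example, not edgeR's implementation, and the two-library data are simulated.

```r
# Simplified upper-quartile normalization factors for a tags x samples
# count matrix (illustrative only; not edgeR's calcNormFactors code).
uq_norm_factors <- function(counts, p = 0.75) {
  counts <- counts[rowSums(counts) > 0, , drop = FALSE]  # drop all-zero tags
  lib_size <- colSums(counts)
  prop <- sweep(counts, 2, lib_size, "/")                # within-library proportions
  uq <- apply(prop, 2, quantile, probs = p)              # upper quartile per library
  uq / exp(mean(log(uq)))                                # rescale to geometric mean 1
}

# Hypothetical example: library 2 equals library 1 except that ten tags
# soak up a large share of its reads
set.seed(3)
lib1 <- rnbinom(1000, mu = 20, size = 10)
lib2 <- lib1
lib2[1:10] <- lib2[1:10] + 5000
counts <- cbind(lib1, lib2)

f <- uq_norm_factors(counts)
eff_lib_size <- colSums(counts) * f   # effective library sizes
f
eff_lib_size
```

In the simulated example the library dominated by a few very abundant tags gets a factor below 1, so its effective library size shrinks and its remaining tags are no longer made to look uniformly down-regulated.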
