edgeR and tagwise dispersion: overcorrection for multiple tests?


Gordon Smyth

WEHI, Melbourne, Australia

Dear Alessandro,

I haven't seen the MDS plots (because attachments are not distributed to the list), but I don't see anything surprising in what you have reported. If you compare one group (all C) with only those members of the other group that are most different from it (1R+3R), naturally you will find lots of DE genes.

Best wishes
Gordon

> Date: Thu, 12 Jul 2012 10:48:01 +0200
> From: "alessandro.guffanti at genomnia.com" <alessandro.guffanti at genomnia.com>
> To: Bioconductor mailing list <bioconductor at r-project.org>
> Subject: Re: [BioC] edgeR and tagwise dispersion: overcorrection for multiple tests?
>
> Dear colleagues, good morning. I am back to an old issue because I am now much more certain of what I see, and I begin to wonder whether this is due to biology rather than to analytical tools or strategies.
>
> => Here is my sessionInfo() to begin with:
>
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices datasets utils methods base
>
> other attached packages:
> [1] edgeR_2.6.7 limma_3.12.1 R.utils_1.12.1 R.oo_1.9.8
> [5] R.methodsS3_1.4.2
>
> => The experiment: RNA from five samples and five controls; mice, homogeneous stimulus, brain tissue; SAGE on SOLiD with good mapping in the UTR (also checked with genome-wide mapping). Tags were selected with the following parameters: only in UTR; unique mapping; only one mismatch; beginning with CATG, hence quite stringent. The samples are labelled {1 to 5}R for the stimulus and {1 to 5}C for the controls.
>
> => MDS plots and simple pairwise regression analysis of the tag counts (R vs C, R vs R and C vs C) reveal a clear division of the R samples into two groups: {1R, 3R} and {2R, 4R, 5R}. In addition, one C sample (3C) overlaps with two R samples and is removed from the comparisons.
>
> => Three DEG calculations were performed:
> (A) all C vs all R;
> (B) all C minus 3C vs 1R + 3R;
> (C) all C minus 3C vs {2R, 4R, 5R}.
>
> => Tagwise dispersion; normalization factors calculated for the libraries; filtering by minimal CPM in samples leaves between 6000 and 7000 genes for each comparison.
>
> => The results, which make me wonder about what is happening in the R (experiment) samples:
>
> Comparison A (ALL vs ALL): TWO genes with significant FDR (BH-corrected PValue, as I understand it)
> Comparison B (ALL-3C vs 1R,3R): 2099 genes with significant FDR (!)
> Comparison C (ALL-3C vs 2R,4R,5R): 20 genes with significant FDR
>
> Now, excuse my ignorance, but this is a rather strong effect on the FDR from subsetting one of the two comparison datasets, which I have not found in many other similar analyses. In fact, when I first mailed the list, I was talking about 'overcorrection for multiple tests'.
>
> Is there any reasonable explanation for this (apart from {1R,3R} and {2R,4R,5R} being totally different samples, which I exclude)? Maybe a strong dependency between the genes involved in the response to the stimulus in the two R subgroups?
>
> I include below the three MDS plots. Thanks for any answer, and again excuse me; maybe there is a trivial reason for this (such as the number of samples), but it is a unique situation among my many SAGE experiments analyzed with edgeR.
>
> Kind regards,
>
> Alessandro
>
> --
> Alessandro Guffanti - Head, Bioinformatics, Genomnia srl
> Via Nerviano, 31 - 20020 Lainate, Milano, Italy
> Ph: +39-0293305.702 Fax: +39-0293305.777
> http://www.genomnia.com
> "When you're curious, you find lots of interesting things to do."
> (Walt Disney)

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}

SAGE Regression edgeR BRAIN

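Alessandro's pipeline keeps only genes that pass a minimum counts-per-million (CPM) filter, leaving 6000 to 7000 genes per comparison, but the thread does not say which cutoff was used or in how many samples it must be met. The sketch below shows the general shape of such a filter in plain Python. The cutoff `min_cpm` and the sample count `min_samples` are hypothetical placeholders, and the library size is taken as the raw column total with normalization factors ignored; in edgeR the same step is usually written with `cpm()` applied to the `DGEList`.

```python
def cpm(counts):
    """Counts per million for a genes x samples matrix given as a list of rows.

    Library sizes are the raw column totals; normalization factors such as
    TMM are ignored in this sketch.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [[row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)] for row in counts]


def filter_by_cpm(counts, min_cpm, min_samples):
    """Indices of genes with CPM >= min_cpm in at least min_samples samples."""
    return [
        i for i, row in enumerate(cpm(counts))
        if sum(value >= min_cpm for value in row) >= min_samples
    ]


# Toy data: 3 genes x 3 samples with library sizes 1e6, 2e6 and 5e5.
counts = [
    [10, 20, 0],                    # ~10 CPM in the first two samples
    [999_980, 1_999_980, 500_000],  # filler so the library sizes are round numbers
    [10, 0, 0],                     # ~10 CPM in one sample only
]
# Hypothetical cutoff: at least 5 CPM in at least 2 samples.
print(filter_by_cpm(counts, min_cpm=5, min_samples=2))  # [0, 1]
```

The first gene has different raw counts (10 and 20) but the same CPM in the first two samples, which is why the filter is applied on the CPM scale rather than on raw counts.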

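Alessandro reads the FDR column as Benjamini-Hochberg (BH) adjusted p-values, which is what edgeR's `topTags()` reports by default (`adjust.method = "BH"`). The sketch below is a minimal re-implementation of the BH step-up rule, intended to mirror R's `p.adjust(p, method = "BH")` for p-values without missing values. It shows that the adjusted values are a fixed function of the p-values and of the number of tests m, and nothing else.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Step-up rule: sort p-values, scale the i-th smallest by m / i, then take a
    running minimum from the largest p-value downwards so the adjusted values
    stay monotone. The running minimum starts at 1.0, which also caps them at 1.
    """
    m = len(pvalues)
    # Indices ordered from the largest p-value to the smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
print(bh_adjust(pvals))  # approximately [0.02, 0.04, 0.04, 0.02]
```

Since m is similar (6000 to 7000 genes) in all three comparisons, this suggests the gap between 2, 2099 and 20 significant genes comes from the unadjusted p-values rather than from a harsher multiple-testing correction in one comparison.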

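Gordon's explanation can be made concrete with a single invented gene. This sketch is not edgeR's method. It uses Welch's t-statistic on made-up log-scale expression values as a simple stand-in for edgeR's negative binomial test with tagwise dispersions, and 3C is dropped from all three comparisons for simplicity. The hypothetical gene responds only in the {1R, 3R} subgroup. Pooling all five R samples dilutes the mean shift and inflates the R-group variance. Restricting the R group to the two samples most different from the controls has the opposite effect.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for mean(b) - mean(a), using sample variances (n - 1)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se


# Invented log-scale values for one hypothetical gene; 3C is excluded throughout.
control = [5.0, 5.2, 4.8, 5.0]               # 1C, 2C, 4C, 5C
responders = {"1R": 7.0, "3R": 7.2}          # subgroup where the gene moves
others = {"2R": 5.1, "4R": 4.9, "5R": 5.0}   # subgroup where it does not

all_r = list(responders.values()) + list(others.values())
t_all_r = welch_t(control, all_r)                          # A-like comparison
t_subset_b = welch_t(control, list(responders.values()))   # B-like comparison
t_subset_c = welch_t(control, list(others.values()))       # C-like comparison
print(f"A-like: {t_all_r:.2f}  B-like: {t_subset_b:.2f}  C-like: {t_subset_c:.2f}")
# prints: A-like: 1.61  B-like: 16.27  C-like: 0.00
```

Repeated over thousands of genes, the same effect would tend to give many more small p-values in a B-like comparison than in an A-like one, even though B uses fewer samples. That is consistent with Gordon's reading of the result; the toy numbers say nothing about Alessandro's actual data.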

Martin Morgan

United States

On 07/12/2012 11:34 PM, Gordon K Smyth wrote:
> Dear Alessandro,
>
> I haven't seen the MDS plots (because attachments are not distributed to
> the list), but don't see anything surprising in what you have reported.

Actually, some attachments are (this was a recent realization on our part, too!). The posting guide

http://bioconductor.org/help/mailing-list/posting-guide/

now says "The following attachment types are accepted: png, pdf, rda/Rdata. Total message size cannot exceed 1MB".

Martin


Hi Martin,

Thanks for info -- I assume it applies to individual emails. I get the mailing list in daily digest form, and I have never known there to be an attachment to that. There are also no attachments linked to the archived posts as far as I have seen:

https://stat.ethz.ch/pipermail/bioconductor/2012-July/date.html

That is how I would expect it to be -- I don't really want to be barraged by attachments every day.

Gordon

On Fri, 13 Jul 2012, Martin Morgan wrote:
> On 07/12/2012 11:34 PM, Gordon K Smyth wrote:
>> Dear Alessandro,
>>
>> I haven't seen the MDS plots (because attachments are not distributed to
>> the list), but don't see anything surprising in what you have reported.
>
> Actually, some attachments are (this was a recent realization on our part,
> too!). The posting guide
>
> http://bioconductor.org/help/mailing-list/posting-guide/
>
> now says "The following attachment types are accepted: png, pdf, rda/Rdata.
> Total message size cannot exceed 1MB".
>
> Martin
>
[original posting deleted]
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793

Gordon Smyth


On Sat, Jul 14, 2012 at 7:26 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
> Hi Martin,
>
> Thanks for info -- I assume it applies to individual emails. I get the
> mailing list in daily digest form, and I have never known there to be an
> attachment to that. There are also no attachments linked to the archived
> posts as far as I have seen:
>
> https://stat.ethz.ch/pipermail/bioconductor/2012-July/date.html
>

Attachments do show up in the archives, see:
https://stat.ethz.ch/pipermail/bioconductor/2012-June/046478.html

> That is how I would expect it to be -- I don't really want to be barraged by
> attachments every day.

I do not believe that attachments show up in the daily digest emails.

Dan

>
> Gordon
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

Dan Tenenbaum
