EdgeR questions in analyzing 454 data-about prior.n, TMM, and p_value

Ying Ye ▴ 10

@ying-ye-4304

Last seen 9.7 years ago

Dear edgeR users and developers,

I have a few questions about edgeR, which I have recently been using for 454 pyrosequencing data:

1. prior.n
According to the user's manual, prior.n should not be set too low in the moderated tagwise dispersion approach. But in my dataset there are more than 15 samples in each comparison group, and the degrees of freedom are larger than 30. prior.n <- estimateSmoothing(d) gives 0.0005329. So I am wondering whether I could use 0.0005329, since I have a rather large number of samples in each group, or whether I should set prior.n to 10 as the manual suggests.

2. TMM
I am not sure whether this is also applicable to 454 microbiota data. I suppose I should do TMM normalization as well, since the normalization factors for my samples vary widely (f ranges from 0.41 to 4.58). Is that right?

3. p_value
In your experience, is it reasonable and reliable to use p_value < 0.05 as the significance criterion, or is only < 0.01 reliable?

I am a new user of this package and hope you can give some suggestions. Many thanks!

Ying Ye

Normalization edgeR • 887 views

updated 13.6 years ago by Gordon Smyth 50k • written 13.6 years ago by Ying Ye ▴ 10

Mark Robinson ★ 1.1k

@mark-robinson-2171

Last seen 9.7 years ago

Hi Ying.

Some comments below.

On 2010-10-18, at 10:22 PM, Ying Ye wrote:

> Dear edgeR users and developers,
>
> I have few questions about edgeR when recently I use it for 454
> pyrosequencing data:
>
> 1. prior.n
> According to users' manual, we may not use too low prior.n in
> moderated tagwise dispersion approach. But in my dataset, there are
> more than 15 samples in each comparison group and the freedom is
> larger than 30. prior.n <- estimateSmoothing(d) gives 0.0005329. So I
> am wondering if I could use 0.0005329 since I have rather big number
> of samples in each group. Or I should adjust prior.n into 10 according
> to the manual's suggestion.

Well, it's hard to give a prescription for prior.n that suits all datasets. Since you have so many degrees of freedom, you shouldn't need prior.n as high as 10. You might try something lower, say 1-3.

> 2. TMM
> I am not sure if this is also applicable to 454 microbiota data.
> I suppose I should do TMM normalization as well since the
> normalization factors from my samples have a big variation (f is from
> 0.41 to 4.58). Is that right?

I must admit that I'm not intimately aware of all the nuances of microbiota data, but I will say that the factors you mention are more extreme, at both the low and the high end, than we generally see in RNA-seq data. I'd say it's probably best to look at some "smear" plots, through maPlot() for example, to assess whether the TMM normalization is appropriately capturing shifts due to composition or the like. As always for exploratory analysis, it would be good to look at multidimensional scaling plots; see plotMDS.dge(). There is no substitute for looking at your data.

> 3. p_value
> According to your experience, is it reasonable and reliable to
> use p_value < 0.05 as significance criteria? or only <0.01 can be
> reliable.

First off, you'll probably want to do some multiple testing correction, which can be done through the topTags() function. As to where to set the threshold on significance, that is a matter of your false discovery tolerance: the status quo is 5%, but you may want to be more or less stringent.

Hope that helps.
Mark

Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
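To illustrate why the multiple testing correction matters before picking a cutoff, here is a minimal sketch in base R. It uses `p.adjust()` from the stats package with the Benjamini-Hochberg method rather than edgeR itself, and the p-values are invented toy numbers, not results from any real dataset.

```r
# Toy raw p-values, invented purely for illustration
p_raw <- c(0.001, 0.008, 0.02, 0.045, 0.3)

# Benjamini-Hochberg adjustment controls the false discovery rate (FDR)
p_bh <- p.adjust(p_raw, method = "BH")

# Compare calls at a raw 0.05 cutoff with calls at 5% FDR
n_raw_hits <- sum(p_raw < 0.05)  # tags passing the unadjusted cutoff
n_fdr_hits <- sum(p_bh < 0.05)   # tags passing after BH adjustment

print(data.frame(p_raw = p_raw, p_bh = p_bh))
```

In this toy example one tag passes the raw 0.05 cutoff but not the 5% FDR cutoff, which is the kind of difference the threshold choice has to account for.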

written 13.6 years ago by Mark Robinson ★ 1.1k

Gordon Smyth 50k

@gordon-smyth

Last seen 9 hours ago

WEHI, Melbourne, Australia

Dear Ying Ye,

Just adding to one of Mark's comments, see below.

> Date: Tue, 19 Oct 2010 09:43:10 +1100
> From: Mark Robinson <mrobinson at wehi.edu.au>
> To: Ying Ye <mikecrux at gmail.com>
> Cc: Bioc-sig-sequencing at r-project.org, bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] [Bioc-sig-seq] EdgeR questions in analyzing 454
> data-about prior.n, TMM, and p_value
>
> Hi Ying.
>
> Some comments below.
>
> On 2010-10-18, at 10:22 PM, Ying Ye wrote:
>
>> Dear edgeR users and developers,
>>
>> I have few questions about edgeR when recently I use it for 454
>> pyrosequencing data:
>>
>> 1. prior.n
>> According to users' manual, we may not use too low prior.n in
>> moderated tagwise dispersion approach. But in my dataset, there are
>> more than 15 samples in each comparison group and the freedom is larger
>> than 30. prior.n <- estimateSmoothing(d) gives 0.0005329. So I am
>> wondering if I could use 0.0005329 since I have rather big number of
>> samples in each group. Or I should adjust prior.n into 10 according to
>> the manual's suggestion.
>
> Well, its hard to give a prescription for prior.n for all datasets.
> Since you have so many degrees of freedom, you shouldn't need prior.n as
> high as 10. You might try something lower, say 1-3.

Just to refine this, how many degrees of freedom do you have per tag? Let's define df = number of libraries - number of groups. I would suggest you choose your prior.n so that prior.n * df is around 50, but don't go below prior.n = 1.

We are not recommending estimateSmoothing() at the moment because it gives variable results on next-generation sequencing data. The estimateSmoothing() value for your data is too small to be recommended.

Best wishes
Gordon
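The rule of thumb above (prior.n * df around 50, with a floor of 1) can be sketched in a few lines of base R. The helper name `suggest_prior_n` and its arguments are hypothetical and not part of edgeR, and the library counts in the examples are assumptions chosen to match "more than 15 samples in each comparison group" with two groups.

```r
# Hypothetical helper (not an edgeR function): choose prior.n so that
# prior.n * df is roughly `target`, without going below `floor_n`.
suggest_prior_n <- function(n_libraries, n_groups, target = 50, floor_n = 1) {
  df <- n_libraries - n_groups  # df = number of libraries - number of groups
  stopifnot(df > 0)
  max(floor_n, target / df)
}

# Assumed design: 2 groups of 16 libraries each, so df = 30
prior_n_small <- suggest_prior_n(n_libraries = 32, n_groups = 2)

# Assumed larger design: df = 100, where the floor of 1 takes over
prior_n_large <- suggest_prior_n(n_libraries = 102, n_groups = 2)

print(c(prior_n_small, prior_n_large))
```

Under these assumptions the first design gives a value a little under 2, which sits inside the 1-3 range Mark suggested, while a tiny value such as the one returned by estimateSmoothing() would fall well below the floor.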

written 13.6 years ago by Gordon Smyth 50k
