Question

Unequally spaced replicates in limma


michael watson IAH-C ★ 3.4k


Hi,

I have varying numbers of replicates, and they are not regularly spaced on the array. I would like a list of differentially expressed genes that is averaged over replicates, so I assume the best thing to do is normalise my data, average over replicates in the MAList object, and then pass the averaged data to lmFit() etc.

Is that right?

Cheers,
Mick


updated 19.7 years ago by Elizabeth Brooke-Powell ▴ 160 • written 19.7 years ago by michael watson IAH-C ★ 3.4k

score 0 · Answer 1 · 2004-09-01

> As I have varying numbers of replicates, and they are not regularly
> spaced on the array, and given that I would like a list of
> differentially expressed genes which is averaged over replicates,

I assume that these are within-array replicates.

> I assume the best thing to do is normalise my data, and then average over
> replicates in the MAList object, and then pass the averaged data to
> lmFit() etc?

Yes, you could do that. It does raise subtle issues though concerning how the variance of the averages depends on the number of replicates. You might like to compute weights based on the number of replicates for each probe and pass that to lmFit also.

Gordon
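The replicate-count weights suggested here might be sketched as follows. This is a minimal illustration on simulated log-ratios, not the poster's data: the names `gene_id`, `M`, `M_avg`, `n_reps` and `W` are hypothetical, and using the raw replicate count as the weight assumes that replicate spots are independent with equal variance, so that an average of n spots has variance sigma^2/n.

```r
set.seed(1)

# Simulated data (assumption): 100 spots on 4 arrays, 40 genes spotted
# 1 to 4 times each, with replicates scattered irregularly over the array.
gene_id <- sample(rep(paste0("g", 1:40), times = rep(1:4, 10)))
M <- matrix(rnorm(length(gene_id) * 4), ncol = 4,
            dimnames = list(NULL, paste0("array", 1:4)))

# Sum the spots of each gene within each array, then divide by the
# number of replicate spots for that gene.
M_sum  <- rowsum(M, group = gene_id)
n_reps <- as.vector(table(gene_id)[rownames(M_sum)])
names(n_reps) <- rownames(M_sum)
M_avg  <- M_sum / n_reps          # row i is divided by n_reps[i]

# Weight matrix: an average of n spots gets weight n on every array.
W <- matrix(n_reps, nrow = nrow(M_avg), ncol = ncol(M_avg),
            dimnames = dimnames(M_avg))

# Fit only if limma is installed. The intercept-only design (an assumption)
# tests whether the mean log-ratio of each gene differs from zero.
if (requireNamespace("limma", quietly = TRUE)) {
  design <- cbind(Intercept = rep(1, ncol(M_avg)))
  fit <- limma::eBayes(limma::lmFit(M_avg, design, weights = W))
  print(limma::topTable(fit, number = 5))
}
```

Because the weight is constant within a gene, it does not change that gene's estimated log-ratio; it rescales the residual variance so that genes with different replicate counts are on a comparable scale when eBayes() pools variances. Recent limma releases also provide avereps() for averaging irregularly spaced replicates, though the replicate counts would still need to be computed separately if weights are wanted.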

score 0 · Answer 2 · 2004-09-02

Thanks Gordon,

Actually, when I did this, I got some odd results.

If I ran lmFit(), eBayes() and topTable() on my data set on a per-spot basis, I found ~800 SPOTS with a p-value <= 0.05. Most of my genes are replicated in duplicate on the arrays (within-array replicates), and when I averaged over those replicates and fed that data into lmFit(), eBayes() and topTable(), I got ~1100 GENES with a p-value <= 0.05.

Does this suggest that after averaging over replicate spots, the measurements for my genes are more tightly distributed than the individual spots were?

Cheers,
Mick

-----Original Message-----
From: Gordon K Smyth [mailto:smyth@wehi.EDU.AU]
Sent: 01 September 2004 23:12
To: michael watson (IAH-C)
Cc: bioconductor@stat.math.ethz.ch
Subject: Re: [BioC] Unequally spaced replicates in limma

score 0 · Answer 3 · 2004-09-02

At 07:23 PM 2/09/2004, michael watson (IAH-C) wrote:
> Actually when I did this, I got some odd results.

The results look to me as you would hope for and expect.

> If I ran lmFit(), eBayes() and topTable() on my data set on a per-spot
> basis, I found ~800 SPOTS with a p-value <= 0.05. Now most of my genes
> are replicated in duplicate on the arrays (within-array replicates) and
> when I averaged over those replicates, and used that data to feed into
> lmFit(), eBayes() and topTable() I got ~1100 GENES with a p-value <= 0.05.
>
> Does this suggest that after averaging over replicate spots, the
> measurements for my genes are more tightly distributed than the
> individual spots were..?

1. You've reduced the number of genes by half, hence you do only half the adjustment for multiple testing, hence you end up with lower p-values.

2. You'd certainly hope that averages are more tightly distributed than the individual spots; that's why averaging is a good thing.

If your genes are virtually all in duplicate, and the others have an even number of reps, you could sort your MA object by gene ID and then use duplicateCorrelation() with ndups=2 and spacing=1.

Gordon
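The multiple-testing effect in point 1 can be illustrated with base R's p.adjust(). The sketch below assumes a Holm adjustment and uses made-up raw p-values and test counts (2000 spots versus 1000 genes); the thread does not state which adjustment method was used or how many spots were on the arrays.

```r
# Hypothetical raw p-values for the four most significant tests.
p_raw <- c(1e-5, 4e-5, 2e-4, 1e-3)

# The same raw p-values, adjusted as if they came from 2000 spot-level
# tests or from 1000 gene-level tests (both counts are assumptions).
adj_spots <- p.adjust(p_raw, method = "holm", n = 2000)
adj_genes <- p.adjust(p_raw, method = "holm", n = 1000)

# Compare the two sets of adjusted p-values side by side.
print(data.frame(p_raw, adj_spots, adj_genes))
```

With fewer tests the adjustment is milder, so an identical raw p-value gives a smaller adjusted p-value. More genes than spots can therefore fall under 0.05 even before any gain in precision from averaging.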

score 0 · Answer 4 · 2004-09-02

Hi Gordon,

Is the solution of sorting the table available in LimmaGUI? Should I re-sort the input files so that the replicates are taken into account using ndups=2 and spacing=1? What happens to the replicates if you have no spot weighting: are they just averaged?

Thank you for your help,
Liz

------------------------------
Date: Thu, 02 Sep 2004 19:44:34 +1000
From: Gordon Smyth <smyth@wehi.edu.au>
Subject: RE: [BioC] Unequally spaced replicates in limma
To: "michael watson (IAH-C)" <michael.watson@bbsrc.ac.uk>
Cc: bioconductor@stat.math.ethz.ch
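For the command-line route discussed in this thread, sorting by gene ID and then calling duplicateCorrelation() with ndups=2 and spacing=1 might look like the sketch below. It runs on simulated data in which every gene is spotted exactly twice; `gene_id`, `M`, `M_sorted`, `id_sorted` and the intercept-only design are all hypothetical, and it does not address how this maps onto LimmaGUI input files.

```r
set.seed(42)
n_genes  <- 200
n_arrays <- 6

# Simulated layout (assumption): every gene spotted twice, at scattered positions.
gene_id <- sample(rep(paste0("gene", seq_len(n_genes)), each = 2))

# Duplicate spots share a per-array gene effect, plus independent spot noise.
gene_effect <- matrix(rnorm(n_genes * n_arrays), nrow = n_genes,
                      dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
M <- gene_effect[gene_id, ] +
  matrix(rnorm(2 * n_genes * n_arrays, sd = 0.5), ncol = n_arrays)
rownames(M) <- NULL

# Sort rows by gene ID so that the two spots of each gene are adjacent.
o         <- order(gene_id)
M_sorted  <- M[o, ]
id_sorted <- gene_id[o]
first     <- seq(1, length(id_sorted), by = 2)
stopifnot(all(id_sorted[first] == id_sorted[first + 1]))

if (requireNamespace("limma", quietly = TRUE) &&
    requireNamespace("statmod", quietly = TRUE)) {
  design <- cbind(Intercept = rep(1, n_arrays))
  # Estimate the correlation between adjacent duplicate spots.
  corfit <- limma::duplicateCorrelation(M_sorted, design, ndups = 2, spacing = 1)
  # Use that correlation in the fit, which returns one row per gene.
  fit <- limma::lmFit(M_sorted, design, ndups = 2, spacing = 1,
                      correlation = corfit$consensus.correlation)
  fit$genes <- data.frame(ID = id_sorted[first])
  fit <- limma::eBayes(fit)
  print(corfit$consensus.correlation)
  print(limma::topTable(fit, number = 5))
}
```

Unlike averaging first, this approach keeps both spots in the model and lets limma account for their correlation, at the cost of requiring the same number of adjacent replicates for every gene.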