Data analysis


Naomi Altman ★ 6.0k

@naomi-altman-380

United States

I also have a data set with differing numbers of spot replications. I used lme to analyze these data, gene by gene.

Basically, I wrote a little function that pulls the spot information out of the array, removes the flagged spots and does other data cleaning, and then runs lme (using "try" in case it bombs). Then I use "split" to split the array data by geneID, and lapply to apply the function to every gene.

Is this slow? Yes. But once it is tested I just get it started on Friday at 5, and by Monday at 9 I have my results.

The major drawback is that I am doing a gene by gene ANOVA. The major advantage is that I can safely remove flagged spots, instead of trying to fudge in some values to maintain the balance.

--Naomi Altman

At 11:40 PM 10/16/2003, Gordon Smyth wrote:
>At 11:53 PM 16/10/2003, Jason Skelton wrote:
>>Gordon Smyth wrote:
>>>I would use the limma commands lmFit (or lm.series or gls.series)
>>>followed by makeContrasts, eBayes and classifyTests. See the earlier posts:
>>
>>Thanks for this information, Gordon. I'll try this and see what results I get.
>>
>>On a different note:
>>The arrays I have tested LIMMA on have 2 duplicates, spaced evenly
>>throughout the array, and so I have no problems running your functions.
>>
>>Someone else at the Sanger Institute would like to be able to use LIMMA,
>>but the number of duplicates for each gene differs on their array, e.g. for
>>some genes there are two copies and for others there would be four copies
>>or more, which in turn obviously affects the spacing between replicates.
>>I'm not sure why they would want differing numbers of copies of genes, but
>>they would like to be able to estimate the correlation between these
>>genes anyway and obviously see the results as one data point per merged gene.
>
>I haven't implemented this in limma because it seems to me that it might
>invalidate the assumptions behind the duplicate correlation approach. See
>the earlier post:
>
>https://stat.ethz.ch/pipermail/bioconductor/2003-August/002224.html
>
>>I've tried to think of how this can be done, but it seems overly complex
>>and I'm not sure if it is at all possible in R or Limma.
>>
>>I'm guessing there is no way of carrying out the correlation, series model
>>fits etc. based simply on the "Name" specified in the GAL files?
>
>No.
>
>Cheers
>Gordon
>
>>or somehow specifying the duplicate number for each gene separately
>>and somehow merging this information for use as a parameter?
>>
>>I'm doubting very much that this can be done at all, but it's worth
>>asking ;-)
>>
>>thanks
>>
>>Jason
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor@stat.math.ethz.ch
>https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor

Naomi S. Altman
Associate Professor, Bioinformatics Consulting Center
Dept. of Statistics, Penn State University
University Park, PA 16802-2111
814-865-3791 (voice), 814-863-7114 (fax), 814-865-1348 (Statistics)
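The gene-by-gene workflow described in this post (clean, split by geneID, lapply a function that wraps lme in try) might look roughly like the sketch below. The data frame `spots`, its column names, the simulated values, and the model itself (a fixed treatment effect with a random intercept per array) are all assumptions made for illustration; the original function and design are not shown in the post.

```r
library(nlme)

# Hypothetical long-format data: one row per spot per array.
set.seed(1)
spots <- expand.grid(array = factor(1:6), rep = 1:4,
                     geneID = c("g1", "g2", "g3"))
# Unequal replication: gene g1 has 2 spots per array, the others have 4.
spots <- spots[!(spots$geneID == "g1" & spots$rep > 2), ]
spots$treatment <- factor(ifelse(as.integer(spots$array) <= 3, "ctrl", "trt"))
array_effect <- rnorm(6, sd = 0.3)
spots$logratio <- array_effect[as.integer(spots$array)] +
  rnorm(nrow(spots), sd = 0.2)
spots$flagged <- runif(nrow(spots)) < 0.1   # simulated quality flags

fit_one_gene <- function(d) {
  # Data cleaning: drop flagged spots, then drop unused factor levels.
  d <- droplevels(d[!d$flagged, , drop = FALSE])
  # try() returns a "try-error" object instead of stopping the whole run
  # when lme fails for one gene.
  try(lme(logratio ~ treatment, random = ~ 1 | array, data = d),
      silent = TRUE)
}

by_gene <- split(spots, spots$geneID)   # one data frame per gene
fits <- lapply(by_gene, fit_one_gene)   # one fit (or error) per gene

# Genes whose fit succeeded.
ok <- !vapply(fits, inherits, logical(1), what = "try-error")
```

Because each gene is fitted separately, nothing in this loop requires the same number of spots per gene, which is the advantage the post points to. The cost is one lme call per gene, hence the long run time.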

limma limma • 764 views

updated 22.1 years ago by Gordon Smyth 53k • written 22.1 years ago by Naomi Altman ★ 6.0k


Gordon Smyth 53k

@gordon-smyth

WEHI, Melbourne, Australia

Hi Naomi,

At 09:14 AM 5/12/2003, Naomi Altman wrote:
>I also have a data set with differing numbers of spot replications. I
>used lme to analyze these data, gene by gene.
>
>Basically, I wrote a little function that pulls the spot information out
>of the array, removes the flagged spots and does other data cleaning, and
>then runs lme (using "try" in case it bombs). Then I use
>"split" to split the array data by geneID, and lapply to apply the
>function to every gene.

You can do this fine, but it is not equivalent to the limma treatment of duplicate spots. Limma does a two-stage analysis, the first stage of which is equivalent to lme().

>Is this slow? Yes. But once it is tested I just get it started on Friday
>at 5, and by Monday at 9 I have my results.
>The major drawback is that I am doing a gene by gene ANOVA. The major
>advantage is that I can safely remove flagged spots, instead of trying to
>fudge in some values to maintain the balance.

If you start with the same number of spot replications, limma allows you to remove flagged spots without any fudging by setting the corresponding spot weights to zero.

Gordon
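The idea of removing flagged spots by giving them zero weight can be illustrated with base R's lm, which is used here only as a stand-in: in limma the weights would be passed through the `weights` argument of lmFit instead. The data frame `d` and its columns are made up for this sketch.

```r
# Hypothetical spots for a single gene; two of them are flagged.
set.seed(2)
d <- data.frame(
  treatment = factor(rep(c("ctrl", "trt"), each = 4)),
  logratio  = rnorm(8),
  flagged   = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# Weight 0 for flagged spots, weight 1 for the rest.
w <- ifelse(d$flagged, 0, 1)

# Fit with zero weights on the flagged spots (layout stays balanced) ...
fit_weighted <- lm(logratio ~ treatment, data = d, weights = w)
# ... and fit after physically deleting the flagged rows.
fit_dropped <- lm(logratio ~ treatment, data = d[!d$flagged, ])

# The estimated coefficients agree.
all.equal(coef(fit_weighted), coef(fit_dropped))
```

The data keep their original balanced layout, so nothing needs to be imputed, yet the flagged spots contribute nothing to the estimates.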

22.1 years ago • Gordon Smyth 53k
