RNA degradation trends & options for analysis


Juanma Vaquerizas ▴ 20


Dear list,

I'm trying to analyse some Affy arrays for my PhD thesis but I'm a little bit stuck, so any comments on the following would be very welcome.

Basically, I'm analysing a set of Affy arrays coming from 10 different labs (3 biological replicates per lab), where each lab is using a different RNA source. I've done some quality control using affyPLM and the chips seem to be OK.

If I look at the RNA digestion plot, 2 different trends are clearly visible (half of the arrays follow one trend with a slope around 1 and the other half with a slope around 3).

I want to make some contrasts between the different RNA sources that have been used, but as I've read in (Bolstad et al., 2005, Bioinformatics and Computational Biology Solutions Using R and Bioconductor, Springer) and in some previous messages on this list, mixing arrays with very different slopes in the RNA digestion plots is not a very good idea.

The options I'm thinking about at the moment are the following:

Option 1:
1.- Split the arrays by the lab of origin.
2.- Preprocess them separately using GCRMA.
3.- Combine the resulting esets into one eset.
4.- Analyse using limma, modeling for 3 factors (RNA type, lab effect, trend in the RNA digestion plot)
5.- Extract the contrasts I am interested in (the RNA type ones)

Option 2:
1.- Split the arrays by the trend of the RNA digestion plot.
2.- Preprocess them separately using GCRMA.
3.- Combine the resulting esets into one eset.
4.- Analyse using limma, modeling for 3 factors (RNA type, lab effect, trend in the RNA digestion plot)
5.- Extract the contrasts I am interested in (the RNA type ones)

Option 3:
1.- Do not split the arrays in groups.
2.- Preprocess all of them using GCRMA.
3.- Analyse using limma, modeling for 3 factors (RNA type, lab effect, trend in the RNA digestion plot)
4.- Extract the contrasts I am interested in (the RNA type ones)

Unfortunately I can't figure out which would be the best way to proceed, or even whether modeling for the trend is something that would be acceptable. I've seen in the vignette of the affycoretools package that the arrays coming from different RNA protocols are preprocessed separately and then combined for the linear model, although it is not clear to me why this option is better than any of the others.

On the other hand, some messages to the list last week were in favour of preprocessing all the experiments at once...

My understanding is that there is no clear consensus about what to do in these cases, and I don't really know the consequences of, and the differences between, the different approaches, so any comments would be very much appreciated.

Thank you very much for your help.

Best wishes,

Juanma.

Juanma Vaquerizas
PhD Student
Regulation Group
EMBL-EBI
Wellcome Trust Genome Campus
Cambridge CB10 1SD
UK
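The three-factor model named in the options can be sketched in R before any arrays are processed. The sample sheet below is hypothetical: it assumes one RNA source per lab and a degradation trend that is constant within a lab, which is only one possible reading of the description above. Under that assumed layout the design matrix is rank deficient, so the three effects could not be estimated separately; with a different layout the result may differ. The limma calls are left as comments because `eset` (the preprocessed data) is not created here.

```r
# Hypothetical sample sheet: 10 labs x 3 replicates, one RNA source per lab
# (an assumption), and a two-level degradation trend constant within a lab.
targets <- data.frame(
  lab   = factor(rep(paste0("lab", 1:10), each = 3)),
  rna   = factor(rep(paste0("src", 1:10), each = 3)),
  trend = factor(rep(c("slope1", "slope3"), each = 15))
)

# Design for the three-factor model described in the options.
design <- model.matrix(~ 0 + rna + lab + trend, data = targets)

# A rank below the number of columns means some coefficients are aliased.
design_rank <- qr(design)$rank
cat("columns:", ncol(design), " rank:", design_rank, "\n")

# A design with RNA source alone is full rank under this layout.
design_rna <- model.matrix(~ 0 + rna, data = targets)

# Sketch of the limma steps; 'eset' is assumed to hold the GCRMA output.
# fit  <- limma::lmFit(eset, design_rna)
# cont <- limma::makeContrasts(rnasrc2 - rnasrc1, levels = design_rna)
# fit2 <- limma::eBayes(limma::contrasts.fit(fit, cont))
```

Checking the rank of the design first is a cheap way to see which of the three factors can actually be separated for a given experimental layout.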

affy limma gcrma affycoretools • 1.5k views

ADD COMMENT • link updated 18.2 years ago by James W. MacDonald 65k • written 18.2 years ago by Juanma Vaquerizas ▴ 20


James W. MacDonald 65k


Juanma Vaquerizas wrote:

> Dear list,
>
> I'm trying to analyse some Affy arrays for my PhD thesis but I'm a
> little bit stuck, so any comments on the following would be very
> welcome.
>
> Basically I'm analysing a set of Affy arrays coming from 10 different
> labs (3 biological replicates per lab) where each lab is using a
> different RNA source. I've done some quality control using affyPLM
> and the chips seem to be ok.

Is this after processing them as one batch? If the residuals look OK, then this is a good indication that you can process them all together.

> If I have a look at the RNA digestion plot, 2 different trends are
> clearly visible (half of the arrays follow one trend with a slope
> around 1 and the other half with a slope around 3).
>
> I want to make some contrasts between the different RNA sources that
> have been used, but as I've read in (Bolstad et al., 2005,
> Bioinformatics and Computational Biology Solutions Using R and
> Bioconductor, Springer) and in some previous messages in this list,
> mixing arrays with very different slopes in the RNA digestion plots
> is not a very good idea.

In my experience, the RNA degradation plots are not nearly as important as the density plots. What do they look like? Are the distributions all pretty similar in shape and fairly close together?

> The options I'm thinking about at the moment are the following:
>
> Option 1:
> 1.- Split the arrays by the lab of origin.
> 2.- Preprocess them separately using GCRMA.
> 3.- Combine the resulting esets into one eset.
> 4.- Analyse using limma, modeling for 3 factors (RNA type, lab
> effect, trend in the RNA digestion plot)
> 5.- Extract the contrasts I am interested in (the RNA type ones)
>
> Option 2:
> 1.- Split the arrays by the trend of the RNA digestion plot.
> 2.- Preprocess them separately using GCRMA.
> 3.- Combine the resulting esets into one eset.
> 4.- Analyse using limma, modeling for 3 factors (RNA type, lab
> effect, trend in the RNA digestion plot)
> 5.- Extract the contrasts I am interested in (the RNA type ones)
>
> Option 3:
> 1.- Do not split the arrays in groups.
> 2.- Preprocess all of them using GCRMA.
> 3.- Analyse using limma, modeling for 3 factors (RNA type, lab
> effect, trend in the RNA digestion plot)
> 4.- Extract the contrasts I am interested in (the RNA type ones)

I would think this is the most reasonable method, if as you say the residuals from affyPLM all look good. One further check you can make is to do a PCA plot of the first two PCs and see how the replicated samples are grouping. If the replicates are all grouping together it may not even be necessary to model the lab effect. You could use plotPCA() in affycoretools to do this step.

> Unfortunately I can't figure out which would be the best way to
> proceed, or even if modeling for the trend is something that would be
> acceptable. I've seen in the vignette of the affycoretools package
> that the arrays coming from different RNA protocols are preprocessed
> separately and then mixed for the linear model, although it is not
> clear for me why is this option better than any of the others.

Well, the example in affycoretools is a very special case and should not be construed as an example that one should use for 'normal' analyses (which makes me wonder if I need a different example).

Anyway, in that vignette the samples have been processed completely differently (one set amplified with the NuGen Ovation kit, and one using the normal Affy IVT kit), so there is no way they should be processed as one batch. I then stick both sets of expression values into one exprSet simply to make the linear modeling step easier. Since I use a cell means model and never make any contrasts between the groups, this analysis is equivalent to keeping the data separate and fitting two separate models.

HTH,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
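The PCA check suggested above can be sketched with base R. This is not the plotPCA() function from affycoretools; it swaps in prcomp() on a simulated expression matrix, and the group labels, matrix sizes and noise levels are all made up for illustration. With real data, `expr` would be the log2 expression matrix (genes in rows, arrays in columns).

```r
set.seed(1)

# Simulated log2 expression matrix (genes x arrays); purely illustrative.
n_genes <- 500
groups  <- factor(rep(c("A", "B", "C"), each = 3))
group_shift <- matrix(rnorm(n_genes * 3), n_genes, 3)[, as.integer(groups)]
expr <- group_shift + matrix(rnorm(n_genes * 9, sd = 0.3), n_genes, 9)
colnames(expr) <- paste0(groups, 1:3)

# prcomp() expects samples in rows, so transpose the matrix.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
pcs <- pca$x[, 1:2]

# Replicates that cluster together suggest the grouping dominates the variation.
plot(pcs, col = as.integer(groups), pch = 19, xlab = "PC1", ylab = "PC2")
text(pcs, labels = rownames(pcs), pos = 3)
```

If replicates from the same RNA source sit close together in the first two components, that is consistent with the suggestion that a separate lab term may be unnecessary.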

ADD COMMENT • link 18.2 years ago James W. MacDonald 65k


One thing I find very useful is to look at

    pairs(pm(myAffydata))

(This takes a while, since you are plotting lots of probes, and I usually put in lower.panel=NULL to get only the upper triangle of plots.)

If the arrays are comparable, then most of the data should cluster pretty tightly on the diagonal.

Incidentally, if some ambitious person would write a pairs routine for hexbin, that would be both faster and more informative.

--Naomi

At 09:16 AM 2/23/2006, James W. MacDonald wrote:
> [snip]

Naomi S. Altman                814-865-3791 (voice)
Associate Professor
Dept. of Statistics            814-863-7114 (fax)
Penn State University          814-865-1348 (Statistics)
University Park, PA 16802-2111
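A minimal sketch of the pairwise scatterplot check, using simulated intensities in place of pm(myAffydata) so that it runs without any CEL files. The matrix `pm_raw` and its dimensions are hypothetical; the log2 transform follows the correction given later in the thread.

```r
set.seed(2)

# Simulated raw PM intensities (probes x arrays), standing in for pm(myAffydata).
base_signal <- rlnorm(2000, meanlog = 6, sdlog = 1)
pm_raw <- sapply(1:4, function(i) base_signal * rlnorm(2000, sdlog = 0.2))
colnames(pm_raw) <- paste0("array", 1:4)

# The log2 scale makes the low-intensity probes visible.
pm_log <- log2(pm_raw)

# lower.panel = NULL draws only the upper triangle of scatterplots.
pairs(pm_log, lower.panel = NULL, pch = ".")
```

Using pch = "." keeps the plot readable and reasonably fast when there are many probes per panel.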

ADD REPLY • link 18.2 years ago Naomi Altman ★ 6.0k


An alternative to the pairs() method mentioned below is

    MAplot(myAffyBatch, pairs=TRUE)

which will give pairwise MA plots (which tend to be a little more useful than plain Y vs X scatterplots for this kind of comparison). However, once you get beyond a relatively small number of arrays this plot becomes unwieldy. In that case you might consider using it this way:

    MAplot(myAffyBatch)

In this case each array will be MA plotted against a "reference" array created using all the chips in the dataset.

Ben

On Thu, 2006-02-23 at 11:25 -0500, Naomi Altman wrote:
> One thing I find very useful is to look at
>
> pairs(pm(myAffydata))
>
> (This takes a while, since you are plotting lots of probes, and I
> usually put in lower.panel=NULL to get only the upper triangle of plots.)
>
> If the arrays are comparable, then most of the data should cluster
> pretty tightly on the diagonal.
>
> Incidentally, if some ambitious person would write a pairs routine
> for hexbin, that would be both faster and more informative.
>
> --Naomi
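To make the idea of plotting each array against a reference concrete, here is a small base R sketch. It assumes the reference is the probe-wise median across arrays, which is an assumption of this sketch rather than a statement about how MAplot() builds its reference. The helper name `ma_vs_reference` and the simulated data are hypothetical.

```r
# M (log ratio) and A (average log intensity) of each array against a
# probe-wise median reference; log_x is a probes x arrays log2 matrix.
ma_vs_reference <- function(log_x) {
  ref <- apply(log_x, 1, median)
  list(M = log_x - ref, A = (log_x + ref) / 2)
}

set.seed(3)
log_x <- matrix(rnorm(3000, mean = 8), ncol = 3,
                dimnames = list(NULL, paste0("array", 1:3)))
ma <- ma_vs_reference(log_x)

# One panel per array; comparable arrays scatter around M = 0.
op <- par(mfrow = c(1, 3))
for (j in seq_len(ncol(log_x))) {
  plot(ma$A[, j], ma$M[, j], pch = ".", xlab = "A", ylab = "M",
       main = colnames(log_x)[j])
  abline(h = 0, col = "red")
}
par(op)
```

The number of panels grows linearly with the number of arrays, rather than quadratically as with all pairwise plots.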

ADD REPLY • link 18.2 years ago Ben Bolstad ★ 1.2k


Hi Naomi,

Naomi Altman wrote:
> One thing I find very useful is to look at
>
> pairs(pm(myAffydata))
>
> [snip]
>
> Incidentally, if some ambitious person would write a pairs routine
> for hexbin, that would be both faster and more informative.

You can use the "smoothScatter" function in geneplotter (thanks to Florian Hahne) for this:

    library("affydata")
    library("geneplotter")
    data("Dilution")

    x = log2(pm(Dilution[, 1:3]))
    pairs(x, panel=smoothScatter, add=TRUE)

Best regards
Wolfgang

-------------------------------------
Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge CB10 1SD
England
Phone: +44 1223 494642
Fax: +44 1223 494486
Http: www.ebi.ac.uk/huber

ADD REPLY • link 18.2 years ago Wolfgang Huber ★ 13k


Thanks. And of course I meant to take log(pm(Affydata), 2), since otherwise it is hard to see the low expression values.

--Naomi

At 02:47 PM 2/23/2006, Wolfgang Huber wrote:
> [snip]

Naomi S. Altman                814-865-3791 (voice)
Associate Professor
Dept. of Statistics            814-863-7114 (fax)
Penn State University          814-865-1348 (Statistics)
University Park, PA 16802-2111

ADD REPLY • link 18.2 years ago Naomi Altman ★ 6.0k


Thank you all very much for all the answers and comments. I really appreciate them.

The residuals were OK when processing all the arrays as one batch, and the MA plots seem to be OK as well. The density plots were also OK, with more or less the same shape and all close together. So I guess I can process them together.

My concern came from the fact that I'm going to use GCRMA, and in the summary step all the chips are used for modeling the expression value of the probeset. I thought that mixing arrays with different trends would make the process less accurate (or more biased) than if all chips had the same slope, although I must say I'm not sure whether this is going to happen or not.

Thanks very much again for all the answers.

Best wishes,

Juanma.

On 23 Feb 2006, at 16:25, Naomi Altman wrote:
> [snip]

Juanma Vaquerizas
PhD Student
Regulation Group
EMBL-EBI
Wellcome Trust Genome Campus
Cambridge CB10 1SD
UK

ADD REPLY • link 18.2 years ago Juanma Vaquerizas ▴ 20
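The point about the summary step using all chips at once can be illustrated with a toy median polish, the kind of probe-by-array fit used by RMA-style summarization. This is a sketch on one hypothetical, noise-free probeset using base R's medpolish(), not the gcrma implementation; the probe and array effects are invented numbers.

```r
# One hypothetical probeset: 5 probes (rows) x 4 arrays (columns), log2 scale,
# built as probe effect + array effect with no noise.
probe_eff <- c(-1, -0.5, 0, 0.5, 1)
array_eff <- c(7, 7.5, 8, 9)
probeset  <- outer(probe_eff, array_eff, "+")

# Median polish fits overall + row (probe) + column (array) effects.
fit <- medpolish(probeset, trace.iter = FALSE)

# RMA-style summary: overall level plus the column (array) effect.
expr_values <- fit$overall + fit$col
print(expr_values)
```

Because the probe effects are estimated jointly from every array in the batch, arrays whose probe-level pattern differs (for example through a different degradation slope) would show up as structured residuals rather than fitting cleanly, which is what the affyPLM residual check discussed above looks for.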
