Plot signal distribution in miRNA arrays


Andrea Grilli ▴ 240

@andrea-grilli-4664

Last seen 10.5 years ago

Italy, Bologna, Rizzoli Orthopaedic Ins…

Hi all, I would like to plot the signal distribution from a miRNA array experiment, with a single plot covering all arrays. Any suggestions on how to do it? Is there an easy way to check how well these data fit a normal distribution (e.g. with a K-S test)? The data come from Agilent Human miRNA v.2 arrays, quantile normalized and log2 transformed. Thanks in advance, Andrea

miRNA miRNA • 1.6k views

updated 14.4 years ago by Davis, Wade ▴ 350 • written 14.4 years ago by Andrea Grilli ▴ 240


Paul Geeleher ★ 1.3k

@paul-geeleher-2679

Last seen 11.3 years ago

hist(data[, arrayNumber], breaks=100)

where data is your matrix of expression values, arrayNumber is the column you want a histogram for, and breaks is the (approximate) number of bars in the histogram. You could also use a density plot, but that applies smoothing, which may not be what you want. The Shapiro-Wilk test is a commonly used test for normality.

Paul.

-- Paul Geeleher (PhD Student), School of Mathematics, Statistics and Applied Mathematics, National University of Ireland Galway, Ireland -- www.bioinformaticstutorials.com
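To make the suggestion above concrete, here is a minimal sketch in R. It assumes a numeric matrix with miRNAs in rows and arrays in columns; the matrix `expr` is simulated stand-in data (not the poster's arrays), and the names `expr`, `array_number`, `sw` and `ks` are made up for illustration.

```r
# Simulated stand-in for a log2, quantile-normalized expression matrix
# (rows = miRNAs, columns = arrays); replace with your own data.
set.seed(1)
expr <- matrix(rexp(800 * 6, rate = 0.5) + 4, nrow = 800, ncol = 6,
               dimnames = list(NULL, paste0("array", 1:6)))

array_number <- 1
x <- expr[, array_number]

# Histogram of one array; `breaks` is a suggested number of cells
hist(x, breaks = 100, main = colnames(expr)[array_number],
     xlab = "log2 signal")

# Shapiro-Wilk test (shapiro.test accepts between 3 and 5000 values)
sw <- shapiro.test(x)

# K-S test against a normal with mean and sd estimated from the data;
# estimating the parameters from the same data makes the p-value approximate
ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

print(sw$p.value)
print(ks$p.value)
```

With several hundred values per array, both tests tend to flag even mild departures from normality, so the histogram is usually the more informative of the two.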

14.4 years ago • Paul Geeleher ★ 1.3k


Davis, Wade ▴ 350

@davis-wade-2803

Last seen 11.3 years ago

Hi Andrea,

As far as plotting goes, you would plot it as you would any other microarray data. Since you are interested in the distribution across genes in a single array, subsetting your expression matrix, as in hist(mydata[,1]), would give you a histogram of the first sample (assuming samples are stored column-wise, as is typical). Or you could use plot(density()) instead of hist().

But my primary reason for responding is to ask why you would assume the distribution would or should be normal across miRNAs in a GIVEN sample. It is hard for me to think why it should be (biologically), and there is no statistical requirement for it to be. (Quantile normalization won't impose normality, but the log2 transform will make the distribution more symmetric.)

I have attached some plots from an Affy miRNA (2.0) mouse experiment that I happened to have open in R right now. The QC pdf contains plots before normalizing, and the other is after normalizing. (Note that it is really not necessary to plot each density after quantile normalization: they must look the same by definition. I just did it since you asked about plotting each one separately.) Notice how the distribution is long-tailed.

BTW, here is sample code for the density plot:

    par(mfrow=c(3,2), pty="s")
    for(i in 1:6){
      plot(density(exprs(mirna.norm_mouse)[,i]))
    }

Wade

Attachments:
myownQC.pdf (application/pdf, 218821 bytes): https://stat.ethz.ch/pipermail/bioconductor/attachments/20110803/972cc92a/attachment.pdf
example.pdf (application/pdf, 68816 bytes): https://stat.ethz.ch/pipermail/bioconductor/attachments/20110803/972cc92a/attachment-0001.pdf
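Since the original question asked for a single plot covering all arrays, one option is to overlay the per-array densities on shared axes. The following is only a sketch in base R under stated assumptions: `expr` is a simulated matrix standing in for the real data (for an ExpressionSet it would be the result of exprs()), and `dens`, `x_range` and `y_max` are illustrative names.

```r
set.seed(2)
# Hypothetical log2 expression matrix: 500 miRNAs (rows) x 6 arrays (columns)
expr <- matrix(rgamma(500 * 6, shape = 2, rate = 0.5) + 3, nrow = 500,
               dimnames = list(NULL, paste0("array", 1:6)))

# One kernel density estimate per array (column)
dens <- lapply(seq_len(ncol(expr)), function(j) density(expr[, j]))

# Common axis limits so that every curve fits in the same panel
x_range <- range(sapply(dens, function(d) range(d$x)))
y_max <- max(sapply(dens, function(d) max(d$y)))

# Empty plot, then one coloured line per array
plot(NA, xlim = x_range, ylim = c(0, y_max),
     xlab = "log2 signal", ylab = "Density",
     main = "Signal distribution, all arrays")
for (j in seq_along(dens)) {
  lines(dens[[j]], col = j)
}
legend("topright", legend = colnames(expr),
       col = seq_along(dens), lty = 1)
```

After quantile normalization the curves should lie almost on top of each other, so the overlay is most useful on the pre-normalization data. boxplot(expr) gives a more compact view of the same comparison, and limma's plotDensities() is a packaged alternative.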

14.4 years ago • Davis, Wade ▴ 350


Andrea Grilli ▴ 240

@andrea-grilli-4664

Last seen 10.5 years ago

Italy, Bologna, Rizzoli Orthopaedic Ins…

Hi Davis,

thank you for your exhaustive reply. You are right, the reason for this check is mainly statistical: after quantile normalization the data were analyzed with a t-test. I know that this kind of test is "robust" to violations of normality, but I wanted to check my data (I know it would have been better to do these steps before the analysis...). The log2 transformation improved the symmetry of my data, but we are still far from a normal distribution.

I have a further question: do you think that the mean of each probe across all arrays could be a good summary of the general distribution of the signals in my data? Or would this approach alter the results?

Thank you so much for your help and for your files. I'm pretty new to Bioconductor and this is the first time I have tried to perform this kind of check.

Andrea

Dr. Andrea Grilli, Laboratory of Experimental Oncology, Rizzoli Orthopaedic Institute, Codivilla Putti Research Center, via di Barbiano 1/10, 40136 Bologna, Italy

14.3 years ago • Andrea Grilli ▴ 240


Hi Andrea,

You are right about the robustness of the t-test. You had asked about the distribution across microRNAs on each array, but based on your questions, I think you should be asking about the distribution across arrays of each microRNA. There is a big difference between those distributions, as you can imagine. Plotting and examining each one of those microRNAs would be tedious.

If your sample size is "large" (large is subjective, but I recall a paper with simulations showing good robustness with n>15 for unimodal data), then you can appeal to the central limit theorem: the distribution of the sample mean is approximately normal regardless of the parent distribution (provided its variance is finite). If the sample size is small (n<5), any conclusion about the distribution based on a K-S test or on plots is very unreliable, in my opinion.

A brief Google search turned up a paper (http://www.ukm.my/jsm/pdf_files/SM-PDF-40-6-2011/15%20NorAishah.pdf) with a nice little simulation study on the power of the 4 most common normality tests. At the smallest sample size they reported (n=39), the BEST power was 28%, where the alternative was a chi-square with 3 df....

Wade
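The point about low power at small sample sizes can be checked with a small simulation. The sketch below is not a reproduction of the cited paper: it only estimates, by simulation, how often the Shapiro-Wilk test rejects normality when the data are drawn from a chi-square distribution with 3 degrees of freedom. The function name `sw_power`, the chosen sample sizes and the number of simulations are all assumptions made for illustration.

```r
set.seed(3)

# Estimated power of the Shapiro-Wilk test at level `alpha` when the data
# truly come from a chi-square distribution with 3 degrees of freedom.
# `n` is the sample size (e.g. number of arrays), `n_sim` the number of
# simulated data sets.
sw_power <- function(n, n_sim = 2000, alpha = 0.05) {
  p <- replicate(n_sim, shapiro.test(rchisq(n, df = 3))$p.value)
  mean(p < alpha)  # fraction of simulations in which normality is rejected
}

sizes <- c(5, 10, 20, 40)
power <- sapply(sizes, sw_power)
names(power) <- paste0("n=", sizes)
print(round(power, 2))
```

With only a handful of arrays per group, a non-significant normality test on a single microRNA therefore says little about whether its distribution is actually normal.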

14.3 years ago • Davis, Wade ▴ 350
