Hi to all,
I would like to plot the signal distribution from a miRNA array
experiment, with a single plot for all arrays. Any suggestions on how
to do this? Is there an easy way to check how well these data fit a
normal distribution (e.g. with a K-S test)?
The data come from Agilent Human miRNA v.2 arrays, quantile normalized
and log2 transformed.
Thanks in advance,
Andrea
hist(data[,arrayNumber], breaks=100)
where data is your matrix of expression values, arrayNumber is the
column you want to plot a histogram for, and breaks is the suggested
number of bars in the histogram. You could also use a density plot,
but that applies smoothing, which may not be what you want.
The Shapiro-Wilk test is a commonly used test for normality.
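A minimal sketch of the histogram and the two normality tests mentioned
in this thread, run on simulated data. The matrix expr_mat, its
dimensions and the variable names are assumptions made for
illustration; substitute your own log2 expression matrix.

```r
# Simulated stand-in for a log2 expression matrix
# (rows = miRNAs, columns = arrays); purely illustrative data.
set.seed(1)
expr_mat <- matrix(rexp(800 * 4, rate = 0.5), nrow = 800, ncol = 4)

array_number <- 1
x <- expr_mat[, array_number]

# Histogram of one array; breaks is a suggestion for the number of cells
hist(x, breaks = 100, main = paste("Array", array_number),
     xlab = "log2 signal")

# Shapiro-Wilk test (shapiro.test accepts between 3 and 5000 values)
sw <- shapiro.test(x)

# Kolmogorov-Smirnov test against a normal with the sample mean and sd.
# Estimating the parameters from the same data makes the p-value
# only approximate.
ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

print(sw$p.value)
print(ks$p.value)
```

With several hundred values per array, both tests will flag even small
departures from normality, so a small p-value here says little about
whether the departure matters in practice.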
Paul.
On Wed, Aug 3, 2011 at 10:52 AM, <andrea.grilli at ior.it> wrote:
--
Paul Geeleher (PhD Student)
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
--
www.bioinformaticstutorials.com
Hi Andrea,
As for plotting, you would plot it as you would any other microarray
data.
Since you are interested in the distribution across genes in a single
array, subsetting your expression matrix, e.g. hist(mydata[,1]), would
give you a histogram of the first sample (assuming samples are stored
column-wise, as is typical). Or you could use
plot(density(mydata[,1])) instead of hist().
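For the original request of a single plot covering all arrays, one
option is to overlay one density curve per column. The sketch below
uses base R only and simulated data; expr_mat and the array names are
hypothetical placeholders for your own matrix.

```r
# Simulated log2 expression matrix (rows = miRNAs, columns = arrays);
# illustrative data only.
set.seed(2)
expr_mat <- matrix(rgamma(800 * 6, shape = 2, rate = 0.5),
                   nrow = 800, ncol = 6,
                   dimnames = list(NULL, paste0("array", 1:6)))

# One kernel density estimate per column
dens <- lapply(seq_len(ncol(expr_mat)),
               function(j) density(expr_mat[, j]))

# Common axis limits so that no curve is clipped
xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- c(0, max(sapply(dens, function(d) max(d$y))))

# Empty frame first, then one line per array
plot(NA, xlim = xlim, ylim = ylim, xlab = "log2 signal",
     ylab = "Density", main = "Signal distribution, all arrays")
for (j in seq_along(dens)) lines(dens[[j]], col = j)
legend("topright", legend = colnames(expr_mat),
       col = seq_along(dens), lty = 1)
```

After quantile normalization the curves should lie almost exactly on
top of each other, so this plot is most informative on the data before
normalization.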
But my primary reason for responding to your question is to ask why
you would assume the distribution would or should be normal across
miRNAs in a GIVEN sample. It is hard for me to think of a biological
reason why it should be, and there is no statistical requirement for
it to be. (Quantile normalization won't impose normality, but the log2
transform will make the distribution more symmetric.)
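The effect of the log2 transform on symmetry can be checked
numerically. The sketch below is an illustration on simulated
long-tailed intensities, not on the data discussed in this thread; the
skewness() helper and the object names are assumptions introduced for
the example.

```r
# Simulated signals: gamma-distributed on the log2 scale, so the raw
# intensities (2^x) have a very long right tail. Illustrative only.
set.seed(3)
log2_signal <- matrix(rgamma(800 * 4, shape = 2, rate = 1),
                      nrow = 800, ncol = 4)
raw_signal <- 2^log2_signal

# Sample skewness: third central moment over the 1.5 power of the
# second central moment (close to 0 for a symmetric distribution)
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skew_raw  <- apply(raw_signal, 2, skewness)
skew_log2 <- apply(log2(raw_signal), 2, skewness)
print(round(rbind(raw = skew_raw, log2 = skew_log2), 2))

# More symmetric does not mean normal: Shapiro-Wilk per array
sw_p <- apply(log2(raw_signal), 2,
              function(x) shapiro.test(x)$p.value)
print(sw_p)
```

In this simulation the log2 values are much less skewed than the raw
intensities, yet the Shapiro-Wilk test still rejects normality for
every array.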
I have attached some plots from an Affy miRNA (2.0) mouse experiment
that I happened to have open in R right now. The QC pdf contains plots
before normalizing, and the other is after normalizing. (Note that it
is really not necessary to plot each density after quantile
normalization; they must look the same by definition. I just did it
since you asked about plotting each one separately.) Notice how the
distribution is long tailed.
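Why the densities must look the same after quantile normalization can
be seen from a simplified base R version of the procedure. This is a
sketch for illustration, not the implementation used by any particular
package; quantile_normalize() is a hypothetical helper and ties are
broken by order of appearance rather than averaged.

```r
# Simulated arrays with different scales; illustrative data only.
set.seed(4)
raw <- matrix(rgamma(500 * 3, shape = 2, rate = 1),
              nrow = 500, ncol = 3)
raw[, 2] <- raw[, 2] * 1.5 + 1   # give array 2 a different scale

quantile_normalize <- function(m) {
  # Rank of every value within its own array
  ranks <- apply(m, 2, rank, ties.method = "first")
  # Reference distribution: mean of the k-th smallest values
  target <- rowMeans(apply(m, 2, sort))
  # Replace each value by the reference value of the same rank
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(m)
  out
}

qn <- quantile_normalize(raw)

# Every column now holds the same set of values, in a different order
sorted <- apply(qn, 2, sort)
print(all.equal(sorted[, 1], sorted[, 2]))
print(all.equal(sorted[, 1], sorted[, 3]))
```

Because every array ends up with the same set of values, any histogram
or density of one normalized array is representative of all of them.
For real analyses, tested implementations such as
normalizeBetweenArrays(method="quantile") in limma are preferable to
this sketch.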
BTW, here is sample code for the density plot:
library(Biobase)  # provides exprs()
# 3 x 2 grid of square panels, one density plot per sample
par(mfrow=c(3,2),pty="s")
for(i in 1:6){
  plot(density(exprs(mirna.norm_mouse)[,i]))
}
Wade
-----Original Message-----
From: andrea.grilli@ior.it [mailto:andrea.grilli@ior.it]
Sent: Wednesday, August 03, 2011 4:53 AM
To: bioconductor at r-project.org
Subject: [BioC] Plot signal distribution in miRNA arrays
-------------- next part --------------
A non-text attachment was scrubbed...
Name: myownQC.pdf
Type: application/pdf
Size: 218821 bytes
Desc: myownQC.pdf
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20110803/972cc92a/attachment.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: example.pdf
Type: application/pdf
Size: 68816 bytes
Desc: example.pdf
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20110803/972cc92a/attachment-0001.pdf>
Hi Davis,
thank you for your thorough reply.
You are right, the reason for this check is mainly statistical: after
quantile normalization the data were analyzed with a t-test. I know
that this kind of test is "robust" to violations of normality, but I
wanted to check my data (I know it would have been better to do these
steps before the analysis...). The log2 transformation improved the
symmetry of my data, but they are still far from a normal
distribution.
I have a further question: do you think that using the mean of each
probe across all arrays could be a good summary of the general
distribution of the signals in my data? Or would this approach distort
the results?
Thank you so much for your help and for your files. I'm pretty new to
Bioconductor and this is the first time I have tried to perform this
kind of check.
Andrea
Quoting "Davis, Wade" <davisjwa at health.missouri.edu>:
Dr. Andrea Grilli
andrea.grilli at ior.it
phone 051/63.66.756
Laboratory of Experimental Oncology
Rizzoli Orthopaedic Institute
Codivilla Putti Research Center
via di Barbiano 1/10
40136 - Bologna - Italy
Hi Andrea,
You are right about the robustness of the t-test.
You had asked about the distribution across microRNAs on each array,
but based on your questions, I think you should be asking about the
distribution across arrays of each microRNA. There is a big difference
between those distributions, as you can imagine. Plotting and
examining each one of those microRNAs would be tedious. If your sample
size is "large" (large is subjective, but I recall a paper with
simulations showing good robustness with n>15 for unimodal data), then
you can appeal to the central limit theorem: the sample mean is
approximately normally distributed, regardless of the parent
distribution. If the sample size is small (n<5), any conclusion about
the distribution based on a KS test or plots is very unreliable, in my
opinion. A brief Google search turned up a paper
(http://www.ukm.my/jsm/pdf_files/SM-PDF-40-6-2011/15%20NorAishah.pdf)
which has a nice little simulation study on the power of the 4 most
common normality tests. At the smallest sample size they reported
(n=39), the BEST power was 28%, where the alternative was a chi-square
with 3 df....
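The point about low power at small sample sizes can be explored with a
small simulation. The sketch below estimates how often the
Shapiro-Wilk test rejects normality when the data really come from a
chi-square distribution with 3 df. It is not a reproduction of the
paper's study; the sample sizes, the number of simulations and the
helper sw_power() are assumptions chosen for illustration.

```r
set.seed(5)

# Estimated power of the Shapiro-Wilk test at level alpha when the
# data come from a chi-square distribution with 3 degrees of freedom
sw_power <- function(n, n_sim = 2000, alpha = 0.05) {
  p_values <- replicate(n_sim,
                        shapiro.test(rchisq(n, df = 3))$p.value)
  mean(p_values < alpha)
}

sample_sizes <- c(5, 10, 20, 50)
power <- sapply(sample_sizes, sw_power)
names(power) <- paste0("n=", sample_sizes)
print(round(power, 2))
```

With only a handful of arrays per group, a non-significant normality
test for a single microRNA is therefore weak evidence that its values
are normally distributed.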
Wade
________________________________________
From: andrea.grilli@ior.it [andrea.grilli@ior.it]
Sent: Thursday, August 04, 2011 4:10 AM
To: Davis, Wade
Cc: bioconductor at r-project.org
Subject: RE: [BioC] Plot signal distribution in miRNA arrays