how to normalize by columns
diego huck
@diego-huck-1271
Hello,

I am a beginner with Bioconductor and R. I am confused about how to do a normalization that consists of obtaining the mean of a column and then subtracting that mean from each value in the column:

x1(1) - mean(col x1)    x2(1) - mean(col x2)
x1(2) - mean(col x1)    x2(2) - mean(col x2)
x1(3) - mean(col x1)    x2(3) - mean(col x2)
....................    ....................

I have the genes in columns and the conditions in rows. I don't want to stabilize the variance. As you can see, it is a very simple calculation. I am wondering whether I could use packages like vsn or affy to do this, or whether it is easier to write a script.

Furthermore, I am not sure whether such a simple normalization is conceptually correct for the purpose of eliminating the effect between arrays. I would also like to know whether I have to repeat the process of calculating the mean of each column and subtracting it some number of times.

Thank you,

diego lugro
student
Universidad de Buenos Aires
Argentina
Normalization affy vsn
David Kipling
@david-kipling-1252
On 26 May 2005, at 07:12, diego huck wrote:

> Hello
>
> I am a beginner with Bioconductor and R. I am confused about how
> to do a normalization that consists of obtaining the mean of a column
> and then subtracting that mean from each value in the column.
> x1(1) - mean(col x1)    x2(1) - mean(col x2)
> x1(2) - mean(col x1)    x2(2) - mean(col x2)
> x1(3) - mean(col x1)    x2(3) - mean(col x2)
> ....................    ....................
>
> I have the genes in columns and the conditions in rows.

That is fine, although unusual. Be aware that many of the BioC (and similar) microarray packages use a rows=genes, columns=samples convention. Although this perhaps wouldn't be the way a statistician would arrange subjects and measurements in a table in R, I think it is partly a historical carry-over from microarray data analysis in spreadsheets and the like. Excel has a 256 column x 65000(ish) row size limit, so you are pretty much stuck with one layout!

If you ever need to rotate your data then this is easy: use the t() function.

    newArray <- t(oldArray)

> I don't want to stabilize the variance.

If you did, the vsn package would do this.

> As you can see, it is a very simple calculation.
> I am wondering whether I could use packages like vsn or affy to do
> this, or whether it is easier to write a script.

You can do this yourself very easily, as this code snippet shows:

    # Make a spoof array of 100 genes (columns) and 20 samples (rows) to demonstrate
    x <- matrix(runif(2000), ncol=100)

    # Calculate the mean of each column.
    # Note: you could use median here to make it slightly more robust.
    # (Named columnMeans so that it does not mask R's built-in colMeans() function.)
    columnMeans <- apply(x, 2, mean)

    # Subtract the column means from each value in that column
    x <- sweep(x, 2, columnMeans, "-")

    # You can do a similar version to subtract the row means; simply
    # change the second argument of both apply() and sweep() to 1.

    # Alternatively, if you wanted to do division as opposed to
    # subtraction, use this in place of the subtraction step:
    x <- sweep(x, 2, columnMeans, "/")

> Furthermore, I am not sure whether such a simple normalization is
> conceptually correct for the purpose of eliminating the effect
> between arrays.
> I would also like to know whether I have to repeat the process of
> calculating the mean of each column and subtracting it some number
> of times.

Subtracting the mean from each column will make the new mean of each column zero, so one cycle is enough.

Hope this helps.

David

Prof David Kipling
Department of Pathology
School of Medicine
Cardiff University
Heath Park
Cardiff CF14 4XN

Tel: 029 2074 4847
Email: KiplingD@cardiff.ac.uk
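The column centering in this reply can be checked with a short base R sketch. The seed, the variable names (col_means, centered, and so on) and the matrix dimensions are assumptions chosen for illustration, not part of the original post. It shows that sweep() and scale(..., scale = FALSE) give the same result, and that a second pass changes nothing, which is why one cycle is enough.

```r
# Sketch only: center each column (gene) of a samples-by-genes matrix.
set.seed(1)                                   # arbitrary seed, for reproducibility only
x <- matrix(runif(2000), ncol = 100)          # 20 rows (samples) x 100 columns (genes)

col_means <- colMeans(x)                      # same values as apply(x, 2, mean)
centered  <- sweep(x, 2, col_means, "-")      # subtract each column's own mean

# scale() with scale = FALSE does the same centering in one call;
# it also attaches the column means as an attribute of the result.
centered2 <- scale(x, center = TRUE, scale = FALSE)

# A second pass subtracts (numerically) zero from every column,
# so the data are unchanged up to floating-point error.
again <- sweep(centered, 2, colMeans(centered), "-")

# If a package expects genes in rows, transpose first.
genes_in_rows <- t(x)                         # 100 rows (genes) x 20 columns (samples)
```

For a robust variant, replacing colMeans(x) with apply(x, 2, median) centers each column on its median instead; the new column means are then no longer exactly zero, but the column medians are.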
Thank you David, these commands were very useful. Thank you Gordon for your comments; I'll go back and review the statistical theory.

Best regards,

diego
@gordon-smyth
WEHI, Melbourne, Australia
> Date: Thu, 26 May 2005 03:12:41 -0300
> From: diego huck <diegolugro@yahoo.com.ar>
> Subject: [BioC] how to normalize by columns
> To: bioconductor@stat.math.ethz.ch
> Message-ID: <429568D9.1050308@yahoo.com.ar>
> Content-Type: text/plain; charset=us-ascii; format=flowed
>
> Hello
>
> I am a beginner with Bioconductor and R. I am confused about how
> to do a normalization that consists of obtaining the mean of a column
> and then subtracting that mean from each value in the column.
> x1(1) - mean(col x1)    x2(1) - mean(col x2)
> x1(2) - mean(col x1)    x2(2) - mean(col x2)
> x1(3) - mean(col x1)    x2(3) - mean(col x2)
> ....................    ....................
> I have the genes in columns and the conditions in rows.

If you were subtracting condition means, then this would be similar to method="median" of the normalizeWithinArrays() function in the limma package. However, subtracting genewise means is not likely to be a useful normalization method for any sort of expression data.

> I don't want to stabilize the variance.
> As you can see, it is a very simple calculation.
> I am wondering whether I could use packages like vsn or affy to do
> this, or whether it is easier to write a script.
> Furthermore, I am not sure whether such a simple normalization is
> conceptually correct for the purpose of eliminating the effect
> between arrays.

If you don't think it's right, why do it? Why not use one of the provided methods that has a proven track record?

If you want suggestions from BioC people, you could start by explaining exactly what your data is -- microarray, PCR, one channel, two channel, log-expression, log-ratios??

Gordon

> I would also like to know whether I have to repeat the process of
> calculating the mean of each column and subtracting it some number
> of times.
>
> Thank you
>
> diego lugro
> student
> Universidad de Buenos Aires
> Argentina
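The distinction Gordon draws between condition-wise and genewise centering can be illustrated with a small base R sketch. Everything here is an assumption for illustration: the data are simulated log-ratios with an artificial per-array offset, the layout follows the question (arrays in rows, genes in columns), and the code is not limma's implementation. It only mimics the idea of subtracting each array's median.

```r
# Sketch only: simulated data, arrays (conditions) in rows, genes in columns.
set.seed(2)                                    # arbitrary seed
logratio <- matrix(rnorm(5 * 50), nrow = 5)    # hypothetical 5 arrays x 50 genes
offsets  <- c(0.5, -0.3, 0, 1, -1)             # artificial array effects
logratio <- logratio + offsets                 # recycled down columns: row i gets offsets[i]

# Condition-wise: subtract each array's (row's) median.
array_medians <- apply(logratio, 1, median)
by_array      <- sweep(logratio, 1, array_medians, "-")

# Genewise: subtract each gene's (column's) mean, as in the question.
by_gene <- sweep(logratio, 2, colMeans(logratio), "-")

# After condition-wise centering every array has median zero,
# so the artificial array offsets are removed.
round(apply(by_array, 1, median), 10)

# After genewise centering the arrays still differ from one another,
# because each gene's average level was removed, not each array's.
round(rowMeans(by_gene), 2)
```

In this simulation the genewise version leaves the between-array differences in place, which is one way to see why it does not address array effects.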