sort the difference and save to individual files problem


Dr Gyorffy Balazs ▴ 320

@dr-gyorffy-balazs-619


I have a table with gene expression data:

            Smpl 1  Smpl 2  Smpl 3
    Gene 1       2       3       2
    Gene 2       4       6       8
    Gene 3       6       9      10
    ...

[1.] I would like to construct a table/list of differences of every sample minus every other sample. For example:

    Smpl 2 - Smpl 3:

    Gene 1   1
    Gene 2  -2
    Gene 3  -1
    ...

[2.] I would like to sort all these tables/lists in decreasing order:

    Sample 2 - Sample 3

    Gene 1   1
    Gene 3  -1
    Gene 2  -2
    ...

[3.] I have 10 samples altogether, so the expected outputs are 10x10=100 tables with one column each. I would like to save these results in 100 files (for example, as tab-separated output).

Is this possible?

I can do every individual sample vs. other sample with simple R commands (-, sort, etc.), but that's just too much work. Maybe someone can help me do this in an elegant way.

(I want to compare already calculated mean values of the samples without additional significance tests. Therefore I don't want to use SAM, a t-test, or another statistical method.)

Thank you for the help,
Balazs Győrffy


ADD COMMENT • link updated 21.4 years ago by Adaikalavan Ramasamy ★ 1.8k • written 21.4 years ago by Dr Gyorffy Balazs ▴ 320


Adaikalavan Ramasamy ★ 1.8k

@adaikalavan-ramasamy-675


What you are looking for is the pairwise difference between all pairs of columns. Also, you only have choose(n, 2) = n * (n-1) / 2 pairs, which in your case means 45 pairs, not 100.

The function below works but is inefficient for a large number of columns. I would be interested if anyone can suggest how to rewrite this using the apply() family. [Is this question more appropriate for R-help?]

    pairwise.difference <- function(m){
      npairs  <- choose( ncol(m), 2 )
      results <- matrix( NA, ncol=npairs, nrow=nrow(m) )
      cnames  <- rep(NA, npairs)

      # Fall back to generic column names if the input has none
      if(is.null(colnames(m))) colnames(m) <- paste("col", 1:ncol(m), sep="")

      k <- 1
      for(i in 1:ncol(m)){
        for(j in 1:ncol(m)){
          if(j <= i) next   # skip self-comparisons and mirrored pairs
          results[ ,k] <- m[ ,i] - m[ ,j]
          cnames[k]    <- paste(colnames(m)[ c(i, j) ], collapse=".vs.")
          k <- k + 1
        }
      }

      colnames(results) <- cnames
      return(results)
    }

    # Example
    mat <- matrix( sample(1:12), ncol=4 )
    colnames(mat) <- LETTERS[1:4]
    mat
          A  B C  D
    [1,] 10  6 3  5
    [2,]  7 11 2 12
    [3,]  1  8 9  4

    pairwise.difference(mat)
         A.vs.B A.vs.C A.vs.D B.vs.C B.vs.D C.vs.D
    [1,]      4      7      5      3      1     -2
    [2,]     -4      5     -5      9     -1    -10
    [3,]     -7     -8     -3     -1      4      5

It is more efficient to store 1 file with 45 columns than 45 files with one column.

On Fri, 2004-07-30 at 11:14, Dr_Gyorffy_Balazs wrote:
> [original question, quoted in full above; snipped]
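The reply above asks whether the double loop can be avoided. One possible sketch, not taken from the thread, uses combn() to enumerate the column pairs and then subtracts whole blocks of columns at once. combn() is not strictly part of the apply() family, and the name pairwise.difference.combn is hypothetical; the input is assumed to be a numeric matrix with at least two columns.

```r
# Hypothetical loop-free variant (name not from the thread).
# Assumes m is a numeric matrix with at least two columns.
pairwise.difference.combn <- function(m) {
  if (is.null(colnames(m))) {
    colnames(m) <- paste("col", seq_len(ncol(m)), sep = "")
  }
  # 2 x choose(n, 2) matrix: row 1 holds i, row 2 holds j, with i < j
  pairs <- combn(ncol(m), 2)
  # Subtract all "j" columns from all "i" columns in one step;
  # row names of m (if any) are carried over to the result
  results <- m[, pairs[1, ], drop = FALSE] - m[, pairs[2, ], drop = FALSE]
  colnames(results) <- paste(colnames(m)[pairs[1, ]],
                             colnames(m)[pairs[2, ]], sep = ".vs.")
  results
}

# Same values as the example matrix shown in the reply
mat <- matrix(c(10, 7, 1, 6, 11, 8, 3, 2, 9, 5, 12, 4), ncol = 4)
colnames(mat) <- LETTERS[1:4]
out <- pairwise.difference.combn(mat)
print(out)
print(choose(10, 2))  # number of pairs for 10 samples
```

combn() lists the pairs in the same order as the nested loop (A.vs.B, A.vs.C, A.vs.D, B.vs.C, ...), so the column order should match the looped version.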

ADD COMMENT • link 21.4 years ago Adaikalavan Ramasamy ★ 1.8k


Adaikalavan Ramasamy ★ 1.8k

@adaikalavan-ramasamy-675


Put everything in a matrix and then use the apply() family to find the genes with the largest differences. You need to add one more line to my function, just before the return(results):

    rownames(results) <- rownames(m)

so your output will have row names. Then something like this would work.

    pairwise.difference <- function(m){
      npairs  <- choose( ncol(m), 2 )
      results <- matrix( NA, ncol=npairs, nrow=nrow(m) )
      cnames  <- rep(NA, npairs)

      # Fall back to generic column names if the input has none
      if( is.null(colnames(m)) ) colnames(m) <- paste("col", 1:ncol(m), sep="")

      k <- 1
      for(i in 1:ncol(m)){
        for(j in 1:ncol(m)){
          if(j <= i) next   # skip self-comparisons and mirrored pairs
          results[ ,k] <- m[ ,i] - m[ ,j]
          cnames[k]    <- paste(colnames(m)[ c(i, j) ], collapse=".vs.")
          k <- k + 1
        }
      }

      colnames(results) <- cnames
      rownames(results) <- rownames(m)   # keep the gene names
      return(results)
    }

    # Example using a matrix with 5 genes/rows and 4 columns
    mat <- matrix( sample(1:20), ncol=4 )
    colnames(mat) <- LETTERS[1:4]
    rownames(mat) <- paste( "g", 1:5, sep="")
    mat
        A  B  C  D
    g1 10 16  3 15
    g2 18  5 12 19
    g3  7  4  8 13
    g4 14  2  6 11
    g5 17  1 20  9

    (out <- pairwise.difference(mat))
       A.vs.B A.vs.C A.vs.D B.vs.C B.vs.D C.vs.D
    g1     -6      7     -5     13      1    -12
    g2     13      6     -1     -7    -14     -7
    g3      3     -1     -6     -4     -9     -5
    g4     12      8      3     -4     -9     -5
    g5     16     -3      8    -19     -8     11

    # Now show the 3 genes with the largest absolute value in each column
    apply(abs(out), 2, function(x) names(x[order(-x)]) [ 1:3 ])
         A.vs.B A.vs.C A.vs.D B.vs.C B.vs.D C.vs.D
    [1,] "g5"   "g4"   "g5"   "g5"   "g2"   "g1"
    [2,] "g2"   "g1"   "g3"   "g1"   "g3"   "g5"
    [3,] "g4"   "g2"   "g1"   "g2"   "g4"   "g2"

This says that g5 had the largest absolute difference between A and B, followed by g2, and so on. If you want the whole list, remove the [ 1:3 ] part from the code above. Viewing this output is easier than viewing 100 files and lets you see the genes that are picked up most frequently.

On Fri, 2004-07-30 at 13:24, Dr_Gyorffy_Balazs wrote:
> Dear Adaikalavan,
>
> thank you for the help!
>
> However, this way I have a big table with all the data in it. The problem is that I also have the gene names (in the first column of the initial table), and I would like to have not only the difference but also the ranked difference with the gene names. So at the end I would know which gene had the biggest difference (or the smallest). I was thinking of saving to different files in order to keep the gene names.
>
> (You are right, I really don't need 100 columns. It seemed simpler to me to construct the function to get 100 results instead of correcting for symmetrical and self-tests. :-))
>
> Balazs
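If separate files are still wanted, as in the follow-up quoted above, the following is a minimal sketch of one way to write each pairwise column to its own tab-separated file, sorted in decreasing order with the gene names attached. The helper name write.sorted.differences, the ".txt" file naming, and the use of tempdir() as the output directory are all assumptions for illustration, not something proposed in the thread. The example data are the three genes and three samples from the original question.

```r
# Hypothetical helper (not from the thread): one sorted file per comparison.
# Assumes diffs is a numeric matrix with row names (genes) and column names.
write.sorted.differences <- function(diffs, dir = tempdir()) {
  stopifnot(!is.null(rownames(diffs)), !is.null(colnames(diffs)))
  files <- character(ncol(diffs))
  for (k in seq_len(ncol(diffs))) {
    # Extracting a column keeps the row names, so genes travel with values
    x   <- sort(diffs[, k], decreasing = TRUE)
    tab <- data.frame(gene = names(x), difference = unname(x))
    files[k] <- file.path(dir, paste(colnames(diffs)[k], ".txt", sep = ""))
    write.table(tab, file = files[k], sep = "\t",
                quote = FALSE, row.names = FALSE)
  }
  invisible(files)
}

# Data from the original question
expr <- matrix(c(2, 4, 6,  3, 6, 9,  2, 8, 10), ncol = 3,
               dimnames = list(c("Gene1", "Gene2", "Gene3"),
                               c("Smpl1", "Smpl2", "Smpl3")))

# Pairwise differences for every pair of columns (i < j)
pairs <- combn(ncol(expr), 2)
diffs <- expr[, pairs[1, ], drop = FALSE] - expr[, pairs[2, ], drop = FALSE]
colnames(diffs) <- paste(colnames(expr)[pairs[1, ]],
                         colnames(expr)[pairs[2, ]], sep = ".vs.")

files <- write.sorted.differences(diffs)
print(read.delim(files[3], stringsAsFactors = FALSE))  # Smpl2.vs.Smpl3
```

Each file holds two columns, gene and difference, so the ranking and the gene names stay together without needing the full matrix.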

ADD COMMENT • link 21.4 years ago Adaikalavan Ramasamy ★ 1.8k
