genereating correlation matrix from gene expression data
2
0
Entering edit mode
pankaj borah ▴ 120
@pankaj-borah-3804
Last seen 9.7 years ago
Hi, Thanks for the previous suggestion. I am wondering how to write only the $R values to a output matrix? Here we have both $R and $P values. Regards, Pankaj Barah Department of Biology,  Norwegian University of Science & Technology (NTNU)  Realfagbygget, N-7491 Trondheim, Norway --- On Thu, 19/11/09, Robert Castelo <robert.castelo@upf.edu> wrote: From: Robert Castelo <robert.castelo@upf.edu> Subject: Re: [BioC] genereating correlation matrix from gene expression data To: "pankaj borah" <pankajborah2k3@yahoo.co.in> Cc: bioconductor@stat.math.ethz.ch Date: Thursday, 19 November, 2009, 2:33 PM Pankaj, in addition to what Steve provided below, if you are dealing with expression data, it might be useful for you the function qpPCC from the bioconductor package 'qpgraph'. it performs the same calculation as the 'cor' function illustrated in Steve's message but in addition it accepts as input parameter not only a matrix or a data frame but also an ExpressionSet object (not a big deal) and, more interestingly, it calculates for you the P-values of a two sided test for the null hypothesis of zero Pearson correlation. this latter feature might be useful since as far as i know if one wants to test every pair of genes and get the P-values one needs to call 'cor.test' for each pair of genes within a two nested for loops or some 'apply' call and this function 'qpPCC' does it using matrix calculations and thus is faster for that purpose. of course, if you are interested in determining significance through the P-values you will have to deal with the multiple testing problem. library(qpgraph) set.seed(123) x <- matrix(rnorm(20), 4, 5) round(x, digits=3)        [,1]   [,2]   [,3]   [,4]   [,5] [1,] -0.560  0.129 -0.687  0.401  0.498 [2,] -0.230  1.715 -0.446  0.111 -1.967 [3,]  1.559  0.461  1.224 -0.556  0.701 [4,]  0.071 -1.265  0.360  1.787 -0.473 lapply(qpPCC(x), round, digits=3) $R        1      2      3      4      5 1  1.000 -0.016  0.958 -0.490  0.437 2 -0.016  1.000 -0.271 -0.753 -0.462 3  0.958 -0.271  1.000 -0.218  0.431 4 -0.490 -0.753 -0.218  1.000 -0.198 5  0.437 -0.462  0.431 -0.198  1.000 $P       1    2     3     4     5 1 0.000 0.984 0.042 0.510 0.563 2 0.984 0.000 0.729 0.247 0.538 3 0.042 0.729 0.000 0.782 0.569 4 0.510 0.247 0.782 0.000 0.802 5 0.563 0.538 0.569 0.802 0.000 cor.test(x[,1], x[,2])         Pearson's product-moment correlation data:  x[, 1] and x[, 2] t = -0.0231, df = 2, p-value = 0.9837 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: -0.962  0.960 sample estimates:     cor -0.0163 as a final shameless remark :) if you are interested in trying to sort out genuine from spurious correlations you might want to calculate the so-called "partial correlations" (see http://en.wikipedia.org/wiki/Partial_correlation) which in fact one cannot calculate with gene expression data because typically you will many more genes than samples. instead, as a sort of surrogate for partial correlations, you might find useful to calculate the so-called non-rejection rates through the functions 'qpNrr' or 'qpAvgNrr' from the 'qpgraph' package (look at the corresponding help pages if you're interested). best, robert. On Wed, 2009-11-18 at 14:05 -0500, Steve Lianoglou wrote: > Hi Pankaj, > > On Nov 18, 2009, at 1:30 PM, pankaj borah wrote: > > > Hi all, > > > > i am a new user of R and Bioconductor package. i deal with gene > > expression data. i was wondering if there is a way to generate a > > correlation matrix (all to all square symmetric matrix) for  a set > > of genes and their expression values. is there any available function? > > Try ?cor to get the pairwise correlation between columns of your > matrix, eg: > > R> x <- matrix(rnorm(20), 4, 5) > R> x >             [,1]        [,2]       [,3]       [,4]       [,5] > [1,]  0.5818808  0.58781709  2.5511157 -1.5332180  0.6905731 > [2,] -0.7640311  0.25960974  0.7246655 -1.5539226 -0.5459625 > [3,] -1.7141619 -0.25808091 -0.0868366 -0.6547804  0.4629494 > [4,] -2.2906217  0.04932864  0.5694895  0.7736206  0.4078665 > > R> cor(x) >              [,1]         [,2]       [,3]       [,4]        [,5] > [1,] 1.00000000  0.851982381  0.8715055 -0.8404848 0.074770380 > [2,]  0.85198238  1.000000000  0.9429553 -0.5338964 0.004627258 > [3,]  0.87150545  0.942955253  1.0000000 -0.4719936 0.325584972 > [4,] -0.84048477 -0.533896414 -0.4719936  1.0000000 0.309414781 > [5,]  0.07477038  0.004627258  0.3255850  0.3094148 1.000000000 > > If your genes are in rows, you'll just need to pass in the transpose > of your matrix. > > HTH, > > -steve > > -- > Steve Lianoglou > Graduate Student: Computational Systems Biology >    |  Memorial Sloan-Kettering Cancer Center >    |  Weill Medical College of Cornell University > Contact Info: http://cbio.mskcc.org/~lianos/contact > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > [[elided Yahoo spam]] [[elided Yahoo spam]] [[alternative HTML version deleted]]
Cancer Cancer • 1.3k views
ADD COMMENT
0
Entering edit mode
Ramzi TEMANNI ▴ 160
@ramzi-temanni-3819
Last seen 9.7 years ago
Hi, Try this : Data<-lapply(qpPCC(x), round, digits=3) res<-as.matrix(data[[1]]) or Data<-lapply(qpPCC(x), round, digits=3) res<-as.matrix(data$R) Best Regards, Ramzi ---------------------------------------------------------------- On Mon, Dec 7, 2009 at 2:10 PM, pankaj borah <pankajborah2k3@yahoo.co.in>wrote: > Hi, > > Thanks for the previous suggestion. I am wondering how to write only the $R > values to a output matrix? Here we have both $R and $P values. > > Regards, > Pankaj Barah Department of Biology, > Norwegian University of Science & Technology (NTNU) > Realfagbygget, N-7491 Trondheim, Norway > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --- On Thu, 19/11/09, Robert Castelo <robert.castelo@upf.edu> wrote: > > From: Robert Castelo <robert.castelo@upf.edu> > Subject: Re: [BioC] genereating correlation matrix from gene expression > data > To: "pankaj borah" <pankajborah2k3@yahoo.co.in> > Cc: bioconductor@stat.math.ethz.ch > Date: Thursday, 19 November, 2009, 2:33 PM > > Pankaj, > > in addition to what Steve provided below, if you are dealing > with > expression data, it might be useful for you the function qpPCC from the > bioconductor package 'qpgraph'. it performs the same calculation as the > 'cor' function illustrated in Steve's message but in addition it accepts > as input parameter not only a matrix or a data frame but also an > ExpressionSet object (not a big deal) and, more interestingly, it > calculates for you the P-values of a two sided test for the null > hypothesis of zero Pearson correlation. this latter feature might be > useful since as far as i know if one wants to test every pair of genes > and get the P-values one needs to call 'cor.test' for each pair of genes > within a two nested for loops or some 'apply' call and this function > 'qpPCC' does it using matrix calculations and thus is faster for that > purpose. of course, if you are interested in determining significance > through the P-values you will have to deal with the multiple > testing > problem. > > library(qpgraph) > > set.seed(123) > x <- matrix(rnorm(20), 4, 5) > > round(x, digits=3) > [,1] [,2] [,3] [,4] [,5] > [1,] -0.560 0.129 -0.687 0.401 0.498 > [2,] -0.230 1.715 -0.446 0.111 -1.967 > [3,] 1.559 0.461 1.224 -0.556 0.701 > [4,] 0.071 -1.265 0.360 1.787 -0.473 > > lapply(qpPCC(x), round, digits=3) > $R > 1 2 3 4 5 > 1 1.000 -0.016 0.958 -0.490 0.437 > 2 -0.016 1.000 -0.271 -0.753 -0.462 > 3 0.958 -0.271 1.000 -0.218 0.431 > 4 -0.490 -0.753 -0.218 1.000 -0.198 > 5 0.437 -0.462 0.431 -0.198 1.000 > > $P > 1 > 2 3 4 5 > 1 0.000 0.984 0.042 0.510 0.563 > 2 0.984 0.000 0.729 0.247 0.538 > 3 0.042 0.729 0.000 0.782 0.569 > 4 0.510 0.247 0.782 0.000 0.802 > 5 0.563 0.538 0.569 0.802 0.000 > > cor.test(x[,1], x[,2]) > > Pearson's product-moment correlation > > data: x[, 1] and x[, 2] > t = -0.0231, df = 2, p-value = 0.9837 > alternative hypothesis: true correlation is not equal to 0 > 95 percent confidence interval: > -0.962 0.960 > sample estimates: > cor > -0.0163 > > as a final shameless remark :) if you are interested in trying to sort > out genuine from spurious correlations you might want to calculate the > so-called "partial correlations" (see > http://en.wikipedia.org/wiki/Partial_correlation) which in > fact one > cannot calculate with gene expression data because typically you will > many more genes than samples. instead, as a sort of surrogate for > partial correlations, you might find useful to calculate the so- called > non-rejection rates through the functions 'qpNrr' or 'qpAvgNrr' from the > 'qpgraph' package (look at the corresponding help pages if you're > interested). > > best, > robert. > > On Wed, 2009-11-18 at 14:05 -0500, Steve Lianoglou wrote: > > Hi Pankaj, > > > > On Nov 18, 2009, at 1:30 PM, pankaj borah wrote: > > > > > Hi all, > > > > > > i am a new user of R and Bioconductor package. i deal with gene > > > expression data. i was wondering if there is a way to generate a > > > correlation matrix (all to all square symmetric matrix) for a set > > > of genes and their expression values. is there any available function? > > > > > Try ?cor to get the pairwise correlation between columns of your > > matrix, eg: > > > > R> x <- matrix(rnorm(20), 4, 5) > > R> x > > [,1] [,2] [,3] [,4] [,5] > > [1,] 0.5818808 0.58781709 2.5511157 -1.5332180 0.6905731 > > [2,] -0.7640311 0.25960974 0.7246655 -1.5539226 -0.5459625 > > [3,] -1.7141619 -0.25808091 -0.0868366 -0.6547804 0.4629494 > > [4,] -2.2906217 0.04932864 0.5694895 0.7736206 0.4078665 > > > > R> cor(x) > > [,1] [,2] [,3] [,4] [,5] > > [1,] > 1.00000000 0.851982381 0.8715055 -0.8404848 0.074770380 > > [2,] 0.85198238 1.000000000 0.9429553 -0.5338964 0.004627258 > > [3,] 0.87150545 0.942955253 1.0000000 -0.4719936 0.325584972 > > [4,] -0.84048477 -0.533896414 -0.4719936 1.0000000 0.309414781 > > [5,] 0.07477038 0.004627258 0.3255850 0.3094148 1.000000000 > > > > If your genes are in rows, you'll just need to pass in the transpose > > of your matrix. > > > > HTH, > > > > -steve > > > > -- > > Steve Lianoglou > > Graduate Student: Computational Systems Biology > > | Memorial Sloan-Kettering Cancer Center > > | Weill Medical College of Cornell University > > Contact Info: http://cbio.mskcc.org/~lianos/contact<http: cbio.ms="" kcc.org="" %7elianos="" contact=""> > > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > > > [[elided Yahoo spam]] > > > [[elided Yahoo spam]] > > [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Robert Castelo ★ 3.3k
@rcastelo
Last seen 3 days ago
Barcelona/Universitat Pompeu Fabra
hi Pankaj, > Thanks for the previous suggestion. I am wondering how to write only the > $R values to a output matrix? Here we have both $R and $P values. let's say your call was: myPCC <- qpPCC(x) then you can either handle myPCC$R as a matrix or assign it to another variable r <- myPCC$R and then r will be your matrix with Pearson correlation coefficients. if what you're looking for is to write the matrix to a flat file then you should check out the function called 'write.table'. cheers, robert. > > Regards, > Pankaj Barah Department of Biology, > ?Norwegian University of Science & Technology (NTNU) > ?Realfagbygget, N-7491 Trondheim, Norway > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --- On Thu, 19/11/09, Robert Castelo <robert.castelo at="" upf.edu=""> wrote: > > From: Robert Castelo <robert.castelo at="" upf.edu=""> > Subject: Re: [BioC] genereating correlation matrix from gene expression > data > To: "pankaj borah" <pankajborah2k3 at="" yahoo.co.in=""> > Cc: bioconductor at stat.math.ethz.ch > Date: Thursday, 19 November, 2009, 2:33 PM > > Pankaj, > > in addition to what Steve provided below, if you are dealing > with > expression data, it might be useful for you the function qpPCC from the > bioconductor package 'qpgraph'. it performs the same calculation as the > 'cor' function illustrated in Steve's message but in addition it accepts > as input parameter not only a matrix or a data frame but also an > ExpressionSet object (not a big deal) and, more interestingly, it > calculates for you the P-values of a two sided test for the null > hypothesis of zero Pearson correlation. this latter feature might be > useful since as far as i know if one wants to test every pair of genes > and get the P-values one needs to call 'cor.test' for each pair of genes > within a two nested for loops or some 'apply' call and this function > 'qpPCC' does it using matrix calculations and thus is faster for that > purpose. of course, if you are interested in determining significance > through the P-values you will have to deal with the multiple > testing > problem. > > library(qpgraph) > > set.seed(123) > x <- matrix(rnorm(20), 4, 5) > > round(x, digits=3) > ? ? ???[,1]???[,2]???[,3]???[,4]???[,5] > [1,] -0.560? 0.129 -0.687? 0.401? 0.498 > [2,] -0.230? 1.715 -0.446? 0.111 -1.967 > [3,]? 1.559? 0.461? 1.224 -0.556? 0.701 > [4,]? 0.071 -1.265? 0.360? 1.787 -0.473 > > lapply(qpPCC(x), round, digits=3) > $R > ? ? ???1? ? ? 2? ? ? 3? ? ? 4? ? ? 5 > 1? 1.000 -0.016? 0.958 -0.490? 0.437 > 2 -0.016? 1.000 -0.271 -0.753 -0.462 > 3? 0.958 -0.271? 1.000 -0.218? 0.431 > 4 -0.490 -0.753 -0.218? 1.000 -0.198 > 5? 0.437 -0.462? 0.431 -0.198? 1.000 > > $P > ? ? ? 1? > ???2? ???3? ???4? ???5 > 1 0.000 0.984 0.042 0.510 0.563 > 2 0.984 0.000 0.729 0.247 0.538 > 3 0.042 0.729 0.000 0.782 0.569 > 4 0.510 0.247 0.782 0.000 0.802 > 5 0.563 0.538 0.569 0.802 0.000 > > cor.test(x[,1], x[,2]) > > ? ? ? ? Pearson's product-moment correlation > > data:? x[, 1] and x[, 2] > t = -0.0231, df = 2, p-value = 0.9837 > alternative hypothesis: true correlation is not equal to 0 > 95 percent confidence interval: > -0.962? 0.960 > sample estimates: > ? ? cor > -0.0163 > > as a final shameless remark :) if you are interested in trying to sort > out genuine from spurious correlations you might want to calculate the > so-called "partial correlations" (see > http://en.wikipedia.org/wiki/Partial_correlation) which in > fact one > cannot calculate with gene expression data because typically you will > many more genes than samples. instead, as a sort of surrogate for > partial correlations, you might find useful to calculate the so- called > non-rejection rates through the functions 'qpNrr' or 'qpAvgNrr' from the > 'qpgraph' package (look at the corresponding help pages if you're > interested). > > best, > robert. > > On Wed, 2009-11-18 at 14:05 -0500, Steve Lianoglou wrote: >> Hi Pankaj, >> >> On Nov 18, 2009, at 1:30 PM, pankaj borah wrote: >> >> > Hi all, >> > >> > i am a new user of R and Bioconductor package. i deal with gene? >> > expression data. i was wondering if there is a way to generate a? >> > correlation matrix (all to all square symmetric matrix) for? a set? >> > of genes and their expression values. is there any available function? >> >> > Try ?cor to get the pairwise correlation between columns of your? >> matrix, eg: >> >> R> x <- matrix(rnorm(20), 4, 5) >> R> x >> ? ? ? ? ? ???[,1]? ? ? ? [,2]? ? ???[,3]? ? ???[,4]? ? ???[,5] >> [1,]? 0.5818808? 0.58781709? 2.5511157 -1.5332180? 0.6905731 >> [2,] -0.7640311? 0.25960974? 0.7246655 -1.5539226 -0.5459625 >> [3,] -1.7141619 -0.25808091 -0.0868366 -0.6547804? 0.4629494 >> [4,] -2.2906217? 0.04932864? 0.5694895? 0.7736206? 0.4078665 >> >> R> cor(x) >> ? ? ? ? ? ? ? [,1]? ? ? ???[,2]? ? ???[,3]? ? ???[,4]? ? ? ? [,5] >> [1,]? > 1.00000000? 0.851982381? 0.8715055 -0.8404848 0.074770380 >> [2,]? 0.85198238? 1.000000000? 0.9429553 -0.5338964 0.004627258 >> [3,]? 0.87150545? 0.942955253? 1.0000000 -0.4719936 0.325584972 >> [4,] -0.84048477 -0.533896414 -0.4719936? 1.0000000 0.309414781 >> [5,]? 0.07477038? 0.004627258? 0.3255850? 0.3094148 1.000000000 >> >> If your genes are in rows, you'll just need to pass in the transpose? >> of your matrix. >> >> HTH, >> >> -steve >> >> -- >> Steve Lianoglou >> Graduate Student: Computational Systems Biology >> ? ? |? Memorial Sloan-Kettering Cancer Center >> ? ? |? Weill Medical College of Cornell University >> Contact Info: http://cbio.mskcc.org/~lianos/contact >> >> > _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > > > > > The INTERNET now has a personality. YOURS! See your Yahoo! Homepage. > > > The INTERNET now has a personality. YOURS! See your Yahoo! > Homepage. http://in.yahoo.com/
ADD COMMENT

Login before adding your answer.

Traffic: 562 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6