positively correlated genes

0

Entering edit mode

Guest User ★ 13k

@guest-user-4897

Last seen 11.4 years ago

hi I am interested to find out the genes that are positively and negatively correlated genes with my genes of interest. (using rnaseq normalized expression data). Can some one suggest me a better option. Thank you -- output of sessionInfo(): sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-w64-mingw32/x64 (64-bit) -- Sent via the guest posting facility at bioconductor.org.

• 2.4k views

ADD COMMENT • link updated 11.4 years ago by Gordon Smyth 53k • written 11.4 years ago by Guest User ★ 13k

0

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 16 hours ago

WEHI, Melbourne, Australia

If you are using edgeR's glmFit function or limma's voom and lmFit functions, you can simply add the log-expression values of the gene of interest as a column of the design matrix. Then a standard DE analysis will detect any other genes that are significantly correlated with the gene of interest. Gordon > Date: Tue, 9 Sep 2014 01:41:14 -0700 (PDT) > From: "karthik [guest]" <guest at="" bioconductor.org=""> > To: bioconductor at r-project.org, deepaksrna at gmail.com > Subject: [BioC] positively correlated genes > > hi > I am interested to find out the genes that are positively and > negatively correlated genes with my genes of interest. (using rnaseq > normalized expression data). Can some one suggest me a better option. > > Thank you > > -- output of sessionInfo(): > > sessionInfo() > R version 3.0.2 (2013-09-25) > Platform: x86_64-w64-mingw32/x64 (64-bit) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD COMMENT • link 11.4 years ago Gordon Smyth 53k

0

Entering edit mode

Can I please ask how to interpret this results? Im used to Spearman/Pearson correlations and don't quite know how to present or explain the results obtained this way. I wanted to find genes correlating with gene X. Then I got about 6000 significant genes at p < 0.05. Some with negative some with positive log2FC. Now, what do I do? What does this tell me? Thank you! ________________________________________ From: bioconductor-bounces@r-project.org <bioconductor- bounces@r-project.org=""> on behalf of Gordon K Smyth <smyth@wehi.edu.au> Sent: 10 September 2014 03:08 To: deepaksrna at gmail.com Cc: Bioconductor mailing list Subject: [BioC] positively correlated genes If you are using edgeR's glmFit function or limma's voom and lmFit functions, you can simply add the log-expression values of the gene of interest as a column of the design matrix. Then a standard DE analysis will detect any other genes that are significantly correlated with the gene of interest. Gordon > Date: Tue, 9 Sep 2014 01:41:14 -0700 (PDT) > From: "karthik [guest]" <guest at="" bioconductor.org=""> > To: bioconductor at r-project.org, deepaksrna at gmail.com > Subject: [BioC] positively correlated genes > > hi > I am interested to find out the genes that are positively and > negatively correlated genes with my genes of interest. (using rnaseq > normalized expression data). Can some one suggest me a better option. > > Thank you > > -- output of sessionInfo(): > > sessionInfo() > R version 3.0.2 (2013-09-25) > Platform: x86_64-w64-mingw32/x64 (64-bit) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}} _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 11.4 years ago Sindre ▴ 70

0

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 16 hours ago

WEHI, Melbourne, Australia

If the logFC is positive, the gene is positively correlated with gene X. If the logFC is negative, the gene is negatively correlated with gene X. You should be using the FDR column rather than the PValue column to judge significance, as is usual with edgeR analyses. Pearson correlation between y and x is equivalent to regressing y on x. Pearson correlation is significant and positive if and only if the regression coefficient of y on x is significantly positive. Situation is similar here. Gordon On Fri, 12 Sep 2014, Sindre Lee wrote: > Can I please ask how to interpret this results? Im used to > Spearman/Pearson correlations and don't quite know how to present or > explain the results obtained this way. I wanted to find genes correlating with gene X. Then I got about 6000 significant genes at p < 0.05. Some with negative some with positive log2FC. Now, what do I do? What does this tell me? Thank you! ________________________________________ From: bioconductor-bounces@r-project.org <bioconductor- bounces@r-project.org=""> on behalf of Gordon K Smyth <smyth@wehi.edu.au> Sent: 10 September 2014 03:08 To: deepaksrna at gmail.com Cc: Bioconductor mailing list Subject: [BioC] positively correlated genes If you are using edgeR's glmFit function or limma's voom and lmFit functions, you can simply add the log-expression values of the gene of interest as a column of the design matrix. Then a standard DE analysis will detect any other genes that are significantly correlated with the gene of interest. Gordon > Date: Tue, 9 Sep 2014 01:41:14 -0700 (PDT) > From: "karthik [guest]" <guest at="" bioconductor.org=""> > To: bioconductor at r-project.org, deepaksrna at gmail.com > Subject: [BioC] positively correlated genes > > hi > I am interested to find out the genes that are positively and > negatively correlated genes with my genes of interest. (using rnaseq > normalized expression data). Can some one suggest me a better option. > > Thank you > > -- output of sessionInfo(): > > sessionInfo() > R version 3.0.2 (2013-09-25) > Platform: x86_64-w64-mingw32/x64 (64-bit) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD COMMENT • link 11.4 years ago Gordon Smyth 53k

0

Entering edit mode

Thank you! Which expression values should I input? I did this: # Used FeatureCount x2 <- DGEList(counts=fc$counts[ , -c(1:48) ], genes=fc$annotation) # RLE normalization RLE <- calcNormFactors(x2, method="RLE") RLEScaleFactors <- x2$samples$lib.size * RLE$samples$norm.factors RLEExp <- round(t(t(RLE$counts)/RLEScaleFactors) * mean(RLEScaleFactors)) # Add MSNT expression as continous variable MSTN <- RLEExp[rownames(RLEExp)=="MSTN", ] # Paired samples ID.SM <- ID[-c(1:48)] # Making the design design <- model.matrix(~ID.SM+log2(MSTN+0.125)) # Estimating dispersion RLE <- estimateGLMCommonDisp(RLE, design, verbose=TRUE) RLE <- estimateGLMTagwiseDisp(RLE, design) # Fitting the linear model fit <- glmFit(RLE,design, prior.count=0.125, dispersion=RLE$tagwise.dispersion) The result was: logFC logCPM LR PValue MSTN 1E+00 3E+00 5E+03 0E+00 SLC44A2 1E-01 7E+00 2E+02 6E-48 OSBPL1A 1E-01 7E+00 2E+02 1E-35 KIAA0195 2E-01 6E+00 1E+02 2E-33 BOLL 3E-01 2E+00 1E+02 1E-32 GPD2 4E-01 6E+00 1E+02 2E-31 Now, why does MSTN correlate with itself with log2FC = 1? One time, which I can't reproduce, I got log2FC=0. ________________________________________ From: Gordon K Smyth <smyth@wehi.edu.au> Sent: 13 September 2014 01:10 To: Sindre Lee Cc: deepaksrna at gmail.com; Bioconductor mailing list Subject: RE: [BioC] positively correlated genes If the logFC is positive, the gene is positively correlated with gene X. If the logFC is negative, the gene is negatively correlated with gene X. You should be using the FDR column rather than the PValue column to judge significance, as is usual with edgeR analyses. Pearson correlation between y and x is equivalent to regressing y on x. Pearson correlation is significant and positive if and only if the regression coefficient of y on x is significantly positive. Situation is similar here. Gordon On Fri, 12 Sep 2014, Sindre Lee wrote: > Can I please ask how to interpret this results? Im used to > Spearman/Pearson correlations and don't quite know how to present or > explain the results obtained this way. I wanted to find genes correlating with gene X. Then I got about 6000 significant genes at p < 0.05. Some with negative some with positive log2FC. Now, what do I do? What does this tell me? Thank you! ________________________________________ From: bioconductor-bounces@r-project.org <bioconductor- bounces@r-project.org=""> on behalf of Gordon K Smyth <smyth@wehi.edu.au> Sent: 10 September 2014 03:08 To: deepaksrna at gmail.com Cc: Bioconductor mailing list Subject: [BioC] positively correlated genes If you are using edgeR's glmFit function or limma's voom and lmFit functions, you can simply add the log-expression values of the gene of interest as a column of the design matrix. Then a standard DE analysis will detect any other genes that are significantly correlated with the gene of interest. Gordon > Date: Tue, 9 Sep 2014 01:41:14 -0700 (PDT) > From: "karthik [guest]" <guest at="" bioconductor.org=""> > To: bioconductor at r-project.org, deepaksrna at gmail.com > Subject: [BioC] positively correlated genes > > hi > I am interested to find out the genes that are positively and > negatively correlated genes with my genes of interest. (using rnaseq > normalized expression data). Can some one suggest me a better option. > > Thank you > > -- output of sessionInfo(): > > sessionInfo() > R version 3.0.2 (2013-09-25) > Platform: x86_64-w64-mingw32/x64 (64-bit) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}

ADD REPLY • link 11.4 years ago Sindre ▴ 70

0

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 16 hours ago

WEHI, Melbourne, Australia

What you are doing is basically correct, except that you're making it slightly more complicated than it needs to be, and you're rounding counts-per-million to integers which is inappropriate and unnecessary. Slightly simpler and more correct is: logCPM <- cpm(RLE, log=TRUE, prior.count=2) MSTN <- logCPM["MSTN",] design <- model.matrix(~ID.SM+MSTN) You ask why MSTN correlates with itself with logFC=1. You said you were familiar with Pearson correlation. You must be familiar with the fact that correlating a variable with itself will give a correlation of 1 and a p-value of zero, and the situation is similar here. The edgeR model is fitting something analogous to: log(expression of Y) = logFC * log(expression of MSTN) Obviously if Y is MSTN itself then you should get logFC=1. Gordon On Fri, 12 Sep 2014, Sindre Lee wrote: > Thank you! > > Which expression values should I input? I did this: > > # Used FeatureCount > x2 <- DGEList(counts=fc$counts[ , -c(1:48) ], genes=fc$annotation) > > # RLE normalization > RLE <- calcNormFactors(x2, method="RLE") > RLEScaleFactors <- x2$samples$lib.size * RLE$samples$norm.factors > RLEExp <- round(t(t(RLE$counts)/RLEScaleFactors) * > mean(RLEScaleFactors)) > > # Add MSNT expression as continous variable > MSTN <- RLEExp[rownames(RLEExp)=="MSTN", ] > > # Paired samples > ID.SM <- ID[-c(1:48)] > > # Making the design > design <- model.matrix(~ID.SM+log2(MSTN+0.125)) > > # Estimating dispersion > RLE <- estimateGLMCommonDisp(RLE, design, verbose=TRUE) > RLE <- estimateGLMTagwiseDisp(RLE, design) > > # Fitting the linear model > fit <- glmFit(RLE,design, prior.count=0.125, > dispersion=RLE$tagwise.dispersion) > > The result was: > > logFC logCPM LR PValue > MSTN 1E+00 3E+00 5E+03 0E+00 > SLC44A2 1E-01 7E+00 2E+02 6E-48 > OSBPL1A 1E-01 7E+00 2E+02 1E-35 > KIAA0195 2E-01 6E+00 1E+02 2E-33 > BOLL 3E-01 2E+00 1E+02 1E-32 > GPD2 4E-01 6E+00 1E+02 2E-31 > > Now, why does MSTN correlate with itself with log2FC = 1? > One time, which I can't reproduce, I got log2FC=0. ________________________________________ From: Gordon K Smyth <smyth@wehi.edu.au> Sent: 13 September 2014 01:10 To: Sindre Lee Cc: deepaksrna at gmail.com; Bioconductor mailing list Subject: RE: [BioC] positively correlated genes If the logFC is positive, the gene is positively correlated with gene X. If the logFC is negative, the gene is negatively correlated with gene X. You should be using the FDR column rather than the PValue column to judge significance, as is usual with edgeR analyses. Pearson correlation between y and x is equivalent to regressing y on x. Pearson correlation is significant and positive if and only if the regression coefficient of y on x is significantly positive. Situation is similar here. Gordon On Fri, 12 Sep 2014, Sindre Lee wrote: > Can I please ask how to interpret this results? Im used to > Spearman/Pearson correlations and don't quite know how to present or > explain the results obtained this way. I wanted to find genes correlating with gene X. Then I got about 6000 significant genes at p < 0.05. Some with negative some with positive log2FC. Now, what do I do? What does this tell me? Thank you! ________________________________________ From: bioconductor-bounces@r-project.org <bioconductor- bounces@r-project.org=""> on behalf of Gordon K Smyth <smyth@wehi.edu.au> Sent: 10 September 2014 03:08 To: deepaksrna at gmail.com Cc: Bioconductor mailing list Subject: [BioC] positively correlated genes If you are using edgeR's glmFit function or limma's voom and lmFit functions, you can simply add the log-expression values of the gene of interest as a column of the design matrix. Then a standard DE analysis will detect any other genes that are significantly correlated with the gene of interest. Gordon > Date: Tue, 9 Sep 2014 01:41:14 -0700 (PDT) > From: "karthik [guest]" <guest at="" bioconductor.org=""> > To: bioconductor at r-project.org, deepaksrna at gmail.com > Subject: [BioC] positively correlated genes > > hi > I am interested to find out the genes that are positively and > negatively correlated genes with my genes of interest. (using rnaseq > normalized expression data). Can some one suggest me a better option. > > Thank you > > -- output of sessionInfo(): > > sessionInfo() > R version 3.0.2 (2013-09-25) > Platform: x86_64-w64-mingw32/x64 (64-bit) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD COMMENT • link 11.4 years ago Gordon Smyth 53k

Login before adding your answer.