What you are doing is basically correct, except that you're making it
slightly more complicated than it needs to be, and you're rounding
counts-per-million to integers which is inappropriate and unnecessary.
Slightly simpler and more correct is:
logCPM <- cpm(RLE, log=TRUE, prior.count=2)
MSTN <- logCPM["MSTN",]
design <- model.matrix(~ID.SM+MSTN)
You ask why MSTN correlates with itself with logFC=1. You said you
were
familiar with Pearson correlation. You must be familiar with the fact
that correlating a variable with itself will give a correlation of 1
and a
p-value of zero, and the situation is similar here.
The edgeR model is fitting something analogous to:
log(expression of Y) = logFC * log(expression of MSTN)
Obviously if Y is MSTN itself then you should get logFC=1.
Gordon
On Fri, 12 Sep 2014, Sindre Lee
wrote:
> Thank you!
>
> Which expression values should I input? I did this:
>
> # Used FeatureCount
> x2 <- DGEList(counts=fc$counts[ , -c(1:48) ], genes=fc$annotation)
>
> # RLE normalization
> RLE <- calcNormFactors(x2, method="RLE")
> RLEScaleFactors <- x2$samples$lib.size * RLE$samples$norm.factors
> RLEExp <- round(t(t(RLE$counts)/RLEScaleFactors) *
> mean(RLEScaleFactors))
>
> # Add MSNT expression as continous variable
> MSTN <- RLEExp[rownames(RLEExp)=="MSTN", ]
>
> # Paired samples
> ID.SM <- ID[-c(1:48)]
>
> # Making the design
> design <- model.matrix(~ID.SM+log2(MSTN+0.125))
>
> # Estimating dispersion
> RLE <- estimateGLMCommonDisp(RLE, design, verbose=TRUE)
> RLE <- estimateGLMTagwiseDisp(RLE, design)
>
> # Fitting the linear model
> fit <- glmFit(RLE,design, prior.count=0.125,
> dispersion=RLE$tagwise.dispersion)
>
> The result was:
>
> logFC logCPM LR PValue
> MSTN 1E+00 3E+00 5E+03 0E+00
> SLC44A2 1E-01 7E+00 2E+02 6E-48
> OSBPL1A 1E-01 7E+00 2E+02 1E-35
> KIAA0195 2E-01 6E+00 1E+02 2E-33
> BOLL 3E-01 2E+00 1E+02 1E-32
> GPD2 4E-01 6E+00 1E+02 2E-31
>
> Now, why does MSTN correlate with itself with log2FC = 1?
> One time, which I can't reproduce, I got log2FC=0.
________________________________________
From: Gordon K Smyth <smyth@wehi.edu.au>
Sent: 13 September 2014 01:10
To: Sindre Lee
Cc: deepaksrna at gmail.com; Bioconductor mailing list
Subject: RE: [BioC] positively correlated genes
If the logFC is positive, the gene is positively correlated with gene
X.
If the logFC is negative, the gene is negatively correlated with gene
X.
You should be using the FDR column rather than the PValue column to
judge
significance, as is usual with edgeR analyses.
Pearson correlation between y and x is equivalent to regressing y on
x.
Pearson correlation is significant and positive if and only if the
regression coefficient of y on x is significantly positive. Situation
is
similar here.
Gordon
On Fri, 12 Sep 2014, Sindre Lee wrote:
> Can I please ask how to interpret this results? Im used to
> Spearman/Pearson correlations and don't quite know how to present or
> explain the results obtained this way.
I wanted to find genes correlating with gene X. Then I got about 6000
significant genes at p < 0.05. Some with negative some with positive
log2FC.
Now, what do I do? What does this tell me?
Thank you!
________________________________________
From: bioconductor-bounces@r-project.org <bioconductor- bounces@r-project.org=""> on behalf of Gordon K Smyth <smyth@wehi.edu.au>
Sent: 10 September 2014 03:08
To: deepaksrna at gmail.com
Cc: Bioconductor mailing list
Subject: [BioC] positively correlated genes
If you are using edgeR's glmFit function or limma's voom and lmFit
functions, you can simply add the log-expression values of the gene of
interest as a column of the design matrix. Then a standard DE
analysis
will detect any other genes that are significantly correlated with the
gene of interest.
Gordon
> Date: Tue, 9 Sep 2014 01:41:14 -0700 (PDT)
> From: "karthik [guest]" <guest at="" bioconductor.org="">
> To: bioconductor at r-project.org, deepaksrna at gmail.com
> Subject: [BioC] positively correlated genes
>
> hi
> I am interested to find out the genes that are positively and
> negatively correlated genes with my genes of interest. (using rnaseq
> normalized expression data). Can some one suggest me a better
option.
>
> Thank you
>
> -- output of sessionInfo():
>
> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}