Hi all,
I'm working with Agilent single-channel microarray data.
However, it's not clear to me whether the raw data should be log2-transformed or not (or whether that is done internally, "behind the scenes"):
1) In some places (http://matticklab.com/index.php?title=Single_channel_analysis_of_Agilent_microarray_data_with_Limma) the data is not logged at all.
2) Elsewhere (https://stat.ethz.ch/pipermail/bioconductor/2011-August/040543.html) it is said that quantile normalization is first applied to the raw intensities and the data has to be log2-transformed afterwards.
3) On page 14 of limma's manual, the following paragraphs can be read:
EListRaw. Raw Expression list. A class used to store single-channel raw intensities prior to
normalization. Intensities are unlogged. [...].
EList. Expression list. Contains background corrected and normalized log-intensities.
Usually created from an EListRaw object using normalizeBetweenArrays().
Does normalizeBetweenArrays() internally log2-transform the data? If so, nothing should be done to the raw data apart from using it as input for normalizeBetweenArrays().
---------------------------------------------------------------------------------------------------------------------------------------------
So I don't know what I am supposed to do:
a) y <- normalizeBetweenArrays(y,method="quantile")
y$E <- log2(y$E)
b) y$E <- normalizeBetweenArrays(log2(y$E),method="quantile")
#Transforming the data inside "normalizeBetweenArrays()"
c) Don't change the raw data at all, since normalizeBetweenArrays() internally transforms the data (that's my reading of what's on page 14 of limma's manual, revised 17 June 2014).
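To make the question concrete, here is a minimal sketch of how I imagine the three options could be checked on toy data. The intensities are made up, and building the object with new("EListRaw", list(E = ...)) is just my assumption of a simple way to get a test object; it is not taken from the manual pages quoted above.

```r
library(limma)

# Made-up raw (unlogged) intensities: 1000 probes x 4 arrays (illustration only)
set.seed(1)
E <- matrix(2^rnorm(4000, mean = 8, sd = 2), nrow = 1000, ncol = 4,
            dimnames = list(paste0("probe", 1:1000), paste0("array", 1:4)))

# Wrap the matrix in an EListRaw, the class for raw single-channel intensities
y_raw <- new("EListRaw", list(E = E))

# Option (c): pass the raw object straight to normalizeBetweenArrays()
y_norm <- normalizeBetweenArrays(y_raw, method = "quantile")

class(y_norm)     # which class comes back?
range(y_raw$E)    # scale of the input
range(y_norm$E)   # scale of the output; a log2 scale would be far narrower

# Quantile-normalize the plain matrix and take log2 by hand, then compare
E_manual <- log2(normalizeBetweenArrays(E, method = "quantile"))
all.equal(y_norm$E, E_manual)
```

If the returned object is an EList and the last comparison holds, that would suggest (c) is the intended usage, and that taking log2 again as in (a) would log the data twice.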
Any help would be greatly appreciated.
-------------------------------------------------------------------------------------------------------------------------------------------------------------
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] gplots_2.14.2       lattice_0.20-29     vsn_3.34.0
[4] Biobase_2.26.0      BiocGenerics_0.12.0 limma_3.22.1
[7] BiocInstaller_1.16.0

loaded via a namespace (and not attached):
 [1] affy_1.44.0           affyio_1.34.0         bitops_1.0-6
 [4] caTools_1.17.1        gdata_2.13.3          grid_3.1.0
 [7] gtools_3.4.1          KernSmooth_2.23-12    preprocessCore_1.28.0
[10] zlibbioc_1.12.0

Moreover, I noticed that the output of topTable(fit2) (after running fit1 <- lmFit(y, design), fit2 <- contrasts.fit(fit1, contrastMatrix) and fit2 <- eBayes(fit2)) is totally different depending on the approach:
Without log2-transforming: Highest logFC ~ 1.540, lowest logFC ~ -2.586
With log2-transformed data: Highest logFC ~ 0.278, lowest logFC ~ -0.403
(The second approach seems to yield really low values...)