Entering edit mode
Marcin Jakub Kamiński
▴
40
@marcin-jakub-kaminski-6242
Last seen 10.0 years ago
Dear experts,
I'm trying to analyze raw data from GEO GSE34571 using limma.
It's 8 single-channel Agilent microarrays scanned with Genepix
software to
.gpr files.
It's a simple design of between-group comparison including two groups,
4
replicates in each.
With some help from limma guide and previous mailing list entries, my
pipeline includes:
- setting function for weights
- reading .gpr files as fake double-channel data
- normexp background correction
- deleting duplicated channels (fake green)
- vsn normalization
- lm fitting
There are two major issues (at least I find them difficult):
A. background correction/data quality
B. data I loose after performing vsn normalization or because of
having
single-channel data only
A.
1. Imageplots indicate spots of high-background signal. Should I be
concerned about such noise? (present on imageplot.1.png - odd rows
depict
signal, even rows are for background intensities)
2. Even after applying background correction, I'm left with high
intensities in the mentioned spots. Is it how background correction
should
work, I've chosen the wrong method, or such spots can't be corrected?
(present on imageplot.2.bgcor.png)
3. MAplots show big differences in within-group expression (technical
or
biological replicates), even after normalization. Did I choose a wrong
method?
(as seen in maplot.1.png - first column is for arrays c(1,4), second
for
c(5,8); consecutive rows are for: raw data, background-corrected data,
normalized data)
4. I think the above leads to the MAplot of beetween-group difference
being
skewed upwards for high intensities, am I right? (maplot.2.final.png)
5. After filtering-out genes with weak SNR and flags (see code),
within-group fold-changes are considerably smaller. How should i
decide
which genes should be left for analysis: are there any standards or
should
I try 'trial-and-error plotting' of MAplots with different functions
for
weights? (maplot.3.filtered.png - upper plot presents filtered data)
6. Is there any reason to perform background correction, if it worsens
densities (plotdensities.png - upper plot was before background
correction)
B.
7. VSN normalization doesn't take $weights into account. Is there any
convenient way to include them? It also trims genes names, so I set
them as
rownames to be processed.
8. Is there any convenient way to assign '0 weights' to certain
location on
the microarray (such as previously mentioned high-intensity
spots/scratches)?
9. I'm unable to plotMA(RG) with spottypes included, because my data
is
single-channel. Now I think about transferring four last arrays to be
treated as green channel, but won't it affect the further analysis?
10. Due to inability to produce imageplots, I needed to set
RG$printer$ngrid.r = 1 . Wouldn't it affect the plots reliability?
If you consider any of those questions too basic, could you please
provide
me with a reliable online materials for basic microarray analysis? I'm
trying to figure it out by myself.
Source: http://pastebin.com/ZF2NJc7G
Plots: http://imgur.com/a/I9k5C
Session info: http://pastebin.com/kVRf2NWf
Thank you for your support,
Marcin Kaminski, medical student
Medical University of Bialystok
-------------- next part --------------
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250
[3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C
[5] LC_TIME=Polish_Poland.1250
attached base packages:
[1] parallel stats graphics grDevices utils datasets
[7] methods base
other attached packages:
[1] vsn_3.30.0 Biobase_2.22.0 BiocGenerics_0.8.0
[4] limma_3.18.2
loaded via a namespace (and not attached):
[1] affy_1.40.0 affyio_1.30.0 BiocInstaller_1.12.0
[4] grid_3.0.2 lattice_0.20-24 preprocessCore_1.24.0
[7] tools_3.0.2 zlibbioc_1.8.0
-------------- next part --------------
A non-text attachment was scrubbed...
Name: imageplot.1.png
Type: image/png
Size: 226759 bytes
Desc: not available
URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20131111="" a7edc234="" attachment.png="">
-------------- next part --------------
A non-text attachment was scrubbed...
Name: imageplot.2.bgcor.png
Type: image/png
Size: 125957 bytes
Desc: not available
URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20131111="" a7edc234="" attachment-0001.png="">
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maplot.1.png
Type: image/png
Size: 25500 bytes
Desc: not available
URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20131111="" a7edc234="" attachment-0002.png="">
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maplot.2.final.png
Type: image/png
Size: 10400 bytes
Desc: not available
URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20131111="" a7edc234="" attachment-0003.png="">
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maplot.3.filtered.png
Type: image/png
Size: 16349 bytes
Desc: not available
URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20131111="" a7edc234="" attachment-0004.png="">
-------------- next part --------------
A non-text attachment was scrubbed...
Name: plotdensities.png
Type: image/png
Size: 5469 bytes
Desc: not available
URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20131111="" a7edc234="" attachment-0005.png="">
-------------- next part --------------
library(limma)
library(vsn)
# DATA PREPROCESSING
# Download RAW data from GEO experiment
setInternet2(use=FALSE)
download.file('http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE34571&
format=file', 'GSE34571_RAW.tar', mode='wb')
untar('GSE34571_RAW.tar', exdir='gpr')
setwd('gpr')
# Set function to assign weights
f <- function(x) {ok.flags <- x$Flags > (-99)
ok.snr <- x[,'SNR 532'] > 3
as.numeric(ok.snr & ok.flags)
}
# Read single-channel .gpr files as double-channel data and perform bg
correction
RG <- read.maimages(dir(pattern='*.gpr.gz'), source="genepix",
columns=list(R="F532 Mean",G="F532
Mean",Rb="B532",Gb="B532"),
wt.fun=f)
RG.b = backgroundCorrect(RG=RG, method='normexp')
# Delete fake second channel
rownames(RG.b$R) <- RG.b$genes$Name
RG.b$G <- NULL
RG.b$Gb <- NULL
# Specify contrast and design matrices
design <- model.matrix(~ 0+factor(c(1,1,1,1,2,2,2,2)))
colnames(design) <- c("group1", "group2")
contrast.matrix <- makeContrasts(group1-group2, levels=design)
# Peform normalisation and stats
norm.vsn <- normalizeVSN(RG.b$R)
fit.vsn <- lmFit(norm.vsn, design)
fit2.vsn <- contrasts.fit(fit.vsn, contrast.matrix)
fit3.vsn <- eBayes(fit2.vsn)
limma::volcanoplot(fit3.vsn)
RG$printer$ngrid.r = 1
imageplot(log2(RG.b$R[,2]), RG$printer)
# DIAGNOSTIC PLOTS
# maplot.1
par(mfrow=c(3,2))
limma::plotMA(log2(RG$R[,c(1:4)]))
limma::plotMA(log2(RG$R[,c(5:8)]))
limma::plotMA(log2(RG.b$R[,c(1:4)]))
limma::plotMA(log2(RG.b$R[,c(5:8)]))
limma::plotMA(norm.vsn[,c(1:4)])
limma::plotMA(norm.vsn[,c(5:8)])
# maplot.2.final
par(mfrow=c(1,1))
limma::plotMA(fit3.vsn)
# maplot.3.filtered
par(mfrow=c(2,1))
limma::plotMA(log2(RG.b$R[((RG.b$weights[,1] == 1) & (RG.b$weights[,2]
== 1)),c(1:2)]))
limma::plotMA(log2(RG.b$R[,c(1:2)]))
boxplot(RG$R)
boxplot(norm.vsn)
# plotdensities
plotDensities(RG, singlechannels=c(1:8))
plotDensities(RG.b, singlechannels=c(1:8))