Single-channel GenePix analysis with Limma (bg correction, normalization, weights)
1
0
Entering edit mode
@marcin-jakub-kaminski-6242
Last seen 9.4 years ago
Dear experts, I'm trying to analyze raw data from GEO GSE34571 using limma. It's 8 single-channel Agilent microarrays scanned with Genepix software to .gpr files. It's a simple design of between-group comparison including two groups, 4 replicates in each. With some help from limma guide and previous mailing list entries, my pipeline includes: - setting function for weights - reading .gpr files as fake double-channel data - normexp background correction - deleting duplicated channels (fake green) - vsn normalization - lm fitting There are two major issues (at least I find them difficult): A. background correction/data quality B. data I loose after performing vsn normalization or because of having single-channel data only A. 1. Imageplots indicate spots of high-background signal. Should I be concerned about such noise? (present on imageplot.1.png - odd rows depict signal, even rows are for background intensities) 2. Even after applying background correction, I'm left with high intensities in the mentioned spots. Is it how background correction should work, I've chosen the wrong method, or such spots can't be corrected? (present on imageplot.2.bgcor.png) 3. MAplots show big differences in within-group expression (technical or biological replicates), even after normalization. Did I choose a wrong method? (as seen in maplot.1.png - first column is for arrays c(1,4), second for c(5,8); consecutive rows are for: raw data, background-corrected data, normalized data) 4. I think the above leads to the MAplot of beetween-group difference being skewed upwards for high intensities, am I right? (maplot.2.final.png) 5. After filtering-out genes with weak SNR and flags (see code), within-group fold-changes are considerably smaller. How should i decide which genes should be left for analysis: are there any standards or should I try 'trial-and-error plotting' of MAplots with different functions for weights? (maplot.3.filtered.png - upper plot presents filtered data) 6. Is there any reason to perform background correction, if it worsens densities (plotdensities.png - upper plot was before background correction) B. 7. VSN normalization doesn't take $weights into account. Is there any convenient way to include them? It also trims genes names, so I set them as rownames to be processed. 8. Is there any convenient way to assign '0 weights' to certain location on the microarray (such as previously mentioned high-intensity spots/scratches)? 9. I'm unable to plotMA(RG) with spottypes included, because my data is single-channel. Now I think about transferring four last arrays to be treated as green channel, but won't it affect the further analysis? 10. Due to inability to produce imageplots, I needed to set RG$printer$ngrid.r = 1 . Wouldn't it affect the plots reliability? If you consider any of those questions too basic, could you please provide me with a reliable online materials for basic microarray analysis? I'm trying to figure it out by myself. Source: http://pastebin.com/ZF2NJc7G Plots: http://imgur.com/a/I9k5C Session info: http://pastebin.com/kVRf2NWf Thank you for your support, Marcin Kaminski, medical student Medical University of Bialystok -------------- next part -------------- R version 3.0.2 (2013-09-25) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250 [3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C [5] LC_TIME=Polish_Poland.1250 attached base packages: [1] parallel stats graphics grDevices utils datasets [7] methods base other attached packages: [1] vsn_3.30.0 Biobase_2.22.0 BiocGenerics_0.8.0 [4] limma_3.18.2 loaded via a namespace (and not attached): [1] affy_1.40.0 affyio_1.30.0 BiocInstaller_1.12.0 [4] grid_3.0.2 lattice_0.20-24 preprocessCore_1.24.0 [7] tools_3.0.2 zlibbioc_1.8.0 -------------- next part -------------- A non-text attachment was scrubbed... Name: imageplot.1.png Type: image/png Size: 226759 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20131111="" a7edc234="" attachment.png=""> -------------- next part -------------- A non-text attachment was scrubbed... Name: imageplot.2.bgcor.png Type: image/png Size: 125957 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20131111="" a7edc234="" attachment-0001.png=""> -------------- next part -------------- A non-text attachment was scrubbed... Name: maplot.1.png Type: image/png Size: 25500 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20131111="" a7edc234="" attachment-0002.png=""> -------------- next part -------------- A non-text attachment was scrubbed... Name: maplot.2.final.png Type: image/png Size: 10400 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20131111="" a7edc234="" attachment-0003.png=""> -------------- next part -------------- A non-text attachment was scrubbed... Name: maplot.3.filtered.png Type: image/png Size: 16349 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20131111="" a7edc234="" attachment-0004.png=""> -------------- next part -------------- A non-text attachment was scrubbed... Name: plotdensities.png Type: image/png Size: 5469 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20131111="" a7edc234="" attachment-0005.png=""> -------------- next part -------------- library(limma) library(vsn) # DATA PREPROCESSING # Download RAW data from GEO experiment setInternet2(use=FALSE) download.file('http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE34571& format=file', 'GSE34571_RAW.tar', mode='wb') untar('GSE34571_RAW.tar', exdir='gpr') setwd('gpr') # Set function to assign weights f <- function(x) {ok.flags <- x$Flags > (-99) ok.snr <- x[,'SNR 532'] > 3 as.numeric(ok.snr & ok.flags) } # Read single-channel .gpr files as double-channel data and perform bg correction RG <- read.maimages(dir(pattern='*.gpr.gz'), source="genepix", columns=list(R="F532 Mean",G="F532 Mean",Rb="B532",Gb="B532"), wt.fun=f) RG.b = backgroundCorrect(RG=RG, method='normexp') # Delete fake second channel rownames(RG.b$R) <- RG.b$genes$Name RG.b$G <- NULL RG.b$Gb <- NULL # Specify contrast and design matrices design <- model.matrix(~ 0+factor(c(1,1,1,1,2,2,2,2))) colnames(design) <- c("group1", "group2") contrast.matrix <- makeContrasts(group1-group2, levels=design) # Peform normalisation and stats norm.vsn <- normalizeVSN(RG.b$R) fit.vsn <- lmFit(norm.vsn, design) fit2.vsn <- contrasts.fit(fit.vsn, contrast.matrix) fit3.vsn <- eBayes(fit2.vsn) limma::volcanoplot(fit3.vsn) RG$printer$ngrid.r = 1 imageplot(log2(RG.b$R[,2]), RG$printer) # DIAGNOSTIC PLOTS # maplot.1 par(mfrow=c(3,2)) limma::plotMA(log2(RG$R[,c(1:4)])) limma::plotMA(log2(RG$R[,c(5:8)])) limma::plotMA(log2(RG.b$R[,c(1:4)])) limma::plotMA(log2(RG.b$R[,c(5:8)])) limma::plotMA(norm.vsn[,c(1:4)]) limma::plotMA(norm.vsn[,c(5:8)]) # maplot.2.final par(mfrow=c(1,1)) limma::plotMA(fit3.vsn) # maplot.3.filtered par(mfrow=c(2,1)) limma::plotMA(log2(RG.b$R[((RG.b$weights[,1] == 1) & (RG.b$weights[,2] == 1)),c(1:2)])) limma::plotMA(log2(RG.b$R[,c(1:2)])) boxplot(RG$R) boxplot(norm.vsn) # plotdensities plotDensities(RG, singlechannels=c(1:8)) plotDensities(RG.b, singlechannels=c(1:8))
Microarray Normalization vsn limma ASSIGN Microarray Normalization vsn limma ASSIGN • 2.2k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 8 hours ago
WEHI, Melbourne, Australia
Hi Marcin, It doesn't need to be quite so complicated. limma reads the single-channel data directly, so there is no need to make fake double-channel data. See Section 4.5 of the User's Guide, except that you will use source="genepix" instead of source="agilent". There is no need to specify the read columns explicitly. Just use source="genepix.custom" if you want to use the custom background correction column. Section 16.4 gives one case study with Agilent arrays showing the sort of background correction and normalization that I would recommend. There is no need to lose any annotation information. There is usually no need to make complicated filters or weights. There is no difficulty with setting spot types with single channel data. It works just the same as for two colour data. If you start a new R session with limma only loaded, or else load limma last, then the need to prefix limma::plotMA will go away as well. Hope this helps. Gordon > Date: Mon, 11 Nov 2013 01:39:19 +0100 > From: Marcin Jakub Kami?ski <marcinjakubkaminski at="" gmail.com=""> > To: bioconductor at r-project.org > Subject: [BioC] Single-channel GenePix analysis with Limma (bg > correction, normalization, weights) > > Dear experts, > I'm trying to analyze raw data from GEO GSE34571 using limma. > It's 8 single-channel Agilent microarrays scanned with Genepix software to > .gpr files. > It's a simple design of between-group comparison including two groups, 4 > replicates in each. > > With some help from limma guide and previous mailing list entries, my > pipeline includes: > - setting function for weights > - reading .gpr files as fake double-channel data > - normexp background correction > - deleting duplicated channels (fake green) > - vsn normalization > - lm fitting > > There are two major issues (at least I find them difficult): > A. background correction/data quality > B. data I loose after performing vsn normalization or because of having > single-channel data only > > A. > 1. Imageplots indicate spots of high-background signal. Should I be > concerned about such noise? (present on imageplot.1.png - odd rows depict > signal, even rows are for background intensities) > 2. Even after applying background correction, I'm left with high > intensities in the mentioned spots. Is it how background correction should > work, I've chosen the wrong method, or such spots can't be corrected? > (present on imageplot.2.bgcor.png) > 3. MAplots show big differences in within-group expression (technical or > biological replicates), even after normalization. Did I choose a wrong > method? > (as seen in maplot.1.png - first column is for arrays c(1,4), second for > c(5,8); consecutive rows are for: raw data, background-corrected data, > normalized data) > 4. I think the above leads to the MAplot of beetween-group difference being > skewed upwards for high intensities, am I right? (maplot.2.final.png) > 5. After filtering-out genes with weak SNR and flags (see code), > within-group fold-changes are considerably smaller. How should i decide > which genes should be left for analysis: are there any standards or should > I try 'trial-and-error plotting' of MAplots with different functions for > weights? (maplot.3.filtered.png - upper plot presents filtered data) > 6. Is there any reason to perform background correction, if it worsens > densities (plotdensities.png - upper plot was before background correction) > > B. > 7. VSN normalization doesn't take $weights into account. Is there any > convenient way to include them? It also trims genes names, so I set them as > rownames to be processed. > 8. Is there any convenient way to assign '0 weights' to certain location on > the microarray (such as previously mentioned high-intensity > spots/scratches)? > 9. I'm unable to plotMA(RG) with spottypes included, because my data is > single-channel. Now I think about transferring four last arrays to be > treated as green channel, but won't it affect the further analysis? > 10. Due to inability to produce imageplots, I needed to set > RG$printer$ngrid.r = 1 . Wouldn't it affect the plots reliability? > > If you consider any of those questions too basic, could you please provide > me with a reliable online materials for basic microarray analysis? I'm > trying to figure it out by myself. > > Source: http://pastebin.com/ZF2NJc7G > Plots: http://imgur.com/a/I9k5C > Session info: http://pastebin.com/kVRf2NWf > > Thank you for your support, > Marcin Kaminski, medical student > Medical University of Bialystok > -------------- next part -------------- > R version 3.0.2 (2013-09-25) > Platform: x86_64-w64-mingw32/x64 (64-bit) > > locale: > [1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250 > [3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C > [5] LC_TIME=Polish_Poland.1250 > > attached base packages: > [1] parallel stats graphics grDevices utils datasets > [7] methods base > > other attached packages: > [1] vsn_3.30.0 Biobase_2.22.0 BiocGenerics_0.8.0 > [4] limma_3.18.2 > > loaded via a namespace (and not attached): > [1] affy_1.40.0 affyio_1.30.0 BiocInstaller_1.12.0 > [4] grid_3.0.2 lattice_0.20-24 preprocessCore_1.24.0 > [7] tools_3.0.2 zlibbioc_1.8.0 > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: imageplot.1.png > Type: image/png > Size: 226759 bytes > Desc: not available > URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 201311="" 11="" a7edc234="" attachment-0006.png=""> > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: imageplot.2.bgcor.png > Type: image/png > Size: 125957 bytes > Desc: not available > URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 201311="" 11="" a7edc234="" attachment-0007.png=""> > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: maplot.1.png > Type: image/png > Size: 25500 bytes > Desc: not available > URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 201311="" 11="" a7edc234="" attachment-0008.png=""> > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: maplot.2.final.png > Type: image/png > Size: 10400 bytes > Desc: not available > URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 201311="" 11="" a7edc234="" attachment-0009.png=""> > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: maplot.3.filtered.png > Type: image/png > Size: 16349 bytes > Desc: not available > URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 201311="" 11="" a7edc234="" attachment-0010.png=""> > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: plotdensities.png > Type: image/png > Size: 5469 bytes > Desc: not available > URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 201311="" 11="" a7edc234="" attachment-0011.png=""> > -------------- next part -------------- > library(limma) > library(vsn) > > # DATA PREPROCESSING > # Download RAW data from GEO experiment > setInternet2(use=FALSE) > download.file('http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE3457 1&format=file', 'GSE34571_RAW.tar', mode='wb') > untar('GSE34571_RAW.tar', exdir='gpr') > setwd('gpr') > > # Set function to assign weights > f <- function(x) {ok.flags <- x$Flags > (-99) > ok.snr <- x[,'SNR 532'] > 3 > as.numeric(ok.snr & ok.flags) > } > > # Read single-channel .gpr files as double-channel data and perform bg correction > RG <- read.maimages(dir(pattern='*.gpr.gz'), source="genepix", > columns=list(R="F532 Mean",G="F532 Mean",Rb="B532",Gb="B532"), > wt.fun=f) > RG.b = backgroundCorrect(RG=RG, method='normexp') > > # Delete fake second channel > rownames(RG.b$R) <- RG.b$genes$Name > RG.b$G <- NULL > RG.b$Gb <- NULL > > # Specify contrast and design matrices > design <- model.matrix(~ 0+factor(c(1,1,1,1,2,2,2,2))) > colnames(design) <- c("group1", "group2") > contrast.matrix <- makeContrasts(group1-group2, levels=design) > > # Peform normalisation and stats > norm.vsn <- normalizeVSN(RG.b$R) > fit.vsn <- lmFit(norm.vsn, design) > fit2.vsn <- contrasts.fit(fit.vsn, contrast.matrix) > fit3.vsn <- eBayes(fit2.vsn) > limma::volcanoplot(fit3.vsn) > > > RG$printer$ngrid.r = 1 > imageplot(log2(RG.b$R[,2]), RG$printer) > > # DIAGNOSTIC PLOTS > > # maplot.1 > par(mfrow=c(3,2)) > limma::plotMA(log2(RG$R[,c(1:4)])) > limma::plotMA(log2(RG$R[,c(5:8)])) > > limma::plotMA(log2(RG.b$R[,c(1:4)])) > limma::plotMA(log2(RG.b$R[,c(5:8)])) > > limma::plotMA(norm.vsn[,c(1:4)]) > limma::plotMA(norm.vsn[,c(5:8)]) > > # maplot.2.final > par(mfrow=c(1,1)) > limma::plotMA(fit3.vsn) > > # maplot.3.filtered > par(mfrow=c(2,1)) > limma::plotMA(log2(RG.b$R[((RG.b$weights[,1] == 1) & (RG.b$weights[,2] == 1)),c(1:2)])) > limma::plotMA(log2(RG.b$R[,c(1:2)])) > > boxplot(RG$R) > boxplot(norm.vsn) > > # plotdensities > plotDensities(RG, singlechannels=c(1:8)) > plotDensities(RG.b, singlechannels=c(1:8)) > > ------------------------------ ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT
0
Entering edit mode
Hi Marcin, Normalizing of miRNA microarrays is quite a bit trickier than with mRNA microarrays, because of the smaller number of probes and the higher proportion that are differentially expressed. See http://rnajournal.cshlp.org/content/early/2013/05/24/rna.035055.112 Best wishes Gordon On Tue, 12 Nov 2013, Gordon K Smyth wrote: > Hi Marcin, > > It doesn't need to be quite so complicated. > > limma reads the single-channel data directly, so there is no need to make > fake double-channel data. See Section 4.5 of the User's Guide, except that > you will use source="genepix" instead of source="agilent". > > There is no need to specify the read columns explicitly. Just use > source="genepix.custom" if you want to use the custom background correction > column. > > Section 16.4 gives one case study with Agilent arrays showing the sort of > background correction and normalization that I would recommend. There is no > need to lose any annotation information. There is usually no need to make > complicated filters or weights. > > There is no difficulty with setting spot types with single channel data. It > works just the same as for two colour data. > > If you start a new R session with limma only loaded, or else load limma last, > then the need to prefix limma::plotMA will go away as well. > > Hope this helps. > Gordon > > >> Date: Mon, 11 Nov 2013 01:39:19 +0100 >> From: Marcin Jakub Kami?ski <marcinjakubkaminski at="" gmail.com=""> >> To: bioconductor at r-project.org >> Subject: [BioC] Single-channel GenePix analysis with Limma (bg >> correction, normalization, weights) >> >> Dear experts, >> I'm trying to analyze raw data from GEO GSE34571 using limma. >> It's 8 single-channel Agilent microarrays scanned with Genepix software to >> .gpr files. >> It's a simple design of between-group comparison including two groups, 4 >> replicates in each. >> >> With some help from limma guide and previous mailing list entries, my >> pipeline includes: >> - setting function for weights >> - reading .gpr files as fake double-channel data >> - normexp background correction >> - deleting duplicated channels (fake green) >> - vsn normalization >> - lm fitting >> >> There are two major issues (at least I find them difficult): >> A. background correction/data quality >> B. data I loose after performing vsn normalization or because of having >> single-channel data only >> >> A. >> 1. Imageplots indicate spots of high-background signal. Should I be >> concerned about such noise? (present on imageplot.1.png - odd rows depict >> signal, even rows are for background intensities) >> 2. Even after applying background correction, I'm left with high >> intensities in the mentioned spots. Is it how background correction should >> work, I've chosen the wrong method, or such spots can't be corrected? >> (present on imageplot.2.bgcor.png) >> 3. MAplots show big differences in within-group expression (technical or >> biological replicates), even after normalization. Did I choose a wrong >> method? >> (as seen in maplot.1.png - first column is for arrays c(1,4), second for >> c(5,8); consecutive rows are for: raw data, background-corrected data, >> normalized data) >> 4. I think the above leads to the MAplot of beetween-group difference being >> skewed upwards for high intensities, am I right? (maplot.2.final.png) >> 5. After filtering-out genes with weak SNR and flags (see code), >> within-group fold-changes are considerably smaller. How should i decide >> which genes should be left for analysis: are there any standards or should >> I try 'trial-and-error plotting' of MAplots with different functions for >> weights? (maplot.3.filtered.png - upper plot presents filtered data) >> 6. Is there any reason to perform background correction, if it worsens >> densities (plotdensities.png - upper plot was before background correction) >> >> B. >> 7. VSN normalization doesn't take $weights into account. Is there any >> convenient way to include them? It also trims genes names, so I set them as >> rownames to be processed. >> 8. Is there any convenient way to assign '0 weights' to certain location on >> the microarray (such as previously mentioned high-intensity >> spots/scratches)? >> 9. I'm unable to plotMA(RG) with spottypes included, because my data is >> single-channel. Now I think about transferring four last arrays to be >> treated as green channel, but won't it affect the further analysis? >> 10. Due to inability to produce imageplots, I needed to set >> RG$printer$ngrid.r = 1 . Wouldn't it affect the plots reliability? >> >> If you consider any of those questions too basic, could you please provide >> me with a reliable online materials for basic microarray analysis? I'm >> trying to figure it out by myself. >> >> Source: http://pastebin.com/ZF2NJc7G >> Plots: http://imgur.com/a/I9k5C >> Session info: http://pastebin.com/kVRf2NWf >> >> Thank you for your support, >> Marcin Kaminski, medical student >> Medical University of Bialystok >> -------------- next part -------------- >> R version 3.0.2 (2013-09-25) >> Platform: x86_64-w64-mingw32/x64 (64-bit) >> >> locale: >> [1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250 >> [3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C >> [5] LC_TIME=Polish_Poland.1250 >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets >> [7] methods base >> >> other attached packages: >> [1] vsn_3.30.0 Biobase_2.22.0 BiocGenerics_0.8.0 >> [4] limma_3.18.2 >> >> loaded via a namespace (and not attached): >> [1] affy_1.40.0 affyio_1.30.0 BiocInstaller_1.12.0 >> [4] grid_3.0.2 lattice_0.20-24 preprocessCore_1.24.0 >> [7] tools_3.0.2 zlibbioc_1.8.0 >> -------------- next part -------------- >> A non-text attachment was scrubbed... >> Name: imageplot.1.png >> Type: image/png >> Size: 226759 bytes >> Desc: not available >> URL: >> <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20131111="" a="" 7edc234="" attachment-0006.png=""> >> -------------- next part -------------- >> A non-text attachment was scrubbed... >> Name: imageplot.2.bgcor.png >> Type: image/png >> Size: 125957 bytes >> Desc: not available >> URL: >> <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20131111="" a="" 7edc234="" attachment-0007.png=""> >> -------------- next part -------------- >> A non-text attachment was scrubbed... >> Name: maplot.1.png >> Type: image/png >> Size: 25500 bytes >> Desc: not available >> URL: >> <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20131111="" a="" 7edc234="" attachment-0008.png=""> >> -------------- next part -------------- >> A non-text attachment was scrubbed... >> Name: maplot.2.final.png >> Type: image/png >> Size: 10400 bytes >> Desc: not available >> URL: >> <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20131111="" a="" 7edc234="" attachment-0009.png=""> >> -------------- next part -------------- >> A non-text attachment was scrubbed... >> Name: maplot.3.filtered.png >> Type: image/png >> Size: 16349 bytes >> Desc: not available >> URL: >> <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20131111="" a="" 7edc234="" attachment-0010.png=""> >> -------------- next part -------------- >> A non-text attachment was scrubbed... >> Name: plotdensities.png >> Type: image/png >> Size: 5469 bytes >> Desc: not available >> URL: >> <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20131111="" a="" 7edc234="" attachment-0011.png=""> >> -------------- next part -------------- >> library(limma) >> library(vsn) >> >> # DATA PREPROCESSING >> # Download RAW data from GEO experiment >> setInternet2(use=FALSE) >> download.file('http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE345 71&format=file', >> 'GSE34571_RAW.tar', mode='wb') >> untar('GSE34571_RAW.tar', exdir='gpr') >> setwd('gpr') >> >> # Set function to assign weights >> f <- function(x) {ok.flags <- x$Flags > (-99) >> ok.snr <- x[,'SNR 532'] > 3 >> as.numeric(ok.snr & ok.flags) >> } >> >> # Read single-channel .gpr files as double-channel data and perform bg >> correction >> RG <- read.maimages(dir(pattern='*.gpr.gz'), source="genepix", >> columns=list(R="F532 Mean",G="F532 >> Mean",Rb="B532",Gb="B532"), >> wt.fun=f) >> RG.b = backgroundCorrect(RG=RG, method='normexp') >> >> # Delete fake second channel >> rownames(RG.b$R) <- RG.b$genes$Name >> RG.b$G <- NULL >> RG.b$Gb <- NULL >> >> # Specify contrast and design matrices >> design <- model.matrix(~ 0+factor(c(1,1,1,1,2,2,2,2))) >> colnames(design) <- c("group1", "group2") >> contrast.matrix <- makeContrasts(group1-group2, levels=design) >> >> # Peform normalisation and stats >> norm.vsn <- normalizeVSN(RG.b$R) >> fit.vsn <- lmFit(norm.vsn, design) >> fit2.vsn <- contrasts.fit(fit.vsn, contrast.matrix) >> fit3.vsn <- eBayes(fit2.vsn) >> limma::volcanoplot(fit3.vsn) >> >> >> RG$printer$ngrid.r = 1 >> imageplot(log2(RG.b$R[,2]), RG$printer) >> >> # DIAGNOSTIC PLOTS >> >> # maplot.1 >> par(mfrow=c(3,2)) >> limma::plotMA(log2(RG$R[,c(1:4)])) >> limma::plotMA(log2(RG$R[,c(5:8)])) >> >> limma::plotMA(log2(RG.b$R[,c(1:4)])) >> limma::plotMA(log2(RG.b$R[,c(5:8)])) >> >> limma::plotMA(norm.vsn[,c(1:4)]) >> limma::plotMA(norm.vsn[,c(5:8)]) >> >> # maplot.2.final >> par(mfrow=c(1,1)) >> limma::plotMA(fit3.vsn) >> >> # maplot.3.filtered >> par(mfrow=c(2,1)) >> limma::plotMA(log2(RG.b$R[((RG.b$weights[,1] == 1) & (RG.b$weights[,2] == >> 1)),c(1:2)])) >> limma::plotMA(log2(RG.b$R[,c(1:2)])) >> >> boxplot(RG$R) >> boxplot(norm.vsn) >> >> # plotdensities >> plotDensities(RG, singlechannels=c(1:8)) >> plotDensities(RG.b, singlechannels=c(1:8)) >> >> ------------------------------ > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD REPLY
0
Entering edit mode
Hello Gordon, thank you for your support. Somehow I didn't pay enough attention and thought that 'green.only' parameter shouldn't be used in my case. Now, with Elist, the analysis is much easier, thank you! However, there's a small inconvenience: I cannot simply plotDensities() for EList, because it expects $R. There's no derivative function, such as plotDensities.EList. I workaround it easily by $R <- $E, but maybe that's done on purpose and I shouldn't use these plots in my case? I've followed your suggestion about bg correction and normalization method and normexp+quantile gives me the lowest variance between replicates, as well as produces definitely more symmetric volcanoplot. I guess I'll stick to these methods, thank you. What's interesting, from the other hand, microRNA that has been reported functional in vivo based on the consecutive research (hsa-miR-150), was ranked higher by vsn method (although it's also present among the first 20 results from quantile normalization). Shame to admit, but I didn't perform avereps() before. Now I'm averaging across probes, as you've suggested to do in the other mailing list topic. Also, thank you for the paper, I believe it's of a great value. Unluckily, my university doesn't provide free access to the most recent RNA articles, we have 1yr embargo. Best regards, Marcin On Tue, Nov 12, 2013 at 11:42 AM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > Hi Marcin, > > Normalizing of miRNA microarrays is quite a bit trickier than with mRNA > microarrays, because of the smaller number of probes and the higher > proportion that are differentially expressed. See > > http://rnajournal.cshlp.org/content/early/2013/05/24/rna.035055.112 > > Best wishes > Gordon > > > > On Tue, 12 Nov 2013, Gordon K Smyth wrote: > > Hi Marcin, >> >> It doesn't need to be quite so complicated. >> >> limma reads the single-channel data directly, so there is no need to make >> fake double-channel data. See Section 4.5 of the User's Guide, except that >> you will use source="genepix" instead of source="agilent". >> >> There is no need to specify the read columns explicitly. Just use >> source="genepix.custom" if you want to use the custom background correction >> column. >> >> Section 16.4 gives one case study with Agilent arrays showing the sort of >> background correction and normalization that I would recommend. There is >> no need to lose any annotation information. There is usually no need to >> make complicated filters or weights. >> >> There is no difficulty with setting spot types with single channel data. >> It works just the same as for two colour data. >> >> If you start a new R session with limma only loaded, or else load limma >> last, then the need to prefix limma::plotMA will go away as well. >> >> Hope this helps. >> Gordon >> >> >> Date: Mon, 11 Nov 2013 01:39:19 +0100 >>> From: Marcin Jakub Kami?ski <marcinjakubkaminski@gmail.com> >>> To: bioconductor@r-project.org >>> Subject: [BioC] Single-channel GenePix analysis with Limma (bg >>> correction, normalization, weights) >>> >>> Dear experts, >>> I'm trying to analyze raw data from GEO GSE34571 using limma. >>> It's 8 single-channel Agilent microarrays scanned with Genepix software >>> to >>> .gpr files. >>> It's a simple design of between-group comparison including two groups, 4 >>> replicates in each. >>> >>> With some help from limma guide and previous mailing list entries, my >>> pipeline includes: >>> - setting function for weights >>> - reading .gpr files as fake double-channel data >>> - normexp background correction >>> - deleting duplicated channels (fake green) >>> - vsn normalization >>> - lm fitting >>> >>> There are two major issues (at least I find them difficult): >>> A. background correction/data quality >>> B. data I loose after performing vsn normalization or because of having >>> single-channel data only >>> >>> A. >>> 1. Imageplots indicate spots of high-background signal. Should I be >>> concerned about such noise? (present on imageplot.1.png - odd rows depict >>> signal, even rows are for background intensities) >>> 2. Even after applying background correction, I'm left with high >>> intensities in the mentioned spots. Is it how background correction >>> should >>> work, I've chosen the wrong method, or such spots can't be corrected? >>> (present on imageplot.2.bgcor.png) >>> 3. MAplots show big differences in within-group expression (technical or >>> biological replicates), even after normalization. Did I choose a wrong >>> method? >>> (as seen in maplot.1.png - first column is for arrays c(1,4), second for >>> c(5,8); consecutive rows are for: raw data, background-corrected data, >>> normalized data) >>> 4. I think the above leads to the MAplot of beetween-group difference >>> being >>> skewed upwards for high intensities, am I right? (maplot.2.final.png) >>> 5. After filtering-out genes with weak SNR and flags (see code), >>> within-group fold-changes are considerably smaller. How should i decide >>> which genes should be left for analysis: are there any standards or >>> should >>> I try 'trial-and-error plotting' of MAplots with different functions for >>> weights? (maplot.3.filtered.png - upper plot presents filtered data) >>> 6. Is there any reason to perform background correction, if it worsens >>> densities (plotdensities.png - upper plot was before background >>> correction) >>> >>> B. >>> 7. VSN normalization doesn't take $weights into account. Is there any >>> convenient way to include them? It also trims genes names, so I set them >>> as >>> rownames to be processed. >>> 8. Is there any convenient way to assign '0 weights' to certain location >>> on >>> the microarray (such as previously mentioned high-intensity >>> spots/scratches)? >>> 9. I'm unable to plotMA(RG) with spottypes included, because my data is >>> single-channel. Now I think about transferring four last arrays to be >>> treated as green channel, but won't it affect the further analysis? >>> 10. Due to inability to produce imageplots, I needed to set >>> RG$printer$ngrid.r = 1 . Wouldn't it affect the plots reliability? >>> >>> If you consider any of those questions too basic, could you please >>> provide >>> me with a reliable online materials for basic microarray analysis? I'm >>> trying to figure it out by myself. >>> >>> Source: http://pastebin.com/ZF2NJc7G >>> Plots: http://imgur.com/a/I9k5C >>> Session info: http://pastebin.com/kVRf2NWf >>> >>> Thank you for your support, >>> Marcin Kaminski, medical student >>> Medical University of Bialystok >>> -------------- next part -------------- >>> R version 3.0.2 (2013-09-25) >>> Platform: x86_64-w64-mingw32/x64 (64-bit) >>> >>> locale: >>> [1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250 >>> [3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C >>> [5] LC_TIME=Polish_Poland.1250 >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets >>> [7] methods base >>> >>> other attached packages: >>> [1] vsn_3.30.0 Biobase_2.22.0 BiocGenerics_0.8.0 >>> [4] limma_3.18.2 >>> >>> loaded via a namespace (and not attached): >>> [1] affy_1.40.0 affyio_1.30.0 BiocInstaller_1.12.0 >>> [4] grid_3.0.2 lattice_0.20-24 preprocessCore_1.24.0 >>> [7] tools_3.0.2 zlibbioc_1.8.0 >>> -------------- next part -------------- >>> A non-text attachment was scrubbed... >>> Name: imageplot.1.png >>> Type: image/png >>> Size: 226759 bytes >>> Desc: not available >>> URL: <https: stat.ethz.ch="" pipermail="" bioconductor="">>> attachments/20131111/a7edc234/attachment-0006.png> >>> -------------- next part -------------- >>> A non-text attachment was scrubbed... >>> Name: imageplot.2.bgcor.png >>> Type: image/png >>> Size: 125957 bytes >>> Desc: not available >>> URL: <https: stat.ethz.ch="" pipermail="" bioconductor="">>> attachments/20131111/a7edc234/attachment-0007.png> >>> -------------- next part -------------- >>> A non-text attachment was scrubbed... >>> Name: maplot.1.png >>> Type: image/png >>> Size: 25500 bytes >>> Desc: not available >>> URL: <https: stat.ethz.ch="" pipermail="" bioconductor="">>> attachments/20131111/a7edc234/attachment-0008.png> >>> -------------- next part -------------- >>> A non-text attachment was scrubbed... >>> Name: maplot.2.final.png >>> Type: image/png >>> Size: 10400 bytes >>> Desc: not available >>> URL: <https: stat.ethz.ch="" pipermail="" bioconductor="">>> attachments/20131111/a7edc234/attachment-0009.png> >>> -------------- next part -------------- >>> A non-text attachment was scrubbed... >>> Name: maplot.3.filtered.png >>> Type: image/png >>> Size: 16349 bytes >>> Desc: not available >>> URL: <https: stat.ethz.ch="" pipermail="" bioconductor="">>> attachments/20131111/a7edc234/attachment-0010.png> >>> -------------- next part -------------- >>> A non-text attachment was scrubbed... >>> Name: plotdensities.png >>> Type: image/png >>> Size: 5469 bytes >>> Desc: not available >>> URL: <https: stat.ethz.ch="" pipermail="" bioconductor="">>> attachments/20131111/a7edc234/attachment-0011.png> >>> -------------- next part -------------- >>> library(limma) >>> library(vsn) >>> >>> # DATA PREPROCESSING >>> # Download RAW data from GEO experiment >>> setInternet2(use=FALSE) >>> download.file('http://www.ncbi.nlm.nih.gov/geo/download/ >>> ?acc=GSE34571&format=file', 'GSE34571_RAW.tar', mode='wb') >>> untar('GSE34571_RAW.tar', exdir='gpr') >>> setwd('gpr') >>> >>> # Set function to assign weights >>> f <- function(x) {ok.flags <- x$Flags > (-99) >>> ok.snr <- x[,'SNR 532'] > 3 >>> as.numeric(ok.snr & ok.flags) >>> } >>> >>> # Read single-channel .gpr files as double-channel data and perform bg >>> correction >>> RG <- read.maimages(dir(pattern='*.gpr.gz'), source="genepix", >>> columns=list(R="F532 Mean",G="F532 >>> Mean",Rb="B532",Gb="B532"), >>> wt.fun=f) >>> RG.b = backgroundCorrect(RG=RG, method='normexp') >>> >>> # Delete fake second channel >>> rownames(RG.b$R) <- RG.b$genes$Name >>> RG.b$G <- NULL >>> RG.b$Gb <- NULL >>> >>> # Specify contrast and design matrices >>> design <- model.matrix(~ 0+factor(c(1,1,1,1,2,2,2,2))) >>> colnames(design) <- c("group1", "group2") >>> contrast.matrix <- makeContrasts(group1-group2, levels=design) >>> >>> # Peform normalisation and stats >>> norm.vsn <- normalizeVSN(RG.b$R) >>> fit.vsn <- lmFit(norm.vsn, design) >>> fit2.vsn <- contrasts.fit(fit.vsn, contrast.matrix) >>> fit3.vsn <- eBayes(fit2.vsn) >>> limma::volcanoplot(fit3.vsn) >>> >>> >>> RG$printer$ngrid.r = 1 >>> imageplot(log2(RG.b$R[,2]), RG$printer) >>> >>> # DIAGNOSTIC PLOTS >>> >>> # maplot.1 >>> par(mfrow=c(3,2)) >>> limma::plotMA(log2(RG$R[,c(1:4)])) >>> limma::plotMA(log2(RG$R[,c(5:8)])) >>> >>> limma::plotMA(log2(RG.b$R[,c(1:4)])) >>> limma::plotMA(log2(RG.b$R[,c(5:8)])) >>> >>> limma::plotMA(norm.vsn[,c(1:4)]) >>> limma::plotMA(norm.vsn[,c(5:8)]) >>> >>> # maplot.2.final >>> par(mfrow=c(1,1)) >>> limma::plotMA(fit3.vsn) >>> >>> # maplot.3.filtered >>> par(mfrow=c(2,1)) >>> limma::plotMA(log2(RG.b$R[((RG.b$weights[,1] == 1) & (RG.b$weights[,2] >>> == 1)),c(1:2)])) >>> limma::plotMA(log2(RG.b$R[,c(1:2)])) >>> >>> boxplot(RG$R) >>> boxplot(norm.vsn) >>> >>> # plotdensities >>> plotDensities(RG, singlechannels=c(1:8)) >>> plotDensities(RG.b, singlechannels=c(1:8)) >>> >>> ------------------------------ >>> >> >> > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:10}}
ADD REPLY
0
Entering edit mode
On Fri, 15 Nov 2013, Marcin Jakub Kami??ski wrote: > Hello Gordon, > thank you for your support. Somehow I didn't pay enough attention and > thought that 'green.only' parameter shouldn't be used in my case. Now, > with Elist, the analysis is much easier, thank you! > > However, there's a small inconvenience: I cannot simply plotDensities() > for EList, because it expects $R. There's no derivative function, such > as plotDensities.EList. True. > I workaround it easily by $R <- $E, but maybe > that's done on purpose and I shouldn't use these plots in my case? No, there is no good reason why plotDensities() isn't implemented for EList objects. We just haven't done so as it the need wasn't urgent. > I've followed your suggestion about bg correction and normalization method > and normexp+quantile gives me the lowest variance between replicates, as > well as produces definitely more symmetric volcanoplot. I guess I'll stick > to these methods, thank you. > What's interesting, from the other hand, microRNA that has been reported > functional in vivo based on the consecutive research (hsa-miR-150), was > ranked higher by vsn method (although it's also present among the first 20 > results from quantile normalization). > > Shame to admit, but I didn't perform avereps() before. Now I'm averaging > across probes, as you've suggested to do in the other mailing list topic. Actually my intention was to advise against routine use of avereps(): https://www.stat.math.ethz.ch/pipermail/bioconductor/2013-November/056 061.html Best wishes Gordon > Also, thank you for the paper, I believe it's of a great value. Unluckily, > my university doesn't provide free access to the most recent RNA articles, > we have 1yr embargo. > > Best regards, > Marcin > > On Tue, Nov 12, 2013 at 11:42 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > >> Hi Marcin, >> >> Normalizing of miRNA microarrays is quite a bit trickier than with mRNA >> microarrays, because of the smaller number of probes and the higher >> proportion that are differentially expressed. See >> >> http://rnajournal.cshlp.org/content/early/2013/05/24/rna.035055.112 >> >> Best wishes >> Gordon >> >> >> >> On Tue, 12 Nov 2013, Gordon K Smyth wrote: >> >> Hi Marcin, >>> >>> It doesn't need to be quite so complicated. >>> >>> limma reads the single-channel data directly, so there is no need to make >>> fake double-channel data. See Section 4.5 of the User's Guide, except that >>> you will use source="genepix" instead of source="agilent". >>> >>> There is no need to specify the read columns explicitly. Just use >>> source="genepix.custom" if you want to use the custom background correction >>> column. >>> >>> Section 16.4 gives one case study with Agilent arrays showing the sort of >>> background correction and normalization that I would recommend. There is >>> no need to lose any annotation information. There is usually no need to >>> make complicated filters or weights. >>> >>> There is no difficulty with setting spot types with single channel data. >>> It works just the same as for two colour data. >>> >>> If you start a new R session with limma only loaded, or else load limma >>> last, then the need to prefix limma::plotMA will go away as well. >>> >>> Hope this helps. >>> Gordon >>> >>> >>> Date: Mon, 11 Nov 2013 01:39:19 +0100 >>>> From: Marcin Jakub Kami?ski <marcinjakubkaminski at="" gmail.com=""> >>>> To: bioconductor at r-project.org >>>> Subject: [BioC] Single-channel GenePix analysis with Limma (bg >>>> correction, normalization, weights) >>>> >>>> Dear experts, >>>> I'm trying to analyze raw data from GEO GSE34571 using limma. >>>> It's 8 single-channel Agilent microarrays scanned with Genepix software >>>> to >>>> .gpr files. >>>> It's a simple design of between-group comparison including two groups, 4 >>>> replicates in each. >>>> >>>> With some help from limma guide and previous mailing list entries, my >>>> pipeline includes: >>>> - setting function for weights >>>> - reading .gpr files as fake double-channel data >>>> - normexp background correction >>>> - deleting duplicated channels (fake green) >>>> - vsn normalization >>>> - lm fitting >>>> >>>> There are two major issues (at least I find them difficult): >>>> A. background correction/data quality >>>> B. data I loose after performing vsn normalization or because of having >>>> single-channel data only >>>> >>>> A. >>>> 1. Imageplots indicate spots of high-background signal. Should I be >>>> concerned about such noise? (present on imageplot.1.png - odd rows depict >>>> signal, even rows are for background intensities) >>>> 2. Even after applying background correction, I'm left with high >>>> intensities in the mentioned spots. Is it how background correction >>>> should >>>> work, I've chosen the wrong method, or such spots can't be corrected? >>>> (present on imageplot.2.bgcor.png) >>>> 3. MAplots show big differences in within-group expression (technical or >>>> biological replicates), even after normalization. Did I choose a wrong >>>> method? >>>> (as seen in maplot.1.png - first column is for arrays c(1,4), second for >>>> c(5,8); consecutive rows are for: raw data, background-corrected data, >>>> normalized data) >>>> 4. I think the above leads to the MAplot of beetween-group difference >>>> being >>>> skewed upwards for high intensities, am I right? (maplot.2.final.png) >>>> 5. After filtering-out genes with weak SNR and flags (see code), >>>> within-group fold-changes are considerably smaller. How should i decide >>>> which genes should be left for analysis: are there any standards or >>>> should >>>> I try 'trial-and-error plotting' of MAplots with different functions for >>>> weights? (maplot.3.filtered.png - upper plot presents filtered data) >>>> 6. Is there any reason to perform background correction, if it worsens >>>> densities (plotdensities.png - upper plot was before background >>>> correction) >>>> >>>> B. >>>> 7. VSN normalization doesn't take $weights into account. Is there any >>>> convenient way to include them? It also trims genes names, so I set them >>>> as >>>> rownames to be processed. >>>> 8. Is there any convenient way to assign '0 weights' to certain location >>>> on >>>> the microarray (such as previously mentioned high-intensity >>>> spots/scratches)? >>>> 9. I'm unable to plotMA(RG) with spottypes included, because my data is >>>> single-channel. Now I think about transferring four last arrays to be >>>> treated as green channel, but won't it affect the further analysis? >>>> 10. Due to inability to produce imageplots, I needed to set >>>> RG$printer$ngrid.r = 1 . Wouldn't it affect the plots reliability? >>>> >>>> If you consider any of those questions too basic, could you please >>>> provide >>>> me with a reliable online materials for basic microarray analysis? I'm >>>> trying to figure it out by myself. >>>> >>>> Source: http://pastebin.com/ZF2NJc7G >>>> Plots: http://imgur.com/a/I9k5C >>>> Session info: http://pastebin.com/kVRf2NWf >>>> >>>> Thank you for your support, >>>> Marcin Kaminski, medical student >>>> Medical University of Bialystok >>>> -------------- next part -------------- >>>> R version 3.0.2 (2013-09-25) >>>> Platform: x86_64-w64-mingw32/x64 (64-bit) >>>> >>>> locale: >>>> [1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250 >>>> [3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C >>>> [5] LC_TIME=Polish_Poland.1250 >>>> >>>> attached base packages: >>>> [1] parallel stats graphics grDevices utils datasets >>>> [7] methods base >>>> >>>> other attached packages: >>>> [1] vsn_3.30.0 Biobase_2.22.0 BiocGenerics_0.8.0 >>>> [4] limma_3.18.2 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] affy_1.40.0 affyio_1.30.0 BiocInstaller_1.12.0 >>>> [4] grid_3.0.2 lattice_0.20-24 preprocessCore_1.24.0 >>>> [7] tools_3.0.2 zlibbioc_1.8.0 >>>> -------------- next part -------------- >>>> A non-text attachment was scrubbed... >>>> Name: imageplot.1.png >>>> Type: image/png >>>> Size: 226759 bytes >>>> Desc: not available >>>> URL: <https: stat.ethz.ch="" pipermail="" bioconductor="">>>> attachments/20131111/a7edc234/attachment-0006.png> >>>> -------------- next part -------------- >>>> A non-text attachment was scrubbed... >>>> Name: imageplot.2.bgcor.png >>>> Type: image/png >>>> Size: 125957 bytes >>>> Desc: not available >>>> URL: <https: stat.ethz.ch="" pipermail="" bioconductor="">>>> attachments/20131111/a7edc234/attachment-0007.png> >>>> -------------- next part -------------- >>>> A non-text attachment was scrubbed... >>>> Name: maplot.1.png >>>> Type: image/png >>>> Size: 25500 bytes >>>> Desc: not available >>>> URL: <https: stat.ethz.ch="" pipermail="" bioconductor="">>>> attachments/20131111/a7edc234/attachment-0008.png> >>>> -------------- next part -------------- >>>> A non-text attachment was scrubbed... >>>> Name: maplot.2.final.png >>>> Type: image/png >>>> Size: 10400 bytes >>>> Desc: not available >>>> URL: <https: stat.ethz.ch="" pipermail="" bioconductor="">>>> attachments/20131111/a7edc234/attachment-0009.png> >>>> -------------- next part -------------- >>>> A non-text attachment was scrubbed... >>>> Name: maplot.3.filtered.png >>>> Type: image/png >>>> Size: 16349 bytes >>>> Desc: not available >>>> URL: <https: stat.ethz.ch="" pipermail="" bioconductor="">>>> attachments/20131111/a7edc234/attachment-0010.png> >>>> -------------- next part -------------- >>>> A non-text attachment was scrubbed... >>>> Name: plotdensities.png >>>> Type: image/png >>>> Size: 5469 bytes >>>> Desc: not available >>>> URL: <https: stat.ethz.ch="" pipermail="" bioconductor="">>>> attachments/20131111/a7edc234/attachment-0011.png> >>>> -------------- next part -------------- >>>> library(limma) >>>> library(vsn) >>>> >>>> # DATA PREPROCESSING >>>> # Download RAW data from GEO experiment >>>> setInternet2(use=FALSE) >>>> download.file('http://www.ncbi.nlm.nih.gov/geo/download/ >>>> ?acc=GSE34571&format=file', 'GSE34571_RAW.tar', mode='wb') >>>> untar('GSE34571_RAW.tar', exdir='gpr') >>>> setwd('gpr') >>>> >>>> # Set function to assign weights >>>> f <- function(x) {ok.flags <- x$Flags > (-99) >>>> ok.snr <- x[,'SNR 532'] > 3 >>>> as.numeric(ok.snr & ok.flags) >>>> } >>>> >>>> # Read single-channel .gpr files as double-channel data and perform bg >>>> correction >>>> RG <- read.maimages(dir(pattern='*.gpr.gz'), source="genepix", >>>> columns=list(R="F532 Mean",G="F532 >>>> Mean",Rb="B532",Gb="B532"), >>>> wt.fun=f) >>>> RG.b = backgroundCorrect(RG=RG, method='normexp') >>>> >>>> # Delete fake second channel >>>> rownames(RG.b$R) <- RG.b$genes$Name >>>> RG.b$G <- NULL >>>> RG.b$Gb <- NULL >>>> >>>> # Specify contrast and design matrices >>>> design <- model.matrix(~ 0+factor(c(1,1,1,1,2,2,2,2))) >>>> colnames(design) <- c("group1", "group2") >>>> contrast.matrix <- makeContrasts(group1-group2, levels=design) >>>> >>>> # Peform normalisation and stats >>>> norm.vsn <- normalizeVSN(RG.b$R) >>>> fit.vsn <- lmFit(norm.vsn, design) >>>> fit2.vsn <- contrasts.fit(fit.vsn, contrast.matrix) >>>> fit3.vsn <- eBayes(fit2.vsn) >>>> limma::volcanoplot(fit3.vsn) >>>> >>>> >>>> RG$printer$ngrid.r = 1 >>>> imageplot(log2(RG.b$R[,2]), RG$printer) >>>> >>>> # DIAGNOSTIC PLOTS >>>> >>>> # maplot.1 >>>> par(mfrow=c(3,2)) >>>> limma::plotMA(log2(RG$R[,c(1:4)])) >>>> limma::plotMA(log2(RG$R[,c(5:8)])) >>>> >>>> limma::plotMA(log2(RG.b$R[,c(1:4)])) >>>> limma::plotMA(log2(RG.b$R[,c(5:8)])) >>>> >>>> limma::plotMA(norm.vsn[,c(1:4)]) >>>> limma::plotMA(norm.vsn[,c(5:8)]) >>>> >>>> # maplot.2.final >>>> par(mfrow=c(1,1)) >>>> limma::plotMA(fit3.vsn) >>>> >>>> # maplot.3.filtered >>>> par(mfrow=c(2,1)) >>>> limma::plotMA(log2(RG.b$R[((RG.b$weights[,1] == 1) & (RG.b$weights[,2] >>>> == 1)),c(1:2)])) >>>> limma::plotMA(log2(RG.b$R[,c(1:2)])) >>>> >>>> boxplot(RG$R) >>>> boxplot(norm.vsn) >>>> >>>> # plotdensities >>>> plotDensities(RG, singlechannels=c(1:8)) >>>> plotDensities(RG.b, singlechannels=c(1:8)) >>>> >>>> ------------------------------ >>>> >>> >>> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for the >> addressee. >> You must not disclose, forward, print or use it without the permission of >> the sender. >> ______________________________________________________________________ >> > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}
ADD REPLY
0
Entering edit mode
Hello Gordon, Thank you for elucidation on the avereps() usage. I've misinterpreted your words as: - do average multiple replicates of same probe (across probe ID) - do not average multiple probes for same gene (across gene ID) Thank you for your support. Best regards, Marcin On Sat, Nov 16, 2013 at 8:18 AM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > > On Fri, 15 Nov 2013, Marcin Jakub Kamiński wrote: > > Hello Gordon, >> > > thank you for your support. Somehow I didn't pay enough attention and >> thought that 'green.only' parameter shouldn't be used in my case. Now, with >> Elist, the analysis is much easier, thank you! >> >> However, there's a small inconvenience: I cannot simply plotDensities() >> for EList, because it expects $R. There's no derivative function, such as >> plotDensities.EList. >> > > True. > > > I workaround it easily by $R <- $E, but maybe that's done on purpose and >> I shouldn't use these plots in my case? >> > > No, there is no good reason why plotDensities() isn't implemented for > EList objects. We just haven't done so as it the need wasn't urgent. > > > I've followed your suggestion about bg correction and normalization method >> and normexp+quantile gives me the lowest variance between replicates, as >> well as produces definitely more symmetric volcanoplot. I guess I'll stick >> to these methods, thank you. >> What's interesting, from the other hand, microRNA that has been reported >> functional in vivo based on the consecutive research (hsa-miR-150), was >> ranked higher by vsn method (although it's also present among the first 20 >> results from quantile normalization). >> >> Shame to admit, but I didn't perform avereps() before. Now I'm averaging >> across probes, as you've suggested to do in the other mailing list topic. >> > > Actually my intention was to advise against routine use of avereps(): > > https://www.stat.math.ethz.ch/pipermail/bioconductor/2013- > November/056061.html > > Best wishes > Gordon > > > Also, thank you for the paper, I believe it's of a great value. Unluckily, >> my university doesn't provide free access to the most recent RNA articles, >> we have 1yr embargo. >> >> Best regards, >> Marcin >> >> On Tue, Nov 12, 2013 at 11:42 AM, Gordon K Smyth <smyth@wehi.edu.au> >> wrote: >> >> Hi Marcin, >>> >>> Normalizing of miRNA microarrays is quite a bit trickier than with mRNA >>> microarrays, because of the smaller number of probes and the higher >>> proportion that are differentially expressed. See >>> >>> http://rnajournal.cshlp.org/content/early/2013/05/24/rna.035055.112 >>> >>> Best wishes >>> Gordon >>> >>> >>> >>> On Tue, 12 Nov 2013, Gordon K Smyth wrote: >>> >>> Hi Marcin, >>> >>>> >>>> It doesn't need to be quite so complicated. >>>> >>>> limma reads the single-channel data directly, so there is no need to >>>> make >>>> fake double-channel data. See Section 4.5 of the User's Guide, except >>>> that >>>> you will use source="genepix" instead of source="agilent". >>>> >>>> There is no need to specify the read columns explicitly. Just use >>>> source="genepix.custom" if you want to use the custom background >>>> correction >>>> column. >>>> >>>> Section 16.4 gives one case study with Agilent arrays showing the sort >>>> of >>>> background correction and normalization that I would recommend. There >>>> is >>>> no need to lose any annotation information. There is usually no need to >>>> make complicated filters or weights. >>>> >>>> There is no difficulty with setting spot types with single channel data. >>>> It works just the same as for two colour data. >>>> >>>> If you start a new R session with limma only loaded, or else load limma >>>> last, then the need to prefix limma::plotMA will go away as well. >>>> >>>> Hope this helps. >>>> Gordon >>>> >>>> >>>> Date: Mon, 11 Nov 2013 01:39:19 +0100 >>>> >>>>> From: Marcin Jakub Kami?ski <marcinjakubkaminski@gmail.com> >>>>> To: bioconductor@r-project.org >>>>> Subject: [BioC] Single-channel GenePix analysis with Limma (bg >>>>> correction, normalization, weights) >>>>> >>>>> Dear experts, >>>>> I'm trying to analyze raw data from GEO GSE34571 using limma. >>>>> It's 8 single-channel Agilent microarrays scanned with Genepix software >>>>> to >>>>> .gpr files. >>>>> It's a simple design of between-group comparison including two groups, >>>>> 4 >>>>> replicates in each. >>>>> >>>>> With some help from limma guide and previous mailing list entries, my >>>>> pipeline includes: >>>>> - setting function for weights >>>>> - reading .gpr files as fake double-channel data >>>>> - normexp background correction >>>>> - deleting duplicated channels (fake green) >>>>> - vsn normalization >>>>> - lm fitting >>>>> >>>>> There are two major issues (at least I find them difficult): >>>>> A. background correction/data quality >>>>> B. data I loose after performing vsn normalization or because of having >>>>> single-channel data only >>>>> >>>>> A. >>>>> 1. Imageplots indicate spots of high-background signal. Should I be >>>>> concerned about such noise? (present on imageplot.1.png - odd rows >>>>> depict >>>>> signal, even rows are for background intensities) >>>>> 2. Even after applying background correction, I'm left with high >>>>> intensities in the mentioned spots. Is it how background correction >>>>> should >>>>> work, I've chosen the wrong method, or such spots can't be corrected? >>>>> (present on imageplot.2.bgcor.png) >>>>> 3. MAplots show big differences in within-group expression (technical >>>>> or >>>>> biological replicates), even after normalization. Did I choose a wrong >>>>> method? >>>>> (as seen in maplot.1.png - first column is for arrays c(1,4), second >>>>> for >>>>> c(5,8); consecutive rows are for: raw data, background-corrected data, >>>>> normalized data) >>>>> 4. I think the above leads to the MAplot of beetween-group difference >>>>> being >>>>> skewed upwards for high intensities, am I right? (maplot.2.final.png) >>>>> 5. After filtering-out genes with weak SNR and flags (see code), >>>>> within-group fold-changes are considerably smaller. How should i decide >>>>> which genes should be left for analysis: are there any standards or >>>>> should >>>>> I try 'trial-and-error plotting' of MAplots with different functions >>>>> for >>>>> weights? (maplot.3.filtered.png - upper plot presents filtered data) >>>>> 6. Is there any reason to perform background correction, if it worsens >>>>> densities (plotdensities.png - upper plot was before background >>>>> correction) >>>>> >>>>> B. >>>>> 7. VSN normalization doesn't take $weights into account. Is there any >>>>> convenient way to include them? It also trims genes names, so I set >>>>> them >>>>> as >>>>> rownames to be processed. >>>>> 8. Is there any convenient way to assign '0 weights' to certain >>>>> location >>>>> on >>>>> the microarray (such as previously mentioned high-intensity >>>>> spots/scratches)? >>>>> 9. I'm unable to plotMA(RG) with spottypes included, because my data is >>>>> single-channel. Now I think about transferring four last arrays to be >>>>> treated as green channel, but won't it affect the further analysis? >>>>> 10. Due to inability to produce imageplots, I needed to set >>>>> RG$printer$ngrid.r = 1 . Wouldn't it affect the plots reliability? >>>>> >>>>> If you consider any of those questions too basic, could you please >>>>> provide >>>>> me with a reliable online materials for basic microarray analysis? I'm >>>>> trying to figure it out by myself. >>>>> >>>>> Source: http://pastebin.com/ZF2NJc7G >>>>> Plots: http://imgur.com/a/I9k5C >>>>> Session info: http://pastebin.com/kVRf2NWf >>>>> >>>>> Thank you for your support, >>>>> Marcin Kaminski, medical student >>>>> Medical University of Bialystok >>>>> -------------- next part -------------- >>>>> R version 3.0.2 (2013-09-25) >>>>> Platform: x86_64-w64-mingw32/x64 (64-bit) >>>>> >>>>> locale: >>>>> [1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250 >>>>> [3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C >>>>> [5] LC_TIME=Polish_Poland.1250 >>>>> >>>>> attached base packages: >>>>> [1] parallel stats graphics grDevices utils datasets >>>>> [7] methods base >>>>> >>>>> other attached packages: >>>>> [1] vsn_3.30.0 Biobase_2.22.0 BiocGenerics_0.8.0 >>>>> [4] limma_3.18.2 >>>>> >>>>> loaded via a namespace (and not attached): >>>>> [1] affy_1.40.0 affyio_1.30.0 BiocInstaller_1.12.0 >>>>> [4] grid_3.0.2 lattice_0.20-24 preprocessCore_1.24.0 >>>>> [7] tools_3.0.2 zlibbioc_1.8.0 >>>>> -------------- next part -------------- >>>>> A non-text attachment was scrubbed... >>>>> Name: imageplot.1.png >>>>> Type: image/png >>>>> Size: 226759 bytes >>>>> Desc: not available >>>>> URL: <https: stat.ethz.ch="" pipermail="" bioconductor="">>>>> attachments/20131111/a7edc234/attachment-0006.png> >>>>> -------------- next part -------------- >>>>> A non-text attachment was scrubbed... >>>>> Name: imageplot.2.bgcor.png >>>>> Type: image/png >>>>> Size: 125957 bytes >>>>> Desc: not available >>>>> URL: <https: stat.ethz.ch="" pipermail="" bioconductor="">>>>> attachments/20131111/a7edc234/attachment-0007.png> >>>>> -------------- next part -------------- >>>>> A non-text attachment was scrubbed... >>>>> Name: maplot.1.png >>>>> Type: image/png >>>>> Size: 25500 bytes >>>>> Desc: not available >>>>> URL: <https: stat.ethz.ch="" pipermail="" bioconductor="">>>>> attachments/20131111/a7edc234/attachment-0008.png> >>>>> -------------- next part -------------- >>>>> A non-text attachment was scrubbed... >>>>> Name: maplot.2.final.png >>>>> Type: image/png >>>>> Size: 10400 bytes >>>>> Desc: not available >>>>> URL: <https: stat.ethz.ch="" pipermail="" bioconductor="">>>>> attachments/20131111/a7edc234/attachment-0009.png> >>>>> -------------- next part -------------- >>>>> A non-text attachment was scrubbed... >>>>> Name: maplot.3.filtered.png >>>>> Type: image/png >>>>> Size: 16349 bytes >>>>> Desc: not available >>>>> URL: <https: stat.ethz.ch="" pipermail="" bioconductor="">>>>> attachments/20131111/a7edc234/attachment-0010.png> >>>>> -------------- next part -------------- >>>>> A non-text attachment was scrubbed... >>>>> Name: plotdensities.png >>>>> Type: image/png >>>>> Size: 5469 bytes >>>>> Desc: not available >>>>> URL: <https: stat.ethz.ch="" pipermail="" bioconductor="">>>>> attachments/20131111/a7edc234/attachment-0011.png> >>>>> -------------- next part -------------- >>>>> library(limma) >>>>> library(vsn) >>>>> >>>>> # DATA PREPROCESSING >>>>> # Download RAW data from GEO experiment >>>>> setInternet2(use=FALSE) >>>>> download.file('http://www.ncbi.nlm.nih.gov/geo/download/ >>>>> ?acc=GSE34571&format=file', 'GSE34571_RAW.tar', mode='wb') >>>>> untar('GSE34571_RAW.tar', exdir='gpr') >>>>> setwd('gpr') >>>>> >>>>> # Set function to assign weights >>>>> f <- function(x) {ok.flags <- x$Flags > (-99) >>>>> ok.snr <- x[,'SNR 532'] > 3 >>>>> as.numeric(ok.snr & ok.flags) >>>>> } >>>>> >>>>> # Read single-channel .gpr files as double-channel data and perform bg >>>>> correction >>>>> RG <- read.maimages(dir(pattern='*.gpr.gz'), source="genepix", >>>>> columns=list(R="F532 Mean",G="F532 >>>>> Mean",Rb="B532",Gb="B532"), >>>>> wt.fun=f) >>>>> RG.b = backgroundCorrect(RG=RG, method='normexp') >>>>> >>>>> # Delete fake second channel >>>>> rownames(RG.b$R) <- RG.b$genes$Name >>>>> RG.b$G <- NULL >>>>> RG.b$Gb <- NULL >>>>> >>>>> # Specify contrast and design matrices >>>>> design <- model.matrix(~ 0+factor(c(1,1,1,1,2,2,2,2))) >>>>> colnames(design) <- c("group1", "group2") >>>>> contrast.matrix <- makeContrasts(group1-group2, levels=design) >>>>> >>>>> # Peform normalisation and stats >>>>> norm.vsn <- normalizeVSN(RG.b$R) >>>>> fit.vsn <- lmFit(norm.vsn, design) >>>>> fit2.vsn <- contrasts.fit(fit.vsn, contrast.matrix) >>>>> fit3.vsn <- eBayes(fit2.vsn) >>>>> limma::volcanoplot(fit3.vsn) >>>>> >>>>> >>>>> RG$printer$ngrid.r = 1 >>>>> imageplot(log2(RG.b$R[,2]), RG$printer) >>>>> >>>>> # DIAGNOSTIC PLOTS >>>>> >>>>> # maplot.1 >>>>> par(mfrow=c(3,2)) >>>>> limma::plotMA(log2(RG$R[,c(1:4)])) >>>>> limma::plotMA(log2(RG$R[,c(5:8)])) >>>>> >>>>> limma::plotMA(log2(RG.b$R[,c(1:4)])) >>>>> limma::plotMA(log2(RG.b$R[,c(5:8)])) >>>>> >>>>> limma::plotMA(norm.vsn[,c(1:4)]) >>>>> limma::plotMA(norm.vsn[,c(5:8)]) >>>>> >>>>> # maplot.2.final >>>>> par(mfrow=c(1,1)) >>>>> limma::plotMA(fit3.vsn) >>>>> >>>>> # maplot.3.filtered >>>>> par(mfrow=c(2,1)) >>>>> limma::plotMA(log2(RG.b$R[((RG.b$weights[,1] == 1) & (RG.b$weights[,2] >>>>> == 1)),c(1:2)])) >>>>> limma::plotMA(log2(RG.b$R[,c(1:2)])) >>>>> >>>>> boxplot(RG$R) >>>>> boxplot(norm.vsn) >>>>> >>>>> # plotdensities >>>>> plotDensities(RG, singlechannels=c(1:8)) >>>>> plotDensities(RG.b, singlechannels=c(1:8)) >>>>> >>>>> ------------------------------ >>>>> >>>>> >>>> >>>> ______________________________________________________________________ >>> The information in this email is confidential and intended solely for the >>> addressee. >>> You must not disclose, forward, print or use it without the permission of >>> the sender. >>> ______________________________________________________________________ >>> >>> >> > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:10}}
ADD REPLY

Login before adding your answer.

Traffic: 485 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6