Question: Reading and normalizing two-color Agilent microarrays
0
gravatar for dr
15 months ago by
dr10
dr10 wrote:

Hi,

 

I have a mouse cell-line time-course gene expression data measured on two-color Agilent microarrays.

At time = 0, RNA was purified from 4 cell-lines. Following that, all cell lines were treated with a drug, and then RNA was purified over 10 time points. The reference channel for each cell line (Cy5 - red) is the time = 0 respective sample.

 

I'm trying to use limma for analyzing the data.

 

I've followed these steps:

1. Created a targets data.frame (targets.df), in which a FileName column species the microarray file path, a Cy5 column specifies the time = 0 sample names , a Cy3 column specifies all other time > 0 sample names, and a Date column which specifies the scan date.

2. I then read the microarrays using this command:

array.list <- limma::read.maimages(targets.df, source = "agilent.median")

3. I then do the background correction using this command:

bg.corrected.array.list <- limma::backgroundCorrect(array.list, method = "normexp", offset = 16)

4. I then do within array normalization using this command:

within.normalized.array.list <- limma::normalizeWithinArrays(bg.corrected.array.list, method = "loess")

5.  I then do between array normalization using this command:

between.normalized.array.list <- limma::normalizeBetweenArrays(within.normalized.array.list, method = "Aquantile")

6. I then read my GEO format annotations for the microarray (using simple read.table), for adding the genes annotation data.frame to the between.normalized.array.list. During this step I filter out the control probes (annotated as CONTROL_TYPE = TRUE in my annotations file).

7. I then collapse the probes to genes using WGCNA::collapseRows. For this I set the rownames of between.normalized.array.list$M and between.normalized.array.list$A to between.normalized.array.list$genes$ID (probe ID), and run:

collapsed.probes.list <- WGCNA::collapseRows(datET = between.normalized.array.list$M, rowGroup = between.normalized.array.list$genes$gene_id, rowID = between.normalized.array.list$genes$ID)

 

6. I then override between.normalized.array.list$Mbetween.normalized.array.list$A, and between.normalized.array.list$genes, with the collapsed.probes.list output :

  between.normalized.array.list$M <- between.normalized.array.list$M[which(collapsed.probes.list$selectedRow),]
  between.normalized.array.list$A <- between.normalized.array.list$A[which(collapsed.probes.list$selectedRow),]
  between.normalized.array.list$genes <- between.normalized.array.list$genes[which(collapsed.probes.list$selectedRow),] %>%
    dplyr::select(gene_id, symbol)))
  rownames(between.normalized.array.list$M) <- NULL
  rownames(between.normalized.array.list$A) <- NULL

 

 

I then looked at a specific gene of interest - plotting its between.normalized.array.list$M values vs. time and it looks the opposite from what I expected - the expression goes up with time rather than down.

Is because I swapped the dyes in my microarrays and the expected reference channel should be Cy3 rather than Cy5?

 

As far as I understand this cannot be corrected by limma::read.maimages. Any idea how to deal with this?

 

 

Thanks a lot,

 

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /sw/R/R-3.4.3-install/lib64/R/lib/libRblas.so
LAPACK: /sw/R/R-3.4.3-install/lib64/R/lib/libRlapack.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] bindrcpp_0.2.2 dplyr_0.7.5

loaded via a namespace (and not attached):
  [1] bitops_1.0-6               matrixStats_0.51.0
  [3] bit64_0.9-7                RColorBrewer_1.1-2
  [5] doParallel_1.0.11          httr_1.3.1
  [7] prabclus_2.2-6             GenomeInfoDb_1.13.4
  [9] dynamicTreeCut_1.63-1      backports_1.1.1
 [11] tools_3.4.3                R6_2.2.2
 [13] rpart_4.1-13               Hmisc_4.0-3
 [15] DBI_0.7                    lazyeval_0.2.1
 [17] BiocGenerics_0.23.4        colorspace_1.2-7
 [19] trimcluster_0.1-2          nnet_7.3-12
 [21] tidyselect_0.2.4           gridExtra_2.3
 [23] parsingUtils_0.1.0         preprocessCore_1.40.0
 [25] bit_1.1-12                 compiler_3.4.3
 [27] WGCNA_1.51                 Biobase_2.38.0
 [29] htmlTable_1.9              DelayedArray_0.4.1
 [31] plotly_4.7.1.9000          rtracklayer_1.37.3
 [33] checkmate_1.8.5            diptest_0.75-7
 [35] scales_0.5.0.9000          DEoptimR_1.0-8
 [37] mvtnorm_1.0-6              robustbase_0.92-8
 [39] stringr_1.3.0              digest_0.6.15
 [41] Rsamtools_1.26.1           foreign_0.8-67
 [43] XVector_0.17.0             base64enc_0.1-3
 [45] pkgconfig_2.0.1            htmltools_0.3.6
 [47] limma_3.34.9               htmlwidgets_1.2
 [49] rlang_0.2.1.9000           impute_1.52.0
 [51] RSQLite_2.0                VGAM_1.0-5
 [53] bindr_0.1.1                jsonlite_1.5
 [55] mclust_5.4                 BiocParallel_1.10.1
 [57] acepack_1.4.1              dendextend_1.5.2
 [59] analysisUtils_0.0.0.9000   RCurl_1.95-4.8
 [61] magrittr_1.5               modeltools_0.2-21
 [63] GO.db_3.5.0                GenomeInfoDbData_0.99.1
 [65] Formula_1.2-2              coneproj_1.12
 [67] Matrix_1.2-12              Rcpp_0.12.18
 [69] munsell_0.4.3              S4Vectors_0.15.5
 [71] viridis_0.5.1              stringi_1.2.2
 [73] whisker_0.3-2              yaml_2.1.19
 [75] MASS_7.3-50                SummarizedExperiment_1.8.0
 [77] zlibbioc_1.20.0            flexmix_2.3-13
 [79] plyr_1.8.4                 grid_3.4.3
 [81] blob_1.1.0                 parallel_3.4.3
 [83] lattice_0.20-34            Biostrings_2.45.3
 [85] splines_3.4.3              knitr_1.17
 [87] pillar_1.1.0               fastcluster_1.1.22
 [89] GenomicRanges_1.29.15      fpc_2.1-10
 [91] codetools_0.2-15           stats4_3.4.3
 [93] XML_3.98-1.9               glue_1.2.0
 [95] latticeExtra_0.6-28        data.table_1.11.4
 [97] foreach_1.4.4              gtable_0.2.0
 [99] purrr_0.2.4                tidyr_0.8.1
[101] kernlab_0.9-25             assertthat_0.2.0
[103] ggplot2_3.0.0.9000         class_7.3-14
[105] survival_2.40-1            viridisLite_0.3.0
[107] tibble_1.4.2               iterators_1.0.10
[109] GenomicAlignments_1.10.0   AnnotationDbi_1.40.0
[111] memoise_1.1.0              IRanges_2.11.19
[113] cluster_2.0.6

 

 

limma agilent microarrays • 331 views
ADD COMMENTlink modified 15 months ago by Peter Langfelder2.3k • written 15 months ago by dr10
Answer: Reading and normalizing two-color Agilent microarrays
4
gravatar for Gordon Smyth
15 months ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

limma defines M-values to be Cy5-Cy3 so, naturally, if you put your reference samples on Cy5 instead of Cy3 then all the M-values will be minus what they would have been had you used Cy3 for the reference samples.

I don't see any reason why that should be a problem. Why does it cause you a problem that the M-values decrease over time for some genes instead of going up? You'll still get the same results in the end. If it does bother you, then just multiple the M matrix by -1.

By the way, WGCNA::collapseRows is not a good way to collapse two-color data down to genes because it's designed for single-channel data. A much better way is:

x <- between.normalized.array.list
Amean <- rowMeans(x$A)
o <- order(Amean, decreasing=TRUE)
x <- x[o,]
dup <- duplicated(x$genes$gene_id)
x <- x[!dup]

 

ADD COMMENTlink modified 15 months ago • written 15 months ago by Gordon Smyth39k
Answer: Reading and normalizing two-color Agilent microarrays
1
gravatar for Peter Langfelder
15 months ago by
United States
Peter Langfelder2.3k wrote:

Regarding WGCNA::collapseRows for two-color data, Gordon is correct that the default setting (method = "MaxMean") is a poor choice for two-color data, or, for that matter, any data that represent deviation from a reference sample. Using argument method="Average" would be a better choice, but Gordon's solution is best (basically does what collapseRows would do on single-channel data - pick the probe with the maximum mean in the A channel).

ADD COMMENTlink modified 15 months ago • written 15 months ago by Peter Langfelder2.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 334 users visited in the last hour