Question

Reading and normalizing two-color Agilent microarrays

0

Entering edit mode

dr ▴ 10

@dr-9473

Last seen 12 months ago

United States

Hi,

I have a mouse cell-line time-course gene expression data measured on two-color Agilent microarrays.

At time = 0, RNA was purified from 4 cell-lines. Following that, all cell lines were treated with a drug, and then RNA was purified over 10 time points. The reference channel for each cell line (Cy5 - red) is the time = 0 respective sample.

I'm trying to use limma for analyzing the data.

I've followed these steps:

1. Created a targets data.frame (targets.df), in which a FileName column species the microarray file path, a Cy5 column specifies the time = 0 sample names , a Cy3 column specifies all other time > 0 sample names, and a Date column which specifies the scan date.

2. I then read the microarrays using this command:

array.list <- limma::read.maimages(targets.df, source = "agilent.median")

3. I then do the background correction using this command:

bg.corrected.array.list <- limma::backgroundCorrect(array.list, method = "normexp", offset = 16)

4. I then do within array normalization using this command:

within.normalized.array.list <- limma::normalizeWithinArrays(bg.corrected.array.list, method = "loess")

5. I then do between array normalization using this command:

between.normalized.array.list <- limma::normalizeBetweenArrays(within.normalized.array.list, method = "Aquantile")

6. I then read my GEO format annotations for the microarray (using simple read.table), for adding the genes annotation data.frame to the between.normalized.array.list. During this step I filter out the control probes (annotated as CONTROL_TYPE = TRUE in my annotations file).

7. I then collapse the probes to genes using WGCNA::collapseRows. For this I set the rownames of between.normalized.array.list$M and between.normalized.array.list$A to between.normalized.array.list$genes$ID (probe ID), and run:

collapsed.probes.list <- WGCNA::collapseRows(datET = between.normalized.array.list$M, rowGroup = between.normalized.array.list$genes$gene_id, rowID = between.normalized.array.list$genes$ID)

6. I then override between.normalized.array.list$M, between.normalized.array.list$A, and between.normalized.array.list$genes, with the collapsed.probes.list output :

between.normalized.array.list$M <- between.normalized.array.list$M[which(collapsed.probes.list$selectedRow),] between.normalized.array.list$A <- between.normalized.array.list$A[which(collapsed.probes.list$selectedRow),] between.normalized.array.list$genes <- between.normalized.array.list$genes[which(collapsed.probes.list$selectedRow),] %>% dplyr::select(gene_id, symbol))) rownames(between.normalized.array.list$M) <- NULL rownames(between.normalized.array.list$A) <- NULL

I then looked at a specific gene of interest - plotting its between.normalized.array.list$M values vs. time and it looks the opposite from what I expected - the expression goes up with time rather than down.

Is because I swapped the dyes in my microarrays and the expected reference channel should be Cy3 rather than Cy5?

As far as I understand this cannot be corrected by limma::read.maimages. Any idea how to deal with this?

Thanks a lot,

> sessionInfo() R version 3.4.3 (2017-11-30) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)

Matrix products: default BLAS: /sw/R/R-3.4.3-install/lib64/R/lib/libRblas.so LAPACK: /sw/R/R-3.4.3-install/lib64/R/lib/libRlapack.so

locale: [1] C

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] bindrcpp_0.2.2 dplyr_0.7.5

loaded via a namespace (and not attached): [1] bitops_1.0-6 matrixStats_0.51.0 [3] bit64_0.9-7 RColorBrewer_1.1-2 [5] doParallel_1.0.11 httr_1.3.1 [7] prabclus_2.2-6 GenomeInfoDb_1.13.4 [9] dynamicTreeCut_1.63-1 backports_1.1.1 [11] tools_3.4.3 R6_2.2.2 [13] rpart_4.1-13 Hmisc_4.0-3 [15] DBI_0.7 lazyeval_0.2.1 [17] BiocGenerics_0.23.4 colorspace_1.2-7 [19] trimcluster_0.1-2 nnet_7.3-12 [21] tidyselect_0.2.4 gridExtra_2.3 [23] parsingUtils_0.1.0 preprocessCore_1.40.0 [25] bit_1.1-12 compiler_3.4.3 [27] WGCNA_1.51 Biobase_2.38.0 [29] htmlTable_1.9 DelayedArray_0.4.1 [31] plotly_4.7.1.9000 rtracklayer_1.37.3 [33] checkmate_1.8.5 diptest_0.75-7 [35] scales_0.5.0.9000 DEoptimR_1.0-8 [37] mvtnorm_1.0-6 robustbase_0.92-8 [39] stringr_1.3.0 digest_0.6.15 [41] Rsamtools_1.26.1 foreign_0.8-67 [43] XVector_0.17.0 base64enc_0.1-3 [45] pkgconfig_2.0.1 htmltools_0.3.6 [47] limma_3.34.9 htmlwidgets_1.2 [49] rlang_0.2.1.9000 impute_1.52.0 [51] RSQLite_2.0 VGAM_1.0-5 [53] bindr_0.1.1 jsonlite_1.5 [55] mclust_5.4 BiocParallel_1.10.1 [57] acepack_1.4.1 dendextend_1.5.2 [59] analysisUtils_0.0.0.9000 RCurl_1.95-4.8 [61] magrittr_1.5 modeltools_0.2-21 [63] GO.db_3.5.0 GenomeInfoDbData_0.99.1 [65] Formula_1.2-2 coneproj_1.12 [67] Matrix_1.2-12 Rcpp_0.12.18 [69] munsell_0.4.3 S4Vectors_0.15.5 [71] viridis_0.5.1 stringi_1.2.2 [73] whisker_0.3-2 yaml_2.1.19 [75] MASS_7.3-50 SummarizedExperiment_1.8.0 [77] zlibbioc_1.20.0 flexmix_2.3-13 [79] plyr_1.8.4 grid_3.4.3 [81] blob_1.1.0 parallel_3.4.3 [83] lattice_0.20-34 Biostrings_2.45.3 [85] splines_3.4.3 knitr_1.17 [87] pillar_1.1.0 fastcluster_1.1.22 [89] GenomicRanges_1.29.15 fpc_2.1-10 [91] codetools_0.2-15 stats4_3.4.3 [93] XML_3.98-1.9 glue_1.2.0 [95] latticeExtra_0.6-28 data.table_1.11.4 [97] foreach_1.4.4 gtable_0.2.0 [99] purrr_0.2.4 tidyr_0.8.1 [101] kernlab_0.9-25 assertthat_0.2.0 [103] ggplot2_3.0.0.9000 class_7.3-14 [105] survival_2.40-1 viridisLite_0.3.0 [107] tibble_1.4.2 iterators_1.0.10 [109] GenomicAlignments_1.10.0 AnnotationDbi_1.40.0 [111] memoise_1.1.0 IRanges_2.11.19 [113] cluster_2.0.6

limma agilent microarrays • 1.3k views

ADD COMMENT • link updated 5.7 years ago by Peter Langfelder ★ 3.0k • written 5.7 years ago by dr ▴ 10

1

Entering edit mode

Peter Langfelder ★ 3.0k

@peter-langfelder-4469

Last seen 22 days ago

United States

Regarding WGCNA::collapseRows for two-color data, Gordon is correct that the default setting (method = "MaxMean") is a poor choice for two-color data, or, for that matter, any data that represent deviation from a reference sample. Using argument method="Average" would be a better choice, but Gordon's solution is best (basically does what collapseRows would do on single-channel data - pick the probe with the maximum mean in the A channel).

ADD COMMENT • link 5.7 years ago Peter Langfelder ★ 3.0k

score 4 · Accepted Answer · 2018-08-22

limma defines M-values to be Cy5-Cy3 so, naturally, if you put your reference samples on Cy5 instead of Cy3 then all the M-values will be minus what they would have been had you used Cy3 for the reference samples.

I don't see any reason why that should be a problem. Why does it cause you a problem that the M-values decrease over time for some genes instead of going up? You'll still get the same results in the end. If it does bother you, then just multiple the M matrix by -1.

By the way, WGCNA::collapseRows is not a good way to collapse two-color data down to genes because it's designed for single-channel data. A much better way is:

x <- between.normalized.array.list
Amean <- rowMeans(x$A)
o <- order(Amean, decreasing=TRUE)
x <- x[o,]
dup <- duplicated(x$genes$gene_id)
x <- x[!dup]