Question

QC for Affymetrix miRNA 4.0 arrays: Error from qc/QCReport

federico.comoglio ▴ 100

@federicocomoglio-4524

Last seen 6.8 years ago

Switzerland

Hi,

I'm analyzing >30 Affymetrix miRNA 4.0 microarrays. As the corresponding miRNA 4.0 CDF is not available from BioC, I downloaded it from the Affymetrix website. I then created a CDF environment using make.cdf.env (makecdfenv package) and read in the data as:

rawData <- ReadAffy( )
rawData@cdfName <- 'mirna40'

The returned AffyBatch object seems perfectly fine to me. It has meaningful row.names and runs smoothly through rma (affy).

However, I would like to perform extensive QC on these data before proceeding with differential expression analysis. To this end, I understand that the QCReport (affyQCReport package) and/or qc (simpleaffy package) functions are valuable options. Unfortunately, a call to either function currently raises the error below:

QCReport( rawData, file = 'QC.pdf' )

Error in ans[[i]][, i.probes] : subscript out of bounds

qc( rawData )
Error in ans[[i]][, i.probes] : subscript out of bounds

Debugging suggests that the error is generated by the signalDist function, but I was unable to go further.

I would really appreciate your help on this. Thanks a lot in advance.

Federico

sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] hgu95av2cdf_2.15.0   affydata_1.13.1      affyQCReport_1.44.0
 [4] lattice_0.20-29      BiocInstaller_1.16.1 makecdfenv_1.42.0
 [7] affyio_1.34.0        simpleaffy_2.42.0    gcrma_2.38.0
[10] genefilter_1.48.1    affy_1.44.0          Biobase_2.26.0
[13] BiocGenerics_0.12.1

loaded via a namespace (and not attached):
 [1] affyPLM_1.42.0        annotate_1.44.0       AnnotationDbi_1.28.1
 [4] Biostrings_2.34.1     DBI_0.3.1             GenomeInfoDb_1.2.4
 [7] grid_3.1.2            IRanges_2.0.1         preprocessCore_1.28.0
[10] RColorBrewer_1.1-2    RSQLite_1.0.0         S4Vectors_0.4.0
[13] splines_3.1.2         stats4_3.1.2          survival_2.37-7
[16] tools_3.1.2           XML_3.98-1.1          xtable_1.7-4
[19] XVector_0.6.0         zlibbioc_1.12.0

simpleaffy affy affyQCReport • 1.9k views

ADD COMMENT • link 9.2 years ago federico.comoglio ▴ 100

score 1 · Answer 1 · 2015-02-15

Both the simpleaffy and affyQCReport packages were designed with the original 3'-biased arrays in mind. The miRNA arrays don't have the same content, so both of these packages will tend to fail because the miRNA arrays do not fulfill the expectations that particular probesets will exist on the array.

In addition, the miRNA arrays are difficult to QC because in general most of the transcripts are either expressed at relatively low concentrations or not at all. And there is content on the array for any number of different species (and Affy may or may not re-use the same probes for different species, depending on conservation).

Add in the fact that miRNA transcripts are usually 21-23 nt long while the Affy probes are 25 nt long. Each probe is therefore usually longer than the transcript being measured, and a probeset is made up of the same probe, just distributed across the array, so things like the AffyRNAdeg() plot no longer make sense.

Long story short, you are pretty much on your own with these arrays.
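One check that does not depend on particular probesets existing is a relative log expression (RLE) summary: subtract each probeset's median across arrays, then look at how each array's deviations are distributed. The following is only a minimal sketch in base R, assuming a log2-scale expression matrix with one column per array (for instance the result of exprs() on an rma() output). The function name rle_summary is made up for illustration and is not part of any package.

```r
# Hypothetical helper (not from any package): per-array RLE summary.
# `log_expr` is assumed to be a numeric matrix of log2 expression values,
# rows = probesets, columns = arrays.
rle_summary <- function(log_expr) {
  stopifnot(is.matrix(log_expr), is.numeric(log_expr))

  # Median of each probeset across all arrays
  row_med <- apply(log_expr, 1, median, na.rm = TRUE)

  # Deviation of every value from its probeset median
  dev <- sweep(log_expr, 1, row_med, "-")

  arrays <- if (is.null(colnames(log_expr))) {
    seq_len(ncol(log_expr))
  } else {
    colnames(log_expr)
  }

  data.frame(
    array      = arrays,
    rle_median = unname(apply(dev, 2, median, na.rm = TRUE)),
    rle_iqr    = unname(apply(dev, 2, IQR, na.rm = TRUE)),
    row.names  = NULL
  )
}
```

Arrays whose RLE median sits far from zero, or whose RLE IQR is much larger than the rest, deserve a closer look. This relies on the assumption that most probesets do not change between arrays, which may hold only loosely for miRNA data.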

score 0 · Answer 2 · 2015-02-16

federico.comoglio ▴ 100

Hi Jim,

thank you for your insightful answer. I agree that QC checks such as RNA degradation plots do not make sense for these arrays. However, spike-in controls should still be meaningful. In addition, even a simple boxplot of raw intensity values fails in a call to

boxplot( rawData )

raising the same error as above.

ADD COMMENT • link 9.2 years ago federico.comoglio ▴ 100

I have never used the affy package with the (unsupported) CDF file for these arrays. I use oligo instead, which is much better suited.

> dat1 <- read.celfiles(filenames = samps$File[1:6])
Loading required package: pd.mirna.4.0
Loading required package: RSQLite
Loading required package: DBI
Platform design info loaded.
Reading in : ../CEL/A12258.CEL
Reading in : ../CEL/A10033.CEL
Reading in : ../CEL/Z08140.CEL
Reading in : ../CEL/Z08062.CEL
Reading in : ../CEL/A12263.CEL
Reading in : ../CEL/A10016.CEL
> boxplot(dat1)
Warning message:
'isIdCurrent' is deprecated.
Use 'dbIsValid' instead.
See help("Deprecated")

## the above warnings have to do with changes to the RSQLite package, and will not affect the analysis, and will go away in the next release

> dat1
ExpressionFeatureSet (storageMode: lockedEnvironment)
assayData: 292681 features, 6 samples
  element names: exprs
protocolData
  rowNames: A12258.CEL A10033.CEL ... A10016.CEL (6 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: A12258.CEL A10033.CEL ... A10016.CEL (6 total)
  varLabels: index
  varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.mirna.4.0
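Beyond the boxplot, the raw intensities can be summarised per array without touching any probeset definitions. This is a rough sketch in base R, assuming the intensities have been pulled out as a plain matrix (for example with exprs(dat1)). The function name array_qc_summary and the default cutoff of 3 MADs are arbitrary choices for illustration, not an established rule.

```r
# Hypothetical helper (not from any package): per-array summary of raw
# intensities. `mat` is assumed to be a numeric matrix of positive raw
# intensities with one column per array.
array_qc_summary <- function(mat, n_mad = 3) {
  stopifnot(is.matrix(mat), is.numeric(mat))

  lg  <- log2(mat)
  med <- unname(apply(lg, 2, median, na.rm = TRUE))
  iqr <- unname(apply(lg, 2, IQR, na.rm = TRUE))

  # Flag arrays whose median is more than n_mad MADs away from the
  # median of all array medians; flag nothing if the MAD is zero.
  centre <- median(med)
  spread <- mad(med)
  flagged <- if (spread > 0) {
    abs(med - centre) > n_mad * spread
  } else {
    rep(FALSE, length(med))
  }

  arrays <- if (is.null(colnames(mat))) seq_len(ncol(mat)) else colnames(mat)

  data.frame(
    array       = arrays,
    median_log2 = med,
    iqr_log2    = iqr,
    flagged     = flagged,
    row.names   = NULL
  )
}
```

With the object above, a call such as array_qc_summary(exprs(dat1)) should return one row per CEL file. A flagged array is only a prompt to inspect that array further, not evidence that it is bad.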

ADD REPLY • link 9.2 years ago James W. MacDonald 65k