The number of features in an ExpressionSet for Affymetrix miRNA 4.0 data preprocessed with the oligo package
enthelesia • 0
@enthelesia-8259
Korea, Republic Of

I am analyzing Affymetrix miRNA 4.0 data for the first time. Using the oligo package, I ran the following script:

library(oligo)
# Read the CEL files in the working directory
rawData <- read.celfiles(list.celfiles())
e <- rma(rawData)

I got the following results:

ExpressionSet (storageMode: lockedEnvironment)
assayData: 36353 features, 14 samples
  element names: exprs
protocolData
  rowNames: 150319_1_S14_25318_(miRNA-4_0).CEL
    150319_10_S12_12718_(miRNA-4_0).CEL ...
    150319_9_S11_41323_(miRNA-4_0).CEL (14 total)
  varLabels: exprs dates
phenoData
  rowNames: 150319_1_S14_25318_(miRNA-4_0).CEL
    150319_10_S12_12718_(miRNA-4_0).CEL ...
    150319_9_S11_41323_(miRNA-4_0).CEL (14 total)
  varLabels: index
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.mirna.4.0

As far as I know, the miRNA 4.0 array has 36249 probesets, but the ExpressionSet generated after background correction, normalization, and summarization has 36353 features. When I tried the affy package, the number of features was the same as the number of probesets. The 104 additional features were:

[1] "AFFX-BioB-3_at"      "AFFX-BioB-5_at"      "AFFX-BioB-M_at"
[4] "AFFX-BioC-3_at"      "AFFX-BioC-5_at"      "AFFX-BioDn-3_at"
[7] "AFFX-BioDn-5_at"     "AFFX-BkGr-GC03_st"   "AFFX-BkGr-GC04_st"
[10] "AFFX-BkGr-GC05_st"   "AFFX-BkGr-GC06_st"   "AFFX-BkGr-GC07_st"
[13] "AFFX-BkGr-GC08_st"   "AFFX-BkGr-GC09_st"   "AFFX-BkGr-GC10_st"
[16] "AFFX-BkGr-GC11_st"   "AFFX-BkGr-GC12_st"   "AFFX-BkGr-GC13_st"
[19] "AFFX-BkGr-GC14_st"   "AFFX-BkGr-GC15_st"   "AFFX-BkGr-GC16_st"
[22] "AFFX-BkGr-GC17_st"   "AFFX-BkGr-GC18_st"   "AFFX-BkGr-GC19_st"
[25] "AFFX-BkGr-GC20_st"   "AFFX-BkGr-GC21_st"   "AFFX-BkGr-GC22_st"
[28] "AFFX-BkGr-GC23_st"   "AFFX-BkGr-GC24_st"   "AFFX-BkGr-GC25_st"
[31] "AFFX-BkGr17-GC03_st" "AFFX-BkGr17-GC04_st" "AFFX-BkGr17-GC05_st"
[34] "AFFX-BkGr17-GC06_st" "AFFX-BkGr17-GC07_st" "AFFX-BkGr17-GC08_st"
[37] "AFFX-BkGr17-GC09_st" "AFFX-BkGr17-GC10_st" "AFFX-BkGr17-GC11_st"
[40] "AFFX-BkGr17-GC12_st" "AFFX-BkGr17-GC13_st" "AFFX-BkGr17-GC14_st"
[43] "AFFX-BkGr17-GC15_st" "AFFX-BkGr17-GC16_st" "AFFX-BkGr17-GC17_st"
[46] "AFFX-BkGr19-GC03_st" "AFFX-BkGr19-GC04_st" "AFFX-BkGr19-GC05_st"
[49] "AFFX-BkGr19-GC06_st" "AFFX-BkGr19-GC07_st" "AFFX-BkGr19-GC08_st"
[52] "AFFX-BkGr19-GC09_st" "AFFX-BkGr19-GC10_st" "AFFX-BkGr19-GC11_st"
[55] "AFFX-BkGr19-GC12_st" "AFFX-BkGr19-GC13_st" "AFFX-BkGr19-GC14_st"
[58] "AFFX-BkGr19-GC15_st" "AFFX-BkGr19-GC16_st" "AFFX-BkGr19-GC17_st"
[61] "AFFX-BkGr19-GC18_st" "AFFX-BkGr19-GC19_st" "AFFX-BkGr21-GC03_st"
[64] "AFFX-BkGr21-GC04_st" "AFFX-BkGr21-GC05_st" "AFFX-BkGr21-GC06_st"
[67] "AFFX-BkGr21-GC07_st" "AFFX-BkGr21-GC08_st" "AFFX-BkGr21-GC09_st"
[70] "AFFX-BkGr21-GC10_st" "AFFX-BkGr21-GC11_st" "AFFX-BkGr21-GC12_st"
[73] "AFFX-BkGr21-GC13_st" "AFFX-BkGr21-GC14_st" "AFFX-BkGr21-GC15_st"
[76] "AFFX-BkGr21-GC16_st" "AFFX-BkGr21-GC17_st" "AFFX-BkGr21-GC18_st"
[79] "AFFX-BkGr21-GC19_st" "AFFX-BkGr21-GC20_st" "AFFX-BkGr21-GC21_st"
[82] "AFFX-BkGr23-GC03_st" "AFFX-BkGr23-GC04_st" "AFFX-BkGr23-GC05_st"
[85] "AFFX-BkGr23-GC06_st" "AFFX-BkGr23-GC07_st" "AFFX-BkGr23-GC08_st"
[88] "AFFX-BkGr23-GC09_st" "AFFX-BkGr23-GC10_st" "AFFX-BkGr23-GC11_st"
[91] "AFFX-BkGr23-GC12_st" "AFFX-BkGr23-GC13_st" "AFFX-BkGr23-GC14_st"
[94] "AFFX-BkGr23-GC15_st" "AFFX-BkGr23-GC16_st" "AFFX-BkGr23-GC17_st"
[97] "AFFX-BkGr23-GC18_st" "AFFX-BkGr23-GC19_st" "AFFX-BkGr23-GC20_st"
[100] "AFFX-BkGr23-GC21_st" "AFFX-BkGr23-GC22_st" "AFFX-BkGr23-GC23_st"
[103] "AFFX-CreX-3_at"      "AFFX-CreX-5_at"   
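A minimal sketch of how the extra features can be isolated, written in base R so it runs without the array data. The character vectors are toy stand-ins for `featureNames()` of the oligo and affy ExpressionSets; `probeset_A` and `probeset_B` are hypothetical IDs, and only the `AFFX-` names are taken from the list above.

```r
# Toy stand-ins for featureNames() of the two ExpressionSets.
# "probeset_A"/"probeset_B" are hypothetical IDs, not real miRNA 4.0 names.
oligo_ids <- c("probeset_A", "probeset_B", "AFFX-BioB-3_at",
               "AFFX-BkGr-GC03_st", "AFFX-CreX-5_at")
affy_ids  <- c("probeset_A", "probeset_B")

# Features reported by oligo but absent from the affy summaries
extra <- setdiff(oligo_ids, affy_ids)

# Check that every extra feature carries Affymetrix's "AFFX-" control prefix
is_control <- startsWith(extra, "AFFX-")

# Group the controls by suffix: "_at" (BioB/BioC/BioDn/CreX) vs "_st" (BkGr)
by_suffix <- table(sub("^.*_", "", extra))
print(extra)
print(by_suffix)
```

With the real objects, the analogous call would presumably be `setdiff(featureNames(e), featureNames(eAffy))`, where `eAffy` is a hypothetical name for the ExpressionSet produced by affy.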

@james-w-macdonald-5106
United States

When you say 'affy package' do you mean the Bioconductor affy package? If so, you don't want to use that package to analyze these data. The problem with this array is that a given probe may be used multiple times, in various probesets.

For example, let's say there are 30 miR-155 probesets on the array (a made-up number, BTW), for 30 different species. Let's also assume that miR-155 is highly conserved and has an identical sequence in all of those species. To save space, Affy will tile down just 11 or so probes (rather than 30 × 11 = 330) and then map all the different species to those 11 probes. So cel-miR-155 will use exactly the same probes as mmu-miR-155 and hsa-miR-155.
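The arithmetic of this made-up example can be sketched as a probe-to-probeset mapping table in base R. The species labels are hypothetical; the point is that 330 probe-to-probeset assignments are backed by only 11 physical probes.

```r
# Made-up example from the text: 30 species-specific miR-155 probesets that
# all share the same 11 physical probes (species labels are hypothetical)
n_species <- 30
n_probes  <- 11
mapping <- expand.grid(
  probe    = seq_len(n_probes),
  probeset = paste0("species", seq_len(n_species), "-miR-155"),
  stringsAsFactors = FALSE
)

n_assignments <- nrow(mapping)                 # 30 * 11 = 330 rows
n_physical    <- length(unique(mapping$probe)) # only 11 distinct probes
per_probeset  <- table(mapping$probeset)       # each probeset still lists 11 probes
```

A pipeline that accepts this many-to-many mapping keeps all 330 rows; one that assumes each probe belongs to a single probeset cannot represent it.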

The oligo package understands this and has no problem with probes being reused in different probesets. So if we look at the number of probes in each probeset for the pd.mirna.4.0 package, we get this:

> library(pd.mirna.4.0)
> con <- db(pd.mirna.4.0)
> pd.data <- dbGetQuery(con, "select * from pmfeature inner join featureSet using(fsetid);")
> pd.lst <- split(pd.data, pd.data$man_fsetid)
> table(sapply(pd.lst, nrow))

    8     9    10    11    25    40    50    67    73    88    89    90    91
   67 30483    78  5626     1    10     2     1     1     1     2     1     1
   92    94
    1    78
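As a consistency check, the table can be totalled in base R with the numbers copied from the output. The total, 36353, equals the feature count of the oligo ExpressionSet in the question, and subtracting the 36249 features reported by affy leaves 104, the same number as the AFFX control probesets listed there.

```r
# Probeset-size table from the pd.mirna.4.0 query:
# sizes = probes per probeset, counts = number of probesets of that size
sizes  <- c(8, 9, 10, 11, 25, 40, 50, 67, 73, 88, 89, 90, 91, 92, 94)
counts <- c(67, 30483, 78, 5626, 1, 10, 2, 1, 1, 1, 2, 1, 1, 1, 78)

n_probesets <- sum(counts)          # one ExpressionSet feature per probeset
n_extra     <- n_probesets - 36249  # difference from the affy feature count
n_rows      <- sum(sizes * counts)  # probe-to-probeset rows in the split data

cat(n_probesets, n_extra, n_rows, "\n")
```

`n_rows` should match `nrow(pd.data)`, assuming no row of the query lacks a `man_fsetid` (rows with a missing value would be dropped by `split()`).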

So the smallest probeset has 8 probes, the vast majority have 9 or 11, and a few have many more (up to 94). But let's look at what you would get from the cdfenv built from the CDF file that Affy supplies:

> library(makecdfenv)
> mirna40cdf <- make.cdf.env("miRNA-4_0-st-v1.cdf")
Creating CDF environment
> zzz <- as.list(mirna40cdf)
> table(sapply(zzz, nrow))

    1     2     3     4     5     6     7     8     9    10    11
  340   177   106    59    62    58    63   102 19735   145  4218

Whoops! That's a bit different, and it's because the makecdfenv package doesn't expect ANY probe to be reused in different probesets: when it reads the CDF file and sees a reused probe, it simply ignores it. So any Affy array that uses probes in more than one probeset will not work with the makecdfenv/affy pipeline; you have to use the pdInfoBuilder/oligo pipeline instead.
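The effect of ignoring reused probes can be sketched with a toy CDF-like table in base R. The probesets and probe IDs are hypothetical, and which occurrence of a reused probe makecdfenv actually keeps is an assumption here (the first one). The point is only that probesets sharing probes lose assignments, which would explain the probesets with fewer than 8 probes in the makecdfenv table.

```r
# Toy CDF-like listing (hypothetical probesets and probe IDs):
# probes 1-3 are shared by ps_A, ps_B and ps_C
cdf <- data.frame(
  probeset = c(rep("ps_A", 4), rep("ps_B", 4), rep("ps_C", 3)),
  probe    = c(1, 2, 3, 4,  1, 2, 3, 5,  1, 6, 7)
)

# pdInfoBuilder/oligo behaviour: every probe-to-probeset assignment is kept
oligo_style <- table(cdf$probeset)

# makecdfenv behaviour as described above, ASSUMING the first occurrence of a
# probe is the one retained: any later reuse of that probe is dropped
makecdfenv_style <- table(cdf$probeset[!duplicated(cdf$probe)])

print(oligo_style)       # ps_A 4, ps_B 4, ps_C 3
print(makecdfenv_style)  # ps_A 4, ps_B 1, ps_C 2
```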

Also, please note that the files used to generate the pd.mirna.4.0 package do not contain information for the background probes that you show are missing. In other words, the pd.mirna.4.0 package uses the data that Affy supplies to map probes to probesets. Affy doesn't give any information about those probes in the files used by pdInfoBuilder, so it's not a matter of us ignoring, removing, or deleting data, but of Affy not providing that information in the files we use to build the package. If it's critical for your purposes to have these particular probes in your data set, you might try the Affy miRNA QC tool, which may return data for them.