Question: The number of features of Expression Set for Affymetrix miRNA 4.0 data preprocessed by oligo package
4.2 years ago by
enthelesia0
Korea, Republic Of
enthelesia0 wrote:

I am analyzing Affymetrix miRNA 4.0 data for the first time. Using the oligo package, I ran the following script.

library(oligo)
rawData<-read.celfiles(list.celfiles())
e<-rma(rawData)

I got the following results: 

ExpressionSet (storageMode: lockedEnvironment)
assayData: 36353 features, 14 samples 
  element names: exprs 
protocolData
  rowNames: 150319_1_S14_25318_(miRNA-4_0).CEL
    150319_10_S12_12718_(miRNA-4_0).CEL ...
    150319_9_S11_41323_(miRNA-4_0).CEL (14 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: 150319_1_S14_25318_(miRNA-4_0).CEL
    150319_10_S12_12718_(miRNA-4_0).CEL ...
    150319_9_S11_41323_(miRNA-4_0).CEL (14 total)
  varLabels: index
  varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.mirna.4.0 

As far as I know, the number of probes on miRNA 4.0 is 36249, but the expression set generated after background correction, normalization, and summarization has 36353 features. When I tried the affy package, the number of features was the same as the number of probes. The 104 additional features were:

  [1] "AFFX-BioB-3_at"      "AFFX-BioB-5_at"      "AFFX-BioB-M_at"     
  [4] "AFFX-BioC-3_at"      "AFFX-BioC-5_at"      "AFFX-BioDn-3_at"    
  [7] "AFFX-BioDn-5_at"     "AFFX-BkGr-GC03_st"   "AFFX-BkGr-GC04_st"  
 [10] "AFFX-BkGr-GC05_st"   "AFFX-BkGr-GC06_st"   "AFFX-BkGr-GC07_st"  
 [13] "AFFX-BkGr-GC08_st"   "AFFX-BkGr-GC09_st"   "AFFX-BkGr-GC10_st"  
 [16] "AFFX-BkGr-GC11_st"   "AFFX-BkGr-GC12_st"   "AFFX-BkGr-GC13_st"  
 [19] "AFFX-BkGr-GC14_st"   "AFFX-BkGr-GC15_st"   "AFFX-BkGr-GC16_st"  
 [22] "AFFX-BkGr-GC17_st"   "AFFX-BkGr-GC18_st"   "AFFX-BkGr-GC19_st"  
 [25] "AFFX-BkGr-GC20_st"   "AFFX-BkGr-GC21_st"   "AFFX-BkGr-GC22_st"  
 [28] "AFFX-BkGr-GC23_st"   "AFFX-BkGr-GC24_st"   "AFFX-BkGr-GC25_st"  
 [31] "AFFX-BkGr17-GC03_st" "AFFX-BkGr17-GC04_st" "AFFX-BkGr17-GC05_st"
 [34] "AFFX-BkGr17-GC06_st" "AFFX-BkGr17-GC07_st" "AFFX-BkGr17-GC08_st"
 [37] "AFFX-BkGr17-GC09_st" "AFFX-BkGr17-GC10_st" "AFFX-BkGr17-GC11_st"
 [40] "AFFX-BkGr17-GC12_st" "AFFX-BkGr17-GC13_st" "AFFX-BkGr17-GC14_st"
 [43] "AFFX-BkGr17-GC15_st" "AFFX-BkGr17-GC16_st" "AFFX-BkGr17-GC17_st"
 [46] "AFFX-BkGr19-GC03_st" "AFFX-BkGr19-GC04_st" "AFFX-BkGr19-GC05_st"
 [49] "AFFX-BkGr19-GC06_st" "AFFX-BkGr19-GC07_st" "AFFX-BkGr19-GC08_st"
 [52] "AFFX-BkGr19-GC09_st" "AFFX-BkGr19-GC10_st" "AFFX-BkGr19-GC11_st"
 [55] "AFFX-BkGr19-GC12_st" "AFFX-BkGr19-GC13_st" "AFFX-BkGr19-GC14_st"
 [58] "AFFX-BkGr19-GC15_st" "AFFX-BkGr19-GC16_st" "AFFX-BkGr19-GC17_st"
 [61] "AFFX-BkGr19-GC18_st" "AFFX-BkGr19-GC19_st" "AFFX-BkGr21-GC03_st"
 [64] "AFFX-BkGr21-GC04_st" "AFFX-BkGr21-GC05_st" "AFFX-BkGr21-GC06_st"
 [67] "AFFX-BkGr21-GC07_st" "AFFX-BkGr21-GC08_st" "AFFX-BkGr21-GC09_st"
 [70] "AFFX-BkGr21-GC10_st" "AFFX-BkGr21-GC11_st" "AFFX-BkGr21-GC12_st"
 [73] "AFFX-BkGr21-GC13_st" "AFFX-BkGr21-GC14_st" "AFFX-BkGr21-GC15_st"
 [76] "AFFX-BkGr21-GC16_st" "AFFX-BkGr21-GC17_st" "AFFX-BkGr21-GC18_st"
 [79] "AFFX-BkGr21-GC19_st" "AFFX-BkGr21-GC20_st" "AFFX-BkGr21-GC21_st"
 [82] "AFFX-BkGr23-GC03_st" "AFFX-BkGr23-GC04_st" "AFFX-BkGr23-GC05_st"
 [85] "AFFX-BkGr23-GC06_st" "AFFX-BkGr23-GC07_st" "AFFX-BkGr23-GC08_st"
 [88] "AFFX-BkGr23-GC09_st" "AFFX-BkGr23-GC10_st" "AFFX-BkGr23-GC11_st"
 [91] "AFFX-BkGr23-GC12_st" "AFFX-BkGr23-GC13_st" "AFFX-BkGr23-GC14_st"
 [94] "AFFX-BkGr23-GC15_st" "AFFX-BkGr23-GC16_st" "AFFX-BkGr23-GC17_st"
 [97] "AFFX-BkGr23-GC18_st" "AFFX-BkGr23-GC19_st" "AFFX-BkGr23-GC20_st"
[100] "AFFX-BkGr23-GC21_st" "AFFX-BkGr23-GC22_st" "AFFX-BkGr23-GC23_st"
[103] "AFFX-CreX-3_at"      "AFFX-CreX-5_at"   
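
One way to produce a list like this is a set difference on the feature names of the two results. The sketch below uses short made-up vectors in place of the real `featureNames()` output from the oligo and affy objects; `oligo_ids`, `affy_ids`, and the miRNA identifiers in them are hypothetical stand-ins, not values from the actual arrays.

```r
# Toy stand-ins for featureNames() of the two ExpressionSets (hypothetical values).
oligo_ids <- c("AFFX-BioB-3_at", "AFFX-BkGr-GC03_st",
               "hsa-miR-155_st", "mmu-miR-155_st")
affy_ids  <- c("hsa-miR-155_st", "mmu-miR-155_st")

# Features present only in the oligo result.
extra_ids <- setdiff(oligo_ids, affy_ids)
print(extra_ids)

# The extra features listed above all start with "AFFX-", so a prefix
# match is enough to flag them in this toy example.
is_control <- grepl("^AFFX-", oligo_ids)
print(table(is_control))
```

With real data, the same two calls would be applied to the feature names of the two expression sets instead of the toy vectors.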

 

Answer: The number of features of Expression Set for Affymetrix miRNA 4.0 data preprocessed by oligo package
4.2 years ago by
United States
James W. MacDonald50k wrote:

When you say 'affy package' do you mean the Bioconductor affy package? If so, you don't want to use that package to analyze these data. The problem with this array is that a given probe may be used multiple times, in various probesets.

For example, let's say there are 30 miR-155 miRNA probesets on the array (that is a made-up number, BTW), one for each of 30 different species. Let's also assume that miR-155 is highly conserved and has the identical sequence in all of those species. To save space, Affy will tile down just 11 or so probes (rather than 330), and then map all the different species' probesets to those 11 probes. So cel-miR-155 will use the exact same probes as mmu-miR-155 and hsa-miR-155.
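
To make the sharing concrete, here is a minimal base R sketch of such a mapping. The `probe_map` data frame, the probe IDs, and the choice of three probes per probeset are all invented for illustration; they are not taken from the real annotation.

```r
# Hypothetical probe-to-probeset map: three species share the same 3 probes.
probe_map <- data.frame(
  probeset = rep(c("cel-miR-155", "mmu-miR-155", "hsa-miR-155"), each = 3),
  probe_id = rep(c(101L, 102L, 103L), times = 3),
  stringsAsFactors = FALSE
)

# Each probeset is mapped to all three probes...
probes_per_set <- table(probe_map$probeset)
print(probes_per_set)

# ...but only three distinct probes physically exist on the (toy) array.
n_physical <- length(unique(probe_map$probe_id))
print(n_physical)
```

The map has nine rows but only three distinct probes, which is the same kind of space saving as the 11-versus-330 example.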

The oligo package understands this, and has no problem with the re-use of probes in different probesets. So if we look at the number of probes in each probeset for the pd.mirna.4.0 package, we get this:

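> library(pd.mirna.4.0)
> # 'con' was not defined in the original post; assumed here to be the
> # SQLite connection of the annotation package
> con <- db(pd.mirna.4.0)
> pd.data <- dbGetQuery(con, "select * from pmfeature inner join featureSet using(fsetid);")
> pd.lst <- split(pd.data, pd.data$man_fsetid)
> table(sapply(pd.lst, nrow))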

    8     9    10    11    25    40    50    67    73    88    89    90    91
   67 30483    78  5626     1    10     2     1     1     1     2     1     1
   92    94
    1    78

So the smallest probeset has 8 probes, the vast majority have 9 or 11, and there are a few that have lots more than that. But let's look at what you would get from the cdfenv built from the CDF file that Affy supplies:

> library(makecdfenv)
> mirna40cdf <- make.cdf.env("miRNA-4_0-st-v1.cdf")
Reading CDF file.
Creating CDF environment
Wait for about 251 dots...................................................................................................................................................................................................................................................................
> zzz <- as.list(mirna40cdf)
> table(sapply(zzz, nrow))

    1     2     3     4     5     6     7     8     9    10    11
  340   177   106    59    62    58    63   102 19735   145  4218

Whoops! That's a bit different. The reason is that the makecdfenv package doesn't expect that ANY probes will get re-used in different probesets, so when it reads the CDF file and sees a re-used probe, it just ignores it. So any Affy array that uses probes in more than one probeset will not work with the makecdfenv/affy pipeline. Instead you have to use the pdInfoBuilder/oligo pipeline.
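
The effect on probeset sizes can be mimicked with a toy example. This is only a model of the behaviour described above (each probe kept for the first probeset that uses it), not makecdfenv's actual code; the `probe_map` data frame and its IDs are hypothetical.

```r
# Hypothetical map: probes 101-103 are shared by two probesets, 104 is unique.
probe_map <- data.frame(
  probeset = c(rep("hsa-miR-155", 3), rep("mmu-miR-155", 4)),
  probe_id = c(101L, 102L, 103L, 101L, 102L, 103L, 104L),
  stringsAsFactors = FALSE
)

# Probeset sizes when a probe may belong to several probesets.
sizes_shared <- table(probe_map$probeset)

# Probeset sizes when each probe may belong to only one probeset:
# keep the first occurrence of every probe_id and drop later re-uses.
first_use    <- probe_map[!duplicated(probe_map$probe_id), ]
sizes_unique <- table(first_use$probeset)

print(sizes_shared)
print(sizes_unique)
```

In the toy data the second probeset is left with a single probe once re-used probes are dropped, which is the kind of shrinkage that produces the many small probesets in the cdfenv table.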

Also please note that the files that are used to generate the pd.mirna.4.0 package do not contain information for the background probes that you show are missing. In other words, the pd.mirna.4.0 package uses the data that Affy supplies to map probes to probesets. Affy doesn't give any information about those probes in the files that are used by pdInfoBuilder, so it's not an issue of us ignoring/removing/deleting data, but of Affy not providing that information in the files we use to make the package. If it's critical to your purposes to have these particular probes in your data set, then you might try the Affy miRNA QC tool, which may return data for them.

 
