Hello,
I have a problem with the oligo package and Affymetrix miRNA 4.0 GeneChips.
I load CEL files (Affymetrix, miRNA 4.0 GeneChips) and want to extract the raw probe-level values.
dat1 <- read.celfiles(files)
Using probeNames(dat1) and getProbeInfo(dat1) I get 346085 probes.
But the exprs slot (dat1@assayData$exprs), or the matrix returned by exprs(dat1), has only 292681 rows, with rownames from 1 to 292681. Which probes are missing or, more importantly, how can I annotate these 292681 rows with the correct probe names or fids (meaning: feature identifiers?)?
--- SESSION INFO ---
R version 3.2.2 (2015-08-14)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Scientific Linux release 6.7 (Carbon)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=en_US.UTF-8
 [9] LC_ADDRESS=en_US.UTF-8     LC_TELEPHONE=en_US.UTF-8
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] limma_3.24.15       pd.mirna.4.0_3.12.0 RSQLite_1.0.0
 [4] DBI_0.3.1           oligo_1.32.0        Biostrings_2.36.4
 [7] XVector_0.8.0       IRanges_2.2.9       S4Vectors_0.6.6
[10] Biobase_2.28.0      oligoClasses_1.30.0 BiocGenerics_0.14.0
[13] rj_2.0.3-1

loaded via a namespace (and not attached):
 [1] affxparser_1.40.0     GenomicRanges_1.20.8  splines_3.2.2
 [4] zlibbioc_1.14.0       bit_1.1-12            rj.gd_1.1.3-1
 [7] foreach_1.4.3         GenomeInfoDb_1.4.3    tools_3.2.2
[10] ff_2.2-13             iterators_1.0.8       preprocessCore_1.30.0
[13] affyio_1.36.0         codetools_0.2-14      BiocInstaller_1.18.5

When making a comment, please use the 'ADD COMMENT' link rather than 'Add your answer'.
I think you probably want to query the underlying SQLite database directly.
> library(pd.mirna.4.0)
> con <- db(pd.mirna.4.0)
> z <- dbGetQuery(con, "select man_fsetid, fid from featureSet, pmfeature where pmfeature.fsetid=featureSet.fsetid")
> head(z)
       man_fsetid    fid
1 MIMAT0000001_st  31288
2 MIMAT0000001_st  32604
3 MIMAT0000001_st  89004
4 MIMAT0000001_st 149368
5 MIMAT0000001_st 150239
6 MIMAT0000001_st 211154
> dim(z)
[1] 346085      2

And if you want to know which are controls or main probes:

> probesets <- dbGetQuery(con, "select * from featureSet;")
> head(probesets)
    fsetid      man_fsetid type
1 20500000 MIMAT0000001_st    1
2 20500001 MIMAT0015091_st    1
3 20500002 MIMAT0000002_st    1
4 20500003 MIMAT0015092_st    1
5 20500004 MIMAT0020301_st    1
6 20500005 MIMAT0000003_st    1
> table(probesets$type)

    1     7    12    13    14
36205    95    27    17     9
> dbGetQuery(con, "select * from type_dict;")
   type                    type_id
1     1                       main
2     2            main->junctions
3     3                 main->psrs
4     4               main->rescue
5     5              control->affx
6     6              control->chip
7     7  control->bgp->antigenomic
8     8      control->bgp->genomic
9     9             normgene->exon
10   10           normgene->intron
11   11   rescue->FLmRNA->unmapped
12   12   control->affx->bac_spike
13   13             oligo_spike_in
14   14            r1_bac_spike_at
15   15 control->affx->polya_spike
16   16        control->affx->ercc
17   17  control->affx->ercc->step

Thank you again James,
it is good to know how to access the db directly, but I already have this information. I only need the IDs for the raw exprs matrix in the ExpressionFeatureSet. Perhaps it is so easy that I am just missing something...
See below. I cannot assign the probe info (n=346085) to the rows of the exprs matrix (n=292681), which shows only an index from 1 to 292681.
Unfortunately, in dat1 (ExpressionFeatureSet) there is no annotation for these 292681 rows of the exprs matrix.
I suppose it is obvious. The arrays are read by the scanner, by row, and when you read in the celfile it's in the same order. That is the fid.
The count is zero-based, so the first cell read in is at (0,0). The first cell that is used for anything is in the first row (0 on the y axis) and the sixth column (5 on the x axis). So that probe is in the sixth row of the matrix you get from doing exprs(dat1).
So your pINFO data.frame tells you the row index (fid), the probeset that particular probe goes into, and what type of probeset it is. That should be sufficient, no?
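To make the row-index idea concrete, here is a minimal base-R sketch with toy data. Everything in it is hypothetical (the helper xy_to_fid, the toy_* objects, the probeset names), and it assumes the cells are numbered row by row, so that fid = y * n_col + x + 1. It also assumes a 541-column chip, only because 541 * 541 = 292681 matches the number of rows reported above. The toy table has one fid listed under two probesets, which is one way an annotation table can have more rows than the intensity matrix.

```r
# Toy stand-ins only; none of these objects come from oligo or pd.mirna.4.0.
n_col   <- 4L                      # toy chip width (real chip assumed to be 541)
n_cells <- n_col * n_col

# Zero-based (x, y) scanner coordinates -> 1-based fid, assuming row-major order.
xy_to_fid <- function(x, y, n_col) y * n_col + x + 1L

# exprs()-like matrix: one row per cell on the chip, rownames "1".."16".
toy_exprs <- matrix(seq_len(n_cells) * 10, ncol = 1,
                    dimnames = list(as.character(seq_len(n_cells)), "chip1.CEL"))

# Annotation table shaped like the man_fsetid/fid query result.
# fid 6 appears twice because it is assigned to two probesets.
toy_info <- data.frame(
  man_fsetid = c("setA_st", "setB_st", "setA_st", "setC_st"),
  fid        = c(6L, 6L, 11L, 3L),
  stringsAsFactors = FALSE
)

# Direction 1: pull the intensity for every probe-to-probeset pair.
idx <- match(toy_info$fid, as.integer(rownames(toy_exprs)))
toy_info$intensity <- unname(toy_exprs[idx, "chip1.CEL"])

# Direction 2: one label per matrix row; cells without a probeset stay NA.
by_fid <- vapply(split(toy_info$man_fsetid, toy_info$fid),
                 paste, character(1), collapse = ";")
row_label <- unname(by_fid[rownames(toy_exprs)])

xy_to_fid(5L, 0L, 541L)   # 6: first row, sixth column
```

With real data the same match() call would presumably take the fid column of the query result and the rownames of exprs(dat1); whether every unused cell really lacks an entry in pmfeature is not something this sketch checks.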
As a side note, there is no profit in accessing slots directly (e.g., dat1@assayData$exprs) - the exprs function does the expected thing, and if Benilton changes the underlying structure, the exprs function will continue to do the expected thing, but direct queries may not.
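The accessor-versus-slot point can be illustrated with a small toy S4 example in base R. The classes ToySetV1 and ToySetV2 and the generic toyExprs are invented for this sketch and have nothing to do with how oligo or Biobase actually store data; they only show why code written against an accessor survives a change of internal layout while direct slot access does not.

```r
library(methods)

# Version 1 of a toy container: data kept in a list slot called assayData.
setClass("ToySetV1", representation(assayData = "list"))
setGeneric("toyExprs", function(object) standardGeneric("toyExprs"))
setMethod("toyExprs", "ToySetV1", function(object) object@assayData$exprs)

# Version 2: the maintainer moves the data into an environment slot
# and updates the accessor method to match.
setClass("ToySetV2", representation(store = "environment"))
setMethod("toyExprs", "ToySetV2",
          function(object) get("exprs", envir = object@store))

m  <- matrix(1:6, nrow = 3)
v1 <- new("ToySetV1", assayData = list(exprs = m))
e  <- new.env()
assign("exprs", m, envir = e)
v2 <- new("ToySetV2", store = e)

# The accessor gives the same matrix for both layouts.
same_result <- identical(toyExprs(v1), toyExprs(v2))

# Direct slot access written against version 1 fails on version 2.
slot_attempt <- try(v2@assayData$exprs, silent = TRUE)
slot_failed  <- inherits(slot_attempt, "try-error")
```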