Question

Weird error after trying to subset an microarray expression set with gene symbols from topTable function from limma

0

Entering edit mode

svlachavas ▴ 830

@svlachavas-7225

Last seen 6 months ago

Germany/Heidelberg/German Cancer Resear…

Dear All,

i would like to adress a weird error appeared after acquiring from topTable() function from limma a DE gene list. In detail:

my initial expression set used for limma:

> eset.filtered
ExpressionSet (storageMode: environment)
assayData: 9002 features, 60 samples 
  element names: exprs 
protocolData: none
phenoData
  rowNames: St_1_WL57A.CEL St_2_WL57A.CEL ... 1554_03_Gemmer_1.CEL (60
    total)
  varLabels: Meta_factor Disease Location Study
  varMetadata: labelDescription
featureData
  featureNames: 10_at 100_at ... 9997_at (9002 total)
  fvarLabels: PROBEID SYMBOL ENTREZID
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation: hgu133ahsentrezg

a small output of my topTable:

> head(top2)
          PROBEID   SYMBOL ENTREZID      logFC  AveExpr          t      P.Value
10_at       10_at     NAT2       10 -0.9413907 8.182277 -3.6931419 0.0008417892
100_at     100_at      ADA      100  0.4684892 7.164424  2.9547326 0.0058992355
1000_at   1000_at     CDH2     1000  0.0229453 6.455099  0.3317122 0.7423179275
10000_at 10000_at     AKT3    10000 -0.2402645 6.583278 -2.4979502 0.0179511393
10001_at 10001_at     MED6    10001  0.2387839 7.441471  2.8580600 0.0075169910
10004_at 10004_at NAALADL1    10004 -0.6113054 6.120823 -3.6790795 0.0008747382
           adj.P.Val          B
10_at    0.003173277 -0.8449979
100_at   0.015482483 -2.6620277
1000_at  0.803552908 -6.4885956
10000_at 0.039377383 -3.6735306
10001_at 0.018959920 -2.8844877
10004_at 0.003259269 -0.8812204

as from the arguments of topTable i used to return all the rows-probesets of eset.filtered, then i tried(as i had similarly done in older cases) and also without any sorting, to subset my expressionset in the rows with gene symbols to use it to some other analyses:

top2 <- top2[!is.na(top2$SYMBOL),]

dim(top2)

[1] 9001    9

eset.3 <- eset.filtered[rownames(top2),]

But then, when i used

rownames(eset.3) <- top2$SYMBOL

the following error appeared:

Error in `sampleNames<-`(`*tmp*`, value = c("St_1_WL57A.CEL", "St_2_WL57A.CEL", :
number of new names (60) should equal number of rows in AnnotatedDataFrame (34)

But the weird thing is, that after the above function, indeed my probeset ids have been substituted with the gene symbols !! So, what is this weird error about ?

Also my sessionInfo()

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=Greek_Greece.1253  LC_CTYPE=Greek_Greece.1253   
[3] LC_MONETARY=Greek_Greece.1253 LC_NUMERIC=C                 
[5] LC_TIME=Greek_Greece.1253    

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets 
 [9] methods   base     

other attached packages:
 [1] hgu133ahsentrezg.db_19.0.0 org.Hs.eg.db_3.2.3         MineICA_1.10.0            
 [4] fastICA_1.2-0              Hmisc_3.17-1               Formula_1.2-1             
 [7] survival_2.38-3            lattice_0.20-33            annotate_1.48.0           
[10] XML_3.98-1.3               Rgraphviz_2.14.0           igraph_1.0.1              
[13] colorspace_1.2-6           RColorBrewer_1.1-2         mclust_5.1                
[16] marray_1.48.0              cluster_2.0.3              GOstats_2.36.0            
[19] graph_1.48.0               Category_2.36.0            GO.db_3.2.2               
[22] RSQLite_1.0.0              DBI_0.3.1                  AnnotationDbi_1.32.3      
[25] IRanges_2.4.6              S4Vectors_0.8.11           Matrix_1.2-3              
[28] gtools_3.5.0               biomaRt_2.26.1             xtable_1.8-0              
[31] foreach_1.4.3              scales_0.3.0               ggplot2_2.0.0             
[34] plyr_1.8.3                 JADE_1.9-93                limma_3.26.7              
[37] affy_1.48.0                Biobase_2.30.0             BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
 [1] genefilter_1.52.1      splines_3.2.2          RBGL_1.46.0           
 [4] foreign_0.8-66         affyio_1.40.0          zlibbioc_1.16.0       
 [7] munsell_0.4.2          gtable_0.1.2           codetools_0.2-14      
[10] latticeExtra_0.6-26    BiocInstaller_1.20.1   a4Base_1.18.0         
[13] preprocessCore_1.32.0  GSEABase_1.32.0        Rcpp_0.12.3           
[16] acepack_1.3-3.3        gridExtra_2.0.0        clue_0.3-51           
[19] tools_3.2.2            bitops_1.0-6           magrittr_1.5          
[22] RCurl_1.95-4.7         a4Core_1.18.0          iterators_1.0.8       
[25] rpart_4.1-10           AnnotationForge_1.12.2 compiler_3.2.2        
[28] nnet_7.3-12

Thank you !!

expressionset limma annotation affymetrix microarrays • 1.9k views

ADD COMMENT • link updated 8.2 years ago by Martin Morgan 25k • written 8.2 years ago by svlachavas ▴ 830

score 1 · Answer 1 · 2016-02-11

1

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 9 hours ago

United States

There doesn't seem to be a rownames<- method that dispatches on ExpressionSets (or more correctly, on the AssayData slot of an ExpressionSet). So you are probably using the function from base R, which is not likely to do what you expect.

There is a function called featureNames<- that is intended for this situation - is there a reason you aren't using that?

ADD COMMENT • link 8.2 years ago James W. MacDonald 65k

1

Entering edit mode

There is dimnames<-,eSet-method, and rownames<-,ANY-method ends up calling dimnames<-. Does validObject(eset.3) return TRUE? For instance, if I start with a valid ExpressionSet everything is fine

> data(sample.ExpressionSet)
> rownames(sample.ExpressionSet) = rownames(sample.ExpressionSet)

But if I somehow mess things up (by violating the ExpressionSet API, in this case)

> sample.ExpressionSet@phenoData = phenoData(sample.ExpressionSet)[1:5,]

Then things are broken as in the report

> rownames(sample.ExpressionSet) = rownames(sample.ExpressionSet)
Error in `sampleNames<-`(`*tmp*`, value = c("A", "B", "C", "D", "E")) : 
  'value' length (5) must equal sample number in AssayData (26)

validObject() shows the problem, though not how one got into this situation

> validObject(sample.ExpressionSet)
Error in validObject(sample.ExpressionSet) : 
  invalid class "ExpressionSet" object: 1: sample numbers differ between assayData and phenoData
invalid class "ExpressionSet" object: 2: sampleNames differ between assayData and phenoData
invalid class "ExpressionSet" object: 3: sample numbers differ between phenoData and protocolData
invalid class "ExpressionSet" object: 4: sampleNames differ between phenoData and protocolData

ADD REPLY • link 8.2 years ago Martin Morgan 25k

0

Entering edit mode

Dear Martin, also thank you for your valuable explanation !!! --- i used validObject directly on my eset.filtered from the beggining:

> validObject(eset.filtered)
Error in validObject(eset.filtered) :
  invalid class “ExpressionSet” object: 1: sample numbers differ between phenoData and protocolData
invalid class “ExpressionSet” object: 2: sampleNames differ between phenoData and protocolData

So, i guess that probably it has something to do with the fact that i used the function combineTwoExpressionSet()

from the a4base package to merge two affymetrix datasets:

?combineTwoExpressionSet---in the details in mentions:

"exprs and pData are merged. Other data (such as MIAME or annotation) are those of x"

ADD REPLY • link 8.2 years ago svlachavas ▴ 830

0

Entering edit mode

Dear James, thank you for your notification !! i used featureNames() instead of rownames() and did not returned an error !!

Nevertheless, normaly i use featureNames(), but due to "lazyness" lately i used a couple of times rownames() but without an error !! but i guess it has probably something to do with the fact that my eset.filtered is a merged eset--

ADD REPLY • link 8.2 years ago svlachavas ▴ 830