Question: Comparison of Column of AnnotatedDataFrame
2.2 years ago by Dario Strbenac (Australia), who wrote:

The documentation examples for AnnotatedDataFrame are quite limited and only show how to coerce between data.frame and AnnotatedDataFrame. How can I subset a column of an AnnotatedDataFrame and check it for equality to a particular value without, for example, converting it to a data.frame first?


Sorry if I am missing the point, but do you mean something different from this?

library(Biobase)

tmp <- AnnotatedDataFrame(iris)

tmp[, 1] # select first column.
pData(tmp)[, 1] # check.

# select rows based on value of one column:
tmp[tmp$Sepal.Length > 7, 1]@data
    Sepal.Length
103          7.1
106          7.6
108          7.3
110          7.2
118          7.7
119          7.7
123          7.7
126          7.2
130          7.2
131          7.4
132          7.9
136          7.7

modified 2.2 years ago • written 2.2 years ago by Diego Diez

Yes, but shouldn't the usual column accessor work?

> tmp[, "Sepal.Length"] > 7
Error in tmp[, "Sepal.Length"] > 7 :
  comparison (6) is possible only for atomic and list types

written 2.2 years ago by Dario Strbenac

Ah I see. This is what I get:

> tmp[, "Sepal.Length"]
An object of class 'AnnotatedDataFrame'
  rowNames: 1 2 ... 150 (150 total)
  varLabels: Sepal.Length
  varMetadata: labelDescription

Which explains at least why it is not working (you get something similar for tmp[1,]). I guess the idea is that subsetting with [ should return an AnnotatedDataFrame object, but accessing directly with $ gets you the values. I have no idea if this is the intended behavior.
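
As a minimal sketch of the behavior described above (assuming Biobase is installed, and reusing the built-in iris data from the earlier example), the class of each result can be checked directly:

```r
library(Biobase)

tmp <- AnnotatedDataFrame(iris)

# `[` keeps the container: the result is still an AnnotatedDataFrame,
# which is why it cannot be compared with `>` directly.
sub <- tmp[, "Sepal.Length"]
is(sub, "AnnotatedDataFrame")

# `$` extracts the underlying column as a plain numeric vector,
# so the comparison yields an ordinary logical vector.
vals <- tmp$Sepal.Length
is.numeric(vals)
sum(vals > 7)
```

The variable names sub and vals are only illustrative.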

The philosophy is that [ is an 'endomorphism' -- it returns the same class as it is applied to. $ and [[ are not. Also, use pData() rather than slot access, and (strongly) consider S4Vectors::DataFrame for a more modern implementation of the AnnotatedDataFrame concept.

written 2.2 years ago by Martin Morgan

Thanks! Why is pData() preferred over slot access?

written 2.2 years ago by Diego Diez

The 'usual' reasons for object-oriented programming -- it separates the user-oriented interface from design considerations employed by the developer. Often there is not much divergence, but for instance the slots (internal developer business) of a DNAStringSet have little to do with the interface designed for the user.

written 2.2 years ago by Martin Morgan

Sorry if I look persistent on this but I didn't consider using $ or [[ as slot access. But maybe I am mistaken? I see (at least) three ways to access the data in the example above:

tmp$Sepal.Length          # use subsetting method.
pData(tmp)$Sepal.Length   # use accessor method then subsetting method.
tmp@data$Sepal.Length     # use slot - bad.

Maybe I misunderstood and when you said "pData() rather than slot access" you meant example 3 here?

written 2.2 years ago by Diego Diez

Yes, I meant example 3; the @data in the earlier comment (C: Comparison of Column of AnnotatedDataFrame) is slot access. tmp$Sepal.Length; pData(tmp)$Sepal.Length; pData(tmp[,"Sepal.Length"])$Sepal.Length etc. would be acceptable, as with [[.
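
For readers who want to confirm that the acceptable routes listed above are interchangeable, here is a small sketch (assuming Biobase is installed; the variable names are only illustrative) that extracts the same column four ways and compares the results:

```r
library(Biobase)

tmp <- AnnotatedDataFrame(iris)

# Four ways to reach the column through the public interface (no @ slot access).
via_dollar   <- tmp$Sepal.Length
via_brackets <- tmp[["Sepal.Length"]]
via_pdata    <- pData(tmp)$Sepal.Length
via_subset   <- pData(tmp[, "Sepal.Length"])$Sepal.Length

# All four should be the same numeric vector.
all_same <- identical(via_dollar, via_brackets) &&
  identical(via_dollar, via_pdata) &&
  identical(via_dollar, via_subset)
all_same
```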

I see! I completely forgot I used the slot to access the data for checking in my original example. Now I have no idea why I did that in the first place. I have updated that comment to avoid misleading potential readers. Thank you!

And this works also:

tmp[["Sepal.Length"]] > 7

One more thought. I guess this behavior is also consistent with data.frame subsetting with drop = FALSE: [ will then always return a data.frame, whereas $ and [[ return a vector.
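
The data.frame analogy can be checked in base R alone. A minimal sketch with the built-in iris data (variable names are only illustrative):

```r
df <- iris

# With drop = FALSE, `[` returns a one-column data.frame.
kept <- df[, "Sepal.Length", drop = FALSE]
is.data.frame(kept)

# `[[` and `$` return the column itself as a numeric vector.
by_brackets <- df[["Sepal.Length"]]
by_dollar   <- df$Sepal.Length
is.numeric(by_brackets)

# Without drop = FALSE, a single-column `[` on a data.frame also drops
# to a vector, which is where data.frame differs from AnnotatedDataFrame.
dropped <- df[, "Sepal.Length"]
is.numeric(dropped)
```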

It would be nice if there was a section of documentation titled Accessors.

2.2 years ago by Dario Strbenac (Australia), who wrote:
anAnnotatedDataFrame[["columnName"]] == value
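
A concrete instance of this pattern, as a sketch that assumes Biobase is installed and uses the iris data from earlier in the thread (the column and value are chosen only for illustration):

```r
library(Biobase)

tmp <- AnnotatedDataFrame(iris)

# Compare one column against a value; the result is a logical vector.
is_setosa <- tmp[["Species"]] == "setosa"
sum(is_setosa)

# The logical vector can then select rows, and `[` keeps the
# AnnotatedDataFrame class for the result.
setosa <- tmp[is_setosa, ]
is(setosa, "AnnotatedDataFrame")
nrow(pData(setosa))
```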