Question: Compare Row.names and ID from GSE datasets
PyPer20 wrote (4.7 years ago):

I am annotating data from a GSE dataset and want to check that the row names match the ID column, to make sure there are no mistakes.

library(GEOquery)
gse10072 <- getGEO('gse10072', GSEMatrix=TRUE)
g72 <- gse10072[[1]]
total <- pData(featureData(g72))

t1 <- data.frame(row.names(total))
t2 <- data.frame(total$ID)

Why does the comparison

identical(t1, t2)

return FALSE?

I'm sure it seems trivial, but I would like to compare other columns in the future. Why do two seemingly identical data.frames appear to be different?
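The data.frame comparison above can be reproduced in base R without downloading anything. This is a minimal sketch that uses a tiny made-up `total` table (three invented probe IDs, not the real GSE10072 annotation) with a factor ID column whose values are also the row names. It shows that data.frame() names each unnamed column after the expression that produced it, and identical() compares those names as well as the column classes.

```r
# Toy stand-in for fData(g72): a factor ID column whose values are also the row names.
# The probe IDs are invented for illustration; they are not read from GSE10072.
total <- data.frame(ID = factor(c("1007_s_at", "1053_at", "117_at")))
row.names(total) <- as.character(total$ID)

# Same construction as in the question. stringsAsFactors = FALSE is spelled out
# because its default changed in R 4.0.0; factor columns stay factors either way.
t1 <- data.frame(row.names(total), stringsAsFactors = FALSE)
t2 <- data.frame(total$ID)

names(t1)          # "row.names.total."  (derived from the expression row.names(total))
names(t2)          # "total.ID"          (derived from the expression total$ID)
identical(t1, t2)  # FALSE: column names differ, and character vs factor columns differ

# Comparing the underlying vectors avoids the column-name mismatch,
# but the factor still has to be converted to character first.
identical(row.names(total), as.character(total$ID))  # TRUE
```

So even with matching values, the two one-column data.frames cannot be identical, because their column names come from different expressions.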


Answer: Compare Row.names and ID from GSE datasets
Sean Davis (United States) wrote (4.7 years ago):

The row names are actually derived from the IDs, so there really isn't a need to check, but in case you want to:

> library(GEOquery)
> g72 <- getGEO('gse10072', GSEMatrix=TRUE)[[1]]
Found 1 file(s)
Using locally cached version: /var/folders/21/8t47kwys6vqb8606kdfn71780000gn/T//RtmpEmvD0e/GSE10072_series_matrix.txt.gz
Using locally cached version of GPL96 found here:
> total <- fData(g72)
> t1 <- row.names(total)
# Convert to character vector from factor!
> t2 <- as.character(total$ID)
> identical(t1,t2)
[1] TRUE

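identical() returns FALSE when comparing a character vector to a factor (total$ID is a factor): a factor is stored as integer codes with levels and class attributes, so it is never identical to a character vector, even when both print the same IDs. After converting total$ID with as.character(), identical() returns TRUE.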
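Here is a small base-R sketch of the factor versus character point, again with invented probe IDs rather than the real annotation. Since you want to compare other columns later, it also includes a helper, mismatched_ids(). That helper is hypothetical (not part of GEOquery or Biobase): it coerces any column to character and reports the row names whose value differs.

```r
ids <- c("1007_s_at", "1053_at", "117_at")  # invented probe IDs
id_factor <- factor(ids)

typeof(id_factor)                        # "integer": factors store codes, not strings
identical(ids, id_factor)                # FALSE
identical(ids, as.character(id_factor))  # TRUE

# Hypothetical helper: row names of `df` whose value in column `col` differs.
# as.character() handles both factor and character columns; NA values count as mismatches.
mismatched_ids <- function(df, col) {
  values <- as.character(df[[col]])
  rn <- row.names(df)
  rn[is.na(values) | rn != values]
}

# Toy annotation table: "Alias" deliberately disagrees on the last row.
annot <- data.frame(ID = id_factor,
                    Alias = c("1007_s_at", "1053_at", "117_x_at"),
                    row.names = ids)

mismatched_ids(annot, "ID")     # character(0): every row matches
mismatched_ids(annot, "Alias")  # "117_at"
```

Returning the offending row names rather than a single TRUE/FALSE makes it easier to see where a column disagrees.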

