Question

Compare Row.names and ID from GSE datasets


PyPer ▴ 20

@pyper-6819


Australia

I am annotating data from a GSE dataset and want to check that the row names match the ID column, to make sure there are no mistakes.

library(GEOquery)
gse10072 <- getGEO('gse10072', GSEMatrix=TRUE)
g72 <- gse10072[[1]]
total <- pData(featureData(g72))

t1 <- data.frame(row.names(total))
t2 <- data.frame(total$ID)

Why does the comparison

identical(t1, t2)

return FALSE?

I'm sure it seems trivial, but I would like to compare other columns in the future. Why do two seemingly identical data.frames appear to be different?

geoquery GSE identical row.names


score 0 · Answer 1 · 2014-10-12

The row names are actually derived from the IDs, so there really isn't a need to check, but in case you wanted to:

> library(GEOquery)
> g72 <- getGEO('gse10072', GSEMatrix=TRUE)[[1]]
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE10nnn/GSE10072/matrix/
Found 1 file(s)
GSE10072_series_matrix.txt.gz
Using locally cached version: /var/folders/21/8t47kwys6vqb8606kdfn71780000gn/T//RtmpEmvD0e/GSE10072_series_matrix.txt.gz
Using locally cached version of GPL96 found here:
/var/folders/21/8t47kwys6vqb8606kdfn71780000gn/T//RtmpEmvD0e/GPL96.soft 
> total <- fData(g72)
> t1 <- row.names(total)
# Convert to character vector from factor!
> t2 <- as.character(total$ID)
> identical(t1,t2)
TRUE

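identical() returns FALSE when you compare a character vector with a factor, even if the printed values are the same, because the two objects have different types and attributes (here total$ID is a factor). After total$ID is converted to a character vector with as.character(), identical() returns TRUE.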
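The sketch below reproduces the effect without downloading anything. It uses a small hypothetical stand-in for fData(g72). The three probe IDs are chosen only for illustration and are not read from GEO, and stringsAsFactors = TRUE is set explicitly to mimic the factor ID column. It also shows a second reason the data.frames in the question compare as different: data.frame() builds each column name from the expression it was given, so the two wrapped objects get different names, and identical() compares names as well as values.

```r
# Hypothetical toy stand-in for fData(g72); IDs are illustrative only.
# stringsAsFactors = TRUE makes ID a factor, as in the question.
total <- data.frame(ID = c("1007_s_at", "1053_at", "117_at"),
                    stringsAsFactors = TRUE)
row.names(total) <- as.character(total$ID)

t1 <- row.names(total)   # character vector
t2 <- total$ID           # factor

is.factor(t2)                     # TRUE
identical(t1, t2)                 # FALSE: character vs factor
identical(t1, as.character(t2))   # TRUE: same type, same values

# Wrapping each vector in data.frame(), as in the question, also
# derives a different column name for each object.
d1 <- data.frame(row.names(total))
d2 <- data.frame(total$ID)
names(d1)                         # "row.names.total."
names(d2)                         # "total.ID"
identical(d1, d2)                 # FALSE

# Comparing the underlying columns as character vectors ignores
# both the column names and the factor/character difference.
identical(as.character(d1[[1]]), as.character(d2[[1]]))   # TRUE
```

The same as.character() conversion applies when comparing other columns later: extract the columns as plain vectors and convert factors to character before calling identical().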