How to compare 10 GSMxxxxx from one GSE to 10 GSM from another GSE with GEOQuery
1
0
Entering edit mode
@carinegenet-10909
Last seen 17 months ago
France

Hello,

I want to compare 10 GSM (GSM1200157 to GSM1200166) from GSE39589 and 10 GSM (GSM972638 to GSM972647) from GSE49505 (same platform, same species, and same tissue).

I followed the GEOquery manual page but was not able to set adequately the expr(gset) of the two .

Any advice will be greatly appreciate .

Thank in advance.

Below are R Script and R session info


# Version info: R 3.2.3, Biobase 2.30.0, GEOquery 2.40.0, limma 3.26.8
# R scripts generated  Thu Mar 16 10:16:33 EDT 2017

################################################################
#   Differential expression analysis with limma
library(Biobase)
#installation de "GEOquery"
#source("https://bioconductor.org/biocLite.R")
#biocLite()
#biocLite("GEOquery")
library(GEOquery)

library(limma)

# load series and platform data from GEO

gset1 <- getGEO("GSE39589", GSEMatrix =TRUE, AnnotGPL=TRUE)
if (length(gset1) > 1) idx <- grep("GPL2112", attr(gset1, "names")) else idx <- 1
gset1 <- gset1[[idx]]
gset2 <- getGEO("GSE49505", GSEMatrix =TRUE, AnnotGPL=TRUE)
if (length(gset2) > 1) idx <- grep("GPL2112", attr(gset2, "names")) else idx <- 1
gset2 <- gset2[[idx]]
# make proper column names to match toptable
fvarLabels(gset1) <- make.names(fvarLabels(gset1))
fvarLabels(gset2) <- make.names(fvarLabels(gset2))

# group names for all samples
gsms1 <- "1111111111XXXXXXXXXX"
gsms2 <-"XXXXX2222222222XXXXX"
sml1 <- c()
for (i in 1:nchar(gsms1)) { sml1[i] <- substr(gsms1,i,i) }
sml2 <- c()
for (i in 1:nchar(gsms2)) { sml2[i] <- substr(gsms2,i,i) }

# eliminate samples marked as "X"
sel1 <- which(sml1 != "X")
sml1 <- sml1[sel1]
gset1 <- gset1[ ,sel1]

# eliminate samples marked as "X"
sel2 <- which(sml2 != "X")
sml2 <- sml2[sel2]
gset2 <- gset2[ ,sel2]

# log2 transform
ex1 <- exprs(gset1)
qx1 <- as.numeric(quantile(ex1, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T))
LogC <- (qx1[5] > 100) ||
  (qx1[6]-qx1[1] > 50 && qx1[2] > 0) ||
  (qx1[2] > 0 && qx1[2] < 1 && qx1[4] > 1 && qx1[4] < 2)
if (LogC) { ex1[which(ex1 <= 0)] <- NaN
exprs(gset1) <- log2(ex1) }

# log2 transform
ex2 <- exprs(gset2)
qx2 <- as.numeric(quantile(ex2, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T))
LogC <- (qx2[5] > 100) ||
  (qx2[6]-qx2[1] > 50 && qx2[2] > 0) ||
  (qx2[2] > 0 && qx2[2] < 1 && qx2[4] > 1 && qx2[4] < 2)
if (LogC) { ex2[which(ex2 <= 0)] <- NaN
exprs(gset2) <- log2(ex2) }

gset<- c(exprs(gset1),exprs(gset2))

# set up the data and proceed with analysis
sml1b <- paste("G", sml1, sep="")    # set group names
fl <- as.factor(sml1b)
sml2b<- paste("G", sml2, sep="")
f2<- as.factor(sml2b)
f3<-as.factor(c(sml1b,sml2b))
gset$description <- f3


design <- model.matrix(~ description + 0, gset)
colnames(design) <- levels(f3)
fit <- lmFit(gset, design)*
Error in getEAWP(object) : data object isn't of a recognized data class

 

 

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] limma_3.30.12              GEOquery_2.40.0            DESeq2_1.14.1              SummarizedExperiment_1.4.0
 [5] Biobase_2.34.0             GenomicRanges_1.26.3       GenomeInfoDb_1.10.3        IRanges_2.8.1             
 [9] S4Vectors_0.12.1           BiocGenerics_0.20.0        BiocInstaller_1.24.0      

loaded via a namespace (and not attached):
 [1] genefilter_1.56.0    locfit_1.5-9.1       splines_3.3.3        lattice_0.20-34      colorspace_1.3-2    
 [6] htmltools_0.3.5      base64enc_0.1-3      survival_2.40-1      XML_3.98-1.5         foreign_0.8-67      
[11] DBI_0.6              BiocParallel_1.8.1   RColorBrewer_1.1-2   plyr_1.8.4           stringr_1.2.0       
[16] zlibbioc_1.20.0      munsell_0.4.3        gtable_0.2.0         htmlwidgets_0.8      memoise_1.0.0       
[21] latticeExtra_0.6-28  knitr_1.15.1         geneplotter_1.52.0   AnnotationDbi_1.36.2 htmlTable_1.9       
[26] Rcpp_0.12.9          acepack_1.4.1        xtable_1.8-2         scales_0.4.1         backports_1.0.5     
[31] checkmate_1.8.2      Hmisc_4.0-2          annotate_1.52.1      XVector_0.14.0       gridExtra_2.2.1     
[36] ggplot2_2.2.1        digest_0.6.12        stringi_1.1.2        grid_3.3.3           tools_3.3.3         
[41] bitops_1.0-6         magrittr_1.5         lazyeval_0.2.0       RCurl_1.95-4.8       tibble_1.2          
[46] RSQLite_1.1-2        Formula_1.2-1        cluster_2.0.6        Matrix_1.2-8         rsconnect_0.7       
[51] data.table_1.10.4    httr_1.2.1           assertthat_0.1       R6_2.2.0             rpart_4.1-10        
[56] nnet_7.3-12         
geoquery • 1.1k views
ADD COMMENT
1
Entering edit mode
@james-w-macdonald-5106
Last seen 31 minutes ago
United States

You are doing several things wrong, most of which involve fighting the existing infrastructure that is intended to make your life easier.

The data you download are in ExpressionSets, which are fancy ways to encapsulate data in such a way that you can pretend as if they are data.frames that you can subset and combine without having to go through all the gyrations you are performing. In addition, you are taking logs of data that has already been log converted. You could do this analysis in very few steps:

> first <- getGEO("GSE39589")[[1]]
> second <- getGEO("GSE49505")[[1]]

## combine ExpressionSets, but justthe small columnar and rounded samples
 > eset <- combine(first[,1:10], second[,6:15])
## get rid of extra featureSet columns that we don't need
> fData(eset) <- fData(eset)[,c(1,18,8)]
## design matrix accounting for two batches of samples
> type <- factor(rep(c("Rounded","Columnar"), each = 5, times = 2))
> batch <- factor(rep(1:2, each = 10))
> design <- model.matrix(~type+batch)
## filter out low expressing and AFFX probes
> eset <- eset[rowMeans(exprs(eset), na.rm = TRUE) > 5,]
> eset <- eset[-grep("AFFX", featureNames(eset)),]
## fit model
> fit <- lmFit(eset, design)
> fit2 <- eBayes(fit)
> topTable(fit2, 2)
                                 ID ENTREZ_GENE_ID Gene.Symbol logFC AveExpr
Bt.8824.1.S1_at     Bt.8824.1.S1_at                             0.44   5.850
Bt.19645.1.A1_at   Bt.19645.1.A1_at                             0.86   5.300
Bt.27718.1.S1_at   Bt.27718.1.S1_at                             0.50   6.840
Bt.1993.3.A1_a_at Bt.1993.3.A1_a_at                             0.37   6.045
Bt.18252.1.A1_at   Bt.18252.1.A1_at                             0.33   5.325
Bt.17384.2.S1_at   Bt.17384.2.S1_at                             0.45   5.065
Bt.17150.1.A1_at   Bt.17150.1.A1_at                             0.46   5.480
Bt.19909.1.A1_at   Bt.19909.1.A1_at                             0.51   6.615
Bt.21987.1.S1_at   Bt.21987.1.S1_at                             0.38   6.640
Bt.11954.1.S1_at   Bt.11954.1.S1_at                             0.41   6.685
                         t     P.Value adj.P.Val         B
Bt.8824.1.S1_at   3.707264 0.001301586         1 -2.783709
Bt.19645.1.A1_at  3.232526 0.003982368         1 -3.148296
Bt.27718.1.S1_at  3.154172 0.004776595         1 -3.209269
Bt.1993.3.A1_a_at 3.121159 0.005155423         1 -3.234989
Bt.18252.1.A1_at  2.891316 0.008721984         1 -3.414158
Bt.17384.2.S1_at  2.835272 0.009898647         1 -3.457761
Bt.17150.1.A1_at  2.747234 0.012057845         1 -3.526084
Bt.19909.1.A1_at  2.645507 0.015108938         1 -3.604650
Bt.21987.1.S1_at  2.612535 0.016245273         1 -3.630002
Bt.11954.1.S1_at  2.589115 0.017100774         1 -3.647970

If you do a PCA plot of these data, you will see that there is pretty good overlap between the columnar and rounded samples, so I wouldn't normally expect to see much anyway.

ADD COMMENT
0
Entering edit mode

Thank a lot, James for your quick answer.

 

ADD REPLY

Login before adding your answer.

Traffic: 950 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6