Search
Question: How to compare 10 GSMxxxxx from one GSE to 10 GSM from another GSE with GEOQuery
0
gravatar for carine.genet
8 months ago by
carine.genet0 wrote:

Hello,

I want to compare 10 GSM (GSM1200157 to GSM1200166) from GSE39589 and 10 GSM (GSM972638 to GSM972647) from GSE49505 (same platform, same species, and same tissue).

I followed the GEOquery manual page but was not able to set adequately the expr(gset) of the two .

Any advice will be greatly appreciate .

Thank in advance.

Below are R Script and R session info


# Version info: R 3.2.3, Biobase 2.30.0, GEOquery 2.40.0, limma 3.26.8
# R scripts generated  Thu Mar 16 10:16:33 EDT 2017

################################################################
#   Differential expression analysis with limma
library(Biobase)
#installation de "GEOquery"
#source("https://bioconductor.org/biocLite.R")
#biocLite()
#biocLite("GEOquery")
library(GEOquery)

library(limma)

# load series and platform data from GEO

gset1 <- getGEO("GSE39589", GSEMatrix =TRUE, AnnotGPL=TRUE)
if (length(gset1) > 1) idx <- grep("GPL2112", attr(gset1, "names")) else idx <- 1
gset1 <- gset1[[idx]]
gset2 <- getGEO("GSE49505", GSEMatrix =TRUE, AnnotGPL=TRUE)
if (length(gset2) > 1) idx <- grep("GPL2112", attr(gset2, "names")) else idx <- 1
gset2 <- gset2[[idx]]
# make proper column names to match toptable
fvarLabels(gset1) <- make.names(fvarLabels(gset1))
fvarLabels(gset2) <- make.names(fvarLabels(gset2))

# group names for all samples
gsms1 <- "1111111111XXXXXXXXXX"
gsms2 <-"XXXXX2222222222XXXXX"
sml1 <- c()
for (i in 1:nchar(gsms1)) { sml1[i] <- substr(gsms1,i,i) }
sml2 <- c()
for (i in 1:nchar(gsms2)) { sml2[i] <- substr(gsms2,i,i) }

# eliminate samples marked as "X"
sel1 <- which(sml1 != "X")
sml1 <- sml1[sel1]
gset1 <- gset1[ ,sel1]

# eliminate samples marked as "X"
sel2 <- which(sml2 != "X")
sml2 <- sml2[sel2]
gset2 <- gset2[ ,sel2]

# log2 transform
ex1 <- exprs(gset1)
qx1 <- as.numeric(quantile(ex1, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T))
LogC <- (qx1[5] > 100) ||
  (qx1[6]-qx1[1] > 50 && qx1[2] > 0) ||
  (qx1[2] > 0 && qx1[2] < 1 && qx1[4] > 1 && qx1[4] < 2)
if (LogC) { ex1[which(ex1 <= 0)] <- NaN
exprs(gset1) <- log2(ex1) }

# log2 transform
ex2 <- exprs(gset2)
qx2 <- as.numeric(quantile(ex2, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T))
LogC <- (qx2[5] > 100) ||
  (qx2[6]-qx2[1] > 50 && qx2[2] > 0) ||
  (qx2[2] > 0 && qx2[2] < 1 && qx2[4] > 1 && qx2[4] < 2)
if (LogC) { ex2[which(ex2 <= 0)] <- NaN
exprs(gset2) <- log2(ex2) }

gset<- c(exprs(gset1),exprs(gset2))

# set up the data and proceed with analysis
sml1b <- paste("G", sml1, sep="")    # set group names
fl <- as.factor(sml1b)
sml2b<- paste("G", sml2, sep="")
f2<- as.factor(sml2b)
f3<-as.factor(c(sml1b,sml2b))
gset$description <- f3


design <- model.matrix(~ description + 0, gset)
colnames(design) <- levels(f3)
fit <- lmFit(gset, design)*
Error in getEAWP(object) : data object isn't of a recognized data class

 

 

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] limma_3.30.12              GEOquery_2.40.0            DESeq2_1.14.1              SummarizedExperiment_1.4.0
 [5] Biobase_2.34.0             GenomicRanges_1.26.3       GenomeInfoDb_1.10.3        IRanges_2.8.1             
 [9] S4Vectors_0.12.1           BiocGenerics_0.20.0        BiocInstaller_1.24.0      

loaded via a namespace (and not attached):
 [1] genefilter_1.56.0    locfit_1.5-9.1       splines_3.3.3        lattice_0.20-34      colorspace_1.3-2    
 [6] htmltools_0.3.5      base64enc_0.1-3      survival_2.40-1      XML_3.98-1.5         foreign_0.8-67      
[11] DBI_0.6              BiocParallel_1.8.1   RColorBrewer_1.1-2   plyr_1.8.4           stringr_1.2.0       
[16] zlibbioc_1.20.0      munsell_0.4.3        gtable_0.2.0         htmlwidgets_0.8      memoise_1.0.0       
[21] latticeExtra_0.6-28  knitr_1.15.1         geneplotter_1.52.0   AnnotationDbi_1.36.2 htmlTable_1.9       
[26] Rcpp_0.12.9          acepack_1.4.1        xtable_1.8-2         scales_0.4.1         backports_1.0.5     
[31] checkmate_1.8.2      Hmisc_4.0-2          annotate_1.52.1      XVector_0.14.0       gridExtra_2.2.1     
[36] ggplot2_2.2.1        digest_0.6.12        stringi_1.1.2        grid_3.3.3           tools_3.3.3         
[41] bitops_1.0-6         magrittr_1.5         lazyeval_0.2.0       RCurl_1.95-4.8       tibble_1.2          
[46] RSQLite_1.1-2        Formula_1.2-1        cluster_2.0.6        Matrix_1.2-8         rsconnect_0.7       
[51] data.table_1.10.4    httr_1.2.1           assertthat_0.1       R6_2.2.0             rpart_4.1-10        
[56] nnet_7.3-12         
ADD COMMENTlink modified 8 months ago by James W. MacDonald45k • written 8 months ago by carine.genet0
1
gravatar for James W. MacDonald
8 months ago by
United States
James W. MacDonald45k wrote:

You are doing several things wrong, most of which involve fighting the existing infrastructure that is intended to make your life easier.

The data you download are in ExpressionSets, which are fancy ways to encapsulate data in such a way that you can pretend as if they are data.frames that you can subset and combine without having to go through all the gyrations you are performing. In addition, you are taking logs of data that has already been log converted. You could do this analysis in very few steps:

> first <- getGEO("GSE39589")[[1]]
> second <- getGEO("GSE49505")[[1]]

## combine ExpressionSets, but justthe small columnar and rounded samples
 > eset <- combine(first[,1:10], second[,6:15])
## get rid of extra featureSet columns that we don't need
> fData(eset) <- fData(eset)[,c(1,18,8)]
## design matrix accounting for two batches of samples
> type <- factor(rep(c("Rounded","Columnar"), each = 5, times = 2))
> batch <- factor(rep(1:2, each = 10))
> design <- model.matrix(~type+batch)
## filter out low expressing and AFFX probes
> eset <- eset[rowMeans(exprs(eset), na.rm = TRUE) > 5,]
> eset <- eset[-grep("AFFX", featureNames(eset)),]
## fit model
> fit <- lmFit(eset, design)
> fit2 <- eBayes(fit)
> topTable(fit2, 2)
                                 ID ENTREZ_GENE_ID Gene.Symbol logFC AveExpr
Bt.8824.1.S1_at     Bt.8824.1.S1_at                             0.44   5.850
Bt.19645.1.A1_at   Bt.19645.1.A1_at                             0.86   5.300
Bt.27718.1.S1_at   Bt.27718.1.S1_at                             0.50   6.840
Bt.1993.3.A1_a_at Bt.1993.3.A1_a_at                             0.37   6.045
Bt.18252.1.A1_at   Bt.18252.1.A1_at                             0.33   5.325
Bt.17384.2.S1_at   Bt.17384.2.S1_at                             0.45   5.065
Bt.17150.1.A1_at   Bt.17150.1.A1_at                             0.46   5.480
Bt.19909.1.A1_at   Bt.19909.1.A1_at                             0.51   6.615
Bt.21987.1.S1_at   Bt.21987.1.S1_at                             0.38   6.640
Bt.11954.1.S1_at   Bt.11954.1.S1_at                             0.41   6.685
                         t     P.Value adj.P.Val         B
Bt.8824.1.S1_at   3.707264 0.001301586         1 -2.783709
Bt.19645.1.A1_at  3.232526 0.003982368         1 -3.148296
Bt.27718.1.S1_at  3.154172 0.004776595         1 -3.209269
Bt.1993.3.A1_a_at 3.121159 0.005155423         1 -3.234989
Bt.18252.1.A1_at  2.891316 0.008721984         1 -3.414158
Bt.17384.2.S1_at  2.835272 0.009898647         1 -3.457761
Bt.17150.1.A1_at  2.747234 0.012057845         1 -3.526084
Bt.19909.1.A1_at  2.645507 0.015108938         1 -3.604650
Bt.21987.1.S1_at  2.612535 0.016245273         1 -3.630002
Bt.11954.1.S1_at  2.589115 0.017100774         1 -3.647970

If you do a PCA plot of these data, you will see that there is pretty good overlap between the columnar and rounded samples, so I wouldn't normally expect to see much anyway.

ADD COMMENTlink written 8 months ago by James W. MacDonald45k

Thank a lot, James for your quick answer.

 

ADD REPLYlink written 8 months ago by carine.genet0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 285 users visited in the last hour