Question: Strange problem with BeadArrayUseCases vignette
0
gravatar for smt8n
3.6 years ago by
smt8n0
smt8n0 wrote:

Hi all,

I encountered a strange problem running the "Analysis of public data using GEO" part of the code from the BeadArrayUseCases vignette:

    library(GEOquery)
    library(limma)
    library(illuminaHumanv1.db)
    
    url <- "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE5350/"
    filenm <- "GSE5350-GPL2507_series_matrix.txt.gz"
    download.file(paste(url, filenm, sep=""), destfile=filenm)
    gse <- getGEO(filename=filenm)
    
    dim(gse)
    exprs(gse)[1:5,1:2]
    
    samples <- as.character(pData(gse)[,"title"])
    sites <- as.numeric(substr(samples,10,10))
    shortlabels <- substr(samples,12,13)
    
    rnasource <- pData(gse)[,"source_name_ch1"]
    levels(rnasource) <- c("UHRR", "Brain", "UHRR75", "UHRR25")
    
    boxplot(log2(exprs(gse)), col=sites+1, names=shortlabels, las=2, cex.names=0.5, ylab=expression(log[2](intensity)), outline=FALSE, ylim=c(3,10), main="Before batch correction")


###################################################
### code chunk number 50: deAnalysisFromGEO (eval = FALSE)
###################################################
    gse.batchcorrect <- removeBatchEffect(log2(exprs(gse)), batch=sites)

    par(mfrow=c(1,2), oma=c(1,0.5,0.2,0.1))
    boxplot(gse.batchcorrect, col=sites+1, names=shortlabels, las=2, cex.names=0.5, ylab=expression(log[2](intensity)), outline=FALSE, ylim=c(3,10),  main="After batch correction")
    plotMDS(gse.batchcorrect, labels=shortlabels, col=sites+1, main="MDS plot")
    
    ids3 <- featureNames(gse)
    
    qual2 <-  unlist(mget(ids3, illuminaHumanv1PROBEQUALITY, ifnotfound=NA))
    
    table(qual2)
    
    rem2 <- qual2 == "No match" | qual2 == "Bad" | is.na(qual2)
    
    gse.batchcorrect.filt <- gse.batchcorrect[!rem2,]

The last line from above gives me an error:

> gse.batchcorrect.filt <- gse.batchcorrect[!rem2,]
Error in gse.batchcorrect[!rem2, ] : 
  (subscript) logical subscript too long

When I check dimensions:

> length(rem2)
[1] 47300

> nrow(gse.batchcorrect)
[1] 47293

 

I see that there is indeed a problem. An interesting thing is that before "unlisting" I get correct dimensions:

> length(mget(ids3, illuminaHumanv1PROBEQUALITY, ifnotfound=NA))
[1] 47293

Upon further checks, it seemed, there might be a duplication of entries: some entries had 2 "perfect" or "perfect" and "good".

Could anybody, please, advise me on what the problem might be and how to fix it. Thank you in advance.

 

This is the sessionInfo() output:

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
Running under: OS X 10.8.5 (Mountain Lion)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] illuminaHumanv1.db_1.26.0 org.Hs.eg.db_3.2.3        RSQLite_1.0.0             DBI_0.3.1                
 [5] AnnotationDbi_1.32.3      IRanges_2.4.8             S4Vectors_0.8.11          limma_3.26.9             
 [9] GEOquery_2.36.0           Biobase_2.30.0            BiocGenerics_0.16.1      

loaded via a namespace (and not attached):
[1] RCurl_1.95-4.8 bitops_1.0-6   XML_3.98-1.4  

 

beadarrayusecases • 424 views
ADD COMMENTlink written 3.6 years ago by smt8n0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 305 users visited in the last hour