Strange problem with BeadArrayUseCases vignette
0
0
Entering edit mode
smt8n • 0
@smt8n-9982
Last seen 8.0 years ago

Hi all,

I encountered a strange problem running the "Analysis of public data using GEO" part of the code from the BeadArrayUseCases vignette:

    library(GEOquery)
    library(limma)
    library(illuminaHumanv1.db)
    
    url <- "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE5350/"
    filenm <- "GSE5350-GPL2507_series_matrix.txt.gz"
    download.file(paste(url, filenm, sep=""), destfile=filenm)
    gse <- getGEO(filename=filenm)
    
    dim(gse)
    exprs(gse)[1:5,1:2]
    
    samples <- as.character(pData(gse)[,"title"])
    sites <- as.numeric(substr(samples,10,10))
    shortlabels <- substr(samples,12,13)
    
    rnasource <- pData(gse)[,"source_name_ch1"]
    levels(rnasource) <- c("UHRR", "Brain", "UHRR75", "UHRR25")
    
    boxplot(log2(exprs(gse)), col=sites+1, names=shortlabels, las=2, cex.names=0.5, ylab=expression(log[2](intensity)), outline=FALSE, ylim=c(3,10), main="Before batch correction")


###################################################
### code chunk number 50: deAnalysisFromGEO (eval = FALSE)
###################################################
    gse.batchcorrect <- removeBatchEffect(log2(exprs(gse)), batch=sites)

    par(mfrow=c(1,2), oma=c(1,0.5,0.2,0.1))
    boxplot(gse.batchcorrect, col=sites+1, names=shortlabels, las=2, cex.names=0.5, ylab=expression(log[2](intensity)), outline=FALSE, ylim=c(3,10),  main="After batch correction")
    plotMDS(gse.batchcorrect, labels=shortlabels, col=sites+1, main="MDS plot")
    
    ids3 <- featureNames(gse)
    
    qual2 <-  unlist(mget(ids3, illuminaHumanv1PROBEQUALITY, ifnotfound=NA))
    
    table(qual2)
    
    rem2 <- qual2 == "No match" | qual2 == "Bad" | is.na(qual2)
    
    gse.batchcorrect.filt <- gse.batchcorrect[!rem2,]

The last line from above gives me an error:

> gse.batchcorrect.filt <- gse.batchcorrect[!rem2,]
Error in gse.batchcorrect[!rem2, ] : 
  (subscript) logical subscript too long

When I check dimensions:

> length(rem2)
[1] 47300

> nrow(gse.batchcorrect)
[1] 47293

 

I see that there is indeed a problem. An interesting thing is that before "unlisting" I get correct dimensions:

> length(mget(ids3, illuminaHumanv1PROBEQUALITY, ifnotfound=NA))
[1] 47293

Upon further checks, it seemed, there might be a duplication of entries: some entries had 2 "perfect" or "perfect" and "good".

Could anybody, please, advise me on what the problem might be and how to fix it. Thank you in advance.

 

This is the sessionInfo() output:

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
Running under: OS X 10.8.5 (Mountain Lion)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] illuminaHumanv1.db_1.26.0 org.Hs.eg.db_3.2.3        RSQLite_1.0.0             DBI_0.3.1                
 [5] AnnotationDbi_1.32.3      IRanges_2.4.8             S4Vectors_0.8.11          limma_3.26.9             
 [9] GEOquery_2.36.0           Biobase_2.30.0            BiocGenerics_0.16.1      

loaded via a namespace (and not attached):
[1] RCurl_1.95-4.8 bitops_1.0-6   XML_3.98-1.4  

 

BeadArrayUseCases • 912 views
ADD COMMENT

Login before adding your answer.

Traffic: 855 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6