Entering edit mode
smt8n
•
0
@smt8n-9982
Last seen 7.9 years ago
Hi all,
I encountered a strange problem running the "Analysis of public data using GEO" part of the code from the BeadArrayUseCases vignette:
library(GEOquery) library(limma) library(illuminaHumanv1.db) url <- "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE5350/" filenm <- "GSE5350-GPL2507_series_matrix.txt.gz" download.file(paste(url, filenm, sep=""), destfile=filenm) gse <- getGEO(filename=filenm) dim(gse) exprs(gse)[1:5,1:2] samples <- as.character(pData(gse)[,"title"]) sites <- as.numeric(substr(samples,10,10)) shortlabels <- substr(samples,12,13) rnasource <- pData(gse)[,"source_name_ch1"] levels(rnasource) <- c("UHRR", "Brain", "UHRR75", "UHRR25") boxplot(log2(exprs(gse)), col=sites+1, names=shortlabels, las=2, cex.names=0.5, ylab=expression(log[2](intensity)), outline=FALSE, ylim=c(3,10), main="Before batch correction") ################################################### ### code chunk number 50: deAnalysisFromGEO (eval = FALSE) ################################################### gse.batchcorrect <- removeBatchEffect(log2(exprs(gse)), batch=sites) par(mfrow=c(1,2), oma=c(1,0.5,0.2,0.1)) boxplot(gse.batchcorrect, col=sites+1, names=shortlabels, las=2, cex.names=0.5, ylab=expression(log[2](intensity)), outline=FALSE, ylim=c(3,10), main="After batch correction") plotMDS(gse.batchcorrect, labels=shortlabels, col=sites+1, main="MDS plot") ids3 <- featureNames(gse) qual2 <- unlist(mget(ids3, illuminaHumanv1PROBEQUALITY, ifnotfound=NA)) table(qual2) rem2 <- qual2 == "No match" | qual2 == "Bad" | is.na(qual2) gse.batchcorrect.filt <- gse.batchcorrect[!rem2,]
The last line from above gives me an error:
> gse.batchcorrect.filt <- gse.batchcorrect[!rem2,] Error in gse.batchcorrect[!rem2, ] : (subscript) logical subscript too long
When I check dimensions:
> length(rem2) [1] 47300 > nrow(gse.batchcorrect) [1] 47293
I see that there is indeed a problem. An interesting thing is that before "unlisting" I get correct dimensions:
> length(mget(ids3, illuminaHumanv1PROBEQUALITY, ifnotfound=NA)) [1] 47293
Upon further checks, it seemed, there might be a duplication of entries: some entries had 2 "perfect" or "perfect" and "good".
Could anybody, please, advise me on what the problem might be and how to fix it. Thank you in advance.
This is the sessionInfo() output:
> sessionInfo() R version 3.2.1 (2015-06-18) Platform: x86_64-apple-darwin10.8.0 (64-bit) Running under: OS X 10.8.5 (Mountain Lion) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base other attached packages: [1] illuminaHumanv1.db_1.26.0 org.Hs.eg.db_3.2.3 RSQLite_1.0.0 DBI_0.3.1 [5] AnnotationDbi_1.32.3 IRanges_2.4.8 S4Vectors_0.8.11 limma_3.26.9 [9] GEOquery_2.36.0 Biobase_2.30.0 BiocGenerics_0.16.1 loaded via a namespace (and not attached): [1] RCurl_1.95-4.8 bitops_1.0-6 XML_3.98-1.4