Error in readGenericHeader when using read.maimages
@ireneroman-22635

Dear Bioconductor readers,

My name is Irene and I am trying to analyze a dataset from GEO obtained with an Agilent platform. I recently analyzed other Agilent data from GEO successfully by following the limma package documentation, but this time I cannot read the txt files. It seems that the columns read.maimages is looking for are either missing from the txt files or have different names. If I knew the column names, I could supply them through the columns argument of read.maimages, but I do not know them.

I would be extremely grateful if you could give me some advice.

Thank you in advance, Irene.

Here is my code:

### Fetching the data
workingDir<-"C:/Users/iroman/Documents/Master_Omics/Project"
setwd(workingDir)
GEO48872<-getGEOSuppFiles("GSE48872",makeDirectory=TRUE, fetch_files = TRUE)

setwd(paste(workingDir,"GSE48872",sep="/"))
untar("GSE48872_RAW.tar", exdir = getwd())

### Targets file
SampleNumber<-c(1,2,3,4,5,6,7)
FileName<-c("GSM1186204_raw_data_ActivatedaOPCs_1.txt","GSM1186205_raw_data_ActivatedaOPCs_2.txt",
            "GSM1186206_raw_data_ActivatedaOPCs_3.txt","GSM1186207_raw_data_ActivatedaOPCs_4.txt",
            "GSM1186208_raw_data_NonactivatedaOPCs_1.txt","GSM1186209_raw_data_NonactivatedaOPCs_2.txt",
            "GSM1186210_raw_data_NonactivatedaOPCs_3.txt")
Condition<-c("Cupri","Cupri","Cupri","Cupri","Ctr","Ctr","Ctr")
designO<-as.data.frame(cbind(SampleNumber,FileName,Condition))
write.table(designO,file="targetsO.txt",sep="\t")
targetsO = readTargets("targetsO.txt")

### Reading the files
rawO = read.maimages(targetsO, source="agilent",green.only=FALSE,ext = "gz",other.columns="gIsWellAboveBG")
#Error in readGenericHeader(fullname, columns = columns, sep = sep) : 
#  Specified column headings not found in file

> traceback()
3: file(file, "r")
2: readGenericHeader(fullname, columns = columns, sep = sep)
1: read.maimages(targetsO, source = "agilent", green.only = FALSE, 
       ext = "gz", other.columns = "gIsWellAboveBG")

Here is the sessionInfo:
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252    LC_MONETARY=Spanish_Spain.1252
[4] LC_NUMERIC=C                   LC_TIME=Spanish_Spain.1252    

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] agilp_3.14.0                         mgug4122a.db_3.2.3                   topGO_2.34.0 
 [4] SparseM_1.77                         graph_1.60.0                         dplyr_0.8.0.1
 [7] sva_3.30.1                           mgcv_1.8-27                          nlme_3.1-137
[10] casper_2.16.1                        a4Base_1.30.0                        a4Core_1.30.0 
[13] a4Preproc_1.30.0                     glmnet_2.0-16                        foreach_1.4.4
[16] Matrix_1.2-15                        multtest_2.38.0                      genefilter_1.64.0
[19] mpm_1.0-22                           KernSmooth_2.23-15                   MASS_7.3-51.1
[22] annaffy_1.54.0                       KEGG.db_3.2.3                        GO.db_3.7.0
[25] ReactomePA_1.26.0                    tidyr_0.8.3                          oligo_1.46.0
[28] Biostrings_2.50.2                    XVector_0.22.0                       oligoClasses_1.44.0
[31] mogene10sttranscriptcluster.db_8.7.0 org.Mm.eg.db_3.7.0                   annotate_1.60.0
[34] XML_3.98-1.19                        AnnotationDbi_1.44.0                 GEOquery_2.50.5
[37] limma_3.38.3                         gplots_3.0.1.1                       scatterplot3d_0.3-41
[40] affyQCReport_1.60.0                  lattice_0.20-38                      affyPLM_1.58.0
[43] preprocessCore_1.44.0                gcrma_2.54.0                         affy_1.60.0
[46] SummarizedExperiment_1.12.0          DelayedArray_0.8.0                   BiocParallel_1.16.6
[49] matrixStats_0.54.0                   Biobase_2.42.0                       GenomicRanges_1.34.0
[52] GenomeInfoDb_1.18.2                  IRanges_2.16.0                       S4Vectors_0.20.1
[55] BiocGenerics_0.28.0                 

loaded via a namespace (and not attached):
  [1] proto_1.0.0              tidyselect_0.2.5         RSQLite_2.1.1            munsell_0.5.0           
  [5] codetools_0.2-16         chron_2.3-53             statmod_1.4.30           colorspace_1.4-1        
  [9] GOSemSim_2.8.0           knitr_1.22               rstudioapi_0.10          DOSE_3.8.2              
 [13] simpleaffy_2.58.0        urltools_1.7.3           GenomeInfoDbData_1.2.0   polyclip_1.10-0         
 [17] bit64_0.9-7              farver_2.0.1             coda_0.19-3              xfun_0.6                
 [21] affxparser_1.54.0        R6_2.4.0                 graphlayouts_0.5.0       VGAM_1.1-2              
 [25] bitops_1.0-6             fgsea_1.8.0              gridGraphics_0.4-1       assertthat_0.2.1        
 [29] scales_1.0.0             ggraph_2.0.0             enrichplot_1.2.0         gtable_0.3.0            
 [33] tidygraph_1.1.2          rlang_0.3.4              splines_3.5.3            rtracklayer_1.42.2      
 [37] lazyeval_0.2.2           europepmc_0.3            checkmate_1.9.1          BiocManager_1.30.4      
 [41] yaml_2.2.0               reshape2_1.4.3           GenomicFeatures_1.34.3   backports_1.1.3         
 [45] qvalue_2.14.1            tools_3.5.3              ggplotify_0.0.4          ggplot2_3.1.1           
 [49] affyio_1.52.0            ff_2.2-14                RColorBrewer_1.1-2       ggridges_0.5.1          
 [53] gsubfn_0.7               Rcpp_1.0.1               plyr_1.8.4               progress_1.2.0          
 [57] zlibbioc_1.28.0          purrr_0.3.2              RCurl_1.95-4.12          prettyunits_1.0.2       
 [61] sqldf_0.4-11             viridis_0.5.1            cowplot_0.9.4            ggrepel_0.8.1           
 [65] cluster_2.0.7-1          magrittr_1.5             data.table_1.12.2        DO.db_2.9               
 [69] triebeard_0.3.0          reactome.db_1.66.0       hms_0.4.2                xtable_1.8-3            
 [73] gaga_2.28.1              gridExtra_2.3            compiler_3.5.3           biomaRt_2.38.0          
 [77] tibble_2.1.1             crayon_1.3.4             DBI_1.0.0                tweenr_1.0.1            
 [81] rappdirs_0.3.1           readr_1.3.1              gdata_2.18.0             igraph_1.2.4.1          
 [85] pkgconfig_2.0.2          rvcheck_0.1.7            GenomicAlignments_1.18.1 xml2_1.2.0              
 [89] EBarrays_2.46.0          stringr_1.4.0            digest_0.6.18            fastmatch_1.1-0         
 [93] curl_3.3                 Rsamtools_1.34.1         gtools_3.8.1             graphite_1.28.2         
 [97] jsonlite_1.6             viridisLite_0.3.0        pillar_1.3.1             httr_1.4.0              
[101] survival_2.43-3          glue_1.3.1               UpSetR_1.4.0             iterators_1.0.10        
[105] bit_1.1-14               ggforce_0.3.1            stringi_1.4.3            blob_1.1.1              
[109] caTools_1.17.1.2         memoise_1.1.0 
@gordon-smyth
WEHI, Melbourne, Australia

GenePix not Agilent

You need to specify GenePix instead of Agilent input:

rawO <- read.maimages(targetsO, source="genepix",
                      green.only=TRUE, ext = "gz")

The only way to know what column names are in the GEO files is to open one of the files in a text editor and look. When I did that, I saw that the files were actually created with GenePix rather than with Agilent software.
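The same inspection can be done from within R instead of a text editor. The following is a minimal sketch using base R only; peek_header is a hypothetical helper name (not part of limma), and the file name is assumed to be one of the downloaded files sitting in the working directory.

```r
# Hypothetical helper: return the first n lines of a text file.
# gzfile() reads gzip-compressed files and also plain uncompressed text.
peek_header <- function(path, n = 40) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  readLines(con, n = n, warn = FALSE)
}

# Assumed file name, taken from the question; skipped if the file is absent.
f <- "GSM1186204_raw_data_ActivatedaOPCs_1.txt.gz"
if (file.exists(f)) {
  print(peek_header(f))
}
```

The printed lines should include the file's header block and the row of column headings, which indicate which image analysis software wrote the file.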

Note that the source argument of read.maimages refers to the name of the image analysis software used to quantify the probe intensities, not the name of the microarray manufacturer.

One color not two color

The GEO entry for these microarrays shows that they were hybridized with one color (Cy3=green) only. So you need to specify green.only=TRUE.

The following code worked fine for me:

> files <- dir(pattern="txt.gz")
> files
[1] "GSM1186204_raw_data_ActivatedaOPCs_1.txt.gz"   
[2] "GSM1186205_raw_data_ActivatedaOPCs_2.txt.gz"   
[3] "GSM1186206_raw_data_ActivatedaOPCs_3.txt.gz"   
[4] "GSM1186207_raw_data_ActivatedaOPCs_4.txt.gz"   
[5] "GSM1186208_raw_data_NonactivatedaOPCs_1.txt.gz"
[6] "GSM1186209_raw_data_NonactivatedaOPCs_2.txt.gz"
[7] "GSM1186210_raw_data_NonactivatedaOPCs_3.txt.gz"
> AnnCols <- c("ID","AutoFlag","Normalisation","Name","RefNumber",
+              "ControlType","GeneName","TopHit","Description")
> raw0 <- read.maimages(files,source="genepix",green.only=TRUE,annotation=AnnCols)
Read GSM1186204_raw_data_ActivatedaOPCs_1.txt.gz 
Read GSM1186205_raw_data_ActivatedaOPCs_2.txt.gz 
Read GSM1186206_raw_data_ActivatedaOPCs_3.txt.gz 
Read GSM1186207_raw_data_ActivatedaOPCs_4.txt.gz 
Read GSM1186208_raw_data_NonactivatedaOPCs_1.txt.gz 
Read GSM1186209_raw_data_NonactivatedaOPCs_2.txt.gz 
Read GSM1186210_raw_data_NonactivatedaOPCs_3.txt.gz 
> raw <- raw0[raw0$genes$ControlType=="false", ]
> yb <- backgroundCorrect(raw,method="normexp")
Array 1 corrected
Array 2 corrected
Array 3 corrected
Array 4 corrected
Array 5 corrected
Array 6 corrected
Array 7 corrected
> y <- normalizeBetweenArrays(yb,method="quantile")
> design <- cbind(Intercept=1,Activated=c(1,1,1,1,0,0,0))
> fit <- lmFit(y,design)
> fit <- eBayes(fit,robust=TRUE,trend=TRUE)
> topTable(fit)
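
The design matrix in the code above is typed in by hand. As an alternative sketch, the same matrix can be derived from the condition labels given in the question's targets file by using model.matrix from base R. The object name design2 is an assumption introduced here for illustration, and the labels are assumed to be in the same order as the files.

```r
# Condition labels in file order, copied from the question's targets file.
# "Ctr" is set as the reference level so the second column marks activated samples.
Condition <- factor(c("Cupri", "Cupri", "Cupri", "Cupri", "Ctr", "Ctr", "Ctr"),
                    levels = c("Ctr", "Cupri"))

# Intercept plus an indicator column for the non-reference level.
design2 <- model.matrix(~ Condition)
colnames(design2) <- c("Intercept", "Activated")
design2
```

Deriving the matrix from the labels avoids mismatches if the file order or the number of samples changes.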

Dear Gordon,

Thank you so much for your quick response.

I tried using "genepix" as source but it did not work. Do you know what could be the problem?

Thank you so much, sincerely, Irene.

> rawO <- read.maimages(targetsO, source="genepix",ext = "gz")
Error in `[.data.frame`(obj, , columns[[a]]) : undefined columns selected

> traceback()
4: stop("undefined columns selected")
3: `[.data.frame`(obj, , columns[[a]])
2: obj[, columns[[a]]]
1: read.maimages(targetsO, source = "genepix", ext = "gz")

That's because these are one-color microarrays so you need to specify green.only=TRUE. I have edited my answer above to reflect this.
