Error in readGenericHeader when using read.maimages

irene.roman

Dear Bioconductor readers,

My name is Irene and I am trying to analyze a dataset from GEO obtained with an Agilent platform. I recently analyzed Agilent data from GEO successfully using the information in the limma package, but this time I cannot read the txt files. It seems that the columns read.maimages looks for are either missing from the txt files or have different names. If I knew the names of these columns I could supply them through the columns argument of read.maimages, but I do not know them.

I would be extremely grateful if you could give me some advice.

Thank you in advance, Irene.

Here is my code:

### Fetching the data
workingDir<-"C:/Users/iroman/Documents/Master_Omics/Project"
setwd(workingDir)
GEO48872<-getGEOSuppFiles("GSE48872",makeDirectory=TRUE, fetch_files = TRUE)

setwd(paste(workingDir,"GSE48872",sep="/"))
untar("GSE48872_RAW.tar", exdir = getwd())

### Targets file
SampleNumber<-c(1,2,3,4,5,6,7)
FileName<-c("GSM1186204_raw_data_ActivatedaOPCs_1.txt","GSM1186205_raw_data_ActivatedaOPCs_2.txt",
            "GSM1186206_raw_data_ActivatedaOPCs_3.txt","GSM1186207_raw_data_ActivatedaOPCs_4.txt",
            "GSM1186208_raw_data_NonactivatedaOPCs_1.txt","GSM1186209_raw_data_NonactivatedaOPCs_2.txt",
            "GSM1186210_raw_data_NonactivatedaOPCs_3.txt")
Condition<-c("Cupri","Cupri","Cupri","Cupri","Ctr","Ctr","Ctr")
designO<-as.data.frame(cbind(SampleNumber,FileName,Condition))
write.table(designO,file="targetsO.txt",sep="\t")
targetsO = readTargets("targetsO.txt")

### Reading the files
rawO = read.maimages(targetsO, source="agilent",green.only=FALSE,ext = "gz",other.columns="gIsWellAboveBG")
#Error in readGenericHeader(fullname, columns = columns, sep = sep) : 
#  Specified column headings not found in file

> traceback()
3: file(file, "r")
2: readGenericHeader(fullname, columns = columns, sep = sep)
1: read.maimages(targetsO, source = "agilent", green.only = FALSE, 
       ext = "gz", other.columns = "gIsWellAboveBG")

Here is the sessionInfo:
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252    LC_MONETARY=Spanish_Spain.1252
[4] LC_NUMERIC=C                   LC_TIME=Spanish_Spain.1252    

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] agilp_3.14.0                         mgug4122a.db_3.2.3                   topGO_2.34.0 
 [4] SparseM_1.77                         graph_1.60.0                         dplyr_0.8.0.1
 [7] sva_3.30.1                           mgcv_1.8-27                          nlme_3.1-137
[10] casper_2.16.1                        a4Base_1.30.0                        a4Core_1.30.0 
[13] a4Preproc_1.30.0                     glmnet_2.0-16                        foreach_1.4.4
[16] Matrix_1.2-15                        multtest_2.38.0                      genefilter_1.64.0
[19] mpm_1.0-22                           KernSmooth_2.23-15                   MASS_7.3-51.1
[22] annaffy_1.54.0                       KEGG.db_3.2.3                        GO.db_3.7.0
[25] ReactomePA_1.26.0                    tidyr_0.8.3                          oligo_1.46.0
[28] Biostrings_2.50.2                    XVector_0.22.0                       oligoClasses_1.44.0
[31] mogene10sttranscriptcluster.db_8.7.0 org.Mm.eg.db_3.7.0                   annotate_1.60.0
[34] XML_3.98-1.19                        AnnotationDbi_1.44.0                 GEOquery_2.50.5
[37] limma_3.38.3                         gplots_3.0.1.1                       scatterplot3d_0.3-41
[40] affyQCReport_1.60.0                  lattice_0.20-38                      affyPLM_1.58.0
[43] preprocessCore_1.44.0                gcrma_2.54.0                         affy_1.60.0
[46] SummarizedExperiment_1.12.0          DelayedArray_0.8.0                   BiocParallel_1.16.6
[49] matrixStats_0.54.0                   Biobase_2.42.0                       GenomicRanges_1.34.0
[52] GenomeInfoDb_1.18.2                  IRanges_2.16.0                       S4Vectors_0.20.1
[55] BiocGenerics_0.28.0                 

loaded via a namespace (and not attached):
  [1] proto_1.0.0              tidyselect_0.2.5         RSQLite_2.1.1            munsell_0.5.0           
  [5] codetools_0.2-16         chron_2.3-53             statmod_1.4.30           colorspace_1.4-1        
  [9] GOSemSim_2.8.0           knitr_1.22               rstudioapi_0.10          DOSE_3.8.2              
 [13] simpleaffy_2.58.0        urltools_1.7.3           GenomeInfoDbData_1.2.0   polyclip_1.10-0         
 [17] bit64_0.9-7              farver_2.0.1             coda_0.19-3              xfun_0.6                
 [21] affxparser_1.54.0        R6_2.4.0                 graphlayouts_0.5.0       VGAM_1.1-2              
 [25] bitops_1.0-6             fgsea_1.8.0              gridGraphics_0.4-1       assertthat_0.2.1        
 [29] scales_1.0.0             ggraph_2.0.0             enrichplot_1.2.0         gtable_0.3.0            
 [33] tidygraph_1.1.2          rlang_0.3.4              splines_3.5.3            rtracklayer_1.42.2      
 [37] lazyeval_0.2.2           europepmc_0.3            checkmate_1.9.1          BiocManager_1.30.4      
 [41] yaml_2.2.0               reshape2_1.4.3           GenomicFeatures_1.34.3   backports_1.1.3         
 [45] qvalue_2.14.1            tools_3.5.3              ggplotify_0.0.4          ggplot2_3.1.1           
 [49] affyio_1.52.0            ff_2.2-14                RColorBrewer_1.1-2       ggridges_0.5.1          
 [53] gsubfn_0.7               Rcpp_1.0.1               plyr_1.8.4               progress_1.2.0          
 [57] zlibbioc_1.28.0          purrr_0.3.2              RCurl_1.95-4.12          prettyunits_1.0.2       
 [61] sqldf_0.4-11             viridis_0.5.1            cowplot_0.9.4            ggrepel_0.8.1           
 [65] cluster_2.0.7-1          magrittr_1.5             data.table_1.12.2        DO.db_2.9               
 [69] triebeard_0.3.0          reactome.db_1.66.0       hms_0.4.2                xtable_1.8-3            
 [73] gaga_2.28.1              gridExtra_2.3            compiler_3.5.3           biomaRt_2.38.0          
 [77] tibble_2.1.1             crayon_1.3.4             DBI_1.0.0                tweenr_1.0.1            
 [81] rappdirs_0.3.1           readr_1.3.1              gdata_2.18.0             igraph_1.2.4.1          
 [85] pkgconfig_2.0.2          rvcheck_0.1.7            GenomicAlignments_1.18.1 xml2_1.2.0              
 [89] EBarrays_2.46.0          stringr_1.4.0            digest_0.6.18            fastmatch_1.1-0         
 [93] curl_3.3                 Rsamtools_1.34.1         gtools_3.8.1             graphite_1.28.2         
 [97] jsonlite_1.6             viridisLite_0.3.0        pillar_1.3.1             httr_1.4.0              
[101] survival_2.43-3          glue_1.3.1               UpSetR_1.4.0             iterators_1.0.10        
[105] bit_1.1-14               ggforce_0.3.1            stringi_1.4.3            blob_1.1.1              
[109] caTools_1.17.1.2         memoise_1.1.0


Gordon Smyth · Answer 1 · 2020-01-01

GenePix not Agilent

You need to specify GenePix instead of Agilent input:

rawO <- read.maimages(targetsO, source="genepix",
                      green.only=TRUE, ext = "gz")

The only way to know what column names are in the GEO files is to open one of the files in a text editor and look. When I did that, I saw that the files were actually created with GenePix rather than with Agilent software.
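
If you would rather do that check from within R than in a text editor, something along these lines should work. This is only a sketch: `peek_header` and `find_heading_line` are hypothetical helper names, not limma functions, and the small demo file is invented so that the example runs on its own. The column names in it are typical of GenePix and Agilent output, but you should check them against what your own files actually contain.

```r
# Return the first n lines of a text file, gzipped or not.
peek_header <- function(path, n = 40) {
  con <- gzfile(path, open = "rt")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  readLines(con, n = n)
}

# Index of the first line that contains every requested column name,
# or NA if no such line exists. A rough stand-in for the kind of search
# that fails with "Specified column headings not found in file";
# it is not limma's actual code.
find_heading_line <- function(lines, columns) {
  hit <- vapply(lines, function(x) {
    all(vapply(columns, grepl, logical(1), x = x, fixed = TRUE))
  }, logical(1))
  unname(which(hit))[1]
}

# Invented demo file (NOT the real GEO content), written gzipped.
demo_file <- tempfile(fileext = ".txt.gz")
con <- gzfile(demo_file, open = "wt")
writeLines(c("ATF\t1.0",
             "Type=Made-up example",
             "ID\tName\tF532 Mean\tB532 Median",
             "p1\tgeneA\t1200\t85"), con)
close(con)

header <- peek_header(demo_file, n = 10)
genepix_line <- find_heading_line(header, c("F532 Mean", "B532 Median"))
agilent_line <- find_heading_line(header, c("gMeanSignal", "gBGMedianSignal"))
```

On your data you would call `peek_header("GSM1186204_raw_data_ActivatedaOPCs_1.txt.gz")` and simply read the printed lines to see which column headings are present.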

Note that the source argument of read.maimages refers to the name of the image analysis software used to quantify the probe intensities, not the name of the microarray manufacturer.

One color not two color

The GEO entry for these microarrays shows that they were hybridized with one color (Cy3=green) only. So you need to specify green.only=TRUE.

The following code worked fine for me:

> files <- dir(pattern="txt.gz")
> files
[1] "GSM1186204_raw_data_ActivatedaOPCs_1.txt.gz"   
[2] "GSM1186205_raw_data_ActivatedaOPCs_2.txt.gz"   
[3] "GSM1186206_raw_data_ActivatedaOPCs_3.txt.gz"   
[4] "GSM1186207_raw_data_ActivatedaOPCs_4.txt.gz"   
[5] "GSM1186208_raw_data_NonactivatedaOPCs_1.txt.gz"
[6] "GSM1186209_raw_data_NonactivatedaOPCs_2.txt.gz"
[7] "GSM1186210_raw_data_NonactivatedaOPCs_3.txt.gz"
> AnnCols <- c("ID","AutoFlag","Normalisation","Name","RefNumber",
+              "ControlType","GeneName","TopHit","Description")
> raw0 <- read.maimages(files,source="genepix",green.only=TRUE,annotation=AnnCols)
Read GSM1186204_raw_data_ActivatedaOPCs_1.txt.gz 
Read GSM1186205_raw_data_ActivatedaOPCs_2.txt.gz 
Read GSM1186206_raw_data_ActivatedaOPCs_3.txt.gz 
Read GSM1186207_raw_data_ActivatedaOPCs_4.txt.gz 
Read GSM1186208_raw_data_NonactivatedaOPCs_1.txt.gz 
Read GSM1186209_raw_data_NonactivatedaOPCs_2.txt.gz 
Read GSM1186210_raw_data_NonactivatedaOPCs_3.txt.gz 
> raw <- raw0[raw0$genes$ControlType=="false", ]
> yb <- backgroundCorrect(raw,method="normexp")
Array 1 corrected
Array 2 corrected
Array 3 corrected
Array 4 corrected
Array 5 corrected
Array 6 corrected
Array 7 corrected
> y <- normalizeBetweenArrays(yb,method="quantile")
> design <- cbind(Intercept=1,Activated=c(1,1,1,1,0,0,0))
> fit <- lmFit(y,design)
> fit <- eBayes(fit,robust=TRUE,trend=TRUE)
> topTable(fit)
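
The design matrix in the session above was typed in by hand. As a sketch of an alternative, the same matrix can be derived from the Condition labels in the question using base R's `model.matrix`. The names `condition`, `design_mm` and `design_manual` are just illustrative, and the sample order is assumed to match the order of the files.

```r
# Condition labels in file order, as given in the question's targets file.
# "Ctr" is made the reference level so the second column indicates activation.
condition <- factor(c("Cupri", "Cupri", "Cupri", "Cupri", "Ctr", "Ctr", "Ctr"),
                    levels = c("Ctr", "Cupri"))

# Intercept plus an indicator column for the non-reference level.
design_mm <- model.matrix(~ condition)
colnames(design_mm) <- c("Intercept", "Activated")

# The hand-written version from the session, for comparison.
design_manual <- cbind(Intercept = 1, Activated = c(1, 1, 1, 1, 0, 0, 0))

same_design <- all(design_mm == design_manual)
```

With either matrix, the activated-versus-control comparison is the second coefficient, which can be requested explicitly with `topTable(fit, coef="Activated")`.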