AB Rat Genome Survey Microarray


Iain Gallagher ▴ 930

@iain-gallagher-2532

Last seen 8.8 years ago

United Kingdom

Hello List & Yongming Andrew Sun

I've been tasked with the analysis of some AB Rat Genome Survey Microarray data. The data I have are text files with the following columns:

Assay_Name Row Column Probe_ID Probe_Type Gene_ID X Y Assay_Normalized_Signal Signal CL_Sig CL_Raw SDEV CV S_N CL_Sig_Error CL_Raw_SDEV Flags Sample_Name

I was planning to use the ABarray package for this analysis, but reading through the vignette I'm not following the examples. From the vignette:

"Data File. The rows of the file represent probes. The first column is probeID (or Probe Name), the second column is geneID, next set of columns should contain Signal, S/N and FLAGS for all samples... If array name in the header of data file is present and arrayName is defined in experiment design file, arrayName will be used to distinguish which column is for which hybridization sample."

So, I take my array files and create a large text file representing all arrays (20) in long format.

    > arrays <- read.table('allArrays.csv', header=T, sep='\t')
    > head(arrays)
      arrayName Probe_ID    Gene_ID   Signal   S.N Flags
    1 GSM517967 20693443   rCG63297   264.31 -0.88     1
    2 GSM517967 20693548 rCG43972.1  3258.81 15.17     0
    3 GSM517967 20693561   rCG44662  1200.68 -0.22     1
    4 GSM517967 20693609   rCG59171 30762.77 40.18     0
    5 GSM517967 20693611   rCG58434   838.39  2.13     0
    6 GSM517967 20693655   rCG21764  5331.88 20.66     0
    ...
    537135 GSM524823 22425691   rCG48508   535.35  1.27     0
    537136 GSM524823 22425699   rCG40754   518.45 -0.26     1
    537137 GSM524823 22425720   rCG31597   975.71  3.62     0
    537138 GSM524823 22425784 rCG61959.1   331.31 -1.62     1
    537139 GSM524823 22425829   rCG38412 27281.49 45.11     0
    537140 GSM524823 22426049   rCG34097  2172.00  3.65     0

I create a design file as suggested in the vignette:

    > des <- read.table('design.csv', header=T, sep='\t')
    > head(des)
      sampleName arrayName phenotype
    1    LV_LRT1 GSM492839       LRT
    2    LV_LRT2 GSM517967       LRT
    3    LV_LRT3 GSM517968       LRT
    4    LV_LRT4 GSM517969       LRT
    5    LV_LRT5 GSM517970       LRT
    6    LV_LRS1 GSM518311       LRS

Then, trying to create the expressionSet object:

    > eset <- ABarray('allArrays.csv', 'design.csv', 'phenotype')
    Using arrayName to match experiment with signal in file: allArrays.csv
    Error in `[.data.frame`(pd, , group) : undefined columns selected

Hmm, could someone let me know where I'm going wrong?

Thanks & best

Iain

probe ABarray • 1.1k views

12.2 years ago Iain Gallagher ▴ 930


Iain Gallagher ▴ 930


For those that come afterwards.

Firstly, I had the structure of the data file wrong. The first two columns should hold the probeID and geneID, followed by three columns for each array holding the Signal, S/N and Flags, where the column names are of the form:

    Signal GSM492839    S/N GSM492839    Flags GSM492839    Signal GSM517967    S/N GSM517967    Flags GSM517967    ... etc etc

Here each GSM.... is the name of an array (these were downloaded from GEO).

Secondly, the 'Data file' as defined in the ABarray vignette has to be comma delimited (at least on Linux - I haven't tried other platforms), not tab delimited (as the vignette says it can be). Trying to use tab-delimited data results in an error about a mismatch in the number of Signal and S/N columns.

The package can now at least read the data and I'll see how I progress from here.

HTH

best

i

    > sessionInfo()
    R version 2.14.1 (2011-12-22)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
     [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
     [5] LC_MONETARY=en_GB.utf8    LC_MESSAGES=en_GB.utf8
     [7] LC_PAPER=C                LC_NAME=C
     [9] LC_ADDRESS=C              LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] ABarray_1.22.0

    loaded via a namespace (and not attached):
    [1] Biobase_2.14.0   MASS_7.3-17      multtest_2.10.0  splines_2.14.1
    [5] survival_2.36-12 tcltk_2.14.1     tools_2.14.1
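For anyone starting from the same long-format table, the wide layout described in this reply can be built with base R. What follows is only a minimal sketch, not something taken from the ABarray documentation: the helper name `long_to_wide`, the header names `probeID` and `geneID`, the single space inside names such as `Signal GSM492839`, and the output file name are all assumptions, and the toy values are made up. It also assumes every array carries the same set of probes.

```r
# Toy long-format table with the same columns as 'arrays' in the question.
# The numbers are illustrative only. For real data one would instead use:
#   long <- read.table('allArrays.csv', header = TRUE, sep = '\t')
long <- data.frame(
  arrayName = rep(c("GSM492839", "GSM517967"), each = 3),
  Probe_ID  = rep(c(20693443, 20693548, 20693561), times = 2),
  Gene_ID   = rep(c("rCG63297", "rCG43972.1", "rCG44662"), times = 2),
  Signal    = c(264.31, 3258.81, 1200.68, 301.10, 2950.40, 1100.25),
  S.N       = c(-0.88, 15.17, -0.22, 0.40, 14.02, 1.10),
  Flags     = c(1, 0, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per probe, then Signal / S/N / Flags
# columns for each array, in the order the arrays first appear.
long_to_wide <- function(long) {
  arrays <- unique(long$arrayName)
  # Probe and gene IDs are taken from the first array (assumes all
  # arrays share the same probes).
  first <- long[long$arrayName == arrays[1], ]
  wide <- data.frame(probeID = first$Probe_ID,
                     geneID  = first$Gene_ID,
                     stringsAsFactors = FALSE,
                     check.names = FALSE)
  for (a in arrays) {
    one <- long[long$arrayName == a, ]
    # Align this array's rows to the probe order of the first array.
    one <- one[match(wide$probeID, one$Probe_ID), ]
    wide[[paste("Signal", a)]] <- one$Signal
    wide[[paste("S/N", a)]]    <- one$S.N
    wide[[paste("Flags", a)]]  <- one$Flags
  }
  wide
}

wide <- long_to_wide(long)

# Write a comma-delimited file, since tab-delimited input failed here.
out <- file.path(tempdir(), "allArrays_wide.csv")
write.csv(wide, out, row.names = FALSE, quote = FALSE)
```

The resulting file would then be passed to `ABarray()` in place of the long-format file, together with the design file and group name as in the original call. It is worth checking the header line of the written file by eye against the vignette before relying on it.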
----- Original Message -----
From: Iain Gallagher <iaingallagher@btopenworld.com>
To: sunya at appliedbiosystems.com
Cc: bioconductor at stat.math.ethz.ch
Sent: Sunday, 12 February 2012, 9:19
Subject: [BioC] AB Rat Genome Survey Microarray

12.2 years ago Iain Gallagher ▴ 930
