how to import Agilent CGH custom design xml file into limma
1
0
Entering edit mode
Jarek Bryk ▴ 110
@jarek-bryk-3457
Last seen 8.0 years ago
Hello, I am trying to analyse my two-color mouse aCGH data on Agilent platform. I have custom-designed 1M probe CGH arrays for which I have a xml file that defines the arrays' design (layout etc.). My goal (for now) is to identify CNVs in the samples. I am trying to use snapCGH to analyse the samples, but I stumble upon some basic preprocessing steps. I can read in the Feature Extraction files with no problem (following almost literally snapCGH user guide, just for single sample): > RG1<-read.maimages(file="sample1.txt",source="agilent") Read sample1.txt > str(RG1) Formal class 'RGList' [package "limma"] with 1 slots ..@ .Data:List of 8 .. ..$: num [1:974016, 1] 297 33.6 35.2 145.7 333.7 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : NULL .. .. .. ..$: chr "sample1" .. ..$ : num [1:974016, 1] 25 25 25 25 25 25 25 25 25 25 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$: NULL .. .. .. ..$ : chr "sample1" .. ..$: num [1:974016, 1] 333.8 78.2 54.7 111.8 170.5 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : NULL .. .. .. ..$: chr "sample1" .. ..$ : num [1:974016, 1] 52 52 52 53 53 53 53 53 53 52 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$: NULL .. .. .. ..$ : chr "sample1" .. ..$:'data.frame': 1 obs. of 1 variable: .. .. ..$ FileName: chr "sample1" .. ..$:'data.frame': 974016 obs. of 9 variables: .. .. ..$ Row : int [1:974016] 1 1 1 1 1 1 1 1 1 1 ... .. .. ..$Col : int [1:974016] 1 2 3 4 5 6 7 8 9 10 ... .. .. ..$ Start : int [1:974016] 0 0 0 0 0 0 0 0 0 0 ... .. .. ..$Sequence : chr [1:974016] "" "" "" "CCCATTCACTAATCTACATATTATCTCCATCCAACAAAAATTTCTTTCAGTAAGGTGTGG" ... .. .. ..$ ProbeUID : int [1:974016] 0 1 1 3 5 7 9 11 13 15 ... .. .. ..$ControlType : int [1:974016] 1 1 1 0 0 0 0 0 0 0 ... .. .. ..$ ProbeName : chr [1:974016] "MmCGHBrightCorner" "DarkCorner" "DarkCorner" "A_67_P21917383" ... .. .. ..$GeneName : chr [1:974016] "MmCGHBrightCorner" "DarkCorner" "DarkCorner" "Msn" ... .. .. ..$ SystematicName: chr [1:974016] "MmCGHBrightCorner" "DarkCorner" "DarkCorner" "chrX:93323379-93323438" ... .. ..$: chr "agilent" .. ..$ :List of 4 .. .. ..$ngrid.r: num 1 .. .. ..$ ngrid.c: num 1 .. .. ..$nspot.r: int 1068 .. .. ..$ nspot.c: int 912 > RG1$design<- -1 # I have flipped data where red is the reference > RG2<-backgroundCorrect(RG1,method="normexp") Array 1 corrected Array 1 corrected > MA<-normalizeWithinArrays(RG2,method="loess") And after this command I think I need to supply the design file to limma/snapCGH so that it knows what to do with processCGH or genomePlot, because this happens during the next steps: > MA2<-processCGH(MA, method.of.averaging=mean,ID="ProbeUID") Error in order(MA$genes$Chr, MA$genes$Position) : argument 1 is not a vector > genomePlot(MA,array=1) Error in order(datainfo$Chr, datainfo$Position) : argument 1 is not a vector I am shooting in the dark with the ProbeUID field here, as this should be the unique probe identifier, but it's not where the problem is. I need position of genes in the genome, and this is in the design file. I think I should use the readPositionalInfo to relay this information to limma/snapCGH, but it only accepts objects of class RGList or MAList, and my design is an xml file - I don't know how to parse the file... I would be very grateful for any clarification on how to proceed with the analysis. cheers jarek > sessionInfo() R version 2.12.0 (2010-10-15) Platform: i386-apple-darwin9.8.0/i386 (32-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] splines stats graphics grDevices utils datasets methods base other attached packages: [1] snapCGH_1.20.0 DNAcopy_1.24.0 limma_3.6.0 aCGH_1.28.0 multtest_2.6.0 Biobase_2.10.0 [7] survival_2.35-8 cluster_1.13.1 GLAD_2.12.0 loaded via a namespace (and not attached): [1] affy_1.28.0 affyio_1.18.0 annotate_1.28.0 AnnotationDbi_1.12.0 [5] DBI_0.2-5 genefilter_1.32.0 grid_2.12.0 lattice_0.19-13 [9] MASS_7.3-8 preprocessCore_1.12.0 RColorBrewer_1.0-2 RSQLite_0.9-2 [13] strucchange_1.4-2 tilingArray_1.28.0 tools_2.12.0 vsn_3.18.0 [17] xtable_1.5-6 -- Jarek Bryk | www.evolbio.mpg.de/~bryk Max Planck Institute for Evolutionary Biology August Thienemann Str. 2 | 24306 Pl?n, Germany tel. +49 4522 763 287 | bryk at evolbio.mpg.de aCGH CGH probe aCGH snapCGH aCGH CGH probe aCGH snapCGH • 1.3k views ADD COMMENT 0 Entering edit mode @sean-davis-490 Last seen 3 months ago United States On Wed, Oct 20, 2010 at 11:26 AM, Jarek Bryk <bryk@evolbio.mpg.de> wrote: > Hello, > > I am trying to analyse my two-color mouse aCGH data on Agilent platform. I > have custom-designed 1M probe CGH arrays for which I have a xml file that > defines the arrays' design (layout etc.). My goal (for now) is to identify > CNVs in the samples. I am trying to use snapCGH to analyse the samples, but > I stumble upon some basic preprocessing steps. > > I can read in the Feature Extraction files with no problem (following > almost literally snapCGH user guide, just for single sample): > > > RG1<-read.maimages(file="sample1.txt",source="agilent") > Read sample1.txt > > str(RG1) > Formal class 'RGList' [package "limma"] with 1 slots > ..@ .Data:List of 8 > .. ..$ : num [1:974016, 1] 297 33.6 35.2 145.7 333.7 ... > .. .. ..- attr(*, "dimnames")=List of 2 > .. .. .. ..$: NULL > .. .. .. ..$ : chr "sample1" > .. ..$: num [1:974016, 1] 25 25 25 25 25 25 25 25 25 25 ... > .. .. ..- attr(*, "dimnames")=List of 2 > .. .. .. ..$ : NULL > .. .. .. ..$: chr "sample1" > .. ..$ : num [1:974016, 1] 333.8 78.2 54.7 111.8 170.5 ... > .. .. ..- attr(*, "dimnames")=List of 2 > .. .. .. ..$: NULL > .. .. .. ..$ : chr "sample1" > .. ..$: num [1:974016, 1] 52 52 52 53 53 53 53 53 53 52 ... > .. .. ..- attr(*, "dimnames")=List of 2 > .. .. .. ..$ : NULL > .. .. .. ..$: chr "sample1" > .. ..$ :'data.frame': 1 obs. of 1 variable: > .. .. ..$FileName: chr "sample1" > .. ..$ :'data.frame': 974016 obs. of 9 variables: > .. .. ..$Row : int [1:974016] 1 1 1 1 1 1 1 1 1 1 ... > .. .. ..$ Col : int [1:974016] 1 2 3 4 5 6 7 8 9 10 ... > .. .. ..$Start : int [1:974016] 0 0 0 0 0 0 0 0 0 0 ... > .. .. ..$ Sequence : chr [1:974016] "" "" "" > "CCCATTCACTAATCTACATATTATCTCCATCCAACAAAAATTTCTTTCAGTAAGGTGTGG" ... > .. .. ..$ProbeUID : int [1:974016] 0 1 1 3 5 7 9 11 13 15 ... > .. .. ..$ ControlType : int [1:974016] 1 1 1 0 0 0 0 0 0 0 ... > .. .. ..$ProbeName : chr [1:974016] "MmCGHBrightCorner" "DarkCorner" > "DarkCorner" "A_67_P21917383" ... > .. .. ..$ GeneName : chr [1:974016] "MmCGHBrightCorner" "DarkCorner" > "DarkCorner" "Msn" ... > .. .. ..$SystematicName: chr [1:974016] "MmCGHBrightCorner" "DarkCorner" > "DarkCorner" "chrX:93323379-93323438" ... > .. ..$ : chr "agilent" > .. ..$:List of 4 > .. .. ..$ ngrid.r: num 1 > .. .. ..$ngrid.c: num 1 > .. .. ..$ nspot.r: int 1068 > .. .. ..$nspot.c: int 912 > > RG1$design<- -1 # I have flipped data where red is the reference > > RG2<-backgroundCorrect(RG1,method="normexp") > Array 1 corrected > Array 1 corrected > > MA<-normalizeWithinArrays(RG2,method="loess") > > Hi, Jarek. You definitely DO NOT want to be loess-normalizing CGH data. There is an expected positive correlation between average intensity and log-ratio; loess-normalization will effectively eliminate that signal. > And after this command I think I need to supply the design file to > limma/snapCGH so that it knows what to do with processCGH or genomePlot, > because this happens during the next steps: > > > MA2<-processCGH(MA, method.of.averaging=mean,ID="ProbeUID") > Error in order(MA$genes$Chr, MA$genes$Position) : > argument 1 is not a vector > You may need something like: MA$design=rep(1,ncol(MA)) This assumes that there are no dye swaps. If there are, you may need to put some -1 in there. > genomePlot(MA,array=1) > Error in order(datainfo$Chr, datainfo$Position) : > argument 1 is not a vector > > I am shooting in the dark with the ProbeUID field here, as this should be > the unique probe identifier, but it's not where the problem is. I need > position of genes in the genome, and this is in the design file. I think I > should use the readPositionalInfo to relay this information to > limma/snapCGH, but it only accepts objects of class RGList or MAList, and my > design is an xml file - I don't know how to parse the file... > > This information is stored in the MA$Genes\$SystematicName above. You will need to split out the pieces (chromosome, position) and then use those for the locations. You will also probably need to remove control probes. Hope that helps. Sean > I would be very grateful for any clarification on how to proceed with the > analysis. > > cheers > jarek > > > sessionInfo() > R version 2.12.0 (2010-10-15) > Platform: i386-apple-darwin9.8.0/i386 (32-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] splines stats graphics grDevices utils datasets methods > base > > other attached packages: > [1] snapCGH_1.20.0 DNAcopy_1.24.0 limma_3.6.0 aCGH_1.28.0 > multtest_2.6.0 Biobase_2.10.0 > [7] survival_2.35-8 cluster_1.13.1 GLAD_2.12.0 > > loaded via a namespace (and not attached): > [1] affy_1.28.0 affyio_1.18.0 annotate_1.28.0 > AnnotationDbi_1.12.0 > [5] DBI_0.2-5 genefilter_1.32.0 grid_2.12.0 > lattice_0.19-13 > [9] MASS_7.3-8 preprocessCore_1.12.0 RColorBrewer_1.0-2 > RSQLite_0.9-2 > [13] strucchange_1.4-2 tilingArray_1.28.0 tools_2.12.0 > vsn_3.18.0 > [17] xtable_1.5-6 > > -- > Jarek Bryk | www.evolbio.mpg.de/~bryk > Max Planck Institute for Evolutionary Biology > August Thienemann Str. 2 | 24306 PlÃ¶n, Germany > tel. +49 4522 763 287 | bryk@evolbio.mpg.de > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]