Question: PAIR files -- feature set table
Guest User wrote 4.5 years ago:
Dear Maintainer,

I downloaded the available NimbleGen 'single channel' 532.PAIR files for a custom-built expression microarray from NCBI/GEO (GPL11164). However, I get an error message when I try to make the annotation for this platform using pdInfoBuilder.

In the pdInfoBuilder Reference Manual (June 5, 2013), the NgsExpressionPDInfoPkgSeed method lists a slot for pairFile, although showClasses("Ngs..") does not show a slot for this, only XYS. Thus, I changed the .pair file extension to .xys.

(ndf<- list.files(getwd(), pattern=".ndf", full.names=TRUE)) # read annotation file
[1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/GPL11164.ndf"

(xys <- list.files(getwd(), pattern = ".xys", full.names = TRUE)[1])
[1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/GSM618107_14418002_532.xys"

But doing this resulted in an error message:

seed <- new("NgsExpressionPDInfoPkgSeed", ndfFile = ndf, xysFile = xys, author = "FJ", organism = "Apple", species = "Malus x Domestica cv.GD")

makePdInfoPackage(arrays, destDir = getwd())
======================================================================
Building annotation package for Nimblegen Expression Array
NDF: GPL11164.ndf
XYS: GSM618107_14418002_532.xys
======================================================================
Parsing file: GPL11164.ndf... OK
Parsing file: GSM618107_14418002_532.xys... OK
Merging NDF and XYS files... OK
Preparing contents for featureSet table... Error in `[.data.frame`(ndfdata, , colsFS) :
  undefined columns selected
In addition: Warning message:
In is.na(ndfdata[["SIGNAL"]]) :
  is.na() applied to non-(list or vector) of type 'NULL'

The only files available from NCBI/GEO are 24 PAIR files and 1 NDF. It seems that .xys has a different arrangement than .pair, so the .ndf cannot be used to annotate the .pair file? Any suggestions?

Hope to hear from you soon.
Franklin

-- output of sessionInfo():

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] tcltk grid parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] pdInfoBuilder_1.24.0 oligo_1.24.0 oligoClasses_1.22.0 affxparser_1.32.1 RSQLite_0.11.4 DBI_0.2-7
[7] Mfuzz_2.18.0 DynDoc_1.38.0 widgetTools_1.38.0 e1071_1.6-1 class_7.3-7 gplots_2.11.0.1
[13] KernSmooth_2.23-10 caTools_1.14 gdata_2.12.0.2 gtools_2.7.1 timecourse_1.32.0 MASS_7.3-26
[19] Biobase_2.20.0 BiocGenerics_0.6.0 limma_3.16.5 ggplot2_0.9.3.1 BiocInstaller_1.10.1

loaded via a namespace (and not attached):
[1] affyio_1.28.0 Biostrings_2.28.0 bit_1.1-10 bitops_1.0-5 codetools_0.2-8 colorspace_1.2-2
[7] dichromat_2.0-0 digest_0.6.3 ff_2.2-11 foreach_1.4.0 GenomicRanges_1.12.4 gtable_0.1.2
[13] IRanges_1.18.1 iterators_1.0.6 labeling_0.1 marray_1.38.0 munsell_0.4 plyr_1.8
[19] preprocessCore_1.22.0 proto_0.3-10 RColorBrewer_1.0-5 reshape2_1.2.2 scales_0.2.3 splines_3.0.1
[25] stats4_3.0.1 stringr_0.6.2 tkWidgets_1.38.0 tools_3.0.1 zlibbioc_1.6.0
Benilton Carvalho (Brazil/Campinas/UNICAMP) wrote 4.5 years ago:
It's an unfortunate mistake to have the pairFile *argument* in the call (not in the slots section, but I see your point). :-( I'll make sure that this is fixed.

You need to convert the PAIR files to XYS...

Some refs that should help you in the process:

https://stat.ethz.ch/pipermail/bioconductor/2012-January/043186.html
http://comments.gmane.org/gmane.science.biology.informatics.conductor/27547

b
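Renaming a file changes only its extension, not its columns. One quick way to see whether a file has the layout of an XYS file is to compare its column names with the expected ones. The sketch below is an illustration, not code from the thread: it assumes that an XYS file is tab-delimited, begins with a `#` comment line, and has columns named X, Y, SIGNAL and COUNT. The helper `has_xys_columns` and the two toy files are hypothetical.

```r
# Hypothetical helper: does a tab-delimited file carry the columns
# assumed here for an XYS file (X, Y, SIGNAL, COUNT)?
has_xys_columns <- function(path) {
  # comment.char = "#" skips the header line that starts with '#'
  tab <- read.delim(path, comment.char = "#", nrows = 5,
                    stringsAsFactors = FALSE)
  all(c("X", "Y", "SIGNAL", "COUNT") %in% names(tab))
}

# Toy file standing in for a PAIR file that was simply renamed to .xys
# (the PAIR column names PROBE_ID and PM are assumptions)
pair_like <- tempfile(fileext = ".xys")
writeLines(c("# toy header",
             "PROBE_ID\tPM",
             "Contig19819_1_f_28_10_535\t1234.5"), pair_like)

# Toy file with the assumed XYS layout
xys_like <- tempfile(fileext = ".xys")
writeLines(c("# toy header",
             "X\tY\tSIGNAL\tCOUNT",
             "343\t9\t1234.5\t1"), xys_like)

has_xys_columns(pair_like)  # FALSE
has_xys_columns(xys_like)   # TRUE
```

A check like this would fail on a renamed PAIR file before any package build is attempted, which is consistent with the "undefined columns selected" error reported above.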
Dear Dr. Carvalho,

Many thanks for the reply.

Actually, this is what my .ndf file looks like:

> head(ndf)
  PROBE_DESIGN_ID CONTAINER DESIGN_NOTE SELECTION_CRITERIA SEQ_ID
1 7552_0343_0009 Duplicate_1
2 7552_0345_0009 Duplicate_2
3 7552_0347_0009 Duplicate_1
4 7552_0349_0009 Duplicate_2
5 7552_0351_0009 Duplicate_2
6 7552_0353_0009 Duplicate_1
  PROBE_SEQUENCE MISMATCH MATCH_INDEX FEATURE_ID ROW_NUM COL_NUM PROBE_CLASS
1 cttgactcttctaagttcaaaggtaactcaagtgaagctgtcagatatgatccttcca 0 64535488 64535488 9 343
2 cccaagcattaaaccttactcatatacttataatgcagccatcaagagtttgtgcaagg 0 64799310 64799310 9 345
3 agggaggctgaaagagagagtgaatggtccagctgggcataattgctgca 0 64476989 64476989 9 347
4 ttgttggtgggggtgttgcccttagtaccccagaccttgaagcagttaaa 0 64862794 64862794 9 349
5 gtgtggggccccctttctttaactggaacctttctttgaagcaatttggg 0 64832726 64832726 9 351
6 ttgtccaattccaacatgccgagacggcagggattgtgatcgtgttgttc 0 64435686 64435686 9 353
  PROBE_ID POSITION DESIGN_ID X Y
1 Contig19819_1_f_28_10_535 0 7552 343 9
2 Malus_CN899188_2_f_147_1_755 0 7552 345 9
3 Contig20738_8_r_1179_2_1432 0 7552 347 9
4 Malus_CN880097_2_r_336_2_536 0 7552 349 9
5 Malus_CN918117_2_f_632_1_781 0 7552 351 9
6 Contig1991_1_f_71_2_1239 0 7552 353 9

The PAIR files (532 channel only, since these are one-color arrays) contain only the probe ID and signal, after some text at the top describing the experiment. My real issue is that I can further normalize and analyze the RMA files with sva, limma, etc. However, I cannot annotate the probes without the array annotation, because there are duplicates in the NDF file which are removed in the RMA.pair files available on NCBI/GEO. As a result, the probes do not match in any annotation package I have tried.

So I tried to go back and start from the raw PAIR files... this custom array is really a "custom" array, without NimbleScan.

Cheers,
Franklin
written 4.5 years ago by Johnson, Franklin Theodore
You will need to merge the PAIR and the NDF using the PROBE_ID column as key. This will allow you to get the X/Y coordinates needed to create the XYS as described in the other messages.

Regarding annotation, you may need to contact NimbleGen to request this information directly from them...

benilton
written 4.5 years ago by Benilton Carvalho
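The merge on PROBE_ID suggested above can be sketched with base R's `merge()`. This is an illustration on toy data, not code from the thread. It assumes that the PAIR table has columns named PROBE_ID and PM (the signal), that the NDF supplies PROBE_ID, X and Y, and that an XYS table needs the columns X, Y, SIGNAL and COUNT. The object names `ndf`, `pair`, `merged` and `xys` are placeholders, and the `#` header line that a real XYS file is assumed to start with is not written here.

```r
# Toy NDF: every feature on the array, with its X/Y position
ndf <- data.frame(
  PROBE_ID = c("Contig19819_1_f_28_10_535",
               "Malus_CN899188_2_f_147_1_755",
               "Contig20738_8_r_1179_2_1432"),
  X = c(343L, 345L, 347L),
  Y = c(9L, 9L, 9L),
  stringsAsFactors = FALSE
)

# Toy PAIR data: probe ID and signal only (column names are assumed)
pair <- data.frame(
  PROBE_ID = c("Contig19819_1_f_28_10_535",
               "Contig20738_8_r_1179_2_1432"),
  PM = c(1234.5, 987.0),
  stringsAsFactors = FALSE
)

# Left join on PROBE_ID: keep every NDF feature, so features without
# a signal in the PAIR data end up with SIGNAL = NA
merged <- merge(ndf[, c("PROBE_ID", "X", "Y")], pair,
                by = "PROBE_ID", all.x = TRUE)

# Assemble the assumed XYS columns and order by position
xys <- data.frame(X = merged$X, Y = merged$Y,
                  SIGNAL = merged$PM,
                  COUNT = ifelse(is.na(merged$PM), NA_integer_, 1L))
xys <- xys[order(xys$Y, xys$X), ]

# Write a tab-delimited table (header line starting with '#' omitted)
out <- tempfile(fileext = ".xys")
write.table(xys, out, sep = "\t", quote = FALSE,
            row.names = FALSE, na = "NA")
```

If a PROBE_ID occurs more than once in the NDF, `merge()` repeats the PAIR signal for each of its positions. Whether that is appropriate depends on how the PAIR values were produced, so it is worth checking the row counts before and after the merge.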
Resending to bioconductor message thread:

Dear Dr. Carvalho,

Thanks for the response. As you suggested, I will look into the merge function using "Probe_ID". After reading in the data, I will start here: merge.datasets(dataset1, dataset2, by="key").

Best Regards,
Franklin
written 4.5 years ago by Johnson, Franklin Theodore
Dear Dr. Carvalho, Recently, we had cooresponence regaring makePDInfoPackage for an NimbleGen apple microarray. I was able to merge the ndf design and XYS files using PROBE_ID. As a reminder this is a custom array, and there are no SIGNAL==NAs for control probes. It seemed to work: > makePdInfoPackage(seed, destDir("")) ====================================================================== ====================================================================== ================ Building annotation package for Nimblegen Expression Array NDF: GPL11164.ndf XYS: XYS.txt ====================================================================== ====================================================================== ================ Parsing file: GPL11164.ndf... OK Parsing file: XYS.txt... OK Merging NDF and XYS files... OK Preparing contents for featureSet table... OK Preparing contents for bgfeature table... OK Preparing contents for pmfeature table... OK Creating package in C:/Users/franklin.johnson.PW50-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/pd.gpl11164 Inserting 2 rows into table featureSet... OK Inserting 765524 rows into table pmfeature... OK Inserting 5075 rows into table bgfeature... OK Counting rows in bgfeature Counting rows in featureSet Counting rows in pmfeature Creating index idx_bgfsetid on bgfeature... OK Creating index idx_bgfid on bgfeature... OK Creating index idx_pmfsetid on pmfeature... OK Creating index idx_pmfid on pmfeature... OK Creating index idx_fsfsetid on featureSet... OK Saving DataFrame object for PM. Saving DataFrame object for BG. Done. 
Warning message: In is.na(ndfdata[["SIGNAL"]]) : is.na() applied to non-(list or vector) of type 'NULL' > In contrast to this warning message, I see a pdinfopackage directory with 4 subdirectories: c=("data", "inst", "man", R"), as well as subsubdirectories in "inst"=c("extdata", and "Unit Tests"), in addition to two text files in the main directory: c=("DESCRIPTION", "NAMESPACE") were created in my destination folder. Before using "oligo", if possible, I wanted to confirm with you that this package is viable to use with "oligo" although a warning message that may not pertain to my custom designed microarray was printed. Regards, Franklin Great minds discuss ideas. Average minds discuss events. Small minds discuss people. -Eleanor Roosevelt ________________________________________ From: Johnson, Franklin Theodore Sent: Friday, June 07, 2013 10:39 AM To: Benilton Carvalho Cc: bioconductor at r-project.org Subject: RE: [BioC] PAIR files -- feature set table Resending to bioconductor message thread: Dear Dr. Carvalho, Thanks for the response. As you suggested, I will look into the merge function using "Probe_ID". After reading in the data, I will start here: merge.datasets(dataset1, dataset2, by="key"). Best Regards, Franklin Great minds discuss ideas. Average minds discuss events. Small minds discuss people. -Eleanor Roosevelt ________________________________________ From: Benilton Carvalho [beniltoncarvalho@gmail.com] Sent: Thursday, June 06, 2013 8:11 PM To: Johnson, Franklin Theodore Cc: bioconductor at r-project.org; franklin.johnson at wsu.edu Subject: Re: [BioC] PAIR files -- feature set table You will need to merge the PAIR and the NDF using the PROBE_ID column as key. This will allow you to get the X/Y coordinates needed to create the XYS as described on the other messages. Regarding annotation, you may need to contact NimbleGen to request this information directly from them... 
benilton

2013/6/6 Johnson, Franklin Theodore <franklin.johnson at email.wsu.edu>:
> Dear Dr. Carvalho,
>
> Muchas gracias for the reply.
>
> Actually, this is what my .ndf file looks like:
>> head(ndf)
> PROBE_DESIGN_ID CONTAINER DESIGN_NOTE SELECTION_CRITERIA SEQ_ID
> 1 7552_0343_0009 Duplicate_1
> 2 7552_0345_0009 Duplicate_2
> 3 7552_0347_0009 Duplicate_1
> 4 7552_0349_0009 Duplicate_2
> 5 7552_0351_0009 Duplicate_2
> 6 7552_0353_0009 Duplicate_1
> PROBE_SEQUENCE MISMATCH MATCH_INDEX FEATURE_ID ROW_NUM COL_NUM PROBE_CLASS
> 1 cttgactcttctaagttcaaaggtaactcaagtgaagctgtcagatatgatccttcca 0 64535488 64535488 9 343
> 2 cccaagcattaaaccttactcatatacttataatgcagccatcaagagtttgtgcaagg 0 64799310 64799310 9 345
> 3 agggaggctgaaagagagagtgaatggtccagctgggcataattgctgca 0 64476989 64476989 9 347
> 4 ttgttggtgggggtgttgcccttagtaccccagaccttgaagcagttaaa 0 64862794 64862794 9 349
> 5 gtgtggggccccctttctttaactggaacctttctttgaagcaatttggg 0 64832726 64832726 9 351
> 6 ttgtccaattccaacatgccgagacggcagggattgtgatcgtgttgttc 0 64435686 64435686 9 353
> PROBE_ID POSITION DESIGN_ID X Y
> 1 Contig19819_1_f_28_10_535 0 7552 343 9
> 2 Malus_CN899188_2_f_147_1_755 0 7552 345 9
> 3 Contig20738_8_r_1179_2_1432 0 7552 347 9
> 4 Malus_CN880097_2_r_336_2_536 0 7552 349 9
> 5 Malus_CN918117_2_f_632_1_781 0 7552 351 9
> 6 Contig1991_1_f_71_2_1239 0 7552 353 9
>
> The .532 pair files (one-color arrays) contain only the probe ID and signal, after some text at the top describing the experiment. My real issue is that I can further normalize and analyze the RMA files with sva and limma, etc. However, I cannot annotate the probes without the array annotation, as there are duplicates in the ndf file which are removed in the RMA.pair files available on NCBI/GEO. So they do not match in any annotation package I have tried.
> So I tried to go back and start from the raw pair files... this custom array is really a "custom" array without
> NimbleScan.
>
> Salud,
> Franklin
>
> ________________________________________
> From: Benilton Carvalho [beniltoncarvalho at gmail.com]
> Sent: Wednesday, June 05, 2013 6:42 PM
> To: FRANKLIN JOHNSON [guest]
> Cc: bioconductor at r-project.org; franklin.johnson at wsu.edu; pdInfoBuilder Maintainer
> Subject: Re: [BioC] PAIR files -- feature set table
>
> It's an unfortunate mistake to have the pairFile *argument* in the
> call (not in the slots section, but I see your point). :-( I'll make
> sure that this is fixed.
>
> You need to convert the PAIR files to XYS...
>
> Some refs that should help you in the process:
>
> https://stat.ethz.ch/pipermail/bioconductor/2012-January/043186.html
> http://comments.gmane.org/gmane.science.biology.informatics.conductor/27547
>
> b
written 4.5 years ago by Johnson, Franklin Theodore
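The merge step discussed in this post (joining a PAIR file to the NDF on PROBE_ID to recover the X/Y coordinates, then writing an XYS file) could look roughly like the base R sketch below. It rests on several assumptions: the toy data frames stand in for the real files, the PAIR intensity column is assumed to be named PM, and the XYS layout used here (one comment header line followed by tab-separated X, Y, SIGNAL, COUNT columns) should be checked against a known-good XYS file before relying on it. The header text and the file name are hypothetical.

```r
# Toy stand-ins for the real NDF and PAIR files (hypothetical values).
ndf <- data.frame(
  PROBE_ID = c("probe_a", "probe_b", "probe_c"),
  X = c(343L, 345L, 347L),
  Y = c(9L, 9L, 9L),
  stringsAsFactors = FALSE
)
pair <- data.frame(
  PROBE_ID = c("probe_c", "probe_a", "probe_b"),
  PM = c(1200.5, 830.0, 415.25),   # assumed name of the intensity column
  stringsAsFactors = FALSE
)

# Join on PROBE_ID. all.x = TRUE keeps every NDF feature, so features
# without a signal in the PAIR file end up with SIGNAL = NA.
merged <- merge(ndf[, c("PROBE_ID", "X", "Y")],
                pair[, c("PROBE_ID", "PM")],
                by = "PROBE_ID", all.x = TRUE)

# Assumed XYS column layout: X, Y, SIGNAL, COUNT.
xys <- data.frame(X = merged$X, Y = merged$Y,
                  SIGNAL = merged$PM, COUNT = 1L)
xys <- xys[order(xys$Y, xys$X), ]

# Write one comment header line, then the tab-separated table.
out <- file.path(tempdir(), "example.xys")
con <- file(out, open = "wt")
writeLines("# software=hand-made-example", con)  # hypothetical header
write.table(xys, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

print(xys)
```

Selecting only PROBE_ID, X and Y from the NDF before merging avoids the X.x/X.y style duplicate columns that merge() creates when both inputs carry columns with the same name.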
That does not look ok. The problem is the count for the featureSet table... This table stores the information for "genes" (or whatever the target for this particular array is)... so it is unlikely that you have a microarray with only 2 "target units"... I'd expect something in the thousands...

pdInfoBuilder uses the information in SEQ_ID (in the NDF) to get the target information (i.e., the contents for featureSet). Given that this is a custom array, I believe that the best idea is to contact the person who designed it and ask for more details about the design (in particular, how many probesets and the average number of probes per probeset)...

I've seen some designs in which the information that was expected to be in SEQ_ID was actually stored in PROBE_ID (in such cases, the user needs to create a backup copy of the NDF, and then move the contents of PROBE_ID to SEQ_ID - and vice-versa).

b
written 4.5 years ago by Benilton Carvalho
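The suggestion above (back up the NDF, then exchange the contents of PROBE_ID and SEQ_ID) can be sketched in base R as follows. This is only an illustration on a toy table with made-up IDs: the real NDF has many more columns, and the output file name is hypothetical. Counting the unique SEQ_ID values before and after gives a rough idea of how many rows to expect in featureSet.

```r
# Toy NDF in which SEQ_ID holds only replicate labels and the target
# names sit in PROBE_ID (hypothetical values).
ndf <- data.frame(
  SEQ_ID   = c("Duplicate_1", "Duplicate_2", "Duplicate_1",
               "Duplicate_2", "Duplicate_1", "Duplicate_2"),
  PROBE_ID = c("Contig1_f", "Contig1_f", "Contig2_r",
               "Contig2_r", "Contig3_f", "Contig3_f"),
  X = c(343L, 345L, 347L, 349L, 351L, 353L),
  Y = rep(9L, 6),
  stringsAsFactors = FALSE
)

# Number of distinct targets the builder would see before the swap.
n_targets_before <- length(unique(ndf$SEQ_ID))

# Keep an untouched copy; with a real file, copy it with file.copy() first.
ndf_backup <- ndf

# Exchange the two columns.
ndf_swapped <- ndf
ndf_swapped$SEQ_ID   <- ndf$PROBE_ID
ndf_swapped$PROBE_ID <- ndf$SEQ_ID

n_targets_after <- length(unique(ndf_swapped$SEQ_ID))
cat("distinct SEQ_ID before:", n_targets_before,
    "after:", n_targets_after, "\n")

# Write the modified design to a new file, leaving the original alone.
out <- file.path(tempdir(), "swapped_ndf.txt")  # hypothetical file name
write.table(ndf_swapped, out, sep = "\t", quote = FALSE, row.names = FALSE)
```

Any merge keyed on PROBE_ID is best done before the swap, because afterwards that column no longer holds the original probe names.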
Dr. Carvalho,

Yes, I see what you mean. Switching the columns helped: loading the featureSet table now inserts more than 2 rows:

Inserting 198661 rows into table featureSet... OK

However, the warning message printed again:

Warning message:
In is.na(ndfdata[["SIGNAL"]]) :
  is.na() applied to non-(list or vector) of type 'NULL'

Below is the output plus sessionInfo(), as I upgraded to R 3.0.1.

#Begin R command line code:
> makePdInfoPackage(arrays, destDir = getwd(), unlink=TRUE)
======================================================================
Building annotation package for Nimblegen Expression Array
NDF: pdinfo_GPL11164.ndf.txt   <- new .ndf file with PROBE_ID <-> SEQ_ID
XYS: XYS.txt
======================================================================
Parsing file: pdinfo_GPL11164.ndf.txt... OK
Parsing file: XYS.txt... OK
Merging NDF and XYS files... OK
Preparing contents for featureSet table... OK
Preparing contents for bgfeature table... OK
Preparing contents for pmfeature table... OK
Creating package in E:/RANDOM/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/pd.pdinfo.gpl11164.ndf.txt
Inserting 198661 rows into table featureSet... OK
Inserting 770599 rows into table pmfeature... OK
Counting rows in featureSet
Counting rows in pmfeature
Creating index idx_pmfsetid on pmfeature... OK
Creating index idx_pmfid on pmfeature... OK
Creating index idx_fsfsetid on featureSet... OK
Saving DataFrame object for PM.
Done.
Warning message:
In is.na(ndfdata[["SIGNAL"]]) :
  is.na() applied to non-(list or vector) of type 'NULL'

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] pdInfoBuilder_1.24.0 oligo_1.24.0 oligoClasses_1.22.0 affxparser_1.32.1 RSQLite_0.11.4 DBI_0.2-7 Biobase_2.20.0
[8] BiocGenerics_0.6.0 BiocInstaller_1.10.2

loaded via a namespace (and not attached):
[1] affyio_1.28.0 Biostrings_2.28.0 bit_1.1-10 codetools_0.2-8 ff_2.2-11 foreach_1.4.1 GenomicRanges_1.12.4
[8] IRanges_1.18.1 iterators_1.0.6 preprocessCore_1.22.0 splines_3.0.1 stats4_3.0.1 tools_3.0.1 zlibbioc_1.6.0
> q()

The built pdInfo package written to destDir has the same layout as the one described in my previous message. However, the featureSet table now has more than 2 rows...

Lastly, I tried multiple combinations, as my merged file has (X.x, Y.x), which seem to be identifiers for the 'probe IDs' on the array, as well as (X.y, Y.y), which seem to be the sequence identifiers for the "SEQ_ID". I used X.x, Y.x and PM, which gave the result I pasted above. All others had errors. I'm close, but that warning message is annoying...

Regards,
Franklin

Great minds discuss ideas. Average minds discuss events. Small minds discuss people. -Eleanor Roosevelt
written 4.5 years ago by Johnson, Franklin Theodore
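For readers following along: renaming a .pair file to .xys does not change the columns inside it, which is presumably why the build in the original question stopped on SIGNAL. Below is a minimal sketch in base R for checking a file's columns before building. The helper name `missing_xys_cols` and the toy file contents are made up for illustration; the four column names are the ones written by the pair2xys script posted later in this thread.

```r
# Hypothetical helper: which of the columns an XYS file needs are absent?
missing_xys_cols <- function(path) {
  # Lines starting with '#' (the descriptive header) are skipped
  cols <- names(read.delim(path, comment.char = "#", nrows = 1,
                           stringsAsFactors = FALSE))
  setdiff(c("X", "Y", "SIGNAL", "COUNT"), cols)
}

# Toy PAIR-like file with made-up values, only to show the idea
toy <- tempfile(fileext = ".pair")
writeLines(c("# software=demo",
             "PROBE_ID\tX\tY\tPM",
             "probe_a\t343\t9\t1200.5"), toy)

missing_xys_cols(toy)
```

An empty result would mean the file already carries all four columns; anything else names the columns a conversion step still has to create.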
Don't worry about that particular warning... just install the package and try to read your XYS files.

2013/6/13 Johnson, Franklin Theodore <franklin.johnson at email.wsu.edu>:
> Dr. Carvalho,
>
> Yes. I see what you mean.
> Switching the columns helped: loading the featureSet table now inserted more
> than 2 rows:
>
> Inserting 198661 rows into table featureSet... OK
>
> However, the warning message did print again.
>
> Warning message:
> In is.na(ndfdata[["SIGNAL"]]) :
>   is.na() applied to non-(list or vector) of type 'NULL'
>
> Below is the output + sessionInfo(), as I upgraded to R 3.0.1.
>
> #Begin R command line code:
>
>> makePdInfoPackage(arrays, destDir = getwd(), unlink=TRUE)
> ================================================================================
> Building annotation package for Nimblegen Expression Array
> NDF: pdinfo_GPL11164.ndf.txt <-new .ndf file with PROBE_ID<->SEQ_ID
> XYS: XYS.txt
> ================================================================================
> Parsing file: pdinfo_GPL11164.ndf.txt... OK
> Parsing file: XYS.txt... OK
> Merging NDF and XYS files... OK
> Preparing contents for featureSet table... OK
> Preparing contents for bgfeature table... OK
> Preparing contents for pmfeature table... OK
> Creating package in E:/RANDOM/Test/Yanmin's Microarray Paper/Yanmin
> Microarray RAW/pd.pdinfo.gpl11164.ndf.txt
> Inserting 198661 rows into table featureSet... OK
> Inserting 770599 rows into table pmfeature... OK
> Counting rows in featureSet
> Counting rows in pmfeature
> Creating index idx_pmfsetid on pmfeature... OK
> Creating index idx_pmfid on pmfeature... OK
> Creating index idx_fsfsetid on featureSet... OK
> Saving DataFrame object for PM.
> Done.
> Warning message:
> In is.na(ndfdata[["SIGNAL"]]) :
>   is.na() applied to non-(list or vector) of type 'NULL'
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: i386-w64-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
>     LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C LC_TIME=English_United States.1252
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] pdInfoBuilder_1.24.0 oligo_1.24.0 oligoClasses_1.22.0
>     affxparser_1.32.1 RSQLite_0.11.4 DBI_0.2-7 Biobase_2.20.0
> [8] BiocGenerics_0.6.0 BiocInstaller_1.10.2
>
> loaded via a namespace (and not attached):
> [1] affyio_1.28.0 Biostrings_2.28.0 bit_1.1-10 codetools_0.2-8
>     ff_2.2-11 foreach_1.4.1 GenomicRanges_1.12.4
> [8] IRanges_1.18.1 iterators_1.0.6 preprocessCore_1.22.0
>     splines_3.0.1 stats4_3.0.1 tools_3.0.1 zlibbioc_1.6.0
>
>> q()
>
> The pdInfo package built in destDir is identical to the one in my previous message.
> However, the featureSet table now has more than 2 rows...
>
> Lastly, I tried multiple combinations, as my merged file has (X.x, Y.x), which
> seem to be identifiers for the 'probe IDs' on the array, as well as (X.y, Y.y),
> which seem to be the sequence identifiers for the "SEQ_ID". I used X.x, Y.x and PM,
> which gave the result I pasted above. All others had errors. I'm close, but
> that warning message is annoying...
>
> Regards,
> Franklin
>
> ________________________________________
> From: Benilton Carvalho [beniltoncarvalho at gmail.com]
> Sent: Wednesday, June 12, 2013 8:25 PM
> To: Johnson, Franklin Theodore
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] PAIR files -- feature set table
>
> That does not look ok.
>
> The problem is the count for the featureSet table... This table stores
> the information for "genes" (or whatever the target for this
> particular array is)... so, it is unlikely that you have a microarray
> with only 2 "target units"... I'd expect something around the
> thousands...
>
> pdInfoBuilder uses the information in SEQ_ID (in the NDF) to get the
> target information (i.e., the contents for featureSet).
>
> Given that this is a custom array, I believe that the best idea is to
> contact the person who designed it and ask for more details about the
> design (in particular, how many probesets and average number of probes
> per probeset)...
>
> I've seen some designs in which the information that was expected to
> be in SEQ_ID was actually stored in PROBE_ID (in such cases, the user
> needs to create a backup copy of the NDF, and then move the contents
> of PROBE_ID to SEQ_ID - and vice-versa).
>
> b
>
> 2013/6/12 Johnson, Franklin Theodore <franklin.johnson at email.wsu.edu>:
>> Dear Dr. Carvalho,
>>
>> Recently, we had correspondence regarding makePdInfoPackage for a NimbleGen
>> apple microarray.
>> I was able to merge the ndf design and XYS files using PROBE_ID.
>> As a reminder, this is a custom array, and there are no SIGNAL==NAs for
>> control probes.
>> It seemed to work:
>>> makePdInfoPackage(seed, destDir(""))
>> ================================================================================
>> Building annotation package for Nimblegen Expression Array
>> NDF: GPL11164.ndf
>> XYS: XYS.txt
>> ================================================================================
>> Parsing file: GPL11164.ndf... OK
>> Parsing file: XYS.txt... OK
>> Merging NDF and XYS files... OK
>> Preparing contents for featureSet table... OK
>> Preparing contents for bgfeature table... OK
>> Preparing contents for pmfeature table... OK
>> Creating package in
>> C:/Users/franklin.johnson.PW50-WEN/Desktop/Test/Yanmin's Microarray
>> Paper/Yanmin Microarray RAW/pd.gpl11164
>> Inserting 2 rows into table featureSet... OK
>> Inserting 765524 rows into table pmfeature... OK
>> Inserting 5075 rows into table bgfeature... OK
>> Counting rows in bgfeature
>> Counting rows in featureSet
>> Counting rows in pmfeature
>> Creating index idx_bgfsetid on bgfeature... OK
>> Creating index idx_bgfid on bgfeature... OK
>> Creating index idx_pmfsetid on pmfeature... OK
>> Creating index idx_pmfid on pmfeature... OK
>> Creating index idx_fsfsetid on featureSet... OK
>> Saving DataFrame object for PM.
>> Saving DataFrame object for BG.
>> Done.
>> Warning message:
>> In is.na(ndfdata[["SIGNAL"]]) :
>>   is.na() applied to non-(list or vector) of type 'NULL'
>>
>> Despite this warning message, a pdinfo package directory was created in my
>> destination folder, with 4 subdirectories: c("data", "inst", "man", "R"),
>> sub-subdirectories in "inst": c("extdata", "Unit Tests"), and two text files
>> in the main directory: c("DESCRIPTION", "NAMESPACE").
>> Before using "oligo", if possible, I wanted to confirm with you that this
>> package is viable to use with "oligo", although a warning message that may
>> not pertain to my custom designed microarray was printed.
>>
>> Regards,
>> Franklin
>>
>> ________________________________________
>> From: Johnson, Franklin Theodore
>> Sent: Friday, June 07, 2013 10:39 AM
>> To: Benilton Carvalho
>> Cc: bioconductor at r-project.org
>> Subject: RE: [BioC] PAIR files -- feature set table
>>
>> Resending to bioconductor message thread:
>>
>> Dear Dr. Carvalho,
>> Thanks for the response.
>> As you suggested, I will look into the merge function using "Probe_ID".
>> After reading in the data, I will start here: merge.datasets(dataset1,
>> dataset2, by="key").
>> Best Regards,
>> Franklin
>>
>> ________________________________________
>> From: Benilton Carvalho [beniltoncarvalho at gmail.com]
>> Sent: Thursday, June 06, 2013 8:11 PM
>> To: Johnson, Franklin Theodore
>> Cc: bioconductor at r-project.org; franklin.johnson at wsu.edu
>> Subject: Re: [BioC] PAIR files -- feature set table
>>
>> You will need to merge the PAIR and the NDF using the PROBE_ID column
>> as key. This will allow you to get the X/Y coordinates needed to
>> create the XYS as described on the other messages.
>>
>> Regarding annotation, you may need to contact NimbleGen to request
>> this information directly from them...
>>
>> benilton
>>
>> 2013/6/6 Johnson, Franklin Theodore <franklin.johnson at email.wsu.edu>:
>>> Dear Dr. Carvalho,
>>>
>>> Muchas gracias for the reply.
>>>
>>> Actually, this is what my .ndf file looks like:
>>>> head(ndf)
>>> PROBE_DESIGN_ID CONTAINER DESIGN_NOTE SELECTION_CRITERIA SEQ_ID
>>> 1 7552_0343_0009 Duplicate_1
>>> 2 7552_0345_0009 Duplicate_2
>>> 3 7552_0347_0009 Duplicate_1
>>> 4 7552_0349_0009 Duplicate_2
>>> 5 7552_0351_0009 Duplicate_2
>>> 6 7552_0353_0009 Duplicate_1
>>> PROBE_SEQUENCE MISMATCH MATCH_INDEX FEATURE_ID ROW_NUM COL_NUM PROBE_CLASS
>>> 1 cttgactcttctaagttcaaaggtaactcaagtgaagctgtcagatatgatccttcca 0 64535488 64535488 9 343
>>> 2 cccaagcattaaaccttactcatatacttataatgcagccatcaagagtttgtgcaagg 0 64799310 64799310 9 345
>>> 3 agggaggctgaaagagagagtgaatggtccagctgggcataattgctgca 0 64476989 64476989 9 347
>>> 4 ttgttggtgggggtgttgcccttagtaccccagaccttgaagcagttaaa 0 64862794 64862794 9 349
>>> 5 gtgtggggccccctttctttaactggaacctttctttgaagcaatttggg 0 64832726 64832726 9 351
>>> 6 ttgtccaattccaacatgccgagacggcagggattgtgatcgtgttgttc 0 64435686 64435686 9 353
>>> PROBE_ID POSITION DESIGN_ID X Y
>>> 1 Contig19819_1_f_28_10_535 0 7552 343 9
>>> 2 Malus_CN899188_2_f_147_1_755 0 7552 345 9
>>> 3 Contig20738_8_r_1179_2_1432 0 7552 347 9
>>> 4 Malus_CN880097_2_r_336_2_536 0 7552 349 9
>>> 5 Malus_CN918117_2_f_632_1_781 0 7552 351 9
>>> 6 Contig1991_1_f_71_2_1239 0 7552 353 9
>>>
>>> The pair files, .532 pair files only (one-color arrays), contain only the
>>> probe ID and signal, after some text at the top describing the experiment.
>>> My real issue is that I can further normalize and analyze the RMA files with
>>> sva, limma, etc. However, I cannot annotate the probes without the array
>>> annotation, because the duplicates present in the ndf file have been removed
>>> from the RMA.pair files available on NCBI/GEO. So they do not match in any
>>> annotation package I have tried to build.
>>> So I tried to go back and start from the raw pair files... this custom
>>> array is really a "custom" array without NimbleScan.
>>>
>>> Salud,
>>> Franklin
written 4.5 years ago by Benilton Carvalho
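The merge suggested earlier in the thread (join PAIR and NDF on PROBE_ID to recover the X/Y coordinates) can be sketched roughly as follows. This is only an illustration on made-up toy tables, not the maintainer's code: the data frames `ndf`, `pair`, `merged` and `xys` and all their values are hypothetical, and it assumes each PROBE_ID appears once in each table.

```r
# Toy stand-ins for an NDF (design) table and a PAIR (intensity) table
ndf  <- data.frame(PROBE_ID = c("p1", "p2", "p3"),
                   X = c(343, 345, 347), Y = c(9, 9, 9),
                   stringsAsFactors = FALSE)
pair <- data.frame(PROBE_ID = c("p3", "p1", "p2"),
                   PM = c(810.2, 1200.5, 95.0),
                   stringsAsFactors = FALSE)

# Join on PROBE_ID so every intensity picks up its X/Y coordinates
merged <- merge(pair, ndf[, c("PROBE_ID", "X", "Y")], by = "PROBE_ID")

# Arrange as X, Y, SIGNAL, COUNT, ordered by Y and then X
xys <- data.frame(X = merged$X, Y = merged$Y,
                  SIGNAL = merged$PM, COUNT = 1L)
xys <- xys[order(xys$Y, xys$X), ]
rownames(xys) <- NULL
xys
```

Two caveats apply to real data. If both tables carry X and Y columns, merge() keeps both copies and suffixes them as X.x/Y.x and X.y/Y.y, which matches the column names Franklin reports. And if PROBE_ID values are duplicated in the NDF, the join returns one row per matching pair, so the row count should be checked before writing anything out.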
Franklin,

my impression is that your conversion from PAIR to XYS was not successful. So, I created a script for this task, which I will include on the next release.... I give the code below for your convenience, but I also describe what I did, so others can benefit from this.

1) Downloaded the data you refer to (Series: GSE24523, Platform: GPL11164)

2) Modified the NDF by swapping the column names SEQ_ID and PROBE_ID. This is NOT expected to happen, NOT a rule. The problem is that oligo summarizes to the "SEQ_ID"-level (by the documentation I have access to, SEQ_ID defines the probesets). By checking the NDF, I noticed that SEQ_ID is pretty much empty and the "probeset" info was stored in PROBE_ID... This is the only reason I swapped the column names. I also renamed the file to its original name: 080501_GDR_Malus_EST-V4_EXP.ndf.

3) Ran the pair2xys script (shown below), using the call (in R):

pair2xys(list.files(pattern='\\.pair$'))

4) Loaded pdInfoBuilder, built the annotation package using:

library(pdInfoBuilder)
seed <- new("NgsExpressionPDInfoPkgSeed",
            ndfFile='080501_GDR_Malus_EST-V4_EXP.ndf',
            xysFile=list.files(patt='xys$')[1],
            author="Benilton Carvalho",
            email="beniltoncarvalho at gmail.com",
            biocViews="AnnotationData",
            genomebuild="Put Build Here",
            organism="Put Organism Here",
            species="Put Species Here",
            url="Put URL here")
makePdInfoPackage(seed, destDir=".")

5) Installed the resulting package using:

install.packages('pd.080501.gdr.malus.est.v4.exp', type='source', repo=NULL)

6) Loaded oligo, read the just-created XYS files and applied RMA using:

library(oligo)
xys = list.xysfiles()
rawData = read.xysfiles(xys)
res = rma(rawData)

7) Appreciated a resulting ExpressionSet with 193586 features and 24 samples...

The code is shown right below my "signature"... and after the code, I also show the log for my R session starting at Step 4.
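An aside on step 2, for readers who need to do the same thing: the check and the swap can be sketched in base R as below. This is a toy illustration with made-up values rather than the actual NDF, and the object names `ndf`, `nm` and `swapped` are hypothetical.

```r
# Toy NDF in which SEQ_ID is empty and PROBE_ID carries the target names
ndf <- data.frame(SEQ_ID   = c("", "", "", ""),
                  PROBE_ID = c("Contig1", "Contig1", "Contig2", "Contig3"),
                  X = c(343, 345, 347, 349), Y = c(9, 9, 9, 9),
                  stringsAsFactors = FALSE)

# How many distinct "targets" would each column give?
sapply(ndf[c("SEQ_ID", "PROBE_ID")], function(x) length(unique(x)))

# Swap only the two column names; the data stay where they are
swapped <- ndf
nm <- names(swapped)
nm[match(c("SEQ_ID", "PROBE_ID"), nm)] <- c("PROBE_ID", "SEQ_ID")
names(swapped) <- nm

length(unique(swapped$SEQ_ID))
```

A very small count for SEQ_ID next to a large one for PROBE_ID is the pattern described in step 2. As advised earlier in the thread, keep a backup copy of the original NDF before writing a modified one.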
best,
b

## CONVERSION TOOL - Benilton Carvalho - June/2013
pair2xys <- function(pairFiles, outdir=getwd(), verbose=TRUE){
    if (verbose) message('Output directory: ', outdir)
    for (pairFile in pairFiles){
        if (verbose) message('Processing ', basename(pairFile))
        ## keep the first line of the PAIR file to reuse as the XYS header
        header <- readLines(pairFile, n=1)
        ## lines starting with '#' are skipped as comments
        pair <- read.delim(pairFile, header=TRUE, sep='\t',
                           stringsAsFactors=FALSE, comment.char='#')
        maxX <- max(pair$X)
        maxY <- max(pair$Y)
        ## full X/Y grid, filled with the PM intensities where available
        xys <- expand.grid(X=1:maxX, Y=1:maxY)
        xys <- merge(xys, pair[, c('X', 'Y', 'PM')], all.x=TRUE)
        ## as posted, this assignment replaces the merged grid above, so
        ## only the rows present in the PAIR file are written out
        xys <- pair[, c('X', 'Y', 'PM')]
        names(xys) <- c('X', 'Y', 'SIGNAL')
        xys$COUNT <- ifelse(is.na(xys$SIGNAL), NA_integer_, 1L)
        xys <- xys[with(xys, order(Y, X)),]
        rownames(xys) <- NULL
        xysFile <- file.path(outdir, gsub('\\.pair$', '\\.xys', basename(pairFile)))
        if (verbose) message('Writing ', basename(xysFile))
        writeLines(header, con=xysFile)
        ## appending a table with column names triggers a warning, hence suppressWarnings
        suppressWarnings(write.table(xys, file=xysFile, sep='\t',
                                     row.names=FALSE, quote=FALSE, append=TRUE))
    }
}
### END CONVERSION TOOL

#### R SESSION
> library(pdInfoBuilder)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from ‘package:stats’:

    xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, as.data.frame, cbind, colnames, duplicated, eval,
    Filter, Find, get, intersect, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rep.int, rownames, sapply, setdiff, sort, table,
    tapply, union, unique, unlist

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.
Carregando pacotes exigidos: RSQLite Carregando pacotes exigidos: DBI Carregando pacotes exigidos: affxparser Carregando pacotes exigidos: oligo Carregando pacotes exigidos: oligoClasses Welcome to oligoClasses version 1.22.0 ====================================================================== ========== Welcome to oligo version 1.24.0 ====================================================================== ========== > seed <- new("NgsExpressionPDInfoPkgSeed", + ndfFile=list.files(patt='ndf$')[1], + xysFile=list.files(patt='xys$')[1], + author="Benilton Carvalho", + email="beniltoncarvalho at gmail.com", + biocViews="AnnotationData", + genomebuild="Put Build Here", + organism="Put Organism Here", + species="Put Species Here", + url="Put URL here") > makePdInfoPackage(seed, destDir=".") ====================================================================== ========== Building annotation package for Nimblegen Expression Array NDF: 080501_GDR_Malus_EST-V4_EXP.ndf XYS: GSM618107_14418002_532.xys ====================================================================== ========== Parsing file: 080501_GDR_Malus_EST-V4_EXP.ndf... OK Parsing file: GSM618107_14418002_532.xys... OK Merging NDF and XYS files... OK Preparing contents for featureSet table... OK Preparing contents for bgfeature table... OK Preparing contents for pmfeature table... OK Creating package in ./pd.080501.gdr.malus.est.v4.exp Inserting 198661 rows into table featureSet... OK Inserting 384232 rows into table pmfeature... OK Inserting 5075 rows into table bgfeature... OK Counting rows in bgfeature Counting rows in featureSet Counting rows in pmfeature Creating index idx_bgfsetid on bgfeature... OK Creating index idx_bgfid on bgfeature... OK Creating index idx_pmfsetid on pmfeature... OK Creating index idx_pmfid on pmfeature... OK Creating index idx_fsfsetid on featureSet... OK Saving DataFrame object for PM. Saving DataFrame object for BG. Done. 
> install.packages('pd.080501.gdr.malus.est.v4.exp', type='source', repo=NULL) * installing *source* package ?pd.080501.gdr.malus.est.v4.exp? ... ** R ** data ** inst ** preparing package for lazy loading ** help *** installing help indices ** building package indices ** testing if installed package can be loaded * DONE (pd.080501.gdr.malus.est.v4.exp) > library(oligo) > xys = list.xysfiles() > xys [1] "GSM618107_14418002_532.xys" "GSM618108_12742302_532.xys" [3] "GSM618109_12743902_532.xys" "GSM618110_12746002_532.xys" [5] "GSM618111_12746102_532.xys" "GSM618112_12782802_532.xys" [7] "GSM618113_12750102_532.xys" "GSM618114_12834702_532.xys" [9] "GSM618115_14460802_532.xys" "GSM618116_12835502_532.xys" [11] "GSM618117_12756402_532.xys" "GSM618118_12756502_532.xys" [13] "GSM618119_12758502_532.xys" "GSM618120_12758402_532.xys" [15] "GSM618121_13325702_532.xys" "GSM618122_12760702_532.xys" [17] "GSM618123_13327302_532.xys" "GSM618124_12765302_532.xys" [19] "GSM618125_12765402_532.xys" "GSM618126_13923502_532.xys" [21] "GSM618127_12781902_532.xys" "GSM618128_12766102_532.xys" [23] "GSM618129_12780402_532.xys" "GSM618130_12782502_532.xys" > rawData = read.xysfiles(xys) Loading required package: pd.080501.gdr.malus.est.v4.exp Platform design info loaded. Checking designs for each XYS file... Done. Allocating memory... Done. Reading GSM618107_14418002_532.xys. Reading GSM618108_12742302_532.xys. Reading GSM618109_12743902_532.xys. Reading GSM618110_12746002_532.xys. Reading GSM618111_12746102_532.xys. Reading GSM618112_12782802_532.xys. Reading GSM618113_12750102_532.xys. Reading GSM618114_12834702_532.xys. Reading GSM618115_14460802_532.xys. Reading GSM618116_12835502_532.xys. Reading GSM618117_12756402_532.xys. Reading GSM618118_12756502_532.xys. Reading GSM618119_12758502_532.xys. Reading GSM618120_12758402_532.xys. Reading GSM618121_13325702_532.xys. Reading GSM618122_12760702_532.xys. Reading GSM618123_13327302_532.xys. Reading GSM618124_12765302_532.xys. 
Reading GSM618125_12765402_532.xys. Reading GSM618126_13923502_532.xys. Reading GSM618127_12781902_532.xys. Reading GSM618128_12766102_532.xys. Reading GSM618129_12780402_532.xys. Reading GSM618130_12782502_532.xys. > res = rma(rawData) Background correcting Normalizing Calculating Expression > res ExpressionSet (storageMode: lockedEnvironment) assayData: 193586 features, 24 samples element names: exprs protocolData rowNames: GSM618107_14418002_532.xys GSM618108_12742302_532.xys ... GSM618130_12782502_532.xys (24 total) varLabels: exprs dates varMetadata: labelDescription channel phenoData rowNames: GSM618107_14418002_532.xys GSM618108_12742302_532.xys ... GSM618130_12782502_532.xys (24 total) varLabels: index varMetadata: labelDescription channel featureData: none experimentData: use 'experimentData(object)' Annotation: pd.080501.gdr.malus.est.v4.exp > sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=pt_BR.utf8 LC_NUMERIC=C [3] LC_TIME=pt_BR.utf8 LC_COLLATE=pt_BR.utf8 [5] LC_MONETARY=pt_BR.utf8 LC_MESSAGES=pt_BR.utf8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=pt_BR.utf8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] pd.080501.gdr.malus.est.v4.exp_0.0.1 pdInfoBuilder_1.24.0 [3] oligo_1.24.0 oligoClasses_1.22.0 [5] affxparser_1.32.1 RSQLite_0.11.4 [7] DBI_0.2-7 Biobase_2.20.0 [9] BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] affyio_1.28.0 BiocInstaller_1.10.2 Biostrings_2.28.0 [4] bit_1.1-10 codetools_0.2-8 ff_2.2-11 [7] foreach_1.4.1 GenomicRanges_1.12.4 IRanges_1.18.1 [10] iterators_1.0.6 preprocessCore_1.22.0 splines_3.0.1 [13] stats4_3.0.1 tools_3.0.1 zlibbioc_1.6.0 > 2013/6/13 Benilton Carvalho <beniltoncarvalho at="" gmail.com="">: > dont worry about that particular warning.... just install the package > and try to read your XYS files. 
> > 2013/6/13 Johnson, Franklin Theodore <franklin.johnson at="" email.wsu.edu="">: >> Dr. Carvalho, >> >> Yes. I see what you mean. >> Switching the columns helped in the FeatureSet table loading inserted more >> that 2 rows: >> >> Inserting 198661 rows into table featureSet... OK >> However, the warning message did print again. >> >> >> Warning message: >> In is.na(ndfdata[["SIGNAL"]]) : >> is.na() applied to non-(list or vector) of type 'NULL' >> >> Below is the output + sessionInfo(), as I upgraded to R 3.0.1. >> >> #Begin R command line code: >> >>> makePdInfoPackage(arrays, destDir = getwd(), unlink=TRUE) >> =================================================================== ====================================================================== ===================== >> >> >> Building annotation package for Nimblegen Expression Array >> NDF: pdinfo_GPL11164.ndf.txt <-new .ndf file with PROBE_ID<->SEQ_ID >> XYS: XYS.txt >> =================================================================== ====================================================================== ===================== >> Parsing file: pdinfo_GPL11164.ndf.txt... OK >> >> Parsing file: XYS.txt... OK >> Merging NDF and XYS files... OK >> Preparing contents for featureSet table... OK >> Preparing contents for bgfeature table... OK >> Preparing contents for pmfeature table... OK >> Creating package in E:/RANDOM/Test/Yanmin's Microarray Paper/Yanmin >> Microarray RAW/pd.pdinfo.gpl11164.ndf.txt >> Inserting 198661 rows into table featureSet... OK >> Inserting 770599 rows into table pmfeature... OK >> >> Counting rows in featureSet >> Counting rows in pmfeature >> Creating index idx_pmfsetid on pmfeature... OK >> Creating index idx_pmfid on pmfeature... OK >> Creating index idx_fsfsetid on featureSet... OK >> Saving DataFrame object for PM. >> Done. 
>> Warning message: >> In is.na(ndfdata[["SIGNAL"]]) : >> is.na() applied to non-(list or vector) of type 'NULL' >> >> >>> sessionInfo() >> R version 3.0.1 (2013-05-16) >> Platform: i386-w64-mingw32/i386 (32-bit) >> >> locale: >> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United >> States.1252 LC_MONETARY=English_United States.1252 >> [4] LC_NUMERIC=C LC_TIME=English_United >> States.1252 >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> base >> >> other attached packages: >> [1] pdInfoBuilder_1.24.0 oligo_1.24.0 oligoClasses_1.22.0 >> affxparser_1.32.1 RSQLite_0.11.4 DBI_0.2-7 >> Biobase_2.20.0 >> [8] BiocGenerics_0.6.0 BiocInstaller_1.10.2 >> >> loaded via a namespace (and not attached): >> [1] affyio_1.28.0 Biostrings_2.28.0 bit_1.1-10 >> codetools_0.2-8 ff_2.2-11 foreach_1.4.1 >> GenomicRanges_1.12.4 >> [8] IRanges_1.18.1 iterators_1.0.6 preprocessCore_1.22.0 >> splines_3.0.1 stats4_3.0.1 tools_3.0.1 >> zlibbioc_1.6.0 >> >> >> >>>q() >> >> >> >> The built pdInfopackage loaded in Destdir is identical to previous message. >> >> However the featureSet table now has more than 2 rows... >> >> Lastly, I did multiple combos, as my merged file has (X.x, Y.x)<-seems to be >> identifiers for the 'probe IDs' on the array as well as (X.y, Y.y) <- seems >> to be the sequence identifiers for the "SEQ_ID". I used X.x, Y.x and PM >> which gave the result I pasted above. All others had errors. I'm close, but >> that Warning Message is annoying... >> >> >> >> Regards, >> >> Franklin >> >> >> Great minds discuss ideas. Average minds discuss events. Small minds discuss >> people. -Eleanor Roosevelt >> >> >> >> >> ________________________________________ >> From: Benilton Carvalho [beniltoncarvalho at gmail.com] >> Sent: Wednesday, June 12, 2013 8:25 PM >> >> To: Johnson, Franklin Theodore >> Cc: bioconductor at r-project.org >> Subject: Re: [BioC] PAIR files -- feature set table >> >> That does not look ok. 
>>
>> The problem is the count for the featureSet table. This table stores
>> the information for "genes" (or whatever the target for this
>> particular array is), so it is unlikely that you have a microarray
>> with only 2 "target units". I'd expect something in the thousands.
>>
>> pdInfoBuilder uses the information in SEQ_ID (in the NDF) to get the
>> target information (i.e., the contents for featureSet).
>>
>> Given that this is a custom array, I believe that the best idea is to
>> contact the person who designed it and ask for more details about the
>> design (in particular, how many probesets and the average number of
>> probes per probeset).
>>
>> I've seen some designs in which the information that was expected to
>> be in SEQ_ID was actually stored in PROBE_ID (in such cases, the user
>> needs to create a backup copy of the NDF, and then move the contents
>> of PROBE_ID to SEQ_ID, and vice versa).
>>
>> b
>>
>> 2013/6/12 Johnson, Franklin Theodore <franklin.johnson at email.wsu.edu>:
>>> Dear Dr. Carvalho,
>>>
>>> Recently, we had correspondence regarding makePdInfoPackage for a
>>> NimbleGen apple microarray.
>>> I was able to merge the NDF design and XYS files using PROBE_ID.
>>> As a reminder, this is a custom array, and there are no SIGNAL==NAs for
>>> control probes.
>>> It seemed to work:
>>> > makePdInfoPackage(seed, destDir(""))
>>> ================================================================================
>>> Building annotation package for Nimblegen Expression Array
>>> NDF: GPL11164.ndf
>>> XYS: XYS.txt
>>> ================================================================================
>>> Parsing file: GPL11164.ndf... OK
>>> Parsing file: XYS.txt... OK
>>> Merging NDF and XYS files... OK
>>> Preparing contents for featureSet table... OK
>>> Preparing contents for bgfeature table... OK
>>> Preparing contents for pmfeature table... OK
>>> Creating package in C:/Users/franklin.johnson.PW50-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/pd.gpl11164
>>> Inserting 2 rows into table featureSet... OK
>>> Inserting 765524 rows into table pmfeature... OK
>>> Inserting 5075 rows into table bgfeature... OK
>>> Counting rows in bgfeature
>>> Counting rows in featureSet
>>> Counting rows in pmfeature
>>> Creating index idx_bgfsetid on bgfeature... OK
>>> Creating index idx_bgfid on bgfeature... OK
>>> Creating index idx_pmfsetid on pmfeature... OK
>>> Creating index idx_pmfid on pmfeature... OK
>>> Creating index idx_fsfsetid on featureSet... OK
>>> Saving DataFrame object for PM.
>>> Saving DataFrame object for BG.
>>> Done.
>>> Warning message:
>>> In is.na(ndfdata[["SIGNAL"]]) :
>>>   is.na() applied to non-(list or vector) of type 'NULL'
>>> >
>>>
>>> Despite this warning message, a pdInfo package directory was created in
>>> my destination folder. It has 4 subdirectories ("data", "inst", "man",
>>> "R"), with "extdata" and "Unit Tests" under "inst", plus two text files
>>> in the main directory ("DESCRIPTION" and "NAMESPACE").
>>> Before using "oligo", I wanted to confirm with you, if possible, that
>>> this package is viable to use with "oligo" even though a warning message
>>> that may not pertain to my custom-designed microarray was printed.
>>>
>>> Regards,
>>> Franklin
>>>
>>> ________________________________________
>>> From: Johnson, Franklin Theodore
>>> Sent: Friday, June 07, 2013 10:39 AM
>>> To: Benilton Carvalho
>>> Cc: bioconductor at r-project.org
>>> Subject: RE: [BioC] PAIR files -- feature set table
>>>
>>> Resending to bioconductor message thread:
>>>
>>> Dear Dr. Carvalho,
>>> Thanks for the response.
>>> As you suggested, I will look into the merge function using "PROBE_ID".
>>> After reading in the data, I will start here: merge(dataset1,
>>> dataset2, by="key").
>>> Best Regards,
>>> Franklin
>>>
>>> ________________________________________
>>> From: Benilton Carvalho [beniltoncarvalho at gmail.com]
>>> Sent: Thursday, June 06, 2013 8:11 PM
>>> To: Johnson, Franklin Theodore
>>> Cc: bioconductor at r-project.org; franklin.johnson at wsu.edu
>>> Subject: Re: [BioC] PAIR files -- feature set table
>>>
>>> You will need to merge the PAIR and the NDF using the PROBE_ID column
>>> as key. This will allow you to get the X/Y coordinates needed to
>>> create the XYS as described in the other messages.
>>>
>>> Regarding annotation, you may need to contact NimbleGen to request
>>> this information directly from them...
>>>
>>> benilton
>>>
>>> 2013/6/6 Johnson, Franklin Theodore <franklin.johnson at email.wsu.edu>:
>>>> Dear Dr. Carvalho,
>>>>
>>>> Many thanks for the reply.
>>>>
>>>> Actually, this is what my .ndf file looks like:
>>>> > head(ndf)
>>>> PROBE_DESIGN_ID CONTAINER DESIGN_NOTE SELECTION_CRITERIA SEQ_ID
>>>> 1 7552_0343_0009 Duplicate_1
>>>> 2 7552_0345_0009 Duplicate_2
>>>> 3 7552_0347_0009 Duplicate_1
>>>> 4 7552_0349_0009 Duplicate_2
>>>> 5 7552_0351_0009 Duplicate_2
>>>> 6 7552_0353_0009 Duplicate_1
>>>> PROBE_SEQUENCE MISMATCH MATCH_INDEX FEATURE_ID ROW_NUM COL_NUM PROBE_CLASS
>>>> 1 cttgactcttctaagttcaaaggtaactcaagtgaagctgtcagatatgatccttcca 0 64535488 64535488 9 343
>>>> 2 cccaagcattaaaccttactcatatacttataatgcagccatcaagagtttgtgcaagg 0 64799310 64799310 9 345
>>>> 3 agggaggctgaaagagagagtgaatggtccagctgggcataattgctgca 0 64476989 64476989 9 347
>>>> 4 ttgttggtgggggtgttgcccttagtaccccagaccttgaagcagttaaa 0 64862794 64862794 9 349
>>>> 5 gtgtggggccccctttctttaactggaacctttctttgaagcaatttggg 0 64832726 64832726 9 351
>>>> 6 ttgtccaattccaacatgccgagacggcagggattgtgatcgtgttgttc 0 64435686 64435686 9 353
>>>> PROBE_ID POSITION DESIGN_ID X Y
>>>> 1 Contig19819_1_f_28_10_535 0 7552 343 9
>>>> 2 Malus_CN899188_2_f_147_1_755 0 7552 345 9
>>>> 3 Contig20738_8_r_1179_2_1432 0 7552 347 9
>>>> 4 Malus_CN880097_2_r_336_2_536 0 7552 349 9
>>>> 5 Malus_CN918117_2_f_632_1_781 0 7552 351 9
>>>> 6 Contig1991_1_f_71_2_1239 0 7552 353 9
>>>>
>>>> The pair files (.532 pair files only, i.e. one-color arrays) contain only
>>>> the probe ID and signal, after some text at the top describing the
>>>> experiment.
>>>> My real issue is that I can further normalize and analyze the RMA files
>>>> with sva, limma, etc. However, I cannot annotate the probes without the
>>>> array annotation, as there are duplicates in the ndf file which are
>>>> removed in the RMA.pair files available on NCBI/GEO. So they do not match
>>>> in any annotation package I have tried, and failed, to build.
>>>> So I tried to go back and start from the raw pair files... this custom
>>>> array is really a "custom" array without NimbleScan.
>>>>
>>>> Cheers,
>>>> Franklin
>>>>
>>>> ________________________________________
>>>> From: Benilton Carvalho [beniltoncarvalho at gmail.com]
>>>> Sent: Wednesday, June 05, 2013 6:42 PM
>>>> To: FRANKLIN JOHNSON [guest]
>>>> Cc: bioconductor at r-project.org; franklin.johnson at wsu.edu; pdInfoBuilder Maintainer
>>>> Subject: Re: [BioC] PAIR files -- feature set table
>>>>
>>>> It's an unfortunate mistake to have the pairFile *argument* in the
>>>> call (not in the slots section, but I see your point). :-( I'll make
>>>> sure that this is fixed.
>>>>
>>>> You need to convert the PAIR files to XYS...
>>>>
>>>> Some refs that should help you in the process:
>>>>
>>>> https://stat.ethz.ch/pipermail/bioconductor/2012-January/043186.html
>>>>
>>>> http://comments.gmane.org/gmane.science.biology.informatics.conductor/27547
>>>>
>>>> b
>>>>
>>>> 2013/6/5 FRANKLIN JOHNSON [guest] <guest at bioconductor.org>:
>>>>>
>>>>> Dear Maintainer,
>>>>>
>>>>> I downloaded the available NimbleGen 'single channel' 532.PAIR files for
>>>>> a custom-built expression microarray from NCBI/GEO (GPL11164). However, I
>>>>> get an error message when I try to make the annotation for this platform
>>>>> using pdInfoBuilder.
>>>>>
>>>>> In the pdInfoBuilder Reference Manual (June 5, 2013), under the
>>>>> NgsExpressionPDInfoPkgSeed method, there is a slot for pairFile, although
>>>>> showClasses("Ngs..") does not show a slot for this, only XYS. Thus, I
>>>>> changed the .pair file extension to .xys.
>>>>>
>>>>> (ndf<- list.files(getwd(), pattern=".ndf", full.names=TRUE)) # read annotation file
>>>>> [1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/GPL11164.ndf"
>>>>>
>>>>> (xys <- list.files(getwd(), pattern = ".xys", full.names = TRUE)[1])
>>>>> [1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/GSM618107_14418002_532.xys"
>>>>>
>>>>> But doing this resulted in an error message:
>>>>> seed <- new("NgsExpressionPDInfoPkgSeed", ndfFile = ndf, xysFile = xys,
>>>>>     author = "FJ", organism = "Apple", species = "Malus x Domestica cv.GD")
>>>>>
>>>>> makePdInfoPackage(arrays, destDir = getwd())
>>>>> ================================================================================
>>>>> Building annotation package for Nimblegen Expression Array
>>>>> NDF: GPL11164.ndf
>>>>> XYS: GSM618107_14418002_532.xys
>>>>> ================================================================================
>>>>> Parsing file: GPL11164.ndf... OK
>>>>> Parsing file: GSM618107_14418002_532.xys... OK
>>>>> Merging NDF and XYS files... OK
>>>>> Preparing contents for featureSet table... Error in
>>>>> `[.data.frame`(ndfdata, , colsFS) : undefined columns selected
>>>>> In addition: Warning message:
>>>>> In is.na(ndfdata[["SIGNAL"]]) :
>>>>>   is.na() applied to non-(list or vector) of type 'NULL'
>>>>>
>>>>> The only files available from NCBI/GEO are 24 PAIR files and 1 ndf. It
>>>>> seems .xys has a different arrangement than .pair, thus .ndf is not
>>>>> applicable to annotate the .pair file? Any suggestions?
>>>>> Hope to hear from you soon.
>>>>> Franklin
>>>>>
>>>>> -- output of sessionInfo():
>>>>>
>>>>> > sessionInfo()
>>>>> R version 3.0.1 (2013-05-16)
>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
>>>>> [4] LC_NUMERIC=C LC_TIME=English_United States.1252
>>>>>
>>>>> attached base packages:
>>>>> [1] tcltk grid parallel stats graphics grDevices utils datasets methods base
>>>>>
>>>>> other attached packages:
>>>>> [1] pdInfoBuilder_1.24.0 oligo_1.24.0 oligoClasses_1.22.0 affxparser_1.32.1 RSQLite_0.11.4 DBI_0.2-7
>>>>> [7] Mfuzz_2.18.0 DynDoc_1.38.0 widgetTools_1.38.0 e1071_1.6-1 class_7.3-7 gplots_2.11.0.1
>>>>> [13] KernSmooth_2.23-10 caTools_1.14 gdata_2.12.0.2 gtools_2.7.1 timecourse_1.32.0 MASS_7.3-26
>>>>> [19] Biobase_2.20.0 BiocGenerics_0.6.0 limma_3.16.5 ggplot2_0.9.3.1 BiocInstaller_1.10.1
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] affyio_1.28.0 Biostrings_2.28.0 bit_1.1-10 bitops_1.0-5 codetools_0.2-8 colorspace_1.2-2
>>>>> [7] dichromat_2.0-0 digest_0.6.3 ff_2.2-11 foreach_1.4.0 GenomicRanges_1.12.4 gtable_0.1.2
>>>>> [13] IRanges_1.18.1 iterators_1.0.6 labeling_0.1 marray_1.38.0 munsell_0.4 plyr_1.8
>>>>> [19] preprocessCore_1.22.0 proto_0.3-10 RColorBrewer_1.0-5 reshape2_1.2.2 scales_0.2.3 splines_3.0.1
>>>>> [25] stats4_3.0.1 stringr_0.6.2 tkWidgets_1.38.0 tools_3.0.1 zlibbioc_1.6.0
>>>>>
>>>>> --
>>>>> Sent via the guest posting facility at bioconductor.org.
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
written 4.5 years ago by Benilton Carvalho
Dr. Carvalho,

Thanks for the email. I'm insignificant, but you are truly a gentleman, a scholar, and a brilliant Spaniard. I hope to cite you and your efforts in my future publications.

Very humbly,
Franklin Johnson

________________________________________
From: Benilton Carvalho [beniltoncarvalho@gmail.com]
Sent: Thursday, June 20, 2013 8:45 AM
To: Johnson, Franklin Theodore
Cc: bioconductor at r-project.org
Subject: Re: [BioC] PAIR files -- feature set table

Franklin,

my impression is that your conversion from PAIR to XYS was not successful. So I created a script for this task, which I will include in the next release. I give the code below for your convenience, but I also describe what I did, so others can benefit from this.

1) Downloaded the data you refer to (Series: GSE24523, Platform: GPL11164).

2) Modified the NDF by swapping the column names SEQ_ID and PROBE_ID. This is NOT expected to happen and is NOT a rule. The problem is that oligo summarizes to the "SEQ_ID" level (by the documentation I have access to, SEQ_ID defines the probesets). By checking the NDF, I noticed that SEQ_ID is pretty much empty and the "probeset" info was stored in PROBE_ID. This is the only reason I swapped the column names. I also renamed the file to its original name: 080501_GDR_Malus_EST-V4_EXP.ndf.

3) Ran the pair2xys script (shown below), using the call (in R):

    pair2xys(list.files(pattern='\\.pair$'))

4) Loaded pdInfoBuilder and built the annotation package using:

    library(pdInfoBuilder)
    seed <- new("NgsExpressionPDInfoPkgSeed",
                ndfFile='080501_GDR_Malus_EST-V4_EXP.ndf',
                xysFile=list.files(pattern='xys$')[1],
                author="Benilton Carvalho",
                email="beniltoncarvalho at gmail.com",
                biocViews="AnnotationData",
                genomebuild="Put Build Here",
                organism="Put Organism Here",
                species="Put Species Here",
                url="Put URL here")
    makePdInfoPackage(seed, destDir=".")

5) Installed the resulting package using:

    install.packages('pd.080501.gdr.malus.est.v4.exp', type='source', repos=NULL)

6) Loaded oligo, read the just-created XYS files and applied RMA using:

    library(oligo)
    xys = list.xysfiles()
    rawData = read.xysfiles(xys)
    res = rma(rawData)

7) Appreciated a resulting ExpressionSet with 193586 features and 24 samples.

The code is shown right below my "signature", and after the code I also show the log for my R session starting at Step 4.
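
The column swap described in step 2 can be sketched in a few lines of R. This is only an illustration under stated assumptions: `swap_ndf_ids` is a hypothetical helper (it is not part of pdInfoBuilder or oligo), and the toy data frame is loosely modelled on the `head(ndf)` output quoted in this thread, not on the real NDF.

```r
# Hypothetical helper (not part of pdInfoBuilder): swap the SEQ_ID and
# PROBE_ID column *names* of an NDF table, leaving the data untouched.
swap_ndf_ids <- function(ndf) {
  stopifnot(all(c("SEQ_ID", "PROBE_ID") %in% names(ndf)))
  nms <- names(ndf)
  seq_col   <- which(nms == "SEQ_ID")
  probe_col <- which(nms == "PROBE_ID")
  nms[seq_col]   <- "PROBE_ID"
  nms[probe_col] <- "SEQ_ID"
  names(ndf) <- nms
  ndf
}

# Toy NDF (made-up values): SEQ_ID carries very few distinct values, while
# PROBE_ID holds the identifiers that look like targets.
ndf <- data.frame(
  SEQ_ID   = c("Duplicate_1", "Duplicate_2", "Duplicate_1"),
  PROBE_ID = c("Contig19819_1_f_28_10_535",
               "Malus_CN899188_2_f_147_1_755",
               "Contig20738_8_r_1179_2_1432"),
  X = c(343L, 345L, 347L),
  Y = c(9L, 9L, 9L),
  stringsAsFactors = FALSE
)

# A very small number of distinct SEQ_ID values is the symptom that led to
# "Inserting 2 rows into table featureSet" earlier in the thread.
n_before <- length(unique(ndf$SEQ_ID))      # 2
swapped  <- swap_ndf_ids(ndf)
n_after  <- length(unique(swapped$SEQ_ID))  # 3
```

For a real file, one would presumably read the NDF with `read.delim(..., stringsAsFactors=FALSE)`, keep a backup copy, apply the swap, and write the result back with `write.table(..., sep='\t', quote=FALSE, row.names=FALSE)`. As noted in step 2, the swap is specific to this design and should not be applied blindly to other NDFs.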

best,
b

    ## CONVERSION TOOL - Benilton Carvalho - June/2013
    pair2xys <- function(pairFiles, outdir=getwd(), verbose=TRUE){
        if (verbose) message('Output directory: ', outdir)
        for (pairFile in pairFiles){
            if (verbose) message('Processing ', basename(pairFile))
            ## the first line of the PAIR file is reused as the XYS header
            header <- readLines(pairFile, n=1)
            pair <- read.delim(pairFile, header=TRUE, sep='\t',
                               stringsAsFactors=FALSE, comment.char='#')
            maxX <- max(pair$X)
            maxY <- max(pair$Y)
            ## full X/Y grid, filled with PM where the PAIR file has a value
            xys <- expand.grid(X=1:maxX, Y=1:maxY)
            xys <- merge(xys, pair[, c('X', 'Y', 'PM')], all.x=TRUE)
            ## NOTE: as posted, the next line replaces the merged grid, so
            ## only positions present in the PAIR file are kept
            xys <- pair[, c('X', 'Y', 'PM')]
            names(xys) <- c('X', 'Y', 'SIGNAL')
            xys$COUNT <- ifelse(is.na(xys$SIGNAL), NA_integer_, 1L)
            xys <- xys[with(xys, order(Y, X)),]
            rownames(xys) <- NULL
            xysFile <- file.path(outdir, sub('\\.pair$', '.xys', basename(pairFile)))
            if (verbose) message('Writing ', basename(xysFile))
            writeLines(header, con=xysFile)
            ## appending a table with column names triggers a warning
            suppressWarnings(write.table(xys, file=xysFile, sep='\t',
                                         row.names=FALSE, quote=FALSE, append=TRUE))
        }
    }
    ### END CONVERSION TOOL

#### R SESSION

    > library(pdInfoBuilder)
    Loading required package: Biobase
    Loading required package: BiocGenerics
    Loading required package: parallel

    Attaching package: ‘BiocGenerics’

    The following objects are masked from ‘package:parallel’:

        clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
        clusterExport, clusterMap, parApply, parCapply, parLapply,
        parLapplyLB, parRapply, parSapply, parSapplyLB

    The following object is masked from ‘package:stats’:

        xtabs

    The following objects are masked from ‘package:base’:

        anyDuplicated, as.data.frame, cbind, colnames, duplicated, eval,
        Filter, Find, get, intersect, lapply, Map, mapply, match, mget,
        order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
        rbind, Reduce, rep.int, rownames, sapply, setdiff, sort, table,
        tapply, union, unique, unlist

    Welcome to Bioconductor

        Vignettes contain introductory material; view with
        'browseVignettes()'. To cite Bioconductor, see
        'citation("Biobase")', and for packages 'citation("pkgname")'.

    Loading required package: RSQLite
    Loading required package: DBI
    Loading required package: affxparser
    Loading required package: oligo
    Loading required package: oligoClasses
    Welcome to oligoClasses version 1.22.0
    ================================================================================
    Welcome to oligo version 1.24.0
    ================================================================================
    > seed <- new("NgsExpressionPDInfoPkgSeed",
    +             ndfFile=list.files(patt='ndf$')[1],
    +             xysFile=list.files(patt='xys$')[1],
    +             author="Benilton Carvalho",
    +             email="beniltoncarvalho at gmail.com",
    +             biocViews="AnnotationData",
    +             genomebuild="Put Build Here",
    +             organism="Put Organism Here",
    +             species="Put Species Here",
    +             url="Put URL here")
    > makePdInfoPackage(seed, destDir=".")
    ================================================================================
    Building annotation package for Nimblegen Expression Array
    NDF: 080501_GDR_Malus_EST-V4_EXP.ndf
    XYS: GSM618107_14418002_532.xys
    ================================================================================
    Parsing file: 080501_GDR_Malus_EST-V4_EXP.ndf... OK
    Parsing file: GSM618107_14418002_532.xys... OK
    Merging NDF and XYS files... OK
    Preparing contents for featureSet table... OK
    Preparing contents for bgfeature table... OK
    Preparing contents for pmfeature table... OK
    Creating package in ./pd.080501.gdr.malus.est.v4.exp
    Inserting 198661 rows into table featureSet... OK
    Inserting 384232 rows into table pmfeature... OK
    Inserting 5075 rows into table bgfeature... OK
    Counting rows in bgfeature
    Counting rows in featureSet
    Counting rows in pmfeature
    Creating index idx_bgfsetid on bgfeature... OK
    Creating index idx_bgfid on bgfeature... OK
    Creating index idx_pmfsetid on pmfeature... OK
    Creating index idx_pmfid on pmfeature... OK
    Creating index idx_fsfsetid on featureSet... OK
    Saving DataFrame object for PM.
    Saving DataFrame object for BG.
    Done.
    > install.packages('pd.080501.gdr.malus.est.v4.exp', type='source', repo=NULL)
    * installing *source* package ‘pd.080501.gdr.malus.est.v4.exp’ ...
    ** R
    ** data
    ** inst
    ** preparing package for lazy loading
    ** help
    *** installing help indices
    ** building package indices
    ** testing if installed package can be loaded
    * DONE (pd.080501.gdr.malus.est.v4.exp)
    > library(oligo)
    > xys = list.xysfiles()
    > xys
     [1] "GSM618107_14418002_532.xys" "GSM618108_12742302_532.xys"
     [3] "GSM618109_12743902_532.xys" "GSM618110_12746002_532.xys"
     [5] "GSM618111_12746102_532.xys" "GSM618112_12782802_532.xys"
     [7] "GSM618113_12750102_532.xys" "GSM618114_12834702_532.xys"
     [9] "GSM618115_14460802_532.xys" "GSM618116_12835502_532.xys"
    [11] "GSM618117_12756402_532.xys" "GSM618118_12756502_532.xys"
    [13] "GSM618119_12758502_532.xys" "GSM618120_12758402_532.xys"
    [15] "GSM618121_13325702_532.xys" "GSM618122_12760702_532.xys"
    [17] "GSM618123_13327302_532.xys" "GSM618124_12765302_532.xys"
    [19] "GSM618125_12765402_532.xys" "GSM618126_13923502_532.xys"
    [21] "GSM618127_12781902_532.xys" "GSM618128_12766102_532.xys"
    [23] "GSM618129_12780402_532.xys" "GSM618130_12782502_532.xys"
    > rawData = read.xysfiles(xys)
    Loading required package: pd.080501.gdr.malus.est.v4.exp
    Platform design info loaded.
    Checking designs for each XYS file... Done.
    Allocating memory... Done.
    Reading GSM618107_14418002_532.xys.
    Reading GSM618108_12742302_532.xys.
    Reading GSM618109_12743902_532.xys.
    Reading GSM618110_12746002_532.xys.
    Reading GSM618111_12746102_532.xys.
    Reading GSM618112_12782802_532.xys.
    Reading GSM618113_12750102_532.xys.
    Reading GSM618114_12834702_532.xys.
    Reading GSM618115_14460802_532.xys.
    Reading GSM618116_12835502_532.xys.
    Reading GSM618117_12756402_532.xys.
    Reading GSM618118_12756502_532.xys.
    Reading GSM618119_12758502_532.xys.
    Reading GSM618120_12758402_532.xys.
    Reading GSM618121_13325702_532.xys.
    Reading GSM618122_12760702_532.xys.
    Reading GSM618123_13327302_532.xys.
    Reading GSM618124_12765302_532.xys.
    Reading GSM618125_12765402_532.xys.
    Reading GSM618126_13923502_532.xys.
    Reading GSM618127_12781902_532.xys.
    Reading GSM618128_12766102_532.xys.
    Reading GSM618129_12780402_532.xys.
    Reading GSM618130_12782502_532.xys.
    > res = rma(rawData)
    Background correcting
    Normalizing
    Calculating Expression
    > res
    ExpressionSet (storageMode: lockedEnvironment)
    assayData: 193586 features, 24 samples
      element names: exprs
    protocolData
      rowNames: GSM618107_14418002_532.xys GSM618108_12742302_532.xys ...
        GSM618130_12782502_532.xys (24 total)
      varLabels: exprs dates
      varMetadata: labelDescription channel
    phenoData
      rowNames: GSM618107_14418002_532.xys GSM618108_12742302_532.xys ...
        GSM618130_12782502_532.xys (24 total)
      varLabels: index
      varMetadata: labelDescription channel
    featureData: none
    experimentData: use 'experimentData(object)'
    Annotation: pd.080501.gdr.malus.est.v4.exp
    > sessionInfo()
    R version 3.0.1 (2013-05-16)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=pt_BR.utf8 LC_NUMERIC=C
     [3] LC_TIME=pt_BR.utf8 LC_COLLATE=pt_BR.utf8
     [5] LC_MONETARY=pt_BR.utf8 LC_MESSAGES=pt_BR.utf8
     [7] LC_PAPER=C LC_NAME=C
     [9] LC_ADDRESS=C LC_TELEPHONE=C
    [11] LC_MEASUREMENT=pt_BR.utf8 LC_IDENTIFICATION=C

    attached base packages:
    [1] parallel stats graphics grDevices utils datasets methods
    [8] base

    other attached packages:
    [1] pd.080501.gdr.malus.est.v4.exp_0.0.1 pdInfoBuilder_1.24.0
    [3] oligo_1.24.0 oligoClasses_1.22.0
    [5] affxparser_1.32.1 RSQLite_0.11.4
    [7] DBI_0.2-7 Biobase_2.20.0
    [9] BiocGenerics_0.6.0

    loaded via a namespace (and not attached):
     [1] affyio_1.28.0 BiocInstaller_1.10.2 Biostrings_2.28.0
     [4] bit_1.1-10 codetools_0.2-8 ff_2.2-11
     [7] foreach_1.4.1 GenomicRanges_1.12.4 IRanges_1.18.1
    [10] iterators_1.0.6 preprocessCore_1.22.0 splines_3.0.1
    [13] stats4_3.0.1 tools_3.0.1 zlibbioc_1.6.0
    >

2013/6/13 Benilton Carvalho <beniltoncarvalho at gmail.com>:
> Don't worry about that particular warning. Just install the package
> and try to read your XYS files.
> > 2013/6/13 Johnson, Franklin Theodore <franklin.johnson at="" email.wsu.edu="">: >> Dr. Carvalho, >> >> Yes. I see what you mean. >> Switching the columns helped in the FeatureSet table loading inserted more >> that 2 rows: >> >> Inserting 198661 rows into table featureSet... OK >> However, the warning message did print again. >> >> >> Warning message: >> In is.na(ndfdata[["SIGNAL"]]) : >> is.na() applied to non-(list or vector) of type 'NULL' >> >> Below is the output + sessionInfo(), as I upgraded to R 3.0.1. >> >> #Begin R command line code: >> >>> makePdInfoPackage(arrays, destDir = getwd(), unlink=TRUE) >> =================================================================== ====================================================================== ===================== >> >> >> Building annotation package for Nimblegen Expression Array >> NDF: pdinfo_GPL11164.ndf.txt <-new .ndf file with PROBE_ID<->SEQ_ID >> XYS: XYS.txt >> =================================================================== ====================================================================== ===================== >> Parsing file: pdinfo_GPL11164.ndf.txt... OK >> >> Parsing file: XYS.txt... OK >> Merging NDF and XYS files... OK >> Preparing contents for featureSet table... OK >> Preparing contents for bgfeature table... OK >> Preparing contents for pmfeature table... OK >> Creating package in E:/RANDOM/Test/Yanmin's Microarray Paper/Yanmin >> Microarray RAW/pd.pdinfo.gpl11164.ndf.txt >> Inserting 198661 rows into table featureSet... OK >> Inserting 770599 rows into table pmfeature... OK >> >> Counting rows in featureSet >> Counting rows in pmfeature >> Creating index idx_pmfsetid on pmfeature... OK >> Creating index idx_pmfid on pmfeature... OK >> Creating index idx_fsfsetid on featureSet... OK >> Saving DataFrame object for PM. >> Done. 
>> Warning message: >> In is.na(ndfdata[["SIGNAL"]]) : >> is.na() applied to non-(list or vector) of type 'NULL' >> >> >>> sessionInfo() >> R version 3.0.1 (2013-05-16) >> Platform: i386-w64-mingw32/i386 (32-bit) >> >> locale: >> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United >> States.1252 LC_MONETARY=English_United States.1252 >> [4] LC_NUMERIC=C LC_TIME=English_United >> States.1252 >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> base >> >> other attached packages: >> [1] pdInfoBuilder_1.24.0 oligo_1.24.0 oligoClasses_1.22.0 >> affxparser_1.32.1 RSQLite_0.11.4 DBI_0.2-7 >> Biobase_2.20.0 >> [8] BiocGenerics_0.6.0 BiocInstaller_1.10.2 >> >> loaded via a namespace (and not attached): >> [1] affyio_1.28.0 Biostrings_2.28.0 bit_1.1-10 >> codetools_0.2-8 ff_2.2-11 foreach_1.4.1 >> GenomicRanges_1.12.4 >> [8] IRanges_1.18.1 iterators_1.0.6 preprocessCore_1.22.0 >> splines_3.0.1 stats4_3.0.1 tools_3.0.1 >> zlibbioc_1.6.0 >> >> >> >>>q() >> >> >> >> The built pdInfopackage loaded in Destdir is identical to previous message. >> >> However the featureSet table now has more than 2 rows... >> >> Lastly, I did multiple combos, as my merged file has (X.x, Y.x)<-seems to be >> identifiers for the 'probe IDs' on the array as well as (X.y, Y.y) <- seems >> to be the sequence identifiers for the "SEQ_ID". I used X.x, Y.x and PM >> which gave the result I pasted above. All others had errors. I'm close, but >> that Warning Message is annoying... >> >> >> >> Regards, >> >> Franklin >> >> >> Great minds discuss ideas. Average minds discuss events. Small minds discuss >> people. -Eleanor Roosevelt >> >> >> >> >> ________________________________________ >> From: Benilton Carvalho [beniltoncarvalho at gmail.com] >> Sent: Wednesday, June 12, 2013 8:25 PM >> >> To: Johnson, Franklin Theodore >> Cc: bioconductor at r-project.org >> Subject: Re: [BioC] PAIR files -- feature set table >> >> That does not look ok. 
>> >> The problem is the count for the featureSet table... This table stores >> the information for "genes" (or whatever the target for this >> particular array is)... so, it is unlikely that you have a microarray >> with only 2 "target units"... I'd expect something around the >> thousands... >> >> pdInfoBuilder uses the information in SEQ_ID (in the NDF) to get the >> target information (i.e., the contents for featureSet). >> >> Given that this is a custom array, I believe that the best idea is to >> contact the person who designed it and ask more details about the >> design (in particular, how many probesets and average number of probes >> per probeset)... >> >> I've seen some designs in which the information that was expected to >> be in SEQ_ID was actually stored in PROBE_ID (in such cases, the user >> needs to create a backup copy of the NDF, and then move the contents >> of PROBE_ID to SEQ_ID - and vice-versa). >> >> b >> >> 2013/6/12 Johnson, Franklin Theodore <franklin.johnson at="" email.wsu.edu="">: >>> Dear Dr. Carvalho, >>> >>> Recently, we had cooresponence regaring makePDInfoPackage for an NimbleGen >>> apple microarray. >>> I was able to merge the ndf design and XYS files using PROBE_ID. >>> As a reminder this is a custom array, and there are no SIGNAL==NAs for >>> control probes. >>> It seemed to work: >>>> makePdInfoPackage(seed, destDir("")) >>> >>> ================================================================== ====================================================================== ==================== >>> Building annotation package for Nimblegen Expression Array >>> NDF: GPL11164.ndf >>> XYS: XYS.txt >>> >>> ================================================================== ====================================================================== ==================== >>> Parsing file: GPL11164.ndf... OK >>> Parsing file: XYS.txt... OK >>> Merging NDF and XYS files... OK >>> Preparing contents for featureSet table... 
OK >>> Preparing contents for bgfeature table... OK >>> Preparing contents for pmfeature table... OK >>> Creating package in >>> C:/Users/franklin.johnson.PW50-WEN/Desktop/Test/Yanmin's Microarray >>> Paper/Yanmin Microarray RAW/pd.gpl11164 >>> Inserting 2 rows into table featureSet... OK >>> Inserting 765524 rows into table pmfeature... OK >>> Inserting 5075 rows into table bgfeature... OK >>> Counting rows in bgfeature >>> Counting rows in featureSet >>> Counting rows in pmfeature >>> Creating index idx_bgfsetid on bgfeature... OK >>> Creating index idx_bgfid on bgfeature... OK >>> Creating index idx_pmfsetid on pmfeature... OK >>> Creating index idx_pmfid on pmfeature... OK >>> Creating index idx_fsfsetid on featureSet... OK >>> Saving DataFrame object for PM. >>> Saving DataFrame object for BG. >>> Done. >>> Warning message: >>> In is.na(ndfdata[["SIGNAL"]]) : >>> is.na() applied to non-(list or vector) of type 'NULL' >>>> >>> >>> In contrast to this warning message, I see a pdinfopackage directory with >>> 4 subdirectories: c=("data", "inst", "man", R"), as well as >>> subsubdirectories in "inst"=c("extdata", and "Unit Tests"), in addition to >>> two text files in the main directory: c=("DESCRIPTION", "NAMESPACE") were >>> created in my destination folder. >>> Before using "oligo", if possible, I wanted to confirm with you that this >>> package is viable to use with "oligo" although a warning message that may >>> not pertain to my custom designed microarray was printed. >>> >>> Regards, >>> Franklin >>> >>> Great minds discuss ideas. Average minds discuss events. Small minds >>> discuss people. -Eleanor Roosevelt >>> >>> >>> >>> >>> ________________________________________ >>> From: Johnson, Franklin Theodore >>> Sent: Friday, June 07, 2013 10:39 AM >>> To: Benilton Carvalho >>> Cc: bioconductor at r-project.org >>> Subject: RE: [BioC] PAIR files -- feature set table >>> >>> Resending to bioconductor message thread: >>> >>> Dear Dr. 
Carvalho, >>> Thanks for the response. >>> As you suggested, I will look into the merge function using "Probe_ID". >>> After reading in the data, I will start here: merge.datasets(dataset1, >>> dataset2, by="key"). >>> Best Regards, >>> Franklin >>> >>> Great minds discuss ideas. Average minds discuss events. Small minds >>> discuss people. -Eleanor Roosevelt >>> >>> ________________________________________ >>> From: Benilton Carvalho [beniltoncarvalho at gmail.com] >>> Sent: Thursday, June 06, 2013 8:11 PM >>> To: Johnson, Franklin Theodore >>> Cc: bioconductor at r-project.org; franklin.johnson at wsu.edu >>> Subject: Re: [BioC] PAIR files -- feature set table >>> >>> You will need to merge the PAIR and the NDF using the PROBE_ID column >>> as key. This will allow you to get the X/Y coordinates needed to >>> create the XYS as described on the other messages. >>> >>> Regarding annotation, you may need to contact NimbleGen to request >>> this information directly from them... >>> >>> benilton >>> >>> 2013/6/6 Johnson, Franklin Theodore <franklin.johnson at="" email.wsu.edu="">: >>>> Dear Dr. Carvalho, >>>> >>>> Muchos grasias for the reply. 
>>>> >>>> Actually, this is what my .ndf file looks like: >>>>> head(ndf) >>>> PROBE_DESIGN_ID CONTAINER DESIGN_NOTE SELECTION_CRITERIA SEQ_ID >>>> 1 7552_0343_0009 Duplicate_1 >>>> 2 7552_0345_0009 Duplicate_2 >>>> 3 7552_0347_0009 Duplicate_1 >>>> 4 7552_0349_0009 Duplicate_2 >>>> 5 7552_0351_0009 Duplicate_2 >>>> 6 7552_0353_0009 Duplicate_1 >>>> PROBE_SEQUENCE MISMATCH >>>> MATCH_INDEX FEATURE_ID ROW_NUM COL_NUM PROBE_CLASS >>>> 1 cttgactcttctaagttcaaaggtaactcaagtgaagctgtcagatatgatccttcca 0 >>>> 64535488 64535488 9 343 >>>> 2 cccaagcattaaaccttactcatatacttataatgcagccatcaagagtttgtgcaagg 0 >>>> 64799310 64799310 9 345 >>>> 3 agggaggctgaaagagagagtgaatggtccagctgggcataattgctgca 0 >>>> 64476989 64476989 9 347 >>>> 4 ttgttggtgggggtgttgcccttagtaccccagaccttgaagcagttaaa 0 >>>> 64862794 64862794 9 349 >>>> 5 gtgtggggccccctttctttaactggaacctttctttgaagcaatttggg 0 >>>> 64832726 64832726 9 351 >>>> 6 ttgtccaattccaacatgccgagacggcagggattgtgatcgtgttgttc 0 >>>> 64435686 64435686 9 353 >>>> PROBE_ID POSITION DESIGN_ID X Y >>>> 1 Contig19819_1_f_28_10_535 0 7552 343 9 >>>> 2 Malus_CN899188_2_f_147_1_755 0 7552 345 9 >>>> 3 Contig20738_8_r_1179_2_1432 0 7552 347 9 >>>> 4 Malus_CN880097_2_r_336_2_536 0 7552 349 9 >>>> 5 Malus_CN918117_2_f_632_1_781 0 7552 351 9 >>>> 6 Contig1991_1_f_71_2_1239 0 7552 353 9 >>>> >>>> The pair files, .532 pair files only (one-color arrays), only obtain the >>>> probe ID and signal; after some text at the top describing the experiment. >>>> My real issue is that I can further normalize and analyze the RMA files with >>>> sva and limma, etc. However, I cannot annotate the probes without the array >>>> annotation, as there are duplicates in the ndf file which are removed in the >>>> RMA.pair files available on NCBI/GEO. So they will not match in any >>>> annotation package I've failed at trying. >>>> So, I' tried to go back and start from the raw pair files...this custom >>>> array is really a "custom" array without >>>> NimbleScan. 
>>>>
>>>> Cheers,
>>>> Franklin
>>>>
>>>> Great minds discuss ideas. Average minds discuss events. Small minds
>>>> discuss people. -Eleanor Roosevelt
>>>>
>>>> ________________________________________
>>>> From: Benilton Carvalho [beniltoncarvalho at gmail.com]
>>>> Sent: Wednesday, June 05, 2013 6:42 PM
>>>> To: FRANKLIN JOHNSON [guest]
>>>> Cc: bioconductor at r-project.org; franklin.johnson at wsu.edu; pdInfoBuilder Maintainer
>>>> Subject: Re: [BioC] PAIR files -- feature set table
>>>>
>>>> It's an unfortunate mistake to have the pairFile *argument* in the
>>>> call (not in the slots section, but I see your point). :-( I'll make
>>>> sure that this is fixed.
>>>>
>>>> You need to convert the PAIR files to XYS...
>>>>
>>>> Some refs that should help you in the process:
>>>>
>>>> https://stat.ethz.ch/pipermail/bioconductor/2012-January/043186.html
>>>>
>>>> http://comments.gmane.org/gmane.science.biology.informatics.conductor/27547
>>>>
>>>> b
>>>>
>>>> 2013/6/5 FRANKLIN JOHNSON [guest] <guest at bioconductor.org>:
>>>>>
>>>>> Dear Maintainer,
>>>>>
>>>>> I downloaded available NimbleGen 'single channel' 532.PAIR files for a
>>>>> custom built expression microarray from NCBI/GEO (GPL11164). However, I get
>>>>> an error message when I try to make the annotation for this platform using
>>>>> pdInfoBuilder.
>>>>>
>>>>> In pdInfoBuilder Reference Manual (June 5, 2013), under the
>>>>> NgsExpressionPDInfoPkgSeed method, there is a slot for pairFile, although,
>>>>> showClasses("Ngs.."), does not show a slot for this, only, XYS. Thus, I
>>>>> changed the .pair file extension to .xys.
>>>>>
>>>>> (ndf <- list.files(getwd(), pattern=".ndf", full.names=TRUE)) # read annotation file
>>>>> [1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/GPL11164.ndf"
>>>>>
>>>>> (xys <- list.files(getwd(), pattern = ".xys", full.names = TRUE)[1])
>>>>> [1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/GSM618107_14418002_532.xys"
>>>>>
>>>>> But, doing this resulted in an error message:
>>>>> seed <- new("NgsExpressionPDInfoPkgSeed", ndfFile = ndf, xysFile = xys,
>>>>>             author = "FJ", organism = "Apple", species = "Malus x Domestica cv.GD")
>>>>>
>>>>> makePdInfoPackage(arrays, destDir = getwd())
>>>>>
>>>>> ======================================================================
>>>>> Building annotation package for Nimblegen Expression Array
>>>>> NDF: GPL11164.ndf
>>>>> XYS: GSM618107_14418002_532.xys
>>>>> ======================================================================
>>>>> Parsing file: GPL11164.ndf... OK
>>>>> Parsing file: GSM618107_14418002_532.xys... OK
>>>>> Merging NDF and XYS files... OK
>>>>> Preparing contents for featureSet table... Error in `[.data.frame`(ndfdata, , colsFS) : undefined columns selected
>>>>> In addition: Warning message:
>>>>> In is.na(ndfdata[["SIGNAL"]]) :
>>>>>   is.na() applied to non-(list or vector) of type 'NULL'
>>>>>
>>>>> The only files available from NCBI/GEO are 24 PAIR files and 1 ndf. It
>>>>> seems .xys has a different arrangement than .pair, thus .ndf is not
>>>>> applicable to annotate the .pair file? Any suggestions?
>>>>> Hope to hear from you soon.
>>>>> Franklin
>>>>>
>>>>> -- output of sessionInfo():
>>>>>
>>>>>> sessionInfo()
>>>>> R version 3.0.1 (2013-05-16)
>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
>>>>> [4] LC_NUMERIC=C LC_TIME=English_United States.1252
>>>>>
>>>>> attached base packages:
>>>>> [1] tcltk grid parallel stats graphics grDevices utils datasets methods base
>>>>>
>>>>> other attached packages:
>>>>> [1] pdInfoBuilder_1.24.0 oligo_1.24.0 oligoClasses_1.22.0 affxparser_1.32.1 RSQLite_0.11.4 DBI_0.2-7
>>>>> [7] Mfuzz_2.18.0 DynDoc_1.38.0 widgetTools_1.38.0 e1071_1.6-1 class_7.3-7 gplots_2.11.0.1
>>>>> [13] KernSmooth_2.23-10 caTools_1.14 gdata_2.12.0.2 gtools_2.7.1 timecourse_1.32.0 MASS_7.3-26
>>>>> [19] Biobase_2.20.0 BiocGenerics_0.6.0 limma_3.16.5 ggplot2_0.9.3.1 BiocInstaller_1.10.1
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] affyio_1.28.0 Biostrings_2.28.0 bit_1.1-10 bitops_1.0-5 codetools_0.2-8 colorspace_1.2-2
>>>>> [7] dichromat_2.0-0 digest_0.6.3 ff_2.2-11 foreach_1.4.0 GenomicRanges_1.12.4 gtable_0.1.2
>>>>> [13] IRanges_1.18.1 iterators_1.0.6 labeling_0.1 marray_1.38.0 munsell_0.4 plyr_1.8
>>>>> [19] preprocessCore_1.22.0 proto_0.3-10 RColorBrewer_1.0-5 reshape2_1.2.2 scales_0.2.3 splines_3.0.1
>>>>> [25] stats4_3.0.1 stringr_0.6.2 tkWidgets_1.38.0 tools_3.0.1 zlibbioc_1.6.0
>>>>>
>>>>> --
>>>>> Sent via the guest posting facility at bioconductor.org.
written 4.5 years ago by Johnson, Franklin Theodore
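The merge suggested in the thread (joining the PAIR and NDF tables on PROBE_ID to recover the X/Y coordinates) could look roughly like the sketch below, using base R's merge(). This is not code from the thread. The toy data frames stand in for the real files, and the PAIR signal column name (PM), the COUNT value of 1 and the comment header written at the top of the XYS file are all assumptions. Compare the result against a real NimbleScan XYS file, or the threads linked in the quoted reply, before passing it to pdInfoBuilder.

```r
# Toy stand-ins for the NDF and one PAIR file. Only PROBE_ID, X and Y are
# column names taken from the thread; PM is an assumed name for the signal.
ndf <- data.frame(
  PROBE_ID = c("Contig19819_1_f_28_10_535",
               "Malus_CN899188_2_f_147_1_755",
               "Contig20738_8_r_1179_2_1432"),
  X = c(343L, 345L, 347L),
  Y = c(9L, 9L, 9L),
  stringsAsFactors = FALSE
)
pair <- data.frame(
  PROBE_ID = c("Malus_CN899188_2_f_147_1_755", "Contig19819_1_f_28_10_535"),
  PM = c(1520.5, 310.25),
  stringsAsFactors = FALSE
)

# Left join on the NDF so every physical feature keeps its X/Y position;
# features with no matching PAIR row end up with an NA signal.
merged <- merge(ndf[, c("PROBE_ID", "X", "Y")],
                pair[, c("PROBE_ID", "PM")],
                by = "PROBE_ID", all.x = TRUE)

# Columns in the order X, Y, SIGNAL, COUNT. COUNT = 1 is a placeholder,
# not a value recovered from the PAIR file.
xys <- data.frame(
  X = merged$X,
  Y = merged$Y,
  SIGNAL = merged$PM,
  COUNT = ifelse(is.na(merged$PM), NA_integer_, 1L)
)
xys <- xys[order(xys$Y, xys$X), ]
rownames(xys) <- NULL

# Write a tab-delimited file with one leading comment line.
# The header text is hypothetical; real files carry NimbleScan metadata here.
out <- file.path(tempdir(), "example_532.xys")
con <- file(out, open = "wt")
writeLines("# designname=GPL11164 (placeholder header)", con)
write.table(xys, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)
```

If a PROBE_ID occurs more than once in the NDF (the thread mentions duplicates), the left join repeats the same signal at each of its X/Y positions, so it is worth checking the row count of the merged table against the NDF before building the package.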