Johnson, Franklin Theodore
Dear Dr. Carvalho,
Thanks for the reply.
I saw the FAQ thread on how to read in the annotation package made
using pdInfoBuilder.
For anyone having issues, it seems as straightforward as:
# install pd.pdinfo.gpl11164.ndf.txt from its source directory
install.packages("pd.pdinfo.gpl11164.ndf.txt", type="source",
repos=NULL)
Installing package into 'C:/Users/ZHUGRP/Documents/R/win-library/3.0'
(as 'lib' is unspecified)
* installing *source* package 'pd.pdinfo.gpl11164.ndf.txt' ...
** R
** data
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (pd.pdinfo.gpl11164.ndf.txt)
######################################################################
######################################
I am currently trying to build an ExpressionFeatureSet from my PAIR
files converted to XYS .txt files, which unfortunately contain only the X,
Y and signal columns. NimbleScan expects .tiff image files as input, and
these were not available from NCBI/GEO. NimbleGen also did not respond to
my inquiry about obtaining XYS files from the available PAIR files. In R,
I'm testing 12 of the 24 tab-delimited XYS files, which also tests the
annotation package made using pdInfoBuilder.
# list the .txt files in the working directory (getwd())
filelist=list.files(pattern="\\.txt$")
> filelist
[1] "GSM01.txt" "GSM02.txt" "GSM03.txt" "GSM04.txt" "GSM05.txt"
"GSM06.txt" "GSM07.txt" "GSM08.txt" "GSM09.txt" "GSM10.txt"
"GSM11.txt" "GSM12.txt"
# read each data file in filelist as a matrix, to build the EFS object
> datalist=lapply(filelist, function(x) as.matrix(read.table(x,
header=TRUE, sep="\t", as.is=TRUE)))
#construct phenoData frame
> theData=data.frame(Key=rep(c("Week0","Week-2","Week-4"), each=4))
> rownames(theData)=basename(filelist)
> pd=new("AnnotatedDataFrame", data=theData)
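The `pattern` argument of `list.files()` is a regular expression, so in `".*.txt"` each unescaped dot matches any character and nothing anchors the match to the end of the name. A minimal base-R sketch of the difference; the file names are made up for illustration.

```r
# Hypothetical file names, for illustration only
files <- c("GSM01.txt", "GSM02.txt", "notes_txt.csv", "GSM03.txt.bak")

# Unanchored pattern: "." matches any character, so "_txt" and ".txt.bak" also match
loose <- files[grepl(".*.txt", files)]

# Escaped dot plus "$" anchor: only names that really end in ".txt"
strict <- files[grepl("\\.txt$", files)]

print(loose)   # all four names
print(strict)  # "GSM01.txt" "GSM02.txt"
```

The same anchored pattern can be passed directly, e.g. `list.files(pattern = "\\.txt$")`.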
....
However, constructing the ExpressionFeatureSet fails:
hardline=new("ExpressionFeatureSet", datalist, phenoData=pd,
annotation=library(pd.pdinfo.gpl11164.ndf.txt))
Error in .names_found_unique(names(value), names(object)) :
'sampleNames' replacement list must have unique named elements
corresponding to assayData element names
To confirm,
> sampleNames(datalist)
[1] "X" "Y" "PM"
So, it seems the EFS constructor expects a unique sampleName for each file in
filelist?
How can multiple files be read into an EFS object, as is done with
read.xysfiles? Is this doable?
Is it necessary to wrap datalist=lapply(filelist,
function(x) as.matrix(read.table(x, header=TRUE, sep="\t", as.is=TRUE)))
in parentheses to make the object TRUE, per se?
i.e. (datalist=lapply(filelist, function(x) as.matrix(read.table(x,
header=TRUE, sep="\t", as.is=TRUE))) )
Best Regards,
Franklin
Great minds discuss ideas. Average minds discuss events. Small minds
discuss people. -Eleanor Roosevelt
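The error is consistent with the container expecting one features-by-samples matrix whose column names match `rownames()` of the phenoData, rather than a list holding one X/Y/signal matrix per file. (The `annotation` slot of Biobase's eSet-derived classes is a character string, so `"pd.pdinfo.gpl11164.ndf.txt"` is presumably what belongs there, not the value returned by `library()`.) Below is a base-R sketch of the reshaping, assuming every file lists the same features with columns named `X`, `Y` and `SIGNAL` (rename to `PM` if that is what your files use); the toy files and the `read_signal()` helper are hypothetical. oligo's `read.xysfiles()`, which takes `filenames` and `pkgname` arguments, does this reading itself and is probably the simpler route.

```r
# Hypothetical helper: read one tab-delimited XYS-like file and return its
# SIGNAL column (assumes columns named X, Y and SIGNAL)
read_signal <- function(path) {
  tab <- read.delim(path, comment.char = "#")
  tab <- tab[order(tab$Y, tab$X), ]  # put features in a common order
  tab$SIGNAL
}

# Toy data: three small files standing in for GSM01.txt ... GSM03.txt
xys_dir <- tempfile("xys")
dir.create(xys_dir)
for (i in 1:3) {
  tab <- data.frame(X = c(1, 2, 1, 2), Y = c(1, 1, 2, 2), SIGNAL = (1:4) * i)
  write.table(tab, file.path(xys_dir, sprintf("GSM%02d.txt", i)),
              sep = "\t", quote = FALSE, row.names = FALSE)
}
filelist <- list.files(xys_dir, pattern = "\\.txt$", full.names = TRUE)

# One column per file; the column names become the sample names
mat <- sapply(filelist, read_signal)
colnames(mat) <- basename(filelist)

# phenoData rows must line up with the matrix columns
theData <- data.frame(Key = c("Week0", "Week-2", "Week-4"),
                      row.names = basename(filelist))
stopifnot(identical(colnames(mat), rownames(theData)))
dim(mat)  # features x samples
```

Ordering each file by Y then X is a safeguard in case the files list features in different orders; if they are already identical, it changes nothing.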
________________________________________
From: Benilton Carvalho [beniltoncarvalho@gmail.com]
Sent: Thursday, June 13, 2013 4:43 PM
To: Johnson, Franklin Theodore
Cc: bioconductor at r-project.org
Subject: Re: [BioC] PAIR files -- feature set table
Don't worry about that particular warning... just install the package
and try to read your XYS files.
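The warning itself is easy to reproduce in base R: it appears whenever `is.na()` is given `NULL`, which is what `[[` returns for a column that does not exist. That it fires here presumably just means the NDF table being checked has no `SIGNAL` column; this is an inference from the message, not from the pdInfoBuilder source. The toy `ndfdata` table is hypothetical.

```r
ndfdata <- data.frame(PROBE_ID = c("p1", "p2"), X = 1:2, Y = c(9L, 9L))

# [[ on a missing column returns NULL rather than an error
stopifnot(is.null(ndfdata[["SIGNAL"]]))

# is.na(NULL) returns logical(0) with the warning seen in this thread
msg <- NULL
res <- withCallingHandlers(
  is.na(ndfdata[["SIGNAL"]]),
  warning = function(w) {
    msg <<- conditionMessage(w)      # keep the text for inspection
    invokeRestart("muffleWarning")   # do not print it again
  }
)
print(res)  # logical(0)
print(msg)
```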
2013/6/13 Johnson, Franklin Theodore <franklin.johnson at email.wsu.edu>:
> Dr. Carvalho,
>
> Yes. I see what you mean.
> Switching the columns helped: the featureSet table now gets more
> than 2 rows:
>
> Inserting 198661 rows into table featureSet... OK
> However, the warning message did print again.
>
>
> Warning message:
> In is.na(ndfdata[["SIGNAL"]]) :
> is.na() applied to non-(list or vector) of type 'NULL'
>
> Below is the output and sessionInfo(); I have upgraded to R 3.0.1.
>
> #Begin R command line code:
>
>> makePdInfoPackage(arrays, destDir = getwd(), unlink=TRUE)
> ======================================================================
>
>
> Building annotation package for Nimblegen Expression Array
> NDF: pdinfo_GPL11164.ndf.txt <- new .ndf file with PROBE_ID and SEQ_ID swapped
> XYS: XYS.txt
> ======================================================================
> Parsing file: pdinfo_GPL11164.ndf.txt... OK
>
> Parsing file: XYS.txt... OK
> Merging NDF and XYS files... OK
> Preparing contents for featureSet table... OK
> Preparing contents for bgfeature table... OK
> Preparing contents for pmfeature table... OK
> Creating package in E:/RANDOM/Test/Yanmin's Microarray Paper/Yanmin
> Microarray RAW/pd.pdinfo.gpl11164.ndf.txt
> Inserting 198661 rows into table featureSet... OK
> Inserting 770599 rows into table pmfeature... OK
>
> Counting rows in featureSet
> Counting rows in pmfeature
> Creating index idx_pmfsetid on pmfeature... OK
> Creating index idx_pmfid on pmfeature... OK
> Creating index idx_fsfsetid on featureSet... OK
> Saving DataFrame object for PM.
> Done.
> Warning message:
> In is.na(ndfdata[["SIGNAL"]]) :
> is.na() applied to non-(list or vector) of type 'NULL'
>
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: i386-w64-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C LC_TIME=English_United States.1252
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] pdInfoBuilder_1.24.0 oligo_1.24.0 oligoClasses_1.22.0 affxparser_1.32.1 RSQLite_0.11.4 DBI_0.2-7 Biobase_2.20.0
> [8] BiocGenerics_0.6.0 BiocInstaller_1.10.2
>
> loaded via a namespace (and not attached):
> [1] affyio_1.28.0 Biostrings_2.28.0 bit_1.1-10 codetools_0.2-8 ff_2.2-11 foreach_1.4.1 GenomicRanges_1.12.4
> [8] IRanges_1.18.1 iterators_1.0.6 preprocessCore_1.22.0 splines_3.0.1 stats4_3.0.1 tools_3.0.1 zlibbioc_1.6.0
>
>
>
>>q()
>
>
>
> The pdInfo package built in destDir looks identical to the one described
> in my previous message.
>
> However, the featureSet table now has more than 2 rows...
>
> Lastly, I tried multiple combinations, because my merged file has (X.x,
> Y.x), which seem to go with the 'probe IDs' on the array, as well as
> (X.y, Y.y), which seem to go with the sequence identifiers in "SEQ_ID".
> Using X.x, Y.x and PM gave the result I pasted above; all other
> combinations gave errors. I'm close, but that warning message is
> annoying...
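The `.x`/`.y` column names come from base R's `merge()`: when both tables carry non-key columns with the same name (here `X` and `Y`), both copies are kept with the default `suffixes = c(".x", ".y")`, `.x` from the first table passed and `.y` from the second. A toy sketch with hypothetical tables; setting `suffixes` explicitly makes the origin of each coordinate pair obvious.

```r
ndf <- data.frame(PROBE_ID = c("p1", "p2"), X = c(343, 345), Y = c(9, 9))
xys <- data.frame(PROBE_ID = c("p2", "p1"), X = c(10, 11), Y = c(20, 21),
                  PM = c(500, 600))

# Default suffixes: .x for columns from ndf, .y for columns from xys
m <- merge(ndf, xys, by = "PROBE_ID")
print(names(m))  # "PROBE_ID" "X.x" "Y.x" "X.y" "Y.y" "PM"

# Explicit suffixes record where each coordinate pair came from
m2 <- merge(ndf, xys, by = "PROBE_ID", suffixes = c(".ndf", ".pair"))
print(names(m2))
```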
>
>
>
> Regards,
>
> Franklin
>
>
>
>
>
>
> ________________________________________
> From: Benilton Carvalho [beniltoncarvalho at gmail.com]
> Sent: Wednesday, June 12, 2013 8:25 PM
>
> To: Johnson, Franklin Theodore
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] PAIR files -- feature set table
>
> That does not look ok.
>
> The problem is the count for the featureSet table... This table stores
> the information for "genes" (or whatever the target for this
> particular array is)... so, it is unlikely that you have a microarray
> with only 2 "target units"... I'd expect something in the
> thousands...
>
> pdInfoBuilder uses the information in SEQ_ID (in the NDF) to get the
> target information (i.e., the contents for featureSet).
>
> Given that this is a custom array, I believe that the best idea is to
> contact the person who designed it and ask for more details about the
> design (in particular, how many probesets there are and the average
> number of probes per probeset)...
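Since featureSet is filled from `SEQ_ID`, one way to anticipate its size before running `makePdInfoPackage()` is to count the distinct `SEQ_ID` values in the NDF and the probes per value. A minimal sketch on a hypothetical toy table (in practice `ndf` would be the NDF read with something like `read.delim()`); a count of only one or two probesets would suggest the target identifiers live in a different column.

```r
ndf <- data.frame(
  PROBE_ID = c("a1", "a2", "a3", "b1", "b2", "c1"),
  SEQ_ID   = c("geneA", "geneA", "geneA", "geneB", "geneB", "geneC")
)

probes_per_set <- table(ndf$SEQ_ID)
n_sets <- length(probes_per_set)                 # expected featureSet row count
avg_probes <- mean(as.vector(probes_per_set))    # average probes per probeset

cat("probesets:", n_sets, " mean probes/probeset:", avg_probes, "\n")
```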
>
> I've seen some designs in which the information that was expected to
> be in SEQ_ID was actually stored in PROBE_ID (in such cases, the user
> needs to create a backup copy of the NDF, and then move the contents
> of PROBE_ID to SEQ_ID, and vice versa).
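The backup-and-swap described above could be scripted in base R roughly as follows. This assumes a tab-delimited NDF with `PROBE_ID` and `SEQ_ID` columns; the file names are hypothetical, and the toy file is written only so the sketch runs. Reading everything as character avoids R reformatting IDs or coordinates on the way back out.

```r
# Toy NDF standing in for the real design file
ndf_file <- file.path(tempdir(), "design.ndf")
write.table(data.frame(SEQ_ID = c("s1", "s2"), PROBE_ID = c("geneA", "geneB"),
                       X = c(343, 345), Y = c(9, 9)),
            ndf_file, sep = "\t", quote = FALSE, row.names = FALSE)

# 1. Keep an untouched backup copy
backup_file <- sub("\\.ndf$", "_backup.ndf", ndf_file)
file.copy(ndf_file, backup_file, overwrite = TRUE)

# 2. Swap the contents of the two columns (column positions stay the same)
ndf <- read.delim(ndf_file, colClasses = "character")
tmp <- ndf$PROBE_ID
ndf$PROBE_ID <- ndf$SEQ_ID
ndf$SEQ_ID <- tmp

# 3. Write the modified table back, still tab-delimited
write.table(ndf, ndf_file, sep = "\t", quote = FALSE, row.names = FALSE)
```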
>
> b
>
> 2013/6/12 Johnson, Franklin Theodore <franklin.johnson at email.wsu.edu>:
>> Dear Dr. Carvalho,
>>
>> Recently we corresponded regarding makePdInfoPackage for a NimbleGen
>> apple microarray.
>> I was able to merge the NDF design file and the XYS files using PROBE_ID.
>> As a reminder, this is a custom array, and there are no SIGNAL==NAs for
>> control probes.
>> It seemed to work:
>>> makePdInfoPackage(seed, destDir(""))
>>
>> ======================================================================
>> Building annotation package for Nimblegen Expression Array
>> NDF: GPL11164.ndf
>> XYS: XYS.txt
>>
>> ======================================================================
>> Parsing file: GPL11164.ndf... OK
>> Parsing file: XYS.txt... OK
>> Merging NDF and XYS files... OK
>> Preparing contents for featureSet table... OK
>> Preparing contents for bgfeature table... OK
>> Preparing contents for pmfeature table... OK
>> Creating package in
>> C:/Users/franklin.johnson.PW50-WEN/Desktop/Test/Yanmin's Microarray
>> Paper/Yanmin Microarray RAW/pd.gpl11164
>> Inserting 2 rows into table featureSet... OK
>> Inserting 765524 rows into table pmfeature... OK
>> Inserting 5075 rows into table bgfeature... OK
>> Counting rows in bgfeature
>> Counting rows in featureSet
>> Counting rows in pmfeature
>> Creating index idx_bgfsetid on bgfeature... OK
>> Creating index idx_bgfid on bgfeature... OK
>> Creating index idx_pmfsetid on pmfeature... OK
>> Creating index idx_pmfid on pmfeature... OK
>> Creating index idx_fsfsetid on featureSet... OK
>> Saving DataFrame object for PM.
>> Saving DataFrame object for BG.
>> Done.
>> Warning message:
>> In is.na(ndfdata[["SIGNAL"]]) :
>> is.na() applied to non-(list or vector) of type 'NULL'
>>>
>>
>> Despite this warning message, a pdInfo package directory was created in
>> my destination folder, with 4 subdirectories ("data", "inst", "man", "R"),
>> the subdirectories "extdata" and "Unit Tests" inside "inst", and two text
>> files in the main directory ("DESCRIPTION" and "NAMESPACE").
>> Before using "oligo", I wanted to confirm with you, if possible, that this
>> package is viable for use with "oligo", even though a warning message that
>> may not pertain to my custom-designed microarray was printed.
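A quick, package-independent way to confirm that the generated directory has the skeleton listed above before installing it is to test for each expected entry. A base-R sketch; `pkg_dir` would be the path printed after "Creating package in", the `check_skeleton()` helper is hypothetical, and the toy directory only stands in for the real one. This checks the layout, not whether oligo can use the package.

```r
# Hypothetical helper: return the expected entries that are missing
check_skeleton <- function(pkg_dir,
                           expected = c("DESCRIPTION", "NAMESPACE",
                                        "R", "data", "inst", "man")) {
  expected[!file.exists(file.path(pkg_dir, expected))]
}

# Toy package directory with NAMESPACE deliberately left out
pkg_dir <- file.path(tempdir(), "pd.toy")
dir.create(pkg_dir, showWarnings = FALSE)
for (d in c("R", "data", "inst", "man")) {
  dir.create(file.path(pkg_dir, d), showWarnings = FALSE)
}
invisible(file.create(file.path(pkg_dir, "DESCRIPTION")))

missing <- check_skeleton(pkg_dir)
print(missing)  # "NAMESPACE"
```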
>>
>> Regards,
>> Franklin
>>
>>
>>
>>
>>
>> ________________________________________
>> From: Johnson, Franklin Theodore
>> Sent: Friday, June 07, 2013 10:39 AM
>> To: Benilton Carvalho
>> Cc: bioconductor at r-project.org
>> Subject: RE: [BioC] PAIR files -- feature set table
>>
>> Resending to bioconductor message thread:
>>
>> Dear Dr. Carvalho,
>> Thanks for the response.
>> As you suggested, I will look into the merge function using "PROBE_ID".
>> After reading in the data, I will start here: merge(dataset1,
>> dataset2, by="PROBE_ID").
>> Best Regards,
>> Franklin
>>
>>
>> ________________________________________
>> From: Benilton Carvalho [beniltoncarvalho at gmail.com]
>> Sent: Thursday, June 06, 2013 8:11 PM
>> To: Johnson, Franklin Theodore
>> Cc: bioconductor at r-project.org; franklin.johnson at wsu.edu
>> Subject: Re: [BioC] PAIR files -- feature set table
>>
>> You will need to merge the PAIR and the NDF using the PROBE_ID column
>> as key. This will allow you to get the X/Y coordinates needed to
>> create the XYS as described in the other messages.
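The PAIR/NDF merge described above could look roughly like this in base R. It assumes the PAIR table has `PROBE_ID` and a signal column called `PM`, and the NDF has `PROBE_ID`, `X` and `Y`; the toy tables and output file name are hypothetical. Because the NDF in this thread contains duplicated probes, the sketch first reports repeated `PROBE_ID`s: `merge()` will copy one PAIR signal to every NDF position of such a probe, and whether that is appropriate depends on how the PAIR file was produced. The exact header line oligo's XYS reader expects is not reproduced here; see the references elsewhere in this thread.

```r
ndf  <- data.frame(PROBE_ID = c("p1", "p2", "p3", "p3"),
                   X = c(343, 345, 347, 349), Y = c(9, 9, 9, 9))
pair <- data.frame(PROBE_ID = c("p3", "p1", "p2"), PM = c(812.5, 230.1, 455.0))

# Probes placed at more than one position on the array
dups <- unique(ndf$PROBE_ID[duplicated(ndf$PROBE_ID)])
if (length(dups) > 0) {
  message("PROBE_IDs at several positions: ", paste(dups, collapse = ", "))
}

# Inner join on PROBE_ID: each NDF position picks up the PAIR signal
merged <- merge(ndf, pair, by = "PROBE_ID")

# XYS-style table: coordinates plus signal, ordered by row (Y) then column (X)
xys <- data.frame(X = merged$X, Y = merged$Y, SIGNAL = merged$PM)
xys <- xys[order(xys$Y, xys$X), ]

out <- file.path(tempdir(), "sample1_xys.txt")
write.table(xys, out, sep = "\t", quote = FALSE, row.names = FALSE)
```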
>>
>> Regarding annotation, you may need to contact NimbleGen to request
>> this information directly from them...
>>
>> benilton
>>
>> 2013/6/6 Johnson, Franklin Theodore <franklin.johnson at email.wsu.edu>:
>>> Dear Dr. Carvalho,
>>>
>>> Many thanks for the reply.
>>>
>>> Actually, this is what my .ndf file looks like:
>>>> head(ndf)
>>>   PROBE_DESIGN_ID   CONTAINER DESIGN_NOTE SELECTION_CRITERIA SEQ_ID
>>> 1  7552_0343_0009 Duplicate_1
>>> 2  7552_0345_0009 Duplicate_2
>>> 3  7552_0347_0009 Duplicate_1
>>> 4  7552_0349_0009 Duplicate_2
>>> 5  7552_0351_0009 Duplicate_2
>>> 6  7552_0353_0009 Duplicate_1
>>>                                                 PROBE_SEQUENCE MISMATCH MATCH_INDEX FEATURE_ID ROW_NUM COL_NUM PROBE_CLASS
>>> 1 cttgactcttctaagttcaaaggtaactcaagtgaagctgtcagatatgatccttcca        0    64535488   64535488       9     343
>>> 2 cccaagcattaaaccttactcatatacttataatgcagccatcaagagtttgtgcaagg        0    64799310   64799310       9     345
>>> 3           agggaggctgaaagagagagtgaatggtccagctgggcataattgctgca        0    64476989   64476989       9     347
>>> 4           ttgttggtgggggtgttgcccttagtaccccagaccttgaagcagttaaa        0    64862794   64862794       9     349
>>> 5           gtgtggggccccctttctttaactggaacctttctttgaagcaatttggg        0    64832726   64832726       9     351
>>> 6           ttgtccaattccaacatgccgagacggcagggattgtgatcgtgttgttc        0    64435686   64435686       9     353
>>>                       PROBE_ID POSITION DESIGN_ID   X Y
>>> 1    Contig19819_1_f_28_10_535        0      7552 343 9
>>> 2 Malus_CN899188_2_f_147_1_755        0      7552 345 9
>>> 3   Contig20738_8_r_1179_2_1432       0      7552 347 9
>>> 4  Malus_CN880097_2_r_336_2_536        0      7552 349 9
>>> 5  Malus_CN918117_2_f_632_1_781        0      7552 351 9
>>> 6     Contig1991_1_f_71_2_1239        0      7552 353 9
>>>
>>> The PAIR files (.532 PAIR files only, as these are one-color arrays)
>>> contain only the probe ID and signal, after some text at the top
>>> describing the experiment.
>>> My real issue is this: I can further normalize and analyze the RMA
>>> files with sva, limma, etc. However, I cannot annotate the probes
>>> without the array annotation, because there are duplicates in the NDF
>>> file which are removed in the RMA .pair files available on NCBI/GEO,
>>> so the probes do not match any annotation package I have tried.
>>> So I tried to go back and start from the raw PAIR files... this custom
>>> array is really a "custom" array, without NimbleScan.
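For the descriptive text at the top of the PAIR files, `read.delim()` can skip it via `comment.char = "#"`, assuming those header lines start with `#` (worth confirming by opening one file in a text editor). If they have no such marker, the `skip` argument with the number of header lines does the same job. A sketch with a made-up file using the `PROBE_ID`/`PM` column names assumed elsewhere in this thread:

```r
pair_file <- file.path(tempdir(), "toy_532.pair")
writeLines(c("# experiment description line 1",
             "# experiment description line 2",
             "PROBE_ID\tPM",
             "Contig19819_1_f_28_10_535\t230.1",
             "Malus_CN899188_2_f_147_1_755\t455.0"),
           pair_file)

# Lines starting with "#" are treated as comments and skipped
pair <- read.delim(pair_file, comment.char = "#", stringsAsFactors = FALSE)

# Alternative when the header lines carry no marker: skip them by count
pair2 <- read.delim(pair_file, skip = 2, stringsAsFactors = FALSE)

str(pair)
```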
>>>
>>> Cheers,
>>> Franklin
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> ________________________________________
>>> From: Benilton Carvalho [beniltoncarvalho at gmail.com]
>>> Sent: Wednesday, June 05, 2013 6:42 PM
>>> To: FRANKLIN JOHNSON [guest]
>>> Cc: bioconductor at r-project.org; franklin.johnson at wsu.edu; pdInfoBuilder Maintainer
>>> Subject: Re: [BioC] PAIR files -- feature set table
>>>
>>> It's an unfortunate mistake to have the pairFile *argument* in the
>>> call (not in the slots section, but I see your point). :-( I'll make
>>> sure that this is fixed.
>>>
>>> You need to convert the PAIR files to XYS...
>>>
>>> Some refs that should help you in the process:
>>>
>>>
>>> https://stat.ethz.ch/pipermail/bioconductor/2012-January/043186.html
>>>
>>> http://comments.gmane.org/gmane.science.biology.informatics.conductor/27547
>>>
>>> b
>>>
>>> 2013/6/5 FRANKLIN JOHNSON [guest] <guest at bioconductor.org>:
>>>>
>>>> Dear Maintainer,
>>>>
>>>> I downloaded the available NimbleGen single-channel 532 PAIR files for a
>>>> custom-built expression microarray from NCBI/GEO (GPL11164). However, I
>>>> get an error message when I try to make the annotation package for this
>>>> platform using pdInfoBuilder.
>>>>
>>>> In the pdInfoBuilder Reference Manual (June 5, 2013), under the
>>>> NgsExpressionPDInfoPkgSeed method, there is a slot for pairFile, although
>>>> showClass("Ngs..") does not show a slot for it, only for XYS. Thus, I
>>>> changed the .pair file extension to .xys.
>>>>
>>>> (ndf <- list.files(getwd(), pattern="\\.ndf$", full.names=TRUE)) # locate the NDF annotation file
>>>> [1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/GPL11164.ndf"
>>>>
>>>> (xys <- list.files(getwd(), pattern = "\\.xys$", full.names = TRUE)[1])
>>>> [1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/GSM618107_14418002_532.xys"
>>>>
>>>> But doing this resulted in an error message:
>>>> seed <- new("NgsExpressionPDInfoPkgSeed", ndfFile = ndf, xysFile = xys,
>>>> author = "FJ", organism = "Apple", species = "Malus x Domestica cv.GD")
>>>>
>>>> makePdInfoPackage(seed, destDir = getwd())
>>>>
>>>> ======================================================================
>>>> Building annotation package for Nimblegen Expression Array
>>>> NDF: GPL11164.ndf
>>>> XYS: GSM618107_14418002_532.xys
>>>>
>>>> ======================================================================
>>>> Parsing file: GPL11164.ndf... OK
>>>> Parsing file: GSM618107_14418002_532.xys... OK
>>>> Merging NDF and XYS files... OK
>>>> Preparing contents for featureSet table... Error in
>>>> `[.data.frame`(ndfdata, , colsFS) : undefined columns selected
>>>> In addition: Warning message:
>>>> In is.na(ndfdata[["SIGNAL"]]) :
>>>> is.na() applied to non-(list or vector) of type 'NULL'
>>>>
>>>> The only files available from NCBI/GEO are 24 PAIR files and 1 NDF. It
>>>> seems that .xys has a different layout than .pair, so the .ndf cannot be
>>>> used to annotate the .pair files? Any suggestions?
>>>> Hope to hear from you soon.
>>>> Franklin
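The "undefined columns selected" error is what `[.data.frame` raises when any requested column name is absent, which fits the suspicion that a renamed PAIR file lacks the columns pdInfoBuilder looks for. Checking the names up front gives a clearer message. The required set below (`X`, `Y`, `SIGNAL`) is an assumption about the XYS layout, not taken from the pdInfoBuilder source, and `missing_cols()` is a hypothetical helper.

```r
tab <- data.frame(PROBE_ID = c("p1", "p2"), PM = c(230.1, 455.0))  # PAIR-like toy table

# Reproduce the error: one requested column does not exist
err <- tryCatch(tab[, c("PROBE_ID", "SIGNAL")], error = conditionMessage)
print(err)  # "undefined columns selected"

# Hypothetical helper: name the missing columns instead
missing_cols <- function(df, required = c("X", "Y", "SIGNAL")) {
  setdiff(required, names(df))
}
print(missing_cols(tab))  # "X" "Y" "SIGNAL"
```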
>>>>
>>>> -- output of sessionInfo():
>>>>
>>>>> sessionInfo()
>>>> R version 3.0.1 (2013-05-16)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
>>>> [4] LC_NUMERIC=C LC_TIME=English_United States.1252
>>>>
>>>> attached base packages:
>>>> [1] tcltk grid parallel stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] pdInfoBuilder_1.24.0 oligo_1.24.0 oligoClasses_1.22.0 affxparser_1.32.1 RSQLite_0.11.4 DBI_0.2-7
>>>> [7] Mfuzz_2.18.0 DynDoc_1.38.0 widgetTools_1.38.0 e1071_1.6-1 class_7.3-7 gplots_2.11.0.1
>>>> [13] KernSmooth_2.23-10 caTools_1.14 gdata_2.12.0.2 gtools_2.7.1 timecourse_1.32.0 MASS_7.3-26
>>>> [19] Biobase_2.20.0 BiocGenerics_0.6.0 limma_3.16.5 ggplot2_0.9.3.1 BiocInstaller_1.10.1
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] affyio_1.28.0 Biostrings_2.28.0 bit_1.1-10 bitops_1.0-5 codetools_0.2-8 colorspace_1.2-2
>>>> [7] dichromat_2.0-0 digest_0.6.3 ff_2.2-11 foreach_1.4.0 GenomicRanges_1.12.4 gtable_0.1.2
>>>> [13] IRanges_1.18.1 iterators_1.0.6 labeling_0.1 marray_1.38.0 munsell_0.4 plyr_1.8
>>>> [19] preprocessCore_1.22.0 proto_0.3-10 RColorBrewer_1.0-5 reshape2_1.2.2 scales_0.2.3 splines_3.0.1
>>>> [25] stats4_3.0.1 stringr_0.6.2 tkWidgets_1.38.0 tools_3.0.1 zlibbioc_1.6.0
>>>>>
>>>>
>>>>
>>>> --
>>>> Sent via the guest posting facility at bioconductor.org.
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>