Johnson, Franklin Theodore
Dear Dr. Carvalho,
Thanks for the reply.
I saw the FAQ thread on how to load an annotation package built with
pdInfoBuilder.
For anyone having issues, it seems as straightforward as:
#install pdinfo.gpl11164.ndf.txt
install.packages("pd.pdinfo.gpl11164.ndf.txt", type="source",
Installing package into 'C:/Users/ZHUGRP/Documents/R/win-library/3.0'
(as 'lib' is unspecified)
* installing *source* package 'pd.pdinfo.gpl11164.ndf.txt' ...
** R
** data
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (pd.pdinfo.gpl11164.ndf.txt)
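The install command above appears to be cut off after `type="source",`. A minimal sketch of a complete call follows, assuming the missing argument was `repos = NULL` (the usual way to install from a local source directory) and assuming the package directory created by `makePdInfoPackage()` sits in the current working directory. Both points are assumptions, not something the output above confirms.

```r
# Assumption: the package directory is in the current working directory;
# adjust pkg_dir if it was created elsewhere.
pkg_dir <- "pd.pdinfo.gpl11164.ndf.txt"

if (dir.exists(pkg_dir)) {
  # repos = NULL makes install.packages() treat pkg_dir as a local path
  install.packages(pkg_dir, repos = NULL, type = "source")
  # Load the annotation package by name to confirm the install worked
  library(pkg_dir, character.only = TRUE)
} else {
  message("Package directory not found: ", pkg_dir)
}
```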
I am now trying to build the ExpressionFeatureSet from my converted
PAIR -> XYS.txt files, which unfortunately contain X/Y/S only.
NimbleScan expects .tiff files as input, and these were not available
from NCBI/GEO. NimbleGen also did not respond to my inquiry about
obtaining XYS files from the available PAIR files. Using R, I'm testing
12 of the 24 tab-delimited XYS files, which also tests the annotation
package made using pdInfoBuilder.
#read in files from wd()
> filelist
[1] "GSM01.txt" "GSM02.txt" "GSM03.txt" "GSM04.txt" "GSM05.txt"
"GSM06.txt" "GSM07.txt" "GSM08.txt" "GSM09.txt" "GSM10.txt"
"GSM11.txt" "GSM12.txt"
#read in each data file in filelist as a matrix to make EFS object
> datalist=lapply(filelist, function(x)as.matrix(read.table(x,
header=T, sep="\t", as.is=T)))
#construct phenoData frame
> theData=data.frame(Key=rep(c("Week0","Week-2","Week-4"), each=4))
> rownames(theData)=basename(filelist)
> pd=new("AnnotatedDataFrame", data=theData)
However, the EFS construction fails:
hardline=new("ExpressionFeatureSet", datalist, phenoData=pd,
Error in .names_found_unique(names(value), names(object)) :
'sampleNames' replacement list must have unique named elements
corresponding to assayData element names
To confirm,
> sampleNames(datalist)
[1] "X" "Y" "PM"
So, it seems EFS is expecting unique sampleNames for each file in
How can I read multiple files into an EFS object, as is done with
read.xysfiles? Is this doable?
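One way to look at the error: each element of `datalist` is a matrix whose columns are X, Y and PM, so the "sample names" end up being those column names instead of the file names. A minimal sketch of reshaping the data into one intensity matrix with a uniquely named column per sample follows. It uses base R only and small simulated files; the file contents and the `pm` and `same_layout` names are made up for illustration, and it does not show the `new("ExpressionFeatureSet", ...)` call itself.

```r
# Simulate three small tab-delimited files with X, Y and PM columns
# (hypothetical data; the real files would be GSM01.txt, GSM02.txt, ...).
tmp <- tempfile("xys_demo")
dir.create(tmp)
set.seed(1)
coords <- expand.grid(X = 1:3, Y = 1:2)
filelist <- file.path(tmp, sprintf("GSM%02d.txt", 1:3))
for (f in filelist) {
  tab <- cbind(coords, PM = round(runif(nrow(coords), 100, 1000)))
  write.table(tab, f, sep = "\t", quote = FALSE, row.names = FALSE)
}

# Read each file as a data frame
datalist <- lapply(filelist, read.table, header = TRUE, sep = "\t")

# All files must list the probes in the same X/Y order before combining
same_layout <- vapply(
  datalist,
  function(d) identical(d[, c("X", "Y")], datalist[[1]][, c("X", "Y")]),
  logical(1)
)
stopifnot(all(same_layout))

# One column of PM intensities per sample, named after its file
pm <- sapply(datalist, function(d) d[["PM"]])
colnames(pm) <- basename(filelist)

dim(pm)
colnames(pm)
```

With a matrix like `pm`, the column names can be made to match `rownames(theData)`. If the files are genuine XYS files (a header line plus X, Y, SIGNAL and COUNT columns), `read.xysfiles(filelist, pkgname = "pd.pdinfo.gpl11164.ndf.txt", phenoData = pd)` from oligo may be the more direct route; whether it accepts these converted files is not something this sketch verifies.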
Is it necessary to execute datalist=lapply(filelist,
function(x)as.matrix(read.table(x, header=T, sep="\t", as.is=T)))
surrounded with Booleans to make the object TRUE, per se?
i.e. (datalist=lapply(filelist, function(x)as.matrix(read.table(x,
header=T, sep="\t", as.is=T))) )
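On the parentheses question: in R, wrapping an assignment in parentheses only makes the result print at the console. It does not change the stored object or turn it into a logical value. A small illustration with made-up values:

```r
# Plain assignment: stores the value, prints nothing
a <- lapply(1:2, function(i) i * 10)

# Parenthesised assignment: stores the same value and also prints it
(b <- lapply(1:2, function(i) i * 10))

# Compare the two stored objects
identical(a, b)
```

So the parentheses should make no difference to whether the EFS construction succeeds.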
Best Regards,
From: Benilton Carvalho [beniltoncarvalho@gmail.com]
Sent: Thursday, June 13, 2013 4:43 PM
To: Johnson, Franklin Theodore
Cc: bioconductor at r-project.org
Subject: Re: [BioC] PAIR files -- feature set table
Don't worry about that particular warning... Just install the package
and try to read your XYS files.
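The warning text itself hints at where it comes from: `ndfdata[["SIGNAL"]]` evaluated to `NULL`, which is what R returns when a data frame has no column of that name. The sketch below reproduces the same warning in base R with a hypothetical stand-in table; it is a guess at the mechanism based only on the message, not a reading of the pdInfoBuilder source.

```r
# Hypothetical stand-in for the merged NDF/XYS table: no SIGNAL column.
ndfdata <- data.frame(X = 1:3, Y = 1:3, PROBE_ID = c("p1", "p2", "p3"))

# [[ ]] on a missing column returns NULL rather than raising an error
is.null(ndfdata[["SIGNAL"]])

# is.na(NULL) returns an empty logical vector and emits the warning;
# the handler prints the warning text and then suppresses it
res <- withCallingHandlers(
  is.na(ndfdata[["SIGNAL"]]),
  warning = function(w) {
    message("Caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
length(res)
```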
2013/6/13 Johnson, Franklin Theodore <franklin.johnson at email.wsu.edu>:
> Dr. Carvalho,
> Yes. I see what you mean.
> Switching the columns helped: loading the featureSet table inserted more
> than 2 rows:
> Inserting 198661 rows into table featureSet... OK
> However, the warning message did print again.
> Warning message:
> In is.na(ndfdata[["SIGNAL"]]) :
> is.na() applied to non-(list or vector) of type 'NULL'
> Below is the output + sessionInfo(), as I upgraded to R 3.0.1.
> #Begin R command line code:
>> makePdInfoPackage(arrays, destDir = getwd(), unlink=TRUE)
> ====================================================================
> Building annotation package for Nimblegen Expression Array
> NDF: pdinfo_GPL11164.ndf.txt <-new .ndf file with PROBE_ID<->SEQ_ID
> XYS: XYS.txt
> ====================================================================
> Parsing file: pdinfo_GPL11164.ndf.txt... OK
> Parsing file: XYS.txt... OK
> Merging NDF and XYS files... OK
> Preparing contents for featureSet table... OK
> Preparing contents for bgfeature table... OK
> Preparing contents for pmfeature table... OK
> Creating package in E:/RANDOM/Test/Yanmin's Microarray Paper/Yanmin
> Microarray RAW/pd.pdinfo.gpl11164.ndf.txt
> Inserting 198661 rows into table featureSet... OK
> Inserting 770599 rows into table pmfeature... OK
> Counting rows in featureSet
> Counting rows in pmfeature
> Creating index idx_pmfsetid on pmfeature... OK
> Creating index idx_pmfid on pmfeature... OK
> Creating index idx_fsfsetid on featureSet... OK
> Saving DataFrame object for PM.
> Done.
> Warning message:
> In is.na(ndfdata[["SIGNAL"]]) :
> is.na() applied to non-(list or vector) of type 'NULL'
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: i386-w64-mingw32/i386 (32-bit)
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
> States.1252 LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C LC_TIME=English_United
> States.1252
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets
> base
> other attached packages:
> [1] pdInfoBuilder_1.24.0 oligo_1.24.0 oligoClasses_1.22.0
> affxparser_1.32.1 RSQLite_0.11.4 DBI_0.2-7
> Biobase_2.20.0
> [8] BiocGenerics_0.6.0 BiocInstaller_1.10.2
> loaded via a namespace (and not attached):
> [1] affyio_1.28.0 Biostrings_2.28.0 bit_1.1-10
> codetools_0.2-8 ff_2.2-11 foreach_1.4.1
> GenomicRanges_1.12.4
> [8] IRanges_1.18.1 iterators_1.0.6
> splines_3.0.1 stats4_3.0.1 tools_3.0.1
> zlibbioc_1.6.0
> The built pdInfopackage loaded in Destdir is identical to previous
> However the featureSet table now has more than 2 rows...
> Lastly, I tried multiple combinations, as my merged file has (X.x, Y.x),
> which seem to be identifiers for the 'probe IDs' on the array, as well as
> (X.y, Y.y), which seem to be the sequence identifiers for the "SEQ_ID".
> I used X.x, Y.x and
> which gave the result I pasted above. All others had errors. I'm close, but
> that warning message is annoying...
> Regards,
> Franklin
> From: Benilton Carvalho [beniltoncarvalho at gmail.com]
> Sent: Wednesday, June 12, 2013 8:25 PM
> To: Johnson, Franklin Theodore
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] PAIR files -- feature set table
> That does not look ok.
> The problem is the count for the featureSet table... This table holds
> the information for "genes" (or whatever the target for this
> particular array is)... so, it is unlikely that you have a design
> with only 2 "target units"... I'd expect something in the
> thousands...
> pdInfoBuilder uses the information in SEQ_ID (in the NDF) to get the
> target information (i.e., the contents for featureSet).
> Given that this is a custom array, I believe that the best idea is to
> contact the person who designed it and ask for more details about the
> design (in particular, how many probesets and the average number of
> probes per probeset)...
> I've seen some designs in which the information that was expected to
> be in SEQ_ID was actually stored in PROBE_ID (in such cases, the user
> needs to create a backup copy of the NDF, and then move the contents
> of PROBE_ID to SEQ_ID - and vice-versa).
> b
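The backup-and-swap step described above can be sketched in base R. This is a minimal illustration on a made-up miniature NDF written to a temporary file; the file name, the gene and block labels, and the reduced column set are all hypothetical, and a real NDF has many more columns and rows.

```r
# Hypothetical miniature NDF written to a temp file for illustration
ndf_path <- file.path(tempdir(), "demo.ndf.txt")
ndf <- data.frame(
  PROBE_ID = c("GENE_A", "GENE_A", "GENE_B"),
  SEQ_ID   = c("BLOCK1", "BLOCK1", "BLOCK1"),
  X = 1:3, Y = c(1L, 1L, 2L),
  stringsAsFactors = FALSE
)
write.table(ndf, ndf_path, sep = "\t", quote = FALSE, row.names = FALSE)

# 1. Keep a backup copy of the original file
backup_path <- paste0(ndf_path, ".bak")
file.copy(ndf_path, backup_path, overwrite = TRUE)

# 2. Read the NDF and swap the contents of the two columns
ndf_in <- read.table(ndf_path, header = TRUE, sep = "\t",
                     stringsAsFactors = FALSE, quote = "", comment.char = "")
tmp_ids <- ndf_in$PROBE_ID
ndf_in$PROBE_ID <- ndf_in$SEQ_ID
ndf_in$SEQ_ID <- tmp_ids

# 3. Write the modified table back in the same tab-delimited layout
write.table(ndf_in, ndf_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Number of distinct SEQ_ID values after the swap
length(unique(ndf_in$SEQ_ID))
```

Counting the distinct `SEQ_ID` values before and after the swap gives a quick check on whether the number of target units looks plausible before rebuilding the annotation package.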
