bead-level data from Infinium methylation arrays
Tim ▴ 160
@tim-2058
Last seen 9.6 years ago
Hello bioconductor-list subscribers,

I am interested in various approaches to preprocessing and normalizing Infinium data, some rather different from Illumina's. I noticed that the 'beadarray' package has the ability to read in bead-level data, and while I'm sure this isn't the most scalable solution, given the way that Infinium arrays are structured, the thought occurred that dealing with one problem at a time (e.g. reading the data into a sensible structure, THEN dealing with memory issues perhaps using R.huge etc.) might be a good idea.

From the raw data, I constructed a 'targets' file (attached) and then attempted to pull in the bead-level information from a couple of arrays (so as not to exceed my laptop's RAM; I have raw data for 72 arrays on 8 slides to start with):

> Mack <- readIllumina(arrayNames = targets$ArrayName,
                       targets=targets, backgroundMethod = "none",
                       singleChannel=FALSE, metrics=TRUE)
Found 2 arrays
Error in order(dat1$ProbeID) : object 'dat1' not found
In addition: Warning message:
In readIllumina(arrayNames = targets$ArrayName, targets = targets,  :
  No annotation package was specified.
  Need to use SetAnnotation later

The data structure 'dat1' is buried within a switch statement and each branch can apparently throw the error message seen above. I assume that, once I get the data read into a reasonable form, I can deal with the lack of annotation files later (in fact, I ought to be able to extract that from Illumina's own manifest, correct?). But the current error is puzzling me and there is no point (for my project) working with anything other than bead-level data. On the off chance that someone else has seen this before, I figured I'd try the list.

Any assistance, suggestions, constructive criticism ("hey why aren't you using 'otherpackage'!?!"), etc. would be most appreciated.

Thanks in advance,

--tim
Annotation Preprocessing
Mark Dunning ★ 1.1k
@mark-dunning-3319
Last seen 12 months ago
Sheffield, Uk
Hi Tim,

Do you know what scanning software was used to create these bead-level data? BeadScan or the newer iScan system? I'm wondering if the format of the files has changed since we wrote readIllumina. When the object 'dat1' is created in readIllumina, it assumes a set number of columns in the bead-level text files (4, 6 or 7), so if the number of columns is something different then this dat1 object will not be created, causing the function to error.

Are you able to read in the .txt files and print out the first 10 lines so that I can see what columns there are in the file?

Regards,

Mark

On Tue, Jul 7, 2009 at 11:46 PM, Tim Triche, Jr. <ttriche at usc.edu> wrote:
> [...]
>> Mack <- readIllumina(arrayNames = targets$ArrayName,
>                       targets=targets, backgroundMethod = "none",
>                       singleChannel=FALSE, metrics=TRUE)
> Found 2 arrays
> Error in order(dat1$ProbeID) : object 'dat1' not found
> In addition: Warning message:
> In readIllumina(arrayNames = targets$ArrayName, targets = targets,  :
>   No annotation package was specified.
>   Need to use SetAnnotation later
> [...]
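The check Mark asks for can be sketched in base R as below. This is a minimal illustration under stated assumptions: the bead-level text file is assumed to be whitespace-delimited, the temporary file is a hypothetical stand-in built from rows posted later in this thread, and `read_by_ncol` is an illustrative helper that only mimics a column-count switch with no default branch. It is not beadarray's actual source.

```r
# Hypothetical stand-in for one bead-level text file (e.g. 4321207025_A.txt),
# filled with a few of the rows posted later in this thread.
path <- tempfile(fileext = ".txt")
writeLines(c("Code\tGrn\tRed",
             "10008\t106\t1847",
             "10008\t139\t1680",
             "10008\t135\t1675"), path)

# Print the first 10 lines (fewer if the file is shorter).
head_lines <- readLines(path, n = 10)
print(head_lines)

# Count whitespace-separated fields on each line; a well-formed file
# should give a single distinct value.
n_cols <- unique(count.fields(path))
print(n_cols)

# Illustrative mimic (an assumption, not the package's code): 'dat1' is
# assigned only inside branches for 4, 6 or 7 columns, and there is no
# default branch, so any other column count leaves 'dat1' undefined.
read_by_ncol <- function(n) {
  switch(as.character(n),
         "4" = { dat1 <- "four-column layout" },
         "6" = { dat1 <- "six-column layout" },
         "7" = { dat1 <- "seven-column layout" })
  exists("dat1", inherits = FALSE)  # was 'dat1' ever created in this call?
}

read_by_ncol(n_cols)  # column count found in the file
read_by_ncol(4)       # one of the layouts the mimic handles
```

If the column count is not one of the handled values, a later reference to the object would fail with an "object not found" error of the kind reported above.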
On Wed, Jul 8, 2009 at 2:16 AM, Mark Dunning <mark.dunning at gmail.com> wrote:
> Hi Tim,
>
> Do you know what scanning software was used to create these bead-level
> data? BeadScan or the newer iScan system? I'm wondering if the format
> of the files has changed since we wrote readIllumina. When the object
> 'dat1' is created in readIllumina it assumes a set number of columns
> in the bead-level text files (4,6 or 7) so if the number of columns is
> something different then this dat1 object will not be created causing
> the function to error.

I confirmed with the staff of the data production facility that my files are from BeadScan. I don't yet have a copy of the settings.xml file in use, or changes to it, but I'll get one. I have attached other files suggested by you and Dr. Carey, along with a feeble patch I wrote.

The files I have are chipnumber_array_color.(idat|xml|locs|tif), chipnumber_array.txt, and chipnumber.sdf for each chip, along with a Metrics.txt file, a manifest file (Excel, but I converted it to CSV in hopes of turning it into an annotation package), and a targets.txt file which I wrote in the format shown by the example bead-level-data in the vignette.

The .txt files with which I am provided have only the columns 'Code', 'Grn', and 'Red' (all with integer-valued contents). If I'm not hosed -- if the .txt and .tif files are enough -- could anyone provide a bit of guidance in terms of where I should start hacking? I'm not averse to monkeying around in the C code but I don't know where I should look first.

I did write a simple kludge to read in Infinium two-channel data. It is not clever, just a small patch to readIllumina to deal with the 3-column format I have. Nonetheless it causes the package to inspect the .tif files, putting quite a strain on my pokey laptop. Then an error (and not the one I added as a checkpoint) is thrown:

Error in data[, 2] = bgCorrectSingleArray(fg = greenIntensities[[5]],  :
  replacement has length zero

I didn't request background correction, for what that's worth. The lack of useful X,Y location information seems to be the culprit here. I am not sure how best to fix this. Files with the extension .locs are provided, but I could not find useful specs on this file format.

Am I stymied with regards to accessing the bead-level data? (A presentation by Matt Ritchie at Cambridge hinted that this may be the case. Dr. Carey's reply suggested that perhaps the oft-changing Illumina file formats might also be involved.) I could request that the core facility not default to these proprietary formats, if that is an insurmountable obstacle. Have others found themselves in this situation before?

Thanks for any suggestions,

--tim

-------------- next part --------------
Code    Grn   Red
10008   106   1847
10008   139   1680
10008   135   1675
10008   52    1315
10008   59    1832
10008   96    1250
10008   65    1314
10008   66    1457
10008   85    1560

-------------- next part --------------
4321207025_A_Grn.idat
4321207025_A_Grn.locs
4321207025_A_Grn.tif
4321207025_A_Grn.xml
4321207025_A_Red.idat
4321207025_A_Red.locs
4321207025_A_Red.tif
4321207025_A_Red.xml
4321207025_A.txt
4321207025_B_Grn.idat
4321207025_B_Grn.locs
4321207025_B_Grn.tif
4321207025_B_Grn.xml
4321207025_B_Red.idat
4321207025_B_Red.locs
4321207025_B_Red.tif
4321207025_B_Red.xml
4321207025_B.txt
4321207025.sdf
files.txt
Metrics.txt
probe_sequences.csv
readIllumina.diff
readIllumina.orig.R
readIllumina.patched.R
targets.txt

-------------- next part --------------
R version 2.10.0 Under development (unstable) (2009-06-25 r48836)
i686-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] beadarray_1.13.4 Biobase_2.5.4    sandwich_2.2-1   zoo_1.5-6
[5] Design_2.2-0     survival_2.35-4  Hmisc_3.6-0

loaded via a namespace (and not attached):
[1] cluster_1.12.0  grid_2.10.0     hwriter_1.1     lattice_0.17-25
[5] limma_2.19.2    tools_2.10.0
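As a rough sketch of "reading the data into a sensible structure" outside beadarray, the three-column Code/Grn/Red layout shown above can be loaded with base R and summarised per bead type. The temporary file is a hypothetical stand-in holding the nine rows posted above, the file is assumed to be whitespace-delimited, and the median summary is only an illustrative choice, not a recommended Infinium preprocessing method. No X,Y bead locations are involved, so nothing here touches the .tif or .locs files.

```r
# Hypothetical stand-in for a bead-level file such as 4321207025_A.txt,
# holding the nine rows posted above.
path <- tempfile(fileext = ".txt")
writeLines(c("Code\tGrn\tRed",
             "10008\t106\t1847", "10008\t139\t1680", "10008\t135\t1675",
             "10008\t52\t1315",  "10008\t59\t1832",  "10008\t96\t1250",
             "10008\t65\t1314",  "10008\t66\t1457",  "10008\t85\t1560"),
           path)

# Read all three columns as integers (whitespace-delimited is an assumption).
beads <- read.table(path, header = TRUE, colClasses = "integer")
stopifnot(identical(names(beads), c("Code", "Grn", "Red")))

# Collapse to one row per bead type: median green and red intensity.
per_type <- aggregate(cbind(Grn, Red) ~ Code, data = beads, FUN = median)

# Add the number of beads observed for each bead type.
bead_counts <- table(beads$Code)
per_type$nBeads <- as.vector(bead_counts[as.character(per_type$Code)])

print(per_type)
```

A per-array data frame like this is small enough to hold for many arrays at once, which postpones the memory question until the full bead-level values are actually needed.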