SCAN.UPC for Illumina arrays?


Tim Triche ★ 4.2k

@tim-triche-3561

Last seen 3.6 years ago

United States

I've been running arrays (Affy mouse, Affy human) and RNAseq data (also mouse and human) through UPC() for the past few days, after seeing some nice results from the PNAS paper in comparison to qPCR and fRMA. However, I was hoping that the same aspect of the method which makes it useful for RNAseq (i.e., decoupling background/foreground estimation from platform-parameter estimation) would allow me to use it on Illumina array data. Has anyone tried this?

I can read in the appropriate bits (target sequence, length, GC content, blah blah) for Illumina probes as if they were RNAseq data, or perhaps assemble an oligo 'probedesign' package (less palatable), but I was hoping someone else already bit the bullet.

Also, I wrote some patches to make the package work better with e.g. TCGA RSEM data, where the count files have an initial header line. (Obviously this was trivial, but there was another interesting opportunity for improvement so I hit that too.) Would be interested to hear from the authors/maintainers regarding its utility.

Thanks,
--t

RNASeq qPCR affy oligo frma • 1.7k views

updated 10.4 years ago by Stephen Piccolo ▴ 590 • written 10.5 years ago by Tim Triche ★ 4.2k


Mark Dunning ★ 1.1k

@mark-dunning-3319

Last seen 13 months ago

Sheffield, Uk

Hi Tim,

The probe sequences for Illumina expression arrays are already available in the illuminaHumanV.db packages, e.g. illuminaHumanv4PROBESEQUENCE.

I'd be happy to add other fields if they would be useful for this kind of analysis.

Best,
Mark

10.4 years ago • Mark Dunning ★ 1.1k


I've been using the humanV4 package directly for working from IDAT files of HT12v4 chips, as it happens... good to know that all of the bits are already in there.

I don't know how this would work with 'oligo', but it seems like it should be relatively straightforward to get the length (always 50bp) and the GC content (from the probe sequence), then map the probes to hg19 (again from sequence).

Essentially I just like having various expression assays (Affy, HuEx, Illumina, RNAseq) comparable for purposes of maximizing statistical power. UPC seems to do a decent job of that, so I'm looking to expand its utility further.

Thanks again for packaging up these annotations!

--t

10.4 years ago • Tim Triche ★ 4.2k
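To make the length/GC idea in the reply above concrete, here is a minimal Python sketch of deriving both values from a probe sequence. It is only an illustration of the arithmetic, not code from SCAN.UPC or the annotation packages; the function names and the probe IDs and sequences are made up for the example.

```python
def gc_fraction(seq):
    """Return the fraction of G/C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def probe_covariates(probes):
    """Map each probe ID to its length and GC fraction."""
    return {
        probe_id: {"length": len(seq), "gc": gc_fraction(seq)}
        for probe_id, seq in probes.items()
    }


# Hypothetical probe IDs and 50-base sequences, for illustration only.
probes = {
    "probe_A": "ACGT" * 12 + "AC",
    "probe_B": "GGCC" * 12 + "GC",
}
covariates = probe_covariates(probes)
```

In practice the sequences would come from the annotation package rather than a hand-written dictionary, and the resulting per-probe values would be passed to whatever normalization step accepts them.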


Stephen Piccolo ▴ 590

@stephen-piccolo-6761

Last seen 3.6 years ago

United States

Hi Tim,

Thanks for your message. Sorry for the delay in getting back to you.

Several other people have also requested support for Illumina expression arrays. We are working to address this soon. However, rather than provide a function that is specific to Illumina arrays, we will likely provide a more generic UPC function that users can invoke for whatever platform they wish. Optionally, the user could indicate the GC content for each probe/gene on whichever platform they are using. So it sounds like this information could be extracted pretty easily from the illuminaHumanV.db packages, as Mark mentioned. I'm planning to work on this functionality soon and will add it to the devel version of the package. I'll let you know as soon as I have it ready for testing. I'll also see if I can put together a little primer that demonstrates how to invoke this on Illumina arrays specifically.

As far as adding an option to specifically handle TCGA/RSEM files, can you give me a little more information about this file format? Is it mostly just used within TCGA? Or is it used more broadly in other contexts? I wouldn't be opposed to adding a helper function for handling these files, but am hesitant about opening this up for many different formats.

Regards,
-Steve

10.4 years ago • Stephen Piccolo ▴ 590


It looks like RSEM is rapidly becoming standard for RNAseq summarization (it helps that one of the Cufflinks authors, Lior Pachter, has publicly stated that RSEM's TPM estimates are what people should use in place of raw counts as a posterior estimate of how many fragments "come from" a transcript). See, e.g., http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE41658

The files don't always have headers, but they always have additional columns, which is the root of the fix I sent for merging RNAseq annotations with RNAseq counts. The two required changes were to specify which columns to keep in the merge, and how many lines (default 0) to skip when reading them in.

There are some other fiddly bits with TCGA data (probably one of the largest human RNAseq datasets in existence at the moment) involving the way UCSC GAF annotations are specified for them. Short of re-counting features or dropping unannotated transcripts, the best fix for that involves parsing the GAF.

Thanks for making the package available!

10.4 years ago • Tim Triche ★ 4.2k
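The two changes described in the reply above (choosing which columns to keep, and skipping a configurable number of leading lines) can be sketched roughly as follows. This is a Python illustration of the idea only, not the actual patch to the package; the function names, column names, and example values are all assumptions made up for the example.

```python
import csv
import io


def read_counts(handle, skip=0, id_col=0, count_col=1, sep="\t"):
    """Read feature IDs and counts, skipping `skip` leading lines
    and ignoring every column other than `id_col` and `count_col`."""
    for _ in range(skip):
        next(handle, None)
    counts = {}
    for row in csv.reader(handle, delimiter=sep):
        if not row:
            continue
        counts[row[id_col]] = float(row[count_col])
    return counts


def merge_with_annotation(counts, annotation):
    """Keep features found in both inputs: id -> (count, length, gc)."""
    return {
        fid: (counts[fid],) + tuple(annotation[fid])
        for fid in counts
        if fid in annotation
    }


# Hypothetical file contents: one header line plus extra columns.
example = io.StringIO(
    "gene_id\traw_count\tscaled_estimate\ttranscript_id\n"
    "geneA\t10.00\t1.5e-06\ttxA1\n"
    "geneB\t0.00\t0\ttxB1\n"
)
# Hypothetical annotation: id -> (length, GC fraction).
annotation = {"geneA": (1200, 0.48)}

counts = read_counts(example, skip=1)
merged = merge_with_annotation(counts, annotation)
```

Leaving `skip` at its default of 0 keeps headerless files working unchanged, which matches the default described above. Features without an annotation entry (here `geneB`) are simply dropped, which is one of the two fallbacks mentioned for unannotated transcripts.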


It would be great to have UPC support for Illumina arrays. The illuminaHumanv4 package also has the hg19 locations in it: 'illuminaHumanv4GENOMICLOCATION'. So it sounds like you have everything you need then.

10.4 years ago • Mark Dunning ★ 1.1k
