Question

Transcript-level SCAN.UPC preprocessing for Affymetrix HTA 2.0 arrays

Lukas__ • 0

@lukas__-14680

Last seen 6.9 years ago

I have a question concerning SCAN normalization for the Affymetrix HTA 2.0 platform. When I normalize .CEL files from this platform using SCAN.UPC, everything works smoothly. However, the resulting exprs object contains data on the probeset/exon level (analogous to the Gene Expression Omnibus annotation file GPL17585), rather than on the gene/transcript level (as in GPL17586).

How could I use SCAN.UPC to generate transcript-level data? The number of probesets (925032) in the data returned by SCAN is huge, and I would prefer the transcript-level mappings. The annotation information in hta20transcriptcluster.db is on the gene/transcript level.

Thank you for your help!

microarray scan.upc hta2.0 • 2.1k views

ADD COMMENT • link updated 6.9 years ago by Stephen Piccolo ▴ 600 • written 6.9 years ago by Lukas__ • 0

score 1 · Answer 1 · 2017-12-22

James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 14 hours ago

United States

It seems you can't do that. There is the exonArrayTarget argument, which is supposed to allow you to choose the various summarization levels for the Exon ST arrays, and since the HTA arrays are pretty much just Exon ST arrays, one would think that would work for those as well. However, there is this in the function that processes the data:

    if (is.na(exonArrayTarget))
        exonArrayTarget = "probeset"
    if (exonArrayTarget != "probeset")
        stop("Currently, 'probeset' is the only allowed setting for the exonArrayTarget parameter for the Affymetrix HTA 2.0 arrays.")

ADD COMMENT • link 6.9 years ago James W. MacDonald 67k

I will have to go into the code to remember why that is the case and see if there is a better way to handle it. It might take me a few days (due to the US holidays) to get to it.

Does hta20transcriptcluster.db have probe information?

ADD REPLY • link 6.9 years ago Stephen Piccolo ▴ 600

What do you mean by 'have probe information'? The hta20transcriptcluster.db package maps the transcript-level probeset IDs to annotation data like Entrez Gene, etc.

The pdInfo package for this array is the pd.hta.2.0 package, which is probably what you are after.

ADD REPLY • link 6.9 years ago James W. MacDonald 67k

Nice, that would be great!

ADD REPLY • link 6.9 years ago Lukas__ • 0

score 0 · Answer 2 · 2018-01-02

Stephen Piccolo ▴ 600

@stephen-piccolo-6761

Last seen 4.2 years ago

United States

I apologize for my delay in replying.

I may be misunderstanding what you want to do, but you should be able to solve this by using BrainArray annotations. You can read about this in the SCAN.UPC vignette. BrainArray now has annotations for the HTA 2.0 arrays. You can use the probeSummaryPackage argument in the SCAN (or UPC) function to perform this mapping. It may not map to the same identifiers you see in GPL17586, but there are various options for mapping/summarizing (e.g., Ensembl transcripts, Ensembl genes, Entrez, etc.). Please let me know if this does not address your need.
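
A minimal sketch of what that could look like, following the pattern in the SCAN.UPC vignette. The wrapper name run_scan_brainarray, the CEL file path, the version string "22.0.0", and the choice of "ensg" (Ensembl genes) are assumptions for illustration; check the BrainArray site for the release and annotation source you actually want.

```r
# Hypothetical wrapper: SCAN-normalize an HTA 2.0 CEL file and summarize
# it with a BrainArray mapping instead of the default probeset level.
# Requires the SCAN.UPC package; nothing is downloaded until it is called.
run_scan_brainarray <- function(celFilePath,
                                version = "22.0.0",          # assumed BrainArray release
                                organism = "hs",             # human
                                annotationSource = "ensg") { # Ensembl genes
  # Downloads and installs the matching BrainArray package, returns its name
  pkgName <- SCAN.UPC::InstallBrainArrayPackage(
    celFilePath, version, organism, annotationSource
  )
  # probeSummaryPackage tells SCAN to summarize probes using that mapping
  SCAN.UPC::SCAN(celFilePath, probeSummaryPackage = pkgName)
}
```

You would call it as run_scan_brainarray("path/to/sample.CEL") and pull the values out of the returned object with exprs(). Swapping "ensg" for "enst" or "entrezg" should give the other summarization options mentioned above.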

ADD COMMENT • link 6.9 years ago Stephen Piccolo ▴ 600

Here's a link to the BrainArray site: http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp#v22

ADD REPLY • link 6.9 years ago Stephen Piccolo ▴ 600

For all of the random-primer-based Affymetrix arrays you can summarize at what Affy calls the probe set region (PSR) level or at the transcript level. The difference is that a PSR is usually around 4 probes that measure some small section of a transcript (roughly an exon, but sometimes there are multiple PSRs per exon). For the oligo package this is the 'probeset' summarization level. This is a bit confusing, as for the Exon ST arrays there were also different groups (core, extended, full) that let you include PSRs based on more (and more) speculative content. For all Affy arrays that came out after the Exon ST arrays they got rid of this speculative content, and you only have the core/probeset summarization and the transcript summarization levels.
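
For reference, in oligo the two levels are selected through the target argument of rma(). This is a rough sketch, assuming oligo and pd.hta.2.0 are installed; summarize_hta and celDir are hypothetical names, and note that this runs RMA rather than SCAN. As far as I know, target = "core" returns the transcript-cluster level for HTA 2.0 and target = "probeset" returns the PSR level.

```r
# Hypothetical helper: RMA-summarize HTA 2.0 CEL files with oligo at
# either the transcript-cluster ("core") or the PSR ("probeset") level.
summarize_hta <- function(celDir, level = c("core", "probeset")) {
  level <- match.arg(level)
  # Collect the CEL files in the directory (base R, case-insensitive)
  celFiles <- list.files(celDir, pattern = "\\.CEL$",
                         ignore.case = TRUE, full.names = TRUE)
  # read.celfiles picks up the pd.hta.2.0 pdInfo package automatically
  raw <- oligo::read.celfiles(celFiles)
  # target controls the summarization level
  oligo::rma(raw, target = level)
}
```

The transcript-level result from summarize_hta("path/to/cel_dir") carries transcript-cluster IDs, which is the level that hta20transcriptcluster.db annotates.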

The transcript summarization takes all the probes that align to a given transcript and then summarizes them as one big probeset. This is analogous to what the MBNI group does, minus the extra step where they throw out probes that don't meet certain criteria.
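
To make the difference concrete, here is a toy base-R illustration with made-up numbers. The probe-to-PSR-to-transcript mapping is hypothetical, and a plain per-sample median stands in for the real summarization step (RMA uses median polish, and SCAN has its own model), so treat it only as a picture of the grouping.

```r
# Toy probe-level log2 intensities: 8 probes x 3 samples (random values)
set.seed(1)
probes <- matrix(rnorm(24, mean = 8), nrow = 8,
                 dimnames = list(paste0("probe", 1:8),
                                 paste0("sample", 1:3)))

# Hypothetical mapping: two PSRs of 4 probes each, both in one transcript
map <- data.frame(
  probe      = rownames(probes),
  psr        = rep(c("PSR1", "PSR2"), each = 4),
  transcript = "TC1",
  stringsAsFactors = FALSE
)

# Collapse the probes in each group to one value per sample (median)
summarize_by <- function(x, groups) {
  idx <- split(seq_len(nrow(x)), groups)
  out <- t(vapply(idx,
                  function(i) apply(x[i, , drop = FALSE], 2, median),
                  numeric(ncol(x))))
  colnames(out) <- colnames(x)
  out
}

psr_level <- summarize_by(probes, map$psr)         # one row per PSR
tc_level  <- summarize_by(probes, map$transcript)  # one row per transcript

dim(psr_level)  # 2 rows (PSRs) by 3 samples
dim(tc_level)   # 1 row (transcript) by 3 samples
```

At the transcript level all 8 probes are pooled into one big probeset, which is why the row count drops so much compared with the PSR-level output.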

Is there a particular reason that SCAN.UPC can handle the MBNI remapped arrays but not the conventional transcript level mappings? For the HTA arrays they only have the cdf packages for the old affy package rather than oligo pdInfo packages, and I believe you need to use their specially modified affy package to use those.

ADD REPLY • link 6.9 years ago James W. MacDonald 67k

Thanks for the detailed response. There's no reason that SCAN.UPC can't handle these conventional transcript-level mappings. I haven't come across anyone who needed this, though. Have you?

ADD REPLY • link 6.9 years ago Stephen Piccolo ▴ 600