Question

QC probes in HTA2-0

Guilherme Rocha ▴ 40

Hi all,

In the simpleaffy package, there are a series of deprecated functions for picking up QC probes in microarray chips, namely:

getTao(name)
getAlpha1(name)
getAlpha2(name)
getActin3(name)
getActinM(name)
getActin5(name)
getGapdh3(name)
getGapdhM(name)
getGapdh5(name)
getAllQCProbes(name)
getBioB(name)
getBioC(name)
getBioD(name)
getCreX(name)
getAllSpikeProbes(name)
haveQCParams(name)

Since these functions are deprecated, is there a guide to obtaining the QC probesets from Affy chips, for instance the new HTA-2.0 chip?

Any help appreciated.

Best,

Guilherme Rocha

affy qc simpleaffy HTA • 2.0k views

Updated 9.4 years ago by James W. MacDonald • written 9.4 years ago by Guilherme Rocha

score 1 · Answer 1 · 2015-03-17

James W. MacDonald 66k

The QC probes on the random-primer Affy arrays are completely different from those on the 3'-biased arrays, and so is the way you get them: you have to use the oligo/pdInfo pipeline rather than the affy/makecdfenv pipeline. Normally you would be able to query the pd.hta.2.0 database directly. Unfortunately, the pd.hta.2.0 database currently doesn't have the right data for the probeset types, and since Benilton is refactoring it for the next release, it won't get fixed until April.

Note that this problem only affects the part of the database that says what 'type' a probeset is. The mapping of probes to probesets is unaffected.

However, there is still a way to get these data using the netaffxTranscript.rda that comes with the pd.hta.2.0 package:

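> library(pd.hta.2.0)
> # load the transcript-level NetAffx annotation shipped with the package
> load(system.file("extdata", "netaffxTranscript.rda", package = "pd.hta.2.0"))
> # extract the annotation as a data.frame, one row per probeset
> annot <- pData(netaffxTranscript)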
> table(annot$category)

                additional   control->affx->bac_spike
                       230                          4
       control->affx->ercc  control->affx->ercc->step
                        92                         63
control->affx->polya_spike  control->bgp->antigenomic
                         4                         23
                      main      main///normgene->exon
                     67528                       1465
            normgene->exon           normgene->intron
                       698                        646

The 'main' probesets are those intended to measure transcripts; all the other probesets are intended to be controls of one type or another.

Note that the ordering of this object (annot) will NOT be the same as the ordering of your data, so you have to make sure you get things ordered correctly!
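
One way to handle the ordering is to reorder the annotation by probeset ID with match(). The following is only a minimal sketch on toy data: the `probeset_id` column name, the IDs, and the `expr` matrix are made up for illustration, and the real column names in annot may differ.

```r
# Toy stand-in for the annotation table (the real one comes from
# pData(netaffxTranscript)); the column names here are hypothetical.
annot <- data.frame(
  probeset_id = c("ps_C", "ps_A", "ps_B"),
  category    = c("main", "control->affx->bac_spike", "normgene->intron"),
  stringsAsFactors = FALSE
)

# Toy expression matrix whose rows are in a different order than annot.
expr <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
               dimnames = list(c("ps_A", "ps_B", "ps_C"), c("s1", "s2")))

# For each row of expr, find the position of its ID in annot.
idx <- match(rownames(expr), annot$probeset_id)
stopifnot(!anyNA(idx))  # every row of expr must have an annotation entry

# Reorder the annotation so that row i of annot_ordered describes row i of expr.
annot_ordered <- annot[idx, ]

# The categories now line up with the data, so they can be used to subset it.
is_main   <- annot_ordered$category == "main"
expr_main <- expr[is_main, , drop = FALSE]
```

The stopifnot() check is there so that a probeset missing from the annotation fails loudly instead of silently producing NA rows.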

ADD COMMENT • link 9.4 years ago James W. MacDonald 66k

Thanks, James.

Do you know where to find documentation detailing what these control probesets are?

Also, do I understand correctly that there are probes that both serve as controls AND measure transcripts (main///normgene->exon)?

9.4 years ago • Guilherme Rocha

The product PDF gives some hints (http://www.affymetrix.com/support/technical/datasheets/hta_array_2_0_datasheet.pdf), and you can look at the names of the probesets for more:

> annot[annot$category == "control->affx->polya_spike",1]
[1] "AFFX-r2-Bs-dap-5_st" "AFFX-r2-Bs-lys-5_st" "AFFX-r2-Bs-phe-5_st"
[4] "AFFX-r2-Bs-thr-5_st"
> annot[annot$category == "control->affx->bac_spike",1]
[1] "AFFX-r2-Ec-bioB-5_at" "AFFX-r2-Ec-bioC-5_at" "AFFX-r2-Ec-bioD-5_at"
[4] "AFFX-r2-P1-cre-5_at"
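
The same lookup can be done for every non-main category at once with split(). This is a sketch on toy data, assuming (as in the calls above) that the first column of annot holds the probeset names; the `probeset_id` column name and the `ps_1` ID are made up.

```r
# Toy stand-in for pData(netaffxTranscript); the first column holds the
# probeset names. The column name and the "ps_1" ID are hypothetical.
annot <- data.frame(
  probeset_id = c("AFFX-r2-Bs-dap-5_st", "AFFX-r2-Ec-bioB-5_at",
                  "ps_1", "AFFX-r2-Bs-lys-5_st"),
  category    = c("control->affx->polya_spike", "control->affx->bac_spike",
                  "main", "control->affx->polya_spike"),
  stringsAsFactors = FALSE
)

# Keep everything whose category is not exactly "main".
is_control <- annot$category != "main"

# Build a named list: one character vector of probeset names per category.
control_ids <- split(annot[is_control, 1], annot$category[is_control])

names(control_ids)
control_ids[["control->affx->polya_spike"]]
```

With the real annotation, the exact comparison against "main" would also keep the main///normgene->exon probesets in the result.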

I don't know of other documentation, and I have just figured things out by poking around.

The 'additional' probesets are a mystery. If you search netaffx, they will helpfully let you know that these are 'additional' probesets. What they are for is beyond me.

The ERCC probes are for using the ERCC spike-ins for normalization. The antigenomic probesets are intended to be background probesets - Affymetrix claim that these sequences don't exist in the human genome.

The normgene->intron probes are supposed to be background as well, since they target intronic regions. I think Affy somehow forgot that with a random primer you are just as likely to pick up signal from nascent mRNAs that have not yet been processed to excise the introns - these probes have an irritating ability to show up in almost all sets of 'top' genes, which of course makes people skittish.

The normgene->exon probes are (AFAICT) always just exon-level probes that get used twice. In other words, say you have Gene X, and there are 12 exon-level probesets (or more correctly, 'probe set region' or PSR probesets) that are aggregated to make the transcript probeset. There may also be a single PSR probeset from that gene that Affy labels as a normgene->exon probeset. So that single PSR probeset is used twice: once when aggregated at the Gene X transcript level, and once more as an individual control probeset.

This is the first time I have seen the main///normgene->exon type designation, and Affy seem to want to keep these things a mystery. Searching netaffx brings nothing up, for either a transcript or a probeset search. However, if you look in the netaffxProbeset.rda file that comes with the pd.hta.2.0 package, it appears that these are single PSR probesets (or JUC probesets) that are never aggregated into a larger transcript probeset. So it appears that your supposition is correct; these do seem to be probesets that are both 'main' and 'normgene->exon' at the same time.

9.4 years ago • James W. MacDonald