Dear Community,
as described in this post (C: Appropriate pre-processing pipeline for Human Transcriptome Array HTA 2.0 with o), I would like to plot the distributions of the main, antigenomic and intronic probesets on an HTA 2.0 array in order to choose an appropriate expression cutoff that separates expressed from unexpressed probesets.
According to the following type definition of pd.hta.2.0, main probesets are annotated as type 1, antigenomic probesets as type 2 and intronic probesets as type 7:
> dbGetQuery(db(pd.hta.2.0), "select * from type_dict;")
  type                                              type_id
1    1                                                 main
2    2                       Antigenomic background control
3    3                             control->affx->bac_spike
4    4                           control->affx->polya_spike
5    5 ERCC (External RNA Controls Consortium) step control
6    6      Exonic normalization control (Positive Control)
7    7    Intronic normalization control (Negative Control)
8    8                                     Positive Control
However, there seems to be a problem with the current version of the pd.hta.2.0 package (version 3.12.1): when I use affycoretools::getMainProbes(), the only type returned is 1 and everything else is annotated as NA, even though the array does contain antigenomic probesets (whose transcript cluster IDs start with "AFFX").
> z <- getMainProbes("pd.hta.2.0")
> table(z$type)
    1
67516
> z[z$type %in% 2,]
[1] transcript_cluster_id type
<0 rows> (or 0-length row.names)
I read in this post (C: problems filtering antigenomic probes from HTA 2.0, written 5 months ago) that an updated version of the pd.hta.2.0 package (version 3.12.2) fixing this would be released. Do you know when it will be available, and is it possible to get a pre-release version?
I would be very grateful for any help.
Best
Rukeia
The corrected package is available now:
Great, it works now! Thank you very much for your help.
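Once all probeset types are reported, the plot asked about in the question could be drawn along the following lines. This is a minimal base-R sketch, not a method taken from this thread: it assumes you already have a numeric vector of log2 signals per probeset (`expr`, e.g. row means of an RMA-summarised matrix) and the matching numeric type codes from type_dict (1 = main, 2 = antigenomic, 7 = intronic). The helpers `labelTypes()`, `backgroundCutoff()` and `plotTypeDensities()` are hypothetical names, and taking a quantile of the antigenomic background (0.95 by default here) as the cutoff is just one common heuristic, not a recommendation from the package authors.

```r
# Sketch only: `labelTypes`, `backgroundCutoff` and `plotTypeDensities` are
# hypothetical helpers, not functions from pd.hta.2.0 or affycoretools.

# Type codes as listed in the pd.hta.2.0 type_dict table.
typeLabels <- c("1" = "main", "2" = "antigenomic", "7" = "intronic")

# Map numeric type codes to labels; unlisted codes become NA.
labelTypes <- function(typeCode) {
  unname(typeLabels[as.character(typeCode)])
}

# Cutoff = chosen quantile of the background (antigenomic) signal distribution.
backgroundCutoff <- function(expr, type, background = "antigenomic", prob = 0.95) {
  stopifnot(length(expr) == length(type), background %in% type)
  unname(quantile(expr[type %in% background], probs = prob, na.rm = TRUE))
}

# Overlay one density curve per probeset type, optionally marking a cutoff.
# Each type needs at least two non-NA values for density() to work.
plotTypeDensities <- function(expr, type, cutoff = NULL) {
  keep <- !is.na(type) & !is.na(expr)
  dens <- lapply(split(expr[keep], type[keep]), density)
  xr <- range(unlist(lapply(dens, `[[`, "x")))
  yr <- range(unlist(lapply(dens, `[[`, "y")))
  plot(NA, xlim = xr, ylim = yr, xlab = "log2 expression", ylab = "Density")
  cols <- seq_along(dens)
  for (i in cols) lines(dens[[i]], col = cols[i])
  if (!is.null(cutoff)) abline(v = cutoff, lty = 2)
  legend("topright", legend = names(dens), col = cols, lty = 1)
  invisible(dens)
}
```

With the corrected package, one would label the types (for example `type <- labelTypes(z$type)`, assuming `z` from getMainProbes() has been matched row by row to `expr` via transcript_cluster_id), compute `cut <- backgroundCutoff(expr, type)` and then call `plotTypeDensities(expr, type, cutoff = cut)` to check visually how well the main probesets separate from the antigenomic and intronic ones.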