Question

Filtering of lowly expressed probes in HTA 2.0 using new pd.hta.2.0 version 3.12.2


relathman ▴ 20

@relathman-11472


Germany

Dear Community,

As described in this post (C: Appropriate pre-processing pipeline for Human Transcriptome Array HTA 2.0 with o), I would like to plot the expression distributions of the main, antigenomic and intronic probesets on an HTA 2.0 array in order to choose an expression cutoff that separates expressed from unexpressed probesets.

According to the following type definition of pd.hta.2.0, main probesets are annotated as type 1, antigenomic probesets as type 2 and intronic probesets as type 7:

> dbGetQuery(db(pd.hta.2.0), "select * from type_dict;")

  type                                              type_id
1    1                                                 main
2    2                       Antigenomic background control
3    3                             control->affx->bac_spike
4    4                           control->affx->polya_spike
5    5 ERCC (External RNA Controls Consortium) step control
6    6      Exonic normalization control (Positive Control)
7    7    Intronic normalization control (Negative Control)
8    8                                     Positive Control

However, there seems to be a problem with the current version of the pd.hta.2.0 package (version 3.12.1). When I use affycoretools::getMainProbes(), the only type present is 1; every other probeset has type NA, even though the array contains antigenomic probesets (whose transcript cluster IDs start with "AFFX").

> z <- getMainProbes("pd.hta.2.0")
> table(z$type)
    1
67516
> z[z$type %in% 2,]
[1] transcript_cluster_id type                
<0 rows> (or 0-length row.names)
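
A side note on the output above: `table()` drops `NA` values by default, so probesets without a type do not appear in the counts at all. A minimal sketch on a made-up data frame (the IDs and types in `z_toy` are invented stand-ins, not real pd.hta.2.0 content) shows how `useNA = "ifany"` makes them visible:

```r
# Toy stand-in for the data frame returned by getMainProbes("pd.hta.2.0");
# the IDs and types below are invented for illustration only.
z_toy <- data.frame(
  transcript_cluster_id = c("TC_a", "TC_b", "AFFX_x", "AFFX_y"),
  type = c(1L, 1L, NA, NA),
  stringsAsFactors = FALSE
)

# Default behaviour: NA entries are silently dropped from the counts.
counts_default <- table(z_toy$type)

# With useNA = "ifany", unannotated probesets get their own <NA> entry.
counts_with_na <- table(z_toy$type, useNA = "ifany")

print(counts_default)
print(counts_with_na)

# Rows lacking a type can be pulled out directly with is.na().
unannotated <- z_toy[is.na(z_toy$type), ]
print(unannotated$transcript_cluster_id)
```

On the real object, the same two calls (`table(z$type, useNA = "ifany")` and `z[is.na(z$type), ]`) should show how many probesets are unannotated and which ones they are.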

I read in this post (C: problems filtering antigenomic probes from HTA 2.0, written 5 months ago) that an updated version of the pd.hta.2.0 package (version 3.12.2) will fix this. When will it be released, and is it possible to get a pre-release version?

I would be very grateful for any help.

Best

Rukeia

pd.hta.2.0 hta2.0 affycoretools • 1.7k views

updated 7.6 years ago by James W. MacDonald 68k • written 7.6 years ago by relathman ▴ 20

score 1 · Answer 1 · 2017-08-23


James W. MacDonald 68k

@james-w-macdonald-5106


United States

I think this fell through the cracks. The updated pdInfo package has been pushed to the devel server, and is being pushed to the release server. It should appear tomorrow.

written 7.6 years ago by James W. MacDonald 68k


The corrected package is available now:

> table(getMainProbes("pd.hta.2.0")$type)

    1     2     3     4     5     6     7
67516    23     4     4   155   698   646
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /data/oldR/R-3.4.0/lib64/R/lib/libRblas.so
LAPACK: /data/oldR/R-3.4.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] affycoretools_1.49.4 pd.hta.2.0_3.12.2    DBI_0.7
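
With the type column populated, the original goal of comparing expression distributions per probeset type can be sketched roughly as follows. This is only an illustration on simulated values: `expr_toy`, `annot_toy`, the helper `background_cutoff()`, and the choice of the 95th percentile of the antigenomic probesets as the cutoff are all assumptions made for the example, not something prescribed by pd.hta.2.0 or affycoretools. With real data, the annotation would be the data frame from `getMainProbes("pd.hta.2.0")` and the expression vector would come from the summarized arrays (for instance, row means of the expression matrix).

```r
set.seed(1)

# Simulated annotation in the same shape as getMainProbes("pd.hta.2.0"):
# type 1 = main, 2 = antigenomic, 7 = intronic (per type_dict above).
annot_toy <- data.frame(
  transcript_cluster_id = paste0("id", 1:1200),
  type = rep(c(1L, 2L, 7L), times = c(1000, 100, 100)),
  stringsAsFactors = FALSE
)

# Simulated log2 expression per probeset (invented numbers).
expr_toy <- c(rnorm(1000, mean = 6, sd = 2),
              rnorm(100, mean = 3, sd = 0.5),
              rnorm(100, mean = 3.5, sd = 0.7))
names(expr_toy) <- annot_toy$transcript_cluster_id

# Hypothetical helper: cutoff = a chosen quantile of the background probesets.
background_cutoff <- function(expr, annot, bg_type = 2L, prob = 0.95) {
  bg_ids <- annot$transcript_cluster_id[annot$type %in% bg_type]
  unname(quantile(expr[bg_ids], probs = prob))
}

cutoff <- background_cutoff(expr_toy, annot_toy)

# Estimate one density per probeset type.
types <- c(main = 1L, antigenomic = 2L, intronic = 7L)
dens <- lapply(types, function(tp) {
  ids <- annot_toy$transcript_cluster_id[annot_toy$type %in% tp]
  density(expr_toy[ids])
})

# Overlay the densities and mark the cutoff with a vertical line.
y_max <- max(sapply(dens, function(d) max(d$y)))
plot(dens$main, xlim = range(expr_toy), ylim = c(0, y_max),
     main = "Expression by probeset type", xlab = "log2 expression")
lines(dens$antigenomic, lty = 2)
lines(dens$intronic, lty = 3)
abline(v = cutoff, col = "red")
legend("topright", legend = names(types), lty = 1:3)

# Main probesets above the cutoff are treated as expressed in this sketch.
main_ids <- annot_toy$transcript_cluster_id[annot_toy$type %in% 1L]
expressed <- main_ids[expr_toy[main_ids] > cutoff]
length(expressed)
```

Whether the antigenomic or the intronic probesets make the better background reference, and which quantile to use, is a judgment call; passing `bg_type = 7L` or a different `prob` to the helper makes it easy to compare the alternatives on the same plot.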