problems filtering antigenomic probes from HTA 2.0
s.munster ▴ 40
@smunster-11308
Last seen 3.9 years ago
USA/Oklahoma City

Hello,

  I am currently working with data from 16 HTA 2.0 microarrays, which I normalized with RMA using the following commands in R:

# load oligo, the HTA 2.0 annotation package, and DBI (for dbGetQuery)
library(oligo)
library(pd.hta.2.0)
library(DBI)

# create and verify a list of the CEL files before processing
celFiles <- list.celfiles()

# read in the CEL files and verify
rawData <- read.celfiles(celFiles)

# for genes, can only pull core probeset summaries; for exons, can pull core, full, or extended
eset <- rma(rawData, target = "core")

# connect the annotation database to the data set
con <- db(pd.hta.2.0)

# list the types of probesets in the dataset
dbGetQuery(con, "select * from type_dict;")

 

When I proceed to filtering, I am running into difficulty. I am doing the following:

antig <- dbGetQuery(con, "select core_mps.meta_fsetid from core_mps inner join featureSet on core_mps.fsetid=featureSet.man_fsetid where featureSet.type = '2';")

I do get a list in antig. I would expect the lines to be in the general format of aa000000000.hg.1

Instead I get a list like this:

dbGetQuery(con, "select core_mps.meta_fsetid from core_mps inner join featureSet on core_mps.fsetid=featureSet.man_fsetid where featureSet.type = '2';")
   meta_fsetid
1     18677993
2     18677994
3     18677995
4     18677996
5     18677997
6     18677998
7     18677999
8     18678000
9     18678001
10    18678002
11    18678003
12    18678004
13    18678005
14    18678006
15    18678007
16    18678008
17    18678009
18    18678010
19    18678011
20    18678012
21    18678013
22    18678014
23    18678015

 

Strangely, when I look at all the columns in core_mps, I get:

> dbGetQuery(con, "select * from core_mps inner join featureSet on core_mps.fsetid=featureSet.man_fsetid where featureSet.type = '2';")
   meta_fsetid transcript_cluster_id   fsetid fsetid man_fsetid strand start stop transcript_cluster_id exon_id crosshyb_type level
1     18677993     AFFX-BkGr-GC03_at 18677993   5054   18677993     NA    NA   NA                    NA      NA            NA    NA
2     18677994     AFFX-BkGr-GC04_at 18677994   5055   18677994     NA    NA   NA                    NA      NA            NA    NA
3     18677995     AFFX-BkGr-GC05_at 18677995   5056   18677995     NA    NA   NA                    NA      NA            NA    NA
4     18677996     AFFX-BkGr-GC06_at 18677996   5057   18677996     NA    NA   NA                    NA      NA            NA    NA
5     18677997     AFFX-BkGr-GC07_at 18677997   5058   18677997     NA    NA   NA                    NA      NA            NA    NA
6     18677998     AFFX-BkGr-GC08_at 18677998   5059   18677998     NA    NA   NA                    NA      NA            NA    NA
7     18677999     AFFX-BkGr-GC09_at 18677999   5060   18677999     NA    NA   NA                    NA      NA            NA    NA
8     18678000     AFFX-BkGr-GC10_at 18678000   5061   18678000     NA    NA   NA                    NA      NA            NA    NA
9     18678001     AFFX-BkGr-GC11_at 18678001   5062   18678001     NA    NA   NA                    NA      NA            NA    NA
10    18678002     AFFX-BkGr-GC12_at 18678002   5063   18678002     NA    NA   NA                    NA      NA            NA    NA
11    18678003     AFFX-BkGr-GC13_at 18678003   5064   18678003     NA    NA   NA                    NA      NA            NA    NA
12    18678004     AFFX-BkGr-GC14_at 18678004   5065   18678004     NA    NA   NA                    NA      NA            NA    NA
13    18678005     AFFX-BkGr-GC15_at 18678005   5066   18678005     NA    NA   NA                    NA      NA            NA    NA
14    18678006     AFFX-BkGr-GC16_at 18678006   5067   18678006     NA    NA   NA                    NA      NA            NA    NA
15    18678007     AFFX-BkGr-GC17_at 18678007   5068   18678007     NA    NA   NA                    NA      NA            NA    NA
16    18678008     AFFX-BkGr-GC18_at 18678008   5069   18678008     NA    NA   NA                    NA      NA            NA    NA
17    18678009     AFFX-BkGr-GC19_at 18678009   5070   18678009     NA    NA   NA                    NA      NA            NA    NA
18    18678010     AFFX-BkGr-GC20_at 18678010   5071   18678010     NA    NA   NA                    NA      NA            NA    NA
19    18678011     AFFX-BkGr-GC21_at 18678011   5072   18678011     NA    NA   NA                    NA      NA            NA    NA
20    18678012     AFFX-BkGr-GC22_at 18678012   5073   18678012     NA    NA   NA                    NA      NA            NA    NA
21    18678013     AFFX-BkGr-GC23_at 18678013   5074   18678013     NA    NA   NA                    NA      NA            NA    NA
22    18678014     AFFX-BkGr-GC24_at 18678014   5075   18678014     NA    NA   NA                    NA      NA            NA    NA
23    18678015     AFFX-BkGr-GC25_at 18678015   5076   18678015     NA    NA   NA                    NA      NA            NA    NA
   junction_start_edge junction_stop_edge junction_sequence has_cds chrom type
1                   NA                 NA              <NA>      NA    NA    2
2                   NA                 NA              <NA>      NA    NA    2
3                   NA                 NA              <NA>      NA    NA    2
4                   NA                 NA              <NA>      NA    NA    2
5                   NA                 NA              <NA>      NA    NA    2
6                   NA                 NA              <NA>      NA    NA    2
7                   NA                 NA              <NA>      NA    NA    2
8                   NA                 NA              <NA>      NA    NA    2
9                   NA                 NA              <NA>      NA    NA    2
10                  NA                 NA              <NA>      NA    NA    2
11                  NA                 NA              <NA>      NA    NA    2
12                  NA                 NA              <NA>      NA    NA    2
13                  NA                 NA              <NA>      NA    NA    2
14                  NA                 NA              <NA>      NA    NA    2
15                  NA                 NA              <NA>      NA    NA    2
16                  NA                 NA              <NA>      NA    NA    2
17                  NA                 NA              <NA>      NA    NA    2
18                  NA                 NA              <NA>      NA    NA    2
19                  NA                 NA              <NA>      NA    NA    2
20                  NA                 NA              <NA>      NA    NA    2
21                  NA                 NA              <NA>      NA    NA    2
22                  NA                 NA              <NA>      NA    NA    2
23                  NA                 NA              <NA>      NA    NA    2
>

But when I just query core_mps by itself, I get a list with the terms in a very different format.

 dbGetQuery(con, "select * from core_mps limit 10")
       meta_fsetid transcript_cluster_id   fsetid
1  TC01000001.hg.1       TC01000001.hg.1 19021059
2  TC01000001.hg.1       TC01000001.hg.1 19021060
3  TC01000001.hg.1       TC01000001.hg.1 19021061
4  TC01000001.hg.1       TC01000001.hg.1 19021062
5  TC01000001.hg.1       TC01000001.hg.1 19021063
6  TC01000002.hg.1       TC01000002.hg.1 19021064
7  TC01000002.hg.1       TC01000002.hg.1 19021065
8  TC01000002.hg.1       TC01000002.hg.1 19021066
9  TC01000002.hg.1       TC01000002.hg.1 19021067
10 TC01000002.hg.1       TC01000002.hg.1 19021068

 

I realize that the joined query above returns only the type 2 (antigenomic) probesets, but how do I get it to output meta_fsetid values that are not just 8-digit numbers?  For the list to match what is in my RMA output from the CEL files, I need meta_fsetid values in the same format as those returned by "select * from core_mps limit 10" above.

 

My R sessionInfo is:

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252   

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
 [1] limma_3.28.17       pd.hta.2.0_3.12.1   RSQLite_1.0.0       DBI_0.5             oligo_1.36.1        Biostrings_2.40.2 
 [7] XVector_0.12.1      IRanges_2.6.1       S4Vectors_0.10.2    Biobase_2.32.0      oligoClasses_1.34.0 BiocGenerics_0.18.0

loaded via a namespace (and not attached):
 [1] affxparser_1.44.0          GenomicRanges_1.24.2       splines_3.3.1              zlibbioc_1.18.0          
 [5] bit_1.1-12                 foreach_1.4.3              GenomeInfoDb_1.8.3         tools_3.3.1              
 [9] SummarizedExperiment_1.2.3 ff_2.2-13                  iterators_1.0.8            preprocessCore_1.34.0    
[13] affyio_1.42.0              codetools_0.2-14           BiocInstaller_1.22.3     

 

Much thanks!

Susan Munster, Research Geneticist

Functional Genomics Group

Civil Aerospace Medical Institute, AAM-612

6500 S. MacArthur Blvd.

Oklahoma City OK  73169

405-954-8631

susan.munster@faa.gov

 

pd.hta.2.0 R • 3.6k views
@james-w-macdonald-5106
Last seen 12 hours ago
United States

You are assuming (incorrectly, as it happens) that all the probeset IDs have the same format as the first 10 rows of core_mps. They don't, primarily, I think, because the antigenomic probesets are pretty much the same across all the Exon ST, Gene ST and HTA arrays.

These probesets are based on probes with varying GC content that are not expected to be found in any species. Given a set of such probes, the smart play is to just recycle across all arrays that you are planning to use them on, because why would you do anything else? And if you were planning to recycle, would you give them different probeset IDs for each array, or would you just recycle the IDs as well?

Since these were first developed (IIRC) for the Exon arrays, which have numeric probeset IDs, the antigenomic probesets on the HTA arrays are numeric as well, which as you have noted is not the same as all the other probesets. Do note that there are still some AFFX probesets floating around as well, which are relics from the bygone era of the 3'-biased arrays.
 

s.munster ▴ 40
@smunster-11308
Last seen 3.9 years ago
USA/Oklahoma City

Thanks for your quick response.  My apologies for not being clear enough: I am attempting to use dbGetQuery on core_mps and featureSet to build a list of the meta_fsetid's for the antigenomic probes, so that I can filter for those in my samples.  When I use the line of code:

antig <- dbGetQuery(con, "select core_mps.meta_fsetid from core_mps inner join featureSet on core_mps.fsetid=featureSet.man_fsetid where featureSet.type = '2';")

I don't get a list of the meta_fsetid's, I get a list of the fsetid's which don't match anything in my sample set.  I can't seem to figure out how to do this query where I join core_mps to featureSet and output a list of the meta_fsetid's from it.  Any suggestions would be greatly appreciated.
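
Judging from the full join output printed in the original question, the names that look like the RMA row names (AFFX-BkGr-GC03_at and so on) sit in core_mps.transcript_cluster_id, while core_mps.meta_fsetid holds the 8-digit numbers for these rows. A minimal base-R sketch on toy copies of three rows (not the real tables; the type 1 row in featureSet is made up for illustration), assuming the goal is a vector of names to filter on:

```r
# Toy copies of rows printed in the question (not the full annotation tables)
core_mps <- data.frame(
  meta_fsetid           = c("18677993", "18677994", "TC01000001.hg.1"),
  transcript_cluster_id = c("AFFX-BkGr-GC03_at", "AFFX-BkGr-GC04_at", "TC01000001.hg.1"),
  fsetid                = c(18677993L, 18677994L, 19021059L),
  stringsAsFactors      = FALSE
)
featureSet <- data.frame(
  man_fsetid = c(18677993L, 18677994L, 19021059L),
  type       = c(2L, 2L, 1L)   # the type 1 row is an illustrative assumption
)

# Same join condition as the SQL: core_mps.fsetid = featureSet.man_fsetid
type_of_row <- featureSet$type[match(core_mps$fsetid, featureSet$man_fsetid)]

# Take the name column instead of meta_fsetid for the type 2 rows
antig_names <- core_mps$transcript_cluster_id[type_of_row %in% 2]
print(antig_names)
```

In SQL terms this would correspond to selecting core_mps.transcript_cluster_id rather than core_mps.meta_fsetid in the query above.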


To comment on a post, please use the ADD COMMENT button and type in the dialog box that pops up. Using the answer box is confusing for future readers, as you aren't actually answering a question.

Anyway, to answer your question, consider the following.

> con <- db(pd.hta.2.0)
> antiprbs <- dbGetQuery(con, "select * from featureSet where type='2';")
> coremps <- dbGetQuery(con, "select * from core_mps;") 
> any(antiprbs$fsetid %in% coremps$fsetid)
 [1] FALSE                                   

So none of the antigenomic probes actually get summarized into probesets at the transcript summary level. In fact, only main and NA type probesets (which are like, mysterious and stuff! If you search for them on NetAffx, no results are returned...) get summarized. Using getMainProbes from my affycoretools package:

> z <- getMainProbes("pd.hta.2.0") 
> table(z$type)
     1
 67516
> sum(is.na(z$type))
 [1] 3151 
> head(z)
   meta_fsetid type
 1    18670005   NA
 2    18670007   NA
 3    18670009   NA
 4    18670011   NA         
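
One detail that may be worth checking: the comparison above uses featureSet.fsetid, whereas the join in the question matches core_mps.fsetid against featureSet.man_fsetid. On toy copies of two rows from the question's join output (a sketch, not the real tables), the two keys give different answers:

```r
# Toy copies of two antigenomic rows as printed in the question's join output
antiprbs <- data.frame(fsetid     = c(5054L, 5055L),
                       man_fsetid = c(18677993L, 18677994L),
                       type       = c(2L, 2L))
coremps <- data.frame(meta_fsetid = c("18677993", "18677994"),
                      fsetid      = c(18677993L, 18677994L),
                      stringsAsFactors = FALSE)

# Key used in the check above: featureSet.fsetid
overlap_fsetid <- any(antiprbs$fsetid %in% coremps$fsetid)

# Key used in the question's join: featureSet.man_fsetid
overlap_man_fsetid <- any(antiprbs$man_fsetid %in% coremps$fsetid)

print(c(fsetid = overlap_fsetid, man_fsetid = overlap_man_fsetid))
```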

 

 

s.munster ▴ 40
@smunster-11308
Last seen 3.9 years ago
USA/Oklahoma City

James,

  Thanks for getting back to me.  I am still trying to resolve this.  After I perform oligo::rma to get my eset and then write my data to a text file using write.exprs(eset, file="eset.txt"), I get an output file with the following characteristics:

 

It has 70523 lines of data for probesets in total.

 

I have 1344 lines of data starting with identifiers like "2824546_st" (this is the first one)

There are 23 lines of data starting with identifiers like "AFFX-BkGr_GC03_at" (this is the first one)

     This one, because of the AFFX-BkGr and the fact that there are 23 of them makes me think that these are the antigenomic background control probesets.

There are 155 lines of data starting with identifiers like "ERCC-00002_st" (this is the first one)

     I was wondering if these were some of the ERCC probes?

There is a single line "gi312147440_st"

There are 28 lines of data with identifiers like "JUC01000985.hg.1" (this is the first one)

There are 1437 lines of data with identifiers like "PSR01001649.hg.1"

The remaining 67528 lines of data have identifiers like "TC01000002.hg.1"

 

The last group, 67,528 lines of data with identifiers like "TC01000002.hg.1", is very close in number to the expected number of core probesets from the code you had above.
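
A tally like the one above can also be produced directly in R rather than by inspecting the text file. The sketch below is base R on a hypothetical stand-in for rownames(eset), with one or two example IDs per group; the patterns are assumptions based on the examples in this post, not an official naming scheme:

```r
# Hypothetical stand-in for rownames(eset): example IDs from the groups listed above
ids <- c("2824546_st", "AFFX-BkGr_GC03_at", "ERCC-00002_st", "gi312147440_st",
         "JUC01000985.hg.1", "PSR01001649.hg.1",
         "TC01000001.hg.1", "TC01000002.hg.1")

# Assign each ID to a group by its leading pattern
id_group <- function(x) {
  grp <- rep("other", length(x))
  grp[grepl("^[0-9]+_st$", x)]   <- "numeric_st"
  grp[grepl("^AFFX-", x)]        <- "AFFX"
  grp[grepl("^ERCC-", x)]        <- "ERCC"
  grp[grepl("^gi[0-9]+_st$", x)] <- "gi"
  grp[grepl("^JUC", x)]          <- "JUC"
  grp[grepl("^PSR", x)]          <- "PSR"
  grp[grepl("^TC", x)]           <- "TC"
  grp
}

# Count how many IDs fall into each group
counts <- table(id_group(ids))
print(counts)
```

Any ID that matches none of the patterns lands in "other", which makes unexpected formats easy to spot.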

 

    I haven't been able to find any of these identifiers in the main probesets when I search like this:

           any("JUC01000985.hg.1" %in% coremps$meta_fsetid), where coremps is a variable containing all the rows of core_mps.  I have been able to find some of them (2824546_st, ERCC-00002_st, JUC01000985.hg.1, PSR01001649.hg.1, and TC01000002.hg.1) if I search using:

           any("JUC01000985.hg.1" %in% coremps$transcript_cluster_id)

So if the probesets I was able to find in transcript_cluster_id are all main probesets, that would mean that I have 70,492 main probesets.  This would certainly be more than I was expecting.

Can you tell me what the other lines of data are?  I am concerned that when I did oligo::rma, it did not remove all of the antigenomic and control probesets.  Am I doing something incorrectly? 

Thanks again for your assistance!

 

 

 

s.munster ▴ 40
@smunster-11308
Last seen 3.9 years ago
USA/Oklahoma City

Sorry, forgot to add in my R code:

Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> source("https://bioconductor.org/biocLite.R")
Bioconductor version 3.4 (BiocInstaller 1.24.0), ?biocLite for help
> biocLite("pd.hta.2.0")
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.4 (BiocInstaller 1.24.0), R 3.3.1 (2016-06-21).
Installing package(s) ‘pd.hta.2.0’
installing the source package ‘pd.hta.2.0’

trying URL 'https://bioconductor.org/packages/3.4/data/annotation/src/contrib/pd.hta.2.0_3.12.1.tar.gz'
Content type 'application/x-gzip' length 367042108 bytes (350.0 MB)
downloaded 350.0 MB

* installing *source* package 'pd.hta.2.0' ...
** R
** data
** inst
** preparing package for lazy loading
Warning: package 'Biostrings' was built under R version 3.3.2
Warning: package 'S4Vectors' was built under R version 3.3.2
Warning: package 'RSQLite' was built under R version 3.3.2
Warning: package 'DBI' was built under R version 3.3.2
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
Warning: package 'Biostrings' was built under R version 3.3.2
Warning: package 'S4Vectors' was built under R version 3.3.2
Warning: package 'RSQLite' was built under R version 3.3.2
Warning: package 'DBI' was built under R version 3.3.2
*** arch - x64
Warning: package 'Biostrings' was built under R version 3.3.2
Warning: package 'S4Vectors' was built under R version 3.3.2
Warning: package 'RSQLite' was built under R version 3.3.2
Warning: package 'DBI' was built under R version 3.3.2
* DONE (pd.hta.2.0)

The downloaded source packages are in
        ‘C:\Users\Williams\AppData\Local\Temp\Rtmpgr67Hn\downloaded_packages’
> library(pd.hta.2.0)
Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport,
    clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply,
    parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, cbind, colnames, do.call, duplicated,
    eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply,
    lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
    pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
    table, tapply, union, unique, unsplit, which, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:base’:

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: XVector
Loading required package: RSQLite
Loading required package: oligoClasses
Welcome to oligoClasses version 1.36.0
Loading required package: oligo
Loading required package: Biobase
Welcome to Bioconductor

 

s.munster ▴ 40
@smunster-11308
Last seen 3.9 years ago
USA/Oklahoma City

    Vignettes contain introductory material; view with 'browseVignettes()'. To
    cite Bioconductor, see 'citation("Biobase")', and for packages
    'citation("pkgname")'.

===========================================================================================
Welcome to oligo version 1.38.0
===========================================================================================
Loading required package: DBI
Warning messages:
1: package ‘Biostrings’ was built under R version 3.3.2 
2: package ‘S4Vectors’ was built under R version 3.3.2 
3: package ‘RSQLite’ was built under R version 3.3.2 
4: package ‘DBI’ was built under R version 3.3.2 
> library(oligo)
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] pd.hta.2.0_3.12.1    DBI_0.5-1            oligo_1.38.0         Biobase_2.34.0      
 [5] oligoClasses_1.36.0  RSQLite_1.1-2        Biostrings_2.42.1    XVector_0.14.0      
 [9] IRanges_2.8.1        S4Vectors_0.12.1     BiocGenerics_0.20.0  BiocInstaller_1.24.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.9                affxparser_1.46.0          splines_3.3.1             
 [4] GenomicRanges_1.26.2       zlibbioc_1.20.0            bit_1.1-12                
 [7] lattice_0.20-34            foreach_1.4.3              GenomeInfoDb_1.10.3       
[10] tools_3.3.1                SummarizedExperiment_1.4.0 grid_3.3.1                
[13] ff_2.2-13                  iterators_1.0.8            digest_0.6.12             
[16] preprocessCore_1.36.0      affyio_1.44.0              Matrix_1.2-8              
[19] codetools_0.2-15           bitops_1.0-6               RCurl_1.95-4.8            
[22] memoise_1.0.0             
> getwd()
[1] "C:/Users/Munster"
> setwd("C:/Users/Munster/Comparison Study/microarrays HTA2.0/")
> celFiles<-list.celfiles()
> rawData<-read.celfiles(celFiles)
Platform design info loaded.
Reading in : 1000 Blood 500 Brain A_Affy_(HTA-2_0).CEL
Reading in : 1000 Blood 500 Brain A_Nugen_(HTA-2_0).CEL
Reading in : 1000 Blood 500 Brain B_Affy_(HTA-2_0).CEL
Reading in : 1000 Blood 500 Brain B_Nugen_(HTA-2_0).CEL
Reading in : 1500 Blood A_Affy_(HTA-2_0).CEL
Reading in : 1500 Blood A_Nugen_(HTA-2_0).CEL
Reading in : 1500 Blood B_Affy_(HTA-2_0)_2.CEL
Reading in : 1500 Blood B_Nugen_(HTA-2_0).CEL
Reading in : 1500 Brain A_Affy_(HTA-2_0).CEL
Reading in : 1500 Brain A_Nugen_(HTA-2_0).CEL
Reading in : 1500 Brain B_Affy_(HTA-2_0)_2.CEL
Reading in : 1500 Brain B_Nugen_(HTA-2_0).CEL
Reading in : 500 Blood 1000 Brain A_Affy_(HTA-2_0).CEL
Reading in : 500 Blood 1000 Brain A_Nugen_(HTA-2_0).CEL
Reading in : 500 Blood 1000 Brain B_Affy_(HTA-2_0).CEL
Reading in : 500 Blood 1000 Brain B_Nugen_(HTA-2_0).CEL
>

s.munster ▴ 40
@smunster-11308
Last seen 3.9 years ago
USA/Oklahoma City

> rawData
HTAFeatureSet (storageMode: lockedEnvironment)
assayData: 6892960 features, 16 samples 
  element names: exprs 
protocolData
  rowNames: 1000 Blood 500 Brain A_Affy_(HTA-2_0).CEL 1000 Blood 500 Brain
    A_Nugen_(HTA-2_0).CEL ... 500 Blood 1000 Brain B_Nugen_(HTA-2_0).CEL (16
    total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: 1000 Blood 500 Brain A_Affy_(HTA-2_0).CEL 1000 Blood 500 Brain
    A_Nugen_(HTA-2_0).CEL ... 500 Blood 1000 Brain B_Nugen_(HTA-2_0).CEL (16
    total)
  varLabels: index
  varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.hta.2.0 
> eset<-oligo::rma(rawData, target='core')
Background correcting
Normalizing
Calculating Expression
> eset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 70523 features, 16 samples 
  element names: exprs 
protocolData
  rowNames: 1000 Blood 500 Brain A_Affy_(HTA-2_0).CEL 1000 Blood 500 Brain
    A_Nugen_(HTA-2_0).CEL ... 500 Blood 1000 Brain B_Nugen_(HTA-2_0).CEL (16
    total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: 1000 Blood 500 Brain A_Affy_(HTA-2_0).CEL 1000 Blood 500 Brain
    A_Nugen_(HTA-2_0).CEL ... 500 Blood 1000 Brain B_Nugen_(HTA-2_0).CEL (16
    total)
  varLabels: index
  varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.hta.2.0 
> write.exprs(eset, file="eset17Feb.txt")
> con=db(pd.hta.2.0)
> dbGetQuery(con, "select * from type_dict;")
  type                                              type_id
1    1                                                 main
2    2                       Antigenomic background control
3    3                             control->affx->bac_spike
4    4                           control->affx->polya_spike
5    5 ERCC (External RNA Controls Consortium) step control
6    6      Exonic normalization control (Positive Control)
7    7    Intronic normalization control (Negative Control)
8    8                                     Positive Control
> table(dbGetQuery(con, "select type from featureSet;"))

     1      2      3      4      5      6      7      8 
911590     23     18     39    247   1626   3432   3648 
>

s.munster ▴ 40
@smunster-11308
Last seen 3.9 years ago
USA/Oklahoma City

> antiprbs <- dbGetQuery(con, "select * from featureSet where type='2';")
> coremps <- dbGetQuery(con, "select * from core_mps;")
> any(antiprbs$fsetid %in% coremps$fsetid)
[1] FALSE
> z <- getMainProbes("pd.hta.2.0")
Error: could not find function "getMainProbes"
> library(affycoretools)


Warning message:
package ‘affycoretools’ was built under R version 3.3.2 
> z <- getMainProbes("pd.hta.2.0")
> table(z$type)

    1 
67516 
> head(antiprbs)
  fsetid man_fsetid strand start stop transcript_cluster_id exon_id crosshyb_type level
1   5054   18677993     NA    NA   NA                    NA      NA            NA    NA
2   5055   18677994     NA    NA   NA                    NA      NA            NA    NA
3   5056   18677995     NA    NA   NA                    NA      NA            NA    NA
4   5057   18677996     NA    NA   NA                    NA      NA            NA    NA
5   5058   18677997     NA    NA   NA                    NA      NA            NA    NA
6   5059   18677998     NA    NA   NA                    NA      NA            NA    NA
  junction_start_edge junction_stop_edge junction_sequence has_cds chrom type
1                  NA                 NA              <NA>      NA    NA    2
2                  NA                 NA              <NA>      NA    NA    2
3                  NA                 NA              <NA>      NA    NA    2
4                  NA                 NA              <NA>      NA    NA    2
5                  NA                 NA              <NA>      NA    NA    2
6                  NA                 NA              <NA>      NA    NA    2
> head(coremps)
      meta_fsetid transcript_cluster_id   fsetid
1 TC01000001.hg.1       TC01000001.hg.1 19021059
2 TC01000001.hg.1       TC01000001.hg.1 19021060
3 TC01000001.hg.1       TC01000001.hg.1 19021061
4 TC01000001.hg.1       TC01000001.hg.1 19021062
5 TC01000001.hg.1       TC01000001.hg.1 19021063
6 TC01000002.hg.1       TC01000002.hg.1 19021064
 


OK, so don't do that. If you are over the limit of what the support site will allow you, then take the hint and cut it down to something reasonable rather than posting five separate blasts of text. Pretty much nobody has the time to read all that.

Second, I told you last time to please use the ADD COMMENT rather than Add your answer box. This now looks like there are 6-7 answers to a question, only one of which actually is an answer!

Additionally, I will point you back to what I told you six months ago:

"So none of the antigenomic probes actually get summarized into probesets at the transcript summary level."

I don't know how to make that any clearer! You are summarizing at the transcript level, and none of the antigenomic probes exist at that level, so you will not ever be able to find them, no matter what you do. If you want those probesets, summarize at the probeset level!


My apologies for the mistakes in how I posted this.  I understand your statement that "none of the antigenomic probes actually get summarized into probesets at the transcript summary level".  I am confused because there are probesets in my transcript-level summary (after running rma) that look very much like antigenomic probesets.  They don't show up in pd.hta.2.0, but they are in my eset after running oligo::rma.

 

For example, I have 23 rows with probeset names such as "AFFX-BkGr_GC03_at" in my eset.

 

What are these if not antigenomic probesets?


You are right - there is a problem with the current pd.hta.2.0 package. The files that are available from Affy don't follow the convention that they used for all the other files of this type, and I thought I had pdInfoBuilder fixed to handle this, but evidently I was mistaken.

So thank you for being persistent - I have finally got this fixed, and have sent the updated version of the package to Val Obenchain, who has the authoritah to push to the download repo. It's version 3.12.2, so when that appears, you can get it. Alternatively, send me your email and I will give you a Dropbox link you can use to get it.

I also made some changes to getMainProbes in affycoretools to do a better job of handling all the different arrays that have (and don't have, more to the point) non-main probes. I pushed those changes to the release repo yesterday, so the updated version should appear in a day or two. You are looking for affycoretools version 1.46.5.

> z <- getMainProbes("pd.hta.2.0")
> table(z$type)

    1     2     3     4     5     6     7
67516    23     4     4   155   698   646
> dbGetQuery(db(pd.hta.2.0), "select * from type_dict;")
  type                                              type_id
1    1                                                 main
2    2                       Antigenomic background control
3    3                             control->affx->bac_spike
4    4                           control->affx->polya_spike
5    5 ERCC (External RNA Controls Consortium) step control
6    6      Exonic normalization control (Positive Control)
7    7    Intronic normalization control (Negative Control)
8    8                                     Positive Control
> z[z$type %in% 2,]
      transcript_cluster_id type
67673     AFFX-BkGr-GC03_at    2
67674     AFFX-BkGr-GC04_at    2
67675     AFFX-BkGr-GC05_at    2
67676     AFFX-BkGr-GC06_at    2
67677     AFFX-BkGr-GC07_at    2
67678     AFFX-BkGr-GC08_at    2
67679     AFFX-BkGr-GC09_at    2
67680     AFFX-BkGr-GC10_at    2
67681     AFFX-BkGr-GC11_at    2
67682     AFFX-BkGr-GC12_at    2
67683     AFFX-BkGr-GC13_at    2
67684     AFFX-BkGr-GC14_at    2
67685     AFFX-BkGr-GC15_at    2
67686     AFFX-BkGr-GC16_at    2
67687     AFFX-BkGr-GC17_at    2
67688     AFFX-BkGr-GC18_at    2
67689     AFFX-BkGr-GC19_at    2
67690     AFFX-BkGr-GC20_at    2
67691     AFFX-BkGr-GC21_at    2
67692     AFFX-BkGr-GC22_at    2
67693     AFFX-BkGr-GC23_at    2
67694     AFFX-BkGr-GC24_at    2
67695     AFFX-BkGr-GC25_at    2
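
With the type column populated, dropping the controls from the summarized data becomes a simple row subset. A base-R sketch on a toy matrix: the matrix stands in for exprs(eset) and its values are made up, and the toy z only mimics the shape of the getMainProbes output above:

```r
# Toy expression matrix standing in for exprs(eset); values are made up
expr <- matrix(c(5.1, 6.2, 2.0, 2.1,
                 5.3, 6.0, 1.9, 2.2),
               nrow = 4,
               dimnames = list(c("TC01000001.hg.1", "TC01000002.hg.1",
                                 "AFFX-BkGr-GC03_at", "AFFX-BkGr-GC04_at"),
                               c("sample_A", "sample_B")))

# Toy annotation in the same shape as the getMainProbes() output above
z <- data.frame(transcript_cluster_id = rownames(expr),
                type = c(1L, 1L, 2L, 2L),
                stringsAsFactors = FALSE)

# IDs flagged as antigenomic background controls (type 2)
antig_ids <- z$transcript_cluster_id[z$type %in% 2]

# Keep every row that is not an antigenomic control
expr_filtered <- expr[!(rownames(expr) %in% antig_ids), , drop = FALSE]
print(rownames(expr_filtered))
```

Using %in% rather than == for the type comparison keeps rows with an NA type from turning into NA indices.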

THANKS!!!! I will be looking for the new updates!


The new update worked beautifully! THANKS!!!!
