Question

affycoretools package, HTA 2.0 annotation query

1

Entering edit mode

a.karanikolou ▴ 10

@akaranikolou-20591

Last seen 4.8 years ago

Hi all, I have recently run the HTA 2.0 annotation using the two packages affycoretools and hta20transcriptcluster.db as outlined in the post https://support.bioconductor.org/p/89308/. However, I have run into an error after running the rma() as follows:

> exprs_mainProbes<-getMainProbes(exprsgs, pd.hta.2.0)
Error in switch(level, core = dbGetQuery(con, paste("select distinct core_mps.transcript_cluster_id, type from featureSet inner join core_mps",  : EXPR must be a length 1 vector

Could you shed light into this problem? Thanks in advance.

annotation affy • 1.4k views

ADD COMMENT • link updated 5.0 years ago by James W. MacDonald 65k • written 5.0 years ago by a.karanikolou ▴ 10

score 1 · Answer 1 · 2019-04-23

1

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 21 hours ago

United States

You need to become familiar with the help system, and to check there to ensure you are doing things correctly. From ?getMainProbes, we have:

Usage:

     getMainProbes(input, level = "core")

Arguments:

   input: Either a character string (e.g., "pd.hugene.1.0.st.v1") or a
          FeatureSet object.

   level: The summarization level used when calling rma.

And nowhere in the help page does it say that the second argument should be a pdInfoPackage.

ADD COMMENT • link 5.0 years ago James W. MacDonald 65k

0

Entering edit mode

Thank you for your help.

ADD REPLY • link 5.0 years ago a.karanikolou ▴ 10

0

Entering edit mode

Is it mandatory to include "level = "core"" in the "getMainProbes(input, level = "core")?

In the "help" menu, the argument "level" refers to "the summarization level used when calling rma". I'm not sure I understand it well on how to apply this argument in getMainProbes.

ADD REPLY • link 5.0 years ago a.karanikolou ▴ 10

0

Entering edit mode

If a function has a default, then by definition it isn't necessary to supply a value for that argument. But perhaps you are actually asking what value you should use? Again, the answer to your question can be found by reading the help page, this time for rma:

Description:

     Robust Multichip Average preprocessing methodology. This strategy
     allows background subtraction, quantile normalization and
     summarization (via median-polish).

Usage:

     ## S4 method for signature 'ExonFeatureSet'
     rma(object, background=TRUE, normalize=TRUE, subset=NULL, target="core")
       ## S4 method for signature 'HTAFeatureSet'
     rma(object, background=TRUE, normalize=TRUE, subset=NULL, target="core")

If you didn't specify a target argument for rma, then you by definition used the default, which matches the default for getMainProbes.

ADD REPLY • link 5.0 years ago James W. MacDonald 65k

0

Entering edit mode

Dear James, thank you for your previous reply. We have used the function getMainProbes using the default as in rma, however instead of the 67,528 main probes that should have been retained, the result was 67,480 main probes. What would cause this difference? Thanks in advance.

ADD REPLY • link 4.9 years ago a.karanikolou ▴ 10

0

Entering edit mode

I think your math is off.

> eset <- rma(read.celfiles(filenames = dir(".","gz$")))
> annotation(eset)
[1] "pd.hta.2.0"
> dim(eset)
Features  Samples 
   70523        2 
> con <- db(pd.hta.2.0)
> z <- dbGetQuery(con, "select distinct core_mps.transcript_cluster_id, type from featureSet inner join core_mps using(fsetid);")
> table(z$type)

    1     2     3     4     5     6     7 
67516    23     4     4   155   698   646 
> dim(getMainProbes(eset))
Features  Samples 
   67480        2

So there are 67480 main probesets out of the expected 67516, or 36 missing probesets. Where might those probesets be?

> reps <- z[duplicated(z[,1]) & !is.na(z$type),1]
> table(z[z[,1] %in% reps,2], useNA = "ifany")

   1 <NA> 
  36   36

So there are 36 probesets that are both 'main' and 'NA'. If we look closer, it's sort of weird

> zlst <- lapply(seq(along = reps), function(x) dbGetQuery(con, paste0("select fsetid, core_mps.transcript_cluster_id, type from featureSet inner join core_mps using(fsetid) where core_mps.transcript_cluster_id='", reps[x], "';")))
> zlst[[2]]
    fsetid transcript_cluster_id type
1 19069316       TC01004261.hg.1   NA
2 19069317       TC01004261.hg.1    1
3 19069318       TC01004261.hg.1    1
4 19069319       TC01004261.hg.1    1
5 19069320       TC01004261.hg.1    1
> zlst[[3]]
    fsetid transcript_cluster_id type
1 19072754       TC01005067.hg.1   NA
2 19072755       TC01005067.hg.1   NA
3 19072756       TC01005067.hg.1   NA
4 19072757       TC01005067.hg.1    1
> zlst[[4]]
     fsetid transcript_cluster_id type
1  19112883       TC02002789.hg.1   NA
2  19112884       TC02002789.hg.1   NA
3  19112885       TC02002789.hg.1   NA
4  19112886       TC02002789.hg.1   NA
5  19112887       TC02002789.hg.1   NA
6  19112888       TC02002789.hg.1   NA
7  19112889       TC02002789.hg.1   NA
8  19112890       TC02002789.hg.1    1
9  19112891       TC02002789.hg.1   NA
10 19112892       TC02002789.hg.1   NA
11 19112893       TC02002789.hg.1   NA
12 19112894       TC02002789.hg.1   NA
13 19112895       TC02002789.hg.1    1
14 19112896       TC02002789.hg.1    1
15 19112897       TC02002789.hg.1    1
16 19112898       TC02002789.hg.1    1
17 19112899       TC02002789.hg.1    1

So for whatever reason, Affy is calling at least one probe per probeset an NA type, so the probesets are getting called both main and 'NA' and that is causing them to be excluded. Given that this only affects 36 out of 67528 probesets (and our core won't use these arrays anyway) it has never been worth it to me to find out any more about it than that.

ADD REPLY • link 4.9 years ago James W. MacDonald 65k

0

Entering edit mode

Thank you a lot for this. The response from Affy was that and I quote "67528 Probe sets listed: 44699 Coding and 22829 NonCoding". The categories used as controls are sum up to 2,995. The is no clear understanding why Affy is calling at least one probe per probeset an NA type. But we will take your response and their response for consideration before moving forward with the analysis. Thank you once again.

ADD REPLY • link 4.9 years ago a.karanikolou ▴ 10

0

Entering edit mode

And the other reason I never pursued it any further is that ThermoFisherSigmaAldrichAffyABI is like, not going to help figure it out anyway.

ADD REPLY • link 4.9 years ago James W. MacDonald 65k