oligo and GenericPDInfo
1
0
Entering edit mode
Aedin Culhane ▴ 510
@aedin-culhane-1526
Last seen 24 months ago
United States

Hi

I have an old array that is not annotated within Bioc annotation.  I tried affy and oligo package, but am running into issues applying a non-standard (invariantset) normalization.   We have good bioloigcal reasons to believe that total transcriptional counts should vary in samples, so rma so I want to compare rma to other normalization approaches.

affy::expresso with invariant falls over.

expresso(abatch,bgcorrect.method="none",normalize.method="invariantset",pmcorrect.method="pmonly",summary.method="liwong")

background correction: none
normalization: invariantset
PM/MM correction : pmonly
expression values: liwong
background correcting...done.
normalizing...
Error in smooth.spline(ref[i.set], data[i.set]) :
missing or infinite values in inputs are not allowed

I know the affy package is older, so I tried oligo. I built my own custom pd package. Both my own package and the brainarray MBNI annotation fail. Both of these have class GenericPDInfo, and the vignette and man pages do not address this case in much detail.

After, searching this message board, I use  target="mps1" (not included in the package documentation).  Both oligo::rma or oligo:: fitProbeLevelModel, return a result, however I wish to compare rma or non-standard approaches.  However oligo::summarize will not accept a "GenericFeatureSet" only a matrix or ff_matrix.

eSetTest   <- oligo::summarize(exprs(eNormTest),method="medianpolish", verbose=TRUE)

Error in (function (classes, fdef, mtable)  :
unable to find an inherited method for function 'summarize' for signature '"GenericFeatureSet"'

The following work;

eNormTest <- oligo::normalize(obatch_bgcorrected, method="invariantset", target="mps1")
eSetTest   <- oligo::summarize(exprs(eNormTest),method="medianpolish", verbose=TRUE)

However of course exprs(eNorm) ignore the probe_level information.  Therefore I need to provide probe with the probe level info in the pd.package (MBNI annotation).

I can start reading code of the pd package, Annotationdbi, but am spending more time that I wanted trying to do something very simple.   Maybe I am missing a really easy work around and am wasting time needlessly.

Thanks

Aedin

0
Entering edit mode
@james-w-macdonald-5106
Last seen 1 hour ago
United States

I don't think there is an easy way to do this (where 'easy' means 'a function exists that you can just use'), particularly since rma has everything hard-coded. But it wouldn't be that hard to roll your own. Something like this?

myrma<- function (object, background = TRUE, normalize = TRUE,
subset = NULL, target = "mps1")
{
pmi <- pmindex(object, subset = subset, target = "mps0")
pm0 <- exprs(object)[pmi, , drop = FALSE]
if (background)
pm0 <- backgroundCorrect(pm0, method = "rma")
if (normalize)
pm0 <- normalize(pm0, method = "invariantset")
rownames(pm0) <- as.character(pmi)
targetInfo <- oligo:::getMPSInfo(get(annotation(object)), substr(target,
4, 4), "fid", type = "pm")
exprs <- basicRMA(pm0[as.character(targetInfo$fid), , drop = FALSE], pnVec = targetInfo$man_fsetid, normalize = FALSE,
background = FALSE, verbose = TRUE)
colnames(exprs) <- sampleNames(object)
out <- new("ExpressionSet")
slot(out, "assayData") <- assayDataNew(exprs = exprs)
slot(out, "phenoData") <- phenoData(object)
slot(out, "featureData") <- basicAnnotatedDataFrame(exprs,
byrow = TRUE)
slot(out, "protocolData") <- protocolData(object)
slot(out, "annotation") <- slot(object, "annotation")
if (!validObject(out))
stop("Resulting object is invalid.")
return(out)
}

0
Entering edit mode
Thanks a million Jim oligo:::getMPSInfo is what I looking for. Since its not exported in the NAMESPACE, there is no help. Where can I find some additional info or help, for example a description of mps1, mps0 etc, as I see you use both below. The 'fields' parameter ("fid", etc). Aedin On 7/17/17 13:54, James W. MacDonald [bioc] wrote: > Activity on a post you are following on support.bioconductor.org > <https: support.bioconductor.org=""> > > User James W. MacDonald <https: support.bioconductor.org="" u="" 5106=""/> > wrote Answer: oligo and GenericPDInfo > <https: support.bioconductor.org="" p="" 98173="" #98175="">: > > I don't think there is an easy way to do this (where 'easy' means 'a > function exists that you can just use'), particularly since rma has > everything hard-coded. But it wouldn't be that hard to roll your own. > Something like this? > > myrma<- function (object, background = TRUE, normalize = TRUE, > subset = NULL, target = "mps1") > { > pmi <- pmindex(object, subset = subset, target = "mps0") > pm0 <- exprs(object)[pmi, , drop = FALSE] > if (background) > pm0 <- backgroundCorrect(pm0, method = "rma") > if (normalize) > pm0 <- normalize(pm0, method = "invariantset") > rownames(pm0) <- as.character(pmi) > targetInfo <- oligo:::getMPSInfo(get(annotation(object)), substr(target, > 4, 4), "fid", type = "pm") > exprs <- basicRMA(pm0[as.character(targetInfo$fid), , > drop = FALSE], pnVec = targetInfo$man_fsetid, normalize = FALSE, > background = FALSE, verbose = TRUE) > colnames(exprs) <- sampleNames(object) > out <- new("ExpressionSet") > slot(out, "assayData") <- assayDataNew(exprs = exprs) > slot(out, "phenoData") <- phenoData(object) > slot(out, "featureData") <- basicAnnotatedDataFrame(exprs, > byrow = TRUE) > slot(out, "protocolData") <- protocolData(object) > slot(out, "annotation") <- slot(object, "annotation") > if (!validObject(out)) > stop("Resulting object is invalid.") > return(out) > } > > ------------------------------------------------------------------------ > > Post tags: affy, oligoclasses, oligo package, pdinfobuilder, > annotationdbi > > You may reply via email or visit > A: oligo and GenericPDInfo > -- Aedin Culhane --------------- Department of Biostatistics and Computational Biology,Dana-Farber Cancer Institute Department of Biostatistics, Harvard TH Chan School of Public Health phone: 617 640 8107 email: aedin@jimmy.harvard.edu
0
Entering edit mode

There isn't at present much help. The GenericArray infrastructure is supposed to be a generalization of the code to allow things like the MBNI arrays (and any random future Array that Affy might foist upon us) to automatically be accommodated.

The old school way of doing things was to take all of Affy's library files and then generate tables in the DB that were named according to the data from Affy. As an example:

> dbListTables(db(pd.moex.1.0.st.v1))
[1] "chrom_dict"   "core_mps"     "extended_mps" "featureSet"   "full_mps"
[6] "level_dict"   "mmfeature"    "pmfeature"    "table_info"   "type_dict" 

Where all those xxx_mps tables map the probes to probesets, depending on the summarization level you wanted to use. But this requires each array type to have its own methods, so we have a profusion of methods for e.g., rma:

> showMethods(rma)
Function: rma (package oligo)
object="ExonFeatureSet"
object="ExpressionFeatureSet"
object="GeneFeatureSet"
object="GenericFeatureSet"
object="HTAFeatureSet"
object="SnpCnvFeatureSet"

The tables in a GenericPDInfoPackage are generic:

> dbListTables(con)
[1] "featureSet1" "mmfeature"   "mps1mm"      "mps1pm"      "pmfeature"
[6] "table_info"

and if there are multiple summarization levels, you can simply add more featureSetK/mpsKpm/mpsKmm triplets to define the probe -> probeset mappings. So this stops the profusion of array types and methods, but it adds the inevitable question of 'so how are the mps1 and mps2 targets different for this array?', which will obviously have a profusion of its own, perhaps even worse?

The function getMPSInfo is intended to do a join between the featureSetK and mpsKpm tables (in your case featureSet1 and mps1pm) and return a data.frame that maps the fid (or what I conventionally call the probe ID) to the fsetid and man_fsetid, which are the probeset level IDs, in order to do the summarization.

And the mps0 target is sort of a joke, I imagine. Or maybe it's just a placeholder that has no inherent meaning as yet. The target argument isn't used, so you could put whatever you want and still get the right thing.

> showMethods(pmindex, class = "DBPDInfo", includeDefs=T)
Function: pmindex (package oligo)
object="DBPDInfo"
function (object, ...)
{
.local <- function (object, subset = NULL, target = NULL)
{
if (!is.null(subset))
warning("Subset not implemented (yet). Returning everything.")
tmp <- dbGetQuery(db(object), "select fid from pmfeature")[[1]]
sort(tmp)
}
.local(object, ...)
}

0
Entering edit mode
Thanks I am very glad, I didn't spend the next few hours trying to work this out by myself... I don't think I would have got too far. Thanks for being so responsive. If oligo and brainarrays folks could get together to agree some standards, these great resources would be more accessible to the rest of us ;-)).. I'd happily buy the coffee/beer;-) Aedin On 7/17/17 14:58, James W. MacDonald [bioc] wrote: > Activity on a post you are following on support.bioconductor.org > <https: support.bioconductor.org=""> > > User James W. MacDonald <https: support.bioconductor.org="" u="" 5106=""/> > wrote Comment: oligo and GenericPDInfo > <https: support.bioconductor.org="" p="" 98173="" #98178="">: > > There isn't at present much help. The GenericArray infrastructure is > supposed to be a generalization of the code to allow things like the > MBNI arrays (and any random future Array that Affy might foist upon > us) to automatically be accommodated. > > The old school way of doing things was to take all of Affy's library > files and then generate tables in the DB that were named according to > the data from Affy. As an example: > > > dbListTables(db(pd.moex.1.0.st.v1)) > [1] "chrom_dict" "core_mps" "extended_mps" "featureSet" "full_mps" > [6] "level_dict" "mmfeature" "pmfeature" "table_info" "type_dict" > > Where all those xxx_mps tables map the probes to probesets, depending > on the summarization level you wanted to use. But this requires each > array type to have its own methods, so we have a profusion of methods > for e.g., rma: > > > showMethods(rma) > Function: rma (package oligo) > object="ExonFeatureSet" > object="ExpressionFeatureSet" > object="GeneFeatureSet" > object="GenericFeatureSet" > object="HTAFeatureSet" > object="SnpCnvFeatureSet" > > The tables in a GenericPDInfoPackage are generic: > > > dbListTables(con) > [1] "featureSet1" "mmfeature" "mps1mm" "mps1pm" "pmfeature" > [6] "table_info" > > and if there are multiple summarization levels, you can simply add > more featureSetK/mpsKpm/mpsKmm triplets to define the probe -> > probeset mappings. So this stops the profusion of array types and > methods, but it adds the inevitable question of 'so how are the mps1 > and mps2 targets different for this array?', which will obviously have > a profusion of its own, perhaps even worse? > > The function getMPSInfo is intended to do a join between the > featureSetK and mpsKpm tables (in your case featureSet1 and mps1pm) > and return a data.frame that maps the fid (or what I conventionally > call the probe ID) to the fsetid and man_fsetid, which are the > probeset level IDs, in order to do the summarization. > > And the mps0 target is sort of a joke, I imagine. Or maybe it's just a > placeholder that has no inherent meaning as yet. The target argument > isn't used, so you could put whatever you want and still get the right > thing. > > > showMethods(pmindex, class = "DBPDInfo", includeDefs=T) > Function: pmindex (package oligo) > object="DBPDInfo" > function (object, ...) > { > .local <- function (object, subset = NULL, target = NULL) > { > if (!is.null(subset)) > warning("Subset not implemented (yet). Returning everything.") > tmp <- dbGetQuery(db(object), "select fid from pmfeature")[[1]] > sort(tmp) > } > .local(object, ...) > } > > ------------------------------------------------------------------------ > > Post tags: affy, oligoclasses, oligo package, pdinfobuilder, > annotationdbi > > You may reply via email or visit > C: oligo and GenericPDInfo > -- Aedin Culhane --------------- Department of Biostatistics and Computational Biology,Dana-Farber Cancer Institute Department of Biostatistics, Harvard TH Chan School of Public Health phone: 617 640 8107 email: aedin@jimmy.harvard.edu
0
Entering edit mode
Hi Jim To make your function "happy", I need to specify the oligo version of the functions. backgroundCorrect was calling a limma function, normalize was calling a igraph function. It couldn't find basicAnnotatedDataFrame, as its not exported. When I modified the following lines, it worked pm0 <- oligo::backgroundCorrect(pm0, method = "rma") pm0 <- oligo::normalize(pm0, method = "invariantset") slot(out, "featureData") <- oligo:::basicAnnotatedDataFrame(exprs, byrow = TRUE) However I have a second question, if I set background = FALSE in the function call, it seems to still performs background correction which seems to be performed as part of oligo:::normalize. What background correction is performed by oligo::normalize and can I modify it? Thanks Aedin On 7/17/17 13:54, James W. MacDonald [bioc] wrote: > Activity on a post you are following on support.bioconductor.org > <https: support.bioconductor.org=""> > > User James W. MacDonald <https: support.bioconductor.org="" u="" 5106=""/> > wrote Answer: oligo and GenericPDInfo > <https: support.bioconductor.org="" p="" 98173="" #98175="">: > > I don't think there is an easy way to do this (where 'easy' means 'a > function exists that you can just use'), particularly since rma has > everything hard-coded. But it wouldn't be that hard to roll your own. > Something like this? > > myrma<- function (object, background = TRUE, normalize = TRUE, > subset = NULL, target = "mps1") > { > pmi <- pmindex(object, subset = subset, target = "mps0") > pm0 <- exprs(object)[pmi, , drop = FALSE] > if (background) > pm0 <- backgroundCorrect(pm0, method = "rma") > if (normalize) > pm0 <- normalize(pm0, method = "invariantset") > rownames(pm0) <- as.character(pmi) > targetInfo <- oligo:::getMPSInfo(get(annotation(object)), substr(target, > 4, 4), "fid", type = "pm") > exprs <- basicRMA(pm0[as.character(targetInfo$fid), , > drop = FALSE], pnVec = targetInfo$man_fsetid, normalize = FALSE, > background = FALSE, verbose = TRUE) > colnames(exprs) <- sampleNames(object) > out <- new("ExpressionSet") > slot(out, "assayData") <- assayDataNew(exprs = exprs) > slot(out, "phenoData") <- phenoData(object) > slot(out, "featureData") <- basicAnnotatedDataFrame(exprs, > byrow = TRUE) > slot(out, "protocolData") <- protocolData(object) > slot(out, "annotation") <- slot(object, "annotation") > if (!validObject(out)) > stop("Resulting object is invalid.") > return(out) > } > > ------------------------------------------------------------------------ > > Post tags: affy, oligoclasses, oligo package, pdinfobuilder, > annotationdbi > > You may reply via email or visit > A: oligo and GenericPDInfo > -- Aedin Culhane --------------- Department of Biostatistics and Computational Biology,Dana-Farber Cancer Institute Department of Biostatistics, Harvard TH Chan School of Public Health phone: 617 640 8107 email: aedin@jimmy.harvard.edu
0
Entering edit mode

I don't understand your question. Are you saying that these lines:

    if (normalize)
pm0 <- normalize(pm0, method = "invariantset")


run regardless of normalize being TRUE or FALSE? That doesn't seem likely - a bug of that magnitude (e.g., R totally ignoring that if statement) would have been caught by now.

Or do I misunderstand your question?

0
Entering edit mode
Hi Jim Ignore the last question. I was incorrect Aedin On 7/18/17 11:15, James W. MacDonald [bioc] wrote: > Activity on a post you are following on support.bioconductor.org > <https: support.bioconductor.org=""> > > User James W. MacDonald <https: support.bioconductor.org="" u="" 5106=""/> > wrote Comment: oligo and GenericPDInfo > <https: support.bioconductor.org="" p="" 98173="" #98206="">: > > I don't understand your question. Are you saying that these lines: > > if (normalize) > pm0 <- normalize(pm0, method = "invariantset") > > > run regardless of normalize being TRUE or FALSE? That doesn't seem > likely - a bug of that magnitude (e.g., R totally ignoring that if > statement) would have been caught by now. > > Or do I misunderstand your question? > > ------------------------------------------------------------------------ > > Post tags: affy, oligoclasses, oligo package, pdinfobuilder, > annotationdbi > > You may reply via email or visit > C: oligo and GenericPDInfo > -- Aedin Culhane --------------- Department of Biostatistics and Computational Biology,Dana-Farber Cancer Institute Department of Biostatistics, Harvard TH Chan School of Public Health phone: 617 640 8107 email: aedin@jimmy.harvard.edu