Search
Question: Oligo package: paCalls questions
0
3.4 years ago by
whyuen0
Hong Kong
whyuen0 wrote:

Hi all,

I am trying to make P/A calls at probeset level for Mouse gene ST 2.1 array using Oligo.

Here is what I did for probeset level P/A calls:

celFiles <- list.celfiles()
genePS <- rma(affyRaw, target = "probeset")
genePS_dabg <- paCalls(affyRaw,"PSDABG")
genePS_dabg_call = exprs(genePS_dabg)

But when I ran "genePS_dabg_call = exprs(genePS_dabg)", an error came out. It was written as

Error in (function (classes, fdef, mtable)  :
unable to find an inherited method for function ‘exprs’ for signature ‘"matrix"’

What's the problem here?

Vincent

modified 3.4 years ago by cstrato3.9k • written 3.4 years ago by whyuen0
1
3.4 years ago by
United States
James W. MacDonald48k wrote:

You shouldn't be trying to extract the P/A calls using exprs() in the first place! From ?paCalls:

Value:

A matrix (of dimension dim(PM) if method="DABG" or "MAS5"; of
dimension length(unique(probeNames(object))) x ncol(object) if
method="PSDABG") with p-values for P/A Calls.

You get back a matrix, not an ExpressionSet:

> genePS <- paCalls(dat, "PSDABG")
Computing DABG calls... OK
> class(genePS)
[1] "matrix"
381T-DMSO-1.CEL 381T-DMSO-3.CEL 381T-DMSO-5.CEL 381T-TSA-1.CEL
16859315    0.0041327997    0.0037922286    0.0036595846   0.0025651771
16928104    0.0001270983    0.0001044760    0.0003044936   0.0005063974
16666750    0.0262915539    0.0158242186    0.0087426994   0.0165175588

I get your idea, James. That's why I'm now artificially assigning P/M/A to genePS_dabg by writing

genePS_dabg_call = ifelse(genePS_dabg<=0.06,ifelse(genePS_dabg<0.04,"P","M"),"A").

The values used for assigning P/M/A are based on mas5calls function.

But I'm not sure if paCalls uses the same algorithm to calculate the same p values as mas5calls does. If not, I doubt if I could use the same values to assign P/M/A.

Actually, I also want to try xps package to get the P/M/A results from my ST array but I could not install it in my computer :(

The implementation in oligo is different from mas5calls() in affy. But so is (by definition) the implementation in xps. This is because you are using PM-only arrays.

With mas5calls(), each PM probe had a corresponding MM probe, so it was a simple exercise of saying 'do the PM probes appear in aggregate to have a higher binding than MM probes?'. But the Gene ST arrays don't have MM probes, so what you have to do is find some background probes that are similar in some respect to the PM probes, and then make the test.

Both oligo and xps are implementing Affymetrix's DABG algorithm (detected above background), which 'bins' the foreground and background probes, based on GC content, and then tests to see if the foreground probes in a given bin are arguably binding at a higher level than the background probes. In oligo, DABG does this at the probe level, and PSDABG collapses to the probeset (exon, or PSR) level. There is nothing in oligo that collapses to the transcript level, but xps will compute calls at probe, probeset, and transcript level.

Hi James,

How could I know which probes are background probes?

Concerning DABG and PSDABG algorithms, you said DABG analyzes data in probe level while PSDABG analyzes data in probeset level. Actually, I'm not very clear about the differences about probe, probeset, and transcript levels. Could you clarify that?

I know a transcript cluster of a gene contains a number of probesets. And one probeset contains a number of probes. So when I try to analyze a gene at the transcript level, does it mean I'm grouping all the signals from all the probesets of that gene into one value and that value represents the expression level of that particular gene?

Does one probeset of a gene cover only one particular exon of that gene?

Thank you very much!

Vincent

1

The background probes are the 'antigenomic' set. I don't have pd.mogene.2.1.st installed, so I will show you using the 2.0 version.

> library(pd.mogene.2.0.st)
> con <- db(pd.mogene.2.0.st)
> dbGetQuery(con, "select * from type_dict;")
type                    type_id
1     1                       main
2     2              control->affx
3     3              control->chip
4     4  control->bgp->antigenomic
5     5      control->bgp->genomic
6     6             normgene->exon
7     7           normgene->intron
8     8   rescue->FLmRNA->unmapped
9     9   control->affx->bac_spike
10   10             oligo_spike_in
11   11            r1_bac_spike_at
12   12 control->affx->polya_spike
13   13        control->affx->ercc
14   14  control->affx->ercc->step
> z <- dbGetQuery(con, "select distinct fsetid, type from featureSet inner join core_mps using(fsetid) where type='4';")
> z
fsetid type
1  17550635    4
2  17550637    4
3  17550639    4
4  17550641    4
5  17550643    4
6  17550645    4
7  17550647    4
8  17550649    4
9  17550651    4
10 17550653    4
11 17550655    4
12 17550657    4
13 17550659    4
14 17550661    4
15 17550663    4
16 17550665    4
17 17550667    4
18 17550669    4
19 17550671    4
20 17550673    4
21 17550675    4
22 17550677    4
23 17550679    4

The control->bgp->antigenomic probes are those that are used for testing background, as they have varying amounts of GC content, and are not expected to be found in the mouse genome.

And you are exactly right about probes, probesets and transcripts. Probes are individual 25-mers. Probesets are intended to measure 'probe set regions' (PSRs), which are part of an exon (or the whole thing, if it is short). Transcripts are where all the PSRs for a given gene are piled into one probeset and summarized to give an aggregate measure of the expression for that gene.

I would only recommend using the transcript (core) level for the Gene ST arrays. Here is the distribution of probes/probeset at the PSR level:

> table(table(dbGetQuery(con, "select fsetid from pmfeature;")[,1]))[1:8]

1      2      3      4      5      6      7      8
103488  81629  32250  18986   8361   4010   4058   3374

Or more to the point

> z <- table(table(dbGetQuery(con, "select fsetid from pmfeature;")[,1]))
> round(cumsum(z)/sum(z), 2)[1:12]
1    2    3    4    5    6    7    8    9   10   11   12
0.38 0.69 0.81 0.88 0.91 0.92 0.94 0.95 0.96 0.97 0.97 0.98 

So 81% of the PSR level probesets have < 4 probes (on the Exon ST arrays, the PSRs have mostly 4 probes/probeset, so we don't really expect more than that.). I'm personally not enthused with RMA on 4 probes/probeset, let alone 3 or fewer.

Hi James,

I used the transcript level for analysis before but later when I checked different probesets representing the same gene, I often found significant differences between them in terms of the signals of the probesets. And I'm afraid if I simply analyze the Gene ST array in transcript level, some significant changes of some probesets would be masked. Moreover, when I tried to perform qPCR to validate microarray data, it's difficult to decide a specific region for amplification if I use the transcript level. But I could specifically design primers to amplify a particular region covered by a probeset. That's why I now switch to probeset level for analysis.

Concerning the background probes, do you mean I could make use of these probes to run mas5calls function? I guess R won't allow me to do so because it would recognize my cel files are from ST arrays instead of expression array. Isn't it?

Vincent

Dear James,

I have read through the Q&A you were discussing with Juan (paCalls oligo package) and I find you guys are discussing the use of "psdabg" and "dabg" in paCalls().

Am I right to say that rma (target = "core") must be used with "psdabg" in paCalls() while rma(target = "probeset") must be used with "dabg"?

If yes, why do they have to be in these combinations? "target = probeset" summarizes data in probeset level but dabg calculates p values in probe level. They are in two different levels. How can I use the p values using dabg to judge if that particular probeset is present or not in a particular RNA sample?

Moreover, you mentioned
"Personally I wouldn't use rma(target = "probeset") for the Gene ST arrays, because tons of the probesets only have one probe at that summarization level."
What does it mean? What is the problem of having only one probe at that summarization level?

Also, you suggested these command lines, which are
N <- 5 ## or whatever
ind <- apply(dagbPS, 1, function(x) sum(x < 0.05) > N)

What exactly is N? For example, in my own microarray analysis, I would like to compare the transcriptomes of 4 types of mice. One type is wild-type control mice. The second type is wild-type mice drug treatment. The third type is disease control mice. And the last type is disease mice with the same drug treatment. And each type has 3 samples. So there are a total of 12 samples.
So what value should I put in N?

Also, I don't quite understand this command line "ind <- apply(dagbPS, 1, function(x) sum(x < 0.05) > N)".
What do you mean by function(x) sum (x<0.05) > N?

I am sorry to ask so many questions. Thank you very much in advance.

Best regards,

Vincent

You would use PSDABG with rma(target = "probeset"). There is no paCalls() for the transcript level summarization in oligo. If you want that, you must use xps.

The reason there are multiple probes/probeset is because Affy realized early on that a 25-mer isn't really long enough give reliable results. By using multiple 25-mers, the idea was that in aggregate you would get a more accurate estimate of gene expression than what you would get from just one 25-mer. The original Affy arrays had 16 probes/probeset, which they reduced to 11 for the last set of 3'-biased arrays, and down to 4 for the random primer arrays (of which the Exon arrays were the first).

The RMA algorithm is intended to estimate (and then ignore) the probe-specific binding. But if you only have a small number of probes/probeset it is difficult to correctly account for the probe-specific binding, so the expression values are likely to still be highly confounded with technical attributes of the individual probes. If you want to do things at the exon level, get an Exon ST array.

In general when you are filtering out probesets, the goal is to do so in an unbiased manner. One way to do that is to say that the expression of a given gene has to be greater than a particular cutoff for at least N of the arrays (where N is the number of samples in the smallest group you have). You aren't saying that a particular set of arrays have to be expressed (which would introduce bias), but you are requiring that at least some of your samples are expressing the gene.

As for the code, this is where you earn your keep. If you plan on using R, the best way to do so is to look at other people's code and figure out what it is doing. I have already told you what it is supposed to be doing, so now you can figure out how.

0
3.4 years ago by
Brazil/Campinas/UNICAMP
Benilton Carvalho4.2k wrote:
as a temporary fix, try manually loading the Biobase package library(Biobase) and repeating the genePS_dabg_call = exprs(genePS_dabg) call. b On Mon, Jun 1, 2015 at 11:29 PM whyuen [bioc] <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User whyuen <https: support.bioconductor.org="" u="" 7987=""/> wrote Question: > Oligo package: paCalls questions > <https: support.bioconductor.org="" p="" 68256=""/>: > > Hi all, > > I am trying to make P/A calls at probeset level for Mouse gene ST 2.1 > array using Oligo. > > Here is what I did for probeset level P/A calls: > > celFiles <- list.celfiles() > affyRaw <- read.celfiles(celFiles) > genePS <- rma(affyRaw, target = "probeset") > genePS_dabg <- paCalls(affyRaw,"PSDABG") > genePS_dabg_call = exprs(genePS_dabg) > > > But when I ran "genePS_dabg_call = exprs(genePS_dabg)", an error came out. > It was written as > > Error in (function (classes, fdef, mtable) : > unable to find an inherited method for function ‘exprs’ for signature > ‘"matrix"’ > > What's the problem here? > > Vincent > > > ------------------------------ > > You may reply via email or visit Oligo package: paCalls questions >
0
3.4 years ago by
whyuen0
Hong Kong
whyuen0 wrote:

The same error message appeared after I did what you recommended

>library(Biobase) and repeating the genePS_dabg_call = exprs(genePS_dabg)

Vincent

thanks for the report... I'll work on reproducing and fixing the issue. Can you please share the results of: sessionInfo() thanks On Mon, Jun 1, 2015 at 11:58 PM whyuen [bioc] <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User whyuen <https: support.bioconductor.org="" u="" 7987=""/> wrote Answer: > Oligo package: paCalls questions > <https: support.bioconductor.org="" p="" 68256="" #68258="">: > > Hi Benilton Carvalho <https: support.bioconductor.org="" u="" 1375=""/>, > > The same error message appeared after I did what you recommended > > >library(Biobase) and repeating the genePS_dabg_call = exprs(genePS_dabg) > > Vincent > ------------------------------ > > You may reply via email or visit > A: Oligo package: paCalls questions >

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[3] LC_MONETARY=Chinese (Traditional)_Hong Kong SAR.950 LC_NUMERIC=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] hgu95acdf_2.16.0    affy_1.46.0         Biobase_2.28.0      BiocGenerics_0.14.0

loaded via a namespace (and not attached):
[1] zlibbioc_1.14.0       IRanges_2.2.2         BiocInstaller_1.18.2  DBI_0.3.1             tools_3.2.0           affyio_1.36.0
[7] AnnotationDbi_1.30.1  RSQLite_1.0.0         preprocessCore_1.30.0 S4Vectors_0.6.0       GenomeInfoDb_1.4.0    stats4_3.2.0

Thank you very much!!

0
3.4 years ago by
whyuen0
Hong Kong
whyuen0 wrote:

I am now trying this line:

genePS_dabg_call = ifelse(genePS_dabg<=0.06,ifelse(genePS_dabg<0.04,"P","M"),"A")

after

genePS_dabg <- paCalls(affyRaw,"PSDABG")

The values used for filtering are based on mas5calls function.

0
3.4 years ago by
cstrato3.9k
Austria
cstrato3.9k wrote:

Dear Vincent,

I would help you to install xps, but I do not know which computer you are using,
since you did not supply your sessionInfo().

However, here are few links from recent questions:

Best regards,
Christian

Dear Christian,

It would be nice if you could help me install xps. I have no idea how to install ROOT =.=

Here is my sessioninfo():

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[3] LC_MONETARY=Chinese (Traditional)_Hong Kong SAR.950 LC_NUMERIC=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] hgu95acdf_2.16.0    affy_1.46.0         Biobase_2.28.0      BiocGenerics_0.14.0

loaded via a namespace (and not attached):
[1] zlibbioc_1.14.0       IRanges_2.2.2         BiocInstaller_1.18.2  DBI_0.3.1             tools_3.2.0           affyio_1.36.0
[7] AnnotationDbi_1.30.1  RSQLite_1.0.0         preprocessCore_1.30.0 S4Vectors_0.6.0       GenomeInfoDb_1.4.0    stats4_3.2.0

0
3.4 years ago by
cstrato3.9k
Austria
cstrato3.9k wrote:

Dear Vincent,

Before you install xps you need to install the ROOT framework first. For Windows 7 you need to install a version of ROOT compiled with MinGW.

Best regards,
Christian

Dear Christian,

I could only read the README file in Rstudio, and it is written like this:

README File for ROOT Binary Distribution
----------------------------------------

Contents
========

Directory root/ :

README      - directory containing important information
LICENSE     - usage terms and conditions
configure   - build configuration script
bin         - directory containing executables
include     - directory containing the ROOT header files
lib         - directory containing the ROOT libraries (in shared library format)
etc         - directory containing default resource and mime type files
etc/proof   - directory containing settings for the PROOF system
macros      - directory containing system macros
icons       - directory containing xpm icons
test        - some ROOT test programs
tutorials   - example macros that can be executed by the bin/root module

Environment variables
=====================

To set the needed environment variables, PATH and LD_LIBRARY_PATH, use
the following convenience script. For the sh shell family do:

. <pathname>/root/bin/thisroot.sh

and for the csh shell family do:

source <pathname>/root/bin/thisroot.csh

where <pathname> is the location where you unpacked the ROOT distribution.

Running interactive ROOT and the tutorial macros
================================================

To run the example macros, go to the root/tutorials directory and do, e.g.:

$root root [0] .x benchmarks.C -- this will run all tutorials and will benchmark your machine. -- see http://root.cern.ch/drupal/content/benchmarking for the normalization and comparisons. root [1] .x demos.C -- Click on any button you like to run the corresponding tutorial. -- Move the objects on the canvas around using the mouse. root [2] .q Compiling and running ROOT test programs ======================================== To run some ROOT test programs, go to the root/test directory and do (after having selected the machine dependent flags in the Makefile):$ make
$./Event$ ./hsimple
$./minexam$ ./tcollex
\$ ./tstring
etc...

I wonder what I should do then. I am sorry that I do not have much knowledge on computer programming.

Vincent

0
3.4 years ago by
cstrato3.9k
Austria
cstrato3.9k wrote:

Dear Vincent,

I do not have Rstudio, but you should be able to open the link with any browser,
such as Firefox or Internet Explorer. This file is a text file only.

http://bioconductor.org/packages/3.1/bioc/html/xps.html

Furthermore, package xps does contain the README file, too, which you should be
able to open with any text editor.

Best regards,
Christian

Dear Christian,

I unpacked ROOT from the link above you gave me. Then I put the root file in the defacult directory C:\root. I also changed the environmental variable ROOTSYS:.

And then I tried to install xps in R using these lines.

source("http://bioconductor.org/biocLite.R")
biocLite("xps")

But still I can't install xps. What happens?

Vincent

0
3.4 years ago by
cstrato3.9k
Austria
cstrato3.9k wrote:

Dear Vincent,

At the moment please use the Windows Console (or Rgui.exe) to install xps.

First, open the Console and type 'root' to see if you can open ROOT.
Within ROOT type '.q' to quit ROOT.
(If this does not work, your environment variables are not set correctly.)

If this is ok then try 'biocLite("xps")' again. If this does not work then I would
R CMD INSTALL xps_1.28.0.zip

Regarding Rstudio, if you can run xps from the Console but not from Rstudio, maybe
there is a problem with Rstudio not recognising the environment variables.

Best regards,
Christian

Dear Christian,

Can xps package be installed in 64-bit windows? I tried to install the binary package but it complained about the bit-version.

Vincent

0
3.4 years ago by
cstrato3.9k
Austria
cstrato3.9k wrote:

Dear Vincent,

Did you open the Console and type 'root' to see if you can open ROOT?

The problem is that generally ROOT for Windows in 32 bit only. For this reason the xps
binary is compiled for 32 bit.

I am no Windows expert so I do not know if 32 bit versions can be run on 64 bit machines.

However, on 64 bit Windows 'R'  is 64 bit, too. So I assume that you need to run 'R'  in
32 bit mode in order to install the 32 bit version of xps.

Best regards,
Christian