Question

Oligo package: paCalls questions

1

Entering edit mode

whyuen ▴ 20

@whyuen-7987

Last seen 9.4 years ago

Hong Kong

Hi all,

I am trying to make P/A calls at probeset level for Mouse gene ST 2.1 array using Oligo.

Here is what I did for probeset level P/A calls:

celFiles <- list.celfiles()
affyRaw <- read.celfiles(celFiles)
genePS <- rma(affyRaw, target = "probeset")
genePS_dabg <- paCalls(affyRaw,"PSDABG")
genePS_dabg_call = exprs(genePS_dabg)

But when I ran "genePS_dabg_call = exprs(genePS_dabg)", an error came out. It was written as

Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘exprs’ for signature ‘"matrix"’

What's the problem here?

Vincent

microarray oligo • 4.4k views

ADD COMMENT • link updated 9.6 years ago by cstrato ★ 3.9k • written 9.6 years ago by whyuen ▴ 20

score 2 · Answer 1 · 2015-06-02

2

Entering edit mode

James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 2 days ago

United States

You shouldn't be trying to extract the P/A calls using exprs() in the first place! From ?paCalls:

Value:

     A matrix (of dimension dim(PM) if method="DABG" or "MAS5"; of
     dimension length(unique(probeNames(object))) x ncol(object) if
     method="PSDABG") with p-values for P/A Calls.

You get back a matrix, not an ExpressionSet:

> genePS <- paCalls(dat, "PSDABG")
Computing DABG calls... OK
> class(genePS)
[1] "matrix"
> head(genePS)
         381T-DMSO-1.CEL 381T-DMSO-3.CEL 381T-DMSO-5.CEL 381T-TSA-1.CEL
16859315    0.0041327997    0.0037922286    0.0036595846   0.0025651771
16928104    0.0001270983    0.0001044760    0.0003044936   0.0005063974
16666750    0.0262915539    0.0158242186    0.0087426994   0.0165175588

ADD COMMENT • link 9.6 years ago James W. MacDonald 67k

0

Entering edit mode

I get your idea, James. That's why I'm now artificially assigning P/M/A to genePS_dabg by writing

genePS_dabg_call = ifelse(genePS_dabg<=0.06,ifelse(genePS_dabg<0.04,"P","M"),"A").

The values used for assigning P/M/A are based on mas5calls function.

But I'm not sure if paCalls uses the same algorithm to calculate the same p values as mas5calls does. If not, I doubt if I could use the same values to assign P/M/A.

Actually, I also want to try xps package to get the P/M/A results from my ST array but I could not install it in my computer :(

ADD REPLY • link 9.6 years ago whyuen ▴ 20

0

Entering edit mode

The implementation in oligo is different from mas5calls() in affy. But so is (by definition) the implementation in xps. This is because you are using PM-only arrays.

With mas5calls(), each PM probe had a corresponding MM probe, so it was a simple exercise of saying 'do the PM probes appear in aggregate to have a higher binding than MM probes?'. But the Gene ST arrays don't have MM probes, so what you have to do is find some background probes that are similar in some respect to the PM probes, and then make the test.

Both oligo and xps are implementing Affymetrix's DABG algorithm (detected above background), which 'bins' the foreground and background probes, based on GC content, and then tests to see if the foreground probes in a given bin are arguably binding at a higher level than the background probes. In oligo, DABG does this at the probe level, and PSDABG collapses to the probeset (exon, or PSR) level. There is nothing in oligo that collapses to the transcript level, but xps will compute calls at probe, probeset, and transcript level.

ADD REPLY • link 9.6 years ago James W. MacDonald 67k

0

Entering edit mode

Hi James,

How could I know which probes are background probes?

Concerning DABG and PSDABG algorithms, you said DABG analyzes data in probe level while PSDABG analyzes data in probeset level. Actually, I'm not very clear about the differences about probe, probeset, and transcript levels. Could you clarify that?

I know a transcript cluster of a gene contains a number of probesets. And one probeset contains a number of probes. So when I try to analyze a gene at the transcript level, does it mean I'm grouping all the signals from all the probesets of that gene into one value and that value represents the expression level of that particular gene?

Does one probeset of a gene cover only one particular exon of that gene?

Thank you very much!

Vincent

ADD REPLY • link 9.6 years ago whyuen ▴ 20

2

Entering edit mode

The background probes are the 'antigenomic' set. I don't have pd.mogene.2.1.st installed, so I will show you using the 2.0 version.

> library(pd.mogene.2.0.st)
> con <- db(pd.mogene.2.0.st)
> dbGetQuery(con, "select * from type_dict;")
   type                    type_id
1     1                       main
2     2              control->affx
3     3              control->chip
4     4  control->bgp->antigenomic
5     5      control->bgp->genomic
6     6             normgene->exon
7     7           normgene->intron
8     8   rescue->FLmRNA->unmapped
9     9   control->affx->bac_spike
10   10             oligo_spike_in
11   11            r1_bac_spike_at
12   12 control->affx->polya_spike
13   13        control->affx->ercc
14   14  control->affx->ercc->step
> z <- dbGetQuery(con, "select distinct fsetid, type from featureSet inner join core_mps using(fsetid) where type='4';")
> z
     fsetid type
1  17550635    4
2  17550637    4
3  17550639    4
4  17550641    4
5  17550643    4
6  17550645    4
7  17550647    4
8  17550649    4
9  17550651    4
10 17550653    4
11 17550655    4
12 17550657    4
13 17550659    4
14 17550661    4
15 17550663    4
16 17550665    4
17 17550667    4
18 17550669    4
19 17550671    4
20 17550673    4
21 17550675    4
22 17550677    4
23 17550679    4

The control->bgp->antigenomic probes are those that are used for testing background, as they have varying amounts of GC content, and are not expected to be found in the mouse genome.

And you are exactly right about probes, probesets and transcripts. Probes are individual 25-mers. Probesets are intended to measure 'probe set regions' (PSRs), which are part of an exon (or the whole thing, if it is short). Transcripts are where all the PSRs for a given gene are piled into one probeset and summarized to give an aggregate measure of the expression for that gene.

I would only recommend using the transcript (core) level for the Gene ST arrays. Here is the distribution of probes/probeset at the PSR level:

> table(table(dbGetQuery(con, "select fsetid from pmfeature;")[,1]))[1:8]

     1      2      3      4      5      6      7      8
103488  81629  32250  18986   8361   4010   4058   3374

Or more to the point

> z <- table(table(dbGetQuery(con, "select fsetid from pmfeature;")[,1]))
> round(cumsum(z)/sum(z), 2)[1:12]
   1    2    3    4    5    6    7    8    9   10   11   12
0.38 0.69 0.81 0.88 0.91 0.92 0.94 0.95 0.96 0.97 0.97 0.98

So 81% of the PSR level probesets have < 4 probes (on the Exon ST arrays, the PSRs have mostly 4 probes/probeset, so we don't really expect more than that.). I'm personally not enthused with RMA on 4 probes/probeset, let alone 3 or fewer.

ADD REPLY • link 9.6 years ago James W. MacDonald 67k

1

Entering edit mode

Dear James,

I have read through the Q&A you were discussing with Juan (paCalls oligo package) and I find you guys are discussing the use of "psdabg" and "dabg" in paCalls().

Am I right to say that rma (target = "core") must be used with "psdabg" in paCalls() while rma(target = "probeset") must be used with "dabg"?

If yes, why do they have to be in these combinations? "target = probeset" summarizes data in probeset level but dabg calculates p values in probe level. They are in two different levels. How can I use the p values using dabg to judge if that particular probeset is present or not in a particular RNA sample?

Moreover, you mentioned
"Personally I wouldn't use rma(target = "probeset") for the Gene ST arrays, because tons of the probesets only have one probe at that summarization level."
What does it mean? What is the problem of having only one probe at that summarization level?

Also, you suggested these command lines, which are
N <- 5 ## or whatever
ind <- apply(dagbPS, 1, function(x) sum(x < 0.05) > N)

What exactly is N? For example, in my own microarray analysis, I would like to compare the transcriptomes of 4 types of mice. One type is wild-type control mice. The second type is wild-type mice drug treatment. The third type is disease control mice. And the last type is disease mice with the same drug treatment. And each type has 3 samples. So there are a total of 12 samples.
So what value should I put in N?

Also, I don't quite understand this command line "ind <- apply(dagbPS, 1, function(x) sum(x < 0.05) > N)".
What do you mean by function(x) sum (x<0.05) > N?

I am sorry to ask so many questions. Thank you very much in advance.

Best regards,

Vincent

ADD REPLY • link 9.6 years ago whyuen ▴ 20

0

Entering edit mode

You would use PSDABG with rma(target = "probeset"). There is no paCalls() for the transcript level summarization in oligo. If you want that, you must use xps.

The reason there are multiple probes/probeset is because Affy realized early on that a 25-mer isn't really long enough give reliable results. By using multiple 25-mers, the idea was that in aggregate you would get a more accurate estimate of gene expression than what you would get from just one 25-mer. The original Affy arrays had 16 probes/probeset, which they reduced to 11 for the last set of 3'-biased arrays, and down to 4 for the random primer arrays (of which the Exon arrays were the first).

The RMA algorithm is intended to estimate (and then ignore) the probe-specific binding. But if you only have a small number of probes/probeset it is difficult to correctly account for the probe-specific binding, so the expression values are likely to still be highly confounded with technical attributes of the individual probes. If you want to do things at the exon level, get an Exon ST array.

In general when you are filtering out probesets, the goal is to do so in an unbiased manner. One way to do that is to say that the expression of a given gene has to be greater than a particular cutoff for at least N of the arrays (where N is the number of samples in the smallest group you have). You aren't saying that a particular set of arrays have to be expressed (which would introduce bias), but you are requiring that at least some of your samples are expressing the gene.

As for the code, this is where you earn your keep. If you plan on using R, the best way to do so is to look at other people's code and figure out what it is doing. I have already told you what it is supposed to be doing, so now you can figure out how.

ADD REPLY • link 9.6 years ago James W. MacDonald 67k

0

Entering edit mode

Hi James,

I used the transcript level for analysis before but later when I checked different probesets representing the same gene, I often found significant differences between them in terms of the signals of the probesets. And I'm afraid if I simply analyze the Gene ST array in transcript level, some significant changes of some probesets would be masked. Moreover, when I tried to perform qPCR to validate microarray data, it's difficult to decide a specific region for amplification if I use the transcript level. But I could specifically design primers to amplify a particular region covered by a probeset. That's why I now switch to probeset level for analysis.

Concerning the background probes, do you mean I could make use of these probes to run mas5calls function? I guess R won't allow me to do so because it would recognize my cel files are from ST arrays instead of expression array. Isn't it?

Vincent

ADD REPLY • link 9.6 years ago whyuen ▴ 20

score 0 · Answer 2 · 2015-06-01

as a temporary fix, try manually loading the Biobase package library(Biobase) and repeating the genePS_dabg_call = exprs(genePS_dabg) call. b On Mon, Jun 1, 2015 at 11:29 PM whyuen [bioc] <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User whyuen <https: support.bioconductor.org="" u="" 7987=""/> wrote Question: > Oligo package: paCalls questions > <https: support.bioconductor.org="" p="" 68256=""/>: > > Hi all, > > I am trying to make P/A calls at probeset level for Mouse gene ST 2.1 > array using Oligo. > > Here is what I did for probeset level P/A calls: > > celFiles <- list.celfiles() > affyRaw <- read.celfiles(celFiles) > genePS <- rma(affyRaw, target = "probeset") > genePS_dabg <- paCalls(affyRaw,"PSDABG") > genePS_dabg_call = exprs(genePS_dabg) > > > But when I ran "genePS_dabg_call = exprs(genePS_dabg)", an error came out. > It was written as > > Error in (function (classes, fdef, mtable) : > unable to find an inherited method for function ‘exprs’ for signature > ‘"matrix"’ > > What's the problem here? > > Vincent > > > ------------------------------ > > You may reply via email or visit Oligo package: paCalls questions >

score 0 · Answer 3 · 2015-06-01

0

Entering edit mode

whyuen ▴ 20

@whyuen-7987

Last seen 9.4 years ago

Hong Kong

Hi Benilton Carvalho,

The same error message appeared after I did what you recommended

>library(Biobase) and repeating the genePS_dabg_call = exprs(genePS_dabg)

Vincent

ADD COMMENT • link 9.6 years ago whyuen ▴ 20

0

Entering edit mode

thanks for the report... I'll work on reproducing and fixing the issue. Can you please share the results of: sessionInfo() thanks On Mon, Jun 1, 2015 at 11:58 PM whyuen [bioc] <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User whyuen <https: support.bioconductor.org="" u="" 7987=""/> wrote Answer: > Oligo package: paCalls questions > <https: support.bioconductor.org="" p="" 68256="" #68258="">: > > Hi Benilton Carvalho <https: support.bioconductor.org="" u="" 1375=""/>, > > The same error message appeared after I did what you recommended > > >library(Biobase) and repeating the genePS_dabg_call = exprs(genePS_dabg) > > Vincent > ------------------------------ > > You may reply via email or visit > A: Oligo package: paCalls questions >

ADD REPLY • link 9.6 years ago Benilton Carvalho ★ 4.3k

0

Entering edit mode

Hi Benilton Carvalho,

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=Chinese (Traditional)_Hong Kong SAR.950 LC_CTYPE=Chinese (Traditional)_Hong Kong SAR.950
[3] LC_MONETARY=Chinese (Traditional)_Hong Kong SAR.950 LC_NUMERIC=C
[5] LC_TIME=Chinese (Traditional)_Hong Kong SAR.950

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] hgu95acdf_2.16.0 affy_1.46.0 Biobase_2.28.0 BiocGenerics_0.14.0

loaded via a namespace (and not attached):
[1] zlibbioc_1.14.0 IRanges_2.2.2 BiocInstaller_1.18.2 DBI_0.3.1 tools_3.2.0 affyio_1.36.0
[7] AnnotationDbi_1.30.1 RSQLite_1.0.0 preprocessCore_1.30.0 S4Vectors_0.6.0 GenomeInfoDb_1.4.0 stats4_3.2.0

Thank you very much!!

ADD REPLY • link 9.6 years ago whyuen ▴ 20

score 0 · Answer 4 · 2015-06-02

0

Entering edit mode

whyuen ▴ 20

@whyuen-7987

Last seen 9.4 years ago

Hong Kong

I am now trying this line:

genePS_dabg_call = ifelse(genePS_dabg<=0.06,ifelse(genePS_dabg<0.04,"P","M"),"A")

after

genePS_dabg <- paCalls(affyRaw,"PSDABG")

The values used for filtering are based on mas5calls function.

ADD COMMENT • link 9.6 years ago whyuen ▴ 20

score 0 · Answer 5 · 2015-06-03

0

Entering edit mode

cstrato ★ 3.9k

@cstrato-908

Last seen 6.2 years ago

Austria

Dear Vincent,

I would help you to install xps, but I do not know which computer you are using,
since you did not supply your sessionInfo().

However, here are few links from recent questions:

Problem with xps package and root installation
Problem installing XPS

Best regards,
Christian

ADD COMMENT • link 9.6 years ago cstrato ★ 3.9k

0

Entering edit mode

Dear Christian,

It would be nice if you could help me install xps. I have no idea how to install ROOT =.=

Here is my sessioninfo():

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=Chinese (Traditional)_Hong Kong SAR.950 LC_CTYPE=Chinese (Traditional)_Hong Kong SAR.950
[3] LC_MONETARY=Chinese (Traditional)_Hong Kong SAR.950 LC_NUMERIC=C
[5] LC_TIME=Chinese (Traditional)_Hong Kong SAR.950

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] hgu95acdf_2.16.0 affy_1.46.0 Biobase_2.28.0 BiocGenerics_0.14.0

loaded via a namespace (and not attached):
[1] zlibbioc_1.14.0 IRanges_2.2.2 BiocInstaller_1.18.2 DBI_0.3.1 tools_3.2.0 affyio_1.36.0
[7] AnnotationDbi_1.30.1 RSQLite_1.0.0 preprocessCore_1.30.0 S4Vectors_0.6.0 GenomeInfoDb_1.4.0 stats4_3.2.0

ADD REPLY • link 9.6 years ago whyuen ▴ 20

score 0 · Answer 6 · 2015-06-04

0

Entering edit mode

cstrato ★ 3.9k

@cstrato-908

Last seen 6.2 years ago

Austria

Dear Vincent,

Before you install xps you need to install the ROOT framework first. For Windows 7 you need to install a version of ROOT compiled with MinGW.
You can download this file from:
https://docs.google.com/file/d/0B0qZ3XBoK1ubVzl5MHBEaWxQdW8/edit

Please read the README file how to install ROOT and xps for Windows, see:
http://www.bioconductor.org/packages/release/bioc/readmes/xps/README

Best regards,
Christian

ADD COMMENT • link 9.6 years ago cstrato ★ 3.9k

0

Entering edit mode

Dear Christian,

I could only read the README file in Rstudio, and it is written like this:

README File for ROOT Binary Distribution
----------------------------------------

Contents
========

Directory root/ :

README - directory containing important information
LICENSE - usage terms and conditions
configure - build configuration script
bin - directory containing executables
include - directory containing the ROOT header files
lib - directory containing the ROOT libraries (in shared library format)
etc - directory containing default resource and mime type files
etc/proof - directory containing settings for the PROOF system
macros - directory containing system macros
icons - directory containing xpm icons
test - some ROOT test programs
tutorials - example macros that can be executed by the bin/root module

Environment variables
=====================

To set the needed environment variables, PATH and LD_LIBRARY_PATH, use
the following convenience script. For the sh shell family do:

. <pathname>/root/bin/thisroot.sh

and for the csh shell family do:

source <pathname>/root/bin/thisroot.csh

where <pathname> is the location where you unpacked the ROOT distribution.

Typically add these lines to your .profile or .login files.

Running interactive ROOT and the tutorial macros
================================================

To run the example macros, go to the root/tutorials directory and do, e.g.:

$ root
root [0] .x benchmarks.C
-- this will run all tutorials and will benchmark your machine.
-- see http://root.cern.ch/drupal/content/benchmarking for the
normalization and comparisons.
root [1] .x demos.C
-- Click on any button you like to run the corresponding tutorial.
-- Move the objects on the canvas around using the mouse.
root [2] .q

Compiling and running ROOT test programs
========================================

To run some ROOT test programs, go to the root/test directory and do
(after having selected the machine dependent flags in the Makefile):

$ make
$ ./Event
$ ./hsimple
$ ./minexam
$ ./tcollex
$ ./tstring
etc...

I wonder what I should do then. I am sorry that I do not have much knowledge on computer programming.

Vincent

ADD REPLY • link 9.6 years ago whyuen ▴ 20

score 0 · Answer 7 · 2015-06-04

0

Entering edit mode

cstrato ★ 3.9k

@cstrato-908

Last seen 6.2 years ago

Austria

Dear Vincent,

I do not have Rstudio, but you should be able to open the link with any browser,
such as Firefox or Internet Explorer. This file is a text file only.

Alternatively, open this link:
http://bioconductor.org/packages/3.1/bioc/html/xps.html
and scroll down to the documentation, where you should be able to read or download
the README file.

Furthermore, package xps does contain the README file, too, which you should be
able to open with any text editor.

Best regards,
Christian

ADD COMMENT • link 9.6 years ago cstrato ★ 3.9k

0

Entering edit mode

Dear Christian,

I unpacked ROOT from the link above you gave me. Then I put the root file in the defacult directory C:\root. I also changed the environmental variable ROOTSYS:.

And then I tried to install xps in R using these lines.

source("http://bioconductor.org/biocLite.R")
biocLite("xps")

But still I can't install xps. What happens?

Vincent

ADD REPLY • link 9.6 years ago whyuen ▴ 20

score 0 · Answer 8 · 2015-06-05

0

Entering edit mode

cstrato ★ 3.9k

@cstrato-908

Last seen 6.2 years ago

Austria

Dear Vincent,

At the moment please use the Windows Console (or Rgui.exe) to install xps.

First, open the Console and type 'root' to see if you can open ROOT.
Within ROOT type '.q' to quit ROOT.
(If this does not work, your environment variables are not set correctly.)

If this is ok then try 'biocLite("xps")' again. If this does not work then I would
suggest to download from Bioconductor 'xps_1.28.0.zip' and install xps as follows:
R CMD INSTALL xps_1.28.0.zip

Regarding Rstudio, if you can run xps from the Console but not from Rstudio, maybe
there is a problem with Rstudio not recognising the environment variables.

Best regards,
Christian

ADD COMMENT • link 9.6 years ago cstrato ★ 3.9k

0

Entering edit mode

Dear Christian,

Can xps package be installed in 64-bit windows? I tried to install the binary package but it complained about the bit-version.

Vincent

ADD REPLY • link 9.6 years ago whyuen ▴ 20

score 0 · Answer 9 · 2015-06-07

Dear Vincent,

Did you open the Console and type 'root' to see if you can open ROOT?

The problem is that generally ROOT for Windows in 32 bit only. For this reason the xps
binary is compiled for 32 bit.

I am no Windows expert so I do not know if 32 bit versions can be run on 64 bit machines.

However, on 64 bit Windows 'R' is 64 bit, too. So I assume that you need to run 'R' in
32 bit mode in order to install the 32 bit version of xps.

Best regards,
Christian