Hi,
I've used xps to quantiles normalize (at probe level) some Affy Exon
Array data. I now have a "root" file called "exon_quantiles.root", but
if I try to load it the same was I'd load my raw data (using the
scheme file I created for Affy exon arrays) I get the error below? I
can load my raw data just fine though. Any ideas? Do I perhaps need a
different "root scheme" file for this normalized data? Unfortunately,
I haven't been able to find an answer.
> scheme.huex10stv2 <- root.scheme("huex10stv2.root")
> data_qn <- root.data(scheme.huex10stv2, "exon_quantiles.root")
Error in if (chipname != treetitle) { : argument is of length zero
Hope someone can help,
Paul.
> sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-redhat-linux-gnu
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
--
Paul Geeleher (PhD Student)
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
--
www.bioinformaticstutorials.com
Dear Paul,
Please have a look at the help ?root.expr.
If I understand you correctly, you did only do quantile normalization?
To see the tree names in your file you should do:
> getTreeNames("exon_quantiles.root")
You will probably see trees with extension "int", see help
?validTreetype.
To load these trees you need to do:
> data_qn <- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"int")
Please let me know if this did solve your problem.
Best regards
Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
V.i.e.n.n.a A.u.s.t.r.i.a
e.m.a.i.l: cstrato at aon.at
_._._._._._._._._._._._._._._._._._
On 2/1/12 7:07 PM, Paul Geeleher wrote:
> Hi,
>
> I've used xps to quantiles normalize (at probe level) some Affy Exon
> Array data. I now have a "root" file called "exon_quantiles.root",
but
> if I try to load it the same was I'd load my raw data (using the
> scheme file I created for Affy exon arrays) I get the error below? I
> can load my raw data just fine though. Any ideas? Do I perhaps need
a
> different "root scheme" file for this normalized data?
Unfortunately,
> I haven't been able to find an answer.
>
>> scheme.huex10stv2<- root.scheme("huex10stv2.root")
>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root")
> Error in if (chipname != treetitle) { : argument is of length zero
>
> Hope someone can help,
>
> Paul.
>
>
>
>> sessionInfo()
> R version 2.11.0 (2010-04-22)
> x86_64-redhat-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
>
>
Hi Christian,
Thanks for your quick reply. I check what kind of trees I have using
"getTreeNames()" as you'd suggested, it seems they are of type "cqu"
rather than "int", this is presumably because my analysis required no
background correction step?
So I then tried:
> data_qn <- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"cqu")
but that gives me a huge number of errors that look like this:
Error in <tfile::cd>: Unknown directory PreprocesSet
Error: Could not get directory <preprocesset>.
Error in <tfile::cd>: Unknown directory PreprocesSet
Error: Could not get directory <preprocesset>.
Error in <tfile::cd>: Unknown directory PreprocesSet
Error: Could not get directory <preprocesset>.
Error: Could not get tree <exportset>.
Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", :
error in function ?ExportData?
This file "exon_quantiles.root" definitely exists in the current
working directory though... Thanks again for your help!
Paul.
On Wed, Feb 1, 2012 at 9:01 PM, cstrato <cstrato at="" aon.at=""> wrote:
> Dear Paul,
>
> Please have a look at the help ?root.expr.
>
> If I understand you correctly, you did only do quantile
normalization?
>
> To see the tree names in your file you should do:
>> getTreeNames("exon_quantiles.root")
>
> You will probably see trees with extension "int", see help
?validTreetype.
>
> To load these trees you need to do:
>> data_qn <- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"int")
>
> Please let me know if this did solve your problem.
>
> Best regards
> Christian
> _._._._._._._._._._._._._._._._._._
> C.h.r.i.s.t.i.a.n ? S.t.r.a.t.o.w.a
> V.i.e.n.n.a ? ? ? ? ? A.u.s.t.r.i.a
> e.m.a.i.l: ? ? ? ?cstrato at aon.at
> _._._._._._._._._._._._._._._._._._
>
>
>
>
> On 2/1/12 7:07 PM, Paul Geeleher wrote:
>>
>> Hi,
>>
>> I've used xps to quantiles normalize (at probe level) some Affy
Exon
>> Array data. I now have a "root" file called "exon_quantiles.root",
but
>> if I try to load it the same was I'd load my raw data (using the
>> scheme file I created for Affy exon arrays) I get the error below?
I
>> can load my raw data just fine though. Any ideas? Do I perhaps need
a
>> different "root scheme" file for this normalized data?
Unfortunately,
>> I haven't been able to find an answer.
>>
>>> scheme.huex10stv2<- root.scheme("huex10stv2.root")
>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root")
>>
>> Error in if (chipname != treetitle) { : argument is of length zero
>>
>> Hope someone can help,
>>
>> Paul.
>>
>>
>>
>>> sessionInfo()
>>
>> R version 2.11.0 (2010-04-22)
>> x86_64-redhat-linux-gnu
>>
>> locale:
>> ?[1] LC_CTYPE=en_US.UTF-8 ? ? ? LC_NUMERIC=C
>> ?[3] LC_TIME=en_US.UTF-8 ? ? ? ?LC_COLLATE=en_US.UTF-8
>> ?[5] LC_MONETARY=C ? ? ? ? ? ? ?LC_MESSAGES=en_US.UTF-8
>> ?[7] LC_PAPER=en_US.UTF-8 ? ? ? LC_NAME=C
>> ?[9] LC_ADDRESS=C ? ? ? ? ? ? ? LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ?
base
>>
>>
>>
>
--
Paul Geeleher (PhD Student)
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
--
www.bioinformaticstutorials.com
Dear Paul,
The functions root.data(), root.call() and root.expr() were created to
allow you access to the corresponding root files just in case that you
did not save your R session.
In the cases where you compute expression levels stepwise, or only
part
of them such as normalize.quantiles(), as seems to be the matter in
your
case, there is no corresponding root.xxx() function to access the root
file directly. In these cases you need to save your R session to have
continued access to the resulting root file.
Please note that saving the R session is the usual case to have access
to the root files.
Best regards
Christian
On 2/2/12 1:12 PM, Paul Geeleher wrote:
> Hi Christian,
>
> Thanks for your quick reply. I check what kind of trees I have using
> "getTreeNames()" as you'd suggested, it seems they are of type "cqu"
> rather than "int", this is presumably because my analysis required
no
> background correction step?
>
> So I then tried:
>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"cqu")
>
> but that gives me a huge number of errors that look like this:
>
> Error in<tfile::cd>: Unknown directory PreprocesSet
> Error: Could not get directory<preprocesset>.
> Error in<tfile::cd>: Unknown directory PreprocesSet
> Error: Could not get directory<preprocesset>.
> Error in<tfile::cd>: Unknown directory PreprocesSet
> Error: Could not get directory<preprocesset>.
> Error: Could not get tree<exportset>.
> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", :
> error in function ?ExportData?
>
>
> This file "exon_quantiles.root" definitely exists in the current
> working directory though... Thanks again for your help!
>
> Paul.
>
>
>
> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at=""> wrote:
>> Dear Paul,
>>
>> Please have a look at the help ?root.expr.
>>
>> If I understand you correctly, you did only do quantile
normalization?
>>
>> To see the tree names in your file you should do:
>>> getTreeNames("exon_quantiles.root")
>>
>> You will probably see trees with extension "int", see help
?validTreetype.
>>
>> To load these trees you need to do:
>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"int")
>>
>> Please let me know if this did solve your problem.
>>
>> Best regards
>> Christian
>> _._._._._._._._._._._._._._._._._._
>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
>> V.i.e.n.n.a A.u.s.t.r.i.a
>> e.m.a.i.l: cstrato at aon.at
>> _._._._._._._._._._._._._._._._._._
>>
>>
>>
>>
>> On 2/1/12 7:07 PM, Paul Geeleher wrote:
>>>
>>> Hi,
>>>
>>> I've used xps to quantiles normalize (at probe level) some Affy
Exon
>>> Array data. I now have a "root" file called "exon_quantiles.root",
but
>>> if I try to load it the same was I'd load my raw data (using the
>>> scheme file I created for Affy exon arrays) I get the error below?
I
>>> can load my raw data just fine though. Any ideas? Do I perhaps
need a
>>> different "root scheme" file for this normalized data?
Unfortunately,
>>> I haven't been able to find an answer.
>>>
>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root")
>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root")
>>>
>>> Error in if (chipname != treetitle) { : argument is of length zero
>>>
>>> Hope someone can help,
>>>
>>> Paul.
>>>
>>>
>>>
>>>> sessionInfo()
>>>
>>> R version 2.11.0 (2010-04-22)
>>> x86_64-redhat-linux-gnu
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods
base
>>>
>>>
>>>
>>
>
>
>
Hi Christian,
Thanks for your quick and informative reply.
I have re-run the analysis and saved the R objects as you suggest. The
next thing I'm trying to do is to obtain the expression levels of the
probes, but this doesn't seem to be working for me:
> a <- validData(data_hapMap)
Error in .local(object, ...) : slot "data" has no data
Based on the documentation I think validData() is the correct
function.
I've also performed probeset level DABG and I'm trying to set
individual probes which belong to probesets with DABG < .05 to 0 in
the expression matrix.
But it seems I can't see the expression matrix using validData().
Perhaps there is another function. Any ideas?
Thank you again for your help with this, I'm very grateful!
Paul.
On 2/2/12, cstrato <cstrato at="" aon.at=""> wrote:
> Dear Paul,
>
> The functions root.data(), root.call() and root.expr() were created
to
> allow you access to the corresponding root files just in case that
you
> did not save your R session.
>
> In the cases where you compute expression levels stepwise, or only
part
> of them such as normalize.quantiles(), as seems to be the matter in
your
> case, there is no corresponding root.xxx() function to access the
root
> file directly. In these cases you need to save your R session to
have
> continued access to the resulting root file.
>
> Please note that saving the R session is the usual case to have
access
> to the root files.
>
> Best regards
> Christian
>
>
> On 2/2/12 1:12 PM, Paul Geeleher wrote:
>> Hi Christian,
>>
>> Thanks for your quick reply. I check what kind of trees I have
using
>> "getTreeNames()" as you'd suggested, it seems they are of type
"cqu"
>> rather than "int", this is presumably because my analysis required
no
>> background correction step?
>>
>> So I then tried:
>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"cqu")
>>
>> but that gives me a huge number of errors that look like this:
>>
>> Error in<tfile::cd>: Unknown directory PreprocesSet
>> Error: Could not get directory<preprocesset>.
>> Error in<tfile::cd>: Unknown directory PreprocesSet
>> Error: Could not get directory<preprocesset>.
>> Error in<tfile::cd>: Unknown directory PreprocesSet
>> Error: Could not get directory<preprocesset>.
>> Error: Could not get tree<exportset>.
>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", :
>> error in function ?ExportData?
>>
>>
>> This file "exon_quantiles.root" definitely exists in the current
>> working directory though... Thanks again for your help!
>>
>> Paul.
>>
>>
>>
>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at=""> wrote:
>>> Dear Paul,
>>>
>>> Please have a look at the help ?root.expr.
>>>
>>> If I understand you correctly, you did only do quantile
normalization?
>>>
>>> To see the tree names in your file you should do:
>>>> getTreeNames("exon_quantiles.root")
>>>
>>> You will probably see trees with extension "int", see help
>>> ?validTreetype.
>>>
>>> To load these trees you need to do:
>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"int")
>>>
>>> Please let me know if this did solve your problem.
>>>
>>> Best regards
>>> Christian
>>> _._._._._._._._._._._._._._._._._._
>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
>>> V.i.e.n.n.a A.u.s.t.r.i.a
>>> e.m.a.i.l: cstrato at aon.at
>>> _._._._._._._._._._._._._._._._._._
>>>
>>>
>>>
>>>
>>> On 2/1/12 7:07 PM, Paul Geeleher wrote:
>>>>
>>>> Hi,
>>>>
>>>> I've used xps to quantiles normalize (at probe level) some Affy
Exon
>>>> Array data. I now have a "root" file called
"exon_quantiles.root", but
>>>> if I try to load it the same was I'd load my raw data (using the
>>>> scheme file I created for Affy exon arrays) I get the error
below? I
>>>> can load my raw data just fine though. Any ideas? Do I perhaps
need a
>>>> different "root scheme" file for this normalized data?
Unfortunately,
>>>> I haven't been able to find an answer.
>>>>
>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root")
>>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root")
>>>>
>>>> Error in if (chipname != treetitle) { : argument is of length
zero
>>>>
>>>> Hope someone can help,
>>>>
>>>> Paul.
>>>>
>>>>
>>>>
>>>>> sessionInfo()
>>>>
>>>> R version 2.11.0 (2010-04-22)
>>>> x86_64-redhat-linux-gnu
>>>>
>>>> locale:
>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods
base
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
--
Paul Geeleher (PhD Student)
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
--
www.bioinformaticstutorials.com
Dear Paul,
I am afraid that it is not quite clear to me what you want to do.
What do you mean with "obtain the expression levels of the probes"? Do
you mean "probes" (i.e. oligos) or do you mean "probesets" as in your
question about DABG?
How did you create "data_hapMap"? Maybe you could send me your
complete
code otherwise I am only able to guess.
I guess that "data_hapMap" contains the raw data. For these the slot
"data" is empty to save memory. So you need to use either attachData()
or attachInten(). However since you are using exon arrays you may not
have enough RAM, so it would be better to use function export() or
export.data(), or attach only a subset, see help ?attachData. See also
vignette xps.pdf (chapter 2.3).
When you talk about "expression matrix", how did you create it? Maybe
you could use function validExpr(), but w/o seeing your code it is
hard
to tell.
For DABG there are functions pvalData() and presCall(), see the
examples
in help ?dabg.call.
Best regards
Christian
On 2/6/12 4:33 PM, Paul Geeleher wrote:
> Hi Christian,
>
> Thanks for your quick and informative reply.
>
> I have re-run the analysis and saved the R objects as you suggest.
The
> next thing I'm trying to do is to obtain the expression levels of
the
> probes, but this doesn't seem to be working for me:
>
>> a<- validData(data_hapMap)
> Error in .local(object, ...) : slot "data" has no data
>
> Based on the documentation I think validData() is the correct
function.
>
> I've also performed probeset level DABG and I'm trying to set
> individual probes which belong to probesets with DABG< .05 to 0 in
> the expression matrix.
>
> But it seems I can't see the expression matrix using validData().
> Perhaps there is another function. Any ideas?
>
> Thank you again for your help with this, I'm very grateful!
>
> Paul.
>
> On 2/2/12, cstrato<cstrato at="" aon.at=""> wrote:
>> Dear Paul,
>>
>> The functions root.data(), root.call() and root.expr() were created
to
>> allow you access to the corresponding root files just in case that
you
>> did not save your R session.
>>
>> In the cases where you compute expression levels stepwise, or only
part
>> of them such as normalize.quantiles(), as seems to be the matter in
your
>> case, there is no corresponding root.xxx() function to access the
root
>> file directly. In these cases you need to save your R session to
have
>> continued access to the resulting root file.
>>
>> Please note that saving the R session is the usual case to have
access
>> to the root files.
>>
>> Best regards
>> Christian
>>
>>
>> On 2/2/12 1:12 PM, Paul Geeleher wrote:
>>> Hi Christian,
>>>
>>> Thanks for your quick reply. I check what kind of trees I have
using
>>> "getTreeNames()" as you'd suggested, it seems they are of type
"cqu"
>>> rather than "int", this is presumably because my analysis required
no
>>> background correction step?
>>>
>>> So I then tried:
>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"cqu")
>>>
>>> but that gives me a huge number of errors that look like this:
>>>
>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>> Error: Could not get directory<preprocesset>.
>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>> Error: Could not get directory<preprocesset>.
>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>> Error: Could not get directory<preprocesset>.
>>> Error: Could not get tree<exportset>.
>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", :
>>> error in function ?ExportData?
>>>
>>>
>>> This file "exon_quantiles.root" definitely exists in the current
>>> working directory though... Thanks again for your help!
>>>
>>> Paul.
>>>
>>>
>>>
>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at="">
wrote:
>>>> Dear Paul,
>>>>
>>>> Please have a look at the help ?root.expr.
>>>>
>>>> If I understand you correctly, you did only do quantile
normalization?
>>>>
>>>> To see the tree names in your file you should do:
>>>>> getTreeNames("exon_quantiles.root")
>>>>
>>>> You will probably see trees with extension "int", see help
>>>> ?validTreetype.
>>>>
>>>> To load these trees you need to do:
>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"int")
>>>>
>>>> Please let me know if this did solve your problem.
>>>>
>>>> Best regards
>>>> Christian
>>>> _._._._._._._._._._._._._._._._._._
>>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
>>>> V.i.e.n.n.a A.u.s.t.r.i.a
>>>> e.m.a.i.l: cstrato at aon.at
>>>> _._._._._._._._._._._._._._._._._._
>>>>
>>>>
>>>>
>>>>
>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I've used xps to quantiles normalize (at probe level) some Affy
Exon
>>>>> Array data. I now have a "root" file called
"exon_quantiles.root", but
>>>>> if I try to load it the same was I'd load my raw data (using the
>>>>> scheme file I created for Affy exon arrays) I get the error
below? I
>>>>> can load my raw data just fine though. Any ideas? Do I perhaps
need a
>>>>> different "root scheme" file for this normalized data?
Unfortunately,
>>>>> I haven't been able to find an answer.
>>>>>
>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root")
>>>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root")
>>>>>
>>>>> Error in if (chipname != treetitle) { : argument is of length
zero
>>>>>
>>>>> Hope someone can help,
>>>>>
>>>>> Paul.
>>>>>
>>>>>
>>>>>
>>>>>> sessionInfo()
>>>>>
>>>>> R version 2.11.0 (2010-04-22)
>>>>> x86_64-redhat-linux-gnu
>>>>>
>>>>> locale:
>>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>
>>>>> attached base packages:
>>>>> [1] stats graphics grDevices utils datasets methods
base
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
Hi Christian, apologies for the lack of clarity in my previous email,
I'll try to clear this up, see replies below:
On Mon, Feb 6, 2012 at 7:51 PM, cstrato <cstrato at="" aon.at=""> wrote:
> Dear Paul,
>
> I am afraid that it is not quite clear to me what you want to do.
>
> What do you mean with "obtain the expression levels of the probes"?
Do you
> mean "probes" (i.e. oligos) or do you mean "probesets" as in your
question
> about DABG?
I've read in and quantiles normalized probe level data for 176 arrays
and what I'm trying to do now is set every probe (individual probe,
not probeset) that is not detected above background to zero. But from
what I've read dabg.call() can only create p-values for probeset
level? So as compromise I'm now trying to set the expression of all
*individual probes* which belong to a *probeset* not detected above
background to zero. I hope that makes sense.
>
> How did you create "data_hapMap"? Maybe you could send me your
complete code
> otherwise I am only able to guess.
"data_hapMap" was a mistake, sorry I copied the wrong object name it
should have been "data_qn" which is quantiles normalized probe level
data that I created like this:
data_hapMap <- root.data(scheme.huex10stv2,
"/data2/paul/normalization_project/root_data/hapMap_root_data_cel.root
")
#read raw data
data_qn <- normalize.quantiles(data_hapMap, "exon_quantiles",
filedir=rootdata_Dir, exonlevel="all") #quantiles normalize at *probe*
level
So "data_qn" is the object I want to work with, based on your advice
I've figured out that to access the expression levels in "data_qn" I
need to use "attachInten()", then I can use "intensities()" to access
the expression levels:
treenames <- unlist(treeNames(data_qn))
data_qn <- attachInten(data_qn, treenames=treenames[1:2]) #attach the
first two samples
> head(intensity(data_qn))
X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
1 0 0 0.0000 0.000
2 1 0 0.0000 0.000
3 2 0 0.0000 0.000
4 3 0 0.0000 0.000
5 4 0 0.0000 0.000
6 5 0 85.3523 121.733
Does the X column in the output above represent the probe ID? If so
and I have a mapping of probeset IDs to corresponding probe IDs, it
should be fairly straightforward to set probes that are not detected
above background to zero? Perhaps there is a more straightforward way
of doing this though?
>
> I guess that "data_hapMap" contains the raw data. For these the slot
"data"
> is empty to save memory. So you need to use either attachData() or
> attachInten(). However since you are using exon arrays you may not
have
> enough RAM, so it would be better to use function export() or
export.data(),
> or attach only a subset, see help ?attachData. See also vignette
xps.pdf
> (chapter 2.3).
I think RAM shouldn't be an issue if I attach the samples one at a
time? (I actually have access to a machine with 32gigs RAM but ideally
would like to get what I'm doing to run on a standard desktop, which
is actually why I'm using XPS!).
>
> When you talk about "expression matrix", how did you create it?
Maybe you
> could use function validExpr(), but w/o seeing your code it is hard
to tell.
> For DABG there are functions pvalData() and presCall(), see the
examples in
> help ?dabg.call.
Yes I've managed to use dabg.call() at probeset level and access the
p-values using pvalData() alright.
Thanks again for all of your help and patience!
Kind Regards,
Paul.
>
> Best regards
> Christian
>
>
>
> On 2/6/12 4:33 PM, Paul Geeleher wrote:
>>
>> Hi Christian,
>>
>> Thanks for your quick and informative reply.
>>
>> I have re-run the analysis and saved the R objects as you suggest.
The
>> next thing I'm trying to do is to obtain the expression levels of
the
>> probes, but this doesn't seem to be working for me:
>>
>>> a<- validData(data_hapMap)
>>
>> Error in .local(object, ...) : slot "data" has no data
>>
>> Based on the documentation I think validData() is the correct
function.
>>
>> I've also performed probeset level DABG and I'm trying to set
>> individual probes which belong to probesets with DABG< ?.05 to 0 in
>> the expression matrix.
>>
>> But it seems I can't see the expression matrix using validData().
>> Perhaps there is another function. Any ideas?
>>
>> Thank you again for your help with this, I'm very grateful!
>>
>> Paul.
>>
>> On 2/2/12, cstrato<cstrato at="" aon.at=""> ?wrote:
>>>
>>> Dear Paul,
>>>
>>> The functions root.data(), root.call() and root.expr() were
created to
>>> allow you access to the corresponding root files just in case that
you
>>> did not save your R session.
>>>
>>> In the cases where you compute expression levels stepwise, or only
part
>>> of them such as normalize.quantiles(), as seems to be the matter
in your
>>> case, there is no corresponding root.xxx() function to access the
root
>>> file directly. In these cases you need to save your R session to
have
>>> continued access to the resulting root file.
>>>
>>> Please note that saving the R session is the usual case to have
access
>>> to the root files.
>>>
>>> Best regards
>>> Christian
>>>
>>>
>>> On 2/2/12 1:12 PM, Paul Geeleher wrote:
>>>>
>>>> Hi Christian,
>>>>
>>>> Thanks for your quick reply. I check what kind of trees I have
using
>>>> "getTreeNames()" as you'd suggested, it seems they are of type
"cqu"
>>>> rather than "int", this is presumably because my analysis
required no
>>>> background correction step?
>>>>
>>>> So I then tried:
>>>>>
>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"cqu")
>>>>
>>>>
>>>> but that gives me a huge number of errors that look like this:
>>>>
>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>> Error: Could not get directory<preprocesset>.
>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>> Error: Could not get directory<preprocesset>.
>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>> Error: Could not get directory<preprocesset>.
>>>> Error: Could not get tree<exportset>.
>>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", ?:
>>>> ? ?error in function ?ExportData?
>>>>
>>>>
>>>> This file "exon_quantiles.root" definitely exists in the current
>>>> working directory though... Thanks again for your help!
>>>>
>>>> Paul.
>>>>
>>>>
>>>>
>>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at=""> ?
wrote:
>>>>>
>>>>> Dear Paul,
>>>>>
>>>>> Please have a look at the help ?root.expr.
>>>>>
>>>>> If I understand you correctly, you did only do quantile
normalization?
>>>>>
>>>>> To see the tree names in your file you should do:
>>>>>>
>>>>>> getTreeNames("exon_quantiles.root")
>>>>>
>>>>>
>>>>> You will probably see trees with extension "int", see help
>>>>> ?validTreetype.
>>>>>
>>>>> To load these trees you need to do:
>>>>>>
>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"int")
>>>>>
>>>>>
>>>>> Please let me know if this did solve your problem.
>>>>>
>>>>> Best regards
>>>>> Christian
>>>>> _._._._._._._._._._._._._._._._._._
>>>>> C.h.r.i.s.t.i.a.n ? S.t.r.a.t.o.w.a
>>>>> V.i.e.n.n.a ? ? ? ? ? A.u.s.t.r.i.a
>>>>> e.m.a.i.l: ? ? ? ?cstrato at aon.at
>>>>> _._._._._._._._._._._._._._._._._._
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've used xps to quantiles normalize (at probe level) some Affy
Exon
>>>>>> Array data. I now have a "root" file called
"exon_quantiles.root", but
>>>>>> if I try to load it the same was I'd load my raw data (using
the
>>>>>> scheme file I created for Affy exon arrays) I get the error
below? I
>>>>>> can load my raw data just fine though. Any ideas? Do I perhaps
need a
>>>>>> different "root scheme" file for this normalized data?
Unfortunately,
>>>>>> I haven't been able to find an answer.
>>>>>>
>>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root")
>>>>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root")
>>>>>>
>>>>>>
>>>>>> Error in if (chipname != treetitle) { : argument is of length
zero
>>>>>>
>>>>>> Hope someone can help,
>>>>>>
>>>>>> Paul.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> sessionInfo()
>>>>>>
>>>>>>
>>>>>> R version 2.11.0 (2010-04-22)
>>>>>> x86_64-redhat-linux-gnu
>>>>>>
>>>>>> locale:
>>>>>> ? [1] LC_CTYPE=en_US.UTF-8 ? ? ? LC_NUMERIC=C
>>>>>> ? [3] LC_TIME=en_US.UTF-8 ? ? ? ?LC_COLLATE=en_US.UTF-8
>>>>>> ? [5] LC_MONETARY=C ? ? ? ? ? ? ?LC_MESSAGES=en_US.UTF-8
>>>>>> ? [7] LC_PAPER=en_US.UTF-8 ? ? ? LC_NAME=C
>>>>>> ? [9] LC_ADDRESS=C ? ? ? ? ? ? ? LC_TELEPHONE=C
>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ?
base
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>
--
Paul Geeleher (PhD Student)
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
--
www.bioinformaticstutorials.com
Actually doing what I'm talking about relies on being able to modify
values in "intensity(data_qn)", which I was assuming one could do but
testing it I don't think I can!?
> head(intensity(data_qn))
X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
1 0 0 0.0000 0.000
2 1 0 0.0000 0.000
3 2 0 0.0000 0.000
4 3 0 0.0000 0.000
5 4 0 0.0000 0.000
6 5 0 85.3523 121.733
> intensity(data_qn)[6,3]
[1] 85.3523
> intensity(data_qn)[6,3] <- 0
Error in dimnames(x) : 'x' is missing
> head(intensity(data_qn))
X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
1 0 0 0.0000 0.000
2 1 0 0.0000 0.000
3 2 0 0.0000 0.000
4 3 0 0.0000 0.000
5 4 0 0.0000 0.000
6 5 0 85.3523 121.733
This means a different approach may be necessary.
What I can do is extract the probe level estimates for each gene,
which I think you outlined in a previous email to the list
(https://mailman.stat.ethz.ch/pipermail/bioconductor/2011-September/04
0963.html)
and from there set those probes which aren't detected above background
to zero (once I've figured out how to map probesets to probes). I can
the do what I need to do with this data (which is basically to fit a
model based on individual probe intensities for each gene) and do this
for every gene, which won't involve modifying the intensities in
data_qn and as I'm only "attaching" probe level data for one gene at a
time, shouldn't use that much memory.
Paul
On Tue, Feb 7, 2012 at 12:52 PM, Paul Geeleher <paulgeeleher at="" gmail.com=""> wrote:
> Hi Christian, apologies for the lack of clarity in my previous
email,
> I'll try to clear this up, see replies below:
>
> On Mon, Feb 6, 2012 at 7:51 PM, cstrato <cstrato at="" aon.at=""> wrote:
>> Dear Paul,
>>
>> I am afraid that it is not quite clear to me what you want to do.
>>
>> What do you mean with "obtain the expression levels of the probes"?
Do you
>> mean "probes" (i.e. oligos) or do you mean "probesets" as in your
question
>> about DABG?
>
> I've read in and quantiles normalized probe level data for 176
arrays
> and what I'm trying to do now is set every probe (individual probe,
> not probeset) that is not detected above background to zero. But
from
> what I've read dabg.call() can only create p-values for probeset
> level? So as compromise I'm now trying to set the expression of all
> *individual probes* which belong to a *probeset* not detected above
> background to zero. I hope that makes sense.
>
>>
>> How did you create "data_hapMap"? Maybe you could send me your
complete code
>> otherwise I am only able to guess.
>
> "data_hapMap" was a mistake, sorry I copied the wrong object name it
> should have been "data_qn" which is quantiles normalized probe level
> data that I created like this:
>
> data_hapMap <- root.data(scheme.huex10stv2,
> "/data2/paul/normalization_project/root_data/hapMap_root_data_cel.ro
ot")
> #read raw data
> data_qn <- normalize.quantiles(data_hapMap, "exon_quantiles",
> filedir=rootdata_Dir, exonlevel="all") #quantiles normalize at
*probe*
> level
>
> So "data_qn" is the object I want to work with, based on your advice
> I've figured out that to access the expression levels in "data_qn" I
> need to use "attachInten()", then I can use "intensities()" to
access
> the expression levels:
>
> treenames <- unlist(treeNames(data_qn))
> data_qn <- attachInten(data_qn, treenames=treenames[1:2]) #attach
the
> first two samples
>
>> head(intensity(data_qn))
> ?X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
> 1 0 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000
> 2 1 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000
> 3 2 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000
> 4 3 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000
> 5 4 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000
> 6 5 0 ? ? ? ? ? ?85.3523 ? ? ? ? ? ?121.733
>
> Does the X column in the output above represent the probe ID? If so
> and I have a mapping of probeset IDs to corresponding probe IDs, it
> should be fairly straightforward to set probes that are not detected
> above background to zero? Perhaps there is a more straightforward
way
> of doing this though?
>
>
>>
>> I guess that "data_hapMap" contains the raw data. For these the
slot "data"
>> is empty to save memory. So you need to use either attachData() or
>> attachInten(). However since you are using exon arrays you may not
have
>> enough RAM, so it would be better to use function export() or
export.data(),
>> or attach only a subset, see help ?attachData. See also vignette
xps.pdf
>> (chapter 2.3).
>
> I think RAM shouldn't be an issue if I attach the samples one at a
> time? (I actually have access to a machine with 32gigs RAM but
ideally
> would like to get what I'm doing to run on a standard desktop, which
> is actually why I'm using XPS!).
>
>>
>> When you talk about "expression matrix", how did you create it?
Maybe you
>> could use function validExpr(), but w/o seeing your code it is hard
to tell.
>> For DABG there are functions pvalData() and presCall(), see the
examples in
>> help ?dabg.call.
>
> Yes I've managed to use dabg.call() at probeset level and access the
> p-values using pvalData() alright.
>
> Thanks again for all of your help and patience!
>
> Kind Regards,
>
> Paul.
>
>
>>
>> Best regards
>> Christian
>>
>>
>>
>> On 2/6/12 4:33 PM, Paul Geeleher wrote:
>>>
>>> Hi Christian,
>>>
>>> Thanks for your quick and informative reply.
>>>
>>> I have re-run the analysis and saved the R objects as you suggest.
The
>>> next thing I'm trying to do is to obtain the expression levels of
the
>>> probes, but this doesn't seem to be working for me:
>>>
>>>> a<- validData(data_hapMap)
>>>
>>> Error in .local(object, ...) : slot "data" has no data
>>>
>>> Based on the documentation I think validData() is the correct
function.
>>>
>>> I've also performed probeset level DABG and I'm trying to set
>>> individual probes which belong to probesets with DABG< ?.05 to 0
in
>>> the expression matrix.
>>>
>>> But it seems I can't see the expression matrix using validData().
>>> Perhaps there is another function. Any ideas?
>>>
>>> Thank you again for your help with this, I'm very grateful!
>>>
>>> Paul.
>>>
>>> On 2/2/12, cstrato<cstrato at="" aon.at=""> ?wrote:
>>>>
>>>> Dear Paul,
>>>>
>>>> The functions root.data(), root.call() and root.expr() were
created to
>>>> allow you access to the corresponding root files just in case
that you
>>>> did not save your R session.
>>>>
>>>> In the cases where you compute expression levels stepwise, or
only part
>>>> of them such as normalize.quantiles(), as seems to be the matter
in your
>>>> case, there is no corresponding root.xxx() function to access the
root
>>>> file directly. In these cases you need to save your R session to
have
>>>> continued access to the resulting root file.
>>>>
>>>> Please note that saving the R session is the usual case to have
access
>>>> to the root files.
>>>>
>>>> Best regards
>>>> Christian
>>>>
>>>>
>>>> On 2/2/12 1:12 PM, Paul Geeleher wrote:
>>>>>
>>>>> Hi Christian,
>>>>>
>>>>> Thanks for your quick reply. I check what kind of trees I have
using
>>>>> "getTreeNames()" as you'd suggested, it seems they are of type
"cqu"
>>>>> rather than "int", this is presumably because my analysis
required no
>>>>> background correction step?
>>>>>
>>>>> So I then tried:
>>>>>>
>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"cqu")
>>>>>
>>>>>
>>>>> but that gives me a huge number of errors that look like this:
>>>>>
>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>> Error: Could not get directory<preprocesset>.
>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>> Error: Could not get directory<preprocesset>.
>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>> Error: Could not get directory<preprocesset>.
>>>>> Error: Could not get tree<exportset>.
>>>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", ?:
>>>>> ? ?error in function ?ExportData?
>>>>>
>>>>>
>>>>> This file "exon_quantiles.root" definitely exists in the current
>>>>> working directory though... Thanks again for your help!
>>>>>
>>>>> Paul.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at=""> ?
wrote:
>>>>>>
>>>>>> Dear Paul,
>>>>>>
>>>>>> Please have a look at the help ?root.expr.
>>>>>>
>>>>>> If I understand you correctly, you did only do quantile
normalization?
>>>>>>
>>>>>> To see the tree names in your file you should do:
>>>>>>>
>>>>>>> getTreeNames("exon_quantiles.root")
>>>>>>
>>>>>>
>>>>>> You will probably see trees with extension "int", see help
>>>>>> ?validTreetype.
>>>>>>
>>>>>> To load these trees you need to do:
>>>>>>>
>>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"int")
>>>>>>
>>>>>>
>>>>>> Please let me know if this did solve your problem.
>>>>>>
>>>>>> Best regards
>>>>>> Christian
>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>> C.h.r.i.s.t.i.a.n ? S.t.r.a.t.o.w.a
>>>>>> V.i.e.n.n.a ? ? ? ? ? A.u.s.t.r.i.a
>>>>>> e.m.a.i.l: ? ? ? ?cstrato at aon.at
>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've used xps to quantiles normalize (at probe level) some
Affy Exon
>>>>>>> Array data. I now have a "root" file called
"exon_quantiles.root", but
>>>>>>> if I try to load it the same was I'd load my raw data (using
the
>>>>>>> scheme file I created for Affy exon arrays) I get the error
below? I
>>>>>>> can load my raw data just fine though. Any ideas? Do I perhaps
need a
>>>>>>> different "root scheme" file for this normalized data?
Unfortunately,
>>>>>>> I haven't been able to find an answer.
>>>>>>>
>>>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root")
>>>>>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root")
>>>>>>>
>>>>>>>
>>>>>>> Error in if (chipname != treetitle) { : argument is of length
zero
>>>>>>>
>>>>>>> Hope someone can help,
>>>>>>>
>>>>>>> Paul.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> sessionInfo()
>>>>>>>
>>>>>>>
>>>>>>> R version 2.11.0 (2010-04-22)
>>>>>>> x86_64-redhat-linux-gnu
>>>>>>>
>>>>>>> locale:
>>>>>>> ? [1] LC_CTYPE=en_US.UTF-8 ? ? ? LC_NUMERIC=C
>>>>>>> ? [3] LC_TIME=en_US.UTF-8 ? ? ? ?LC_COLLATE=en_US.UTF-8
>>>>>>> ? [5] LC_MONETARY=C ? ? ? ? ? ? ?LC_MESSAGES=en_US.UTF-8
>>>>>>> ? [7] LC_PAPER=en_US.UTF-8 ? ? ? LC_NAME=C
>>>>>>> ? [9] LC_ADDRESS=C ? ? ? ? ? ? ? LC_TELEPHONE=C
>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>>
>>>>>>> attached base packages:
>>>>>>> [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods
? base
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>
>
> --
> Paul Geeleher (PhD Student)
> School of Mathematics, Statistics and Applied Mathematics
> National University of Ireland
> Galway
> Ireland
> --
> www.bioinformaticstutorials.com
--
Paul Geeleher (PhD Student)
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
--
www.bioinformaticstutorials.com
Since I did not implement operators [] for function intensity() you
need
to use a dataframe to set values to zero and then use
intensity()<-value
to replace the object. However, please read the help file for
intensity<- carfully, since this operation is dangerous (and I have
not
tested it for exon arrays).
I think it is best to work with the dataframe itself, since you want
to
fit your own model, anyhow.
Christian
On 2/7/12 2:17 PM, Paul Geeleher wrote:
> Actually doing what I'm talking about relies on being able to modify
> values in "intensity(data_qn)", which I was assuming one could do
but
> testing it I don't think I can!?
>
>> head(intensity(data_qn))
> X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
> 1 0 0 0.0000 0.000
> 2 1 0 0.0000 0.000
> 3 2 0 0.0000 0.000
> 4 3 0 0.0000 0.000
> 5 4 0 0.0000 0.000
> 6 5 0 85.3523 121.733
>
>> intensity(data_qn)[6,3]
> [1] 85.3523
>> intensity(data_qn)[6,3]<- 0
> Error in dimnames(x) : 'x' is missing
>
>> head(intensity(data_qn))
> X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
> 1 0 0 0.0000 0.000
> 2 1 0 0.0000 0.000
> 3 2 0 0.0000 0.000
> 4 3 0 0.0000 0.000
> 5 4 0 0.0000 0.000
> 6 5 0 85.3523 121.733
>
> This means a different approach may be necessary.
>
> What I can do is extract the probe level estimates for each gene,
> which I think you outlined in a previous email to the list
> (https://mailman.stat.ethz.ch/pipermail/bioconductor/2011-September/
040963.html)
> and from there set those probes which aren't detected above
background
> to zero (once I've figured out how to map probesets to probes). I
can
> the do what I need to do with this data (which is basically to fit a
> model based on individual probe intensities for each gene) and do
this
> for every gene, which won't involve modifying the intensities in
> data_qn and as I'm only "attaching" probe level data for one gene at
a
> time, shouldn't use that much memory.
>
> Paul
>
> On Tue, Feb 7, 2012 at 12:52 PM, Paul Geeleher<paulgeeleher at="" gmail.com=""> wrote:
>> Hi Christian, apologies for the lack of clarity in my previous
email,
>> I'll try to clear this up, see replies below:
>>
>> On Mon, Feb 6, 2012 at 7:51 PM, cstrato<cstrato at="" aon.at=""> wrote:
>>> Dear Paul,
>>>
>>> I am afraid that it is not quite clear to me what you want to do.
>>>
>>> What do you mean with "obtain the expression levels of the
probes"? Do you
>>> mean "probes" (i.e. oligos) or do you mean "probesets" as in your
question
>>> about DABG?
>>
>> I've read in and quantiles normalized probe level data for 176
arrays
>> and what I'm trying to do now is set every probe (individual probe,
>> not probeset) that is not detected above background to zero. But
from
>> what I've read dabg.call() can only create p-values for probeset
>> level? So as compromise I'm now trying to set the expression of all
>> *individual probes* which belong to a *probeset* not detected above
>> background to zero. I hope that makes sense.
>>
>>>
>>> How did you create "data_hapMap"? Maybe you could send me your
complete code
>>> otherwise I am only able to guess.
>>
>> "data_hapMap" was a mistake, sorry I copied the wrong object name
it
>> should have been "data_qn" which is quantiles normalized probe
level
>> data that I created like this:
>>
>> data_hapMap<- root.data(scheme.huex10stv2,
>> "/data2/paul/normalization_project/root_data/hapMap_root_data_cel.r
oot")
>> #read raw data
>> data_qn<- normalize.quantiles(data_hapMap, "exon_quantiles",
>> filedir=rootdata_Dir, exonlevel="all") #quantiles normalize at
*probe*
>> level
>>
>> So "data_qn" is the object I want to work with, based on your
advice
>> I've figured out that to access the expression levels in "data_qn"
I
>> need to use "attachInten()", then I can use "intensities()" to
access
>> the expression levels:
>>
>> treenames<- unlist(treeNames(data_qn))
>> data_qn<- attachInten(data_qn, treenames=treenames[1:2]) #attach
the
>> first two samples
>>
>>> head(intensity(data_qn))
>> X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
>> 1 0 0 0.0000 0.000
>> 2 1 0 0.0000 0.000
>> 3 2 0 0.0000 0.000
>> 4 3 0 0.0000 0.000
>> 5 4 0 0.0000 0.000
>> 6 5 0 85.3523 121.733
>>
>> Does the X column in the output above represent the probe ID? If so
>> and I have a mapping of probeset IDs to corresponding probe IDs, it
>> should be fairly straightforward to set probes that are not
detected
>> above background to zero? Perhaps there is a more straightforward
way
>> of doing this though?
>>
>>
>>>
>>> I guess that "data_hapMap" contains the raw data. For these the
slot "data"
>>> is empty to save memory. So you need to use either attachData() or
>>> attachInten(). However since you are using exon arrays you may not
have
>>> enough RAM, so it would be better to use function export() or
export.data(),
>>> or attach only a subset, see help ?attachData. See also vignette
xps.pdf
>>> (chapter 2.3).
>>
>> I think RAM shouldn't be an issue if I attach the samples one at a
>> time? (I actually have access to a machine with 32gigs RAM but
ideally
>> would like to get what I'm doing to run on a standard desktop,
which
>> is actually why I'm using XPS!).
>>
>>>
>>> When you talk about "expression matrix", how did you create it?
Maybe you
>>> could use function validExpr(), but w/o seeing your code it is
hard to tell.
>>> For DABG there are functions pvalData() and presCall(), see the
examples in
>>> help ?dabg.call.
>>
>> Yes I've managed to use dabg.call() at probeset level and access
the
>> p-values using pvalData() alright.
>>
>> Thanks again for all of your help and patience!
>>
>> Kind Regards,
>>
>> Paul.
>>
>>
>>>
>>> Best regards
>>> Christian
>>>
>>>
>>>
>>> On 2/6/12 4:33 PM, Paul Geeleher wrote:
>>>>
>>>> Hi Christian,
>>>>
>>>> Thanks for your quick and informative reply.
>>>>
>>>> I have re-run the analysis and saved the R objects as you
suggest. The
>>>> next thing I'm trying to do is to obtain the expression levels of
the
>>>> probes, but this doesn't seem to be working for me:
>>>>
>>>>> a<- validData(data_hapMap)
>>>>
>>>> Error in .local(object, ...) : slot "data" has no data
>>>>
>>>> Based on the documentation I think validData() is the correct
function.
>>>>
>>>> I've also performed probeset level DABG and I'm trying to set
>>>> individual probes which belong to probesets with DABG< .05 to
0 in
>>>> the expression matrix.
>>>>
>>>> But it seems I can't see the expression matrix using validData().
>>>> Perhaps there is another function. Any ideas?
>>>>
>>>> Thank you again for your help with this, I'm very grateful!
>>>>
>>>> Paul.
>>>>
>>>> On 2/2/12, cstrato<cstrato at="" aon.at=""> wrote:
>>>>>
>>>>> Dear Paul,
>>>>>
>>>>> The functions root.data(), root.call() and root.expr() were
created to
>>>>> allow you access to the corresponding root files just in case
that you
>>>>> did not save your R session.
>>>>>
>>>>> In the cases where you compute expression levels stepwise, or
only part
>>>>> of them such as normalize.quantiles(), as seems to be the matter
in your
>>>>> case, there is no corresponding root.xxx() function to access
the root
>>>>> file directly. In these cases you need to save your R session to
have
>>>>> continued access to the resulting root file.
>>>>>
>>>>> Please note that saving the R session is the usual case to have
access
>>>>> to the root files.
>>>>>
>>>>> Best regards
>>>>> Christian
>>>>>
>>>>>
>>>>> On 2/2/12 1:12 PM, Paul Geeleher wrote:
>>>>>>
>>>>>> Hi Christian,
>>>>>>
>>>>>> Thanks for your quick reply. I check what kind of trees I have
using
>>>>>> "getTreeNames()" as you'd suggested, it seems they are of type
"cqu"
>>>>>> rather than "int", this is presumably because my analysis
required no
>>>>>> background correction step?
>>>>>>
>>>>>> So I then tried:
>>>>>>>
>>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"cqu")
>>>>>>
>>>>>>
>>>>>> but that gives me a huge number of errors that look like this:
>>>>>>
>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>>> Error: Could not get directory<preprocesset>.
>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>>> Error: Could not get directory<preprocesset>.
>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>>> Error: Could not get directory<preprocesset>.
>>>>>> Error: Could not get tree<exportset>.
>>>>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", :
>>>>>> error in function ?ExportData?
>>>>>>
>>>>>>
>>>>>> This file "exon_quantiles.root" definitely exists in the
current
>>>>>> working directory though... Thanks again for your help!
>>>>>>
>>>>>> Paul.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at="">
wrote:
>>>>>>>
>>>>>>> Dear Paul,
>>>>>>>
>>>>>>> Please have a look at the help ?root.expr.
>>>>>>>
>>>>>>> If I understand you correctly, you did only do quantile
normalization?
>>>>>>>
>>>>>>> To see the tree names in your file you should do:
>>>>>>>>
>>>>>>>> getTreeNames("exon_quantiles.root")
>>>>>>>
>>>>>>>
>>>>>>> You will probably see trees with extension "int", see help
>>>>>>> ?validTreetype.
>>>>>>>
>>>>>>> To load these trees you need to do:
>>>>>>>>
>>>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"int")
>>>>>>>
>>>>>>>
>>>>>>> Please let me know if this did solve your problem.
>>>>>>>
>>>>>>> Best regards
>>>>>>> Christian
>>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
>>>>>>> V.i.e.n.n.a A.u.s.t.r.i.a
>>>>>>> e.m.a.i.l: cstrato at aon.at
>>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I've used xps to quantiles normalize (at probe level) some
Affy Exon
>>>>>>>> Array data. I now have a "root" file called
"exon_quantiles.root", but
>>>>>>>> if I try to load it the same was I'd load my raw data (using
the
>>>>>>>> scheme file I created for Affy exon arrays) I get the error
below? I
>>>>>>>> can load my raw data just fine though. Any ideas? Do I
perhaps need a
>>>>>>>> different "root scheme" file for this normalized data?
Unfortunately,
>>>>>>>> I haven't been able to find an answer.
>>>>>>>>
>>>>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root")
>>>>>>>>> data_qn<- root.data(scheme.huex10stv2,
"exon_quantiles.root")
>>>>>>>>
>>>>>>>>
>>>>>>>> Error in if (chipname != treetitle) { : argument is of length
zero
>>>>>>>>
>>>>>>>> Hope someone can help,
>>>>>>>>
>>>>>>>> Paul.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> sessionInfo()
>>>>>>>>
>>>>>>>>
>>>>>>>> R version 2.11.0 (2010-04-22)
>>>>>>>> x86_64-redhat-linux-gnu
>>>>>>>>
>>>>>>>> locale:
>>>>>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>>>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>>>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>>>
>>>>>>>> attached base packages:
>>>>>>>> [1] stats graphics grDevices utils datasets methods
base
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>> --
>> Paul Geeleher (PhD Student)
>> School of Mathematics, Statistics and Applied Mathematics
>> National University of Ireland
>> Galway
>> Ireland
>> --
>> www.bioinformaticstutorials.com
>
>
>
Dear Paul,
see replies below:
On 2/7/12 1:52 PM, Paul Geeleher wrote:
> Hi Christian, apologies for the lack of clarity in my previous
email,
> I'll try to clear this up, see replies below:
>
> On Mon, Feb 6, 2012 at 7:51 PM, cstrato<cstrato at="" aon.at=""> wrote:
>> Dear Paul,
>>
>> I am afraid that it is not quite clear to me what you want to do.
>>
>> What do you mean with "obtain the expression levels of the probes"?
Do you
>> mean "probes" (i.e. oligos) or do you mean "probesets" as in your
question
>> about DABG?
>
> I've read in and quantiles normalized probe level data for 176
arrays
> and what I'm trying to do now is set every probe (individual probe,
> not probeset) that is not detected above background to zero. But
from
> what I've read dabg.call() can only create p-values for probeset
> level? So as compromise I'm now trying to set the expression of all
> *individual probes* which belong to a *probeset* not detected above
> background to zero. I hope that makes sense.
>
Did you read the exon array whitepapers which you can download from
Affymetrix?
Every probeset consists of 1-4 probes only, and every exon consists
usually of 1-2 probesets. Each gene has a transcript_cluster_id, which
consists of one or more probeset_ids. (You can see the mapping between
ids using function export.scheme(..,treetype="anp",..)
Since the smallest unit is the probeset, function dabg.call() will
only
work at the probeset (and transcript) level. If you set all probes of
a
probeset to zero you may loose an entire exon.
>>
>> How did you create "data_hapMap"? Maybe you could send me your
complete code
>> otherwise I am only able to guess.
>
> "data_hapMap" was a mistake, sorry I copied the wrong object name it
> should have been "data_qn" which is quantiles normalized probe level
> data that I created like this:
>
> data_hapMap<- root.data(scheme.huex10stv2,
> "/data2/paul/normalization_project/root_data/hapMap_root_data_cel.ro
ot")
> #read raw data
> data_qn<- normalize.quantiles(data_hapMap, "exon_quantiles",
> filedir=rootdata_Dir, exonlevel="all") #quantiles normalize at
*probe*
> level
>
> So "data_qn" is the object I want to work with, based on your advice
> I've figured out that to access the expression levels in "data_qn" I
> need to use "attachInten()", then I can use "intensities()" to
access
> the expression levels:
>
> treenames<- unlist(treeNames(data_qn))
> data_qn<- attachInten(data_qn, treenames=treenames[1:2]) #attach the
> first two samples
>
>> head(intensity(data_qn))
> X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
> 1 0 0 0.0000 0.000
> 2 1 0 0.0000 0.000
> 3 2 0 0.0000 0.000
> 4 3 0 0.0000 0.000
> 5 4 0 0.0000 0.000
> 6 5 0 85.3523 121.733
>
> Does the X column in the output above represent the probe ID? If so
> and I have a mapping of probeset IDs to corresponding probe IDs, it
> should be fairly straightforward to set probes that are not detected
> above background to zero? Perhaps there is a more straightforward
way
> of doing this though?
>
No, (X,Y) are the coordinates of the single probes on the exon array.
You can use function export.scheme(..,treetype="scm",..) to get the
mapping between (X,Y) and an internal PROBESET_ID, e.g.:
UNIT_ID X Y ProbeLength Mask EXON_ID PROBESET_ID
31 986 1674 25 512 31 31
31 1092 677 25 512 31 31
31 796 1862 25 512 31 31
31 917 193 25 512 31 31
31 341 1677 25 512 32 32
31 144 2250 25 512 32 32
31 689 262 25 512 32 32
31 579 1670 25 512 32 32
Then you can use export.scheme(..,treetype="pbs",..) to map
PROBESET_ID
(=UNIT_ID) to the ProbesetID, e.g.:
UNIT_ID ProbesetID NumCells NumAtoms NumSubunits
UnitType
31 2315101 4 4 1 512
32 2315102 4 4 1 512
As you see PROBESET_IDs 31 and 32 have each 4 probes and belong to the
Affymetrix ProbesetIDs 2315101 and 2315102, respectively.
You could also use functions indexUnits(), and unitID2probesetID() or
unitID2transcriptID(), respectively.
Best regards
Christian
>
>>
>> I guess that "data_hapMap" contains the raw data. For these the
slot "data"
>> is empty to save memory. So you need to use either attachData() or
>> attachInten(). However since you are using exon arrays you may not
have
>> enough RAM, so it would be better to use function export() or
export.data(),
>> or attach only a subset, see help ?attachData. See also vignette
xps.pdf
>> (chapter 2.3).
>
> I think RAM shouldn't be an issue if I attach the samples one at a
> time? (I actually have access to a machine with 32gigs RAM but
ideally
> would like to get what I'm doing to run on a standard desktop, which
> is actually why I'm using XPS!).
>
>>
>> When you talk about "expression matrix", how did you create it?
Maybe you
>> could use function validExpr(), but w/o seeing your code it is hard
to tell.
>> For DABG there are functions pvalData() and presCall(), see the
examples in
>> help ?dabg.call.
>
> Yes I've managed to use dabg.call() at probeset level and access the
> p-values using pvalData() alright.
>
> Thanks again for all of your help and patience!
>
> Kind Regards,
>
> Paul.
>
>
>>
>> Best regards
>> Christian
>>
>>
>>
>> On 2/6/12 4:33 PM, Paul Geeleher wrote:
>>>
>>> Hi Christian,
>>>
>>> Thanks for your quick and informative reply.
>>>
>>> I have re-run the analysis and saved the R objects as you suggest.
The
>>> next thing I'm trying to do is to obtain the expression levels of
the
>>> probes, but this doesn't seem to be working for me:
>>>
>>>> a<- validData(data_hapMap)
>>>
>>> Error in .local(object, ...) : slot "data" has no data
>>>
>>> Based on the documentation I think validData() is the correct
function.
>>>
>>> I've also performed probeset level DABG and I'm trying to set
>>> individual probes which belong to probesets with DABG< .05 to 0
in
>>> the expression matrix.
>>>
>>> But it seems I can't see the expression matrix using validData().
>>> Perhaps there is another function. Any ideas?
>>>
>>> Thank you again for your help with this, I'm very grateful!
>>>
>>> Paul.
>>>
>>> On 2/2/12, cstrato<cstrato at="" aon.at=""> wrote:
>>>>
>>>> Dear Paul,
>>>>
>>>> The functions root.data(), root.call() and root.expr() were
created to
>>>> allow you access to the corresponding root files just in case
that you
>>>> did not save your R session.
>>>>
>>>> In the cases where you compute expression levels stepwise, or
only part
>>>> of them such as normalize.quantiles(), as seems to be the matter
in your
>>>> case, there is no corresponding root.xxx() function to access the
root
>>>> file directly. In these cases you need to save your R session to
have
>>>> continued access to the resulting root file.
>>>>
>>>> Please note that saving the R session is the usual case to have
access
>>>> to the root files.
>>>>
>>>> Best regards
>>>> Christian
>>>>
>>>>
>>>> On 2/2/12 1:12 PM, Paul Geeleher wrote:
>>>>>
>>>>> Hi Christian,
>>>>>
>>>>> Thanks for your quick reply. I check what kind of trees I have
using
>>>>> "getTreeNames()" as you'd suggested, it seems they are of type
"cqu"
>>>>> rather than "int", this is presumably because my analysis
required no
>>>>> background correction step?
>>>>>
>>>>> So I then tried:
>>>>>>
>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"cqu")
>>>>>
>>>>>
>>>>> but that gives me a huge number of errors that look like this:
>>>>>
>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>> Error: Could not get directory<preprocesset>.
>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>> Error: Could not get directory<preprocesset>.
>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>> Error: Could not get directory<preprocesset>.
>>>>> Error: Could not get tree<exportset>.
>>>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", :
>>>>> error in function ?ExportData?
>>>>>
>>>>>
>>>>> This file "exon_quantiles.root" definitely exists in the current
>>>>> working directory though... Thanks again for your help!
>>>>>
>>>>> Paul.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at="">
wrote:
>>>>>>
>>>>>> Dear Paul,
>>>>>>
>>>>>> Please have a look at the help ?root.expr.
>>>>>>
>>>>>> If I understand you correctly, you did only do quantile
normalization?
>>>>>>
>>>>>> To see the tree names in your file you should do:
>>>>>>>
>>>>>>> getTreeNames("exon_quantiles.root")
>>>>>>
>>>>>>
>>>>>> You will probably see trees with extension "int", see help
>>>>>> ?validTreetype.
>>>>>>
>>>>>> To load these trees you need to do:
>>>>>>>
>>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"int")
>>>>>>
>>>>>>
>>>>>> Please let me know if this did solve your problem.
>>>>>>
>>>>>> Best regards
>>>>>> Christian
>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
>>>>>> V.i.e.n.n.a A.u.s.t.r.i.a
>>>>>> e.m.a.i.l: cstrato at aon.at
>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've used xps to quantiles normalize (at probe level) some
Affy Exon
>>>>>>> Array data. I now have a "root" file called
"exon_quantiles.root", but
>>>>>>> if I try to load it the same was I'd load my raw data (using
the
>>>>>>> scheme file I created for Affy exon arrays) I get the error
below? I
>>>>>>> can load my raw data just fine though. Any ideas? Do I perhaps
need a
>>>>>>> different "root scheme" file for this normalized data?
Unfortunately,
>>>>>>> I haven't been able to find an answer.
>>>>>>>
>>>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root")
>>>>>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root")
>>>>>>>
>>>>>>>
>>>>>>> Error in if (chipname != treetitle) { : argument is of length
zero
>>>>>>>
>>>>>>> Hope someone can help,
>>>>>>>
>>>>>>> Paul.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> sessionInfo()
>>>>>>>
>>>>>>>
>>>>>>> R version 2.11.0 (2010-04-22)
>>>>>>> x86_64-redhat-linux-gnu
>>>>>>>
>>>>>>> locale:
>>>>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>>
>>>>>>> attached base packages:
>>>>>>> [1] stats graphics grDevices utils datasets methods
base
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>
>
Hi Christian,
Thanks for your replies.
As per your suggestions I've been been able to map between xy
co-ordinates, probe IDs and PROBESET_IDS.
I've been able to extract the expression level for a given gene (for a
subset of samples) using this:
id <- symbol2unitID(scheme.huex10stv2 , symbol="CDH11",
unittype="transcript", as.list =TRUE)
treenames <- unlist(treeNames(data_qn))
data_qn <- attachInten(data_qn, treenames=treenames[1:3])
xyIntens <- validData(data_qn, unitID=unlist(id))
With "validData()" is that it seems I can only get the expression
levels for samples which are "attached". This means I'd probably have
to attach each sample separately (which will take a long time) as I
don't have enough memory to attach all samples.
I'm wondering if there is a way to extract this data without having to
attach the data? If something similar is possible for the dabg data
(at probeset level) that would also be a big help in speeding up my
pipeline (which will be complete if I can implement these last two
steps)!
Thanks again for all your help its much appreciated,
Paul.
On Tue, Feb 7, 2012 at 8:48 PM, cstrato <cstrato at="" aon.at=""> wrote:
> Dear Paul,
>
> see replies below:
>
>
> On 2/7/12 1:52 PM, Paul Geeleher wrote:
>>
>> Hi Christian, apologies for the lack of clarity in my previous
email,
>> I'll try to clear this up, see replies below:
>>
>> On Mon, Feb 6, 2012 at 7:51 PM, cstrato<cstrato at="" aon.at=""> ?wrote:
>>>
>>> Dear Paul,
>>>
>>> I am afraid that it is not quite clear to me what you want to do.
>>>
>>> What do you mean with "obtain the expression levels of the
probes"? Do
>>> you
>>> mean "probes" (i.e. oligos) or do you mean "probesets" as in your
>>> question
>>> about DABG?
>>
>>
>> I've read in and quantiles normalized probe level data for 176
arrays
>> and what I'm trying to do now is set every probe (individual probe,
>> not probeset) that is not detected above background to zero. But
from
>> what I've read dabg.call() can only create p-values for probeset
>> level? So as compromise I'm now trying to set the expression of all
>> *individual probes* which belong to a *probeset* not detected above
>> background to zero. I hope that makes sense.
>>
>
>
> Did you read the exon array whitepapers which you can download from
> Affymetrix?
>
> Every probeset consists of 1-4 probes only, and every exon consists
usually
> of 1-2 probesets. Each gene has a transcript_cluster_id, which
consists of
> one or more probeset_ids. (You can see the mapping between ids using
> function export.scheme(..,treetype="anp",..)
>
> Since the smallest unit is the probeset, function dabg.call() will
only work
> at the probeset (and transcript) level. If you set all probes of a
probeset
> to zero you may loose an entire exon.
>
>
>
>>>
>>> How did you create "data_hapMap"? Maybe you could send me your
complete
>>> code
>>> otherwise I am only able to guess.
>>
>>
>> "data_hapMap" was a mistake, sorry I copied the wrong object name
it
>> should have been "data_qn" which is quantiles normalized probe
level
>> data that I created like this:
>>
>> data_hapMap<- root.data(scheme.huex10stv2,
>> "/data2/paul/normalization_project/root_data/hapMap_root_data_cel.r
oot")
>> #read raw data
>> data_qn<- normalize.quantiles(data_hapMap, "exon_quantiles",
>> filedir=rootdata_Dir, exonlevel="all") #quantiles normalize at
*probe*
>> level
>>
>> So "data_qn" is the object I want to work with, based on your
advice
>> I've figured out that to access the expression levels in "data_qn"
I
>> need to use "attachInten()", then I can use "intensities()" to
access
>> the expression levels:
>>
>> treenames<- unlist(treeNames(data_qn))
>> data_qn<- attachInten(data_qn, treenames=treenames[1:2]) #attach
the
>> first two samples
>>
>>> head(intensity(data_qn))
>>
>> ? X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
>> 1 0 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000
>> 2 1 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000
>> 3 2 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000
>> 4 3 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000
>> 5 4 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000
>> 6 5 0 ? ? ? ? ? ?85.3523 ? ? ? ? ? ?121.733
>>
>> Does the X column in the output above represent the probe ID? If so
>> and I have a mapping of probeset IDs to corresponding probe IDs, it
>> should be fairly straightforward to set probes that are not
detected
>> above background to zero? Perhaps there is a more straightforward
way
>> of doing this though?
>>
>
> No, (X,Y) are the coordinates of the single probes on the exon
array. You
> can use function export.scheme(..,treetype="scm",..) to get the
mapping
> between (X,Y) and an internal PROBESET_ID, e.g.:
>
> UNIT_ID X ? ? ? Y ? ? ? ProbeLength ? ? Mask ? ?EXON_ID PROBESET_ID
> 31 ? ? ?986 ? ? 1674 ? ?25 ? ? ?512 ? ? 31 ? ? ?31
> 31 ? ? ?1092 ? ?677 ? ? 25 ? ? ?512 ? ? 31 ? ? ?31
> 31 ? ? ?796 ? ? 1862 ? ?25 ? ? ?512 ? ? 31 ? ? ?31
> 31 ? ? ?917 ? ? 193 ? ? 25 ? ? ?512 ? ? 31 ? ? ?31
> 31 ? ? ?341 ? ? 1677 ? ?25 ? ? ?512 ? ? 32 ? ? ?32
> 31 ? ? ?144 ? ? 2250 ? ?25 ? ? ?512 ? ? 32 ? ? ?32
> 31 ? ? ?689 ? ? 262 ? ? 25 ? ? ?512 ? ? 32 ? ? ?32
> 31 ? ? ?579 ? ? 1670 ? ?25 ? ? ?512 ? ? 32 ? ? ?32
>
> Then you can use export.scheme(..,treetype="pbs",..) to map
PROBESET_ID
> (=UNIT_ID) to the ProbesetID, e.g.:
>
> UNIT_ID ProbesetID ? ? ?NumCells ? ? ? ?NumAtoms ? ? ? ?NumSubunits
> UnitType
> 31 ? ? ?2315101 4 ? ? ? 4 ? ? ? 1 ? ? ? 512
> 32 ? ? ?2315102 4 ? ? ? 4 ? ? ? 1 ? ? ? 512
>
> As you see PROBESET_IDs 31 and 32 have each 4 probes and belong to
the
> Affymetrix ProbesetIDs 2315101 and 2315102, respectively.
>
> You could also use functions indexUnits(), and unitID2probesetID()
or
> unitID2transcriptID(), respectively.
>
> Best regards
> Christian
>
>
>
>>
>>>
>>> I guess that "data_hapMap" contains the raw data. For these the
slot
>>> "data"
>>> is empty to save memory. So you need to use either attachData() or
>>> attachInten(). However since you are using exon arrays you may not
have
>>> enough RAM, so it would be better to use function export() or
>>> export.data(),
>>> or attach only a subset, see help ?attachData. See also vignette
xps.pdf
>>> (chapter 2.3).
>>
>>
>> I think RAM shouldn't be an issue if I attach the samples one at a
>> time? (I actually have access to a machine with 32gigs RAM but
ideally
>> would like to get what I'm doing to run on a standard desktop,
which
>> is actually why I'm using XPS!).
>>
>>>
>>> When you talk about "expression matrix", how did you create it?
Maybe you
>>> could use function validExpr(), but w/o seeing your code it is
hard to
>>> tell.
>>> For DABG there are functions pvalData() and presCall(), see the
examples
>>> in
>>> help ?dabg.call.
>>
>>
>> Yes I've managed to use dabg.call() at probeset level and access
the
>> p-values using pvalData() alright.
>>
>> Thanks again for all of your help and patience!
>>
>> Kind Regards,
>>
>> Paul.
>>
>>
>>>
>>> Best regards
>>> Christian
>>>
>>>
>>>
>>> On 2/6/12 4:33 PM, Paul Geeleher wrote:
>>>>
>>>>
>>>> Hi Christian,
>>>>
>>>> Thanks for your quick and informative reply.
>>>>
>>>> I have re-run the analysis and saved the R objects as you
suggest. The
>>>> next thing I'm trying to do is to obtain the expression levels of
the
>>>> probes, but this doesn't seem to be working for me:
>>>>
>>>>> a<- validData(data_hapMap)
>>>>
>>>>
>>>> Error in .local(object, ...) : slot "data" has no data
>>>>
>>>> Based on the documentation I think validData() is the correct
function.
>>>>
>>>> I've also performed probeset level DABG and I'm trying to set
>>>> individual probes which belong to probesets with DABG< ? ?.05 to
0 in
>>>> the expression matrix.
>>>>
>>>> But it seems I can't see the expression matrix using validData().
>>>> Perhaps there is another function. Any ideas?
>>>>
>>>> Thank you again for your help with this, I'm very grateful!
>>>>
>>>> Paul.
>>>>
>>>> On 2/2/12, cstrato<cstrato at="" aon.at=""> ? ?wrote:
>>>>>
>>>>>
>>>>> Dear Paul,
>>>>>
>>>>> The functions root.data(), root.call() and root.expr() were
created to
>>>>> allow you access to the corresponding root files just in case
that you
>>>>> did not save your R session.
>>>>>
>>>>> In the cases where you compute expression levels stepwise, or
only part
>>>>> of them such as normalize.quantiles(), as seems to be the matter
in
>>>>> your
>>>>> case, there is no corresponding root.xxx() function to access
the root
>>>>> file directly. In these cases you need to save your R session to
have
>>>>> continued access to the resulting root file.
>>>>>
>>>>> Please note that saving the R session is the usual case to have
access
>>>>> to the root files.
>>>>>
>>>>> Best regards
>>>>> Christian
>>>>>
>>>>>
>>>>> On 2/2/12 1:12 PM, Paul Geeleher wrote:
>>>>>>
>>>>>>
>>>>>> Hi Christian,
>>>>>>
>>>>>> Thanks for your quick reply. I check what kind of trees I have
using
>>>>>> "getTreeNames()" as you'd suggested, it seems they are of type
"cqu"
>>>>>> rather than "int", this is presumably because my analysis
required no
>>>>>> background correction step?
>>>>>>
>>>>>> So I then tried:
>>>>>>>
>>>>>>>
>>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"cqu")
>>>>>>
>>>>>>
>>>>>>
>>>>>> but that gives me a huge number of errors that look like this:
>>>>>>
>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>>> Error: Could not get directory<preprocesset>.
>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>>> Error: Could not get directory<preprocesset>.
>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>>> Error: Could not get directory<preprocesset>.
>>>>>> Error: Could not get tree<exportset>.
>>>>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", ?:
>>>>>> ? ?error in function ?ExportData?
>>>>>>
>>>>>>
>>>>>> This file "exon_quantiles.root" definitely exists in the
current
>>>>>> working directory though... Thanks again for your help!
>>>>>>
>>>>>> Paul.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at=""> ? ?
wrote:
>>>>>>>
>>>>>>>
>>>>>>> Dear Paul,
>>>>>>>
>>>>>>> Please have a look at the help ?root.expr.
>>>>>>>
>>>>>>> If I understand you correctly, you did only do quantile
>>>>>>> normalization?
>>>>>>>
>>>>>>> To see the tree names in your file you should do:
>>>>>>>>
>>>>>>>>
>>>>>>>> getTreeNames("exon_quantiles.root")
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> You will probably see trees with extension "int", see help
>>>>>>> ?validTreetype.
>>>>>>>
>>>>>>> To load these trees you need to do:
>>>>>>>>
>>>>>>>>
>>>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"int")
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Please let me know if this did solve your problem.
>>>>>>>
>>>>>>> Best regards
>>>>>>> Christian
>>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>>> C.h.r.i.s.t.i.a.n ? S.t.r.a.t.o.w.a
>>>>>>> V.i.e.n.n.a ? ? ? ? ? A.u.s.t.r.i.a
>>>>>>> e.m.a.i.l: ? ? ? ?cstrato at aon.at
>>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I've used xps to quantiles normalize (at probe level) some
Affy Exon
>>>>>>>> Array data. I now have a "root" file called
"exon_quantiles.root",
>>>>>>>> but
>>>>>>>> if I try to load it the same was I'd load my raw data (using
the
>>>>>>>> scheme file I created for Affy exon arrays) I get the error
below? I
>>>>>>>> can load my raw data just fine though. Any ideas? Do I
perhaps need
>>>>>>>> a
>>>>>>>> different "root scheme" file for this normalized data?
>>>>>>>> Unfortunately,
>>>>>>>> I haven't been able to find an answer.
>>>>>>>>
>>>>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root")
>>>>>>>>> data_qn<- root.data(scheme.huex10stv2,
"exon_quantiles.root")
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Error in if (chipname != treetitle) { : argument is of length
zero
>>>>>>>>
>>>>>>>> Hope someone can help,
>>>>>>>>
>>>>>>>> Paul.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> sessionInfo()
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> R version 2.11.0 (2010-04-22)
>>>>>>>> x86_64-redhat-linux-gnu
>>>>>>>>
>>>>>>>> locale:
>>>>>>>> ? [1] LC_CTYPE=en_US.UTF-8 ? ? ? LC_NUMERIC=C
>>>>>>>> ? [3] LC_TIME=en_US.UTF-8 ? ? ? ?LC_COLLATE=en_US.UTF-8
>>>>>>>> ? [5] LC_MONETARY=C ? ? ? ? ? ? ?LC_MESSAGES=en_US.UTF-8
>>>>>>>> ? [7] LC_PAPER=en_US.UTF-8 ? ? ? LC_NAME=C
>>>>>>>> ? [9] LC_ADDRESS=C ? ? ? ? ? ? ? LC_TELEPHONE=C
>>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>>>
>>>>>>>> attached base packages:
>>>>>>>> [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods
? base
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
--
Paul Geeleher (PhD Student)
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
--
www.bioinformaticstutorials.com
Dear Paul,
You could export the whole data as table using:
export(data_qn, treename="*", treetype="cqu", varlist="fInten",
outfile="data_cqu.txt",as.dataframe=FALSE)
Then you have a huge text-file, which you can then use for your
purposes. However, since you do not have enough memory to work with R
you would need to either import only parts of the table into R, or use
other languages such as perl for further analysis.
For the DABG data you could use function export.call().
Alternatively, you could also work completely from within ROOT, using
C++ macros. Examples how to use xps from within ROOT are shown in
directory xps/examples/macro4XPS.C
Best regards
Christian
On 2/14/12 1:54 PM, Paul Geeleher wrote:
> Hi Christian,
>
> Thanks for your replies.
>
> As per your suggestions I've been been able to map between xy
> co-ordinates, probe IDs and PROBESET_IDS.
>
> I've been able to extract the expression level for a given gene (for
a
> subset of samples) using this:
>
> id<- symbol2unitID(scheme.huex10stv2 , symbol="CDH11",
> unittype="transcript", as.list =TRUE)
> treenames<- unlist(treeNames(data_qn))
> data_qn<- attachInten(data_qn, treenames=treenames[1:3])
> xyIntens<- validData(data_qn, unitID=unlist(id))
>
> With "validData()" is that it seems I can only get the expression
> levels for samples which are "attached". This means I'd probably
have
> to attach each sample separately (which will take a long time) as I
> don't have enough memory to attach all samples.
>
> I'm wondering if there is a way to extract this data without having
to
> attach the data? If something similar is possible for the dabg data
> (at probeset level) that would also be a big help in speeding up my
> pipeline (which will be complete if I can implement these last two
> steps)!
>
> Thanks again for all your help its much appreciated,
>
> Paul.
>
>
> On Tue, Feb 7, 2012 at 8:48 PM, cstrato<cstrato at="" aon.at=""> wrote:
>> Dear Paul,
>>
>> see replies below:
>>
>>
>> On 2/7/12 1:52 PM, Paul Geeleher wrote:
>>>
>>> Hi Christian, apologies for the lack of clarity in my previous
email,
>>> I'll try to clear this up, see replies below:
>>>
>>> On Mon, Feb 6, 2012 at 7:51 PM, cstrato<cstrato at="" aon.at="">
wrote:
>>>>
>>>> Dear Paul,
>>>>
>>>> I am afraid that it is not quite clear to me what you want to do.
>>>>
>>>> What do you mean with "obtain the expression levels of the
probes"? Do
>>>> you
>>>> mean "probes" (i.e. oligos) or do you mean "probesets" as in your
>>>> question
>>>> about DABG?
>>>
>>>
>>> I've read in and quantiles normalized probe level data for 176
arrays
>>> and what I'm trying to do now is set every probe (individual
probe,
>>> not probeset) that is not detected above background to zero. But
from
>>> what I've read dabg.call() can only create p-values for probeset
>>> level? So as compromise I'm now trying to set the expression of
all
>>> *individual probes* which belong to a *probeset* not detected
above
>>> background to zero. I hope that makes sense.
>>>
>>
>>
>> Did you read the exon array whitepapers which you can download from
>> Affymetrix?
>>
>> Every probeset consists of 1-4 probes only, and every exon consists
usually
>> of 1-2 probesets. Each gene has a transcript_cluster_id, which
consists of
>> one or more probeset_ids. (You can see the mapping between ids
using
>> function export.scheme(..,treetype="anp",..)
>>
>> Since the smallest unit is the probeset, function dabg.call() will
only work
>> at the probeset (and transcript) level. If you set all probes of a
probeset
>> to zero you may loose an entire exon.
>>
>>
>>
>>>>
>>>> How did you create "data_hapMap"? Maybe you could send me your
complete
>>>> code
>>>> otherwise I am only able to guess.
>>>
>>>
>>> "data_hapMap" was a mistake, sorry I copied the wrong object name
it
>>> should have been "data_qn" which is quantiles normalized probe
level
>>> data that I created like this:
>>>
>>> data_hapMap<- root.data(scheme.huex10stv2,
>>> "/data2/paul/normalization_project/root_data/hapMap_root_data_cel.
root")
>>> #read raw data
>>> data_qn<- normalize.quantiles(data_hapMap, "exon_quantiles",
>>> filedir=rootdata_Dir, exonlevel="all") #quantiles normalize at
*probe*
>>> level
>>>
>>> So "data_qn" is the object I want to work with, based on your
advice
>>> I've figured out that to access the expression levels in "data_qn"
I
>>> need to use "attachInten()", then I can use "intensities()" to
access
>>> the expression levels:
>>>
>>> treenames<- unlist(treeNames(data_qn))
>>> data_qn<- attachInten(data_qn, treenames=treenames[1:2]) #attach
the
>>> first two samples
>>>
>>>> head(intensity(data_qn))
>>>
>>> X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
>>> 1 0 0 0.0000 0.000
>>> 2 1 0 0.0000 0.000
>>> 3 2 0 0.0000 0.000
>>> 4 3 0 0.0000 0.000
>>> 5 4 0 0.0000 0.000
>>> 6 5 0 85.3523 121.733
>>>
>>> Does the X column in the output above represent the probe ID? If
so
>>> and I have a mapping of probeset IDs to corresponding probe IDs,
it
>>> should be fairly straightforward to set probes that are not
detected
>>> above background to zero? Perhaps there is a more straightforward
way
>>> of doing this though?
>>>
>>
>> No, (X,Y) are the coordinates of the single probes on the exon
array. You
>> can use function export.scheme(..,treetype="scm",..) to get the
mapping
>> between (X,Y) and an internal PROBESET_ID, e.g.:
>>
>> UNIT_ID X Y ProbeLength Mask EXON_ID PROBESET_ID
>> 31 986 1674 25 512 31 31
>> 31 1092 677 25 512 31 31
>> 31 796 1862 25 512 31 31
>> 31 917 193 25 512 31 31
>> 31 341 1677 25 512 32 32
>> 31 144 2250 25 512 32 32
>> 31 689 262 25 512 32 32
>> 31 579 1670 25 512 32 32
>>
>> Then you can use export.scheme(..,treetype="pbs",..) to map
PROBESET_ID
>> (=UNIT_ID) to the ProbesetID, e.g.:
>>
>> UNIT_ID ProbesetID NumCells NumAtoms NumSubunits
>> UnitType
>> 31 2315101 4 4 1 512
>> 32 2315102 4 4 1 512
>>
>> As you see PROBESET_IDs 31 and 32 have each 4 probes and belong to
the
>> Affymetrix ProbesetIDs 2315101 and 2315102, respectively.
>>
>> You could also use functions indexUnits(), and unitID2probesetID()
or
>> unitID2transcriptID(), respectively.
>>
>> Best regards
>> Christian
>>
>>
>>
>>>
>>>>
>>>> I guess that "data_hapMap" contains the raw data. For these the
slot
>>>> "data"
>>>> is empty to save memory. So you need to use either attachData()
or
>>>> attachInten(). However since you are using exon arrays you may
not have
>>>> enough RAM, so it would be better to use function export() or
>>>> export.data(),
>>>> or attach only a subset, see help ?attachData. See also vignette
xps.pdf
>>>> (chapter 2.3).
>>>
>>>
>>> I think RAM shouldn't be an issue if I attach the samples one at a
>>> time? (I actually have access to a machine with 32gigs RAM but
ideally
>>> would like to get what I'm doing to run on a standard desktop,
which
>>> is actually why I'm using XPS!).
>>>
>>>>
>>>> When you talk about "expression matrix", how did you create it?
Maybe you
>>>> could use function validExpr(), but w/o seeing your code it is
hard to
>>>> tell.
>>>> For DABG there are functions pvalData() and presCall(), see the
examples
>>>> in
>>>> help ?dabg.call.
>>>
>>>
>>> Yes I've managed to use dabg.call() at probeset level and access
the
>>> p-values using pvalData() alright.
>>>
>>> Thanks again for all of your help and patience!
>>>
>>> Kind Regards,
>>>
>>> Paul.
>>>
>>>
>>>>
>>>> Best regards
>>>> Christian
>>>>
>>>>
>>>>
>>>> On 2/6/12 4:33 PM, Paul Geeleher wrote:
>>>>>
>>>>>
>>>>> Hi Christian,
>>>>>
>>>>> Thanks for your quick and informative reply.
>>>>>
>>>>> I have re-run the analysis and saved the R objects as you
suggest. The
>>>>> next thing I'm trying to do is to obtain the expression levels
of the
>>>>> probes, but this doesn't seem to be working for me:
>>>>>
>>>>>> a<- validData(data_hapMap)
>>>>>
>>>>>
>>>>> Error in .local(object, ...) : slot "data" has no data
>>>>>
>>>>> Based on the documentation I think validData() is the correct
function.
>>>>>
>>>>> I've also performed probeset level DABG and I'm trying to set
>>>>> individual probes which belong to probesets with DABG< .05
to 0 in
>>>>> the expression matrix.
>>>>>
>>>>> But it seems I can't see the expression matrix using
validData().
>>>>> Perhaps there is another function. Any ideas?
>>>>>
>>>>> Thank you again for your help with this, I'm very grateful!
>>>>>
>>>>> Paul.
>>>>>
>>>>> On 2/2/12, cstrato<cstrato at="" aon.at=""> wrote:
>>>>>>
>>>>>>
>>>>>> Dear Paul,
>>>>>>
>>>>>> The functions root.data(), root.call() and root.expr() were
created to
>>>>>> allow you access to the corresponding root files just in case
that you
>>>>>> did not save your R session.
>>>>>>
>>>>>> In the cases where you compute expression levels stepwise, or
only part
>>>>>> of them such as normalize.quantiles(), as seems to be the
matter in
>>>>>> your
>>>>>> case, there is no corresponding root.xxx() function to access
the root
>>>>>> file directly. In these cases you need to save your R session
to have
>>>>>> continued access to the resulting root file.
>>>>>>
>>>>>> Please note that saving the R session is the usual case to have
access
>>>>>> to the root files.
>>>>>>
>>>>>> Best regards
>>>>>> Christian
>>>>>>
>>>>>>
>>>>>> On 2/2/12 1:12 PM, Paul Geeleher wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Christian,
>>>>>>>
>>>>>>> Thanks for your quick reply. I check what kind of trees I have
using
>>>>>>> "getTreeNames()" as you'd suggested, it seems they are of type
"cqu"
>>>>>>> rather than "int", this is presumably because my analysis
required no
>>>>>>> background correction step?
>>>>>>>
>>>>>>> So I then tried:
>>>>>>>>
>>>>>>>>
>>>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root",
"cqu")
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> but that gives me a huge number of errors that look like this:
>>>>>>>
>>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>>>> Error: Could not get directory<preprocesset>.
>>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>>>> Error: Could not get directory<preprocesset>.
>>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>>>> Error: Could not get directory<preprocesset>.
>>>>>>> Error: Could not get tree<exportset>.
>>>>>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root",
:
>>>>>>> error in function ?ExportData?
>>>>>>>
>>>>>>>
>>>>>>> This file "exon_quantiles.root" definitely exists in the
current
>>>>>>> working directory though... Thanks again for your help!
>>>>>>>
>>>>>>> Paul.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at="">
wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Dear Paul,
>>>>>>>>
>>>>>>>> Please have a look at the help ?root.expr.
>>>>>>>>
>>>>>>>> If I understand you correctly, you did only do quantile
>>>>>>>> normalization?
>>>>>>>>
>>>>>>>> To see the tree names in your file you should do:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> getTreeNames("exon_quantiles.root")
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> You will probably see trees with extension "int", see help
>>>>>>>> ?validTreetype.
>>>>>>>>
>>>>>>>> To load these trees you need to do:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> data_qn<- root.expr(scheme.huex10stv2,
"exon_quantiles.root", "int")
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Please let me know if this did solve your problem.
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>> Christian
>>>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
>>>>>>>> V.i.e.n.n.a A.u.s.t.r.i.a
>>>>>>>> e.m.a.i.l: cstrato at aon.at
>>>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I've used xps to quantiles normalize (at probe level) some
Affy Exon
>>>>>>>>> Array data. I now have a "root" file called
"exon_quantiles.root",
>>>>>>>>> but
>>>>>>>>> if I try to load it the same was I'd load my raw data (using
the
>>>>>>>>> scheme file I created for Affy exon arrays) I get the error
below? I
>>>>>>>>> can load my raw data just fine though. Any ideas? Do I
perhaps need
>>>>>>>>> a
>>>>>>>>> different "root scheme" file for this normalized data?
>>>>>>>>> Unfortunately,
>>>>>>>>> I haven't been able to find an answer.
>>>>>>>>>
>>>>>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root")
>>>>>>>>>> data_qn<- root.data(scheme.huex10stv2,
"exon_quantiles.root")
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Error in if (chipname != treetitle) { : argument is of
length zero
>>>>>>>>>
>>>>>>>>> Hope someone can help,
>>>>>>>>>
>>>>>>>>> Paul.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> sessionInfo()
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> R version 2.11.0 (2010-04-22)
>>>>>>>>> x86_64-redhat-linux-gnu
>>>>>>>>>
>>>>>>>>> locale:
>>>>>>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>>>>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>>>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>>>>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>>>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>>>>
>>>>>>>>> attached base packages:
>>>>>>>>> [1] stats graphics grDevices utils datasets
methods base
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>
Hi Christian,
Thanks again for your very helpful replies. I think using the C++
macros might be the way I go once I get everything ironed out.
I have what I think is one last question. When try to convert gene
symbol to internal ID:
allInternalGeneIds[[i]] <- symbol2unitID(scheme.huex10stv2 ,
symbol=geneSym, unittype="transcript", as.list =TRUE)
I get the following message:
slot ?unitname? is empty, importing data from ?HuEx-1_0-st-v2.idx? ...
Which leads me to believe there is a way to attach "unitname"? I've
been unable to find how to to this though? I think if you know if this
is possible it could speed up my pipeline quite a bit, as I have to
repeat this step for every gene!
Thanks again,
Paul.
On 2/14/12, cstrato <cstrato at="" aon.at=""> wrote:
> Dear Paul,
>
> You could export the whole data as table using:
>
> export(data_qn, treename="*", treetype="cqu", varlist="fInten",
> outfile="data_cqu.txt",as.dataframe=FALSE)
>
> Then you have a huge text-file, which you can then use for your
> purposes. However, since you do not have enough memory to work with
R
> you would need to either import only parts of the table into R, or
use
> other languages such as perl for further analysis.
>
> For the DABG data you could use function export.call().
>
> Alternatively, you could also work completely from within ROOT,
using
> C++ macros. Examples how to use xps from within ROOT are shown in
> directory xps/examples/macro4XPS.C
>
> Best regards
> Christian
>
>
> On 2/14/12 1:54 PM, Paul Geeleher wrote:
>> Hi Christian,
>>
>> Thanks for your replies.
>>
>> As per your suggestions I've been been able to map between xy
>> co-ordinates, probe IDs and PROBESET_IDS.
>>
>> I've been able to extract the expression level for a given gene
(for a
>> subset of samples) using this:
>>
>> id<- symbol2unitID(scheme.huex10stv2 , symbol="CDH11",
>> unittype="transcript", as.list =TRUE)
>> treenames<- unlist(treeNames(data_qn))
>> data_qn<- attachInten(data_qn, treenames=treenames[1:3])
>> xyIntens<- validData(data_qn, unitID=unlist(id))
>>
>> With "validData()" is that it seems I can only get the expression
>> levels for samples which are "attached". This means I'd probably
have
>> to attach each sample separately (which will take a long time) as I
>> don't have enough memory to attach all samples.
>>
>> I'm wondering if there is a way to extract this data without having
to
>> attach the data? If something similar is possible for the dabg data
>> (at probeset level) that would also be a big help in speeding up my
>> pipeline (which will be complete if I can implement these last two
>> steps)!
>>
>> Thanks again for all your help its much appreciated,
>>
>> Paul.
>>
>>
>> On Tue, Feb 7, 2012 at 8:48 PM, cstrato<cstrato at="" aon.at=""> wrote:
>>> Dear Paul,
>>>
>>> see replies below:
>>>
>>>
>>> On 2/7/12 1:52 PM, Paul Geeleher wrote:
>>>>
>>>> Hi Christian, apologies for the lack of clarity in my previous
email,
>>>> I'll try to clear this up, see replies below:
>>>>
>>>> On Mon, Feb 6, 2012 at 7:51 PM, cstrato<cstrato at="" aon.at="">
wrote:
>>>>>
>>>>> Dear Paul,
>>>>>
>>>>> I am afraid that it is not quite clear to me what you want to
do.
>>>>>
>>>>> What do you mean with "obtain the expression levels of the
probes"? Do
>>>>> you
>>>>> mean "probes" (i.e. oligos) or do you mean "probesets" as in
your
>>>>> question
>>>>> about DABG?
>>>>
>>>>
>>>> I've read in and quantiles normalized probe level data for 176
arrays
>>>> and what I'm trying to do now is set every probe (individual
probe,
>>>> not probeset) that is not detected above background to zero. But
from
>>>> what I've read dabg.call() can only create p-values for probeset
>>>> level? So as compromise I'm now trying to set the expression of
all
>>>> *individual probes* which belong to a *probeset* not detected
above
>>>> background to zero. I hope that makes sense.
>>>>
>>>
>>>
>>> Did you read the exon array whitepapers which you can download
from
>>> Affymetrix?
>>>
>>> Every probeset consists of 1-4 probes only, and every exon
consists
>>> usually
>>> of 1-2 probesets. Each gene has a transcript_cluster_id, which
consists
>>> of
>>> one or more probeset_ids. (You can see the mapping between ids
using
>>> function export.scheme(..,treetype="anp",..)
>>>
>>> Since the smallest unit is the probeset, function dabg.call() will
only
>>> work
>>> at the probeset (and transcript) level. If you set all probes of a
>>> probeset
>>> to zero you may loose an entire exon.
>>>
>>>
>>>
>>>>>
>>>>> How did you create "data_hapMap"? Maybe you could send me your
complete
>>>>> code
>>>>> otherwise I am only able to guess.
>>>>
>>>>
>>>> "data_hapMap" was a mistake, sorry I copied the wrong object name
it
>>>> should have been "data_qn" which is quantiles normalized probe
level
>>>> data that I created like this:
>>>>
>>>> data_hapMap<- root.data(scheme.huex10stv2,
>>>> "/data2/paul/normalization_project/root_data/hapMap_root_data_cel
.root")
>>>> #read raw data
>>>> data_qn<- normalize.quantiles(data_hapMap, "exon_quantiles",
>>>> filedir=rootdata_Dir, exonlevel="all") #quantiles normalize at
*probe*
>>>> level
>>>>
>>>> So "data_qn" is the object I want to work with, based on your
advice
>>>> I've figured out that to access the expression levels in
"data_qn" I
>>>> need to use "attachInten()", then I can use "intensities()" to
access
>>>> the expression levels:
>>>>
>>>> treenames<- unlist(treeNames(data_qn))
>>>> data_qn<- attachInten(data_qn, treenames=treenames[1:2]) #attach
the
>>>> first two samples
>>>>
>>>>> head(intensity(data_qn))
>>>>
>>>> X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
>>>> 1 0 0 0.0000 0.000
>>>> 2 1 0 0.0000 0.000
>>>> 3 2 0 0.0000 0.000
>>>> 4 3 0 0.0000 0.000
>>>> 5 4 0 0.0000 0.000
>>>> 6 5 0 85.3523 121.733
>>>>
>>>> Does the X column in the output above represent the probe ID? If
so
>>>> and I have a mapping of probeset IDs to corresponding probe IDs,
it
>>>> should be fairly straightforward to set probes that are not
detected
>>>> above background to zero? Perhaps there is a more straightforward
way
>>>> of doing this though?
>>>>
>>>
>>> No, (X,Y) are the coordinates of the single probes on the exon
array. You
>>> can use function export.scheme(..,treetype="scm",..) to get the
mapping
>>> between (X,Y) and an internal PROBESET_ID, e.g.:
>>>
>>> UNIT_ID X Y ProbeLength Mask EXON_ID
PROBESET_ID
>>> 31 986 1674 25 512 31 31
>>> 31 1092 677 25 512 31 31
>>> 31 796 1862 25 512 31 31
>>> 31 917 193 25 512 31 31
>>> 31 341 1677 25 512 32 32
>>> 31 144 2250 25 512 32 32
>>> 31 689 262 25 512 32 32
>>> 31 579 1670 25 512 32 32
>>>
>>> Then you can use export.scheme(..,treetype="pbs",..) to map
PROBESET_ID
>>> (=UNIT_ID) to the ProbesetID, e.g.:
>>>
>>> UNIT_ID ProbesetID NumCells NumAtoms
NumSubunits
>>> UnitType
>>> 31 2315101 4 4 1 512
>>> 32 2315102 4 4 1 512
>>>
>>> As you see PROBESET_IDs 31 and 32 have each 4 probes and belong to
the
>>> Affymetrix ProbesetIDs 2315101 and 2315102, respectively.
>>>
>>> You could also use functions indexUnits(), and unitID2probesetID()
or
>>> unitID2transcriptID(), respectively.
>>>
>>> Best regards
>>> Christian
>>>
>>>
>>>
>>>>
>>>>>
>>>>> I guess that "data_hapMap" contains the raw data. For these the
slot
>>>>> "data"
>>>>> is empty to save memory. So you need to use either attachData()
or
>>>>> attachInten(). However since you are using exon arrays you may
not have
>>>>> enough RAM, so it would be better to use function export() or
>>>>> export.data(),
>>>>> or attach only a subset, see help ?attachData. See also vignette
>>>>> xps.pdf
>>>>> (chapter 2.3).
>>>>
>>>>
>>>> I think RAM shouldn't be an issue if I attach the samples one at
a
>>>> time? (I actually have access to a machine with 32gigs RAM but
ideally
>>>> would like to get what I'm doing to run on a standard desktop,
which
>>>> is actually why I'm using XPS!).
>>>>
>>>>>
>>>>> When you talk about "expression matrix", how did you create it?
Maybe
>>>>> you
>>>>> could use function validExpr(), but w/o seeing your code it is
hard to
>>>>> tell.
>>>>> For DABG there are functions pvalData() and presCall(), see the
>>>>> examples
>>>>> in
>>>>> help ?dabg.call.
>>>>
>>>>
>>>> Yes I've managed to use dabg.call() at probeset level and access
the
>>>> p-values using pvalData() alright.
>>>>
>>>> Thanks again for all of your help and patience!
>>>>
>>>> Kind Regards,
>>>>
>>>> Paul.
>>>>
>>>>
>>>>>
>>>>> Best regards
>>>>> Christian
>>>>>
>>>>>
>>>>>
>>>>> On 2/6/12 4:33 PM, Paul Geeleher wrote:
>>>>>>
>>>>>>
>>>>>> Hi Christian,
>>>>>>
>>>>>> Thanks for your quick and informative reply.
>>>>>>
>>>>>> I have re-run the analysis and saved the R objects as you
suggest. The
>>>>>> next thing I'm trying to do is to obtain the expression levels
of the
>>>>>> probes, but this doesn't seem to be working for me:
>>>>>>
>>>>>>> a<- validData(data_hapMap)
>>>>>>
>>>>>>
>>>>>> Error in .local(object, ...) : slot "data" has no data
>>>>>>
>>>>>> Based on the documentation I think validData() is the correct
>>>>>> function.
>>>>>>
>>>>>> I've also performed probeset level DABG and I'm trying to set
>>>>>> individual probes which belong to probesets with DABG< .05
to 0
>>>>>> in
>>>>>> the expression matrix.
>>>>>>
>>>>>> But it seems I can't see the expression matrix using
validData().
>>>>>> Perhaps there is another function. Any ideas?
>>>>>>
>>>>>> Thank you again for your help with this, I'm very grateful!
>>>>>>
>>>>>> Paul.
>>>>>>
>>>>>> On 2/2/12, cstrato<cstrato at="" aon.at=""> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Dear Paul,
>>>>>>>
>>>>>>> The functions root.data(), root.call() and root.expr() were
created
>>>>>>> to
>>>>>>> allow you access to the corresponding root files just in case
that
>>>>>>> you
>>>>>>> did not save your R session.
>>>>>>>
>>>>>>> In the cases where you compute expression levels stepwise, or
only
>>>>>>> part
>>>>>>> of them such as normalize.quantiles(), as seems to be the
matter in
>>>>>>> your
>>>>>>> case, there is no corresponding root.xxx() function to access
the
>>>>>>> root
>>>>>>> file directly. In these cases you need to save your R session
to have
>>>>>>> continued access to the resulting root file.
>>>>>>>
>>>>>>> Please note that saving the R session is the usual case to
have
>>>>>>> access
>>>>>>> to the root files.
>>>>>>>
>>>>>>> Best regards
>>>>>>> Christian
>>>>>>>
>>>>>>>
>>>>>>> On 2/2/12 1:12 PM, Paul Geeleher wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Christian,
>>>>>>>>
>>>>>>>> Thanks for your quick reply. I check what kind of trees I
have using
>>>>>>>> "getTreeNames()" as you'd suggested, it seems they are of
type "cqu"
>>>>>>>> rather than "int", this is presumably because my analysis
required
>>>>>>>> no
>>>>>>>> background correction step?
>>>>>>>>
>>>>>>>> So I then tried:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> data_qn<- root.expr(scheme.huex10stv2,
"exon_quantiles.root",
>>>>>>>>> "cqu")
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> but that gives me a huge number of errors that look like
this:
>>>>>>>>
>>>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>>>>> Error: Could not get directory<preprocesset>.
>>>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>>>>> Error: Could not get directory<preprocesset>.
>>>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>>>>> Error: Could not get directory<preprocesset>.
>>>>>>>> Error: Could not get tree<exportset>.
>>>>>>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root",
:
>>>>>>>> error in function ?ExportData?
>>>>>>>>
>>>>>>>>
>>>>>>>> This file "exon_quantiles.root" definitely exists in the
current
>>>>>>>> working directory though... Thanks again for your help!
>>>>>>>>
>>>>>>>> Paul.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at="">
wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Dear Paul,
>>>>>>>>>
>>>>>>>>> Please have a look at the help ?root.expr.
>>>>>>>>>
>>>>>>>>> If I understand you correctly, you did only do quantile
>>>>>>>>> normalization?
>>>>>>>>>
>>>>>>>>> To see the tree names in your file you should do:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> getTreeNames("exon_quantiles.root")
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> You will probably see trees with extension "int", see help
>>>>>>>>> ?validTreetype.
>>>>>>>>>
>>>>>>>>> To load these trees you need to do:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> data_qn<- root.expr(scheme.huex10stv2,
"exon_quantiles.root",
>>>>>>>>>> "int")
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Please let me know if this did solve your problem.
>>>>>>>>>
>>>>>>>>> Best regards
>>>>>>>>> Christian
>>>>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>>>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
>>>>>>>>> V.i.e.n.n.a A.u.s.t.r.i.a
>>>>>>>>> e.m.a.i.l: cstrato at aon.at
>>>>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I've used xps to quantiles normalize (at probe level) some
Affy
>>>>>>>>>> Exon
>>>>>>>>>> Array data. I now have a "root" file called
"exon_quantiles.root",
>>>>>>>>>> but
>>>>>>>>>> if I try to load it the same was I'd load my raw data
(using the
>>>>>>>>>> scheme file I created for Affy exon arrays) I get the error
below?
>>>>>>>>>> I
>>>>>>>>>> can load my raw data just fine though. Any ideas? Do I
perhaps
>>>>>>>>>> need
>>>>>>>>>> a
>>>>>>>>>> different "root scheme" file for this normalized data?
>>>>>>>>>> Unfortunately,
>>>>>>>>>> I haven't been able to find an answer.
>>>>>>>>>>
>>>>>>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root")
>>>>>>>>>>> data_qn<- root.data(scheme.huex10stv2,
"exon_quantiles.root")
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Error in if (chipname != treetitle) { : argument is of
length zero
>>>>>>>>>>
>>>>>>>>>> Hope someone can help,
>>>>>>>>>>
>>>>>>>>>> Paul.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> sessionInfo()
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> R version 2.11.0 (2010-04-22)
>>>>>>>>>> x86_64-redhat-linux-gnu
>>>>>>>>>>
>>>>>>>>>> locale:
>>>>>>>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>>>>>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>>>>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>>>>>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>>>>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>>>>>
>>>>>>>>>> attached base packages:
>>>>>>>>>> [1] stats graphics grDevices utils datasets
methods
>>>>>>>>>> base
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
--
Paul Geeleher (PhD Student)
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
--
www.bioinformaticstutorials.com
Dear Paul,
From your mail it is not quite clear to me what you mean: where do
you
want to attach which unitname?
Please try the following:
scheme.test3 <-
root.scheme(paste(.path.package("xps"),"schemes/SchemeTest3.root",sep=
"/"))
data.test3 <- root.data(scheme.test3,
paste(.path.package("xps"),"rootdata/DataTest3_cel.root",sep="/"))
id <- indexUnits(data.test3, which="", unitID="*")
id[id[,1]==34,]
Do you mean this?
Best regards
Christian
On 2/21/12 1:23 PM, Paul Geeleher wrote:
> Hi Christian,
>
> Thanks again for your very helpful replies. I think using the C++
> macros might be the way I go once I get everything ironed out.
>
> I have what I think is one last question. When try to convert gene
> symbol to internal ID:
>
> allInternalGeneIds[[i]]<- symbol2unitID(scheme.huex10stv2 ,
> symbol=geneSym, unittype="transcript", as.list =TRUE)
>
> I get the following message:
> slot ?unitname? is empty, importing data from ?HuEx-1_0-st-v2.idx?
...
>
> Which leads me to believe there is a way to attach "unitname"? I've
> been unable to find how to to this though? I think if you know if
this
> is possible it could speed up my pipeline quite a bit, as I have to
> repeat this step for every gene!
>
> Thanks again,
>
> Paul.
>
> On 2/14/12, cstrato<cstrato at="" aon.at=""> wrote:
>> Dear Paul,
>>
>> You could export the whole data as table using:
>>
>> export(data_qn, treename="*", treetype="cqu", varlist="fInten",
>> outfile="data_cqu.txt",as.dataframe=FALSE)
>>
>> Then you have a huge text-file, which you can then use for your
>> purposes. However, since you do not have enough memory to work with
R
>> you would need to either import only parts of the table into R, or
use
>> other languages such as perl for further analysis.
>>
>> For the DABG data you could use function export.call().
>>
>> Alternatively, you could also work completely from within ROOT,
using
>> C++ macros. Examples how to use xps from within ROOT are shown in
>> directory xps/examples/macro4XPS.C
>>
>> Best regards
>> Christian
>>
>>
>> On 2/14/12 1:54 PM, Paul Geeleher wrote:
>>> Hi Christian,
>>>
>>> Thanks for your replies.
>>>
>>> As per your suggestions I've been been able to map between xy
>>> co-ordinates, probe IDs and PROBESET_IDS.
>>>
>>> I've been able to extract the expression level for a given gene
(for a
>>> subset of samples) using this:
>>>
>>> id<- symbol2unitID(scheme.huex10stv2 , symbol="CDH11",
>>> unittype="transcript", as.list =TRUE)
>>> treenames<- unlist(treeNames(data_qn))
>>> data_qn<- attachInten(data_qn, treenames=treenames[1:3])
>>> xyIntens<- validData(data_qn, unitID=unlist(id))
>>>
>>> With "validData()" is that it seems I can only get the expression
>>> levels for samples which are "attached". This means I'd probably
have
>>> to attach each sample separately (which will take a long time) as
I
>>> don't have enough memory to attach all samples.
>>>
>>> I'm wondering if there is a way to extract this data without
having to
>>> attach the data? If something similar is possible for the dabg
data
>>> (at probeset level) that would also be a big help in speeding up
my
>>> pipeline (which will be complete if I can implement these last two
>>> steps)!
>>>
>>> Thanks again for all your help its much appreciated,
>>>
>>> Paul.
>>>
>>>
>>> On Tue, Feb 7, 2012 at 8:48 PM, cstrato<cstrato at="" aon.at="">
wrote:
>>>> Dear Paul,
>>>>
>>>> see replies below:
>>>>
>>>>
>>>> On 2/7/12 1:52 PM, Paul Geeleher wrote:
>>>>>
>>>>> Hi Christian, apologies for the lack of clarity in my previous
email,
>>>>> I'll try to clear this up, see replies below:
>>>>>
>>>>> On Mon, Feb 6, 2012 at 7:51 PM, cstrato<cstrato at="" aon.at="">
wrote:
>>>>>>
>>>>>> Dear Paul,
>>>>>>
>>>>>> I am afraid that it is not quite clear to me what you want to
do.
>>>>>>
>>>>>> What do you mean with "obtain the expression levels of the
probes"? Do
>>>>>> you
>>>>>> mean "probes" (i.e. oligos) or do you mean "probesets" as in
your
>>>>>> question
>>>>>> about DABG?
>>>>>
>>>>>
>>>>> I've read in and quantiles normalized probe level data for 176
arrays
>>>>> and what I'm trying to do now is set every probe (individual
probe,
>>>>> not probeset) that is not detected above background to zero. But
from
>>>>> what I've read dabg.call() can only create p-values for probeset
>>>>> level? So as compromise I'm now trying to set the expression of
all
>>>>> *individual probes* which belong to a *probeset* not detected
above
>>>>> background to zero. I hope that makes sense.
>>>>>
>>>>
>>>>
>>>> Did you read the exon array whitepapers which you can download
from
>>>> Affymetrix?
>>>>
>>>> Every probeset consists of 1-4 probes only, and every exon
consists
>>>> usually
>>>> of 1-2 probesets. Each gene has a transcript_cluster_id, which
consists
>>>> of
>>>> one or more probeset_ids. (You can see the mapping between ids
using
>>>> function export.scheme(..,treetype="anp",..)
>>>>
>>>> Since the smallest unit is the probeset, function dabg.call()
will only
>>>> work
>>>> at the probeset (and transcript) level. If you set all probes of
a
>>>> probeset
>>>> to zero you may loose an entire exon.
>>>>
>>>>
>>>>
>>>>>>
>>>>>> How did you create "data_hapMap"? Maybe you could send me your
complete
>>>>>> code
>>>>>> otherwise I am only able to guess.
>>>>>
>>>>>
>>>>> "data_hapMap" was a mistake, sorry I copied the wrong object
name it
>>>>> should have been "data_qn" which is quantiles normalized probe
level
>>>>> data that I created like this:
>>>>>
>>>>> data_hapMap<- root.data(scheme.huex10stv2,
>>>>> "/data2/paul/normalization_project/root_data/hapMap_root_data_ce
l.root")
>>>>> #read raw data
>>>>> data_qn<- normalize.quantiles(data_hapMap, "exon_quantiles",
>>>>> filedir=rootdata_Dir, exonlevel="all") #quantiles normalize at
*probe*
>>>>> level
>>>>>
>>>>> So "data_qn" is the object I want to work with, based on your
advice
>>>>> I've figured out that to access the expression levels in
"data_qn" I
>>>>> need to use "attachInten()", then I can use "intensities()" to
access
>>>>> the expression levels:
>>>>>
>>>>> treenames<- unlist(treeNames(data_qn))
>>>>> data_qn<- attachInten(data_qn, treenames=treenames[1:2]) #attach
the
>>>>> first two samples
>>>>>
>>>>>> head(intensity(data_qn))
>>>>>
>>>>> X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
>>>>> 1 0 0 0.0000 0.000
>>>>> 2 1 0 0.0000 0.000
>>>>> 3 2 0 0.0000 0.000
>>>>> 4 3 0 0.0000 0.000
>>>>> 5 4 0 0.0000 0.000
>>>>> 6 5 0 85.3523 121.733
>>>>>
>>>>> Does the X column in the output above represent the probe ID? If
so
>>>>> and I have a mapping of probeset IDs to corresponding probe IDs,
it
>>>>> should be fairly straightforward to set probes that are not
detected
>>>>> above background to zero? Perhaps there is a more
straightforward way
>>>>> of doing this though?
>>>>>
>>>>
>>>> No, (X,Y) are the coordinates of the single probes on the exon
array. You
>>>> can use function export.scheme(..,treetype="scm",..) to get the
mapping
>>>> between (X,Y) and an internal PROBESET_ID, e.g.:
>>>>
>>>> UNIT_ID X Y ProbeLength Mask EXON_ID
PROBESET_ID
>>>> 31 986 1674 25 512 31 31
>>>> 31 1092 677 25 512 31 31
>>>> 31 796 1862 25 512 31 31
>>>> 31 917 193 25 512 31 31
>>>> 31 341 1677 25 512 32 32
>>>> 31 144 2250 25 512 32 32
>>>> 31 689 262 25 512 32 32
>>>> 31 579 1670 25 512 32 32
>>>>
>>>> Then you can use export.scheme(..,treetype="pbs",..) to map
PROBESET_ID
>>>> (=UNIT_ID) to the ProbesetID, e.g.:
>>>>
>>>> UNIT_ID ProbesetID NumCells NumAtoms
NumSubunits
>>>> UnitType
>>>> 31 2315101 4 4 1 512
>>>> 32 2315102 4 4 1 512
>>>>
>>>> As you see PROBESET_IDs 31 and 32 have each 4 probes and belong
to the
>>>> Affymetrix ProbesetIDs 2315101 and 2315102, respectively.
>>>>
>>>> You could also use functions indexUnits(), and
unitID2probesetID() or
>>>> unitID2transcriptID(), respectively.
>>>>
>>>> Best regards
>>>> Christian
>>>>
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> I guess that "data_hapMap" contains the raw data. For these the
slot
>>>>>> "data"
>>>>>> is empty to save memory. So you need to use either attachData()
or
>>>>>> attachInten(). However since you are using exon arrays you may
not have
>>>>>> enough RAM, so it would be better to use function export() or
>>>>>> export.data(),
>>>>>> or attach only a subset, see help ?attachData. See also
vignette
>>>>>> xps.pdf
>>>>>> (chapter 2.3).
>>>>>
>>>>>
>>>>> I think RAM shouldn't be an issue if I attach the samples one at
a
>>>>> time? (I actually have access to a machine with 32gigs RAM but
ideally
>>>>> would like to get what I'm doing to run on a standard desktop,
which
>>>>> is actually why I'm using XPS!).
>>>>>
>>>>>>
>>>>>> When you talk about "expression matrix", how did you create it?
Maybe
>>>>>> you
>>>>>> could use function validExpr(), but w/o seeing your code it is
hard to
>>>>>> tell.
>>>>>> For DABG there are functions pvalData() and presCall(), see the
>>>>>> examples
>>>>>> in
>>>>>> help ?dabg.call.
>>>>>
>>>>>
>>>>> Yes I've managed to use dabg.call() at probeset level and access
the
>>>>> p-values using pvalData() alright.
>>>>>
>>>>> Thanks again for all of your help and patience!
>>>>>
>>>>> Kind Regards,
>>>>>
>>>>> Paul.
>>>>>
>>>>>
>>>>>>
>>>>>> Best regards
>>>>>> Christian
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2/6/12 4:33 PM, Paul Geeleher wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Christian,
>>>>>>>
>>>>>>> Thanks for your quick and informative reply.
>>>>>>>
>>>>>>> I have re-run the analysis and saved the R objects as you
suggest. The
>>>>>>> next thing I'm trying to do is to obtain the expression levels
of the
>>>>>>> probes, but this doesn't seem to be working for me:
>>>>>>>
>>>>>>>> a<- validData(data_hapMap)
>>>>>>>
>>>>>>>
>>>>>>> Error in .local(object, ...) : slot "data" has no data
>>>>>>>
>>>>>>> Based on the documentation I think validData() is the correct
>>>>>>> function.
>>>>>>>
>>>>>>> I've also performed probeset level DABG and I'm trying to set
>>>>>>> individual probes which belong to probesets with DABG<
.05 to 0
>>>>>>> in
>>>>>>> the expression matrix.
>>>>>>>
>>>>>>> But it seems I can't see the expression matrix using
validData().
>>>>>>> Perhaps there is another function. Any ideas?
>>>>>>>
>>>>>>> Thank you again for your help with this, I'm very grateful!
>>>>>>>
>>>>>>> Paul.
>>>>>>>
>>>>>>> On 2/2/12, cstrato<cstrato at="" aon.at=""> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Dear Paul,
>>>>>>>>
>>>>>>>> The functions root.data(), root.call() and root.expr() were
created
>>>>>>>> to
>>>>>>>> allow you access to the corresponding root files just in case
that
>>>>>>>> you
>>>>>>>> did not save your R session.
>>>>>>>>
>>>>>>>> In the cases where you compute expression levels stepwise, or
only
>>>>>>>> part
>>>>>>>> of them such as normalize.quantiles(), as seems to be the
matter in
>>>>>>>> your
>>>>>>>> case, there is no corresponding root.xxx() function to access
the
>>>>>>>> root
>>>>>>>> file directly. In these cases you need to save your R session
to have
>>>>>>>> continued access to the resulting root file.
>>>>>>>>
>>>>>>>> Please note that saving the R session is the usual case to
have
>>>>>>>> access
>>>>>>>> to the root files.
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>> Christian
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2/2/12 1:12 PM, Paul Geeleher wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Christian,
>>>>>>>>>
>>>>>>>>> Thanks for your quick reply. I check what kind of trees I
have using
>>>>>>>>> "getTreeNames()" as you'd suggested, it seems they are of
type "cqu"
>>>>>>>>> rather than "int", this is presumably because my analysis
required
>>>>>>>>> no
>>>>>>>>> background correction step?
>>>>>>>>>
>>>>>>>>> So I then tried:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> data_qn<- root.expr(scheme.huex10stv2,
"exon_quantiles.root",
>>>>>>>>>> "cqu")
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> but that gives me a huge number of errors that look like
this:
>>>>>>>>>
>>>>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>>>>>> Error: Could not get directory<preprocesset>.
>>>>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>>>>>> Error: Could not get directory<preprocesset>.
>>>>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet
>>>>>>>>> Error: Could not get directory<preprocesset>.
>>>>>>>>> Error: Could not get tree<exportset>.
>>>>>>>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root",
:
>>>>>>>>> error in function ?ExportData?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This file "exon_quantiles.root" definitely exists in the
current
>>>>>>>>> working directory though... Thanks again for your help!
>>>>>>>>>
>>>>>>>>> Paul.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at="">
wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Dear Paul,
>>>>>>>>>>
>>>>>>>>>> Please have a look at the help ?root.expr.
>>>>>>>>>>
>>>>>>>>>> If I understand you correctly, you did only do quantile
>>>>>>>>>> normalization?
>>>>>>>>>>
>>>>>>>>>> To see the tree names in your file you should do:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> getTreeNames("exon_quantiles.root")
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> You will probably see trees with extension "int", see help
>>>>>>>>>> ?validTreetype.
>>>>>>>>>>
>>>>>>>>>> To load these trees you need to do:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> data_qn<- root.expr(scheme.huex10stv2,
"exon_quantiles.root",
>>>>>>>>>>> "int")
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Please let me know if this did solve your problem.
>>>>>>>>>>
>>>>>>>>>> Best regards
>>>>>>>>>> Christian
>>>>>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>>>>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
>>>>>>>>>> V.i.e.n.n.a A.u.s.t.r.i.a
>>>>>>>>>> e.m.a.i.l: cstrato at aon.at
>>>>>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I've used xps to quantiles normalize (at probe level) some
Affy
>>>>>>>>>>> Exon
>>>>>>>>>>> Array data. I now have a "root" file called
"exon_quantiles.root",
>>>>>>>>>>> but
>>>>>>>>>>> if I try to load it the same was I'd load my raw data
(using the
>>>>>>>>>>> scheme file I created for Affy exon arrays) I get the
error below?
>>>>>>>>>>> I
>>>>>>>>>>> can load my raw data just fine though. Any ideas? Do I
perhaps
>>>>>>>>>>> need
>>>>>>>>>>> a
>>>>>>>>>>> different "root scheme" file for this normalized data?
>>>>>>>>>>> Unfortunately,
>>>>>>>>>>> I haven't been able to find an answer.
>>>>>>>>>>>
>>>>>>>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root")
>>>>>>>>>>>> data_qn<- root.data(scheme.huex10stv2,
"exon_quantiles.root")
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Error in if (chipname != treetitle) { : argument is of
length zero
>>>>>>>>>>>
>>>>>>>>>>> Hope someone can help,
>>>>>>>>>>>
>>>>>>>>>>> Paul.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> sessionInfo()
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> R version 2.11.0 (2010-04-22)
>>>>>>>>>>> x86_64-redhat-linux-gnu
>>>>>>>>>>>
>>>>>>>>>>> locale:
>>>>>>>>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>>>>>>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>>>>>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>>>>>>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>>>>>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>>>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>>>>>>
>>>>>>>>>>> attached base packages:
>>>>>>>>>>> [1] stats graphics grDevices utils datasets
methods
>>>>>>>>>>> base
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>