Dear all,
I am currently working on a project where I need the exact IDs of the
probes on a custom Affymetrix chip, in order to merge them with another
list containing the probe sequences.
I am using this small R script for creating the list:
library(affy)            # ReadAffy(), pm(), probeNames()
library(preprocessCore)  # normalize.quantiles()
mitdata <- ReadAffy()
stddata <- apply(pm(mitdata), 2, bg.adjust)
nrmdata <- normalize.quantiles(stddata)
namedata <- probeNames(mitdata)
enddata <- cbind(namedata, nrmdata)
write.table(enddata, file = "probesdata.txt", sep = "\t")
This is an example of the output:
...
145 TZG_ARR_0001_x_at 135.115780787133 ...
146 TZG_ARR_0001_x_at 147.346049115501 ...
147 TZG_ARR_0001_x_at 203.840215898533 ...
148 TZG_ARR_0003_x_at 48.7635207480323 ...
...
As you can see, several probes share the same name but refer to
different oligos. The numbers at the start of each row were added by me,
so you can ignore them.
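If all that is needed is a unique label for each row, one option is to number the probes within each probe set. A small base R sketch using the probe set names from the excerpt above; the "name.counter" label format is an arbitrary choice, and this only makes the labels unique. It does not by itself show that the row order matches any other list.

```r
# Probe set names as they appear in the output excerpt (one entry per PM probe)
namedata <- c("TZG_ARR_0001_x_at", "TZG_ARR_0001_x_at",
              "TZG_ARR_0001_x_at", "TZG_ARR_0003_x_at")

# Running counter within each probe set: 1, 2, 3 for 0001 and 1 for 0003
within_set <- ave(seq_along(namedata), namedata, FUN = seq_along)

# Combine name and counter into one unique identifier per row
probe_id <- paste(namedata, within_set, sep = ".")
print(probe_id)
```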
I received a list containing the probe name, some other information
and the sequence.
This is a part of it:
15 ggagattgtttgtaatcaaaatgaa TGZ_ARR_0001_x ! 2398 0 176 200 + 1
103 gcaaatttacttctaacagctgatc TGZ_ARR_0001_x ! 2398 1 264 288 + 1
188 ttgatgcaactgtaaacaaaagtgg TGZ_ARR_0001_x ! 2398 2 349 373 + 1
15 gatagattcttcaagtaacaatact TGZ_ARR_0003_x ! 2400 0 2046 2070 + 1
This excerpt should correspond to the same probes as the output above.
In this received list, I can identify each unique probe by the two
numbers right after the exclamation mark, which I guess refer to the
probe's position on the chip. How can I extract these coordinates for
my own list? I tried indices2xy, but I could not get it running because
I don't understand how to use this function correctly.
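The received list can be read into R and given a key built from the two numbers after the exclamation mark. A minimal base R sketch using the four example lines; the column names are hypothetical labels, since the meaning of most fields (including whether the two numbers really are chip coordinates) is not documented in the list itself. For the full file, pass its path to read.table() instead of text=.

```r
# The four example lines from the received list, pasted as text for the sketch
txt <- "
15 ggagattgtttgtaatcaaaatgaa TGZ_ARR_0001_x ! 2398 0 176 200 + 1
103 gcaaatttacttctaacagctgatc TGZ_ARR_0001_x ! 2398 1 264 288 + 1
188 ttgatgcaactgtaaacaaaagtgg TGZ_ARR_0001_x ! 2398 2 349 373 + 1
15 gatagattcttcaagtaacaatact TGZ_ARR_0003_x ! 2400 0 2046 2070 + 1
"

# Hypothetical column names: only 'sequence' and 'probeset' are clear from the data
seqlist <- read.table(text = txt, stringsAsFactors = FALSE,
                      col.names = c("col1", "sequence", "probeset", "mark",
                                    "num1", "num2", "start", "end",
                                    "strand", "col10"))

# Key built from the two numbers after "!"; stop if it is not unique
seqlist$key <- paste(seqlist$num1, seqlist$num2, sep = "_")
stopifnot(!anyDuplicated(seqlist$key))
print(seqlist[, c("key", "probeset", "sequence")])
```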
Thanks in advance for all answers,
Karsten Voigt
--
_________________________________________________
Karsten Voigt, Msc.
Experimentelle Bioinformatik, Hess Group
University of Freiburg, BIO III
t: 0761-2032708
m: 0176-61110420
e: karsten.voigt at biologie.uni-freiburg.de
Hi Karsten,
On 1/11/2011 12:56 PM, Karsten Voigt wrote:
> In this received list, I can identify the unique probes using the 2
> numbers right after the exclamation mark, which are referring to the
> position on the chip, I guess. How can I extract those coordinates for
> my own list? I tried it with indices2xy, however I failed to get it
> running since I don't understand how to use this function correctly.
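Using the hgu95av2cdf package as an example:
> library(affy)          # provides indices2xy()
> library(hgu95av2cdf)
> x <- as.list(hgu95av2cdf)        # one index matrix per probe set
> x <- x[order(names(x))]
> # column 1 of each matrix holds the PM indices; lapply() (unlike sapply())
> # never simplifies, so the names survive even if all sets are equal size
> x <- unlist(lapply(x, function(m) m[, 1]))
> xys <- indices2xy(x, cdf = "hgu95av2cdf")
> head(xys)
x y
1000_at1 399 559
1000_at2 544 185
1000_at3 530 505
1000_at4 617 349
1000_at5 459 489
1000_at6 408 545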
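The same pipeline can be tried without any annotation package. A base R sketch with a made-up toy stand-in for as.list() of a CDF package (probe set names, index values and the chip width nc = 8 are all invented); the final step uses the row-major arithmetic x = (i - 1) %% nc, y = (i - 1) %/% nc, assuming a zero coordinate offset.

```r
# Toy stand-in for as.list(<cdf package>): each probe set maps to a matrix of
# probe indices, PM in column 1 and MM in column 2 (all values invented)
cdf_list <- list(
  "1001_at" = cbind(pm = c(5L, 13L, 21L), mm = c(6L, 14L, 22L)),
  "1000_at" = cbind(pm = c(9L, 17L),      mm = c(10L, 18L))
)
cdf_list <- cdf_list[order(names(cdf_list))]

# PM indices, named "<probeset><n>" by unlist()
pm_index <- unlist(lapply(cdf_list, function(m) m[, 1]))

# Row-major index -> (x, y) conversion for an assumed chip width of 8
nc <- 8L
xys <- cbind(x = (pm_index - 1L) %% nc, y = (pm_index - 1L) %/% nc)
print(xys)
```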
Best,
Jim
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
Hi all,
On 01/11/2011 07:36 PM, James W. MacDonald wrote:
First of all, many thanks to Jim for the quick and helpful answer. I ran
your script on my own CDF and it extracts exactly what I am looking for.
However, I still cannot identify the probes in my CEL files loaded with
ReadAffy(). If I run probeNames on the AffyBatch, the probe names come
out sorted alphabetically. I cannot imagine that the probe intensities
from the CEL files are also sorted alphabetically in the way I obtained
them.
I think my way of creating this list is wrong, since it seems unlikely,
and I cannot prove, that the probe names and the normalized data are
listed in the same order.
How can I verify that the probeNames match the probe values? Is it also
possible to extract the x/y coordinates from the CDF file?
One other question: is there any way to extract the probe sequences
from the CDF file?
Many thanks in advance again,
Karsten
Hi Karsten,
If you created an AffyBatch x with ReadAffy, then exprs(x) is a matrix
whose rows correspond to the probes on the array, one after the other, in
the order in which they are physically laid out on the chip. The mapping
between row index in the AffyBatch and (x,y) coordinates is provided by
the functions indices2xy and xy2indices in the 'affy' package (you can
see their code by typing their names). Essentially, it is very simple:
x = (i - 1) %% nr
y = (i - 1) %/% nr
and in reverse:
i = x + 1 + nr * y
where nr is the width of the chip (its number of probe columns). So one
strategy is to compute the (x,y) coordinates of each probe on your array
with
indices2xy(seq_len(nrow(exprs(mitdata))), abatch=mitdata)
and use them to merge with your probe-sequence table. This might be
easier and more transparent than going through probeNames.
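The index arithmetic above can be written out in plain R to check it by hand. A minimal sketch: index_to_xy and xy_to_index are hypothetical helper names (affy's indices2xy and xy2indices are the real functions, which also handle a coordinate offset), and the chip width of 640 is only an assumed example value.

```r
# Hypothetical helpers implementing the row-major mapping described above
index_to_xy <- function(i, nr) {
  # nr = chip width, i.e. the number of probe columns per row
  cbind(x = (i - 1) %% nr, y = (i - 1) %/% nr)
}

xy_to_index <- function(x, y, nr) {
  x + 1 + nr * y
}

nr <- 640                      # assumed chip width for illustration
i  <- c(1, 640, 641, 409600)   # first, end of row 0, start of row 1, last probe
xy <- index_to_xy(i, nr)
print(xy)

# Round trip: converting back gives the original row indices
stopifnot(all(xy_to_index(xy[, "x"], xy[, "y"], nr) == i))
```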
Probe sequences for many Affymetrix chips are obtained through the
'probe' packages (whose content is complementary to the smaller 'cdf'
packages):
library(hgu95av2probe)
head(as.data.frame(hgu95av2probe))
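To illustrate the merge strategy, here is a base R sketch with invented toy data. It assumes the probe table has x and y columns (as the data frames from the 'probe' packages do; check names() on your own table) and that the intensity rows are in physical order as in exprs(), so each row index can be converted to (x,y) before joining. The chip width nr = 2 is part of the toy setup.

```r
# Toy probe table mimicking a 'probe' package data frame (values invented)
probes <- data.frame(
  sequence       = c("ACGTACGTACGTACGTACGTACGTA", "TTTTGGGGCCCCAAAATTTTGGGGC"),
  x              = c(0L, 1L),
  y              = c(1L, 0L),
  Probe.Set.Name = c("1000_at", "1000_at"),
  stringsAsFactors = FALSE
)

# Toy intensities: 4 probes on a chip of width nr = 2, one column per array
nr  <- 2L
int <- matrix(c(100, 200, 300, 400,
                110, 210, 310, 410), ncol = 2,
              dimnames = list(NULL, c("array1", "array2")))

# Row i of the intensity matrix sits at x = (i-1) %% nr, y = (i-1) %/% nr
i <- seq_len(nrow(int))
intensities <- data.frame(x = (i - 1L) %% nr, y = (i - 1L) %/% nr, int)

# Join on the physical coordinates; probes without a sequence entry are dropped
merged <- merge(probes, intensities, by = c("x", "y"))
print(merged)
```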
Best wishes
Wolfgang
Karsten Voigt scripsit 12/01/11 15:28:
--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
Dear all,
Thanks for the great input so far. I now have to test it and
understand it. If any problems remain, I will let you know ;-)
Thanks and best wishes,
Karsten