Gavin Koh
Dear Wei Shi
I am afraid I am stuck at the normalization step, as you predicted.
I do not understand your instruction to provide a Detection value
matrix to detection.p, because the neqc function does not appear to
have a parameter called detection.p or detection.p.val. As you
predicted, neqc() exits saying "Probe status can not be found!"
Thank you in advance for your help.
Gavin Koh
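One quick way to check whether the installed version of a function accepts a particular argument is to list its formal arguments with base R's `formals()`. The helper `has_arg()` below is hypothetical. The demonstration uses `read.delim()` so that it runs without limma, and the limma check at the end only runs if the package is installed.

```r
# Hypothetical helper: TRUE if `fun` declares a formal argument named `arg`.
has_arg <- function(fun, arg) {
  arg %in% names(formals(fun))
}

# Demonstration on a base R function, so no Bioconductor package is needed.
print(has_arg(read.delim, "sep"))          # TRUE: read.delim() has a `sep` argument
print(has_arg(read.delim, "detection.p"))  # FALSE: it has no `detection.p` argument

# The same check against the installed limma, if it is available.
if (requireNamespace("limma", quietly = TRUE)) {
  print(has_arg(limma::neqc, "detection.p"))
}
```

If the last check prints `FALSE`, the installed limma does not offer the argument Wei refers to, so a call along the lines of `neqc(TB, detection.p = TB$other$Detection)` cannot work as written. Updating limma would be the first thing to try, though that is an assumption rather than something confirmed here.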
On 17 April 2011 23:44, Wei Shi <shi at wehi.edu.au> wrote:
> Thanks for the summarization, Gavin. It is good to see that things finally worked.
>
> Just a comment on the data normalization, if you are going to do it next. Although control probes are not available in your ArrayExpress download, you can still perform neqc normalization using derived negative controls, which are inferred from the gene probe intensities and their detection p values. To do this, you just need to provide the Detection value matrix of your TB object to the detection.p parameter of the neqc function.
>
> Cheers,
> Wei
>
> On Apr 17, 2011, at 10:39 PM, gavin.koh at gmail.com wrote:
>
>> I am summarising everything just so it is archived on the newsgroup. This is the code I finally used:
>>
>> The summarised data is from ArrayExpress (accession number E-GEOD-22098).
>> There is no bead-level data available.
>> Each array is in a separate file, and the first 5 lines of the first file look like this:
>> Probe_ID      Signal     Detection
>> ILMN_1809034  58.80201   0.003952569
>> ILMN_1660305  236.4589   0
>> ILMN_1792173  202.6858   0
>> ILMN_1762337  -4.230737  0.7285903
>> ILMN_2055271  7.409712   0.07641634
>> ...
>>
>> targets.txt looks like this:
>> name
>> GSM549324_4325540010_E_Raw.txt
>> GSM549325_4325540026_A_Raw.txt
>> GSM549326_4325540026_B_Raw.txt
>> GSM549327_4335991057_D_Raw.txt
>> GSM549328_4335991058_A_Raw.txt
>> ...
>>
>> The code I used was:
>>
>> library(limma)
>> targets <- readTargets("targets.txt")  # assumed: how targets was read is not shown
>> TB1 <- read.ilmn(
>> files=as.character(targets$name)[1:5],
>> probeid="Probe_ID",
>> expr="Signal", sep="\t",
>> other.columns="Detection"
>> )
>> colnames(TB1$E) <- substr(targets$name[1:5],1,9)
>> colnames(TB1$other$Detection) <- substr(targets$name[1:5],1,9)
>> TB1$genes <- as.data.frame(TB1$genes) # read.ilmn reads the probe IDs in as a vector
>> TB2 <- read.ilmn(
>> files=as.character(targets$name)[6:21],
>> probeid="Probe_ID",
>> expr="Signal", sep="\t",
>> other.columns="Detection"
>> )
>> colnames(TB2$E) <- substr(targets$name[6:21],1,9)
>> colnames(TB2$other$Detection) <- substr(targets$name[6:21],1,9)
>> TB2$genes <- as.data.frame(TB2$genes)
>> # Reorder TB2's rows to follow TB1's probe order, then combine the two objects
>> TB1.TB2 <- match(TB1$genes[[1]], TB2$genes[[1]])
>> TB <- cbind(TB1, TB2[TB1.TB2,])
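Two parts of this final code are easy to sanity-check in base R: the `substr(..., 1, 9)` call that turns file names into GSM sample IDs, and the `match()` step, which returns `NA` for any TB1 probe that has no counterpart in TB2. The probe-ID vectors below are toy stand-ins, not the real 48,800-probe vectors.

```r
# File names as listed in targets.txt (first three from the post above).
files <- c("GSM549324_4325540010_E_Raw.txt",
           "GSM549325_4325540026_A_Raw.txt",
           "GSM549326_4325540026_B_Raw.txt")
# The first nine characters are the GEO sample accession.
sample_ids <- substr(files, 1, 9)
print(sample_ids)

# Toy probe IDs: TB1 carries two copies of ILMN_2038777, TB2 carries one,
# and the two batches list probes in different orders.
tb1_ids <- c("ILMN_1809034", "ILMN_1660305", "ILMN_2038777", "ILMN_2038777")
tb2_ids <- c("ILMN_2038777", "ILMN_1809034", "ILMN_1660305")

# Row of TB2 matching each TB1 probe; NA would mean "not found in TB2".
TB1.TB2 <- match(tb1_ids, tb2_ids)
stopifnot(!anyNA(TB1.TB2))
print(TB1.TB2)  # both ILMN_2038777 copies point to the same TB2 row
```

Because both copies of the duplicated probe map to the same TB2 row, the merged object keeps two ILMN_2038777 rows, with TB2's values repeated in both. Dropping one copy with `!duplicated()` before matching would avoid that.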
>>
>>
>> On , Gavin Koh <gavin.koh at gmail.com> wrote:
>> > Dear Wei,
>> >
>> > I think that's worked!
>> >
>> > Thank you! Gavin.
>> >
>> >
>> >
>> > On 16 April 2011 13:25, Wei Shi shi at wehi.edu.au> wrote:
>> >
>> > > Hi Gavin:
>> >
>> > >
>> >
>> > > ? ? ? ?I think the problem is that your TB1$genes (and
TB2$genes) is a vector rather than a data frame. This made cbind fail
to combine them. I guess the data you downloaded from the public
repository is not the original GenomeStudio/BeadStudio output. But you
can fix this using the following code:
>> >
>> > >
>> >
>> > > m
>> > > TB1$genes
>> > > TB2$genes
>> > > TB
>> > >
>> >
>> > > ? ? ? ?I tried this code on my computer and it worked. Hope
that will work for you.
>> >
>> > >
>> >
>> > > Cheers,
>> >
>> > > Wei
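The error Gavin reports in the replies below, `Error in object$genes[i, , drop = FALSE] : incorrect number of dimensions`, is what R gives when row/column subsetting is applied to a plain vector. A base-R reproduction, with a toy vector standing in for `TB1$genes`:

```r
# Toy stand-in for TB1$genes as read.ilmn() returned it here: a character vector.
genes_vec <- c("ILMN_1809034", "ILMN_1660305", "ILMN_1792173")

# Row subsetting with two indices (the form applied to $genes in the error message)
# fails on a vector because a vector has no dimensions.
err <- tryCatch(genes_vec[2:3, , drop = FALSE], error = conditionMessage)
print(err)

# Wrapped in a data frame, the same subsetting works.
genes_df <- as.data.frame(genes_vec)
print(genes_df[2:3, , drop = FALSE])
```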
>> >
>> > >
>> >
>> > > On Apr 16, 2011, at 7:34 PM, Gavin Koh wrote:
>> >
>> > >
>> >
>> > >> Dear Wei,
>> >
>> > >>
>> >
>> > >> I am afraid it still doesn't work. I this is because TB1 is a
list and
>> >
>> > >> not a data frame and I cannot coerce it to become a dataframe.
>> >
>> > >>> TB
>> > >> Error in object$genes[i, , drop = FALSE] : incorrect number of
dimensions
>> >
>> > >>> names(TB1)
>> >
>> > >> [1] "source" ?"E" ? ? ? "genes" ? "targets" "other"
>> >
>> > >>> class(TB1)
>> >
>> > >> [1] "EListRaw"
>> >
>> > >> attr(,"package")
>> >
>> > >> [1] "limma"
>> >
>> > >>
>> >
>> > >> I checked EListRaw and it inherits directly from list and not
from data frame.
>> >
>> > >> So sorry,
>> >
>> > >>
>> >
>> > >> Gavin.
>> >
>> > >>
>> >
>> > >> On 16 April 2011 08:38, Wei Shi shi at wehi.edu.au> wrote:
>> >
>> > >>> Hi Gavin:
>> >
>> > >>>
>> >
>> > >>> ? ? ? ?Sorry, TB1[common.probes] should be changed to
TB1[common.probes, ].
>> >
>> > >>>
>> >
>> > >>> ? ? ? ?Hope it works now.
>> >
>> > >>>
>> >
>> > >>> Cheers,
>> >
>> > >>> Wei
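limma's expression objects insist on the `[rows, columns]` form, hence the "Two subscripts required" error in the reply below. A plain matrix is less strict, which makes a missing comma easy to overlook: with a single index it silently falls back to linear (element-wise) indexing instead of selecting rows. A toy illustration with made-up values:

```r
# Toy 3 x 2 expression matrix with probe IDs as row names (made-up values).
E <- matrix(1:6, nrow = 3,
            dimnames = list(c("p1", "p2", "p3"), c("a1", "a2")))

# Row selection needs the comma: rows p1 and p3, all columns.
rows <- E[c("p1", "p3"), , drop = FALSE]
print(rows)

# Without the comma, numeric indices pick individual elements
# (column-major order), not rows: this returns elements 2 and 3 of column a1.
print(E[2:3])
```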
>> >
>> > >>>
>> >
>> > >>>
>> >
>> > >>> On Apr 16, 2011, at 4:32 PM, Gavin Koh wrote:
>> >
>> > >>>
>> >
>> > >>>> Dear Wei,
>> >
>> > >>>>
>> >
>> > >>>> I am afraid this data is from a public repository, so I have
no
>> >
>> > >>>> control over what data is published or the format :-(
>> >
>> > >>>> I am afraid cbind still does not appear to work with this
subscripting.
>> >
>> > >>>>
>> >
>> > >>>>> common.probes
>> > >>>>> TB
>> > >>>> Error: Two subscripts required
>> >
>> > >>>>
>> >
>> > >>>> Please help?
>> >
>> > >>>>
>> >
>> > >>>> Gavin ?? ??
>> >
>> > >>>>
>> >
>> > >>>> On 16 April 2011 00:33, Wei Shi shi at wehi.edu.au> wrote:
>> >
>> > >>>>> Dear Gavin:
>> >
>> > >>>>>
>> >
>> > >>>>> ? ? ? ?OK, so you did not input the control data. That is
the reason why my code did not work. You should really include the
control data in your analysis because they are very useful for the
normalization. But you can use the following code to merge the data
you are having now:
>> >
>> > >>>>>
>> >
>> > >>>>> m
>> > >>>>> merged
>> > >>>>>
>> >
>> > >>>>> This will remove the second ILMN_2038777 probe from TB1 and
combine probes from TB1 and TB2 in the right order.
>> >
>> > >>>>>
>> >
>> > >>>>> Cheers,
>> >
>> > >>>>> Wei
>> >
>> > >>>>>
>> >
>> > >>>>> On Apr 16, 2011, at 1:58 AM, Gavin Koh wrote:
>> >
>> > >>>>>
>> >
>> > >>>>>> Dear Wei
>> >
>> > >>>>>>
>> >
>> > >>>>>> I am very sorry, but this still does not work.
>> >
>> > >>>>>>
>> >
>> > >>>>>> ILMN_2038777 is not missing in TB1, but duplicated. The
batches with
>> >
>> > >>>>>> 48804 probes contain two copies of ILMN_2038777. The
batches with
>> >
>> > >>>>>> 48803 probes contain only one copy of ILMN_2038777. The
order of
>> >
>> > >>>>>> probes also seems to be different from batch to batch.
>> >
>> > >>>>>>
>> >
>> > >>>>>> TB1 was generated using:
>> >
>> > >>>>>>
>> >
>> > >>>>>> TB1
>> > >>>>>> ?files=as.character(targets$name)[1:5],
>> >
>> > >>>>>> ?probeid="Probe_ID",
>> >
>> > >>>>>> ?expr="Signal", sep="\t",
>> >
>> > >>>>>> ?other.columns="Detection"
>> >
>> > >>>>>> )
>> >
>> > >>>>>>
>> >
>> > >>>>>> The reason for this being that the summarized data for
each array is
>> >
>> > >>>>>> in a separate file. There is no bead level data available.
There is no
>> >
>> > >>>>>> xxx_profile.txt file.
>> >
>> > >>>>>>
>> >
>> > >>>>>> I tried removing ILMN_2038777, but I cannot. Am I right in
saying that
>> >
>> > >>>>>> this method of subsetting is only applicable to data
frames?
>> >
>> > >>>>>>> TB1
>> > >>>>>> Error in object$genes[i, , drop = FALSE] : incorrect
number of dimensions
>> >
>> > >>>>>>> TB1
>> > >>>>>> Error in object$genes[i, , drop = FALSE] : incorrect
number of dimensions
>> >
>> > >>>>>>
>> >
>> > >>>>>> Just so you can see the structure of the file that
read.ilmn() has produced:
>> >
>> > >>>>>>
>> >
> >> > >>>>>> --begin screen dump--
> >> > >>>>>>
> >> > >>>>>>> TB1
> >> > >>>>>> An object of class "EListRaw"
> >> > >>>>>> $source
> >> > >>>>>> [1] "illumina"
> >> > >>>>>>
> >> > >>>>>> $E
> >> > >>>>>>                    [,1]       [,2]       [,3]      [,4]       [,5]
> >> > >>>>>> ILMN_1809034  58.802010  24.907950  13.905010  10.07729   7.044668
> >> > >>>>>> ILMN_1660305 236.458900 113.218000 193.581800 282.36350 127.023400
> >> > >>>>>> ILMN_1792173 202.685800 120.449500 208.370600 242.63090 130.447200
> >> > >>>>>> ILMN_1762337  -4.230737  -3.899888  -3.654122  -3.30873  -5.115820
> >> > >>>>>> ILMN_2055271   7.409712   8.776000   9.394149  12.66054   1.250353
> >> > >>>>>> 48799 more rows ...
> >> > >>>>>>
> >> > >>>>>> $genes
> >> > >>>>>> [1] "ILMN_1809034" "ILMN_1660305" "ILMN_1792173" "ILMN_1762337" "ILMN_2055271"
> >> > >>>>>> 48799 more elements ...
> >> > >>>>>>
> >> > >>>>>> $targets
> >> > >>>>>> [1] SampleNames
> >> > >>>>>> <0 rows> (or 0-length row.names)
> >> > >>>>>>
> >> > >>>>>> $other
> >> > >>>>>> $Detection
> >> > >>>>>>                     [,1]       [,2]       [,3]       [,4]        [,5]
> >> > >>>>>> ILMN_1809034 0.003952569 0.01844532 0.03952569 0.08432148 0.111989500
> >> > >>>>>> ILMN_1660305 0.000000000 0.00000000 0.00000000 0.00000000 0.001317523
> >> > >>>>>> ILMN_1792173 0.000000000 0.00000000 0.00000000 0.00000000 0.001317523
> >> > >>>>>> ILMN_1762337 0.728590300 0.75230570 0.68247690 0.57444010 0.708827400
> >> > >>>>>> ILMN_2055271 0.076416340 0.05138340 0.05665349 0.06719368 0.283267500
> >> > >>>>>> 48799 more rows ...
> >> > >>>>>>
> >> > >>>>>> --end screen dump--
> >> > >>>>>>
> >> > >>>>>> Gavin
> >> > >>>>>>
> >> > >>>>>> On 15 April 2011 12:24, Wei Shi <shi at wehi.edu.au> wrote:
>> >
>> > >>>>>>> Dear Gavin:
>> >
>> > >>>>>>>
>> >
>> > >>>>>>> ? ? ? ?Thanks for the further information. The probe
"ILMN_2038777" is not only a gene probe but also a positive control
probe (control type: housekeeping). You can find more information
about this probe in the HT12 manifest file. But I do not know why it
was absent in your TB2 dataset. Anyway, it will be quite safe to
remove the housekeeping "ILMN_2038777" from your TB1 dataset. Then you
can combine these two datasets together. Below is the code to do this:
>> >
>> > >>>>>>>
>> >
>> > >>>>>>> x1
>> > >>>>>>> x2
>> > >>>>>>> x1
>> > >>>>>>> m
>> > >>>>>>> x.merged
>> > >>>>>>>
>> >
>> > >>>>>>> This will combine TB1 with TB2. For the other four
datasets, you can merge them to x.merged using the same procedure
(removing housekeeping "ILMN_2038777" from the dataset first if it
has, then using match and cbind commands to merge them).
>> >
>> > >>>>>>>
>> >
>> > >>>>>>> Hope this will work for you. But let you know it doesn't.
>> >
>> > >>>>>>>
>> >
>> > >>>>>>> Cheers,
>> >
>> > >>>>>>> Wei
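The remove-then-match-then-cbind procedure Wei describes can be sketched on plain matrices. This is a minimal base-R illustration, not Wei's original code (whose lines are cut off in this archive). The toy matrices stand in for `TB1$E` and `TB2$E`, and the sketch assumes the second copy of the duplicated probe is the one to drop.

```r
# Toy stand-ins for TB1$E and TB2$E (made-up values): rows are probes, columns arrays.
# Batch 1 holds a second copy of ILMN_2038777 and lists probes in a different order.
ids1 <- c("ILMN_1809034", "ILMN_2038777", "ILMN_1660305", "ILMN_2038777")
E1 <- matrix(1:8, ncol = 2, dimnames = list(NULL, c("A1", "A2")))
ids2 <- c("ILMN_1660305", "ILMN_2038777", "ILMN_1809034")
E2 <- matrix(c(30, 20, 10), ncol = 1, dimnames = list(NULL, "B1"))

# 1. Keep only the first occurrence of each probe in batch 1.
keep <- !duplicated(ids1)
ids1 <- ids1[keep]
E1 <- E1[keep, , drop = FALSE]

# 2. For each remaining batch-1 probe, find its row in batch 2.
m <- match(ids1, ids2)
stopifnot(!anyNA(m))  # every batch-1 probe must be present in batch 2

# 3. Put batch 2 into batch-1 order and bind the columns.
merged <- cbind(E1, E2[m, , drop = FALSE])
rownames(merged) <- ids1
print(merged)
```

For the remaining batches, the same three steps can be repeated against `merged`, matching on `rownames(merged)`.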
>> >
>> > >>>>>>>
>> >
>> > >>>>>>>
>> >
>> > >>>>>>> On Apr 15, 2011, at 9:16 PM, Gavin Koh wrote:
>> >
>> > >>>>>>>
>> >
>> > >>>>>>>> Dear Wei,
>> >
>> > >>>>>>>>
>> >
>> > >>>>>>>> Thank you for replying so quickly. There appear to be 6
batches in
>> >
>> > >>>>>>>> this dataset (TB1 to 6)
>> >
>> > >>>>>>>>
>> >
>> > >>>>>>>>> TB1$genes[1:10]
>> >
>> > >>>>>>>> [1] "ILMN_1809034" "ILMN_1660305" "ILMN_1792173"
"ILMN_1762337"
>> >
>> > >>>>>>>> "ILMN_2055271" "ILMN_1736007" "ILMN_1814316"
>> >
>> > >>>>>>>> [8] "ILMN_2359168" "ILMN_1731507" "ILMN_1787689"
>> >
>> > >>>>>>>>> TB2$genes[1:10]
>> >
>> > >>>>>>>> [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007"
"ILMN_2383229"
>> >
>> > >>>>>>>> "ILMN_1806310" "ILMN_1779670" "ILMN_2321282"
>> >
>> > >>>>>>>> [8] "ILMN_1671474" "ILMN_1772582" "ILMN_1735698"
>> >
>> > >>>>>>>>> TB3$genes[1:10]
>> >
>> > >>>>>>>> [1] "ILMN_1809034" "ILMN_1660305" "ILMN_1792173"
"ILMN_1762337"
>> >
>> > >>>>>>>> "ILMN_2055271" "ILMN_1736007" "ILMN_1814316"
>> >
>> > >>>>>>>> [8] "ILMN_2359168" "ILMN_1731507" "ILMN_1787689"
>> >
>> > >>>>>>>>> TB4$genes[1:10]
>> >
>> > >>>>>>>> [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007"
"ILMN_2383229"
>> >
>> > >>>>>>>> "ILMN_1806310" "ILMN_1779670" "ILMN_2321282"
>> >
>> > >>>>>>>> [8] "ILMN_1671474" "ILMN_1772582" "ILMN_1735698"
>> >
>> > >>>>>>>>> TB5$genes[1:10]
>> >
>> > >>>>>>>> [1] "ILMN_1809034" "ILMN_1660305" "ILMN_1792173"
"ILMN_1762337"
>> >
>> > >>>>>>>> "ILMN_2055271" "ILMN_1736007" "ILMN_1814316"
>> >
>> > >>>>>>>> [8] "ILMN_2359168" "ILMN_1731507" "ILMN_1787689"
>> >
>> > >>>>>>>>> TB6$genes[1:10]
>> >
>> > >>>>>>>> [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007"
"ILMN_2383229"
>> >
>> > >>>>>>>> "ILMN_1806310" "ILMN_1779670" "ILMN_2321282"
>> >
>> > >>>>>>>> [8] "ILMN_1671474" "ILMN_1772582" "ILMN_1735698"
>> >
>> > >>>>>>>>
>> >
>> > >>>>>>>> ????????
>> >
>> > >>>>>>>>
>> >
>> > >>>>>>>> Gavin
>> >
>> > >>>>>>>>
>> >
>> > >>>>>>>> On 15 April 2011 11:45, Wei Shi shi at wehi.edu.au>
wrote:
>> >
> >> > >>>>>>>>> Hi Gavin:
> >> > >>>>>>>>>
> >> > >>>>>>>>> It would be best if you can match the two batches using the probe identifiers, because they are much less likely to have duplicates. Would it be possible to show the first several probes in each dataset so that I can write some code to help you do this?
> >> > >>>>>>>>>
> >> > >>>>>>>>> Cheers,
> >> > >>>>>>>>> Wei
>> >
>> > >>>>>>>>>
>> >
>> > >>>>>>>>>
>> >
>> > >>>>>>>>> On Apr 15, 2011, at 7:54 PM, Gavin Koh wrote:
>> >
>> > >>>>>>>>>
>> >
>> > >>>>>>>>>> Dear Wei,
>> >
>> > >>>>>>>>>>
>> >
>> > >>>>>>>>>> A little more information: the difference seems to be
a single duplicated probe.
>> >
>> > >>>>>>>>>> Just comparing two batches (TB1 and TB2) with
different probe numbers:
>> >
>> > >>>>>>>>>>> length(TB1$genes)
>> >
>> > >>>>>>>>>> [1] 48804
>> >
>> > >>>>>>>>>>> length(TB2$genes)
>> >
>> > >>>>>>>>>> [1] 48803
>> >
>> > >>>>>>>>>>> length(unique(TB2$genes))
>> >
>> > >>>>>>>>>> [1] 48803
>> >
>> > >>>>>>>>>>> length(unique(TB1$genes))
>> >
>> > >>>>>>>>>> [1] 48803
>> >
>> > >>>>>>>>>>> setdiff(TB1$genes,TB2$genes)
>> >
>> > >>>>>>>>>> character(0)
>> >
>> > >>>>>>>>>>> setequal(TB1$genes,TB2$genes)
>> >
>> > >>>>>>>>>> [1] TRUE
>> >
>> > >>>>>>>>>>
>> >
>> > >>>>>>>>>> That still leaves me the problem that I don't know how
to identify the
>> >
>> > >>>>>>>>>> repeated probe or how to cbind TB1 and TB2... :-(
>> >
>> > >>>>>>>>>>
>> >
>> > >>>>>>>>>> Gavin
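Base R's `duplicated()` answers the first question directly: it flags every occurrence of a value after the first, so indexing the ID vector with it returns the repeated probe. A sketch on toy vectors standing in for `TB1$genes` and `TB2$genes`:

```r
# Toy stand-ins for TB1$genes (with one probe repeated) and TB2$genes.
tb1_genes <- c("ILMN_1809034", "ILMN_2038777", "ILMN_1660305", "ILMN_2038777")
tb2_genes <- c("ILMN_1660305", "ILMN_2038777", "ILMN_1809034")

# IDs that occur more than once in batch 1.
dup_ids <- unique(tb1_genes[duplicated(tb1_genes)])
print(dup_ids)

# Positions of every copy of those IDs, useful for inspecting or dropping rows.
dup_rows <- which(tb1_genes %in% dup_ids)
print(dup_rows)

# Same idea as a frequency table: counts above 1 mark duplicated IDs.
counts <- table(tb1_genes)
print(counts[counts > 1])

# As in the output above, the two sets agree even though the lengths differ.
print(setequal(tb1_genes, tb2_genes))
```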
>> >
>> > >>>>>>>>>>
>> >
>> > >>>>>>>>>> On 15 April 2011 02:38, Wei Shi shi at wehi.edu.au>
wrote:
>> >
> >> > >>>>>>>>>>> Hi Gavin:
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> The number of probes which were present in one batch but not in others should be very small. So you can use the probes which are common to all batches for your analysis.
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> Hope this helps.
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> Cheers,
> >> > >>>>>>>>>>> Wei
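Restricting every batch to the probes they share can be done by folding `intersect()` over a list of probe-ID vectors with `Reduce()`. The batch contents below are hypothetical toy vectors; with real data they would be the `genes` vectors of TB1 to TB6, and the resulting IDs would then be used to subset each batch before combining.

```r
# Hypothetical probe-ID vectors for three batches (toy data).
batches <- list(
  TB1 = c("ILMN_1809034", "ILMN_1660305", "ILMN_1792173", "ILMN_1762337"),
  TB2 = c("ILMN_1762337", "ILMN_1809034", "ILMN_1660305"),
  TB3 = c("ILMN_1660305", "ILMN_1762337", "ILMN_1809034", "ILMN_2055271")
)

# Probes present in every batch, in the order they appear in the first batch.
common <- Reduce(intersect, batches)
print(common)

# What each batch would lose by keeping only the common probes.
print(lapply(batches, setdiff, common))

# Subsetting a toy expression matrix (probe IDs as row names) to the common set.
E1 <- matrix(1:8, ncol = 2, dimnames = list(batches$TB1, c("a1", "a2")))
print(E1[common, , drop = FALSE])
```

Note that `intersect()` also removes duplicates, so a probe listed twice in one batch appears only once in `common`; subsetting by row name then picks the first matching row.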
>> >
>> > >>>>>>>>>>>
>> >
>> > >>>>>>>>>>> On Apr 15, 2011, at 1:20 AM, Gavin Koh wrote:
>> >
>> > >>>>>>>>>>>
>> >
>> > >>>>>>>>>>>> I am trying to analyse data from ArrayExpress
E-GEOD-22098 (published
>> >
>> > >>>>>>>>>>>> Dec last year).
>> >
>> > >>>>>>>>>>>> According to the study methods, the data are
Illumina HumanHT-12 v3
>> >
>> > >>>>>>>>>>>> Expression BeadChips, but the hybridisation seems to
have been done in
>> >
>> > >>>>>>>>>>>> several batches, with different numbers of probes in
each batch,
>> >
>> > >>>>>>>>>>>> alternating between 48803 and 48804. Can anyone tell
me how to combine
>> >
>> > >>>>>>>>>>>> these different batches into the same file, please?
I am trying to
>> >
>> > >>>>>>>>>>>> read the probe data using the read.ilmn() function
in limma, but
>> >
>> > >>>>>>>>>>>> failing, because cbind complains the matrices are
not the same length
>> >
>> > >>>>>>>>>>>> (precise error is "Error in cbind(out$E,
objects[[i]]$E) : number of
>> >
>> > >>>>>>>>>>>> rows of matrices must match (see arg 2)").
>> >
>> > >>>>>>>>>>>>
>> >
>> > >>>>>>>>>>>> Thank you in advance,
>> >
>> > >>>>>>>>>>>>
>> >
>> > >>>>>>>>>>>> Gavin Koh
>> >
>> > >>>>>>>>>>>>
>> >
>> > >>>>>>>>>>>> _______________________________________________
>> >
>> > >>>>>>>>>>>> Bioconductor mailing list
>> >
>> > >>>>>>>>>>>> Bioconductor at r-project.org
>> >
>> > >>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >
>> > >>>>>>>>>>>> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>> > >>>>>>>>>>>
>> >
>> > >>>>>>>>>>>
>> >
>> > >>>>>>>>>>>
______________________________________________________________________
>> >
>> > >>>>>>>>>>> The information in this email is confidential and
intended solely for the addressee.
>> >
>> > >>>>>>>>>>> You must not disclose, forward, print or use it
without the permission of the sender.
>> >
>> > >>>>>>>>>>>
______________________________________________________________________
>> >
>> > >>>>>>>>>>>
>> >
>> > >>>>>>>>>>
>> >
>> > >>>>>>>>>>
>> >
>> > >>>>>>>>>>
>> >
>> > >>>>>>>>>> --
>> >
>> > >>>>>>>>>> Hofstadter's Law: It always takes longer than you
expect, even when
>> >
>> > >>>>>>>>>> you take into account Hofstadter's Law.
>> >
>> > >>>>>>>>>> ?Douglas Hofstadter (in G?del, Escher, Bach, 1979)
>> >
>> > >>>>>>>>>
>> >
>> > >>>>>>>>>
>> >
>> > >>>>>>>>>
______________________________________________________________________
>> >
>> > >>>>>>>>> The information in this email is confidential and
intended solely for the addressee.
>> >
>> > >>>>>>>>> You must not disclose, forward, print or use it without
the permission of the sender.
>> >
>> > >>>>>>>>>
______________________________________________________________________
>> >
>> > >>>>>>>>>
>> >
>> > >>>>>>>>
>> >
>> > >>>>>>>>
>> >
>> > >>>>>>>>
>> >
>> > >>>>>>>> --
>> >
>> > >>>>>>>> Hofstadter's Law: It always takes longer than you
expect, even when
>> >
>> > >>>>>>>> you take into account Hofstadter's Law.
>> >
>> > >>>>>>>> ?Douglas Hofstadter (in G?del, Escher, Bach, 1979)
>> >
>> > >>>>>>>
>> >
>> > >>>>>>>
>> >
>> > >>>>>>>
______________________________________________________________________
>> >
>> > >>>>>>> The information in this email is confidential and
intended solely for the addressee.
>> >
>> > >>>>>>> You must not disclose, forward, print or use it without
the permission of the sender.
>> >
>> > >>>>>>>
______________________________________________________________________
>> >
>> > >%
>
>
>
______________________________________________________________________
> The information in this email is confidential and
inte...{{dropped:14}}