Combining data from different versions of Illumina HumanHT-12 v3

Gavin Koh

I am trying to analyse data from ArrayExpress E-GEOD-22098 (published in December 2010). According to the study methods, the data come from Illumina HumanHT-12 v3 Expression BeadChips, but the hybridisation seems to have been done in several batches, with the number of probes per batch alternating between 48803 and 48804. Can anyone tell me how to combine these batches into a single file, please? I am trying to read the probe data with the read.ilmn() function in limma, but it fails because cbind complains that the matrices do not have the same number of rows (the exact error is "Error in cbind(out$E, objects[[i]]$E) : number of rows of matrices must match (see arg 2)").

Thank you in advance,
Gavin Koh
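A minimal base-R sketch of the failure, using two made-up matrices in place of the batches' `$E` components (the probe IDs and values are invented for illustration): `cbind()` refuses to combine matrices whose row counts differ, so checking the row counts and the sets of probe IDs first shows whether the batches differ in content or only in length and order.

```r
# Two made-up batches: batch 1 has one extra row (a repeated probe ID).
b1 <- matrix(1:8, nrow = 4, dimnames = list(c("P1", "P2", "P2", "P3"), NULL))
b2 <- matrix(1:6, nrow = 3, dimnames = list(c("P3", "P1", "P2"), NULL))

# cbind() on matrices with different row counts raises the same error as above.
msg <- tryCatch(cbind(b1, b2), error = function(e) conditionMessage(e))
print(msg)

# Diagnostics before trying to combine anything.
print(c(rows_b1 = nrow(b1), rows_b2 = nrow(b2)))
# Same set of IDs, so only the count (a duplicate) and the order differ.
print(setequal(rownames(b1), rownames(b2)))
```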

Wei Shi

Hi Gavin:

The number of probes present in one batch but not in the others should be very small, so you can use the probes that are common to all batches for your analysis.

Hope this helps.

Cheers,
Wei

(In reply to Gavin Koh, Apr 15, 2011, 1:20 AM)
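The "keep only the probes common to all batches" idea can be sketched in base R, assuming each batch is held as a numeric matrix with probe IDs as row names (the three batches below are made up, not the E-GEOD-22098 files). The intersection of IDs across batches gives one shared vector, and indexing every batch by it lines the rows up before `cbind()`.

```r
# Made-up batches: probe IDs as row names, one column per array.
batches <- list(
  TB1 = matrix(c(1, 2, 3, 4), ncol = 1, dimnames = list(c("A", "B", "C", "D"), "s1")),
  TB2 = matrix(c(5, 6, 7),    ncol = 1, dimnames = list(c("C", "A", "B"),      "s2")),
  TB3 = matrix(c(8, 9, 10),   ncol = 1, dimnames = list(c("B", "C", "A"),      "s3"))
)

# Probe IDs present in every batch (order taken from the first batch).
common <- Reduce(intersect, lapply(batches, rownames))

# Index each batch by the shared IDs so the rows line up, then combine.
# Character indexing returns the first row if an ID is repeated.
combined <- do.call(cbind, lapply(batches, function(m) m[common, , drop = FALSE]))
print(combined)
```

Probe "D", present only in the first batch, is dropped; Wei's point is that in real data such probes should be few.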

Gavin Koh

Thank you, Wei. I agree with you, but could you help with the technical details of how to do this, please?

Gavin

(In reply to Wei Shi, 15 April 2011, 02:38)

Gavin Koh

Dear Wei,

A little more information: the difference seems to be a single duplicated probe. Comparing just two batches (TB1 and TB2) with different numbers of probes:

    > length(TB1$genes)
    [1] 48804
    > length(TB2$genes)
    [1] 48803
    > length(unique(TB2$genes))
    [1] 48803
    > length(unique(TB1$genes))
    [1] 48803
    > setdiff(TB1$genes, TB2$genes)
    character(0)
    > setequal(TB1$genes, TB2$genes)
    [1] TRUE

That still leaves me with the problem that I don't know how to identify the repeated probe, or how to cbind TB1 and TB2... :-(

Gavin

(In reply to Wei Shi, 15 April 2011, 02:38)
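One base-R way to locate a repeated ID directly is `duplicated()`, which flags every occurrence after the first. A sketch on a made-up ID vector standing in for `TB1$genes`:

```r
# Made-up probe-ID vector with one ID repeated (stand-in for TB1$genes).
ids <- c("P1", "P2", "P3", "P2", "P4")

dup_pos <- which(duplicated(ids))   # positions of the second and later copies
dup_ids <- unique(ids[dup_pos])     # which IDs are repeated
all_pos <- which(ids %in% dup_ids)  # every position that holds a repeated ID

print(dup_pos)  # 4
print(dup_ids)  # "P2"
print(all_pos)  # 2 4
```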

Gavin Koh

Dear Wei,

I've identified the duplicated probe:

    common <- intersect(TB1$genes, TB2$genes)
    test <- common == TB1$genes

Inspecting test shows that the repeated probe is "ILMN_2038777", at position 324 in the vector. I also discovered that the order of the probes differs between batches, so just deleting the duplicated probe will not let me use cbind(). The help file for cbind() only states that the probe order must be the same in the EListRaw objects being combined. The limma manual seems to suggest the merge() command, but help(merge) says it only takes RGList or MAList objects. Looking at the code for read.ilmn(), though, it seems to use cbind to merge the input from different files anyway, so in theory merge() should work too? I also tried combine(), but combine() doesn't seem to be defined for EListRaw.

Gavin

(Follow-up to Gavin Koh's message of 15 April 2011, 10:54)
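Because cbind() pairs rows purely by position, the key question before combining is whether two batches list the same probes in the same order. A small base-R check on made-up ID vectors of equal length (for vectors of unequal length, such as TB1 and TB2 here, the duplicate has to be dealt with first):

```r
# Made-up ID vectors for two batches of equal length.
a <- c("P1", "P2", "P3", "P4")
b <- c("P1", "P3", "P2", "P4")

same_set   <- setequal(a, b)    # TRUE: the same probes are present
same_order <- identical(a, b)   # FALSE: a plain cbind() would misalign rows
first_diff <- which(a != b)[1]  # 2: first position where the orders disagree
```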

Wei Shi

Hi Gavin:

It would be best to match the two batches using the probe identifiers, because they are much less likely to have duplicates. Would it be possible to show the first several probes in each dataset, so that I can write some code to help you do this?

Cheers,
Wei

(In reply to Gavin Koh, Apr 15, 2011, 7:54 PM)

Gavin Koh

Dear Wei,

Thank you for replying so quickly. There appear to be 6 batches in this dataset (TB1 to TB6):

    > TB1$genes[1:10]
    [1] "ILMN_1809034" "ILMN_1660305" "ILMN_1792173" "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_1814316"
    [8] "ILMN_2359168" "ILMN_1731507" "ILMN_1787689"
    > TB2$genes[1:10]
    [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_2383229" "ILMN_1806310" "ILMN_1779670" "ILMN_2321282"
    [8] "ILMN_1671474" "ILMN_1772582" "ILMN_1735698"
    > TB3$genes[1:10]
    [1] "ILMN_1809034" "ILMN_1660305" "ILMN_1792173" "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_1814316"
    [8] "ILMN_2359168" "ILMN_1731507" "ILMN_1787689"
    > TB4$genes[1:10]
    [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_2383229" "ILMN_1806310" "ILMN_1779670" "ILMN_2321282"
    [8] "ILMN_1671474" "ILMN_1772582" "ILMN_1735698"
    > TB5$genes[1:10]
    [1] "ILMN_1809034" "ILMN_1660305" "ILMN_1792173" "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_1814316"
    [8] "ILMN_2359168" "ILMN_1731507" "ILMN_1787689"
    > TB6$genes[1:10]
    [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_2383229" "ILMN_1806310" "ILMN_1779670" "ILMN_2321282"
    [8] "ILMN_1671474" "ILMN_1772582" "ILMN_1735698"

Gavin

(In reply to Wei Shi, 15 April 2011, 11:45)
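The listing suggests, at least for the first ten IDs, that the batches alternate between two probe orderings (TB1/TB3/TB5 and TB2/TB4/TB6). A hedged base-R sketch for checking this over the full ID vectors, shown here on made-up three-probe vectors: collapse each vector into one string and group the batches that share it.

```r
# Made-up ID vectors for six batches, alternating between two orderings.
orderA <- c("P1", "P2", "P3")
orderB <- c("P2", "P3", "P1")
ids <- list(TB1 = orderA, TB2 = orderB, TB3 = orderA,
            TB4 = orderB, TB5 = orderA, TB6 = orderB)

# One key string per batch; batches with identical keys share an ordering.
key <- vapply(ids, paste, character(1), collapse = "|")
groups <- split(names(ids), key)
print(groups)
```

With the real data, `ids` would hold the six `$genes` vectors; a batch whose vector is one element longer (the duplicate) would land in its own group.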

Wei Shi

Dear Gavin:

Thanks for the further information. The probe "ILMN_2038777" is not only a gene probe but also a positive control probe (control type: housekeeping); you can find more information about this probe in the HT-12 manifest file. I do not know why it was absent from your TB2 dataset. Anyway, it should be quite safe to remove the housekeeping "ILMN_2038777" from your TB1 dataset, and then you can combine the two datasets. Below is the code to do this:

    x1 <- read.ilmn("your_TB1_probe_profile", "your_TB1_control_probe_profile")
    x2 <- read.ilmn("your_TB2_probe_profile", "your_TB2_control_probe_profile")
    x1 <- x1[!(x1$genes$Probe_Id == "ILMN_2038777" & tolower(x1$genes$Status) == "housekeeping"), ]
    m <- match(x1$genes$Probe_Id, x2$genes$Probe_Id)
    x.merged <- cbind(x1, x2[m, ])

This will combine TB1 with TB2. You can merge the other four datasets into x.merged using the same procedure: first remove the housekeeping "ILMN_2038777" from the dataset if it has one, then use match() and cbind() to merge them.

Hope this works for you; let me know if it doesn't.

Cheers,
Wei

(In reply to Gavin Koh, Apr 15, 2011, 9:16 PM)
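Instead of repeating the match-then-cbind step once per batch, the same idea can be applied to all batches in one pass. A sketch under simplifying assumptions: each batch is reduced to a plain ID vector plus a one-column expression matrix (not a real EListRaw), the data are made up, and `!duplicated()` keeps the first copy of a repeated ID. Wei's code instead removes the copy whose Status is "housekeeping", which this toy data does not record.

```r
# Made-up batches: probe IDs plus a one-column expression matrix each.
batches <- list(
  TB1 = list(ids = c("P1", "P2", "P3", "P2"), E = matrix(c(1, 2, 3, 99), ncol = 1)),
  TB2 = list(ids = c("P3", "P1", "P2"),       E = matrix(c(30, 10, 20), ncol = 1)),
  TB3 = list(ids = c("P2", "P3", "P1"),       E = matrix(c(200, 300, 100), ncol = 1))
)

# Step 1: drop repeated copies (a stand-in for removing the extra ILMN_2038777).
batches <- lapply(batches, function(b) {
  keep <- !duplicated(b$ids)
  list(ids = b$ids[keep], E = b$E[keep, , drop = FALSE])
})

# Step 2: match every batch to the first batch's order, then bind the columns.
ref <- batches[[1]]$ids
aligned <- lapply(batches, function(b) {
  m <- match(ref, b$ids)
  stopifnot(!anyNA(m))  # every reference probe must exist in this batch
  b$E[m, , drop = FALSE]
})
merged <- do.call(cbind, aligned)
rownames(merged) <- ref
print(merged)
```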

Gavin Koh

Dear Wei,

I am very sorry, but this still does not work.

ILMN_2038777 is not missing; it is duplicated in TB1. The batches with 48804 probes contain two copies of ILMN_2038777, while the batches with 48803 probes contain only one. The order of the probes also seems to differ from batch to batch.

TB1 was generated using:

    TB1 <- read.ilmn(
      files = as.character(targets$name)[1:5],
      probeid = "Probe_ID",
      expr = "Signal", sep = "\t",
      other.columns = "Detection"
    )

The reason is that the summarized data for each array are in a separate file. No bead-level data are available, and there is no xxx_profile.txt file.

I tried removing ILMN_2038777, but I cannot. Am I right in saying that this method of subsetting only works for data frames?

    > TB1 <- TB1[TB1$genes != "ILMN_2038777", ]
    Error in object$genes[i, , drop = FALSE] : incorrect number of dimensions
    > TB1 <- TB1[!(TB1$genes == "ILMN_2038777"), ]
    Error in object$genes[i, , drop = FALSE] : incorrect number of dimensions

Just so you can see the structure of the object that read.ilmn() has produced:

    > TB1
    An object of class "EListRaw"
    $source
    [1] "illumina"

    $E
                       [,1]       [,2]       [,3]      [,4]       [,5]
    ILMN_1809034  58.802010  24.907950  13.905010  10.07729   7.044668
    ILMN_1660305 236.458900 113.218000 193.581800 282.36350 127.023400
    ILMN_1792173 202.685800 120.449500 208.370600 242.63090 130.447200
    ILMN_1762337  -4.230737  -3.899888  -3.654122  -3.30873  -5.115820
    ILMN_2055271   7.409712   8.776000   9.394149  12.66054   1.250353
    48799 more rows ...

    $genes
    [1] "ILMN_1809034" "ILMN_1660305" "ILMN_1792173" "ILMN_1762337" "ILMN_2055271"
    48799 more elements ...

    $targets
    [1] SampleNames
    <0 rows> (or 0-length row.names)

    $other
    $Detection
                        [,1]       [,2]       [,3]       [,4]        [,5]
    ILMN_1809034 0.003952569 0.01844532 0.03952569 0.08432148 0.111989500
    ILMN_1660305 0.000000000 0.00000000 0.00000000 0.00000000 0.001317523
    ILMN_1792173 0.000000000 0.00000000 0.00000000 0.00000000 0.001317523
    ILMN_1762337 0.728590300 0.75230570 0.68247690 0.57444010 0.708827400
    ILMN_2055271 0.076416340 0.05138340 0.05665349 0.06719368 0.283267500
    48799 more rows ...

Gavin

(In reply to Wei Shi, 15 April 2011, 12:24)
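The error is R's generic complaint about indexing a plain vector as if it had two dimensions: in this object `$genes` is a character vector, while the subsetting code indexes it as `object$genes[i, , drop = FALSE]`. A base-R sketch of the failure and of one possible workaround, storing the IDs as a one-column data frame (`ProbeID` is a hypothetical column name). Applying the same conversion to `TB1$genes` should let `TB1[keep, ]` work, assuming limma then subsets `$genes` like any other data frame; this has not been checked against the limma version used in this thread.

```r
# Made-up stand-in for TB1$genes: a plain character vector, as in the dump above.
genes <- c("ILMN_1809034", "ILMN_2038777", "ILMN_1660305")

# Matrix-style row indexing of a vector reproduces the error.
err <- tryCatch(genes[2, , drop = FALSE], error = function(e) conditionMessage(e))
print(err)  # "incorrect number of dimensions"

# Stored as a one-column data frame, the same style of row indexing works.
genes_df <- data.frame(ProbeID = genes, stringsAsFactors = FALSE)
keep <- genes_df$ProbeID != "ILMN_2038777"
print(genes_df[keep, , drop = FALSE])
```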

Wei Shi

Dear Gavin:

OK, so you did not read in the control data. That is why my code did not work. You should really include the control data in your analysis, because they are very useful for normalization. But you can use the following code to merge the data you have now:

m <- match(TB2$genes, TB1$genes)
merged <- cbind(TB2,TB1[m])

This will remove the second ILMN_2038777 probe from TB1 and combine the probes from TB1 and TB2 in the right order.

Cheers,
Wei
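The claim that `match(TB2$genes, TB1$genes)` both drops the duplicated probe and puts TB1 into TB2's probe order rests on `match()` returning only the position of the first occurrence of each value. A small base-R illustration; the IDs `ILMN_A`, `ILMN_B`, `ILMN_C` are made-up placeholders, not real Illumina probes:

```r
# Made-up probe IDs for illustration only (not real Illumina probes).
# Batch 1 contains "ILMN_B" twice; batch 2 lists each probe once, in another order.
ids1 <- c("ILMN_A", "ILMN_B", "ILMN_C", "ILMN_B")
ids2 <- c("ILMN_C", "ILMN_A", "ILMN_B")

# For each batch-2 ID, match() gives the position of its FIRST occurrence in ids1.
m <- match(ids2, ids1)
print(m)          # [1] 3 1 2

# Indexing batch 1 with m reorders it to follow batch 2 ...
print(ids1[m])    # [1] "ILMN_C" "ILMN_A" "ILMN_B"

# ... and position 4 (the second "ILMN_B") is never selected, so it drops out.
print(4 %in% m)   # [1] FALSE
```

In a real EListRaw the same index `m` would presumably have to be applied to every probe-by-array component (the `E` matrix and `other$Detection`) so that the rows stay aligned.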

Reply written 13.0 years ago by Wei Shi ★ 3.6k

Dear Wei,

I am afraid this data is from a public repository, so I have no control over what data is published or the format :-(
I am afraid cbind still does not appear to work with this subscripting.

> common.probes <- match(TB2$genes,TB1$genes)
> TB <- cbind(TB2,TB1[common.probes])
Error: Two subscripts required

Please help?

Gavin

Reply written 13.0 years ago by Gavin Koh ▴ 220

Hi Gavin:

Sorry, TB1[common.probes] should be changed to TB1[common.probes, ].

Hope it works now.

Cheers,
Wei
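The correction is that row subsetting needs two subscripts, `x[i, ]`, as with a matrix. A minimal base-R sketch of the whole match-then-cbind step on plain probe-by-array matrices, using a hypothetical helper `merge_by_probe()` and toy data; it works on the matrices directly rather than relying on how limma's `[` method for EListRaw handles the `genes` component:

```r
# Hypothetical helper (not part of limma): align batch 1 to batch 2 by probe ID
# and bind the arrays side by side. E1/E2 are probe-by-array matrices;
# ids1/ids2 are the probe IDs for their rows.
merge_by_probe <- function(E1, ids1, E2, ids2) {
  m <- match(ids2, ids1)  # first occurrence of each batch-2 probe in batch 1
  if (anyNA(m)) stop("some batch-2 probes are absent from batch 1")
  # Two subscripts: rows m, all columns; drop = FALSE keeps a matrix.
  out <- cbind(E2, E1[m, , drop = FALSE])
  rownames(out) <- ids2
  out
}

# Toy data: batch 1 has probe "B" twice (rows 2 and 4), batch 2 has it once.
E1 <- matrix(1:8, nrow = 4, dimnames = list(NULL, c("s1", "s2")))
ids1 <- c("A", "B", "C", "B")
E2 <- matrix(11:13, nrow = 3, dimnames = list(NULL, "s3"))
ids2 <- c("C", "A", "B")

merged <- merge_by_probe(E1, ids1, E2, ids2)
print(merged)
#   s3 s1 s2
# C 11  3  7
# A 12  1  5
# B 13  2  6
```

Batch 2's arrays come first, mirroring the `cbind(TB2, TB1[m, ])` order above. For real data the same call would be repeated for the `Detection` matrix with the same probe ID vectors.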

Reply written 13.0 years ago by Wei Shi ★ 3.6k

Dear Wei,

I am afraid it still doesn't work. I think this is because TB1 is a list, not a data frame, and I cannot coerce it into a data frame.

> TB <- cbind(TB2,TB1[common.probes,])
Error in object$genes[i, , drop = FALSE] : incorrect number of dimensions
> names(TB1)
[1] "source"  "E"       "genes"   "targets" "other"
> class(TB1)
[1] "EListRaw"
attr(,"package")
[1] "limma"

I checked EListRaw, and it inherits directly from list, not from data frame.

So sorry,

Gavin.
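A minimal base-R sketch of what the error message points to: the failing call is `object$genes[i, , drop = FALSE]`, which suggests that subsetting an EListRaw indexes `$genes` with two subscripts, whereas here `$genes` is a plain character vector. The probe IDs below are hypothetical toy values, and the sketch does not load limma; it only reproduces the indexing behaviour.

```r
# Toy stand-in for TB1$genes: a plain character vector of probe IDs
# (hypothetical values, not taken from E-GEOD-22098)
genes_vec <- c("ILMN_A", "ILMN_B", "ILMN_C")

# Two-subscript (row, column) indexing fails on a vector because it has no
# dim attribute; this is the same message reported in the thread
msg <- tryCatch(genes_vec[1:2, , drop = FALSE],
                error = function(e) conditionMessage(e))
print(msg)

# The same indexing works once the IDs sit in a one-column data frame
genes_df <- data.frame(Probe_ID = genes_vec, stringsAsFactors = FALSE)
sub <- genes_df[1:2, , drop = FALSE]
print(sub)
```

The coercion, not the EListRaw class itself, is the point here: once `$genes` is a data frame, row indexing has a column dimension to work with.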

written 13.0 years ago by Gavin Koh ▴ 220

Hi Gavin:

I think the problem is that your TB1$genes (and TB2$genes) is a vector rather than a data frame. That is what makes TB1[m, ] fail inside your cbind call. I guess the data you downloaded from the public repository is not the original GenomeStudio/BeadStudio output. But you can fix this using the following code:

m <- match(TB2$genes, TB1$genes)    # row of TB1 for each TB2 probe; only the first copy of a duplicate is used
TB1$genes <- data.frame(TB1$genes)  # subsetting an EListRaw indexes $genes with two subscripts
TB2$genes <- data.frame(TB2$genes)
TB <- cbind(TB2, TB1[m, ])          # TB1 reordered to TB2's probe order, then combined

I tried this code on my computer and it worked. Hope that will work for you.

Cheers,
Wei
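The same match-then-cbind idea should extend to all six batches. The sketch below imitates it on plain matrices in base R so that it runs without limma; the helper `align_to`, the probe IDs and the values are all hypothetical. It assumes, as in the thread, that every batch contains the same probe set and that a batch with 48803 probes (such as TB2) has no duplicates and can serve as the reference order. With real EListRaw objects, the subsetting and binding would instead be done with `TB1[m, ]` and `cbind()` as in the code above.

```r
# Hypothetical helper: reorder one batch's expression matrix to a reference
# probe order. match() returns only the first hit for each ID, so a second
# copy of a duplicated probe is dropped.
align_to <- function(E, ids, ref_ids) {
  m <- match(ref_ids, ids)
  if (anyNA(m)) stop("some reference probes are missing from this batch")
  E[m, , drop = FALSE]
}

# Toy batches (made-up IDs and values): batch 1 has a duplicated probe "P2",
# and each batch lists the probes in a different order
ids1 <- c("P1", "P2", "P3", "P2")
E1 <- matrix(c(1, 2, 3, 99), ncol = 1, dimnames = list(ids1, "s1"))
ids2 <- c("P3", "P1", "P2")
E2 <- matrix(c(30, 10, 20), ncol = 1, dimnames = list(ids2, "s2"))
ids3 <- c("P2", "P3", "P1")
E3 <- matrix(c(200, 300, 100), ncol = 1, dimnames = list(ids3, "s3"))

batches <- list(list(E = E1, ids = ids1),
                list(E = E2, ids = ids2),
                list(E = E3, ids = ids3))

# Take the duplicate-free batch as the reference order (as the thread does
# with TB2), align every batch to it, then bind the columns
ref_ids <- batches[[2]]$ids
aligned <- lapply(batches, function(b) align_to(b$E, b$ids, ref_ids))
merged <- do.call(cbind, aligned)
print(merged)
```

Keeping the first copy of a duplicated probe is a choice rather than a requirement; it is simply what match() does, and it matches the behaviour of `match(TB2$genes, TB1$genes)` above. The stop() guard makes a missing probe fail loudly instead of producing NA rows.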

written 13.0 years ago by Wei Shi ★ 3.6k

Dear Wei, I think that's worked! Thank you! Gavin. On 16 April 2011 13:25, Wei Shi <shi at="" wehi.edu.au=""> wrote: > Hi Gavin: > > ? ? ? ?I think the problem is that your TB1$genes (and TB2$genes) is a vector rather than a data frame. This made cbind fail to combine them. I guess the data you downloaded from the public repository is not the original GenomeStudio/BeadStudio output. But you can fix this using the following code: > > m <- match(TB2$genes,TB1$genes) > TB1$genes <- data.frame(TB1$genes) > TB2$genes <- data.frame(TB2$genes) > TB <- cbind(TB2,TB1[m,]) > > ? ? ? ?I tried this code on my computer and it worked. Hope that will work for you. > > Cheers, > Wei > > On Apr 16, 2011, at 7:34 PM, Gavin Koh wrote: > >> Dear Wei, >> >> I am afraid it still doesn't work. I this is because TB1 is a list and >> not a data frame and I cannot coerce it to become a dataframe. >>> TB <- cbind(TB2,TB1[common.probes,]) >> Error in object$genes[i, , drop = FALSE] : incorrect number of dimensions >>> names(TB1) >> [1] "source" ?"E" ? ? ? "genes" ? "targets" "other" >>> class(TB1) >> [1] "EListRaw" >> attr(,"package") >> [1] "limma" >> >> I checked EListRaw and it inherits directly from list and not from data frame. >> So sorry, >> >> Gavin. >> >> On 16 April 2011 08:38, Wei Shi <shi at="" wehi.edu.au=""> wrote: >>> Hi Gavin: >>> >>> ? ? ? ?Sorry, TB1[common.probes] should be changed to TB1[common.probes, ]. >>> >>> ? ? ? ?Hope it works now. >>> >>> Cheers, >>> Wei >>> >>> >>> On Apr 16, 2011, at 4:32 PM, Gavin Koh wrote: >>> >>>> Dear Wei, >>>> >>>> I am afraid this data is from a public repository, so I have no >>>> control over what data is published or the format :-( >>>> I am afraid cbind still does not appear to work with this subscripting. >>>> >>>>> common.probes <- match(TB2$genes,TB1$genes) >>>>> TB <- cbind(TB2,TB1[common.probes]) >>>> Error: Two subscripts required >>>> >>>> Please help? >>>> >>>> Gavin ?? ?? 
>>>> >>>> On 16 April 2011 00:33, Wei Shi <shi at="" wehi.edu.au=""> wrote: >>>>> Dear Gavin: >>>>> >>>>> ? ? ? ?OK, so you did not input the control data. That is the reason why my code did not work. You should really include the control data in your analysis because they are very useful for the normalization. But you can use the following code to merge the data you are having now: >>>>> >>>>> m <- match(TB2$genes, TB1$genes) >>>>> merged <- cbind(TB2,TB1[m]) >>>>> >>>>> This will remove the second ILMN_2038777 probe from TB1 and combine probes from TB1 and TB2 in the right order. >>>>> >>>>> Cheers, >>>>> Wei >>>>> >>>>> On Apr 16, 2011, at 1:58 AM, Gavin Koh wrote: >>>>> >>>>>> Dear Wei >>>>>> >>>>>> I am very sorry, but this still does not work. >>>>>> >>>>>> ILMN_2038777 is not missing in TB1, but duplicated. The batches with >>>>>> 48804 probes contain two copies of ILMN_2038777. The batches with >>>>>> 48803 probes contain only one copy of ILMN_2038777. The order of >>>>>> probes also seems to be different from batch to batch. >>>>>> >>>>>> TB1 was generated using: >>>>>> >>>>>> TB1 <- read.ilmn( >>>>>> ?files=as.character(targets$name)[1:5], >>>>>> ?probeid="Probe_ID", >>>>>> ?expr="Signal", sep="\t", >>>>>> ?other.columns="Detection" >>>>>> ) >>>>>> >>>>>> The reason for this being that the summarized data for each array is >>>>>> in a separate file. There is no bead level data available. There is no >>>>>> xxx_profile.txt file. >>>>>> >>>>>> I tried removing ILMN_2038777, but I cannot. Am I right in saying that >>>>>> this method of subsetting is only applicable to data frames? 
>>>>>>> TB1 <- TB1[TB1$genes != "ILMN_2038777", ] >>>>>> Error in object$genes[i, , drop = FALSE] : incorrect number of dimensions >>>>>>> TB1 <- TB1[!(TB1$genes == "ILMN_2038777"), ] >>>>>> Error in object$genes[i, , drop = FALSE] : incorrect number of dimensions >>>>>> >>>>>> Just so you can see the structure of the file that read.ilmn() has produced: >>>>>> >>>>>> --begin screen dump-- >>>>>> >>>>>>> TB1 >>>>>> An object of class "EListRaw" >>>>>> $source >>>>>> [1] "illumina" >>>>>> >>>>>> $E >>>>>> ? ? ? ? ? ? ? ? ? [,1] ? ? ? [,2] ? ? ? [,3] ? ? ?[,4] ? ? ? [,5] >>>>>> ILMN_1809034 ?58.802010 ?24.907950 ?13.905010 ?10.07729 ? 7.044668 >>>>>> ILMN_1660305 236.458900 113.218000 193.581800 282.36350 127.023400 >>>>>> ILMN_1792173 202.685800 120.449500 208.370600 242.63090 130.447200 >>>>>> ILMN_1762337 ?-4.230737 ?-3.899888 ?-3.654122 ?-3.30873 ?-5.115820 >>>>>> ILMN_2055271 ? 7.409712 ? 8.776000 ? 9.394149 ?12.66054 ? 1.250353 >>>>>> 48799 more rows ... >>>>>> >>>>>> $genes >>>>>> [1] "ILMN_1809034" "ILMN_1660305" "ILMN_1792173" "ILMN_1762337" "ILMN_2055271" >>>>>> 48799 more elements ... >>>>>> >>>>>> $targets >>>>>> [1] SampleNames >>>>>> <0 rows> (or 0-length row.names) >>>>>> >>>>>> $other >>>>>> $Detection >>>>>> ? ? ? ? ? ? ? ? ? ?[,1] ? ? ? [,2] ? ? ? [,3] ? ? ? [,4] ? ? ? ?[,5] >>>>>> ILMN_1809034 0.003952569 0.01844532 0.03952569 0.08432148 0.111989500 >>>>>> ILMN_1660305 0.000000000 0.00000000 0.00000000 0.00000000 0.001317523 >>>>>> ILMN_1792173 0.000000000 0.00000000 0.00000000 0.00000000 0.001317523 >>>>>> ILMN_1762337 0.728590300 0.75230570 0.68247690 0.57444010 0.708827400 >>>>>> ILMN_2055271 0.076416340 0.05138340 0.05665349 0.06719368 0.283267500 >>>>>> 48799 more rows ... >>>>>> >>>>>> --end screen dump-- >>>>>> >>>>>> Gavin >>>>>> >>>>>> On 15 April 2011 12:24, Wei Shi <shi at="" wehi.edu.au=""> wrote: >>>>>>> Dear Gavin: >>>>>>> >>>>>>> ? ? ? ?Thanks for the further information. 
The probe "ILMN_2038777" is not only a gene probe but also a positive control probe (control type: housekeeping). You can find more information about this probe in the HT12 manifest file. But I do not know why it was absent in your TB2 dataset. Anyway, it will be quite safe to remove the housekeeping "ILMN_2038777" from your TB1 dataset. Then you can combine these two datasets together. Below is the code to do this: >>>>>>> >>>>>>> x1 <- read.ilmn("your_TB1_probe_profile","your_TB1_control_probe profile") >>>>>>> x2 <- read.ilmn("your_TB2_probe_profile","your_TB2_control_probe profile") >>>>>>> x1 <- x1[!(x1$genes$Probe_Id == "ILMN_2038777" & tolower(x1$genes$Status) == "housekeeping"),] >>>>>>> m <- match(x1$genes$Probe_Id, x2$genes$Probe_Id) >>>>>>> x.merged <- cbind(x1,x2[m,]) >>>>>>> >>>>>>> This will combine TB1 with TB2. For the other four datasets, you can merge them to x.merged using the same procedure (removing housekeeping "ILMN_2038777" from the dataset first if it has, then using match and cbind commands to merge them). >>>>>>> >>>>>>> Hope this will work for you. But let you know it doesn't. >>>>>>> >>>>>>> Cheers, >>>>>>> Wei >>>>>>> >>>>>>> >>>>>>> On Apr 15, 2011, at 9:16 PM, Gavin Koh wrote: >>>>>>> >>>>>>>> Dear Wei, >>>>>>>> >>>>>>>> Thank you for replying so quickly. 
There appear to be 6 batches in >>>>>>>> this dataset (TB1 to 6) >>>>>>>> >>>>>>>>> TB1$genes[1:10] >>>>>>>> [1] "ILMN_1809034" "ILMN_1660305" "ILMN_1792173" "ILMN_1762337" >>>>>>>> "ILMN_2055271" "ILMN_1736007" "ILMN_1814316" >>>>>>>> [8] "ILMN_2359168" "ILMN_1731507" "ILMN_1787689" >>>>>>>>> TB2$genes[1:10] >>>>>>>> [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_2383229" >>>>>>>> "ILMN_1806310" "ILMN_1779670" "ILMN_2321282" >>>>>>>> [8] "ILMN_1671474" "ILMN_1772582" "ILMN_1735698" >>>>>>>>> TB3$genes[1:10] >>>>>>>> [1] "ILMN_1809034" "ILMN_1660305" "ILMN_1792173" "ILMN_1762337" >>>>>>>> "ILMN_2055271" "ILMN_1736007" "ILMN_1814316" >>>>>>>> [8] "ILMN_2359168" "ILMN_1731507" "ILMN_1787689" >>>>>>>>> TB4$genes[1:10] >>>>>>>> [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_2383229" >>>>>>>> "ILMN_1806310" "ILMN_1779670" "ILMN_2321282" >>>>>>>> [8] "ILMN_1671474" "ILMN_1772582" "ILMN_1735698" >>>>>>>>> TB5$genes[1:10] >>>>>>>> [1] "ILMN_1809034" "ILMN_1660305" "ILMN_1792173" "ILMN_1762337" >>>>>>>> "ILMN_2055271" "ILMN_1736007" "ILMN_1814316" >>>>>>>> [8] "ILMN_2359168" "ILMN_1731507" "ILMN_1787689" >>>>>>>>> TB6$genes[1:10] >>>>>>>> [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_2383229" >>>>>>>> "ILMN_1806310" "ILMN_1779670" "ILMN_2321282" >>>>>>>> [8] "ILMN_1671474" "ILMN_1772582" "ILMN_1735698" >>>>>>>> >>>>>>>> ???????? >>>>>>>> >>>>>>>> Gavin >>>>>>>> >>>>>>>> On 15 April 2011 11:45, Wei Shi <shi at="" wehi.edu.au=""> wrote: >>>>>>>>> Hi Gavin: >>>>>>>>> >>>>>>>>> ? ? ? ?It would be best if you can match the two batches using the probe identifiers because they are much less likely to have duplicates. Would it possible to show the first several probes in each dataset so that I can write some code to help you do this? 
Cheers,
Wei

On Apr 15, 2011, at 7:54 PM, Gavin Koh wrote:

Dear Wei,

A little more information: the difference seems to be a single duplicated probe. Just comparing two batches (TB1 and TB2) with different probe numbers:

    > length(TB1$genes)
    [1] 48804
    > length(TB2$genes)
    [1] 48803
    > length(unique(TB2$genes))
    [1] 48803
    > length(unique(TB1$genes))
    [1] 48803
    > setdiff(TB1$genes,TB2$genes)
    character(0)
    > setequal(TB1$genes,TB2$genes)
    [1] TRUE

That still leaves me the problem that I don't know how to identify the repeated probe or how to cbind TB1 and TB2... :-(

Gavin

--
Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law.
Douglas Hofstadter (in Gödel, Escher, Bach, 1979)
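
Gavin's question of how to find the repeated probe can be answered with base R's duplicated(). The sketch below is a minimal illustration on a hypothetical toy table standing in for one batch's genes annotation: the Probe_Id and Status column names follow Wei's code, but the rows (apart from the ID ILMN_2038777 discussed above) are invented, and a real batch read with read.ilmn() would have about 48,800 rows.

```r
# Hypothetical toy stand-in for one batch's genes annotation; the column names
# follow Wei's code, the rows are invented for illustration.
genes <- data.frame(
  Probe_Id = c("ILMN_0000001", "ILMN_2038777", "ILMN_0000002", "ILMN_2038777"),
  Status   = c("regular", "regular", "regular", "housekeeping"),
  stringsAsFactors = FALSE
)

# duplicated() flags the second and later occurrences of each ID, so this
# yields the repeated probe identifier(s).
dup_ids <- unique(genes$Probe_Id[duplicated(genes$Probe_Id)])
print(dup_ids)

# Show every row that carries a repeated ID, to see how the copies differ.
print(genes[genes$Probe_Id %in% dup_ids, ])

# Drop the housekeeping copy of any repeated ID (the same filter Wei applies
# to ILMN_2038777), leaving one row per probe ID.
keep <- !(genes$Probe_Id %in% dup_ids & tolower(genes$Status) == "housekeeping")
genes_clean <- genes[keep, ]
stopifnot(anyDuplicated(genes_clean$Probe_Id) == 0)
```

Generalising Wei's filter from the single ID to "any repeated ID with housekeeping status" is an assumption; it only matters if a batch turns out to contain more than one such duplicate.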

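Wei's match-and-cbind step has to be repeated for each remaining batch, and Gavin's listings show that the two kinds of batch also store their probes in different orders, so columns can only be bound after reordering by probe ID. A minimal sketch of that loop, under stated assumptions: merge_batches_by_probe is a hypothetical helper (not a limma function); it works on plain expression matrices whose row names are probe IDs so that it runs without limma; and it keeps only probes present in every batch, as Wei suggested in his first reply. With read.ilmn() output, the same row indexing would be applied to the EListRaw objects, as in Wei's code.

```r
# Hypothetical helper, not part of limma: merge expression matrices whose row
# names are probe IDs, keeping only probes present in every batch, in the row
# order of the first batch.
merge_batches_by_probe <- function(batches) {
  ids <- lapply(batches, rownames)
  # match() returns only the first hit, so repeated IDs must be removed first.
  if (any(vapply(ids, function(v) anyDuplicated(v) > 0, logical(1)))) {
    stop("remove duplicated probe IDs from each batch before merging")
  }
  # Probe IDs of the first batch that also occur in every other batch.
  in_all <- Reduce(`&`, lapply(ids, function(v) ids[[1]] %in% v))
  common <- ids[[1]][in_all]
  # Reorder every batch to the common probe order, then bind the columns.
  aligned <- lapply(batches, function(b) b[match(common, rownames(b)), , drop = FALSE])
  do.call(cbind, aligned)
}

# Toy usage: two batches listing their probes in different orders.
b1 <- matrix(1:6, nrow = 3,
             dimnames = list(c("ILMN_A", "ILMN_B", "ILMN_C"), c("s1", "s2")))
b2 <- matrix(7:12, nrow = 3,
             dimnames = list(c("ILMN_C", "ILMN_A", "ILMN_D"), c("s3", "s4")))
merged <- merge_batches_by_probe(list(b1, b2))
# Rows ILMN_A (values 1, 4, 8, 11) and ILMN_C (values 3, 6, 7, 10).
print(merged)
```

Passing all six de-duplicated batches in one list would give a single matrix. Restricting to common probes should discard few or no probes here, since Gavin's comparison found the TB1 and TB2 probe ID sets to be equal.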
written 13.0 years ago by Gavin Koh ▴ 220
