summarized expression values from beadarray versus GenomeStudio
Ina Hoeschele
United States
Hi Mark et al.,

I have calculated correlations among the expression vectors of different samples (in particular for a control sample that we use on each BeadChip), both for the expression data that I processed in Bioconductor with the beadarray package and for the expression data produced by GenomeStudio (with quantile normalization selected). The correlations (especially for the control samples from different chips) are clearly worse for the Bioconductor-processed data, and I have been trying to track down where the problem lies.

I also have the summarized (bead-type) intensities from GenomeStudio without normalization. I obtain the corresponding summarized values from beadarray with the following code:

```r
library(beadarray)

# Summary functions. The standard error divides by the number of non-missing
# beads; length(x) would also count the NAs.
myMean <- function(x) mean(x, na.rm = TRUE)
mySe <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

# Transformation: use the raw green-channel bead intensities
GreenChannelTransform <- function(BLData, array) {
  x <- getBeadData(BLData, array = array, what = "Grn")
  return(x)
}

# Channel definition, in the order: transformation, outlier function,
# expression summary, variance summary, channel name
greenChannel <- new("illuminaChannel", GreenChannelTransform,
                    illuminaOutlierMethod, myMean, mySe, "G")

# nChips and Chip.Dir (one directory per chip) are defined elsewhere
for (iChip in 1:nChips) {
  setwd(Chip.Dir[iChip])
  BLData <- readIllumina(useImages = FALSE, illuminaAnnotation = "Humanv4")
  BSData <- summarize(BLData, list(greenChannel), useSampleFac = TRUE,
                      sampleFac = NULL, removeUnMappedProbes = TRUE)
  save(BSData, file = "BSData.rda")
  rm(BLData, BSData); gc()
}
```

If the data are summarized this way with Bioconductor/beadarray, would you not expect the summarized values to be identical to those from GenomeStudio?

I checked the summarized value for one bead type on the first several sections of chip 1.
The summary values from GenomeStudio are: 77.93, 159.16, 174.93, 131.05, 484.39
The summary values from beadarray are: 90.0, 192.0, 188.5, 157.0, 492.0
(I also calculated the first summary value by hand and came up with 103.36!)

Why are these values different? Any hints?

Many thanks as always,
Ina
Normalization beadarray
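To reproduce a single summary value by hand, the calculation has to include the same outlier-removal step as the summarization, not just a plain mean of all beads. Below is a minimal base-R sketch, assuming the rule is "drop beads more than 3 median absolute deviations (MADs) from the median" (the rule Illumina describes; check `?illuminaOutlierMethod` in your beadarray version for the exact behaviour). The intensities are made up, and `summarizeBeadType` is a hypothetical helper, not part of beadarray.

```r
# Made-up bead-level green intensities for one bead type on one array
# section; NA marks a bead with no usable intensity.
beads <- c(95, 88, 102, 91, 85, 99, 410, 93, 87, NA)

# Hypothetical helper. Assumption: beads more than nmads MADs from the
# median are treated as outliers and dropped before the mean is taken.
summarizeBeadType <- function(x, nmads = 3) {
  x <- x[!is.na(x)]                        # ignore missing beads
  keep <- abs(x - median(x)) <= nmads * mad(x)
  n <- sum(keep)                           # beads actually used
  c(mean = mean(x[keep]),
    se   = sd(x[keep]) / sqrt(n),          # n counts kept, non-missing beads
    n    = n)
}

s <- summarizeBeadType(beads)
s                           # outlier-removed summary
mean(beads, na.rm = TRUE)   # naive mean over all beads, pulled up by the outlier
```

With these made-up numbers the bead at 410 is dropped, so the summary is the mean of the remaining 8 beads (92.5), whereas the naive mean is about 127.8. A hand calculation that skips or alters the outlier step can therefore differ noticeably from either program.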
Mark Dunning
Sheffield, UK
Hi Ina,

Nothing seems to be wrong with your approach, and it should re-create the BeadStudio intensities. We tried it out on some of our own data and managed to get very close to the BeadStudio values.

Does the number of observations reported by beadarray and GenomeStudio agree? What are the dimensions of your BSData object, and are they what you are expecting? It could be that summarize is incorrectly trying to combine data from multiple strips.

Best,
Mark
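Checking whether the number of beads behind each summary value agrees between the two programs amounts to comparing two vectors of bead counts by probe ID. A sketch with made-up counts: the probe names are placeholders, `countMismatches` is a hypothetical helper, and how the counts are extracted from the GenomeStudio export and the summarized beadarray object is left open here.

```r
# Made-up bead counts per bead type for one array section, named by probe ID:
# one vector as reported by beadarray, one from the GenomeStudio export.
nBeadarray    <- c(probeA = 18, probeB = 25, probeC = 31, probeD = 12)
nGenomeStudio <- c(probeA = 18, probeB = 27, probeC = 31, probeE = 20)

# Hypothetical helper: on the probes present in both vectors, return the
# non-zero count differences (beadarray minus GenomeStudio).
countMismatches <- function(a, b) {
  shared <- intersect(names(a), names(b))
  d <- a[shared] - b[shared]
  d[d != 0]
}

countMismatches(nBeadarray, nGenomeStudio)
```

A mismatch such as the one for probeB would suggest that the two programs are averaging over different sets of beads (for example through different outlier removal or through combining strips), rather than using a different averaging formula.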
Hi Mark:

Could the difference be caused by a difference in GenomeStudio versions? When I used BeadStudio before, I found that even a minor version difference (for example, 3.1.2 versus 3.1.3) could change probe intensities by around 20 (raw intensity units).

Cheers,
Wei
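One way to gauge the version explanation is to look at the size of the discrepancies already reported in the original question. A small arithmetic sketch using the five values quoted there for one bead type; reading the garbled third beadarray value ("1q88.5") as 188.5 is an assumption.

```r
# Summary values quoted in the question for one bead type on the first five
# sections of chip 1 (third beadarray value assumed to be 188.5).
genomeStudio <- c(77.93, 159.16, 174.93, 131.05, 484.39)
beadarray    <- c(90.0, 192.0, 188.5, 157.0, 492.0)

d <- beadarray - genomeStudio   # absolute differences, raw intensity units
r <- beadarray / genomeStudio   # relative differences

round(d, 2)
round(r, 3)
```

All five beadarray values are higher, by 7.6 to 32.8 units (roughly 2% to 21%). That is comparable in size to the ~20-unit effect described above, although five values from a single bead type cannot separate a version effect from other causes.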
Hi Mark and Wei,

Thank you very much for your suggestions.

For all 8 of my BSData objects, the first dimension is 48,107 probes (47,224 gene probes and 883 control probes). The corresponding dataset produced by GenomeStudio contains 47,320 gene probes and 886 control probes, so I seem to have 96 fewer gene probes and 3 fewer control probes. I do not know the reason for this difference, but these numbers do not suggest that anything is seriously wrong.

I would not be so worried about the discrepancy in values, but since the correlations among (control) samples on different chips are so much worse for Bioconductor than for GenomeStudio (0.91 to 0.92 versus 0.98 to 0.99), something must be going wrong somewhere.

Related to this: for each sample run on a BeadChip, some bead types may have failed. For all samples combined in a "project" in GenomeStudio, bead types that failed in any of these samples are excluded from the summarized data (unless the impute option is checked). I wonder how this is handled in the summarization in beadarray. Since beadarray deals with a single chip at a time, a "project" in beadarray would be a single chip. So if beadarray also excludes failed bead types, different BSData objects (each representing a single chip) may contain different bead types. I need to check whether this has distorted my correlations between control samples from different chips. However, for my first batch of 8 chips, all BSData objects have the same first dimension, which is a bit smaller than the number of summarized probes from GenomeStudio.

Ina
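The worry about different bead types being present on different chips can be ruled out by correlating samples only on the probe IDs they share, matched by name rather than by position. A minimal sketch with made-up intensities; `alignedCor` is a hypothetical helper, and whether to correlate on the log2 or the raw scale is a free choice that should simply be the same for both pipelines.

```r
# Made-up summarized intensities for the control sample on two chips, named
# by probe ID; chip 2 lacks probeC and has an extra probeF.
chip1 <- c(probeA = 120, probeB = 850, probeC = 60, probeD = 4300, probeE = 310)
chip2 <- c(probeB = 800, probeA = 135, probeD = 4100, probeE = 290, probeF = 75)

# Hypothetical helper: correlate two samples on their shared probes, matched
# by name, optionally on the log2 scale.
alignedCor <- function(a, b, log2scale = TRUE) {
  shared <- intersect(names(a), names(b))
  x <- a[shared]
  y <- b[shared]
  if (log2scale) {
    x <- log2(x)
    y <- log2(y)
  }
  list(nShared = length(shared), cor = cor(x, y))
}

alignedCor(chip1, chip2)
```

Correlating the raw vectors position by position with `cor(chip1, chip2)` would silently pair probeA on chip 1 with probeB on chip 2, which is exactly the kind of mismatch worth excluding before comparing correlation values.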
Hi Ina,

Could you send me the Illumina IDs and/or ArrayAddress IDs of any bead types that are not summarized by beadarray? The Humanv4 platform you are using has some extra spike controls that were not used on older arrays. My guess is that the mapping files beadarray uses to convert ArrayAddress IDs into Illumina IDs do not know about these IDs yet. That would go a long way toward explaining the difference in row numbers.

Could you also give a bit more detail on how the GenomeStudio data were exported, i.e. with or without normalisation?

Regards,
Mark
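The mapping hypothesis can be checked by listing the array addresses that have no Illumina ID in the mapping, since those are presumably the rows that `removeUnMappedProbes = TRUE` drops. A sketch with a made-up mapping table; the table layout, the column names and the `unmappedAddresses` helper are assumptions, not beadarray's internal annotation format.

```r
# Made-up ArrayAddress -> Illumina ID mapping table (hypothetical layout).
mapping <- data.frame(
  ArrayAddressID = c(10008, 10010, 10014, 10017),
  IlluminaID     = c("ILMN_A", "ILMN_B", NA, "ILMN_D"),
  stringsAsFactors = FALSE
)

# Made-up ArrayAddress IDs observed on the array (e.g. including new spikes).
observed <- c(10008, 10010, 10014, 10017, 10020)

# Hypothetical helper: addresses that are absent from the table, or present
# but without an Illumina ID, i.e. candidates for being dropped as unmapped.
unmappedAddresses <- function(observed, mapping) {
  known <- mapping$ArrayAddressID[!is.na(mapping$IlluminaID)]
  setdiff(observed, known)
}

unmappedAddresses(observed, mapping)
```

Here both kinds of gap show up: 10014 is in the table without an ID, and 10020 is not in the table at all.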
Hi Mark,

Sorry for my slow response (I am dealing with the 450K methylation data at the same time).

<< Could you send me the Illumina IDs and/or ArrayAddress IDs of any bead types that do not get summarized by beadarray? >>

I currently have this information only for the "Gene" probes, not for the controls (my collaborator sent me the GenomeStudio summarized data only for the Gene probes). Below is the difference between the Gene probes summarized by GenomeStudio and by beadarray:

```r
> length(ILMN_GSData_Gene)   # number of Gene probes summarized by GenomeStudio
[1] 47320
> length(ILMN_BSData_Gene)   # number of Gene probes summarized by beadarray
[1] 47224
> setdiff(ILMN_GSData_Gene, ILMN_BSData_Gene)
 [1] "ILMN_2038777" "ILMN_2038774" "ILMN_3164734" "ILMN_3164750" "ILMN_3164765"
 [6] "ILMN_3164808" "ILMN_3164838" "ILMN_3164858" "ILMN_3164875" "ILMN_3164905"
[11] "ILMN_3164915" "ILMN_3164950" "ILMN_3164979" "ILMN_3165007" "ILMN_3165027"
[16] "ILMN_3165033" "ILMN_3165086" "ILMN_3165100" "ILMN_3165113" "ILMN_3165130"
[21] "ILMN_3165170" "ILMN_3165190" "ILMN_3165201" "ILMN_3165218" "ILMN_3165229"
[26] "ILMN_3165245" "ILMN_3165277" "ILMN_3165303" "ILMN_3165334" "ILMN_3165363"
[31] "ILMN_3165378" "ILMN_3165415" "ILMN_3165426" "ILMN_3165438" "ILMN_3165457"
[36] "ILMN_3165474" "ILMN_3165484" "ILMN_3165533" "ILMN_3165547" "ILMN_3165565"
[41] "ILMN_3165590" "ILMN_3165604" "ILMN_3165619" "ILMN_3165638" "ILMN_3165650"
[46] "ILMN_3165668" "ILMN_3165687" "ILMN_3165699" "ILMN_3165727" "ILMN_3165745"
[51] "ILMN_3165757" "ILMN_3165768" "ILMN_3165829" "ILMN_3165877" "ILMN_3165896"
[56] "ILMN_3165903" "ILMN_3165920" "ILMN_3165933" "ILMN_3165993" "ILMN_3166057"
[61] "ILMN_3166075" "ILMN_3166098" "ILMN_3166114" "ILMN_3166132" "ILMN_3166177"
[66] "ILMN_3166194" "ILMN_3166223" "ILMN_3166238" "ILMN_3166255" "ILMN_3166311"
[71] "ILMN_3166325" "ILMN_3166368" "ILMN_3166404" "ILMN_3166414" "ILMN_3166430"
[76] "ILMN_3166475" "ILMN_3166491" "ILMN_3166504" "ILMN_3166519" "ILMN_3166551"
[81] "ILMN_3166569" "ILMN_3166578" "ILMN_3166596" "ILMN_3166630" "ILMN_3166640"
[86] "ILMN_3166655" "ILMN_3166673" "ILMN_3166687" "ILMN_3166703" "ILMN_3166721"
[91] "ILMN_3166728" "ILMN_3166775" "ILMN_3166789" "ILMN_3166804" "ILMN_1343295"
[96] "ILMN_2038772" "ILMN_2038775" "ILMN_2038776" "ILMN_2038773"
> setdiff(ILMN_BSData_Gene, ILMN_GSData_Gene)
[1] "ILMN_1657147" "ILMN_3246658" "ILMN_3247816"
```

Is this of any use?

<< Could you give a bit more detail on how the GenomeStudio data were exported? i.e. with/without normalisation >>

Without normalization and without the second background correction. I will check with my collaborator on any other details and send them tomorrow.

Many thanks,
Ina
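The counts in this reply can be cross-checked: the net gap between the two sets must equal the number of IDs found only in GenomeStudio minus the number found only in beadarray. A sketch of that bookkeeping; `compareIdSets` is a hypothetical helper and the toy IDs are made up, while the last line uses the counts from the console output in the reply.

```r
# Hypothetical helper: summarize how two probe ID sets differ.
compareIdSets <- function(a, b) {
  list(nA      = length(unique(a)),
       nB      = length(unique(b)),
       nShared = length(intersect(a, b)),
       onlyA   = setdiff(a, b),
       onlyB   = setdiff(b, a))
}

# Toy example with made-up IDs
gs <- c("ILMN_A", "ILMN_B", "ILMN_C", "ILMN_D")
bs <- c("ILMN_A", "ILMN_C", "ILMN_E")
cmp <- compareIdSets(gs, bs)

# The sizes must satisfy nB = nA - length(onlyA) + length(onlyB)
with(cmp, nA - length(onlyA) + length(onlyB) == nB)

# The same bookkeeping with the reported counts (99 only in GenomeStudio,
# 3 only in beadarray)
47320 - 99 + 3 == 47224
```

So the 96-probe shortfall is the net of 99 IDs present only in the GenomeStudio export and 3 present only in beadarray. Of the 99, 92 lie in the ILMN_3164734 to ILMN_3166804 range, which would be consistent with (but does not prove) the idea that some IDs are missing from the mapping.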
Mark,

Regarding the options in GenomeStudio, it was confirmed to me that I actually have two sets of values:

(1) summarized and quantile normalized (without the global background normalization)
(2) summarized and NOT quantile normalized (without the global background normalization)

There are no other options to set, except the choice not to impute missing values, in which case GenomeStudio deletes any bead types that failed for at least one sample.

Thanks,
Ina
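To compare like with like, the no-impute behaviour described here can be mimicked on beadarray output by keeping only bead types that are present, and not missing, in every sample across all chips. A sketch with made-up matrices in which NA stands for a failed bead type; `commonCompleteRows` is a hypothetical helper, and representing failures as NA is an assumption about how the summarized data would be prepared.

```r
# Made-up summarized intensities (rows = probe IDs, columns = samples);
# NA marks a bead type that failed on that array.
chip1 <- matrix(c(120, 850, NA, 310,
                  130, 820, 60, 300),
                ncol = 2,
                dimnames = list(c("probeA", "probeB", "probeC", "probeD"),
                                c("s1", "s2")))
chip2 <- matrix(c(118, 790, 65, NA),
                ncol = 1,
                dimnames = list(c("probeA", "probeB", "probeC", "probeE"),
                                "s3"))

# Hypothetical helper: keep probes present on every chip and non-missing in
# every sample, mimicking the exclusion of bead types that failed anywhere.
commonCompleteRows <- function(...) {
  mats <- list(...)
  ids <- Reduce(intersect, lapply(mats, rownames))
  combined <- do.call(cbind, lapply(mats, function(m) m[ids, , drop = FALSE]))
  combined[complete.cases(combined), , drop = FALSE]
}

commonCompleteRows(chip1, chip2)
```

With these inputs only probeA and probeB survive: probeD and probeE are each missing from one chip, and probeC failed in sample s1.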