Question: Normalization of Illumina bead array data
6.9 years ago, Wei Shi (Australia) wrote:
Dear Michele,

I had a close look at the data used in your analysis and found that the detection data for some arrays seem to be wrong.

With Illumina bead array data, probes with larger intensities should have an equal or higher detection score (or an equal or lower detection p-value) than probes with lower intensities. However, this is not the case for some of the arrays in this dataset; the second column of your 'maqc' object is one such array. My code below sorts the probes on that array by intensity and finds 325 places where a probe has a larger intensity but a smaller detection score than the probe just below it:

> tmp_sel <- !duplicated(maqc$E[,2])              # drop probes with tied intensities
> d2e <- maqc$E[tmp_sel,2]                        # intensities, array 2
> d2d <- maqc$other$Detection[tmp_sel,2]          # detection scores, array 2
> d2ds <- d2d[order(d2e)]                         # scores sorted by intensity
> sum(d2ds[-1] - d2ds[-c(length(d2ds))] < 0 )     # count drops between neighbours
[1] 325

This is why negative values were computed inside the square root used to estimate sigma, leaving sigma undefined. It is not a problem with the normexp.fit.detection.p function but with the data.

You can contact the data submitter and ask them to correct this.

Let me know if we can be of any further assistance.

Cheers,
Wei

On Dec 23, 2012, at 12:45 PM, Michele wrote:

> I am trying to process the raw data downloaded from:
> http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-380
>
> When I run the function neqc I get the following error:
>
> Error in if (sigma <= 0) stop("sigma must be positive") :
>   missing value where TRUE/FALSE needed
>
> The problem seems to be in this line of the function normexp.fit.detection.p:
>
> In sqrt(weighted.mean(v, freq) * n/(n - 1))
>
> This happens because the function computes differences between p-values, and some of those differences turn out to be negative.
>
> Following is the code with which I am trying to process the data.
>
> library(beadarray)
> library(limma)
>
> data.dir <- "~/Workspaces/data/E-MTAB-380/E-MTAB-380.raw.1/"
> files <- dir(data.dir)
> sample.name <- sub("\\.txt$", "", files)    # file names without the .txt extension
> group <- ifelse(grepl("RR", sample.name), "MT", "WT")
>
> maqc <- read.ilmn(files = files, path = data.dir, probeid = "Reporter name", other.columns = c("Detection", "Avg_NBEADS"))
>
> colnames(maqc$E) <- sample.name
> colnames(maqc$other$Detection) <- sample.name
> colnames(maqc$other$Avg_NBEADS) <- sample.name
> maqc$targets <- sample.name
>
> maqc.norm <- neqc(maqc, detection.p = "Detection")
>
> How can I overcome this?
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

______________________________________________________________________
The information in this email is confidential and intended solely for the addressee. You must not disclose, forward, print or use it without the permission of the sender.
______________________________________________________________________
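Wei's check above looks at one array at a time. A minimal sketch that applies the same adjacent-pair test to every column at once, assuming an object laid out like `maqc` (intensities in `E`, detection scores in `other$Detection`); `count_detection_inversions` is a hypothetical helper name, not a limma function:

```r
# Count, per array, how often the detection score drops as intensity rises.
# Same adjacent-pair test as Wei's code, applied to every column.
count_detection_inversions <- function(E, detection) {
  stopifnot(identical(dim(E), dim(detection)))
  counts <- vapply(seq_len(ncol(E)), function(j) {
    keep <- !duplicated(E[, j])                       # drop tied intensities, as Wei did
    scores <- detection[keep, j][order(E[keep, j])]   # scores sorted by intensity
    sum(diff(scores) < 0)                             # adjacent pairs where the score drops
  }, integer(1))
  names(counts) <- colnames(E)
  counts
}
```

With `maqc` loaded as in Michele's code, `count_detection_inversions(maqc$E, maqc$other$Detection)` checks all arrays in one call; because the logic is the same, its second entry should match Wei's 325. If a Detection column holds p-values rather than scores, the test becomes `diff(scores) > 0`.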
6.9 years ago, Michele (United States) wrote:
Dear Wei,

Thanks for your reply. I will eliminate the array from the analysis for the time being, and I will contact the submitter, as you suggest, so that they can check the data.

Thanks again,
Michele

On Jan 2, 2013, at 1:03 AM, Wei Shi wrote:
> [...]
6.9 years ago, Michele (United States) wrote:
Correction: I will remove all the arrays that have this problem, not only the second one :)

On Jan 2, 2013, at 1:03 AM, Wei Shi wrote:
> With Illumina bead array data, probes with larger intensities should have an equal or higher detection score (or an equal or lower detection p-value) than probes with lower intensities. However, this is not the case for some of the arrays in this dataset; the second column of your 'maqc' object is one such array.
6.9 years ago, Michele (United States) wrote:
Dear Wei,

I am sorry to bother you again. I contacted the authors who produced the data to ask for help with the dataset, but they were not cooperative. All they told me was "we used BeadStudio and it gave us no errors".

This aside, I would like to know more about the relation between intensity and detection score. Is it a fixed relation, as you point out? If so, where can I read more about it? Is it possible that a probe A, composed of more beads, is more reliable than a probe B with fewer beads, even though the intensity of A is lower than the intensity of B?

Thanks,
Michele

On Jan 2, 2013, at 1:03 AM, Wei Shi wrote:
> [...]
Dear Michele,

Their data on ArrayExpress are not in BeadStudio format, so they cannot be loaded into BeadStudio to check whether it would report errors or not.

The link below points to the Illumina user guide, which describes how detection p-values are calculated (page 106):

http://support.illumina.com/documents/MyIllumina/c94519f7-9348-4308-a32f-b66ff3959e99/GenomeStudio_GX_Module_v1.0_UG_11319121_RevA.pdf

Hope this helps.

Cheers,
Wei

On Jan 12, 2013, at 7:59 AM, michele caseposta wrote:
> [...]
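One way to see why intensity and detection score should move together is a rank-based detection p-value. The sketch below assumes a formulation of the kind described in the Illumina guide Wei cites, p = 1 - R/N, with R the rank of a probe's signal among the N negative-control probes on the same array; the exact rank and tie conventions here are my assumptions, so check the guide itself. `detection_p` is a hypothetical helper, not a limma or beadarray function.

```r
# Hypothetical illustration of a rank-based detection p-value:
# p = 1 - R/N, where R counts the negative-control signals at or below the
# probe's signal and N is the number of negative controls on the array.
detection_p <- function(signal, negctrl) {
  N <- length(negctrl)
  R <- findInterval(signal, sort(negctrl))  # number of controls <= each signal
  1 - R / N
}
```

Because the p-value depends on the signal only through its rank against the same set of controls, sorting an array's probes by intensity must give non-increasing p-values (non-decreasing detection scores, 1 - p), which is the property Wei's check relies on.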
Hi Wei,

I checked the data, and every array has probes with inconsistent detection values. At this point I do not know whether I should trust the data at all. Do you think that removing just the inconsistent probes would suffice? (The submitters made it clear that they DO NOT want to take care of this.)

Thanks for your help,
Michele

On Jan 12, 2013, at 12:59 AM, Wei Shi wrote:
> [...]
Dear Michele,

On Tue, 15 Jan 2013, michele caseposta wrote:

> Hi Wei,
> I checked the data, and every array has probes with inconsistent
> detection values.

That's what I suspected from your emails. If one array was affected, it seemed logical that all would be.

> At this point I do not know if I should trust the data at all. Do you
> think that removing just the inconsistent probes would suffice?

Definitely not. Whatever the creators of the data have done, whether mis-sorting the detection values or pre-processing in some way the expression values presented in the raw file, it is likely to have affected the entries for all the probes in some way. So I would not personally trust the detection values at all.

In limma, it may be wiser to use backgroundCorrect(method="normexp") instead of neqc().

Best wishes
Gordon
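Gordon's suggestion can be sketched as a one-line wrapper. This is a minimal example, assuming `x` is an EListRaw such as `maqc`; `bg_correct_no_detection` is a hypothetical name, and `offset = 16` is my own choice (it matches neqc's default offset), not something stated in the thread.

```r
# Background correction without detection p-values, as Gordon suggests.
# With no separate background matrix in x, limma fits the normexp
# convolution model to each array's intensities directly.
# offset = 16 is an assumption (neqc's default offset), not from the thread.
bg_correct_no_detection <- function(x, offset = 16) {
  limma::backgroundCorrect(x, method = "normexp", offset = offset)
}
```

The result is still on the raw (unlogged) intensity scale and still needs between-array normalization before differential-expression analysis.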
Hi Gordon,

Thanks for your help. I am using backgroundCorrect(...) then. Should I still normalize the data after the background correction? If so, what would be the best way to do it?

On Jan 15, 2013, at 7:07 PM, Gordon K Smyth wrote:
> [...]
Hi Michele, Yes, you should normalize the data after background correction. You can use normalizeBetweenArrays(x, method="quantile") for this. Cheers, Wei On Jan 17, 2013, at 6:44 AM, michele caseposta wrote: > Hi Gordon, > thanks for your help. I am using backgroundCorrect(...) then. > Should I normalize the data anyway after the background correction? > If so, what would be the best way to do it? > > On Jan 15, 2013, at 7:07 PM, Gordon K Smyth wrote: > >> Dear Michele, >> >> On Tue, 15 Jan 2013, michele caseposta wrote: >> >>> Hi Wei, >>> I checked the data, and every array has probes with inconsistent detection values. >> >> That's what I suspected from your emails. If one array was affected, it seemed logical that all would be. >> >>> At this point I do not know if I should trust the data at all. Do you think that removing just the inconsistent probes would suffice? >> >> Definitely not. Whatever the creators of the data have done, whether mis-sorting the detection values or pre-processing the expression values presented in the raw file in some way, it is likely to have affected the entries for all the probes. So I would not personally trust the detection values at all. >> >> In limma, it may be wiser to use backgroundCorrect(method="normexp") instead of neqc(). >> >> Best wishes >> Gordon >> >>> (the submitters made it clear that they DO NOT want to take care of this) >>> >>> Thanks for your help, >>> Michele >>> >>> >>> >>> On Jan 12, 2013, at 12:59 AM, Wei Shi wrote: >>> >>>> Dear Michele, >>>> >>>> Their data on ArrayExpress are not in BeadStudio format, so they could not be loaded into BeadStudio to test whether there are errors or not. >>>> >>>> The link below points to the Illumina user guide, which describes how detection p-values are calculated (page 106). >>>> >>>> http://support.illumina.com/documents/MyIllumina/c94519f7-9348-4308-a32f-b66ff3959e99/GenomeStudio_GX_Module_v1.0_UG_11319121_RevA.pdf >>>> >>>> Hope this helps.
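Wei's answer is to background-correct first and then quantile-normalize. The minimal base-R sketch below shows what the quantile step does to the intensity distributions. It assumes no missing values and breaks ties by position. `quantile_normalize_sketch` is a hypothetical helper, not limma's implementation, which treats missing values and ties differently. The toy data are simulated.

```r
# Minimal base-R sketch of quantile normalization. quantile_normalize_sketch()
# is a hypothetical helper, NOT limma's normalizeBetweenArrays(); it ignores
# NAs and breaks ties by position (ties.method = "first").
quantile_normalize_sketch <- function(x) {
  x <- as.matrix(x)
  ref <- rowMeans(apply(x, 2, sort))                 # reference: mean of the sorted columns
  ranks <- apply(x, 2, rank, ties.method = "first")  # within-column rank of every value
  out <- apply(ranks, 2, function(r) ref[r])         # value -> reference quantile of its rank
  dimnames(out) <- dimnames(x)
  out
}

# Toy example: two "arrays" on very different intensity scales.
set.seed(2)
x <- cbind(a = rexp(6, rate = 1 / 100), b = rexp(6, rate = 1 / 400))
xn <- quantile_normalize_sketch(x)
print(apply(xn, 2, sort))   # both columns now share one distribution
```

After normalization, every column holds the same set of values, and each array keeps its own within-array ordering of probes. In limma itself, the two steps discussed in this thread are `backgroundCorrect(maqc, method = "normexp")` (Gordon's suggestion) followed by `normalizeBetweenArrays(x, method = "quantile")` (Wei's). This sketch only illustrates the idea behind the second call.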