Question

Ringo - finding enriched regions

Hans-Ulrich Klein ▴ 330

@hans-ulrich-klein-1945

Last seen 12 months ago

United States

Hello,

I am confused about the results returned from the "findChersOnSmoothed" function in the Ringo package. I have an ExpressionSet object storing normalized log ratios (ChIP / Control) from three replicates. I use this analysis workflow:

> eSetS = computeRunningMedians(eSet, probeAnno, modColumn="type",
      winHalfSize=400, min.probes=5, combineReplicates=TRUE)
[...]
> y0 = upperBoundNull(exprs(eSetS), prob=0.99)
> chers = findChersOnSmoothed(eSetS, probeAnno, thresholds=y0,
      distCutOff=600, minProbesInRow=3)

Surprisingly, the first enriched region does not contain any probe intensity above the threshold y0. This applies to many regions called enriched.

> chers[[1]]
BCR_ABL.chr1.cher1
Chr 1 : 10001787 - 10002329
Antibody : BCR_ABL
Maximum level = 1.665789
Score = 9.486747
Spans 15 probes.
> y0
[1] 0.7279903
> dim(eSetS)
Features  Samples
 4212009        1
> exprs(eSetS[chers[[1]]@probes,])
         BCR_ABL
112645 0.2140274
112646 0.2469170
112647 0.2485301
112648 0.2501433
112649 0.2765225
112650 0.2813286
112651 0.2803291
112652 0.2727159
112653 0.2469170
112654 0.2469170
112655 0.1166212
112656 0.2355814
112657 0.2355814
112658 0.1608379
112659 0.2063285

Did I check the correct probes? Shouldn't the intensities be > 0.727?

My Ringo version is 1.8.0.

Thanks in advance,
Hans-Ulrich

--
Hans-Ulrich Klein
Department of Medical Informatics and Biomathematics
University of Münster
Domagkstrasse 9
48149 Münster, Germany
Tel.: +49 (0)251 83-58405
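The mismatch described above can be restated as a quick check. This is only a sketch in base R that re-enters the numbers printed in the post by hand; the variable names (region_values, reported_max, and so on) are illustrative and not part of Ringo.

```r
# Threshold and smoothed values copied from the output shown in the post.
y0 <- 0.7279903
region_values <- c(0.2140274, 0.2469170, 0.2485301, 0.2501433, 0.2765225,
                   0.2813286, 0.2803291, 0.2727159, 0.2469170, 0.2469170,
                   0.1166212, 0.2355814, 0.2355814, 0.1608379, 0.2063285)
reported_max <- 1.665789   # "Maximum level" printed for chers[[1]]

# How many of the retrieved values exceed the threshold?
n_above <- sum(region_values > y0)

# Does the largest retrieved value agree with the reported maximum level?
max_seen <- max(region_values)
consistent <- isTRUE(all.equal(max_seen, reported_max))
```

If the retrieved rows really belonged to the region, one would expect n_above to be positive and the largest value to match the reported maximum level. Neither holds here, which suggests the wrong rows are being looked up rather than that the region was called incorrectly.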

Ringo Ringo • 1.9k views

ADD COMMENT • link updated 16.2 years ago by Joern Toedling ▴ 450 • written 16.2 years ago by Hans-Ulrich Klein ▴ 330

score 0 · Answer 1 · 2009-10-20

Joern Toedling ▴ 450

@joern-toedling-3465

Last seen 11.3 years ago

Hello,

I suspect that there is some issue with converting vectors between different formats and the identifiers of your probes (the 'featureNames' of the ExpressionSet) here. The actual way to obtain those intensities with version 1.8.0 should be

exprs(eSetS)[as.numeric(chers[[1]]@probes),]

Please let me know if this does not give the expected results.

However, I admit that providing indices as a character vector for the probes slot was not necessary and rather misleading. Thus I have made slight changes to the function and provided an additional method 'probes' which allows you to obtain a character vector of probe names from each ChIP-enriched region without having to access any slots directly.

These changes can be found in the current development version 1.9.15, which you can obtain from the Bioconductor repository tomorrow, and will also be in the new release version (Ringo 1.10.0) at the end of this month.

With the new version, the following is the preferred way for obtaining the values:

exprs(eSetS)[probes(chers[[1]]),]

Hope this helps.

Best regards,
Joern

On Mon, 19 Oct 2009 12:05:03 +0200, Hans-Ulrich Klein wrote
> I am confused about the results returned from the
> "findChersOnSmoothed" function in the Ringo package.
> [...]

---
Joern Toedling
Institut Curie -- U900
26 rue d'Ulm, 75005 Paris, FRANCE
Tel. +33 (0)156246926

ADD COMMENT • link 16.2 years ago Joern Toedling ▴ 450

Dear Joern,

the feature names of my ExpressionSet instance are:

> all(featureNames(eSetS) == as.character(1:nrow(eSetS)))
[1] TRUE

So in my case both expressions

> exprs(eSetS)[as.numeric(chers[[1]]@probes),]

and

> exprs(eSetS)[chers[[1]]@probes,]

return the same probes, which have log ratios smaller than y0, as described in my original message.

Best wishes,
Hans-Ulrich

--
Hans-Ulrich Klein
Department of Medical Informatics and Biomathematics
University of Münster
Domagkstrasse 9
48149 Münster, Germany
Tel.: +49 (0)251 83-58405
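The two indexing expressions compared above coincide only when every row name equals its row position. A minimal base R sketch of the general case follows. The matrix x is a made-up stand-in for exprs(eSetS), and probe_ids is a made-up stand-in for the character vector in the probes slot.

```r
# Toy matrix standing in for exprs(eSetS). The row names are probe
# identifiers that deliberately differ from the row positions
# (hypothetical data, not from the thread).
x <- matrix(c(0.1, 0.2, 0.9, 1.2), ncol = 1,
            dimnames = list(c("3", "4", "1", "2"), "BCR_ABL"))

# Identifiers stored as character strings, as in the probes slot.
probe_ids <- c("1", "2")

# A character index is matched against the row names.
by_name <- x[probe_ids, , drop = FALSE]

# A numeric index is interpreted as a row position.
by_position <- x[as.numeric(probe_ids), , drop = FALSE]
```

Here by_name holds the rows named "1" and "2", while by_position holds the first two rows, which are named "3" and "4". When the feature names are as.character(1:nrow(...)), as reported in this reply, the two results are identical, so this conversion cannot be what makes the values fall below y0.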

ADD REPLY • link 16.2 years ago Hans-Ulrich Klein ▴ 330

Dear Hans-Ulrich,

in that case, I am afraid I cannot immediately tell you what the source of the problem is. You are right, the smoothed probe intensities of these probes should all be greater than y0, and in my analyses I have never observed anything else.

What do the ChIP-enriched regions look like when you plot them, for example via

plot(chers[[1]], eSetS, probeAnno)

If these plots indicate correct results, then at least the positions of your enriched regions seem to be correct and the problem is with assigning the probe identifiers to the enriched regions.

There might be an issue with your probeAnno object and the way you generate it. What is the result of

probeAnno["1.index"][probeAnno["1.start"]>=10001787 & probeAnno["1.start"]<=10002329]

? These probe identifiers should include the ones in the first enriched region.

I would suggest using probe names other than "as.character" of the row numbers. Due to R's implicit conversion between vector formats, such names could lead to all sorts of hard-to-debug problems.

If you provide me with a short excerpt of your data and the example script, I could have a deeper look into it to see where the problem might be.

Best regards,
Joern

On Tue, 20 Oct 2009 21:10:20 +0200, Hans-Ulrich Klein wrote
> the feature names of my ExpressionSet instance are:
> [...]

---
Joern Toedling
Institut Curie -- U900
26 rue d'Ulm, 75005 Paris, FRANCE
Tel. +33 (0)156246926
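The position check suggested above boils down to selecting the probes whose start coordinate falls inside the region and comparing them with the probes attached to the region. The sketch below uses plain vectors with made-up values. starts and indices are hypothetical stand-ins for probeAnno["1.start"] and probeAnno["1.index"], and region_probes is a hypothetical stand-in for the probes stored with chers[[1]].

```r
# Hypothetical stand-ins for probeAnno["1.start"] and probeAnno["1.index"].
starts  <- c(10001700, 10001787, 10001900, 10002329, 10002400)
indices <- c("p1", "p2", "p3", "p4", "p5")

# Region boundaries as printed for chers[[1]] in the thread.
region_start <- 10001787
region_end   <- 10002329

# Probes whose start position lies within the region (inclusive bounds).
in_region <- starts >= region_start & starts <= region_end
probes_by_position <- indices[in_region]

# Probes reported for the region (hypothetical values). Every one of them
# should also be found by position.
region_probes <- c("p2", "p3", "p4")
all_found <- all(region_probes %in% probes_by_position)
```

If all_found is FALSE on real data, the identifiers attached to the region do not correspond to the probes at those coordinates, which would point at the annotation object and not at the region-calling step.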

ADD REPLY • link 16.2 years ago Joern Toedling ▴ 450

Dear Joern,

your guess was right. It was an issue with my probeAnno object. I created the probeAnno this way (reads is an AlignedRead object with all my probes):

pos = data.frame(CHROMOSOME=chromosome(reads),
                 PROBE_ID=as.character(id(reads)),
                 POSITION=position(reads),
                 LENGTH=width(reads))
probeAnno = posToProbeAnno(pos, genome="Mus_musculus.NCBIM37.55",
                           microarrayPlatform="mm.prompr.v02")

Adding the parameter stringsAsFactors=FALSE to the data.frame() function solved my problem. Without that parameter the "X.index" in my probeAnno were factors.

Thanks,
Hans-Ulrich

--
Hans-Ulrich Klein
Department of Medical Informatics and Biomathematics
University of Münster
Domagkstrasse 9
48149 Münster, Germany
Tel.: +49 (0)251 83-58405
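Why a factor can silently point at the wrong probes is easy to show in base R. The sketch below is not the Ringo code path, and the thread does not say exactly where the conversion happened. It only illustrates the general pitfall under the assumption that a factor of identifiers was converted to numbers at some point. The identifiers are made up.

```r
# Toy probe identifiers stored as character strings (hypothetical values).
ids <- c("10", "9", "100", "2")

# Request factors explicitly to mimic the data.frame() default that applied
# before R 4.0.0, and build a second frame that keeps plain strings.
pos_factor <- data.frame(PROBE_ID = ids, stringsAsFactors = TRUE)
pos_char   <- data.frame(PROBE_ID = ids, stringsAsFactors = FALSE)

# Converting a factor to integer yields its internal level codes,
# not the numbers spelled out by the labels.
codes_from_factor <- as.integer(pos_factor$PROBE_ID)

# Converting the character column yields the intended values.
values_from_char <- as.integer(pos_char$PROBE_ID)

# Going through as.character() first recovers the labels from a factor.
values_recovered <- as.integer(as.character(pos_factor$PROBE_ID))
```

codes_from_factor contains small integers between 1 and the number of levels, so using it as a row index selects unrelated rows without raising an error. Since R 4.0.0 the default is stringsAsFactors = FALSE, so the original call would behave differently on a current installation. Passing the argument explicitly keeps the behaviour the same across versions.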

ADD REPLY • link 16.2 years ago Hans-Ulrich Klein ▴ 330