Entering edit mode
Hi Tiandao,
I'm glad getSeq() is throwing an error instead of silently trying
to return something when the start/end are out of limits :-)
We supported hg16 (NCBI34) until BioC 2.1 (R-2.6). Starting with
BioC 2.2 (R-2.7, released in April 2008), we dropped it and kept
only hg17 and hg18. Note that hg16 is 5.5 years old now (from
Jul. 2003):
http://genome.ucsc.edu/FAQ/FAQreleases#release1
I would recommend you try to get a position file with positions
relative to hg17 or hg18 if possible. If this is not an option,
then you can always build your own BSgenome data package for hg16.
The process of building a BSgenome data package is pretty
straightforward and is documented in the BSgenomeForge vignette
from the BSgenome software package. If you don't need to include
masks then the process is even simpler. Let me know if you need
help with this.
Cheers,
H.
Tiandao Li wrote:
> Hi Herve,
>
> Thanks for your kind help. The position file my colleague sent was
> included some position out of chromosome limits of current release
of
> human genome (NCBI36). And the positions on human genome were
calculated
> with NCBI34. If there any human genome annotation package based on
NCBI34?
>
> Thanks,
>
> Tiandao
>
> On Fri, Jan 30, 2009 at 2:00 PM, Hervé Pagès <hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org="">> wrote:
>
> Tiandao Li wrote:
>
>
>
> On Fri, Jan 30, 2009 at 1:04 PM, Hervé Pagès <hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org="">>> wrote:
>
> Hi Tiando,
>
> Have you checked the 'start' and 'end' you get when i =
5263?
> They are
> probably out of limits.
>
> > unique(chr6[,6])[5262]
> [1] 122701977
> > unique(chr6[,6])[5263]
> [1] 122701996
> > unique(chr6[,6])[5264]
> [1] 39870984
>
>
> Try to isolate the problem with something like:
>
> i <- 5263
> start <- unique(chr6[,6])[i]
> end <- start + 1
> getSeq(Hsapiens, "chr6", start, end)
>
> > getSeq(Hsapiens, "chr6",
> unique(chr6[,6])[5262],(unique(chr6[,6])[5262]+1))
> [1] "CA"
> > getSeq(Hsapiens, "chr6",
> unique(chr6[,6])[5263],(unique(chr6[,6])[5263]+1))
> [1] "TG"
> > getSeq(Hsapiens, "chr6",
> unique(chr6[,6])[5264],(unique(chr6[,6])[5264]+1))
> [1] "TC"
> >
>
>
> So calling getSeq() outside the for() loop seems to work
> for i = 5262, 5263 and 5264. Are you sure your for() loop
> failed for i = 5263?
>
>
>
>
> Do you have 1 <= start <= end <= length(Hsapiens$chr6) ?
>
> NO
>
>
> Well, that's the problem then. getSeq() will fail if the
start/end are
> not within the limits of the sequence.
> But what I don't understand is why you are saying NO. The
start/end
> you get are 122701996 and 122701997 (those are the numbers you
are
> showing
> above) and length(Hsapiens$chr6) is 170899992. Don't you agree
that
> 1 <= 122701996 <= 122701997 <= 170899992 ?
>
> Another thing I don't understand is that in your first post the
> error message
> you got was showing a start value of 63219120 (but that's
apparently
> not the
> start value corresponding to i = 5263).
>
>
>
>
> I've modified the subseq() function so the error mesg you
get
> is a
> little
> bit more informative (subseq() is defined in the IRanges
> package and
> called by getSeq()). After you update to IRanges 1.0.11
> (available
> within
> 24 hours), you will get:
>
> > getSeq(Hsapiens, 'chr6', start=170899994,
end=170899995)
>
> > getSeq(Hsapiens, 'chr6', start=170899994, end=170899995)
> Error in solveUserSEW(length(x), start = start, end = end,
width
> = width) :
> solving row 1: 'allow.nonnarrowing' is FALSE and the
supplied
> start (170899994) is > refwidth + 1
>
>
> Yes this is exactly what to expect because here the start/end
are out of
> limits. You'll see the improved error mesg printed by getSeq()
when you
> update IRanges to 1.0.11 (available in about 24 hours).
>
>
> > sessionInfo()
> other attached packages:
> [1] BSgenome.Hsapiens.UCSC.hg18_1.3.11 BSgenome_1.10.3
> [3] Biostrings_2.10.15 IRanges_1.0.9
>
>
>
> Error in solveUserSEW(length(x), start = start, end =
end,
> width =
> width) :
> solving row 1: 'allow.nonnarrowing' is FALSE and the
supplied
> start (170899994) is > refwidth + 1
> Error in subseq(unmasked(x), start = start, end = end,
width
> = width) :
> Invalid sequence coordinates.
> Are you sure the supplied 'start', 'end' and 'width'
arguments
> are defining a region that is within the limits of the
> sequence?
>
> YES!
>
>
> Well no, not in your call to getSeq above (start=170899994,
> end=170899995),
> sorry!
>
> All this confusion could be avoided if you sent me some stand-
alone
> code that I can run myself in order to reproduce the problem.
>
> Cheers,
>
> H.
>
>
>
>
> However, on thing is puzzling me. The error mesg you get
> shows that
> the start is 63219120 and that is of course within the
limits
> of Human
> chr6 (this chromosome is 170899992 long in hg18), so
maybe
> something
> else
> is going on...
>
> Cheers,
> H.
>
>
> Tiandao Li wrote:
>
> Hello,
>
> I used getSeq to retrieve human genomic sequences
from the
> BSgenome data
> package:
>
> library(BSgenome.Hsapiens.UCSC.hg18)
>
> # cpg is GTF file
> chr6 <- cpg[grep("6", cpg[[5]]),] # chrom 6
> length(unique(chr6[,6])) # 10683 genomic positions
>
> # loop to match "CG" and print 100bp to file
> for (i in 1:10683)
> {
> if (getSeq(Hsapiens, "chr6",
> unique(chr6[,6])[i],(unique(chr6[,6])[i]+1))=="CG")
> {
> write.table(paste(">",
>
unique(chr6[,6])[i],sep=""),file="chr6FOR",row.names=F,
> col.names=F,
> append=TRUE)
> write.table(getSeq(Hsapiens, "chr6",
>
> (unique(chr6[,6])[i]-50),(unique(chr6[,6])[i]+50)),file="ch
r6FOR",row.names=F,
> col.names=F, append=TRUE)
> }
> }
>
> However, I always got the error meg like the
following
> when i = 5263
>
> Error in solveUserSEW(length(x), start = start, end =
> end, width
> = width) :
> solving row 1: 'allow.nonnarrowing' is FALSE and the
> supplied start
> (63219120) is > refwidth + 1
>
> Please let me know if there's something I should be
paying
> attention to.
>
> Thanks!
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> <mailto:bioconductor at="" stat.math.ethz.ch="">
> <mailto:bioconductor at="" stat.math.ethz.ch=""> <mailto:bioconductor at="" stat.math.ethz.ch="">>
>
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
>
>
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> -- Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org <mailto:hpages at="" fhcrc.org="">
> <mailto:hpages at="" fhcrc.org="" <mailto:hpages="" at="" fhcrc.org="">>
>
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org <mailto:hpages at="" fhcrc.org="">
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319