getSeq human genomic sequences
3
0
Entering edit mode
Tiandao Li ▴ 60
@tiandao-li-3161
Last seen 9.7 years ago
Hello, I used getSeq to retrieve human genomic sequences from the BSgenome data package: library(BSgenome.Hsapiens.UCSC.hg18) # cpg is GTF file chr6 <- cpg[grep("6", cpg[[5]]),] # chrom 6 length(unique(chr6[,6])) # 10683 genomic positions # loop to match "CG" and print 100bp to file for (i in 1:10683) { if (getSeq(Hsapiens, "chr6", unique(chr6[,6])[i],(unique(chr6[,6])[i]+1))=="CG") { write.table(paste(">", unique(chr6[,6])[i],sep=""),file="chr6FOR",row.names=F, col.names=F, append=TRUE) write.table(getSeq(Hsapiens, "chr6", (unique(chr6[,6])[i]-50),(unique(chr6[,6])[i]+50)),file="chr6FOR",row. names=F, col.names=F, append=TRUE) } } However, I always got the error meg like the following when i = 5263 Error in solveUserSEW(length(x), start = start, end = end, width = width) : solving row 1: 'allow.nonnarrowing' is FALSE and the supplied start (63219120) is > refwidth + 1 Please let me know if there's something I should be paying attention to. Thanks! [[alternative HTML version deleted]]
• 2.9k views
ADD COMMENT
0
Entering edit mode
Tiandao Li ▴ 60
@tiandao-li-3161
Last seen 9.7 years ago
Hello, I used the same codes for chr22, it didn't show any error msg. #chr22 for (i in 1:17183) { if (getSeq(Hsapiens, "chr22", unique(chr22[,6])[i],(unique(chr22[,6])[i]+1))=="CG") { write.table(paste(">", unique(chr22[,6])[i],sep=""),file="chr22FOR",row.names=F, col.names=F, append=TRUE) write.table(getSeq(Hsapiens, "chr22", (unique(chr22[,6])[i]-50),(unique(chr22[,6])[i]+50)),file="chr22FOR",r ow.names=F, col.names=F, append=TRUE) } } > sessionInfo() R version 2.8.1 (2008-12-22) i486-pc-linux-gnu locale: LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US .UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_N AME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTI FICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] BSgenome.Hsapiens.UCSC.hg18_1.3.11 BSgenome_1.10.3 [3] Biostrings_2.10.15 IRanges_1.0.9 loaded via a namespace (and not attached): [1] grid_2.8.1 lattice_0.17-20 Matrix_0.999375-18 tools_2.8.1 > On Thu, Jan 29, 2009 at 9:11 PM, Tiandao Li <litd99@gmail.com> wrote: > Hello, > > I used getSeq to retrieve human genomic sequences from the BSgenome data > package: > > library(BSgenome.Hsapiens.UCSC.hg18) > > # cpg is GTF file > chr6 <- cpg[grep("6", cpg[[5]]),] # chrom 6 > length(unique(chr6[,6])) # 10683 genomic positions > > # loop to match "CG" and print 100bp to file > for (i in 1:10683) > { > if (getSeq(Hsapiens, "chr6", > unique(chr6[,6])[i],(unique(chr6[,6])[i]+1))=="CG") > { > write.table(paste(">", > unique(chr6[,6])[i],sep=""),file="chr6FOR",row.names=F, col.names=F, > append=TRUE) > write.table(getSeq(Hsapiens, "chr6", > (unique(chr6[,6])[i]-50),(unique(chr6[,6])[i]+50)),file="chr6FOR",ro w.names=F, > col.names=F, append=TRUE) > } > } > > However, I always got the error meg like the following when i = 5263 > > Error in solveUserSEW(length(x), start = start, end = end, width = width) : > > solving row 1: 'allow.nonnarrowing' is FALSE and the supplied start > (63219120) is > refwidth + 1 > > Please let me know if there's something I should be paying attention to. > > Thanks! > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
@herve-pages-1542
Last seen 3 days ago
Seattle, WA, United States
Hi Tiando, Have you checked the 'start' and 'end' you get when i = 5263? They are probably out of limits. Try to isolate the problem with something like: i <- 5263 start <- unique(chr6[,6])[i] end <- start + 1 getSeq(Hsapiens, "chr6", start, end) Do you have 1 <= start <= end <= length(Hsapiens$chr6) ? I've modified the subseq() function so the error mesg you get is a little bit more informative (subseq() is defined in the IRanges package and called by getSeq()). After you update to IRanges 1.0.11 (available within 24 hours), you will get: > getSeq(Hsapiens, 'chr6', start=170899994, end=170899995) Error in solveUserSEW(length(x), start = start, end = end, width = width) : solving row 1: 'allow.nonnarrowing' is FALSE and the supplied start (170899994) is > refwidth + 1 Error in subseq(unmasked(x), start = start, end = end, width = width) : Invalid sequence coordinates. Are you sure the supplied 'start', 'end' and 'width' arguments are defining a region that is within the limits of the sequence? However, on thing is puzzling me. The error mesg you get shows that the start is 63219120 and that is of course within the limits of Human chr6 (this chromosome is 170899992 long in hg18), so maybe something else is going on... Cheers, H. Tiandao Li wrote: > Hello, > > I used getSeq to retrieve human genomic sequences from the BSgenome data > package: > > library(BSgenome.Hsapiens.UCSC.hg18) > > # cpg is GTF file > chr6 <- cpg[grep("6", cpg[[5]]),] # chrom 6 > length(unique(chr6[,6])) # 10683 genomic positions > > # loop to match "CG" and print 100bp to file > for (i in 1:10683) > { > if (getSeq(Hsapiens, "chr6", > unique(chr6[,6])[i],(unique(chr6[,6])[i]+1))=="CG") > { > write.table(paste(">", > unique(chr6[,6])[i],sep=""),file="chr6FOR",row.names=F, col.names=F, > append=TRUE) > write.table(getSeq(Hsapiens, "chr6", > (unique(chr6[,6])[i]-50),(unique(chr6[,6])[i]+50)),file="chr6FOR",ro w.names=F, > col.names=F, append=TRUE) > } > } > > However, I always got the error meg like the following when i = 5263 > > Error in solveUserSEW(length(x), start = start, end = end, width = width) : > solving row 1: 'allow.nonnarrowing' is FALSE and the supplied start > (63219120) is > refwidth + 1 > > Please let me know if there's something I should be paying attention to. > > Thanks! > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M2-B876 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD COMMENT
0
Entering edit mode
@gy852507309-12626
Last seen 7.2 years ago
Norway/bergen

Make sure the reference genome U aligned with and the genome for finding the coverage must be the same

ADD COMMENT

Login before adding your answer.

Traffic: 517 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6