viritha kaza
Hi group,
I am interested in retrieving about 2000 sequences, each defined by a
chromosome, a start position and an end position.
I was thinking of using the BSgenome package for this.
> source("http://bioconductor.org/biocLite.R")
> biocLite(c("BSgenome", "BSgenome.Hsapiens.UCSC.hg19"))
> library(BSgenome)
> library("BSgenome.Hsapiens.UCSC.hg19")
> myseq <- getSeq(Hsapiens, "chr2", start=10000, end=10020)
# This works for specific values.
# So I tried to take the chromosome names from a data frame, built like this:
> full <- as.matrix(read.table("test_seq.txt", sep="\t", quote="", header=TRUE, as.is=TRUE))
> full.df <- data.frame(full)
# test_seq contains this information
#chromosome Start End
#chr2 10000 10020
#chr3 10000 10020
> myseq<- getSeq(Hsapiens,full.df$chromosome,start=10000,end=10020)
# But when I also use start=full.df$Start, it throws an error saying
# 'start' must be a vector of integers.
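One likely cause, sketched here in base R on an inline copy of the sample table above (a stand-in for test_seq.txt, not the real file): as.matrix() on a table that has a character column turns every column into character, so after data.frame() the Start column is character or factor rather than integer. Reading directly into a data frame should keep Start and End as integers.

```r
# Inline copy of the sample table; stands in for test_seq.txt.
txt <- "chromosome\tStart\tEnd\nchr2\t10000\t10020\nchr3\t10000\t10020"

# Route used in the post: the matrix step coerces all columns to character.
via_matrix <- data.frame(as.matrix(
  read.table(text = txt, sep = "\t", quote = "", header = TRUE, as.is = TRUE)))
is_int_before <- is.integer(via_matrix$Start)

# Direct route: read.table() already returns a data frame with typed columns.
direct <- read.table(text = txt, sep = "\t", quote = "", header = TRUE,
                     as.is = TRUE)
is_int_after <- is.integer(direct$Start)

# If a column has already been coerced, it can be converted back explicitly.
start_fixed <- as.integer(as.character(via_matrix$Start))

print(c(before = is_int_before, after = is_int_after))
```

With integer Start and End columns, passing start=direct$Start and end=direct$End to getSeq() should no longer trigger the type complaint.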
Questions:
How do I handle this?
Does 'start' here mean that the positions on each chromosome are numbered from 1?
How do I split the retrieved sequences and write them out in FASTA format
(with a '>' header line), naming each sequence after the corresponding row
of my input file?
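For the FASTA part, a minimal base R sketch of one possible approach. The sequences below are made-up placeholders, the header layout chr:start-end is just one naming choice, and the getSeq() call in the comment is an assumption about how the real sequences would be obtained; it is not run here.

```r
# Regions as they would be read from the input file.
regions <- data.frame(chromosome = c("chr2", "chr3"),
                      Start = c(10000L, 10000L),
                      End = c(10020L, 10020L),
                      stringsAsFactors = FALSE)

# Placeholder sequences, invented for illustration only. With BSgenome
# loaded they might instead come from something like:
#   seqs <- getSeq(Hsapiens, regions$chromosome, start = regions$Start,
#                  end = regions$End, as.character = TRUE)
seqs <- c("ACGTACGTACGTACGTACGTA", "TTGACCATGGCATTAGCCATG")

# One header per region, built from the input columns.
headers <- sprintf(">%s:%d-%d", regions$chromosome, regions$Start, regions$End)

# Interleave header and sequence lines: header 1, seq 1, header 2, seq 2, ...
fasta_lines <- as.vector(rbind(headers, seqs))

# Write to a temporary file; replace with a real path as needed.
out_file <- tempfile(fileext = ".fa")
writeLines(fasta_lines, out_file)
```

BSgenome coordinates are 1-based and inclusive, so a region from 10000 to 10020 yields 21 bases, which is why the placeholders are 21 characters long. If the sequences are kept as a DNAStringSet rather than character strings, setting names() on it and using the FASTA writer in Biostrings would be an alternative to writeLines().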
>sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BSgenome.Hsapiens.UCSC.hg19_1.3.17 BSgenome_1.20.0
[3] Biostrings_2.20.1 GenomicRanges_1.4.6
[5] IRanges_1.10.4
loaded via a namespace (and not attached):
[1] tools_2.13.0
Waiting for your suggestions,
Thank you,
Viritha