BSgenome.Hsapiens.UCSC.hg19 Sequence missing
@lakshmanan-iyer-1829
United States

Hi

I downloaded the latest bsgenome...hg19. Many of the chromosomes appear to have only "N" as their sequence.

However, chrM and some others show regular letters.

Am I missing something here?

More information below:

> library ("BSgenome.Hsapiens.UCSC.hg19")
> genome <- BSgenome.Hsapiens.UCSC.hg19
> genome$chr1
  249250621-letter "DNAString" instance
seq: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

> genome$chrM
  16571-letter "DNAString" instance
seq: GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTGTGCACGCGAT...ACTGTATCCGACATCTGGTTCCTACTTCAGGGCCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAATAAGACATCACGATG

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BiocInstaller_1.18.4              BSgenome.Hsapiens.UCSC.hg19_1.4.0
 [3] BSgenome_1.36.3                   rtracklayer_1.28.10              
 [5] Biostrings_2.36.4                 XVector_0.8.0                    
 [7] GenomicRanges_1.20.6              GenomeInfoDb_1.4.2               
 [9] IRanges_2.2.7                     S4Vectors_0.6.5                  
[11] BiocGenerics_0.14.0              

loaded via a namespace (and not attached):
 [1] XML_3.98-1.3            Rsamtools_1.20.4        bitops_1.0-6           
 [4] GenomicAlignments_1.4.1 futile.options_1.0.0    zlibbioc_1.14.0        
 [7] futile.logger_1.4.1     lambda.r_1.1.7          BiocParallel_1.2.21    
[10] tools_3.2.2             RCurl_1.95-4.7         

@martin-morgan-1513
United States

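Most chromosomes start with many N's, reflecting our imperfect knowledge of telomeres. The printed summary of a DNAString shows only the first and last few letters, so a chromosome that begins and ends with N's looks as if it were entirely N. Here you can see that the first 100 nucleotides are N, but the 100 nucleotides starting at position 100000 are not.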

> which = GRanges("chr1", IRanges(c(1, 100000), width=100))
> getSeq(BSgenome.Hsapiens.UCSC.hg19, which)
  A DNAStringSet instance of length 2
    width seq
[1]   100 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
[2]   100 CACTAAGCACACAGAGAATAATGTCTAGAATCTG...GCAGTCACACAGGCTGACATGTAAGCATCGCCA
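
To check how much of a chromosome is actually N rather than relying on the truncated printout, one option is to count letters directly. The following is a minimal sketch, not part of the original answer: `n_fraction` and `toy` are hypothetical names introduced for illustration, and the genome part runs only if the hg19 package is installed.

```r
library(Biostrings)

# Hypothetical helper: fraction of positions in a DNAString that are "N".
n_fraction <- function(seq) {
  unname(letterFrequency(seq, "N", as.prob = TRUE))
}

# Toy sequence mimicking a chromosome with unsequenced ends.
toy <- DNAString("NNNNACGTACGTNNNN")
print(n_fraction(toy))

# The same check on the real chromosome (skipped if the package is missing).
if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)) {
  library(BSgenome.Hsapiens.UCSC.hg19)
  chr1 <- BSgenome.Hsapiens.UCSC.hg19$chr1
  # Counts of A, C, G, T, and everything else ("other", which includes N).
  print(alphabetFrequency(chr1, baseOnly = TRUE))
  print(n_fraction(chr1))
}
```

If the fraction for chr1 is well below 1, the sequence is present and only the displayed ends are N.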

 
