Entering edit mode
sekawaiwai2006
▴
20
@sekawaiwai2006-9531
Last seen 9.2 years ago
Hello
Below are the codes that i used in order extract the upstream 1000 bp for every genes in A.thaliana. however, i get an error at the end when using the command :up1000seqs <- getSeq(genome, up1000), does anyone know what is the problem here??
thx
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘getSeq’ for signature ‘"character"’
My code is
library("TxDb.Athaliana.BioMart.plantsmart28") txdb <- TxDb.Athaliana.BioMart.plantsmart28 genome<-library("BSgenome.Athaliana.TAIR.TAIR9") gn <- sort(genes(txdb)) up1000 <- flank(gn, width=1000) up1000seqs <- getSeq(genome, up1000)
Session info is
> sessionInfo() R version 3.2.3 (2015-12-10) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 14.04.3 LTS locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] BSgenome.Athaliana.TAIR.TAIR9_1.3.1000 [2] BSgenome_1.38.0 [3] rtracklayer_1.30.1 [4] Biostrings_2.38.3 [5] XVector_0.10.0 [6] TxDb.Scerevisiae.UCSC.sacCer2.sgdGene_3.2.2 [7] TxDb.Athaliana.BioMart.plantsmart28_3.2.2 [8] GenomicFeatures_1.22.8 [9] AnnotationDbi_1.32.3 [10] Biobase_2.30.0 [11] GenomicRanges_1.22.3 [12] GenomeInfoDb_1.6.2 [13] IRanges_2.4.6 [14] S4Vectors_0.8.7 [15] BiocGenerics_0.16.1 loaded via a namespace (and not attached): [1] zlibbioc_1.16.0 GenomicAlignments_1.6.3 [3] BiocParallel_1.4.3 tools_3.2.3 [5] SummarizedExperiment_1.0.2 DBI_0.3.1 [7] lambda.r_1.1.7 futile.logger_1.4.1 [9] futile.options_1.0.0 bitops_1.0-6 [11] RCurl_1.95-4.7 biomaRt_2.26.1 [13] RSQLite_1.0.0 Rsamtools_1.22.0 [15] XML_3.98-1.3
so i have changed the code now
now the error appears like this:
I should have sat on my fingers and let others respond! The problem is that the sequence (chromosome) names in the TxDb are different from the sequence names in the BSgenome.
This could be because they are from the same genome build but using different naming conventions, or because the genome builds are different. From the following
it looks like the naming convention is different. One can rename seqlevels on either object (I think...)
and then have some success
I guess this occurs because some genes are within 1000 nucleotides of the start / end of the sequence, so the flank extends outside the chromosome bounds.
Following the advice
Returning to the warning message, I went looking for flanking sequences that were outside the end of the sequence length