Hi everyone,
I am trying to extract the 1000 bp upstream of every gene in Solanum pennellii. I know I need to build a TxDb object and a BSgenome package for this. First, I downloaded the genome in FASTA format and the GFF file from the NCBI Genome database:
http://www.ncbi.nlm.nih.gov/genome/?term=solanum+pennellii
After that, I used this Perl script to split the S. pennellii genome into chromosomes, and built the BSgenome package from these 12 chromosomes.
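#!/usr/bin/perl
use strict;
use warnings;

my $f = $ARGV[0];    # input multi-FASTA file name
open(my $in, '<', $f)
    or die "Can't open: $f $!";
my $out;             # handle of the per-sequence file currently being written
while (my $line = <$in>) {
    chomp $line;
    if ($line =~ /^>/) {    # a FASTA header line starts a new sequence
        close $out if $out;
        # file name is the whole header line minus '>', plus ".fa"
        my $new_file = substr($line, 1) . ".fa";
        open($out, '>', $new_file)
            or die "Can't open: $new_file $!";
    }
    print {$out} "$line\n" if $out;
}
close $out if $out;
close $in;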
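For anyone who would rather stay in R, the same split can be sketched in base R. This is only a sketch, not the script used above: the function name split_fasta and the out_dir argument are made up for illustration, and unlike the Perl script it names each file after the first word of the header (the accession) rather than the whole header line. It reads the whole file into memory, so it needs enough RAM for the genome.

```r
# Hypothetical helper: write one FASTA file per sequence of a multi-FASTA file.
split_fasta <- function(fasta_path, out_dir = ".") {
  lines <- readLines(fasta_path)
  is_header <- grepl("^>", lines)
  if (!any(is_header)) stop("no FASTA headers found in ", fasta_path)

  # Each line belongs to the most recent header seen (0 = before any header)
  record <- cumsum(is_header)
  keep <- record > 0

  # Assumption: the first whitespace-delimited token of the header is the ID
  ids <- sub("^>(\\S+).*$", "\\1", lines[is_header])

  chunks <- split(lines[keep], record[keep])
  out_files <- file.path(out_dir, paste0(ids, ".fa"))
  for (i in seq_along(chunks)) {
    writeLines(chunks[[i]], out_files[i])
  }
  invisible(out_files)
}

# Usage (path is a placeholder):
# split_fasta("genome.fna", out_dir = "chromosomes")
```

Whichever script is used, the resulting file names have to agree with the seqnames listed in the seed file.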
__________________________________________
Then I created the seed file and forged the BSgenome package following the instructions. There were no errors when checking the built package.
Package:BSgenome.Slycopersicum.ensembl.Heinz1706
Title:Full genome sequences for Solanum lycopersicum
Description:Full genome sequences for Solanmum lycopersicum provided by ensembl
Version:1.0
organism:Solanum lycopersicum
common_name:Tomato
provider:ensembl
provider_version:Heinz1706
release_date:Jan.2016
release_name:S.lyco Genome Sequencing
source_url:ftp://ftp.ensemblgenomes.org/pub/plants/release-30/fasta/solanum_lycopersicum/dna/
organism_biocview: Solanum_lycopersicum
BSgenomeObjname: Slycopersicum
seqnames: paste("chr", c(1:12), sep="")
circ_seqs: NULL
mseqnames: NULL
SrcDataFiles:ftp://ftp.ensemblgenomes.org/pub/plants/release-30/fasta/solanum_lycopersicum/dna/
PkgExamples: genome$GL896898 # same as genome[["GL896898"]]
seqs_srcdir: ~/Desktop/S.lyco
_________________________
Unfortunately, when I ran the code below, test returned something odd, and seqinfo() returned an error too:
library("BSgenome.Spennellii.ncbi.SPENNV200")
test<-library("BSgenome.Spennellii.ncbi.SPENNV200")
> test
[1] "BSgenome.Athaliana.TAIR.TAIR9"
[2] "TxDb.Athaliana.BioMart.plantsmart28"
[3] "GenomicFeatures"
[4] "AnnotationDbi"
[5] "Biobase"
[6] "BSgenome.Spennellii.ncbi.SPENNV200"
[7] "BSgenome"
[8] "rtracklayer"
[9] "Biostrings"
[10] "XVector"
[11] "GenomicRanges"
[12] "GenomeInfoDb"
[13] "IRanges"
[14] "S4Vectors"
[15] "stats4"
[16] "BiocGenerics"
[17] "parallel"
[18] "stats"
[19] "graphics"
[20] "grDevices"
[21] "utils"
[22] "datasets"
[23] "methods"
[24] "base"
> seqinfo(test)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘seqinfo’ for signature ‘"character"’
_______________________
For the TxDb object, I used the code below, and the result looks wrong to me as well:
gtffile <- file.path("~/Desktop/s.pen/gff/GCF_001406875.1_SPENNV200_genomic.gff")
makeTxDbFromGFF(gtffile)
txdb <- makeTxDbFromGFF(gtffile)
seqinfo(txdb)
Seqinfo object with 12 sequences from an unspecified genome; no seqlengths:
  seqnames    seqlengths isCircular genome
  NC_028637.1       <NA>       <NA>   <NA>
  NC_028638.1       <NA>       <NA>   <NA>
  NC_028639.1       <NA>       <NA>   <NA>
  NC_028640.1       <NA>       <NA>   <NA>
  NC_028641.1       <NA>       <NA>   <NA>
  ...                ...        ...    ...
  NC_028644.1       <NA>       <NA>   <NA>
  NC_028645.1       <NA>       <NA>   <NA>
  NC_028646.1       <NA>       <NA>   <NA>
  NC_028647.1       <NA>       <NA>   <NA>
  NC_028648.1       <NA>       <NA>   <NA>
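Once both objects load, the 1000 bp upstream extraction could look roughly like the sketch below. It rests on assumptions that are not confirmed in this post: that the BSgenome package was forged with the same sequence names as the GFF (the NC_0286xx.1 accessions), and that the genome object is reachable under the package name. The variable names are only illustrative.

```r
library(GenomicFeatures)
library(BSgenome.Spennellii.ncbi.SPENNV200)

# Assumption: the BSgenome object is exported under the package name
genome <- BSgenome.Spennellii.ncbi.SPENNV200

gff_file <- "~/Desktop/s.pen/gff/GCF_001406875.1_SPENNV200_genomic.gff"
txdb <- makeTxDbFromGFF(gff_file)

# The annotation and the genome must use the same sequence names
missing_seqs <- setdiff(seqlevels(txdb), seqlevels(genome))
if (length(missing_seqs) > 0) {
  stop("not in the BSgenome object: ", paste(missing_seqs, collapse = ", "))
}

gene_ranges <- genes(txdb)

# The TxDb has no seqlengths, so copy them from the genome; trim() needs them
# to clip upstream windows that run off the end of a chromosome
seqlengths(gene_ranges) <- seqlengths(genome)[seqlevels(gene_ranges)]

# Strand-aware 1000 bp windows directly upstream of each gene
upstream <- trim(promoters(gene_ranges, upstream = 1000, downstream = 0))

upstream_seqs <- getSeq(genome, upstream)
```

If the names differ (for example chr1 to chr12 in the genome versus accessions in the GFF), one of the two objects would need its seqlevels renamed before getSeq() can work.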
Can anybody tell me what I might have done wrong? Thank you very much.
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] BSgenome.Athaliana.TAIR.TAIR9_1.3.1000
 [2] TxDb.Athaliana.BioMart.plantsmart28_3.2.2
 [3] GenomicFeatures_1.22.12
 [4] AnnotationDbi_1.32.3
 [5] Biobase_2.30.0
 [6] BSgenome.Spennellii.ncbi.SPENNV200_1.0
 [7] BSgenome_1.38.0
 [8] rtracklayer_1.30.1
 [9] Biostrings_2.38.3
[10] XVector_0.10.0
[11] GenomicRanges_1.22.4
[12] GenomeInfoDb_1.6.3
[13] IRanges_2.4.6
[14] S4Vectors_0.8.11
[15] BiocGenerics_0.16.1

loaded via a namespace (and not attached):
 [1] zlibbioc_1.16.0            GenomicAlignments_1.6.3
 [3] BiocParallel_1.4.3         tools_3.2.3
 [5] SummarizedExperiment_1.0.2 DBI_0.3.1
 [7] lambda.r_1.1.7             futile.logger_1.4.1
 [9] futile.options_1.0.0       bitops_1.0-6
[11] biomaRt_2.26.1             RCurl_1.95-4.7
[13] RSQLite_1.0.0              Rsamtools_1.22.0
[15] XML_3.98-1.3

Instead of saying
test <- library("BSgenome.Spennellii.ncbi.SPENNV200")
say
library("BSgenome.Spennellii.ncbi.SPENNV200")
test = BSgenome.Spennellii.ncbi.SPENNV200
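The reason is that library() does not return the object inside the package. Its value is the character vector of currently attached package names, which matches the list printed in the question and explains the "character" signature in the seqinfo() error. A small base-R check, using the stats package as a stand-in because the forged package is not available everywhere:

```r
# library() invisibly returns the names of all attached packages
attached <- library("stats")

class(attached)        # a character vector, not a genome object
"stats" %in% attached  # the package just requested is among the names
```

So seqinfo() has to be called on the object the package exports, not on the value returned by library().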