Hi,
I'm getting the error in the subject line using DEXSeq version 1.12.1, explained in detail below. Preprocessing of files and count data generation was carried out using the two Python scripts provided with the package.
My flattened GFF started out as an iGenomes UCSC mouse mm10 GTF, which I flattened using dexseq_prepare_annotation.py. (I know you recommend Ensembl, but I would have had to realign all my BAM files.) I generated the count files using dexseq_count.py -p yes -s no -f bam.
I then removed the 5 lines of 'unmapped' info from all count files:
_ambiguous 0
_ambiguous_readpair_position 0
_empty 32653
_lowaqual 0
_notaligned 0
Then I did a count:
wc -l count_file = 216656
grep -c "exonic_part" flattened_file = 216656
It seemed OK, I thought.
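Matching line counts only show that both files have the same number of entries, not that the feature IDs themselves agree, so a stricter check would be to compare the ID sets directly. Below is a minimal Python sketch of that check. It assumes the count-file IDs have the form gene_id:exonic_part_number and that the exonic_part lines of the flattened file carry gene_id and exonic_part_number attributes; the function names are hypothetical and not part of DEXSeq.

```python
import re


def gff_exon_ids(gff_lines):
    """Collect 'gene_id:exonic_part_number' IDs from exonic_part lines."""
    ids = set()
    for line in gff_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip comments, aggregate_gene lines, and malformed lines.
        if len(fields) < 9 or fields[2] != "exonic_part":
            continue
        gene = re.search(r'gene_id "([^"]+)"', fields[8])
        part = re.search(r'exonic_part_number "([^"]+)"', fields[8])
        if gene and part:
            ids.add(f"{gene.group(1)}:{part.group(1)}")
    return ids


def count_file_ids(count_lines):
    """Collect feature IDs from a count file, skipping '_' summary lines."""
    ids = set()
    for line in count_lines:
        line = line.strip()
        if not line:
            continue
        # Assumption: the count is the last whitespace-separated field.
        feature = line.rsplit(None, 1)[0].strip('"')
        if feature.startswith("_"):
            continue
        ids.add(feature)
    return ids


def compare_ids(gff_lines, count_lines):
    """Return (IDs only in the GFF, IDs only in the count file), sorted."""
    gff_ids = gff_exon_ids(gff_lines)
    cnt_ids = count_file_ids(count_lines)
    return sorted(gff_ids - cnt_ids), sorted(cnt_ids - gff_ids)
```

Calling compare_ids(open(flattened_file), open(count_file)) with the real paths would list any IDs present in one file but not the other; two empty lists would mean the ID sets agree.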
I then ran the following code in R, which produced the error:
dxd = DEXSeqDataSetFromHTSeq(
    countsFiles,
    sampleData=sample_names,
    design= ~ sample + exon + condition:exon,
    flattenedfile=flattened_gtf )
Error in DEXSeqDataSetFromHTSeq(countsFiles, sampleData = sample_names, :
Count files do not correspond to the flattened annotation file
sample_names is a 2-column data.frame of samples and a condition for each sample.
I'd appreciate any help you might provide other than to realign using Ensembl and use their gtf :)
Thanks, and a happy new year to all,
Sean.
=================================================================
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] DEXSeq_1.12.1 BiocParallel_1.0.0 DESeq2_1.6.3 RcppArmadillo_0.4.550.1.0 Rcpp_0.11.3
[6] GenomicRanges_1.18.3 GenomeInfoDb_1.2.4 IRanges_2.0.1 S4Vectors_0.4.0 Biobase_2.26.0
[11] BiocGenerics_0.12.1
loaded via a namespace (and not attached):
[1] acepack_1.3-3.3 annotate_1.44.0 AnnotationDbi_1.28.1 base64enc_0.1-2 BatchJobs_1.5 BBmisc_1