Count files do not correspond to the flattened annotation file
so2346 • 0
@so2346-7210
Last seen 9.3 years ago
United States

Hi, 

I'm getting the error in the subject line using DEXSeq version 1.12.1, explained in detail below. I preprocessed the files and generated the count data using the two Python scripts provided with the package.

My flattened GFF started out as an iGenomes UCSC mouse mm10 GTF, which I flattened using dexseq_prepare_annotation.py. (I know you recommend Ensembl, but I would have had to realign all my BAM files.) I generated the count files using dexseq_count.py -p 'yes' -s 'no' -f 'bam'.

I then removed the 5 lines of 'unmapped' info from all count files:
_ambiguous    0
_ambiguous_readpair_position    0
_empty    32653
_lowaqual    0
_notaligned    0

Then I did a count:

wc -l count_file =  216656

grep -c "exonic_part" flattened_file =  216656

It seemed OK, I thought.
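Matching line counts only show that both files hold the same number of records, not that the feature IDs agree. A minimal sketch of an ID-level comparison follows. It assumes each count line looks like `gene_id:exonic_part_number<TAB>count` and that each `exonic_part` line of the flattened file carries `gene_id "..."` and `exonic_part_number "..."` attributes in column 9. The function name `ids_missing_from_annotation` is hypothetical, and this is not necessarily the exact check DEXSeq performs internally.

```shell
#!/bin/sh
# List feature IDs that occur in a count file but not in the flattened annotation.
# Assumed formats (not verified against the files in this thread):
#   count file:      <gene_id>:<exonic_part_number><TAB><count>
#   flattened file:  GFF lines whose 3rd column is "exonic_part" and whose
#                    9th column contains gene_id "..." and exonic_part_number "..."
ids_missing_from_annotation() {
    count_file=$1
    flattened_file=$2
    awk -F '\t' '
        # First file: the flattened annotation. Build gene_id:exonic_part_number.
        NR == FNR {
            if ($3 == "exonic_part" &&
                match($9, /gene_id "[^"]+"/)) {
                gene = substr($9, RSTART + 9, RLENGTH - 10)
                if (match($9, /exonic_part_number "[^"]+"/)) {
                    part = substr($9, RSTART + 20, RLENGTH - 21)
                    known[gene ":" part] = 1
                }
            }
            next
        }
        # Second file: the counts. Skip summary lines that start with "_".
        $1 !~ /^_/ && !($1 in known) { print $1 }
    ' "$flattened_file" "$count_file"
}
```

Calling `ids_missing_from_annotation count_file flattened_file` prints one ID per line for every count row without a matching annotation entry. Empty output means every ID in the count file was found in the flattened file under the assumptions above.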

I then ran the following code in R, which produced the error:
dxd = DEXSeqDataSetFromHTSeq(
        countsFiles,
        sampleData=sample_names,
        design= ~ sample + exon + condition:exon,
        flattenedfile=flattened_gtf )

Error in DEXSeqDataSetFromHTSeq(countsFiles, sampleData = sample_names,  : 
  Count files do not correspond to the flattened annotation file

sample_names is a 2-column data.frame of samples and a condition for each sample.

I'd appreciate any help you might provide other than to realign using Ensembl and use their gtf :)

Thanks & A happy new year to all,  
Sean.
=================================================================

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DEXSeq_1.12.1             BiocParallel_1.0.0        DESeq2_1.6.3              RcppArmadillo_0.4.550.1.0 Rcpp_0.11.3              
 [6] GenomicRanges_1.18.3      GenomeInfoDb_1.2.4        IRanges_2.0.1             S4Vectors_0.4.0           Biobase_2.26.0           
[11] BiocGenerics_0.12.1      

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3      annotate_1.44.0      AnnotationDbi_1.28.1 base64enc_0.1-2      BatchJobs_1.5        BBmisc_1

Alejandro Reyes ★ 1.9k
@alejandro-reyes-5124
Last seen 1 day ago
Novartis Institutes for BioMedical Rese…

Hi,

There is no need to delete those 5 lines manually; the function "DEXSeqDataSetFromHTSeq" removes them automatically.

Alejandro
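For anyone who still wants to inspect the feature rows on their own, here is a small sketch that prints a count file without the summary rows and leaves the file itself unchanged. It assumes the summary rows are exactly those whose first field starts with an underscore, as in the five lines quoted in the question. The function name `counts_without_summary` is hypothetical.

```shell
# Print a count file without the summary rows, leaving the file itself untouched.
# Assumption: summary rows are exactly those lines that start with "_".
counts_without_summary() {
    grep -v '^_' "$1"
}
```

For example, `counts_without_summary count_file | wc -l` gives the number of feature rows while the original file keeps its summary lines.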
