Question: Count files do not correspond to the flattened annotation file
so23460 (United States) wrote, 3.9 years ago:


I'm getting the error in the subject line using DEXSeq version 1.12.1, as explained in detail below. Preprocessing of the files and generation of the count data were carried out with the two Python scripts provided with the package.

My annotation started out as the iGenomes UCSC mouse mm10 GTF (I know ye recommend Ensembl, but I would have had to realign all my BAM files), which I flattened with the script provided with the package. Next, I generated count files using -p 'yes' -s 'no' -f 'bam'.

I then removed the 5 lines of 'unmapped' info from all count files:
_ambiguous    0
_ambiguous_readpair_position    0
_empty    32653
_lowaqual    0
_notaligned    0

Then I did a count:

wc -l count_file =  216656

grep -c "exonic_part" flattened_file =  216656

It seemed OK, I thought.
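
Matching line counts only show that the two files have the same number of rows, not that every count-file ID has a matching exonic part in the flattened file. A minimal shell sketch of an ID-level comparison follows. It assumes count-file lines of the form ID<TAB>count, where the ID is gene_id:exonic_part_number (possibly wrapped in double quotes), and a flattened GFF whose attribute column contains gene_id "..." and exonic_part_number "..."; the function name check_ids is hypothetical and not part of DEXSeq.

```shell
#!/bin/sh
# Hypothetical helper (not part of DEXSeq): list count-file IDs that have no
# matching exonic part in the flattened GFF.
# Assumes count lines look like  ID<TAB>count  (ID optionally double-quoted)
# and flattened-GFF attributes contain gene_id "..." and exonic_part_number "...".
set -eu
export LC_ALL=C   # same collation for sort and comm

check_ids() {
  counts=$1
  gff=$2
  tmpdir=$(mktemp -d)

  # Count-file IDs: strip quotes, keep column 1, drop "_"-prefixed summary rows.
  tr -d '"' < "$counts" | cut -f1 | grep -v '^_' | sort -u > "$tmpdir/counts"

  # Flattened-GFF IDs: gene_id:exonic_part_number for every exonic_part row.
  awk -F'\t' '$3 == "exonic_part" {
      g = $9; sub(/.*gene_id "/, "", g); sub(/".*/, "", g)
      e = $9; sub(/.*exonic_part_number "/, "", e); sub(/".*/, "", e)
      print g ":" e
  }' "$gff" | sort -u > "$tmpdir/gff"

  # Print IDs present in the count file but absent from the annotation.
  comm -23 "$tmpdir/counts" "$tmpdir/gff"
  rm -rf "$tmpdir"
}
```

Usage: check_ids count_file flattened_file. Under the stated assumptions, empty output means every count row has a partner in the annotation; any printed ID points at a mismatch between the count file and the flattened file it is being paired with.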

I then ran the following code in R to generate the error:
dxd = DEXSeqDataSetFromHTSeq(
        countsFiles,
        sampleData = sample_names,
        design = ~ sample + exon + condition:exon,
        flattenedfile = flattened_gtf )

Error in DEXSeqDataSetFromHTSeq(countsFiles, sampleData = sample_names,  : 
  Count files do not correspond to the flattened annotation file

sample_names is a 2-column data.frame of samples and a condition for each sample.

I'd appreciate any help you can offer, other than realigning with Ensembl and using their GTF :)

Thanks, and a happy new year to all,

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DEXSeq_1.12.1             BiocParallel_1.0.0        DESeq2_1.6.3              RcppArmadillo_0.4.550.1.0 Rcpp_0.11.3              
 [6] GenomicRanges_1.18.3      GenomeInfoDb_1.2.4        IRanges_2.0.1             S4Vectors_0.4.0           Biobase_2.26.0           
[11] BiocGenerics_0.12.1      

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3      annotate_1.44.0      AnnotationDbi_1.28.1 base64enc_0.1-2      BatchJobs_1.5        BBmisc_1

Answer: Alejandro Reyes (Dana-Farber Cancer Institute, Boston, USA) wrote, 3.8 years ago:


There is no need to delete those 5 lines manually; the function DEXSeqDataSetFromHTSeq removes them automatically.
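
Because the summary rows are dropped by the function itself, the line-count check from the question can be run on unedited count files by applying the same kind of filter. A minimal shell sketch, assuming the summary rows (_ambiguous, _empty, _lowaqual and so on) are the only IDs that begin with an underscore, optionally after a leading double quote; count_exon_rows is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: count exon rows in a dexseq_count.py output file,
# skipping summary rows whose ID starts with "_" (optionally after a quote),
# so the count file can be left unedited.
set -eu

count_exon_rows() {
  # -E: extended regex, -v: non-matching lines, -c: print the count.
  # "|| true" keeps a count of 0 from being treated as a failure.
  grep -Evc '^"?_' "$1" || true
}
```

The result can then be compared with grep -c "exonic_part" flattened_file, as in the question.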

