Error in DESeqDataSetFromHTSeqCount: Gene IDs (first column) differ between files.
2
0
Entering edit mode
JunLVI ▴ 40
@junlvi-8996
Last seen 6.3 years ago
Japan

Hi, I was trying to import the count result from HTSeq Count into DESeq2. 

# inside of "HTSeq_Count_ChIPpeak_H3K27ac_geneX.csv"
"Samplename", "countfile", "genotype", "compartment"
"H3K27ac_WTCellB","ChIPpeakWTCellB","WT","CellB"
"H3K27ac_geneXcKOCellA","ChIPpeakgeneXcKOCellA","geneXcKO","CellA"
"H3K27ac_geneXcKOCellC","ChIPpeakgeneXcKOCellC","geneXcKO","CellC"
"H3K27ac_WTCellC","ChIPpeakWTCellC","WT","CellC"
"H3K27ac_geneXcKOCellB","ChIPpeakgeneXcKOCellB","geneXcKO","CellB"
"H3K27ac_WTCellA","ChIPpeakWTCellA","WT","CellA"

> sampleTable <- read.csv(file="HTSeq_Count_ChIPpeak_H3K27ac_geneX.csv")
> ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
+                                       directory = directory,
+                                        design= ~ genetype + compartment)

met error: 

Error in DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory,  : 
  Gene IDs (first column) differ between files.
In addition: Warning messages:
1: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length
  ....
  ....
10: In `==.default`(a$V1, l[[1]]$V1) :
  longer object length is not a multiple of shorter object length

 

"Gene IDs (first column) differ between files." 

What is "Gene IDs" indicated here? I have only gff file without "gene_id" 

Any suggestion ? 

Update: 

I checked the example 

recreated sample like:

> sampleTable
               Samplename             countfile  genotype compartment
1         ChIPpeakWTCellB       ChIPpeakWTCellB        WT       CellB
2   ChIPpeakgeneXcKOCellA ChIPpeakgeneXcKOCellA  geneXcKO       CellA
3   ChIPpeakgeneXcKOCellC ChIPpeakgeneXcKOCellC  geneXcKO       CellC
4         ChIPpeakWTCellC       ChIPpeakWTCellC        WT       CellC
5   ChIPpeakgeneXcKOCellB ChIPpeakgeneXcKOCellB  geneXcKO       CellB
6         ChIPpeakWTCellA       ChIPpeakWTCellA        WT       CellA​

but re run the process, did not make any difference. 

I also run the example  with the package "papilla" with no problem. So I guess R or R studio is OK...

PSS: the count file was generated by HTSeq Count: 

gt bed_to_gff3 -o ChIPpeak.gff ChIPpeak.bed # convert ChIPpeak.bed file into gff by genometools http://genometools.org/tools.html
# count ChIP reads fall into genomic regions defined by ChIPpeak.gff                                   
python -m HTSeq.scripts.count -f bam -s no -t BED_feature -i Name -o ChIPpeakGeneXcKOCellA  H3K27ac_GeneXcKOCellA.bam ChIPpeak.gff


​here is R information:

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan 10.11.4

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] DESeq2_1.12.4              RColorBrewer_1.1-2         DiffBind_2.0.9         
[4] SummarizedExperiment_1.2.3 Biobase_2.32.0             GenomicRanges_1.24.3   
[7] GenomeInfoDb_1.8.7         IRanges_2.6.1              S4Vectors_0.10.3       
[10] BiocGenerics_0.18.0       

loaded via a namespace (and not attached):
[1] Category_2.38.0         bitops_1.0-6            tools_3.3.2         
[4] backports_1.0.5         R6_2.2.0                rpart_4.1-10        
[7] KernSmooth_2.23-15      Hmisc_4.0-2             DBI_0.5-1           
[10] lazyeval_0.2.0          colorspace_1.3-2        nnet_7.3-12         
[13] gridExtra_2.2.1         sendmailR_1.2-1         graph_1.50.0        
[16] htmlTable_1.9           rtracklayer_1.32.2      caTools_1.17.1      
[19] scales_0.4.1            checkmate_1.8.2         BatchJobs_1.6       
[22] genefilter_1.54.2       RBGL_1.48.1             stringr_1.1.0       
[25] digest_0.6.12           Rsamtools_1.24.0        foreign_0.8-67      
[28] AnnotationForge_1.14.2  XVector_0.12.1          base64enc_0.1-3     
[31] htmltools_0.3.5         limma_3.28.21           htmlwidgets_0.8     
[34] RSQLite_1.1-2           BBmisc_1.10             GOstats_2.38.1      
[37] hwriter_1.3.2           BiocParallel_1.6.6      gtools_3.5.0        
[40] acepack_1.4.1           dplyr_0.5.0             RCurl_1.95-4.8      
[43] magrittr_1.5            GO.db_3.3.0             Formula_1.2-1       
[46] Matrix_1.2-8            Rcpp_0.12.9             munsell_0.4.3       
[49] stringi_1.1.2           edgeR_3.14.0            zlibbioc_1.18.0     
[52] gplots_3.0.1            fail_1.3                plyr_1.8.4          
[55] grid_3.3.2              gdata_2.17.0            lattice_0.20-34     
[58] Biostrings_2.40.2       splines_3.3.2           GenomicFeatures_1.24.5
[61] annotate_1.50.1         locfit_1.5-9.1          knitr_1.15.1        
[64] rjson_0.2.15            systemPipeR_1.6.4       geneplotter_1.50.0     

 

chipseq deseq2 htseqcounts • 3.8k views
ADD COMMENT
1
Entering edit mode

It is because of some lines at the end of your count files. Open your count file and scroll down to the bottom. You will see some lines that are not counts. Delete them and save. Reimport your counts to the R and run the code. Now it should be working.

ADD REPLY
1
Entering edit mode
JunLVI ▴ 40
@junlvi-8996
Last seen 6.3 years ago
Japan

I did not figure out how to using DESeqDataSetFromHTSeqCount to input my data into DESeq2, 

but tried "DESeqDataSetFromMatrix", and it worked out, I put it here, in case someone met similar issue:

source("https://bioconductor.org/biocLite.R")
biocLite("Rsubread") 
library("Rsubread")
# count the reads using featureCounts from "Rsubread"
ChIPpeakK27ac_Count <- featureCounts(files=c("H3K27ac_geneXcKOCellA.bam",
                      "H3K27ac_geneXcKOCellC.bam",
                      "H3K27ac_WTCellB.bam",
                      "H3K27ac_geneXcKOCellB.bam",
                      "H3K27ac_WTCellA.bam",
                      "H3K27ac_WTCellC.bam"),annot.ext="ChIPpeak.gtf",
                       isGTFAnnotationFile=TRUE,
                       GTF.featureType="peak",
                       GTF.attrType="gene_id",
                       nthreads=4)

# you need to create a file "H3K27ac_geneX_sampletable.csv"
H3K27ac_geneXcKOCellA.bam    
H3K27ac_geneXcKOCellC.bam        
H3K27ac_WTCellB.bam
H3K27ac_geneXcKOCellB.bam        
H3K27ac_WTCellA.bam        
H3K27ac_WTCellC.bam

sample.table <- read.csv("H3K27ac_geneX_sample.csv",header=T)
dds.fc <- DESeqDataSetFromMatrix(ChIPpeakK27ac_Count$counts,
                                 colData=sample.table, 
                                 design=~ genotype + compartment)

 

ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 9 hours ago
United States

The error says that the count files have different length. Do they?

ADD COMMENT
0
Entering edit mode

@Michael Love, thank you for your suggestion. I am not so sure what " count files have different length" means:

here is the size of the files as the out put of HTSeq count ( I assume those are count files), 

and the inside of the files (only 4 rows are shown), if that gives you some ideas or I have completely misunderstood, please kindly let me know. 

-rw-r--r--  1 myusername  staff   473467378 Feb 19 21:38 ChIPpeakgeneXcKOCellA
-rw-r--r--  1 myusername  staff   506264216 Feb 19 23:10 ChIPpeakgeneXcKOCellB
-rw-r--r--  1 myusername  staff   269994476 Feb 19 22:42 ChIPpeakgeneXcKOCellC
-rw-r--r--  1 myusername  staff   578858748 Feb 19 23:28 ChIPpeakWTCellA
-rw-r--r--  1 myusername  staff   293150217 Feb 19 23:01 ChIPpeakWTCellB
-rw-r--r--  1 myusername  staff   283239052 Feb 19 23:25 ChIPpeakWTCellC

shunkis-MacBook-Pro:H3K27ac_geneX myusername$ less ChIPpeakgeneXcKOCellA 

        XF:Z:__no_feature
        XF:Z:__no_feature
        XF:Z:__no_feature
        XF:Z:__no_feature
        
        
shunkis-MacBook-Pro:H3K27ac_geneX myusername$ less ChIPpeakgeneXcKOCellB 

        XF:Z:__no_feature
        XF:Z:__no_feature
        XF:Z:__no_feature
        XF:Z:__no_feature

shunkis-MacBook-Pro:H3K27ac_geneX myusername$ less ChIPpeakgeneXcKOCellC

        XF:Z:__no_feature
        XF:Z:__no_feature
        XF:Z:__no_feature
        XF:Z:__no_feature
        
shunkis-MacBook-Pro:H3K27ac_geneX myusername$ less ChIPpeakWTCellA 

        XF:Z:__no_feature
        XF:Z:__no_feature
        XF:Z:__no_feature
        XF:Z:__no_feature

shunkis-MacBook-Pro:H3K27ac_geneX myusername$ less ChIPpeakWTCellB 

        XF:Z:__no_feature
        XF:Z:__no_feature
        XF:Z:__no_feature
        XF:Z:__no_feature

shunkis-MacBook-Pro:H3K27ac_geneX myusername$ less ChIPpeakWTCellC 

        XF:Z:__no_feature
        XF:Z:__no_featuref
        XF:Z:__no_feature
        XF:Z:__no_featuref

 

 

 

ADD REPLY

Login before adding your answer.

Traffic: 724 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6