Hi, I was trying to import the count result from HTSeq Count into DESeq2.
# inside of "HTSeq_Count_ChIPpeak_H3K27ac_geneX.csv" "Samplename", "countfile", "genotype", "compartment" "H3K27ac_WTCellB","ChIPpeakWTCellB","WT","CellB" "H3K27ac_geneXcKOCellA","ChIPpeakgeneXcKOCellA","geneXcKO","CellA" "H3K27ac_geneXcKOCellC","ChIPpeakgeneXcKOCellC","geneXcKO","CellC" "H3K27ac_WTCellC","ChIPpeakWTCellC","WT","CellC" "H3K27ac_geneXcKOCellB","ChIPpeakgeneXcKOCellB","geneXcKO","CellB" "H3K27ac_WTCellA","ChIPpeakWTCellA","WT","CellA" > sampleTable <- read.csv(file="HTSeq_Count_ChIPpeak_H3K27ac_geneX.csv") > ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, + directory = directory, + design= ~ genetype + compartment)
met error:
Error in DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, : Gene IDs (first column) differ between files. In addition: Warning messages: 1: In is.na(e1) | is.na(e2) : longer object length is not a multiple of shorter object length .... .... 10: In `==.default`(a$V1, l[[1]]$V1) : longer object length is not a multiple of shorter object length
"Gene IDs (first column) differ between files."
What is "Gene IDs" indicated here? I have only gff file without "gene_id"
Any suggestion ?
Update:
I checked the example :
recreated sample like:
> sampleTable Samplename countfile genotype compartment 1 ChIPpeakWTCellB ChIPpeakWTCellB WT CellB 2 ChIPpeakgeneXcKOCellA ChIPpeakgeneXcKOCellA geneXcKO CellA 3 ChIPpeakgeneXcKOCellC ChIPpeakgeneXcKOCellC geneXcKO CellC 4 ChIPpeakWTCellC ChIPpeakWTCellC WT CellC 5 ChIPpeakgeneXcKOCellB ChIPpeakgeneXcKOCellB geneXcKO CellB 6 ChIPpeakWTCellA ChIPpeakWTCellA WT CellA
but re run the process, did not make any difference.
I also run the example with the package "papilla" with no problem. So I guess R or R studio is OK...
PSS: the count file was generated by HTSeq Count:
gt bed_to_gff3 -o ChIPpeak.gff ChIPpeak.bed # convert ChIPpeak.bed file into gff by genometools http://genometools.org/tools.html # count ChIP reads fall into genomic regions defined by ChIPpeak.gff python -m HTSeq.scripts.count -f bam -s no -t BED_feature -i Name -o ChIPpeakGeneXcKOCellA H3K27ac_GeneXcKOCellA.bam ChIPpeak.gff
here is R information:
> sessionInfo() R version 3.3.2 (2016-10-31) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: OS X El Capitan 10.11.4 locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base other attached packages: [1] DESeq2_1.12.4 RColorBrewer_1.1-2 DiffBind_2.0.9 [4] SummarizedExperiment_1.2.3 Biobase_2.32.0 GenomicRanges_1.24.3 [7] GenomeInfoDb_1.8.7 IRanges_2.6.1 S4Vectors_0.10.3 [10] BiocGenerics_0.18.0 loaded via a namespace (and not attached): [1] Category_2.38.0 bitops_1.0-6 tools_3.3.2 [4] backports_1.0.5 R6_2.2.0 rpart_4.1-10 [7] KernSmooth_2.23-15 Hmisc_4.0-2 DBI_0.5-1 [10] lazyeval_0.2.0 colorspace_1.3-2 nnet_7.3-12 [13] gridExtra_2.2.1 sendmailR_1.2-1 graph_1.50.0 [16] htmlTable_1.9 rtracklayer_1.32.2 caTools_1.17.1 [19] scales_0.4.1 checkmate_1.8.2 BatchJobs_1.6 [22] genefilter_1.54.2 RBGL_1.48.1 stringr_1.1.0 [25] digest_0.6.12 Rsamtools_1.24.0 foreign_0.8-67 [28] AnnotationForge_1.14.2 XVector_0.12.1 base64enc_0.1-3 [31] htmltools_0.3.5 limma_3.28.21 htmlwidgets_0.8 [34] RSQLite_1.1-2 BBmisc_1.10 GOstats_2.38.1 [37] hwriter_1.3.2 BiocParallel_1.6.6 gtools_3.5.0 [40] acepack_1.4.1 dplyr_0.5.0 RCurl_1.95-4.8 [43] magrittr_1.5 GO.db_3.3.0 Formula_1.2-1 [46] Matrix_1.2-8 Rcpp_0.12.9 munsell_0.4.3 [49] stringi_1.1.2 edgeR_3.14.0 zlibbioc_1.18.0 [52] gplots_3.0.1 fail_1.3 plyr_1.8.4 [55] grid_3.3.2 gdata_2.17.0 lattice_0.20-34 [58] Biostrings_2.40.2 splines_3.3.2 GenomicFeatures_1.24.5 [61] annotate_1.50.1 locfit_1.5-9.1 knitr_1.15.1 [64] rjson_0.2.15 systemPipeR_1.6.4 geneplotter_1.50.0
It is because of some lines at the end of your count files. Open your count file and scroll down to the bottom. You will see some lines that are not counts. Delete them and save. Reimport your counts to the R and run the code. Now it should be working.