Hi,
I received read counts for each gene for each BAM file which was generated by HTseq-count.
```
augustus_masked-lcl_ScwjSwM_1-processed-gene-0.2 0
augustus_masked-lcl_ScwjSwM_100-processed-gene-0.0 1
augustus_masked-lcl_ScwjSwM_1000-processed-gene-0.1 0
augustus_masked-lcl_ScwjSwM_1000-processed-gene-0.3 0
augustus_masked-lcl_ScwjSwM_1000-processed-gene-1.13 1
augustus_masked-lcl_ScwjSwM_1000-processed-gene-2.0 0
```
Looking at this https://f1000research.com/articles/5-1438/v2 they combined all BAM files and then ran `fc <- featureCounts(all.bam, annot.inbuilt="mm10")`
Is it possible to combine all the counts from each BAM file and create an compatible table for edgeR?
Thank you in advance
This script htseq-combine_all.R script describes how to merge HTSeq results. Is it the best way to do it?
The script seems very complicated. Have you tried the solution I suggested, which is just one line instead of a whole script?
Hi, Sorry for not posting earlier you suggestion with readDGE but I got the same error:
Error in DGEList(data[, -1], group = group, genes = data[, 1, drop = FALSE]) :
Length of 'group' must equal number of columns in 'counts'
# where are we?
> basedir <- "~/projects/mel/Sporobolus_pyramadalis/"
> setwd(basedir)
>
> cntdir <- paste(basedir, "Htseq_read_counts", sep="/")
> pat <- "_001.sorted.out"
> files <- list.files(path = cntdir,
+ pattern = pat,
+ all.files = TRUE,
+ recursive = FALSE,
+ ignore.case = FALSE,
+ full.names = TRUE,
+ include.dirs = FALSE)
>
> data <- readDGE(files, header=FALSE)
Meta tags detected: __no_feature, __ambiguous, __too_low_aQual, __not_aligned, __alignment_not_unique
> targets <- read.delim("phenoSp.txt", stringsAsFactors=FALSE)
> ## ----group---------------------------------------------------------------
> group <- paste(targets$CellType, targets$Status, sep=".")
> group <- factor(group)
> table(group)
group
Leaves.1.5 Leaves.2 Leaves.3 Root_tip.1.5 Root_tip.2 Root_tip.3
4 4 4 4 4 4
>
> library(edgeR)
> y <- DGEList(data[,-1], group=group,
+ genes=data[,1,drop=FALSE])
Error in DGEList(data[, -1], group = group, genes = data[, 1, drop = FALSE]) :
Length of 'group' must equal number of columns in 'counts'
What did I miss?
Well, it's not actually a question of what you've missed, but what you're doing that you don't need to. readDGE() already created a DGEList for you, so you don't need to run DGEList(). Why are you throwing out the first sample by data[,-1]? Naturally that will leave you with one column short.
Just skip that -- you can start doing analysis directly on the DGEList that you got from readDGE().
Thank you. I have tried to follow https://f1000research.com/articles/5-1438/v2 but I will try look for examples how to do it without DGEList.
Thank you for your help it works like advertise:
y <- readDGE(files, header=FALSE, group = group)
keep <- rowSums(cpm(y)>1) >= 2