read counts for each gene for edgeR
1
0
Entering edit mode
mictadlo ▴ 10
@mictadlo-10885
Last seen 4.2 years ago

Hi,

I received read counts for each gene for each BAM file which was generated by HTseq-count.

```

augustus_masked-lcl_ScwjSwM_1-processed-gene-0.2        0
augustus_masked-lcl_ScwjSwM_100-processed-gene-0.0      1
augustus_masked-lcl_ScwjSwM_1000-processed-gene-0.1     0
augustus_masked-lcl_ScwjSwM_1000-processed-gene-0.3     0
augustus_masked-lcl_ScwjSwM_1000-processed-gene-1.13    1
augustus_masked-lcl_ScwjSwM_1000-processed-gene-2.0     0

```

Looking at this https://f1000research.com/articles/5-1438/v2 they combined all BAM files and then ran `fc <- featureCounts(all.bam, annot.inbuilt="mm10")`

Is it possible to combine all the counts from each BAM file and create an compatible table for edgeR?

Thank you in advance

 

edger htseqcounts • 3.2k views
ADD COMMENT
3
Entering edit mode
@gordon-smyth
Last seen 10 minutes ago
WEHI, Melbourne, Australia

readDGE() will read the htseq-count output files into R and edgeR.

Just make a vector files containing the names of the htseq-count output files that contain the counts. Then

library(edgeR)
dge <- readDGE(files, header=FALSE)

 

ADD COMMENT
0
Entering edit mode

This script htseq-combine_all.R script describes how to merge HTSeq results.  Is it the best way to do it?

ADD REPLY
0
Entering edit mode

The script seems very complicated. Have you tried the solution I suggested, which is just one line instead of a whole script?

ADD REPLY
0
Entering edit mode

Hi, Sorry for not posting earlier you suggestion with readDGE but I got the same error:

Error in DGEList(data[, -1], group = group, genes = data[, 1, drop = FALSE]) :
  Length of 'group' must equal number of columns in 'counts'

# where are we?
> basedir <- "~/projects/mel/Sporobolus_pyramadalis/"
> setwd(basedir)
>
> cntdir <- paste(basedir, "Htseq_read_counts", sep="/")
> pat <- "_001.sorted.out"
> files <- list.files(path = cntdir,
+                             pattern = pat,
+                             all.files = TRUE,
+                             recursive = FALSE,
+                             ignore.case = FALSE,
+                             full.names = TRUE,
+                             include.dirs = FALSE)
>

> data <- readDGE(files, header=FALSE)
Meta tags detected: __no_feature, __ambiguous, __too_low_aQual, __not_aligned, __alignment_not_unique

> targets <- read.delim("phenoSp.txt", stringsAsFactors=FALSE)

> ## ----group---------------------------------------------------------------
> group <- paste(targets$CellType, targets$Status, sep=".")
> group <- factor(group)
> table(group)
group
  Leaves.1.5     Leaves.2     Leaves.3 Root_tip.1.5   Root_tip.2   Root_tip.3
           4            4            4            4            4            4
>
> library(edgeR)
> y <- DGEList(data[,-1], group=group,
+              genes=data[,1,drop=FALSE])
Error in DGEList(data[, -1], group = group, genes = data[, 1, drop = FALSE]) :
  Length of 'group' must equal number of columns in 'counts'

 

What did I miss?

ADD REPLY
0
Entering edit mode

Well, it's not actually a question of what you've missed, but what you're doing that you don't need to. readDGE() already created a DGEList for you, so you don't need to run DGEList(). Why are you throwing out the first sample by data[,-1]? Naturally that will leave you with one column short.

Just skip that -- you can start doing analysis directly on the DGEList that you got from readDGE().

ADD REPLY
0
Entering edit mode

Thank you. I have tried to follow https://f1000research.com/articles/5-1438/v2 but I will try look for examples how to do it without DGEList.

ADD REPLY
0
Entering edit mode

Thank you for your help it works like advertise:

y <- readDGE(files, header=FALSE, group = group)
keep <- rowSums(cpm(y)>1) >= 2

 

ADD REPLY

Login before adding your answer.

Traffic: 647 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6