Question: read counts for each gene for edgeR
0
gravatar for mictadlo
2.3 years ago by
mictadlo0
mictadlo0 wrote:

Hi,

I received read counts for each gene for each BAM file which was generated by HTseq-count.

```

augustus_masked-lcl_ScwjSwM_1-processed-gene-0.2        0
augustus_masked-lcl_ScwjSwM_100-processed-gene-0.0      1
augustus_masked-lcl_ScwjSwM_1000-processed-gene-0.1     0
augustus_masked-lcl_ScwjSwM_1000-processed-gene-0.3     0
augustus_masked-lcl_ScwjSwM_1000-processed-gene-1.13    1
augustus_masked-lcl_ScwjSwM_1000-processed-gene-2.0     0

```

Looking at this https://f1000research.com/articles/5-1438/v2 they combined all BAM files and then ran `fc <- featureCounts(all.bam, annot.inbuilt="mm10")`

Is it possible to combine all the counts from each BAM file and create an compatible table for edgeR?

Thank you in advance

 

edger htseqcounts • 914 views
ADD COMMENTlink modified 2.3 years ago by Gordon Smyth39k • written 2.3 years ago by mictadlo0
Answer: read counts for each gene for edgeR
3
gravatar for Gordon Smyth
2.3 years ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

readDGE() will read the htseq-count output files into R and edgeR.

Just make a vector files containing the names of the htseq-count output files that contain the counts. Then

library(edgeR)
dge <- readDGE(files, header=FALSE)

 

ADD COMMENTlink modified 2.2 years ago • written 2.3 years ago by Gordon Smyth39k

This script htseq-combine_all.R script describes how to merge HTSeq results.  Is it the best way to do it?

ADD REPLYlink written 2.3 years ago by mictadlo0

The script seems very complicated. Have you tried the solution I suggested, which is just one line instead of a whole script?

ADD REPLYlink written 2.2 years ago by Gordon Smyth39k

Hi, Sorry for not posting earlier you suggestion with readDGE but I got the same error:

Error in DGEList(data[, -1], group = group, genes = data[, 1, drop = FALSE]) :
  Length of 'group' must equal number of columns in 'counts'

# where are we?
> basedir <- "~/projects/mel/Sporobolus_pyramadalis/"
> setwd(basedir)
>
> cntdir <- paste(basedir, "Htseq_read_counts", sep="/")
> pat <- "_001.sorted.out"
> files <- list.files(path = cntdir,
+                             pattern = pat,
+                             all.files = TRUE,
+                             recursive = FALSE,
+                             ignore.case = FALSE,
+                             full.names = TRUE,
+                             include.dirs = FALSE)
>

> data <- readDGE(files, header=FALSE)
Meta tags detected: __no_feature, __ambiguous, __too_low_aQual, __not_aligned, __alignment_not_unique

> targets <- read.delim("phenoSp.txt", stringsAsFactors=FALSE)

> ## ----group---------------------------------------------------------------
> group <- paste(targets$CellType, targets$Status, sep=".")
> group <- factor(group)
> table(group)
group
  Leaves.1.5     Leaves.2     Leaves.3 Root_tip.1.5   Root_tip.2   Root_tip.3
           4            4            4            4            4            4
>
> library(edgeR)
> y <- DGEList(data[,-1], group=group,
+              genes=data[,1,drop=FALSE])
Error in DGEList(data[, -1], group = group, genes = data[, 1, drop = FALSE]) :
  Length of 'group' must equal number of columns in 'counts'

 

What did I miss?

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by mictadlo0

Well, it's not actually a question of what you've missed, but what you're doing that you don't need to. readDGE() already created a DGEList for you, so you don't need to run DGEList(). Why are you throwing out the first sample by data[,-1]? Naturally that will leave you with one column short.

Just skip that -- you can start doing analysis directly on the DGEList that you got from readDGE().

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by Gordon Smyth39k

Thank you. I have tried to follow https://f1000research.com/articles/5-1438/v2 but I will try look for examples how to do it without DGEList.

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by mictadlo0

Thank you for your help it works like advertise:

y <- readDGE(files, header=FALSE, group = group)
keep <- rowSums(cpm(y)>1) >= 2

 

ADD REPLYlink written 2.2 years ago by mictadlo0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 426 users visited in the last hour