Question: Error when counting reads in genes with summarizeOverlaps (GenomicAlignments package)
4.9 years ago by
United States
alejandro.colaneri20 wrote:

Hello,
I'm following the RNA-seq workflow for differential gene expression:

## Use the function summarizeOverlaps to count reads in the gene
library("GenomicAlignments")
se <- summarizeOverlaps(exonsByGene, BamFileList(bamFiles), mode="Union", singleEnd=TRUE, ignore.strand=FALSE, fragments=FALSE);

However, I got this error and I have no idea how to fix it:

Error in .summarizeOverlaps_BamFileList(features, reads, mode, ignore.strand = ignore.strand, :

All the steps I ran before trying to create the object "se" are below:

### read the table: sampleTable.csv

### build the full path to the tophat produced bam files

bamFiles <- file.path(".", sampleTable$dirName, sampleTable$fileName);

### see the created vector with paths

bamFiles

##### Use the BamFile function from Rsamtools to see if these paths are functional

library ("Rsamtools");
seqinfo(BamFile(bamFiles[1]));

library("GenomicFeatures");

hse <-makeTranscriptDbFromGFF("/proj/seq/data/TAIR10_Ensembl/Annotation/Genes/genes.gtf", format="gtf")
exonsByGene <- exonsBy(hse, by="gene");

## Use the function summarizeOverlaps to count reads in the gene
library("GenomicAlignments")
se <- summarizeOverlaps(exonsByGene, BamFileList(bamFiles), mode="Union", singleEnd=TRUE, ignore.strand=FALSE, fragments=FALSE);

modified 4.9 years ago by Dan Tenenbaum8.2k • written 4.9 years ago by alejandro.colaneri20

I believe that error is complaining that you have at least two files in your BamFileList with the same name. Is that the case?
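
A quick way to check for this is to look at the file names themselves. The sketch below uses base R only and made-up paths that mimic a tophat layout (one directory per sample, each holding accepted_hits.bam); the paths are hypothetical, and it assumes that an unnamed vector of paths ends up named by the basename of each file.

```r
# Hypothetical paths: one tophat output directory per sample.
bamFiles <- c("./sample1/accepted_hits.bam",
              "./sample2/accepted_hits.bam",
              "./sample3/accepted_hits.bam")

# Assumption: without explicit names, the files are identified by basename.
defaultNames <- basename(bamFiles)

# anyDuplicated() returns the index of the first duplicate, or 0 if none.
hasDuplicates <- anyDuplicated(defaultNames) > 0

if (hasDuplicates) {
  message("Duplicate file names found: ",
          paste(unique(defaultNames[duplicated(defaultNames)]), collapse = ", "))
}
```

If hasDuplicates is TRUE, the files need distinct names before they are counted.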

Actually, when I built the list of paths to my files I did not pay attention to that. But the answer is YES: all the bam files in my bam file list have the same name, the original accepted_hits.bam name provided by tophat. Do you think this could be the source of the problem?

I think you can provide distinct names for your bamFiles, e.g.,

bamFiles <- file.path(".", sampleTable$dirName, sampleTable$fileName)
names(bamFiles) <- basename(dirname(bamFiles))

Or do something more manual; either way, the distinct names will carry forward.
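
To see what the directory-based naming produces, here is a small base R sketch with hypothetical paths (the sample1/sample2/sample3 directories are invented for illustration). It also adds a guard that stops early if the derived names are still not unique, which could happen if two samples sit in directories with the same name.

```r
# Hypothetical paths: same file name, different parent directories.
bamFiles <- c("./sample1/accepted_hits.bam",
              "./sample2/accepted_hits.bam",
              "./sample3/accepted_hits.bam")

# Name each file after the directory that contains it.
names(bamFiles) <- basename(dirname(bamFiles))

# Guard: fail early if the derived names are still not unique.
stopifnot(anyDuplicated(names(bamFiles)) == 0)

sampleNames <- names(bamFiles)
print(sampleNames)
```

The named vector can then be passed to BamFileList() as in the original call.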