Search
Question: Using summarizeOverlaps for bacterial genomes
0
gravatar for as9309
21 months ago by
as93090
as93090 wrote:

Hi,

I have a set of bam files for my samples and I just want to align them to the e coli reference genome to get non-normalised unique reads for each gene. I am trying to do this using summarizeOverlaps (as I'm using R and I don't have a mac for featurecounts) and I have looked at the example on bioconductor as follows:

(ebg <- exonsBy(txdb, by="gene"))
se <- summarizeOverlaps(features=ebg, reads=bamfiles,
                        mode="Union",
                        singleEnd=FALSE,
                        ignore.strand=TRUE,
                        fragments=TRUE )

I don't really understand how my code will differ for the E coli genome compared to the example; what if I don't have introns of exons and just want to align to each gene? What do I write/create for "features="?

I assume I keep the "Union" mode as shown in Fig 1 of the vignette? Are there other optional modes?

As my data is single ended, I assume I also put true and do not put anything for ignore.strand or fragments if singleEnd=TRUE?

So far my code is:

(ebg <- exonsBy(txdb, by="gene"))
se <- summarizeOverlaps(features=ebg, reads=bamfiles,
                        mode="Union",
                        singleEnd=FALSE)

Is this correct or do I need to change ebg? 

I've searched through the forum and still can't understand, sorry.

Thanks for your help.

 
ADD COMMENTlink modified 21 months ago by Valerie Obenchain ♦♦ 6.4k • written 21 months ago by as93090
0
gravatar for Valerie Obenchain
21 months ago by
Valerie Obenchain ♦♦ 6.4k
United States
Valerie Obenchain ♦♦ 6.4k wrote:

Hi,

The GenomicFeatures package has many extractors for isolating regions of interest, e.g., transcriptsBy(), genes(), etc. See the man pages ?genes and ?exonsBy. The genes(txdb) extractor returns the full range of the gene. Using this as 'features' will count reads that fall in any portion of the gene range (i.e., they aren't restricted to introns, exons, etc.). To count at the gene-level but enforce that the reads fall in the transcripts use transcriptsBy(txdb, "gene"). Similarly, exonsBy(txdb, "gene") counts at the gene-level but enforces that the reads fall in the exons, and so on.

The different counting modes for summarizeOverlaps() are all described on the man page ?summarizeOverlaps and in the 'Counting Reads with summarizeOverlaps' vignette, see browseVignettes("GenomicAlignments"). You can also provide your own count function as the 'mode' argument, this is shown in the examples.

If your data are single end, yes, you want singleEnd=TRUE. How you set ignore.strand depends on your data (how processed, what type) and what type of analysis you are planning to do. 

Valerie

ADD COMMENTlink written 21 months ago by Valerie Obenchain ♦♦ 6.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 184 users visited in the last hour