Question

Feature request for Rsubread::featureCounts: read length adjustment


Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618


Scripps Research, La Jolla, CA

Hello,

I would like to request a simple feature for Rsubread's featureCounts function that would make it more useful for ChIP-Seq applications. I want to use featureCounts to count the number of reads falling in each of my called peaks. However, each read represents a DNA fragment of a specific length, which can be estimated by cross-strand correlation analysis or known a priori. In my case, it is the length of one nucleosome, i.e. 147 bp. So I would like to treat each read as being 147 bp long for the purpose of computing overlaps, since the number of bp sequenced is not representative of the fragment length. Would it be possible to add a parameter to featureCounts to allow this adjustment?

Also, an additional feature that would be nice to have, but is less important, would be the ability to require that a certain percentage of a read overlaps a feature before counting it.

Thanks for listening,

-Ryan Thompson
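To make the requested adjustment concrete, here is a minimal sketch in plain Python of treating each read as a fixed-length fragment anchored at its 5' end. It is not part of Rsubread or featureCounts; the function names `extend_to_fragment` and `overlaps`, the 1-based inclusive coordinate convention, and the example coordinates are all assumptions made for illustration.

```python
from typing import Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end); an assumed convention


def extend_to_fragment(start: int, end: int, strand: str, fragment_length: int) -> Interval:
    """Return the putative fragment interval for a read.

    The 5' end of the read stays fixed and the 3' end is moved so that
    the interval is exactly fragment_length bp long.
    """
    if strand == "+":
        # Forward read: the 5' end is the leftmost coordinate.
        return start, start + fragment_length - 1
    if strand == "-":
        # Reverse read: the 5' end is the rightmost coordinate.
        # Clamp at 1 so the fragment never runs off the chromosome start.
        return max(1, end - fragment_length + 1), end
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two inclusive intervals share at least 1 bp."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical example: a 36-bp forward read just upstream of a peak.
peak = (1100, 1400)
read = (1000, 1035)
fragment = extend_to_fragment(*read, "+", 147)

print("read overlaps peak:    ", overlaps(read, peak))
print("fragment overlaps peak:", overlaps(fragment, peak))
```

With these made-up coordinates the sequenced read stops short of the peak, while the 147-bp fragment it represents reaches into it, which is the case the requested parameter is meant to capture.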



Wei Shi · Answer 1 · 2014-04-08


Wei Shi ★ 3.6k

@wei-shi-2183


Australia/Melbourne/Olivia Newton-John …

Dear Ryan,

I'm not entirely sure what you are trying to do. But would extending the genomic regions you use in your summarization achieve the same effect?

For your second request, maybe you can do a filtering after you get the read counts, which is pretty straightforward to do?

Best wishes,

Wei



Hi Wei,

> I'm not entirely sure what you are trying to do. But would extending the genomic regions you use in your summarization achieve the same effect?

No, that would effectively extend both ends of each read symmetrically. I want to keep the 5-prime position of the read the same, but change the length. So if the effective fragment length was set to 150, then a 100-bp read mapped in the forward direction at position 500 would overlap a peak that starts at 625, but it would not overlap a peak that ends at 475.

> For your second request, maybe you can do a filtering after you get the read counts, which is pretty straightforward to do?

I think you've misunderstood what I'm asking here. It's kind of hard to explain in words. I mean that currently, if there is even 1 bp of overlap between a read and a feature, featureCounts will count it. I'm saying that it would be nice to be able to be more stringent by requiring more than 1 bp of overlap. E.g. require 50 bp of overlap for a 100-bp read to count it, or even count only reads that fall completely within a feature (i.e. 100% overlap).

Now that I think about it, I could implement the first request and part of the second one if I could provide the reads in e.g. a GRanges object or a text file that just has columns for chromosome, start, end, and strand (or a BED file, etc.). Then I could pre-process my reads to adjust the fragment lengths however I want. However, the featureCounts help indicates that BAM (or SAM) is the only acceptable input format. Is this correct, or is there another way to provide the input reads?

-Ryan
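The two examples in this reply can be checked with a short Python sketch. It is only an illustration of the arithmetic, not featureCounts behaviour: the helper names `overlap_width` and `counts_toward`, the parameters `min_overlap_bp` and `min_fraction`, the inclusive 1-based coordinates, and the peak and feature boundaries not stated in the post (800, 300, 550, 700, 400) are all assumptions.

```python
def overlap_width(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bp shared by two 1-based inclusive intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def counts_toward(read, feature, min_overlap_bp: int = 1, min_fraction: float = 0.0) -> bool:
    """Decide whether a read is counted for a feature.

    The read must share at least min_overlap_bp bases with the feature and
    at least min_fraction of its own length. The defaults reproduce the
    "any 1 bp of overlap counts" rule described in the post.
    """
    width = overlap_width(*read, *feature)
    read_length = read[1] - read[0] + 1
    return width >= min_overlap_bp and width >= min_fraction * read_length


# First request: forward read at position 500, effective fragment length 150.
fragment = (500, 500 + 150 - 1)                    # 5' end fixed, 3' end extended
print(overlap_width(*fragment, 625, 800))          # peak starting at 625
print(overlap_width(*fragment, 300, 475))          # peak ending at 475

# Second request: a 100-bp read and stricter overlap requirements.
read = (500, 599)
print(counts_toward(read, (550, 700)))                       # default 1-bp rule
print(counts_toward(read, (550, 700), min_overlap_bp=50))    # at least 50 bp
print(counts_toward(read, (550, 700), min_fraction=1.0))     # fully contained only
print(counts_toward(read, (400, 700), min_fraction=1.0))     # read lies inside feature
```

Expressing the stricter rule as both an absolute threshold and a fraction of read length covers the "50 bp of a 100-bp read" and "100% overlap" cases with one function; whether a real tool exposes either form is a separate question.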

Reply by Ryan C. Thompson ★ 7.9k, 10.1 years ago

score 0 · Answer 2 · 2015-12-10


Gordon Smyth 50k

@gordon-smyth


WEHI, Melbourne, Australia

I think that the csaw package now does your first request, i.e., extending reads to putative fragments before counting overlaps.
