Rsubread : featureCounts PEReadsReordering=TRUE deletes unpaired reads?

0

Entering edit mode

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 8 months ago

Scripps Research, La Jolla, CA

Hello, I am using the featureCounts function in Rsubread to count RNA-seq reads. Since my BAM files are coordinate-sorted paired-end, I am using "isPairedEnd=TRUE" and "PEReadsReordering=TRUE". When I run my script, I see the following in the output: Process /gpfs/home/rcthomps/Projects/cyno/tophat_results/R77_CN7314_P4 ... Resort the input file ... 2718714 unpaired reads were ignored in reordering. Total number of fragments is : 1653582 Number of successfully assigned fragments is : 1529491 (92.5%) Running time : 3.39 minutes In particular, I am worried about the line "2718714 unpaired reads were ignored in reordering." The documentation for PEReadsReordering doesn't say anything about discarding unpaired reads, so I'm wondering whether this is intended behavior, and whether there is a way around this? If I sort my bam files by name so that pairs are adjacent in the file, will subread still discard the unpaired ones, or will it count all of them? -Ryan Thompson

Rsubread Rsubread • 2.5k views

ADD COMMENT • link updated 10.5 years ago by Wei Shi ★ 3.6k • written 10.5 years ago by Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Wei Shi ★ 3.6k

@wei-shi-2183

Last seen 18 days ago

Australia/Melbourne/Olivia Newton-John …

Dear Ryan, featureCounts requires that both reads from the same pair are present in the BAM/SAM file. But this doesn't seem to be the case for your data. It looks like unmapped reads were not included in your BAM files and therefore you got quite a few unpaired reads in your data. When counting in paired-end mode (isPairedEnd=TRUE) and read reordering being required (PEReadsReordering=TREU), featureCounts tries to reorder reads from the same pair to make them be adjacent to each other before performing read counting. If unpaired reads are encountered, featureCounts simply discards them. Simply sorting your bam files by name is not going to solve this problem because many reads are unpaired in your data. You may use an aligner that outputs all the reads in it mapping output (such as the align function in Rsubread) and then runs featureCounts on it. In the meantime, we are now trying to improve featureCounts to deal with this simulation. Will let you know when this is done. Also note that this problem only occurs for paired-end data, but not for single-end read data. Best regards, Wei On Nov 9, 2013, at 10:54 AM, Ryan C. Thompson wrote: > Hello, > > I am using the featureCounts function in Rsubread to count RNA-seq reads. Since my BAM files are coordinate-sorted paired-end, I am using "isPairedEnd=TRUE" and "PEReadsReordering=TRUE". When I run my script, I see the following in the output: > > Process /gpfs/home/rcthomps/Projects/cyno/tophat_results/R77_CN7314_P4 ... > Resort the input file ... > 2718714 unpaired reads were ignored in reordering. > Total number of fragments is : 1653582 > Number of successfully assigned fragments is : 1529491 (92.5%) > Running time : 3.39 minutes > > In particular, I am worried about the line "2718714 unpaired reads were ignored in reordering." The documentation for PEReadsReordering doesn't say anything about discarding unpaired reads, so I'm wondering whether this is intended behavior, and whether there is a way around this? If I sort my bam files by name so that pairs are adjacent in the file, will subread still discard the unpaired ones, or will it count all of them? > > -Ryan Thompson > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}

ADD COMMENT • link 10.5 years ago Wei Shi ★ 3.6k

0

Entering edit mode

Ah, I see now. I am using tophat, which outputs the unmapped reads to a separate bam file. I guess I should merge the two bam files and then featureCounts should work as I expect? On Sun Nov 10 16:59:32 2013, Wei Shi wrote: > Dear Ryan, > > featureCounts requires that both reads from the same pair are present in the BAM/SAM file. But this doesn't seem to be the case for your data. It looks like unmapped reads were not included in your BAM files and therefore you got quite a few unpaired reads in your data. When counting in paired-end mode (isPairedEnd=TRUE) and read reordering being required (PEReadsReordering=TREU), featureCounts tries to reorder reads from the same pair to make them be adjacent to each other before performing read counting. If unpaired reads are encountered, featureCounts simply discards them. > > Simply sorting your bam files by name is not going to solve this problem because many reads are unpaired in your data. You may use an aligner that outputs all the reads in it mapping output (such as the align function in Rsubread) and then runs featureCounts on it. > > In the meantime, we are now trying to improve featureCounts to deal with this simulation. Will let you know when this is done. > > Also note that this problem only occurs for paired-end data, but not for single-end read data. > > Best regards, > Wei > > On Nov 9, 2013, at 10:54 AM, Ryan C. Thompson wrote: > >> Hello, >> >> I am using the featureCounts function in Rsubread to count RNA-seq reads. Since my BAM files are coordinate-sorted paired-end, I am using "isPairedEnd=TRUE" and "PEReadsReordering=TRUE". When I run my script, I see the following in the output: >> >> Process /gpfs/home/rcthomps/Projects/cyno/tophat_results/R77_CN7314_P4 ... >> Resort the input file ... >> 2718714 unpaired reads were ignored in reordering. >> Total number of fragments is : 1653582 >> Number of successfully assigned fragments is : 1529491 (92.5%) >> Running time : 3.39 minutes >> >> In particular, I am worried about the line "2718714 unpaired reads were ignored in reordering." The documentation for PEReadsReordering doesn't say anything about discarding unpaired reads, so I'm wondering whether this is intended behavior, and whether there is a way around this? If I sort my bam files by name so that pairs are adjacent in the file, will subread still discard the unpaired ones, or will it count all of them? >> >> -Ryan Thompson >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:4}}

ADD REPLY • link 10.5 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Yes, that should work. But the changes we plan to make to featureCounts do not need this separate bam file including unmapped reads, so it should be more efficient. Wei On Nov 11, 2013, at 12:23 PM, Ryan wrote: > Ah, I see now. I am using tophat, which outputs the unmapped reads to a separate bam file. I guess I should merge the two bam files and then featureCounts should work as I expect? > > On Sun Nov 10 16:59:32 2013, Wei Shi wrote: >> Dear Ryan, >> >> featureCounts requires that both reads from the same pair are present in the BAM/SAM file. But this doesn't seem to be the case for your data. It looks like unmapped reads were not included in your BAM files and therefore you got quite a few unpaired reads in your data. When counting in paired-end mode (isPairedEnd=TRUE) and read reordering being required (PEReadsReordering=TREU), featureCounts tries to reorder reads from the same pair to make them be adjacent to each other before performing read counting. If unpaired reads are encountered, featureCounts simply discards them. >> >> Simply sorting your bam files by name is not going to solve this problem because many reads are unpaired in your data. You may use an aligner that outputs all the reads in it mapping output (such as the align function in Rsubread) and then runs featureCounts on it. >> >> In the meantime, we are now trying to improve featureCounts to deal with this simulation. Will let you know when this is done. >> >> Also note that this problem only occurs for paired-end data, but not for single-end read data. >> >> Best regards, >> Wei >> >> On Nov 9, 2013, at 10:54 AM, Ryan C. Thompson wrote: >> >>> Hello, >>> >>> I am using the featureCounts function in Rsubread to count RNA-seq reads. Since my BAM files are coordinate-sorted paired-end, I am using "isPairedEnd=TRUE" and "PEReadsReordering=TRUE". When I run my script, I see the following in the output: >>> >>> Process /gpfs/home/rcthomps/Projects/cyno/tophat_results/R77_CN7314_P4 ... >>> Resort the input file ... >>> 2718714 unpaired reads were ignored in reordering. >>> Total number of fragments is : 1653582 >>> Number of successfully assigned fragments is : 1529491 (92.5%) >>> Running time : 3.39 minutes >>> >>> In particular, I am worried about the line "2718714 unpaired reads were ignored in reordering." The documentation for PEReadsReordering doesn't say anything about discarding unpaired reads, so I'm wondering whether this is intended behavior, and whether there is a way around this? If I sort my bam files by name so that pairs are adjacent in the file, will subread still discard the unpaired ones, or will it count all of them? >>> >>> -Ryan Thompson >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for the addressee. >> You must not disclose, forward, print or use it without the permission of the sender. >> ______________________________________________________________________ ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}

ADD REPLY • link 10.5 years ago Wei Shi ★ 3.6k

0

Entering edit mode

Hi Ryan, We have made changes to featureCounts to let it deal with those read pairs that having missing mates. featureCounts now also checks if reads from the same pair are adjacent to each other and if not it will automatically re-order the reads to make them ready for read counting. So you do not need to specify the PEReadsReordering option any more (it is now removed from the function) and you do not need to use the separate bam file which includes unmapped reads either. Changes have been committed to bioc devel and they should be available in a couple of days (Rsubread v1.13.6) Wei On Nov 11, 2013, at 12:23 PM, Ryan wrote: > Ah, I see now. I am using tophat, which outputs the unmapped reads to a separate bam file. I guess I should merge the two bam files and then featureCounts should work as I expect? > > On Sun Nov 10 16:59:32 2013, Wei Shi wrote: >> Dear Ryan, >> >> featureCounts requires that both reads from the same pair are present in the BAM/SAM file. But this doesn't seem to be the case for your data. It looks like unmapped reads were not included in your BAM files and therefore you got quite a few unpaired reads in your data. When counting in paired-end mode (isPairedEnd=TRUE) and read reordering being required (PEReadsReordering=TREU), featureCounts tries to reorder reads from the same pair to make them be adjacent to each other before performing read counting. If unpaired reads are encountered, featureCounts simply discards them. >> >> Simply sorting your bam files by name is not going to solve this problem because many reads are unpaired in your data. You may use an aligner that outputs all the reads in it mapping output (such as the align function in Rsubread) and then runs featureCounts on it. >> >> In the meantime, we are now trying to improve featureCounts to deal with this simulation. Will let you know when this is done. >> >> Also note that this problem only occurs for paired-end data, but not for single-end read data. >> >> Best regards, >> Wei >> >> On Nov 9, 2013, at 10:54 AM, Ryan C. Thompson wrote: >> >>> Hello, >>> >>> I am using the featureCounts function in Rsubread to count RNA-seq reads. Since my BAM files are coordinate-sorted paired-end, I am using "isPairedEnd=TRUE" and "PEReadsReordering=TRUE". When I run my script, I see the following in the output: >>> >>> Process /gpfs/home/rcthomps/Projects/cyno/tophat_results/R77_CN7314_P4 ... >>> Resort the input file ... >>> 2718714 unpaired reads were ignored in reordering. >>> Total number of fragments is : 1653582 >>> Number of successfully assigned fragments is : 1529491 (92.5%) >>> Running time : 3.39 minutes >>> >>> In particular, I am worried about the line "2718714 unpaired reads were ignored in reordering." The documentation for PEReadsReordering doesn't say anything about discarding unpaired reads, so I'm wondering whether this is intended behavior, and whether there is a way around this? If I sort my bam files by name so that pairs are adjacent in the file, will subread still discard the unpaired ones, or will it count all of them? >>> >>> -Ryan Thompson >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for the addressee. >> You must not disclose, forward, print or use it without the permission of the sender. >> ______________________________________________________________________ ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}

ADD REPLY • link 10.5 years ago Wei Shi ★ 3.6k

0

Entering edit mode

Thanks, that's great news! On Nov 14, 2013 9:29 PM, "Wei Shi" <shi@wehi.edu.au> wrote: > Hi Ryan, > > We have made changes to featureCounts to let it deal with those read pairs > that having missing mates. featureCounts now also checks if reads from the > same pair are adjacent to each other and if not it will automatically > re-order the reads to make them ready for read counting. So you do not need > to specify the PEReadsReordering option any more (it is now removed from > the function) and you do not need to use the separate bam file which > includes unmapped reads either. > > Changes have been committed to bioc devel and they should be available in > a couple of days (Rsubread v1.13.6) > > Wei > > On Nov 11, 2013, at 12:23 PM, Ryan wrote: > > > Ah, I see now. I am using tophat, which outputs the unmapped reads to a > separate bam file. I guess I should merge the two bam files and then > featureCounts should work as I expect? > > > > On Sun Nov 10 16:59:32 2013, Wei Shi wrote: > >> Dear Ryan, > >> > >> featureCounts requires that both reads from the same pair are present > in the BAM/SAM file. But this doesn't seem to be the case for your data. It > looks like unmapped reads were not included in your BAM files and therefore > you got quite a few unpaired reads in your data. When counting in > paired-end mode (isPairedEnd=TRUE) and read reordering being required > (PEReadsReordering=TREU), featureCounts tries to reorder reads from the > same pair to make them be adjacent to each other before performing read > counting. If unpaired reads are encountered, featureCounts simply discards > them. > >> > >> Simply sorting your bam files by name is not going to solve this > problem because many reads are unpaired in your data. You may use an > aligner that outputs all the reads in it mapping output (such as the align > function in Rsubread) and then runs featureCounts on it. > >> > >> In the meantime, we are now trying to improve featureCounts to deal > with this simulation. Will let you know when this is done. > >> > >> Also note that this problem only occurs for paired-end data, but not > for single-end read data. > >> > >> Best regards, > >> Wei > >> > >> On Nov 9, 2013, at 10:54 AM, Ryan C. Thompson wrote: > >> > >>> Hello, > >>> > >>> I am using the featureCounts function in Rsubread to count RNA- seq > reads. Since my BAM files are coordinate-sorted paired-end, I am using > "isPairedEnd=TRUE" and "PEReadsReordering=TRUE". When I run my script, I > see the following in the output: > >>> > >>> Process /gpfs/home/rcthomps/Projects/cyno/tophat_results/R77_CN7314_P4 > ... > >>> Resort the input file ... > >>> 2718714 unpaired reads were ignored in reordering. > >>> Total number of fragments is : 1653582 > >>> Number of successfully assigned fragments is : 1529491 (92.5%) > >>> Running time : 3.39 minutes > >>> > >>> In particular, I am worried about the line "2718714 unpaired reads > were ignored in reordering." The documentation for PEReadsReordering > doesn't say anything about discarding unpaired reads, so I'm wondering > whether this is intended behavior, and whether there is a way around this? > If I sort my bam files by name so that pairs are adjacent in the file, will > subread still discard the unpaired ones, or will it count all of them? > >>> > >>> -Ryan Thompson > >>> > >>> _______________________________________________ > >>> Bioconductor mailing list > >>> Bioconductor@r-project.org > >>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>> Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > >> ______________________________________________________________________ > >> The information in this email is confidential and intended solely for > the addressee. > >> You must not disclose, forward, print or use it without the permission > of the sender. > >> ______________________________________________________________________ > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:10}}

ADD REPLY • link 10.5 years ago Ryan C. Thompson ★ 7.9k

Login before adding your answer.