Question

Rsubread: change in behavior when accepting mixed paired end and single end files

0

Entering edit mode

barrel0luck • 0

@barrel0luck-23701

Last seen 4.5 years ago

I'd a question regarding a recent change in the Rsubread package. I routinely deal with mixed paired end and single end data.

The Rsubread function featureCounts had a isPairedEnd argument that used to accept a logical vector containing information of which files are paired end or not.

That seems to no longer work and Rsubread only takes the first logical and applies that to all files.

As a result, many of my scripts have stopped working.

Does anyone know if this is a bug or whether this is expected behaviour?

software error Rsubread • 1.9k views

ADD COMMENT • link updated 4.5 years ago by Gordon Smyth 52k • written 4.5 years ago by barrel0luck • 0

0

Entering edit mode

What is the version of Rsubread you are using? Please follow the posting guideline to provide all required information (eg. output of sessionInfo()) when you post a question.

ADD REPLY • link 4.5 years ago Wei Shi ★ 3.6k

0

Entering edit mode

Sorry about that! I've Rsubread 1.34.7 installed.

Session info:

R version 3.6.3 (2020-02-29)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 31 (Workstation Edition)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggrepel_0.8.1     GGally_1.4.0      ggpubr_0.2.5      magrittr_1.5      forcats_0.5.0     stringr_1.4.0     dplyr_0.8.4       purrr_0.3.3      
 [9] readr_1.3.1       tidyr_1.0.2       tibble_2.1.3      ggplot2_3.2.1     tidyverse_1.3.0   WriteXLS_5.0.0    data.table_1.12.8 Rsubread_1.34.7  
[17] edgeR_3.26.8      limma_3.40.6     

loaded via a namespace (and not attached):
 [1] tidyselect_1.0.0   locfit_1.5-9.1     haven_2.2.0        lattice_0.20-38    colorspace_1.4-1   vctrs_0.2.3        generics_0.0.2    
 [8] rlang_0.4.5        pillar_1.4.3       glue_1.3.1         withr_2.1.2        DBI_1.1.0          RColorBrewer_1.1-2 dbplyr_1.4.2      
[15] modelr_0.1.6       readxl_1.3.1       plyr_1.8.5         lifecycle_0.1.0    munsell_0.5.0      ggsignif_0.6.0     gtable_0.3.0      
[22] cellranger_1.1.0   rvest_0.3.5        fansi_0.4.1        broom_0.5.5        Rcpp_1.0.4         backports_1.1.5    scales_1.1.0      
[29] jsonlite_1.6.1     fs_1.3.1           hms_0.5.3          stringi_1.4.6      cowplot_1.0.0      grid_3.6.3         cli_2.0.2         
[36] tools_3.6.3        lazyeval_0.2.2     crayon_1.3.4       pkgconfig_2.0.3    xml2_1.2.2         reprex_0.3.0       lubridate_1.7.4   
[43] reshape_0.8.8      assertthat_0.2.1   httr_1.4.1         rstudioapi_0.11    R6_2.4.1           nlme_3.1-144       compiler_3.6.3

ADD REPLY • link 4.5 years ago barrel0luck • 0

0

Entering edit mode

Thanks for providing the info. Your Rsubread version is not up-to-date and you should update it to the latest version (2.2.2).

ADD REPLY • link 4.5 years ago Wei Shi ★ 3.6k

0

Entering edit mode

So I tried updating all packages using Bioconductor. It doesn't seem to install the latest 2.2.2 version of Rsubread. Sorry for asking this nooby question, but can you tell me how I can install 2.2.2?

ADD REPLY • link 4.5 years ago barrel0luck • 0

0

Entering edit mode

So I think, I'll first need to update Fedora 31 to 32 and then update R from 3.6.3 to 4... and then update Rsubread to 2.2.2. I'm a little hesitant to do that now, as I'm in the middle of submitting some papers and don't want to break the system. Can you tell me whether isPairedEnd accepts logical vector input in Rsubread 2.2.2? If yes, then, I'll trust you and go ahead and submit without worrying. My script did work a year ago atleast. So I'm confident it works. And I've not changed that part of my script that deals with Rsubread.

ADD REPLY • link 4.5 years ago barrel0luck • 0

0

Entering edit mode

You can always make virtual environments with tools like conda or virtualenv and then install the required tools inside this new environment, leaving your base system untouched.

ADD REPLY • link 4.5 years ago ATpoint ★ 4.6k

score 2 · Answer 1 · 2020-06-19

The isPairedEnd parameter is expecting just a TRUE or FALSE value. When you provide a vector, it will only use the first element of the vector for counting. It never accepts a vector as input value.

The purpose of this parameter is to indicate if reads or read pairs should be counted, as described in the help page:

"logical indicating if counting should be performed on read pairs or reads. FALSE by default. If TRUE, read pairs will be counted instead of individual reads."

score 2 · Answer 2 · 2020-06-19

2

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 13 minutes ago

WEHI, Melbourne, Australia

featureCounts automatically detects which BAM files contain paired-end reads and which contain single-end, so there is no need for you to indicate that.

For most purposes, you can set isPairedEnd = TRUE all the time and everything will behave as you expect regardless of whether your data is single or paired end. The option isPairedEnd = FALSE causes a read pair to be counted as if it was two single-end reads. In most cases you won't want to do that. Setting isPairedEnd = TRUE has no effect on the single-end files, they will still be counted correctly.

The name of the argument is admitedly confusing. In retrospect, a better name might have been count.by.pairs = TRUE.

There has been no recent change to featureCounts behavior in this regard.

ADD COMMENT • link 4.5 years ago Gordon Smyth 52k

0

Entering edit mode

Thanks.

Sorry for taking long to respond. I tried updating to Fedora 32 and as usual I'd issues with R. Latest version of R is not available and for some reason Rsubread doesn't install on R 3.6.3 on Fedora 32, so I'm out of options on trying to test this out. I've never really managed to use conda, especially cause I use Rstudio a lot. I'm just going to have to wait for Fedora devs to push R 4.0 out.

Bioconductor packages can be quite problematic as they stop supporting previous versions of R as soon as the new version is out. I really hope someday devs realize this can be a major issue.

I'm going to assume featureCounts can autodetect paired end files from a mix of files (I've not been able to try it out right now).

I swear, a year ago, when my script was working, it was taking a vector argument for isPairedEnd. All my scripts are based on that. Everything was working just perfectly. I don't remember the versions of the packages I'd then.

And just so I don't come out as ungrateful, I want to thank you and Wei Shi for the Rsubread package. Despite some people saying its not fast (I've come across in my searches due to this issue), I think its really really really fast, and also the easiest package to use out there.

ADD REPLY • link 4.5 years ago barrel0luck • 0

1

Entering edit mode

Yes you can provide a vector to isPairedEnd, however featureCounts will only use the first element of your vector for counting.

And yes featureCounts does automatically detect paired end files from a mix of files.

ADD REPLY • link 4.5 years ago Wei Shi ★ 3.6k

0

Entering edit mode

Perfect, thanks! I think it will be better to throw out a warning if a vector is used instead.

ADD REPLY • link 4.5 years ago barrel0luck • 0