featureCounts simple (borderline stupid...) question
3
0
Entering edit mode
@sylvain-foisy-5539
Last seen 3.6 years ago

Hi,

What would be the simple way of telling featureCounts to read a whole folder filled with BAM files? I am hoping not to have to write a few hundred lines of text in my command... And yes,  I am not a R pro :-(

Best regards

S

featurecounts • 5.7k views
2
Entering edit mode
@james-w-macdonald-5106
Last seen 4 hours ago
United States

The help page for featureCounts says this, for the first argument:

files

a character vector giving names of input files containing read mapping results. The files can be in either SAM format or BAM format. The file format is automatically detected by the function.

So you just need a character vector of all the bam files in your directory, which you can get using

fls <- dir(".", "bam$") ADD COMMENT 1 Entering edit mode @sylvain-foisy-5539 Last seen 3.6 years ago Canada Hi James, Ok, take 2 of the story. My BAM files are symlinked to aliases in a folder. If I specified a single of these aliases in files="", featureCounts will work. If I use your solution to create a vector of these aliases, I get something like this for all files: || Process Unknown file 169377_LYMPHO_B_accepted_hits_properly_paired.bam... || || Failed to open file 169377_LYMPHO_B_accepted_hits_properly_paired.bam || || No counts were generated for this file. || Any idea? It seems that featureCounts (or the fls object created above) does not resolves the aliases toward the original files... Thanks in advance S ADD COMMENT 0 Entering edit mode I don't see that at all: linux$ ln -s data/batch1/Sample_7749.bam first.bam
linux$ln -s data/batch1/Sample_7752.bam second.bam linux$ ln -s data/batch1/Sample_7754.bam third.bam

> fls <- dir(".","bam")
> fls
[1] "first.bam"  "second.bam" "third.bam"

> z <- featureCounts(fls, "mm10", nthreads = 30)
NCBI RefSeq annotation for mm10 (build 38.1) is used.

==========     _____ _    _ ____  _____  ______          _____
=====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
=====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 3 BAM files                                      ||
||                           P first.bam                                      ||
||                           P second.bam                                     ||
||                           P third.bam                                      ||
||                                                                            ||
||             Output file : ./.Rsubread_featureCounts_pid28665               ||
||             Annotations : /data/oldR/R-3.1.1/lib64/R/library/Rsubread/ ... ||
||                                                                            ||
||                   Level : meta-feature level                               ||
||              Paired-end : no                                               ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||                                                                            ||

//================================= Running ==================================\\
||                                                                            ||
||    Features : 222996                                                       ||
||    Meta-features : 27179                                                   ||
||    Chromosomes : 43                                                        ||
||                                                                            ||
|| Process BAM file first.bam...                                              ||
||    Paired-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 52406794                                                  ||
||    Successfully assigned reads : 37962960 (72.4%)                          ||
||    Running time : 0.91 minutes                                             ||
||                                                                            ||
|| Process BAM file second.bam...                                             ||
||    Paired-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 36114964                                                  ||
||    Successfully assigned reads : 26062635 (72.2%)                          ||
||    Running time : 0.60 minutes                                             ||
||                                                                            ||
|| Process BAM file third.bam...                                              ||
||    Paired-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 44862286                                                  ||
||    Successfully assigned reads : 32515674 (72.5%)                          ||
||    Running time : 0.73 minutes                                             ||
||                                                                            ||
||                                                                            ||

>

0
Entering edit mode

Hi James,

I get the same thing as you: when I check the content of fls created from my folder, I get the correct list of files... However, I still get the same errors :-(

These are my lines:

myFiles<-dir("where_my_bam_files_aliases_are_stored","bam\$")

fc<-featureCounts(files=myFiles,
isGTFAnnotationFile=T,
useMetaFeatures=T,
isPairedEnd=T,
annot.ext="/Illumina_iGenomes/Homo_sapiens/NCBI/build37.2/Annotation/Archives/archive-2014-06-02-13-47-29/Genes/genes.gtf",
GTF.featureType="exon",
GTF.attrType="gene_id")

Perplexing...

S

0
Entering edit mode

Did you look at the myFiles vector? It shouldn't be perplexing at all. If you have the files (or soft links) in a different directory, and you don't use the full.names argument to dir(), then all you get are the file names, so R thinks you are saying they are in the current working dir.

And if you have the soft links to the bam files in a different dir, then why are you bothering with the soft links at all? You can just point to the actual files themselves and go with that.

0
Entering edit mode

Hi James,

The full.names arg did the trick; I was assuming that R would use the filesystem to expand the paths toward the real location; it certainly did so for a single alias but needed the other arg for a list. Like I said earlier, I dive into R/Bioconductor maybe twice a year and it is always a challenge: the learning curve is steep, even for the simplest things. FYI take 1: I have hundreds of samples across dozens of tissue types. I decided to sort the BAM files accordingly; whenever I need them collected, I create a single folder with aliases. FYI take 2: my initial counting effort used easyRNASeq and that library did found the location of the original w/o issues. Typical situation in R/Bioconductor where there is more than one way to get there but for some, it's more twisted than others...

Thanks for you help

0
Entering edit mode
@sylvain-foisy-5539
Last seen 3.6 years ago

Hi James,

I knew it had to be simple and stupid... I am used to some other packages (mostly microarray-based) that can extract a list of files from a folder automatically. I was expecting something similar ;-)

Thanks for guiding the blind!

S