Hi,
What would be the simple way of telling featureCounts to read a whole folder filled with BAM files? I am hoping not to have to write a few hundred lines of text in my command... And yes, I am not a R pro :-(
Best regards
S
Hi,
What would be the simple way of telling featureCounts to read a whole folder filled with BAM files? I am hoping not to have to write a few hundred lines of text in my command... And yes, I am not a R pro :-(
Best regards
S
The help page for featureCounts says this, for the first argument:
files
a character vector giving names of input files containing read mapping results. The files can be in either SAM format or BAM format. The file format is automatically detected by the function.
So you just need a character vector of all the bam files in your directory, which you can get using
fls <- dir(".", "bam$")
Hi James,
Ok, take 2 of the story. My BAM files are symlinked to aliases in a folder. If I specified a single of these aliases in files="", featureCounts will work. If I use your solution to create a vector of these aliases, I get something like this for all files:
|| Process Unknown file 169377_LYMPHO_B_accepted_hits_properly_paired.bam... ||
|| Failed to open file 169377_LYMPHO_B_accepted_hits_properly_paired.bam ||
|| No counts were generated for this file. ||
Any idea? It seems that featureCounts (or the fls object created above) does not resolves the aliases toward the original files...
Thanks in advance
S
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
I don't see that at all:
Hi James,
I get the same thing as you: when I check the content of fls created from my folder, I get the correct list of files... However, I still get the same errors :-(
These are my lines:
myFiles<-dir("where_my_bam_files_aliases_are_stored","bam$")
fc<-featureCounts(files=myFiles,
isGTFAnnotationFile=T,
useMetaFeatures=T,
isPairedEnd=T,
annot.ext="/Illumina_iGenomes/Homo_sapiens/NCBI/build37.2/Annotation/Archives/archive-2014-06-02-13-47-29/Genes/genes.gtf",
GTF.featureType="exon",
GTF.attrType="gene_id")
Perplexing...
S
Did you look at the myFiles vector? It shouldn't be perplexing at all. If you have the files (or soft links) in a different directory, and you don't use the full.names argument to dir(), then all you get are the file names, so R thinks you are saying they are in the current working dir.
And if you have the soft links to the bam files in a different dir, then why are you bothering with the soft links at all? You can just point to the actual files themselves and go with that.
Hi James,
The full.names arg did the trick; I was assuming that R would use the filesystem to expand the paths toward the real location; it certainly did so for a single alias but needed the other arg for a list. Like I said earlier, I dive into R/Bioconductor maybe twice a year and it is always a challenge: the learning curve is steep, even for the simplest things. FYI take 1: I have hundreds of samples across dozens of tissue types. I decided to sort the BAM files accordingly; whenever I need them collected, I create a single folder with aliases. FYI take 2: my initial counting effort used easyRNASeq and that library did found the location of the original w/o issues. Typical situation in R/Bioconductor where there is more than one way to get there but for some, it's more twisted than others...
Thanks for you help
S