I have aligned paired-end RNA-Seq data using STAR. I now want to run a differential expression analysis following this workflow:
http://www.bioconductor.org/help/workflows/rnaseqGene/
I have a few questions before I proceed.
1) I generated both unsorted and coordinate-sorted (SortedByCoordinate) BAM files. Which one should I use for the differential expression analysis?
2) I used the --sjdbGTFfile option during alignment. Do I need to change anything in the downstream workflow because of this?
3) I prepared a sampleTable as suggested in the workflow. It has just two columns: sample name and group (case, control). Do I need to add other details, or is that sufficient?
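For a simple case-vs-control comparison, a two-column table like this is what the design formula needs. A minimal base-R sketch of such a table, assuming hypothetical sample names and BAM file names, with "control" set explicitly as the reference level (R otherwise orders factor levels alphabetically, which would make "case" the reference):

```r
# Hypothetical sample names and groups; replace with your own samples.
sampleTable <- data.frame(
  sample    = c("S1", "S2", "S3", "S4", "S5"),
  condition = c("control", "control", "case", "case", "case"),
  stringsAsFactors = FALSE
)
# Keep the (hypothetical) BAM file name next to each sample so the
# counting step can use exactly the same order.
sampleTable$bam <- paste0(sampleTable$sample, "_Aligned.out.bam")
# Put "control" first so fold changes are reported as case vs control.
sampleTable$condition <- factor(sampleTable$condition,
                                levels = c("control", "case"))
rownames(sampleTable) <- sampleTable$sample
sampleTable
```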
Thanks.
Sumit Paliwal
There is more than one Bioconductor tool available for counting reads. featureCounts in the Rsubread package is one of them, and it counts reads equally fast from coordinate-sorted and unsorted BAM files. Counting 20 million reads takes only about half a minute.
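featureCounts returns a list whose `counts` element is a gene-by-sample matrix, with one column per file given in `files`, in that order. A minimal sketch of lining that matrix up with a sample table before differential expression, using a small toy matrix in place of real featureCounts output and hypothetical sample and file names:

```r
# Hypothetical sample table (same shape as the one sketched for question 3).
sampleTable <- data.frame(
  sample    = c("S1", "S2", "S3", "S4"),
  condition = factor(c("control", "control", "case", "case"),
                     levels = c("control", "case")),
  bam       = paste0(c("S1", "S2", "S3", "S4"), "_Aligned.out.bam"),
  stringsAsFactors = FALSE
)
# Toy stand-in for fc$counts; with real data this would be
# counts <- fc$counts after featureCounts(files = sampleTable$bam, ...).
counts <- matrix(c(10L, 0L, 5L,
                   12L, 0L, 7L,
                   30L, 0L, 6L,
                   28L, 1L, 4L),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), sampleTable$bam))
# Columns follow the order of 'files', so they can be renamed by sample.
stopifnot(ncol(counts) == nrow(sampleTable))
colnames(counts) <- sampleTable$sample
rownames(sampleTable) <- sampleTable$sample
# Light pre-filter: drop genes with almost no reads across all samples.
keep <- rowSums(counts) > 1
counts <- counts[keep, , drop = FALSE]
counts
```

The resulting `counts` and `sampleTable` could then be passed to `DESeq2::DESeqDataSetFromMatrix(countData = counts, colData = sampleTable, design = ~ condition)`.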
Hi Wei Shi,
I used featureCounts as follows:
> fc <- featureCounts(files = filenames, annot.ext = "Rnor_6.0.84.gtf", isGTFAnnotationFile = TRUE, GTF.featureType = "exon", GTF.attrType = "gene_id", chrAliases = NULL, useMetaFeatures = TRUE, allowMultiOverlap = FALSE, isPairedEnd = TRUE, requireBothEndsMapped = TRUE, checkFragLength = TRUE, nthreads = 6)
|| Load annotation file Rnor_6.0.84.gtf ...
|| Number of features is 305352
|| Number of meta-features is 32662
|| Number of chromosomes is 162
|| Process BAM file SM-C-F1_Aligned.sortedByCoord.out.bam... ||
|| Assign fragments (read pairs) to features... ||
|| Each fragment is counted once. ||
|| Found reads that are not properly paired. ||
|| (missing mate or the mate is not the next read) ||
|| 128001 reads have missing mates. ||
|| Input was converted to a format accepted by featureCounts. ||
|| Found reads that are not properly paired. ||
|| (missing mate or the mate is not the next read) ||
|| Process BAM file SM-C-M5_Aligned.sortedByCoord.out.bam... ||
|| Assign fragments (read pairs) to features... ||
|| Each fragment is counted once. ||
|| Found reads that are not properly paired. ||
|| (missing mate or the mate is not the next read) ||
There are 5 BAM files, but it does not progress beyond this point. I am running R on Ubuntu 14.04 LTS on a machine with 64 GB RAM and 8 cores. Any suggestions?
Also, featureCounts reports 162 chromosomes for the GTF file. Is something wrong here?
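The "Number of chromosomes" line presumably counts the distinct sequence names in the annotation, and Ensembl GTF files can include unplaced scaffolds and the mitochondrial genome alongside the main chromosomes. A small base-R sketch for listing the sequence names in a GTF file and how many features fall on each (the function name `gtf_seqnames` is hypothetical):

```r
# Tabulate the sequence names (GTF column 1) in a GTF file.
gtf_seqnames <- function(path) {
  # GTF is tab-separated; lines starting with '#' are header comments.
  # quote = "" stops the quoted attribute values in column 9 being parsed,
  # and "NULL" colClasses skip every column except the first.
  gtf <- read.delim(path, header = FALSE, comment.char = "#", quote = "",
                    stringsAsFactors = FALSE,
                    colClasses = c("character", rep("NULL", 8)))
  table(gtf$V1)
}
```

For example, `sort(gtf_seqnames("Rnor_6.0.84.gtf"), decreasing = TRUE)` would show whether most of the 162 names are small scaffolds carrying only a few features.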
Thanks
Is your Rsubread version up to date?
I am using 'Rsubread' version 1.12.6.
Your Rsubread version is 5 years old. You need to update both R and Rsubread to their latest versions.