Question: Differential expression analysis for STAR-aligned RNA-Seq data
0
gravatar for sum31278
3.2 years ago by
sum312780
sum312780 wrote:

I have done alignment for paired-end RNA-Seq data using STAR. I now want to do differential expression analysis using the following workflow:

http://www.bioconductor.org/help/workflows/rnaseqGene/

I have a few questions before I proceed.  

1) I have generated both Unsorted and SortedByCoordinate BAM files. Which one should I use for differential expression analysis?

2) I used the option --sjdbGTFfile during alignment. Do I need to modify the downstream workflow because of this?

3) I prepared a sampleTable as suggested in workflow. It has just two columns: sample name and group (case, control). Do I need to add other details or it would be sufficient.

Thanks.

Sumit Paliwal

 

 

 

 

rnaseq deseq2 star • 1.8k views
ADD COMMENTlink modified 3.2 years ago by Michael Love25k • written 3.2 years ago by sum312780

There is more than one Bioc tools available for counting reads. featureCounts in Rsubread package is one of them and it has the same speed for counting reads in location-sorted bam files and unsorted bam files. Counting 20 million reads only costs about half a minute.

ADD REPLYlink written 3.2 years ago by Wei Shi3.2k

Hi Wei Shi,

I did use featureCounts as follows:


> fc <- featureCounts(files= filenames, annot.ext = "Rnor_6.0.84.gtf", isGTFAnnotationFile =TRUE, GTF.featureType = "exon", GTF.attrType = "gene_id", chrAliases=NULL, useMetaFeatures = TRUE, allowMultiOverlap = FALSE, isPairedEnd=TRUE, requireBothEndsMapped = TRUE, checkFragLength = TRUE, nthreads=6)
   
|| Load annotation file Rnor_6.0.84.gtf ...
||    Number of features is 305352 
||    Number of meta-features is 32662 
||    Number of chromosomes is 162                

                          
|| Process BAM file SM-C-F1_Aligned.sortedByCoord.out.bam...                  ||
||    Assign fragments (read pairs) to features...                            ||
||    Each fragment is counted once.                                          ||
||    Found reads that are not properly paired.                               ||
||    (missing mate or the mate is not the next read)                         ||
||    128001 reads have missing mates.                                        ||
||    Input was converted to a format accepted by featureCounts.              ||
||    Found reads that are not properly paired.                               ||
||    (missing mate or the mate is not the next read)                         ||
|| Process BAM file SM-C-M5_Aligned.sortedByCoord.out.bam...                  ||
||    Assign fragments (read pairs) to features...                            ||
||    Each fragment is counted once.                                          ||
||    Found reads that are not properly paired.                               ||
||    (missing mate or the mate is not the next read)                         ||


There are 5 BAM files but it was not progressing beyond this point.  I am using R for Ubuntu-14.04-LTS on a system that has 64GB RAM with 8-cores. Any suggestions.

One more thing to notice is that  for GTC file , the number of chromosomes it shows is 162. Is something wrong here?

Thanks

ADD REPLYlink written 3.1 years ago by sum312780

Is your Rsubread version up to date?

ADD REPLYlink written 3.1 years ago by Wei Shi3.2k

I am using 'Rsubread' version 1.12.6.

ADD REPLYlink written 3.1 years ago by sum312780

Your Rsubread version is 5 years old. You need to update both your R and Rsubread to their latest version.

ADD REPLYlink written 3.1 years ago by Wei Shi3.2k
Answer: Differential expression analysis for STAR-aligned RNA-Seq data
0
gravatar for Michael Love
3.2 years ago by
Michael Love25k
United States
Michael Love25k wrote:

1) You need to decide first how you are going to count the reads, which tool you will use from the table in the workflow. I believe they are all fastest if you provide name sorted BAM files, but you should check the documentation of the tool you choose to use.

2) no

3) That is sufficient, but the important thing is that you confirm that the files you provide to the counting tool are in the same order as the rows of sampleTable. We make sure this is the case by having the files be a column of the sampleTable and provide this column to the counting tool.

ADD COMMENTlink written 3.2 years ago by Michael Love25k

Thank you Michael Love for the reply. However, I did not understand how the order of the file would make the difference.

ADD REPLYlink written 3.2 years ago by sum312780
Just make sure the order of samples in colData matches the columns of the count matrix.
ADD REPLYlink written 3.2 years ago by Michael Love25k

Hi Michael, 

I am following the workflow as suggested.

When I ran the following command, after sometime the system responded with a warning stating it is low on memory and R should be closed.


se <- summarizeOverlaps(features=ebg, reads = bamfiles, mode = "Union", singleEnd = FALSE, ignore.strand = TRUE, fragments =TRUE)


To solve this issue I did the following:


library("BiocParallel")
register(SnowParam(stop.on.error = TRUE, timeout = Inf)


However, it displayed  the following warning: 


Warning message: In initRefFields(.self, .refClassDef, as.environment(.self), list(...)) : NAs introduced by coercion to integer range


 Also, it did not solve the purpose.

I have generated the alignment files using paired end reads with a total of 50 million reads/sample.  Here, I am processing 5 files.

I am using R-3.3.1 (64-bit) for windows. The system has 64GB RAM with 8-cores.

I am surely missing something. Please suggest

ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by sum312780

There are numerous ways to produce a count matrix suitable for DESeq2. If you are having difficulty with one, I'd recommend you try an alternative method, for example featureCounts() from the Rsubread package, or you can use a lightweight transcript quantifier such as Salmon, Sailfish or kallisto and then use the Bioconductor package txmiport to build a DESeqDataSet.

ADD REPLYlink written 3.2 years ago by Michael Love25k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 337 users visited in the last hour