empyrean999
Hello,
I have RNA-Seq data from about 20 different samples taken at different
stages. The experiment was not properly designed for expression
profiling, but I would like to extract some meaningful and correct
information from the analysis.
The data are set up like this:
Stage 1: day 1 (Sample 1), day 10 (Sample 2), day 20 (Sample 3); 100 million reads in total across the 3 samples
Stage 2: day 3 (Sample 4); 30 million reads
Stage 3: male, days 1, 2, 10 (Samples 5, 6, 7); 400 million reads
Stage 4: female, days 1, 2, 10 (Samples 8, 9, 10); 250 million reads
Stage 5: specific tissue (Sample 11); 50 million reads
The total reads for the 5 different stages vary from 30 million to 400
million. There is no reference genome for this organism, so I assembled
a transcriptome with Trinity by combining all the reads, which gave
around 300k transcripts. I have now tried two different approaches:
1) Map the reads for each sample separately back to the assembled
transcriptome (300k transcripts) and use edgeR to call differential
expression. I used the downstream processing pipeline mentioned in
Trinity, which uses edgeR.
I treated the samples separately and ran all-vs-all comparisons. What I
actually want is an expression profile of the different stages, but
with edgeR working at the sample level I might not get good profiling:
as you can see, one stage has a single sample, whereas another stage
has a maximum of 6 samples.
2) Combine the raw reads stage-wise into FASTQ files (e.g. Stage 1:
100 million, Stage 2: 30 million, Stage 3: 400 million, etc.) and run
edgeR on the stages.
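To make option 2 concrete, here is a minimal sketch of pooling at the count level instead of concatenating FASTQ files. It assumes per-sample read counts per transcript are already available from option 1. The sample labels (`S1`, `S2`, ...), the `SAMPLE_TO_STAGE` mapping, and the function `pool_counts_by_stage` are hypothetical names used only for illustration. Summing counts this way should be roughly equivalent to mapping the merged reads, provided each read is mapped independently.

```python
from collections import defaultdict

# Hypothetical sample-to-stage mapping, following the layout described above.
SAMPLE_TO_STAGE = {
    "S1": "Stage1", "S2": "Stage1", "S3": "Stage1",
    "S4": "Stage2",
    "S5": "Stage3", "S6": "Stage3", "S7": "Stage3",
    "S8": "Stage4", "S9": "Stage4", "S10": "Stage4",
    "S11": "Stage5",
}


def pool_counts_by_stage(counts, sample_to_stage):
    """Sum per-sample read counts into per-stage counts.

    counts: {transcript_id: {sample_id: read_count}}
    returns: {transcript_id: {stage: summed_read_count}}
    """
    pooled = {}
    for transcript, per_sample in counts.items():
        totals = defaultdict(int)
        for sample, n_reads in per_sample.items():
            # Every sample contributes its reads to the stage it belongs to.
            totals[sample_to_stage[sample]] += n_reads
        pooled[transcript] = dict(totals)
    return pooled


if __name__ == "__main__":
    # Made-up counts for a single transcript, purely for illustration.
    example = {"transcript_1": {"S1": 5, "S2": 7, "S4": 3}}
    print(pool_counts_by_stage(example, SAMPLE_TO_STAGE))
```

Note that pooling like this leaves one column per stage, so any within-stage replicate information is discarded.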
My question is: given the huge variation in the number of reads, do
you think edgeR can handle this well enough to give correct FPKM
values and correct profiling of these samples?
Any suggestions on how I should proceed with this analysis?
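To illustrate the library-size concern, here is a minimal sketch assuming the usual FPKM definition (reads per kilobase of transcript per million mapped reads). The function name `fpkm` and all numbers are made up for illustration. This is not edgeR's own normalization, which by default rescales library sizes with TMM factors.

```python
def fpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Fragments per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Same hypothetical 1000 bp transcript in the smallest and largest library.
    small_library = fpkm(300, 1000, 30_000_000)
    large_library = fpkm(4000, 1000, 400_000_000)
    print(small_library, large_library)
```

In this toy case both libraries give an FPKM of 10 even though the raw counts differ by more than tenfold, so the depth difference itself is scaled out. What the formula does not address is the lower count precision in the 30 million read stage or differences in transcriptome composition between stages.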