Difference in differentially expressed genes detected from two gene sets of the same samples
alva.james • 0
@alvajames-6967
Germany

Dear All,

I have gene counts from RNA-seq data for 50 samples, collected from patients at two different time points. I divided the genes into two sets: one set is the whole transcriptome (protein-coding and non-coding genes together), and the other set (the focus set) contains only the non-coding genes. I then ran DESeq2 between the two time points for each set. From the whole set, I get a total of 301 significantly differentially expressed non-coding genes, selected as top candidates with the filters FDR <= 0.05, p-value <= 0.05, and log fold change below -1 or above +1 (down- and up-regulated genes, respectively).

For the second set, the focus set containing only the non-coding genes, I ran the same analysis with the same filters and am left with 201 significant candidate genes. I would appreciate advice on the following questions:

1. Is it statistically valid to group the genes into sets and run DESeq2 on each set? If not, why not?

2. Why is there such a large difference in the number of results between the two sets?

Thank you so much for your support.

deseq2 deseq
@mikelove
United States

hi,

1) It is not recommended to split the genes into sets and process them separately. The DESeq2 methods expect all the genes in one DESeqDataSet object. Why? Because the prior estimation steps need to look at all the genes in order to come up with reasonable prior hyperparameters (location and width of the dispersion prior, width of the LFC prior; see the DESeq2 paper).
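
To make the point about prior estimation concrete, here is a toy Python sketch. It is not DESeq2's actual algorithm (DESeq2 is an R package and fits a dispersion trend over the mean expression); it simply assumes a plain normal prior on per-gene log-dispersion. The function names `prior_hyperparameters` and `shrink` and all numbers are made up for illustration. The only thing it shows is that the same gene gets shrunk differently depending on which genes were used to estimate the prior.

```python
import statistics


def prior_hyperparameters(log_dispersions):
    """Location and width of a normal prior fitted to per-gene estimates."""
    location = statistics.mean(log_dispersions)
    width = statistics.stdev(log_dispersions)
    return location, width


def shrink(gene_estimate, gene_variance, location, width):
    """Posterior mean for one gene under a normal prior and normal likelihood."""
    prior_variance = width ** 2
    weight = prior_variance / (prior_variance + gene_variance)
    return weight * gene_estimate + (1 - weight) * location


# Hypothetical per-gene log-dispersion estimates (made-up numbers).
coding = [-3.0, -2.8, -3.2, -2.9, -3.1, -2.7]
noncoding = [-1.5, -1.0, -2.0, -1.2, -1.8]

# Prior estimated from all genes versus from the non-coding subset only.
loc_all, width_all = prior_hyperparameters(coding + noncoding)
loc_nc, width_nc = prior_hyperparameters(noncoding)

# The same noisy per-gene estimate, shrunk under each prior.
noisy_gene = -0.5
shrunk_all = shrink(noisy_gene, 1.0, loc_all, width_all)
shrunk_nc = shrink(noisy_gene, 1.0, loc_nc, width_nc)

print("prior from all genes:     ", loc_all, width_all, shrunk_all)
print("prior from non-coding only:", loc_nc, width_nc, shrunk_nc)
```

The two priors differ in both location and width, so the per-gene estimates that feed into the test statistics differ as well. That alone is enough for the two runs to give different gene lists.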

2) I can turn this question around: why do you expect differentially expressed genes to be present and detectable in equal proportion in these two sets? It's not a random subset after all, but one selected for biological reasons.

But most importantly, you should not split the DESeqDataSet into two, but put all genes into a single DESeqDataSet.
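
A minimal Python sketch of that workflow, assuming the results table from the single all-gene run has already been exported: the non-coding genes are picked out only after the analysis, using the thresholds from the question. The `GeneResult` class, the `significant_noncoding` function, the gene IDs, and all values are hypothetical. The check for a missing adjusted p-value is there because DESeq2 can report NA for `padj` on some genes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneResult:
    gene_id: str
    biotype: str                # e.g. "protein_coding" or "lncRNA"
    log2_fold_change: float
    padj: Optional[float]       # FDR-adjusted p-value; None stands in for NA


def significant_noncoding(results, fdr=0.05, min_abs_lfc=1.0):
    """Subset an all-gene results table to significant non-coding genes."""
    return [
        r for r in results
        if r.biotype != "protein_coding"
        and r.padj is not None
        and r.padj <= fdr
        and abs(r.log2_fold_change) > min_abs_lfc
    ]


# Hypothetical rows from one analysis that included every gene.
all_gene_results = [
    GeneResult("geneA", "protein_coding", 2.3, 0.001),
    GeneResult("geneB", "lncRNA", -1.7, 0.010),
    GeneResult("geneC", "lncRNA", 0.4, 0.020),   # fold change too small
    GeneResult("geneD", "lncRNA", 1.9, None),    # no adjusted p-value
    GeneResult("geneE", "lncRNA", 1.2, 0.300),   # not significant
]

hits = significant_noncoding(all_gene_results)
print([r.gene_id for r in hits])
```

The key design choice is the order of operations: estimate everything on the full gene set first, and restrict to the genes of interest afterwards.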


Thanks for your reply.

First, a comment: to compare the two approaches, I counted only the resulting differentially expressed lncRNAs.

Regarding point 2: we hypothesized that separating the lncRNAs from the protein-coding transcripts might make the method more sensitive to the generally much lower expression levels of lncRNAs. Our fear was that the highly expressed protein-coding transcripts show bigger changes between the two sample groups and therefore dilute the 'smaller changes' within the lncRNAs. **The result** was the opposite: more lncRNAs were classified as differentially expressed when all transcripts were included in the analysis together.

Hmm, yes, that assumption is not a good one. The method is most sensitive when it can observe the data from all the genes. The highly expressed genes do not really steal sensitivity from the lowly expressed genes.


OK, thank you.
