Gene has Salmon counts but is missing from my DESeq2 results table
tomtom ▴ 10
@f5b64afd
Last seen 2.1 years ago
France

Hello everyone,

I am running a large RNA-seq experiment and was looking for the expression of one particular gene, but I could not find it in my DESeq2 output table. I found this strange, as I expect the gene to be expressed (perhaps not differentially expressed, but expressed). I then looked at the salmon count matrix I used as input for DESeq2, and the gene does have counts there. This surprises me. Can anyone explain it? The relevant output is shown below: first part of the DESeq2 table, then the salmon counts. The gene in question is XTTCFBP2054_m00115031.

Thanks

baseMean    log2FoldChange  lfcSE   stat    pvalue  padj
XTTCFBP2054_m00115001   112.666716821165    -0.136943580215841  0.28231739912585    -0.485069572898675  0.627626994693917   0.999275281977537
XTTCFBP2054_m00115011   92.4633686971637    0.0581141581946056  0.427434081334662   0.135960515860467   0.891852494829847   0.999275281977537
XTTCFBP2054_m00115021   0.53638193947359    0.086491969831991   2.33640180992394    0.0370193044127142  0.970469613593924   0.999275281977537
XTTCFBP2054_m00115051   2603544.95989068    0.438149536947451   0.403800651143322   1.08506396833902    0.277893324463571   0.999275281977537
XTTCFBP2054_m00115061   616.107744531023    0.261236459279101   0.555682029181473   0.47011860301458    0.638270283926091   0.999275281977537



Xtt-CFBP2054_RbmL_EV_R1 Xtt-CFBP2054_RbmL_EV_R2 Xtt-CFBP2054_RbmL_EV_R3 Xtt-CFBP2054_RbmL_hrpGstar_R1   Xtt-CFBP2054_RbmL_hrpGstar_R2   Xtt-CFBP2054_RbmL_hrpGstar_R3
XTTCFBP2054_m00115011   150 29  113 225 38  112 
XTTCFBP2054_m00115021   1   0   1   3   0   0   
XTTCFBP2054_m00115031   190 32  90  24219   296 994 
XTTCFBP2054_m00115041   0   0   0   0   0   0   
XTTCFBP2054_m00115051   2629215 940760  3407073 3954105 1853755 3638923 
XTTCFBP2054_m00115061   730 141 985 829 422 840 
XTTCFBP2054_m00115071   0   0   0   0   0   0   
XTTCFBP2054_m00115081   987 207 385 1737    124 417
DESeq2 • 570 views
jeroen.gilis ▴ 90
@jeroengilis-21551
Last seen 4 months ago
Belgium

Hi tomtom,

This is indeed strange given the high counts in the salmon output.

Could you check at which stage in your workflow the gene was lost? Was it when importing the data to R, after gene-level filtering, or after calling the DESeq2::results function?
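One quick way to narrow this down is to compare the gene IDs present at each stage. The sketch below is only an illustration in plain Python, using the IDs visible in the two excerpts pasted in the question; the helper name `missing_from` is made up for this example, and both lists are partial views of the real tables. In R the same comparison can be done with `setdiff()` on the row names of the two objects.

```python
# Gene IDs copied from the two excerpts in the question.
# Both are partial views of the full tables, so this is illustrative only.
count_ids = [
    "XTTCFBP2054_m00115011",
    "XTTCFBP2054_m00115021",
    "XTTCFBP2054_m00115031",
    "XTTCFBP2054_m00115041",
    "XTTCFBP2054_m00115051",
    "XTTCFBP2054_m00115061",
    "XTTCFBP2054_m00115071",
    "XTTCFBP2054_m00115081",
]
result_ids = [
    "XTTCFBP2054_m00115001",
    "XTTCFBP2054_m00115011",
    "XTTCFBP2054_m00115021",
    "XTTCFBP2054_m00115051",
    "XTTCFBP2054_m00115061",
]


def missing_from(reference, candidate):
    """Return the IDs in `reference` that are absent from `candidate`, in order."""
    present = set(candidate)
    return [gene for gene in reference if gene not in present]


# The results excerpt stops at ...061, so only compare up to that ID.
last_shown = "XTTCFBP2054_m00115061"
overlap = count_ids[: count_ids.index(last_shown) + 1]

lost = missing_from(overlap, result_ids)
print(lost)
```

Within the pasted range this reports m00115031 and m00115041. The second has zero counts in every sample, and DESeq2 reports NA for such rows as well, so both are consistent with NA rows having been dropped somewhere after the results step.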

The DESeq2::results function could be the culprit. By default it performs independent filtering (see the vignette: https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#independent-filtering-and-multiple-testing) and flags count outliers (https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#approach-to-count-outliers), which could be what is happening with your gene given the outlier count in sample 4. However, neither of these steps should remove genes from the output data frame directly; they would instead result in NA values for pvalue and/or padj. Maybe you removed such genes from the output?

Also, did you use tximport to load your salmon quantifications?


Hi Jeroen,

Yes, I used tximport. Thanks for your answer. It had been a while since I called the DEGs, and indeed the pvalue and padj are NA for this gene, and rows with NA were removed in a downstream processing step. I had forgotten about that. Thanks for clarifying and helping me figure out my problem.
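For anyone landing here with the same symptom, the pitfall can be reproduced in a few lines. This is a minimal sketch in plain Python rather than the actual downstream script, which is not shown in this thread. The two complete rows use the pvalue and padj printed in the question, the m00115031 row has both set to None (standing in for R's NA, as described above), and `drop_na` and `na_genes` are hypothetical helper names.

```python
# Toy results table: None stands in for R's NA.
results = [
    {"gene": "XTTCFBP2054_m00115021",
     "pvalue": 0.970469613593924, "padj": 0.999275281977537},
    # pvalue and padj reported as NA for this gene (see the reply above).
    {"gene": "XTTCFBP2054_m00115031", "pvalue": None, "padj": None},
    {"gene": "XTTCFBP2054_m00115051",
     "pvalue": 0.277893324463571, "padj": 0.999275281977537},
]

NA_COLUMNS = ("pvalue", "padj")


def drop_na(rows, columns=NA_COLUMNS):
    """Keep only rows where every listed column has a value (like a blanket NA filter)."""
    return [row for row in rows if all(row[c] is not None for c in columns)]


def na_genes(rows, columns=NA_COLUMNS):
    """List the genes that a blanket NA filter would silently remove."""
    return [row["gene"] for row in rows if any(row[c] is None for c in columns)]


kept = drop_na(results)
print([row["gene"] for row in kept])  # the NA gene is no longer in the table
print(na_genes(results))              # inspect these before filtering
```

Listing the NA rows before filtering them out makes this kind of silent loss visible. The DESeq2 vignette also describes `results(dds, cooksCutoff=FALSE)` for turning off the count-outlier filter, though whether that is appropriate depends on the data.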
