Question

DESeq2: FPMs varying based on sample groups used

Asked by Jay (Austria)

I recently analysed a transcriptomic dataset with 3 sample groups (4 samples each) and performed pairwise comparisons between the groups. I exported the FPMs for each pairwise comparison and noticed that the same gene in the same sample had a slightly different FPM in each comparison.

For example, one gene has the following FPMs in the first comparison:
Sample 1: 476.992795157303, Sample 2: 472.464072441368, Sample 3: 488.11759330905, Sample 4: 461.634140423592

and in the second comparison:
Sample 1: 448.229377708702, Sample 2: 449.795231722178, Sample 3: 460.560159012059, Sample 4: 431.705745786016

Is there a reason this would happen? Is it expected?
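A quick check on the posted numbers. Assuming the default `fpm(robust = TRUE)`, DESeq2 computes FPM = 1e6 × count / (size factor × geometric mean of the column sums). The raw counts of a given sample are identical in both analyses, so the ratio between its two FPMs should be the same for every gene in that sample and reflects only the change in normalization. The variable names below are illustrative.

```r
# FPMs for one gene in samples 1-4, as posted in the question
fpm_comparison1 <- c(476.992795157303, 472.464072441368, 488.11759330905, 461.634140423592)
fpm_comparison2 <- c(448.229377708702, 449.795231722178, 460.560159012059, 431.705745786016)

# Per-sample ratio between the two analyses (raw counts cancel out)
ratio <- fpm_comparison1 / fpm_comparison2
print(round(ratio, 3))
```

The ratios range from about 1.05 to 1.07 and differ between samples. This suggests that the size factors shifted relative to one another, not just by a single global rescaling. That is what you would expect if each comparison was normalized against a different set of samples.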

Commands run once the data were loaded into DESeq2:

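```r
library(DESeq2)
library(dplyr)   # for %>%
library(tibble)  # for rownames_to_column()

# tidy = TRUE: the first column of FCcounts_clean_LactuG holds the gene IDs
dds_LactuG <- DESeqDataSetFromMatrix(countData = FCcounts_clean_LactuG, colData = Meta_LactG,
                                     design = ~condition, tidy = TRUE)
dds_LactuG$condition <- relevel(dds_LactuG$condition, ref = "Lactose")

# run DESeq2 (size factors are estimated from the samples in this object)
dds_LactuG <- DESeq(dds_LactuG)

# get results, ordered by p-value
res <- results(dds_LactuG)
resOrdered <- res[order(res$pvalue), ]

# FPMs, normalized with this object's size factors
FPM_table <- fpm(dds_LactuG) %>% as.data.frame() %>% rownames_to_column("Geneid")
```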

Thank you for any help!


Answer by ATpoint (Germany), 2023-11-03

If the dds object contains different samples when the normalization is done, then yes, this is expected. Normalization is relative to all samples involved, so the values will change (usually only slightly) when you add or remove samples. Unlike "naive" FPM, the DESeq2 version uses the DESeq2 size factors, and these, as said, depend on which samples are present when they are calculated. Naive per-million scaling, by contrast, always gives the same values, but it is a poor technique because it does not correct for composition bias.
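To make the sample dependence concrete, here is a minimal base-R sketch that needs no DESeq2. It re-implements median-of-ratios size factors and a size-factor-based FPM, assuming DESeq2's default behaviour:

- each gene's reference is its geometric mean across the samples in the object;
- each sample's size factor is the median of its count/reference ratios;
- `fpm(robust = TRUE)` divides the counts by size factor times the geometric mean of the column sums.

The functions `size_factors()`, `robust_fpm()` and `naive_cpm()` and the toy counts are hypothetical illustrations, not DESeq2's own code. The composition shift in group C is deliberately exaggerated.

```r
# Simplified median-of-ratios size factors (illustration only)
size_factors <- function(counts) {
  log_geo_means <- rowMeans(log(counts))   # -Inf for genes with a zero count
  keep <- is.finite(log_geo_means)         # use only genes observed in every sample
  apply(counts[keep, , drop = FALSE], 2, function(col) {
    exp(median(log(col) - log_geo_means[keep]))
  })
}

# FPM in the style of DESeq2::fpm(robust = TRUE):
# library size = size factor * geometric mean of the column sums
robust_fpm <- function(counts) {
  lib <- size_factors(counts) * exp(mean(log(colSums(counts))))
  sweep(counts, 2, lib, "/") * 1e6
}

# Naive counts per million: each column scaled by its own total only
naive_cpm <- function(counts) {
  sweep(counts, 2, colSums(counts), "/") * 1e6
}

# Hypothetical toy counts: 5 genes, 3 groups of 2 samples;
# gene5 is strongly up in group C (a composition shift)
counts <- cbind(
  A1 = c(100, 200, 300, 400, 500),  A2 = c(110, 190, 310, 390, 520),
  B1 = c(200, 200, 300, 400, 500),  B2 = c(210, 190, 290, 410, 480),
  C1 = c(100, 200, 300, 400, 2000), C2 = c(90, 210, 310, 380, 2100)
)
rownames(counts) <- paste0("gene", 1:5)

ab <- c("A1", "A2", "B1", "B2")   # "pairwise" object A vs B
ac <- c("A1", "A2", "C1", "C2")   # "pairwise" object A vs C

# Same gene, same sample (A1), two different objects
print(robust_fpm(counts[, ab])["gene1", "A1"])
print(robust_fpm(counts[, ac])["gene1", "A1"])

# Naive CPM of A1 does not depend on which other samples are present
print(naive_cpm(counts[, ab])["gene1", "A1"])
print(naive_cpm(counts[, ac])["gene1", "A1"])
```

A1's counts are identical in both subsets. Its robust FPM still changes because the per-gene references and the geometric mean of library sizes are both computed over whichever samples are present. The naive CPM stays fixed, but it ignores the library-size inflation that gene5 causes in C1 and C2.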


Thanks for the response. I thought that was the case but wasn't sure.

I would like to get FPMs/FPKMs across all treatments. Would it be alright to build a sample set containing all samples, with a dummy metadata table, and run DESeq on it just to get the FPMs (without using any of the actual DESeq results)? Or is there another way to get a single FPM value per sample in this situation?

I would like to use them for visualizations and in the supplementary data.

Reply by Jay


You don't need to subset your data for a pairwise analysis; see the section on contrasts in the DESeq2 vignette. Unless there is a good reason to subset, just run DESeq() on all samples, use contrasts for the pairwise comparisons, and take the FPKMs (or any other normalized counts) from that single analysis.
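Here is a minimal sketch of that workflow. It uses simulated data from DESeq2's `makeExampleDESeqDataSet()` in place of the real counts. The group names "GroupB" and "GroupC" are hypothetical stand-ins for the two non-reference conditions. All three pairwise comparisons come from one fitted object, so there is a single FPM matrix.

The last step relates to the "dummy metadata" question. Size factors do not depend on the design formula, so `estimateSizeFactors()` on the same set of samples should give the same FPMs as the full `DESeq()` run.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real data: 500 genes x 12 samples
dds_raw <- makeExampleDESeqDataSet(n = 500, m = 12)
# Hypothetical group labels, 4 samples per group, "Lactose" as reference
dds_raw$condition <- relevel(
  factor(rep(c("Lactose", "GroupB", "GroupC"), each = 4)), ref = "Lactose")

# Fit once on all samples
dds <- DESeq(dds_raw)

# Pairwise comparisons as contrasts on the same fit:
# c(factor name, numerator level, denominator level)
res_B_vs_Lac <- results(dds, contrast = c("condition", "GroupB", "Lactose"))
res_C_vs_Lac <- results(dds, contrast = c("condition", "GroupC", "Lactose"))
res_C_vs_B   <- results(dds, contrast = c("condition", "GroupC", "GroupB"))

# One set of size factors, hence one FPM value per gene and sample
fpm_all <- fpm(dds)

# Size factors alone (no model fit) give the same FPMs for the same samples
fpm_sf_only <- fpm(estimateSizeFactors(dds_raw))
```

For FPKMs, `fpkm(dds)` additionally needs gene lengths, which DESeq2 reads from `mcols(dds)$basepairs` when no average transcript lengths are stored in the object.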

Reply by ATpoint