Adjusting for unwanted variation in DESeq2
@nikolay-ivanov-23079
USA/New York City/Weill Cornell Medicine

I have a question regarding the best way to adjust for unwanted variation while using DESeq2.

Case 1: I have a dataset that came from one lab (so there are no known batch effects), and I wish to adjust for unwanted variation. I’m running svaseq on my count matrix, getting 17 SVs and adding them to my model.

dds <- DESeqDataSetFromMatrix(countData = counts, colData = phenoData,
                              design = ~ SV_1 + … + SV_17 + covariate_of_interest)


Is that an appropriate thing to do? Is it OK to add this many SVs? Is there a better way to adjust for unwanted variation?

What if there are ~30 SVs? Can you just add them all to the model?

Case 2: I’m combining datasets generated by multiple labs, so now there are known batch effects. Should I include the known batch effects in my model in addition to the SVs estimated by svaseq?

Additional questions:

• The instructions for using svaseq state that the input should be a “transformed data matrix”. Does that mean I can run svaseq on a count matrix, or does it have to be transformed in some way?

• When you are fitting an interaction model and you also have SVs, can you set up your model like so:

dds <- DESeqDataSetFromMatrix(countData = counts, colData = phenoData,
                              design = ~ SV_1 + … + SV_17 + genotype + condition + genotype:condition)


Thank you!

@james-w-macdonald-5106
United States

You can hypothetically add SVs to the model until you have no residual degrees of freedom left, but at some point it becomes excessive, and you might then want to do some better EDA to figure out what's going on. If you have around 200 samples, then 17 SVs is probably fine. If you have 20 samples, that's probably too many.
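To make the degrees-of-freedom point concrete, here is a minimal base-R sketch. Everything in it is made up for illustration: a hypothetical sheet of 20 samples, a two-level covariate_of_interest, and 17 random columns standing in for SVs (they are not real svaseq output). It just counts what is left over after the design matrix is fit.

```r
# Hypothetical sample sheet: 20 samples, one two-level covariate of interest.
set.seed(1)
n_samples <- 20
n_sv <- 17

pheno <- data.frame(
  covariate_of_interest = factor(rep(c("ctrl", "trt"), each = n_samples / 2))
)

# Random placeholders standing in for surrogate variables (NOT real svaseq output).
sv <- matrix(rnorm(n_samples * n_sv), nrow = n_samples,
             dimnames = list(NULL, paste0("SV_", seq_len(n_sv))))
pheno <- cbind(pheno, sv)

# Build ~ SV_1 + ... + SV_17 + covariate_of_interest without typing it out.
design_formula <- reformulate(c(colnames(sv), "covariate_of_interest"))
X <- model.matrix(design_formula, data = pheno)

# Residual degrees of freedom = samples minus the rank of the design matrix.
# Here X has 19 columns (intercept + 17 SVs + 1 covariate), so 20 - 19 = 1.
residual_df <- nrow(X) - qr(X)$rank
residual_df
```

With one residual degree of freedom there is essentially nothing left to estimate dispersion from, which is why 17 SVs on 20 samples looks excessive. Rerunning with n_samples set to 200 leaves 181.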

In case 2 you should probably include the known batches in the mod argument to svaseq and then also fit them as part of the DESeq2 model.

You run svaseq on counts. I don't know what 'transformed data matrix' means in that context (the help says samples in columns and genes in rows, so maybe that should be 'transposed'?), but both the example and the code indicate it should be counts.
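As a rough sketch of how that could look for case 2, the code below follows the pattern used in the DESeq2 RNA-seq workflow, with some assumptions: the counts are simulated with makeExampleDESeqDataSet, the batch column and its lab1/lab2/lab3 levels are hypothetical, n.sv = 2 is an arbitrary choice, and svaseq is given size-factor-normalized counts filtered for low expression (as that workflow does). The known batch is put in both mod and mod0 so that the SVs capture variation beyond it.

```r
library(DESeq2)
library(sva)

set.seed(1)

# Simulated counts stand in for the real matrix: 1000 genes x 12 samples,
# with a two-level 'condition' factor (A/B) created by the helper.
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical known batch (e.g. lab of origin), balanced within condition.
dds$batch <- factor(rep(c("lab1", "lab2", "lab3"), times = 4))

# Normalized counts, dropping genes with very low mean expression.
dds <- estimateSizeFactors(dds)
dat <- counts(dds, normalized = TRUE)
dat <- dat[rowMeans(dat) > 1, ]

# Full model: known batch plus the variable of interest.
# Null model: known batch only.
pheno <- as.data.frame(colData(dds))
mod   <- model.matrix(~ batch + condition, data = pheno)
mod0  <- model.matrix(~ batch, data = pheno)

# n.sv = 2 is arbitrary here; leaving it out lets svaseq estimate the number.
svseq <- svaseq(dat, mod, mod0, n.sv = 2)

# Add the SVs to colData and fit batch + SVs + condition in DESeq2.
dds$SV1 <- svseq$sv[, 1]
dds$SV2 <- svseq$sv[, 2]
design(dds) <- ~ batch + SV1 + SV2 + condition

dds <- DESeq(dds)
res <- results(dds)
```

The variable of interest goes last in the design so that results(dds) reports it by default.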
