I am trying to improve my RNA-Seq DGE analysis practice, so I would like to check whether I have been doing it correctly.
The first part is related to tximport. For DESeq2, I use the suggested way:
txi <- tximport(files, type = "salmon", tx2gene = tx2gene)
dds <- DESeqDataSetFromTximport(txi, sampleTable, ~condition)
This uses the offset matrix to correct for the length if I am not mistaken. For limma:
txi <- tximport(files, type = "salmon", tx2gene = tx2gene, countsFromAbundance = "lengthScaledTPM")
y <- DGEList(txi$counts)
So my first question is: if I want to drop some of the samples from the DESeqDataSet or DGEList object, should I just subset the object to the samples I want to keep, like:
dds <- dds[, sample_i_want_to_keep]
# or
y <- y[, sample_i_want_to_keep]
Or should I redo the tximport() process on the reduced set of files? Would subsetting afterwards affect the length adjustment? My understanding is that TPM already accounts for gene length and library size, so it shouldn't, but I am not sure.
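To make the concern concrete, here is a toy base R sketch of the arithmetic as I understand it from the tximport documentation, which describes lengthScaledTPM as abundance scaled by the average length over samples and then by library size. This is not tximport's actual code: the function `length_scaled()` and all the numbers are made up for illustration.

```r
# Toy inputs: 2 genes x 3 samples (made-up numbers, not real data)
dn <- list(c("g1", "g2"), c("s1", "s2", "s3"))
abundance <- matrix(c(10, 20, 12, 18, 30, 5), nrow = 2, dimnames = dn)
len       <- matrix(c(1000, 2000, 1100, 2000, 1500, 2100), nrow = 2, dimnames = dn)
counts    <- matrix(c(100, 400, 130, 350, 450, 100), nrow = 2, dimnames = dn)

# Hypothetical re-implementation of the described scaling (an assumption):
# 1) multiply abundance by the per-gene average length over all samples given
# 2) rescale each sample so its column sum matches the original count total
length_scaled <- function(abundance, len, counts) {
  scaled <- abundance * rowMeans(len)
  sweep(scaled, 2, colSums(counts) / colSums(scaled), `*`)
}

keep <- c("s1", "s2")

# Option 1: scale with all samples, then drop s3
subset_after <- length_scaled(abundance, len, counts)[, keep]

# Option 2: drop s3 first, then scale
redo_on_subset <- length_scaled(abundance[, keep], len[, keep], counts[, keep])

# Compare the two options gene by gene and by library size
print(subset_after)
print(redo_on_subset)
print(all.equal(colSums(subset_after), colSums(redo_on_subset)))
```

If this sketch reflects what tximport does, the per-sample totals agree between the two options, but the per-gene values can shift because the average length is taken over a different set of samples. Whether that shift matters in practice is part of what I am asking.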
The second part of the question is about the DGE design matrix. Let's say I have something like this, where donor is the origin of the cells, and I want to compare NTC vs the other conditions, and also compare A, B and C with each other.
        donor  Condition
1_NTC   1      NTC
1_A     1      A
1_B     1      B
1_C     1      C
2_NTC   2      NTC
2_A     2      A
2_B     2      B
2_C     2      C
3_A     3      A
Due to a technical issue, 3_NTC, 3_B and 3_C could not be sequenced. I am wondering whether I should include donor when creating the design matrix. How will it affect the results when I compare NTC vs non-NTC and between A, B and C, given that donor 3 has no NTC, B or C sample? I assume that in both DESeq2 and limma the answer to this question would be the same, since both are GLM-based?
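To make the design question concrete, here is a minimal base R sketch that builds the sample table above and the two candidate design matrices, with and without donor. The object names are made up for illustration, and the leverage check only describes an ordinary linear model fit, so it may not carry over exactly to the negative binomial GLM in DESeq2.

```r
# Sample table from the question (9 samples; donor 3 only has condition A)
samples <- data.frame(
  donor = factor(c(1, 1, 1, 1, 2, 2, 2, 2, 3)),
  condition = factor(c("NTC", "A", "B", "C", "NTC", "A", "B", "C", "A"),
                     levels = c("NTC", "A", "B", "C")),
  row.names = c("1_NTC", "1_A", "1_B", "1_C",
                "2_NTC", "2_A", "2_B", "2_C", "3_A")
)

# Candidate designs: condition only, or donor as a blocking factor
design_no_donor <- model.matrix(~ condition, data = samples)
design_donor    <- model.matrix(~ donor + condition, data = samples)

# Check that each design is full rank (all coefficients estimable)
rank_no_donor <- qr(design_no_donor)$rank
rank_donor    <- qr(design_donor)$rank
print(c(rank_no_donor, ncol(design_no_donor)))
print(c(rank_donor, ncol(design_donor)))

# Residual degrees of freedom left for estimating variance
df_no_donor <- nrow(design_no_donor) - rank_no_donor
df_donor    <- nrow(design_donor) - rank_donor
print(c(df_no_donor = df_no_donor, df_donor = df_donor))

# Leverage (hat value) of each sample under the donor design.
# A value of 1 means the sample is fitted exactly by its own coefficient
# in an ordinary linear model.
leverage <- rowSums(qr.Q(qr(design_donor))^2)
names(leverage) <- rownames(samples)
print(round(leverage, 3))
```

Looking at the leverage of 3_A under the donor design is one way to see how much that lone donor 3 sample can contribute to the condition comparisons once donor is in the model.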
Thanks a lot! Hope the questions make sense.