Hi,
I am trying to improve my RNA-seq DGE analysis practice and would like to check whether I have been doing it correctly.
The first part relates to tximport. For DESeq2, I use the suggested approach:
txi <- tximport(files, type = "salmon", tx2gene = tx2gene)
dds <- DESeqDataSetFromTximport(txi, sampleTable, ~condition)
If I am not mistaken, this uses an offset matrix to correct for gene length. For limma:
txi <- tximport(files, type = "salmon", tx2gene = tx2gene, countsFromAbundance = "lengthScaledTPM")
y <- DGEList(txi$counts)
So my first question is: if I want to remove some of the samples from the DESeqDataSet or DGEList object, should I just select the ones I want to keep from the object, like:
dds <- dds[,sample_i_want_to_keep]
# or
y <- y[,sample_i_want_to_keep]
Or should I redo the tximport() process? Would subsetting affect the adjustment? My understanding is that TPM adjusts for gene length and library size, so it shouldn't, but I am not sure.
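To make the subsetting question concrete, here is a minimal base R sketch with made-up numbers. It assumes, based on my reading of the tximport documentation, that "lengthScaledTPM" multiplies abundance by the transcript length averaged over samples and then rescales each column to the original library size. `length_scaled_tpm()` is a hypothetical helper written for illustration, not a tximport function, and the three matrices are invented toy data.

```r
# Toy re-implementation (assumption: mirrors the documented behaviour of
# countsFromAbundance = "lengthScaledTPM"; this is not tximport's own code).
length_scaled_tpm <- function(abundance, len, counts) {
  scaled <- abundance * rowMeans(len)                  # length averaged over samples
  t(t(scaled) * (colSums(counts) / colSums(scaled)))   # restore per-sample library size
}

# Made-up data: 3 genes x 3 samples
nm <- list(paste0("g", 1:3), paste0("s", 1:3))
abundance <- matrix(c(10, 20, 70,  30, 30, 40,  50, 25, 25), nrow = 3, dimnames = nm)
len       <- matrix(c(1000, 2000, 500,  1200, 1800, 500,  3000, 2000, 500),
                    nrow = 3, dimnames = nm)
counts    <- matrix(c(100, 400, 350,  300, 500, 200,  900, 300, 80),
                    nrow = 3, dimnames = nm)

keep <- c("s1", "s2")

# Option 1: scale using all samples, then drop s3
subset_after <- length_scaled_tpm(abundance, len, counts)[, keep]

# Option 2: drop s3 first, then scale (stands in for rerunning the import)
subset_before <- length_scaled_tpm(abundance[, keep], len[, keep], counts[, keep])

# Library sizes agree either way; individual gene values can differ because
# the average length of g1 and g2 changes once s3 is excluded.
same_libsize <- isTRUE(all.equal(colSums(subset_after), colSums(subset_before)))
same_values  <- isTRUE(all.equal(subset_after, subset_before))
```

If this sketch reflects the real behaviour, the two routes keep the same column sums but need not give identical per-gene values, so rerunning tximport() on the retained files is the only way to get exactly what a fresh import would produce.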
The second part of the question is about the DGE design matrix. Let's say I have something like this, where donor is the origin of the cells, and I want to compare NTC vs the other conditions, and also A, B and C against each other.
      donor Condition
1_NTC     1       NTC
1_A       1         A
1_B       1         B
1_C       1         C
2_NTC     2       NTC
2_A       2         A
2_B       2         B
2_C       2         C
3_A       3         A
Due to a technical issue, 3_NTC, 3_B and 3_C could not be sequenced. I am wondering whether I should include donor when creating the design matrix. How will it affect the results when I compare NTC vs non-NTC and between A, B and C, given that donor 3 has no NTC, B or C sample? I assume the answer would be the same for DESeq2 and limma, since both are GLM-based?
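For reference, the design in question can be inspected with base R alone. The sketch below rebuilds the sample table from the post and forms the additive design `~ donor + Condition`. The object names (`samples`, `design`, and so on) are just illustrative, and choosing NTC as the reference level is an assumption made here for convenience.

```r
# Sample table as described in the post
samples <- data.frame(
  donor     = factor(c(1, 1, 1, 1, 2, 2, 2, 2, 3)),
  Condition = factor(c("NTC", "A", "B", "C", "NTC", "A", "B", "C", "A"),
                     levels = c("NTC", "A", "B", "C")),  # NTC as reference (assumption)
  row.names = c("1_NTC", "1_A", "1_B", "1_C",
                "2_NTC", "2_A", "2_B", "2_C", "3_A")
)

# Additive design: donor as a blocking factor plus condition
design <- model.matrix(~ donor + Condition, data = samples)

# Check that the design is estimable and how many residual df remain
design_rank <- qr(design)$rank
residual_df <- nrow(design) - design_rank

# Which samples carry the donor 3 coefficient?
donor3_samples <- rownames(design)[design[, "donor3"] == 1]
```

The matrix is full rank, so the model can be fitted with donor included. The donor3 column is non-zero only for 3_A, which suggests that, with donor in the model, this coefficient can fit that single sample on its own and 3_A then adds little or nothing to the condition comparisons.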
Thanks a lot! Hope the questions make sense.

Thanks Michael Love! I just reread the tximport tutorial and have a related question regarding the scaling. So we don't have to scale to transcript length, but library size must always be normalised for. My question is about limma-voom: in the tutorial, you recommended using "scaledTPM" to normalise the counts to library size, but then a few lines later you redo the normalisation again: Is the calcNormFactors() step necessary given that the counts are already normalised? Will it change the value of the original count?

scaledTPM doesn't correct for library size. For example, on the files in the man page: