After running
dex.padj <- getAdjustedPValues(stageRObj, order=FALSE, onlySignificantGenes=TRUE)
and filtering the final table for transcripts with a value below 0.05, I received a list of genes in which only 1 transcript was significant.
I was curious to see what the expression of such a gene looked like in relation to the 1 transcript that was found to be significant. Using the scaledTPM counts from tximport, I was surprised to see that in many cases there were other transcripts that maybe should have been considered DTU as well...
Anecdotal Example:
geneID           txID             gene          transcript
ENSG00000006468  ENST00000485475  7.875795e-06  1.402172e-06
ENSG00000006468  ENST00000483075  7.875795e-06  3.333497e-01
ENSG00000006468  ENST00000405192  7.875795e-06  6.832075e-01
A graph of expression can be found here: https://www.dropbox.com/s/52vpnqroxk88h3y/ENSG00000006468.pdf?dl=0
As a side note, what would be considered the minimum read depth for DTU detection?
Thank You
Rina
I can look for a clearer case, but if the expression looks roughly similar, then I should not have gotten even a single significant transcript, right? The error bars are SD.
I "filtered" only to make my own script clear. From the workflow: "The final table with adjusted p-values summarizes the information from the two-stage analysis. Only genes that passed the filter are included in the table, so the table already represents screened genes. The transcripts with values in the column, transcript, less than 0.05 pass the confirmation stage on a target 5% overall false discovery rate, or OFDR."
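To make the reading of that table concrete, here is a minimal base R sketch that counts, per gene, how many transcripts pass the 0.05 cutoff in the transcript column. It assumes the table is a data frame with the columns shown in the example (geneID, txID, gene, transcript); the toy values are copied from that example, and the names summary_tab and single_tx_genes are made up for illustration.

```r
# Toy table mirroring the columns of the example output above
# (geneID, txID, gene, transcript); values are copied from that example.
dex.padj <- data.frame(
  geneID = rep("ENSG00000006468", 3),
  txID = c("ENST00000485475", "ENST00000483075", "ENST00000405192"),
  gene = rep(7.875795e-06, 3),
  transcript = c(1.402172e-06, 3.333497e-01, 6.832075e-01),
  stringsAsFactors = FALSE
)

alpha <- 0.05  # target OFDR quoted from the workflow

# Per gene: number of transcripts below alpha, and total transcripts listed
n_sig <- tapply(dex.padj$transcript < alpha, dex.padj$geneID, sum)
n_tx  <- tapply(dex.padj$txID, dex.padj$geneID, length)

summary_tab <- data.frame(
  geneID = names(n_sig),
  n_tx = as.integer(n_tx[names(n_sig)]),
  n_sig = as.integer(n_sig),
  stringsAsFactors = FALSE
)

# Genes in which exactly one transcript is confirmed (illustrative name)
single_tx_genes <- summary_tab$geneID[summary_tab$n_sig == 1]
print(summary_tab)
```

On a real table the same two tapply calls give a quick overview of how common the one-significant-transcript pattern is before looking at individual genes.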
Maybe a clearer case:
https://www.dropbox.com/s/tc7mfp9zzspor2o/ENSG00000125812.pdf?dl=0
Thanks :)
Also, as a side note, what would be considered the minimum read depth for DTU detection?
I don’t have an answer for this, but one can sometimes assess this by exploring one or more real datasets.
So what’s the question exactly? Is it, how is it possible that there is evidence for one transcript but not the other? The answer to that is: we make an arbitrary decision on “significance”, and the transcripts have different power based on their distributions, on top of simple sampling variance. It’s therefore expected that some genes end up with one transcript passing the arbitrary threshold while another does not. It is true that for one transcript to actually have participated in DTU, another one must have as well, but that is a question about the actual underlying proportions, whereas statistical testing is a different matter.
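A small simulation can illustrate the point about unequal power. This is only a sketch under made-up assumptions: a three-transcript gene where tx1 gains 0.20 in proportion and tx2 and tx3 each lose 0.10, multinomial sampling of reads, and a plain Welch t-test on per-sample proportions as a stand-in for the actual DTU test (it is not what DEXSeq or stageR do). All parameter values are hypothetical.

```r
set.seed(1)

# Hypothetical settings, chosen only for illustration
n_rep <- 5     # samples per condition
depth <- 30    # reads assigned to the gene in each sample
p_a <- c(tx1 = 0.50, tx2 = 0.25, tx3 = 0.25)  # true proportions, condition A
p_b <- c(tx1 = 0.70, tx2 = 0.15, tx3 = 0.15)  # true proportions, condition B
n_sim <- 500
alpha <- 0.05

# One simulated experiment: returns one p-value per transcript
one_experiment <- function() {
  # rmultinom returns a transcripts x samples matrix of counts
  prop_a <- rmultinom(n_rep, depth, p_a) / depth
  prop_b <- rmultinom(n_rep, depth, p_b) / depth
  sapply(seq_along(p_a), function(i) {
    # t.test errors on constant data; record NA in that rare case
    tryCatch(t.test(prop_a[i, ], prop_b[i, ])$p.value,
             error = function(e) NA_real_)
  })
}

pvals <- replicate(n_sim, one_experiment())  # transcripts x simulations
# Fraction of simulations in which each transcript falls below alpha
power <- rowMeans(pvals < alpha, na.rm = TRUE)
names(power) <- names(p_a)
print(round(power, 2))
```

All three transcripts truly change here, yet the transcript with the largest shift is detected more often than the other two, so single-transcript calls arise even when the underlying change involves several transcripts. Lowering depth reduces the detection rate for all of them, which is one way to explore the read depth question on simulated or real data.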
Yes, my question was how there can be evidence for a single transcript when, by definition, DTU requires another transcript to be involved. I understand that there is a distinction between the underlying proportions and statistical testing.
So, what would you suggest one do with these types of examples? Should we disregard cases where only 1 transcript is identified, since there does not seem to be enough statistical evidence that 2 transcripts are involved in DTU? Or do we analyze this on a case-by-case basis?
For the genes where only one transcript passes the threshold for DTU, I would assume that, if the gene and transcript are true positives (and not among the false positives allowed by the FDR), then that one transcript has the strongest signal, and the other transcript(s) must participate in DTU but didn't show as strong a signal. I would not disregard these cases.
OK.
Thank you for the detailed answer.