I am following Michael Love's tutorial: "Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification" for DTU analysis.
I am having trouble setting it up for DRIMSeq and am getting this:
> all(rownames(cts) %in% txdf$TXNAME)
[1] FALSE
> all(rownames(cts) == txdf$TXNAME)
[1] NA
I am not sure how to fix that.
I assume the above is what leads to the following issue when I run this command:
d <- dmDSdata(counts=counts, samples=samps)
It comes back with an error saying: all(samples$sample_id %in% colnames(counts)) is not TRUE
I can tell that my sample IDs do not match the column names of counts. For example, the sample IDs are in this format: '780-med', while colnames(counts) gives 'X780.med'.
Would anyone know how to fix these two issues? Thank you in advance!
Hi phoebe1354,
It indeed seems like there is something wrong with both the row names of your count matrix (hence all(rownames(cts) %in% txdf$TXNAME) is FALSE) and the column names of the matrix (hence all(samples$sample_id %in% colnames(counts)) is FALSE). I expect you actually don't have rownames at all, given all(rownames(cts) == txdf$TXNAME) is NA.
How did you obtain the counts matrix? It is most easily obtained by importing it from the rnaseqDTU package by running the command data(salmon_cts). This is how it is done in the R script associated with the rnaseqDTU package (which in turn is the package associated with the swimming downstream paper). The code can be obtained from https://bioconductor.org/packages/release/workflows/vignettes/rnaseqDTU/inst/doc/rnaseqDTU.R (this script runs without errors for me).
Note that the authors commented out the tximport bit from the R script, and now import the data from rnaseqDTU (you can also obtain it with rnaseqDTU::cts). So maybe the data changed location and is now only available from the latter. Btw, if you were using the original tximport code from the paper, what did you fill in as the "/path/to/dir"?
Hope it helps!
Jeroen
Also, this previous discussion could be relevant (depending on how you obtained the data): Error using DRIMSeq with Salmon count data
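For the column-name half of the problem, here is a minimal sketch of one likely cause. It assumes the counts data frame was built with data.frame() (or read in with read.csv()), both of which by default pass names through make.names(), turning '780-med' into 'X780.med'. The objects below are toy stand-ins with made-up values, not the poster's data.

```r
# Toy count matrix and sample table (made-up values, hypothetical IDs).
cts <- matrix(c(10, 5, 7, 3), nrow = 2,
              dimnames = list(c("t1", "t2"), c("780-med", "781-med")))
samps <- data.frame(sample_id = c("780-med", "781-med"),
                    condition = c("a", "b"),
                    stringsAsFactors = FALSE)

# Default behaviour: check.names = TRUE rewrites non-syntactic column names.
counts_mangled <- data.frame(gene_id = c("g1", "g1"),
                             feature_id = rownames(cts),
                             cts)
print(colnames(counts_mangled))

# Option A: keep the original column names when building the data frame.
counts_ok <- data.frame(gene_id = c("g1", "g1"),
                        feature_id = rownames(cts),
                        cts,
                        check.names = FALSE)
ok_a <- all(samps$sample_id %in% colnames(counts_ok))

# Option B: leave counts alone and rename the sample IDs the same way.
samps_b <- samps
samps_b$sample_id <- make.names(samps$sample_id)
ok_b <- all(samps_b$sample_id %in% colnames(counts_mangled))
```

Either option should make the check that dmDSdata() performs on sample_id pass. Option A keeps the IDs as they appear in the sample sheet; option B may be more convenient if later steps expect syntactic names.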
I am bringing my own RNA-seq data, so I don't need the rnaseqDTU package.
I thought maybe my reference transcriptome was wrong so I used Tximeta and it gave me the matching transcriptome. I then mapped it to the correct transcriptome but it is still telling me that
If there are transcripts in cts that don't appear in txdf, then the next step of making the counts data frame and running dmDSdata won't work. Any idea on what the root of the issue could be?
Hi phoebe1354,
Sorry for the late reply, I will try to follow up now (if still relevant). Could you just give me the output of head(rownames(cts)) and head(txdf$TXNAME)?
Jeroen
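While waiting for that output, the following sketch shows how the mismatch could be narrowed down. The names are made up, and the specific cause shown here (extra '|'-separated fields appended to the transcript ID in the quantification output) is only an assumption about what might be going on. Differing version suffixes would be another candidate. The sketch also assumes that txdf was reordered with match() before the == comparison, which would explain why that check returns NA rather than FALSE.

```r
# Hypothetical transcript names, standing in for rownames(cts) and txdf$TXNAME.
cts_names  <- c("tx1.2|gene1.5|", "tx2.1|gene1.5|", "tx3.4|gene2.1|")
txdf_names <- c("tx3.4", "tx1.2", "tx2.1")

# Which names in the counts have no counterpart in the annotation?
missing <- setdiff(cts_names, txdf_names)
print(head(missing))

# Reordering by match() yields NA for unmatched names, so the position-wise
# comparison afterwards evaluates to NA.
reordered <- txdf_names[match(cts_names, txdf_names)]
naive_check <- all(cts_names == reordered)

# Candidate cleanup (assumption): keep only the part before the first '|'.
cleaned <- sub("\\|.*$", "", cts_names)
all_found <- all(cleaned %in% txdf_names)

# With clean names, the reorder-then-compare check works as intended.
reordered_ok <- txdf_names[match(cleaned, txdf_names)]
final_check <- all(cleaned == reordered_ok)
```

If the real names differ only by a version suffix instead, a similar sub() call with the pattern "\\.[0-9]+$" applied to both sides might be the appropriate cleanup. The head() output requested above should show which case applies.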