Question

DEXSeq: samples$sample_id %in% colnames(counts) is not TRUE

phoebe1354 • 0

I am following Michael Love's tutorial: "Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification" for DTU analysis.

I am having trouble setting up the data for DRIMSeq. These checks fail:

> all(rownames(cts) %in% txdf$TXNAME)
[1] FALSE

> all(rownames(cts) == txdf$TXNAME)
[1] NA

I am not sure how to fix that.

I assume the above leads to the following issue when I run this command:

d <- dmDSdata(counts=counts, samples=samps)

It returns the error: all(samples$sample_id %in% colnames(counts)) is not TRUE

I can tell that my sample IDs do not match the column names of counts. For example, the sample IDs are in the format '780-med', while colnames(counts) gives 'X780.med'.

Would anyone know how to fix the two issues? Thank you in advance!
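A note on the column-name mismatch: 'X780.med' is what base R's make.names() produces from '780-med' (a leading digit gets an 'X' prefix and the hyphen becomes a dot). This typically happens when counts is built with data.frame() or read with read.csv() under the default check.names = TRUE. The sketch below reproduces the mismatch on toy data and shows two possible fixes. The objects samps and cts are hypothetical stand-ins, not the poster's actual data, and whether either fix suits the real pipeline is an assumption.

```r
# Hypothetical sample table and count matrix with IDs like those in the question.
samps <- data.frame(sample_id = c("780-med", "781-med"),
                    condition = c("ctrl", "trt"),
                    stringsAsFactors = FALSE)
cts <- matrix(c(10, 5, 7, 3), nrow = 2,
              dimnames = list(c("tx1", "tx2"), samps$sample_id))

# With the default check.names = TRUE, data.frame() passes the column
# names through make.names(), which changes "780-med" to "X780.med".
counts_bad <- data.frame(gene_id = "g1", feature_id = rownames(cts), cts)
colnames(counts_bad)
all(samps$sample_id %in% colnames(counts_bad))  # FALSE: the check that fails

# Possible fix 1: keep the original column names.
counts_ok <- data.frame(gene_id = "g1", feature_id = rownames(cts), cts,
                        check.names = FALSE)
fix1_ok <- all(samps$sample_id %in% colnames(counts_ok))

# Possible fix 2: apply the same sanitizing to the sample IDs instead.
samps_fixed <- samps
samps_fixed$sample_id <- make.names(samps_fixed$sample_id)
fix2_ok <- all(samps_fixed$sample_id %in% colnames(counts_bad))
```

Either way, the goal is only that samps$sample_id and the sample columns of counts use identical strings before calling dmDSdata.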

Tags: DEXSeq, DRIMSeq

Written 3.6 years ago by phoebe1354; updated 3.5 years ago by jeroen.gilis

Hi phoebe1354,

It indeed seems like there is something wrong with both the row names of your count matrix (hence all(rownames(cts) %in% txdf$TXNAME) is FALSE) and the column names of the matrix (hence all(samples$sample_id %in% colnames(counts)) is FALSE). I expect you actually don't have rownames at all, given all(rownames(cts) == txdf$TXNAME) is NA.
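One caveat that may be worth checking before concluding that the row names are missing: in base R, comparing NULL row names against a vector yields a zero-length logical, and all() of that is TRUE, not NA. An NA result more typically means the comparison itself contains NA values, for instance if txdf was already reordered with match() and some transcripts were not found. The sketch below illustrates both cases on toy data; cts and txdf here are hypothetical stand-ins, and the match() scenario is only an assumption about what happened in the original session.

```r
# Hypothetical stand-ins for the objects in the thread.
cts <- matrix(1:6, nrow = 3,
              dimnames = list(c("ENST0001.1", "ENST0002.3", "ENST0003.2"),
                              c("s1", "s2")))
txdf <- data.frame(GENEID = c("G1", "G1"),
                   TXNAME = c("ENST0001.1", "ENST0002.3"),
                   stringsAsFactors = FALSE)

# Case 1: a matrix with no row names at all gives TRUE, not NA.
no_names <- matrix(1:4, nrow = 2)
null_case <- all(rownames(no_names) == c("a", "b"))

# Case 2: row names exist, but one transcript is absent from txdf.
has_names <- !is.null(rownames(cts))
in_check <- all(rownames(cts) %in% txdf$TXNAME)      # FALSE
missing_tx <- setdiff(rownames(cts), txdf$TXNAME)    # which ones are absent

# Reordering txdf with match() leaves NA rows for the absent transcripts,
# so the element-wise comparison contains NA and all() returns NA.
txdf_matched <- txdf[match(rownames(cts), txdf$TXNAME), ]
eq_check <- all(rownames(cts) == txdf_matched$TXNAME)
```

Inspecting missing_tx (or head(rownames(cts)) next to head(txdf$TXNAME)) usually shows quickly whether the names differ in format or whether some transcripts are simply absent from the annotation.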

How did you obtain the counts matrix? It is most easily obtained by loading it from the rnaseqDTU package with the command data(salmon_cts). This is how it is done in the R script associated with the rnaseqDTU package (which in turn is the package associated with the swimming downstream paper). The code can be obtained from here: https://bioconductor.org/packages/release/workflows/vignettes/rnaseqDTU/inst/doc/rnaseqDTU.R. This script runs without errors for me.

Note that the authors commented out the tximport bit from the R script, and now import the data from rnaseqDTU (you can also obtain it with rnaseqDTU::cts). So maybe the data changed location and is now only available from the latter. Btw, if you were using the original tximport code from the paper, what did you fill in as the "/path/to/dir"?

Hope it helps!

Jeroen

Reply by jeroen.gilis, 3.6 years ago

Also, this previous discussion could be relevant (depending on how you obtained the data): Error using DRIMSeq with Salmon count data

Reply by jeroen.gilis, 3.6 years ago

I am using my own RNA-seq data, so I don't need the rnaseqDTU package.

I thought my reference transcriptome might be wrong, so I used tximeta and it gave me the matching transcriptome. I then mapped to the correct transcriptome, but I still get:

> all(rownames(cts) %in% txdf$TXNAME)
[1] FALSE

> all(rownames(cts) == txdf$TXNAME)
[1] NA

If there are transcripts in cts that don't appear in txdf, then the next step of making the counts data frame and running dmDSdata won't work. Any idea on what the root of the issue could be?
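Two causes that often produce this kind of mismatch with Salmon output are version suffixes (for example 'ENST0001.1' in the counts versus 'ENST0001' in the annotation) and GENCODE-style names with extra '|'-separated fields. Neither is confirmed for this thread. The sketch below, on hypothetical toy data, harmonizes the names, drops transcripts that are still absent from txdf, and reorders txdf so both checks pass. The helper strip_id is made up for illustration, and stripping versions is only appropriate when txdf$TXNAME has no version suffixes.

```r
# Hypothetical count matrix with GENCODE-style Salmon names.
cts <- matrix(1:8, nrow = 4, dimnames = list(
  c("ENST0001.1|ENSG0001.5|", "ENST0002.3|ENSG0001.5|",
    "ENST0003.2|ENSG0002.1|", "ENST0004.1|ENSG0003.1|"),
  c("s1", "s2")))

# Hypothetical annotation table without version suffixes.
txdf <- data.frame(GENEID = c("ENSG0001", "ENSG0001", "ENSG0002"),
                   TXNAME = c("ENST0001", "ENST0002", "ENST0003"),
                   stringsAsFactors = FALSE)

# Illustrative helper: drop everything from the first "." or "|" onward.
strip_id <- function(x) sub("[.|].*$", "", x)
rownames(cts) <- strip_id(rownames(cts))

# Keep only transcripts present in the annotation, then align txdf to cts.
keep <- rownames(cts) %in% txdf$TXNAME
cts <- cts[keep, , drop = FALSE]
txdf <- txdf[match(rownames(cts), txdf$TXNAME), ]

in_ok <- all(rownames(cts) %in% txdf$TXNAME)
eq_ok <- all(rownames(cts) == txdf$TXNAME)
```

Dropping rows discards the counts for those transcripts, so it is worth looking at which ones are removed (here 'ENST0004') before relying on this step.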

Reply by phoebe1354, 3.6 years ago

Hi phoebe1354,

Sorry for the late reply. I will try to follow up now (if still relevant). Could you give me the output of head(rownames(cts)) and head(txdf$TXNAME)?

Jeroen

Reply by jeroen.gilis, 3.5 years ago