I'd like to manually add a vector of transcript lengths to my DESeqDataSet (dds) object. I tried setting:
dds@assays$data$avgTxLength=data.frame(row.names=txOrder,"avgTxLength"=tlen[txOrder,])
Here txOrder holds the row names of the dds object, and tlen holds the lengths for those transcripts (in a different order). When I then call fpkm(), I get the error "invalid 'dimnames' given for data frame".
How do I add this vector so I can output fpkm?
Nope, neither worked.
I'm working at the transcript level and I have the length of each transcript (calculated from the GTF with a simple script). So tlen is a data frame with transcript_ids as row names and a single column labeled "Len". txOrder is a character vector of the transcript_id values used in my dds object (after filtering), in the correct order.
Replacing "xxx" or "x" in the above examples with tlen[txOrder,] (which produces a numeric vector) still gives the "dimnames" error when I try fpkm().
Can you make a small reproducible example, so I can take a look? Ideally with just simulated data, e.g. makeExampleDESeqDataSet.
While creating a small example object from a subset of my data, I got the mcols(dds)$basepairs <- x method to work. I'll go back and re-create my original dds object; perhaps I broke it with my earlier attempts. Thanks, this seems to have solved my problem!
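For anyone landing here later, here is a minimal sketch of the approach that worked, using simulated data from makeExampleDESeqDataSet. The tlen table below is made up for illustration (random lengths, shuffled row order); only the names tlen, txOrder and "Len" mirror the description in this thread. The second half shows what I assume is the matrix form that the "avgTxLength" assay expects (same dimensions as the counts), which would explain why a data.frame placed in that slot fails.

```r
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 100, m = 4)

# Hypothetical length table: row names are the feature IDs in shuffled
# order, with one numeric column "Len" (lengths in base pairs).
tlen <- data.frame(
  Len = sample(500:5000, nrow(dds), replace = TRUE),
  row.names = sample(rownames(dds))
)

# Feature IDs in the same order as the rows of dds
txOrder <- rownames(dds)

# Option 1: one length per feature, stored as a numeric vector in mcols
mcols(dds)$basepairs <- tlen[txOrder, "Len"]
fpkm_bp <- fpkm(dds)

# Option 2 (assumption: the assay must be a features x samples matrix):
# repeat the per-feature length across samples and store it as an assay.
dds2 <- dds
assays(dds2)[["avgTxLength"]] <- matrix(
  rep(tlen[txOrder, "Len"], times = ncol(dds2)),
  nrow = nrow(dds2),
  ncol = ncol(dds2),
  dimnames = dimnames(dds2)
)
fpkm_tx <- fpkm(dds2)

head(fpkm_bp)
```

If an earlier attempt left a data.frame inside dds@assays, it is probably safest to rebuild the dds object from the counts before trying either option.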
For the gene lengths in base pairs, should I use the original CDS length or the effective CDS length (CDS length minus read length)? Thanks!
I don't have any preference here. I tend to use methods like Salmon for calculating abundance, rather than FPKM from counts.