Fit a multinomial to RNA-seq count data
rubi ▴ 90
@rubi-6462

Hi,

I have microRNA-seq count data from several experimental conditions. Most microRNA transcripts produce two mature (bioactive) microRNAs, one from the 5p arm and one from the 3p arm. In my RNA-seq data, reads align either to one of the two mature forms (which occupy disjoint locations in the microRNA transcript) or to other regions of the transcript that are not compatible with either mature form. Hence, for each microRNA transcript I have 3 categories. (I could actually define more categories, since reads that are compatible with the mature forms often include modifications.)

What I want to test is whether the proportions among the 3 categories mentioned above are affected by my experimental conditions (i.e., factors). Is there anything in limma, or other RNA-seq count based packages, that can be used for this question?

I think that a GLM with a Dirichlet-multinomial distribution, as implemented in the MGLM package, might be relevant, but I haven't used it and would assume that it does not model the variance as well as voom does.

Thanks a lot,

rubi

voom limma glm multinomial regression • 986 views
@gordon-smyth
WEHI, Melbourne, Australia

There is no need to fit a multinomial distribution because the usual Poisson models for counts are already equivalent to multinomial models when testing for interactions. This is because the interaction in effect conditions on the marginal totals.
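As a rough illustration of that equivalence (a sketch only, using made-up counts for a single microRNA and plain Poisson variation with no overdispersion, so it is not what limma or edgeR do internally): the residual deviance of a Poisson log-linear model without the category-by-condition interaction matches the multinomial likelihood-ratio (G) statistic for equal proportions across conditions. All object names and numbers here are hypothetical.

```r
# Hypothetical counts for one microRNA: rows = read category, columns = condition
counts <- matrix(c(120, 60, 20,
                    90, 95, 15),
                 nrow = 3,
                 dimnames = list(category  = c("5p", "3p", "other"),
                                 condition = c("A", "B")))

# Long format (one row per category/condition cell) for glm()
d <- as.data.frame(as.table(counts), responseName = "n")

# Poisson log-linear model WITHOUT the category:condition interaction
additive <- glm(n ~ category + condition, family = poisson, data = d)

# Multinomial likelihood-ratio (G) statistic for equal proportions across conditions
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
G <- 2 * sum(counts * log(counts / expected))

# The residual deviance of the additive Poisson model and G should agree
isTRUE(all.equal(deviance(additive), G, tolerance = 1e-6))

# P-value for the interaction on (3 - 1) * (2 - 1) = 2 degrees of freedom
pchisq(G, df = df.residual(additive), lower.tail = FALSE)
```

The fitted values of the additive Poisson model reproduce the row and column totals, which is why testing the interaction amounts to a test on the proportions.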

Anyway, the diffSplice() function in the limma package can test what you want. It uses the voom weights to do a differential exon usage analysis.

Suppose that 'Counts' contains your read counts, with columns for samples and rows for microRNAs, with the 3p, 5p and other categories kept as separate rows for each microRNA. Then

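library(limma)
# design: model matrix for the experimental conditions
v <- voom(Counts, design)      # log-CPM values with precision weights
fit <- lmFit(v, design)
dx <- diffSplice(fit, geneid)  # compares each row with the other rows of the same microRNA
topSplice(dx, coef=2)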

where geneid is a vector containing the microRNA IDs (with the 3p, 5p and 'other' rows from the same microRNA sharing the same ID). This will give a top table of microRNAs for which the 3p/5p/other proportions differ between conditions. You can set coef to any coefficient in the linear model.
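For concreteness, here is one way the inputs might be laid out. This is only a sketch with invented microRNA names, sample names and counts; the two-group design is an assumption, and with it coef=2 corresponds to the treatment-versus-control effect.

```r
# Hypothetical layout: 2 microRNAs x 3 read categories, 4 samples
mirna    <- c("mirA", "mirB")
category <- c("5p", "3p", "other")

# One entry per row of Counts: rows from the same microRNA share the same ID
geneid <- rep(mirna, each = length(category))
exonid <- rep(category, times = length(mirna))

# Made-up read counts, one row per microRNA/category, one column per sample
Counts <- matrix(c(120, 110,  60,  55,
                    40,  45, 100,  95,
                    10,  12,   9,  11,
                   200, 210, 190, 205,
                    80,  75,  85,  78,
                     5,   7,   6,   4),
                 ncol = 4, byrow = TRUE,
                 dimnames = list(paste(geneid, exonid, sep = "_"),
                                 paste0("S", 1:4)))

# Assumed two-group experiment; "ctrl" is the reference level
condition <- factor(c("ctrl", "ctrl", "trt", "trt"))
design <- model.matrix(~ condition)

# The second column is the treatment effect, which is what coef = 2 selects
colnames(design)
```

The exonid vector is optional: diffSplice() accepts it through its exonid argument, which labels the rows (5p, 3p, other) in the output.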

There is an analogous function diffSpliceDGE() in the edgeR package that does a similar analysis.

@ryan-c-thompson-5618
Scripps Research, La Jolla, CA

I think you're probably best off analyzing this data using a package for differential exon usage such as DEXSeq (or limma::diffSplice or edgeR::diffSpliceDGE, as Gordon mentioned). Just treat the 3p mature, 5p mature, and "other" categories as three exons.

Robert Castelo ★ 2.7k
@rcastelo
Barcelona/Universitat Pompeu Fabra

Hi,

Take a look at the DRIMSeq Bioconductor package and its corresponding article. It uses a Dirichlet-multinomial model to detect isoform changes, within an empirical Bayes framework analogous to that of edgeR, which moderates dispersion estimates when the sample size is limited. It sounds like it could be used to address your question.

cheers,

robert.