Question: mogsa on RNA-seq and ATAC-seq data
meeta.mistry (United States) wrote, 5 days ago:


I am trying to use mogsa to analyze ATAC-seq and RNA-seq data from the same samples. In the vignette example, the data matrices are multiple microarray datasets. If I use ATAC-seq data, what should I use as input? I was thinking of a count matrix for a set of consensus peaks across all samples. However, in order to map it to the RNA-seq data, presumably we need nearest-gene annotations for each peak? In that case we will have multiple peaks mapping to a single gene. Is this going to be problematic for mogsa? Is this the correct way to handle this type of data?

Any help is much appreciated.



Aedin Culhane (United States) wrote, 4 days ago:

Hi Meeta

There are 3 stages to mogsa: 1) the datasets are projected into the same space, 2) the gene set scores of the genes/proteins are calculated for each principal component, and 3) the overall score of each gene set per sample is extracted for the selected components.
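
To make the three stages concrete, here is a rough NumPy sketch. It is not mogsa's code or API: the function name `gene_set_scores`, the choice to weight each dataset by its first singular value, and the toy data are all assumptions made for illustration. The annotation inputs are features x gene sets matrices, with an entry > 0 meaning the feature belongs to that gene set.

```python
import numpy as np

def gene_set_scores(blocks, annotations, keep):
    """Illustrative three-stage scoring (an assumption, not mogsa itself).

    blocks:      list of (features x samples) arrays with matched sample columns
    annotations: list of (features x gene sets) arrays, one per block
    keep:        list of component indices to include in the final score
    Returns a (gene sets x samples) array.
    """
    scaled = []
    for x in blocks:
        x = x - x.mean(axis=1, keepdims=True)        # center each feature across samples
        first_sv = np.linalg.svd(x, compute_uv=False)[0]
        scaled.append(x / first_sv)                  # stop one block from dominating
    stacked = np.vstack(scaled)                      # rows differ per block, columns match

    # Stage 1: project all datasets into one shared component space.
    u, d, vt = np.linalg.svd(stacked, full_matrices=False)

    # Stage 2: gene set score per component (gene sets x components).
    g = np.vstack(annotations)
    per_component = g.T @ (u * d)

    # Stage 3: combine only the selected components into per-sample scores.
    return per_component[:, keep] @ vt[keep, :]

# Toy data: 6 genes and 9 peaks measured on the same 4 samples.
rng = np.random.default_rng(0)
rna = rng.normal(size=(6, 4))
atac = rng.normal(size=(9, 4))
g_rna = np.zeros((6, 2))
g_rna[:3, 0] = 1                                     # genes 0-2 in gene set 0
g_rna[3:, 1] = 1                                     # genes 3-5 in gene set 1
g_atac = np.zeros((9, 2))                            # no ATAC annotation available
scores = gene_set_scores([rna, atac], [g_rna, g_atac], keep=[0, 1])
```

Even with the all-zero ATAC annotation, the ATAC block still shapes the components found in stage 1, so it influences the scores indirectly.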

Both datasets need to be in the form of matrices with matched samples. However, the features (rows) do not need to match. The set of features in the new space will then show the covariance/association between ATAC-seq and RNA-seq for the samples.

For part 2), the gene set score would probably be generated using only the RNA annotation. The gene set annotation is a binary or weight matrix of genes x gene sets, where 1 (or any score > 0) means a gene is in that gene set. If you wish to generate gene set annotation for the ATAC-seq, I would include the maximum number of possible associations of chromatin regions to gene sets. If you have no annotation, create an empty (all-zero) matrix with the ATAC features in the rows and the gene sets in the columns. Then only the RNA-seq will be used to score the gene sets; however, the weight of each gene in the space is determined by both datasets.

For part 3), finally consider whether any of the PCs are associated with batch effects, and how many PCs provide useful data. Include the PCs you wish to keep for the final score.

Hope this helps. Happy to generate a vignette if you point me to example data.
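
The annotation matrices described above could be laid out as in this minimal, plain-Python sketch. The helper `annotation_matrix`, the gene and peak names, and the peak-to-gene links are hypothetical; in practice the matrices would be built in R and passed to mogsa. It shows that several peaks linked to the same gene simply become separate rows, and that an unannotated dataset gets an all-zero matrix.

```python
def annotation_matrix(features, gene_sets, feature_to_genes):
    """Build a binary features x gene sets matrix as a list of rows.

    An entry is 1 when any gene linked to the feature is in the gene set.
    Columns follow the sorted gene set names, which are returned as well.
    """
    set_names = sorted(gene_sets)
    matrix = []
    for feature in features:
        linked = feature_to_genes.get(feature, ())
        row = [
            1 if any(gene in gene_sets[name] for gene in linked) else 0
            for name in set_names
        ]
        matrix.append(row)
    return set_names, matrix

# Hypothetical gene sets, RNA-seq genes and ATAC-seq consensus peaks.
gene_sets = {"setA": {"GENE1", "GENE2"}, "setB": {"GENE3"}}
genes = ["GENE1", "GENE2", "GENE3"]
peaks = ["peak1", "peak2", "peak3", "peak4"]

# peak1 and peak2 both map to GENE1; peak3 keeps every plausible
# association; peak4 has no nearby gene.
peak_to_genes = {
    "peak1": ["GENE1"],
    "peak2": ["GENE1"],
    "peak3": ["GENE2", "GENE3"],
}

names, rna_annot = annotation_matrix(genes, gene_sets, {g: [g] for g in genes})
_, atac_annot = annotation_matrix(peaks, gene_sets, peak_to_genes)

# Fallback when no ATAC annotation exists: an all-zero matrix.
_, atac_empty = annotation_matrix(peaks, gene_sets, {})
```

Weights other than 0/1 would follow the same layout, with any value > 0 marking membership.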


