Question: mogsa on RNA-seq and ATAC-seq data
meeta.mistry (United States) wrote 4 weeks ago:

Hi,

I am trying to use mogsa to analyze ATAC-seq and RNA-seq data on the same samples. In the vignette example, the data matrices are multiple microarray datasets. If I use ATAC-seq data, what should I use as input? I was thinking of a count matrix for a set of consensus peaks across all samples. However, in order to map it to the RNA-seq data, presumably we need nearest-gene annotations for each peak? In that case we will have multiple peaks mapping to a single gene. Is this going to be problematic for mogsa? Is this the correct way to handle this type of data?

Any help is much appreciated.

Thanks,

Meeta

Aedin Culhane (United States) wrote 4 weeks ago:

Hi Meeta

There are 3 stages to mogsa: 1) the datasets are projected into the same space, 2) the gene set scores of the genes/proteins are calculated for each principal component, 3) the overall score of each gene set per sample is extracted for the selected components.
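
To make the three stages concrete, here is a rough NumPy sketch. It is not mogsa's code (mogsa is an R package) and does not reproduce its exact weighting or statistics. The function name `gene_set_scores`, the scaling of each block by its first singular value, and the random toy data are all assumptions made for illustration.

```python
import numpy as np

def gene_set_scores(blocks, annotations, n_components):
    """Simplified illustration of the three stages (not mogsa's implementation).

    blocks:       list of (features x samples) arrays with matched sample columns
    annotations:  list of (features x gene sets) 0/1 arrays, one per block
    n_components: number of leading components to keep
    """
    # Stage 1: centre each feature, scale each block by its first singular
    # value (an assumed weighting so that no block dominates), then stack
    # the blocks and decompose them jointly.
    scaled = []
    for block in blocks:
        centred = block - block.mean(axis=1, keepdims=True)
        first_sv = np.linalg.svd(centred, compute_uv=False)[0]
        scaled.append(centred / first_sv)
    stacked = np.vstack(scaled)
    u, d, vt = np.linalg.svd(stacked, full_matrices=False)

    # Stage 2: per-component gene set scores. Each component's rank-one
    # reconstruction is summed over the features annotated to each gene set.
    annot = np.vstack(annotations)                    # features x gene sets
    per_component = [
        annot.T @ (d[k] * np.outer(u[:, k], vt[k]))   # gene sets x samples
        for k in range(len(d))
    ]

    # Stage 3: add up the scores from the selected components only.
    return sum(per_component[:n_components])

# Toy data: 5 genes and 6 peaks measured on the same 4 samples, 3 gene sets.
rng = np.random.default_rng(0)
rna = rng.normal(size=(5, 4))
atac = rng.normal(size=(6, 4))
rna_annot = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]])
atac_annot = np.zeros((6, 3))                         # no ATAC annotation
scores = gene_set_scores([rna, atac], [rna_annot, atac_annot], n_components=2)
```

In this sketch the ATAC block has no annotation, so only RNA features are summed into the scores, but the components themselves are still estimated from both blocks.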

Both datasets need to be in the form of matrices with matched samples. However, the features (rows) do not need to match. The set of features in the new space will then show the covariance/association between ATAC-seq and RNA-seq for the samples.

For part 2), the gene set score would probably be generated using only the RNA annotation. The gene set annotation is a binary or weight matrix of genes x gene sets, where 1 (or any score > 0) means a gene is in that gene set. If you wish to generate gene set annotation for the ATAC-seq data, I would include the maximum number of possible associations of chromatin regions to gene sets. If you have no annotation, create an empty (all-zero) matrix with the ATAC features in the rows and the gene sets in the columns. Then only the RNA-seq data will be used to score the gene sets; however, the weight of each gene in the space is determined by both datasets.

For part 3), finally consider whether any of the PCs are associated with batch effects, and how many PCs provide useful data. Include the PCs you wish to keep for the final score.

Hope this helps, happy to generate a vignette if you point me to example data.
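
As a small illustration of the annotation matrices described above, the following plain-Python sketch builds a binary genes x gene sets matrix for the RNA features and an all-zero matrix for unannotated ATAC peaks. The helper `annotation_matrix`, the gene set names and the feature names are hypothetical, and this is not mogsa's own annotation helper.

```python
def annotation_matrix(features, gene_sets):
    """Return a features x gene sets 0/1 matrix as a list of rows.

    Columns follow the alphabetical order of the gene set names.
    """
    set_names = sorted(gene_sets)
    return [[1 if f in gene_sets[s] else 0 for s in set_names] for f in features]

# Hypothetical gene sets and feature names, for illustration only.
gene_sets = {"glycolysis": {"HK1", "PFKM"}, "apoptosis": {"CASP3", "BAX"}}
rna_features = ["HK1", "PFKM", "CASP3", "TP53"]
atac_features = ["peak_1", "peak_2", "peak_3"]

# RNA: 1 where the gene belongs to the gene set, 0 otherwise.
rna_annot = annotation_matrix(rna_features, gene_sets)

# ATAC without annotation: same gene set columns, every entry 0.
atac_annot = annotation_matrix(atac_features, {name: set() for name in gene_sets})
```

Both matrices share the same gene set columns, which is what lets the two blocks be scored together.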

Best

Aedin

mengchen18 (Germany) wrote 28 days ago:

"I was thinking a count matrix for a set of consensus peaks across all samples"

You can use the count data. But to avoid your results being driven by a few rows (chromatin regions) in the ATAC-seq data, it's better to normalize the matrix in advance, for example with a log transform or as in (non-symmetric) correspondence analysis.
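
A minimal sketch of the log-transform option, in plain Python. The pseudocount of 1, the base-2 logarithm and the toy counts are assumptions chosen for illustration; they are not a recommendation from the mogsa authors.

```python
import math

def log_transform(counts, pseudocount=1.0):
    """Apply log2(count + pseudocount) to every entry of a peaks x samples matrix."""
    return [[math.log2(c + pseudocount) for c in row] for row in counts]

# Toy peaks x samples counts: the second peak has far larger counts than the first.
counts = [
    [0, 3, 7],
    [1023, 4095, 16383],
]
logged = log_transform(counts)

# Spread (max - min) of each row before and after the transform.
raw_spread = [max(row) - min(row) for row in counts]
log_spread = [max(row) - min(row) for row in logged]
```

On the raw scale the second peak varies over thousands of counts while the first varies over single digits, so it would dominate any variance-based decomposition. After the transform the two rows vary on a similar scale.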

"However, in order to map it to the RNA-seq data, presumably we need nearest-gene annotations for each peak? In that case we will have multiple peaks mapping to a single gene. Is this going to be problematic for mogsa?"

As Aedin said, there is nothing wrong with this approach. You could also consider other associations, e.g. TF binding regions of genes or any other mechanism related to gene expression.
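
If you do annotate peaks by nearest gene, several peaks mapping to one gene simply become several rows that carry the same gene set membership. The plain-Python sketch below shows this; the helper `peak_annotation`, the peak-to-gene mapping and the gene sets are hypothetical and not part of mogsa.

```python
def peak_annotation(peak_to_gene, gene_sets):
    """Build a peaks x gene sets 0/1 matrix from a peak -> nearest gene mapping.

    Peaks with no assigned gene (None) get an all-zero row.
    """
    set_names = sorted(gene_sets)
    peaks = sorted(peak_to_gene)
    matrix = [
        [1 if peak_to_gene[p] in gene_sets[s] else 0 for s in set_names]
        for p in peaks
    ]
    return peaks, set_names, matrix

# Hypothetical mapping: two peaks share the nearest gene HK1, one peak has none.
peak_to_gene = {"peak_1": "HK1", "peak_2": "HK1", "peak_3": "BAX", "peak_4": None}
gene_sets = {"glycolysis": {"HK1", "PFKM"}, "apoptosis": {"CASP3", "BAX"}}

peaks, set_names, atac_annot = peak_annotation(peak_to_gene, gene_sets)
```

Other associations, such as TF binding regions, could be encoded the same way by swapping in a different peak-to-gene mapping.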
