GSVA on separate healthy and disease datasets
Lucy ▴ 60
@lucy-17014
Last seen 3 months ago
United Kingdom

Hi,

We have performed scRNA-seq on blood and tissue from a cohort of disease patients. Unfortunately, our scRNA-seq dataset doesn't include any healthy controls. We are interested in pathways that are upregulated in both the blood and tissue of disease patients relative to blood from healthy controls, i.e. what are the common disease-associated pathways?

Is it possible to use GSVA to address this question or are the results likely to be influenced by batch effects?

Best wishes,
Lucy

GSVA • 867 views
@james-w-macdonald-5106
Last seen 8 hours ago
United States

Without controls I don't see how you can identify any changes. Even if you knew some pathways, how would you know if the genes in those pathways were changing as compared to the controls that you don't have?

Anyway, this support site is meant to help people with specific questions about Bioconductor tools, not general analysis questions. In future, please ask general analysis questions over on biostars.org.


Thank you. This was a specific question relating to the GSVA Bioconductor package.

Robert Castelo ★ 3.4k
@rcastelo
Last seen 2 days ago
Barcelona/Universitat Pompeu Fabra

Hi Lucy,

I'm afraid the answer is negative, but others in this forum with more experience than me with such data might have different views. A way to think about it is to forget about pathways and think in terms of genes. Can we reliably identify upregulated genes in samples from disease patients relative to healthy controls when the control data belong to an external study (because your own study does not have such controls)?

I would say that the answer is no because the technical variability between your disease data and the external control data is confounded with the comparison you want to do, and therefore, the variability in which you are interested cannot be disentangled from the technical variability between the two studies.
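To make the confounding concrete, here is a minimal toy sketch in plain Python. It is not GSVA and not an analysis of real data: the baseline, effect sizes and sample size are invented for illustration, and `simulate` and `observed_difference` are hypothetical helper names. It simulates one gene where all disease samples come from one study and all controls from another, and shows that a real biological effect and a pure study offset give the same observed difference.

```python
import random
from statistics import mean


def simulate(bio_effect, batch_effect, n=50, seed=0):
    """Return (disease, control) log-expression values for a single gene.

    Disease samples come from our own study and controls from an external
    study, so every control carries the study offset and no disease sample does.
    """
    rng = random.Random(seed)
    noise_disease = [rng.gauss(0, 1) for _ in range(n)]
    noise_control = [rng.gauss(0, 1) for _ in range(n)]
    baseline = 5.0  # arbitrary baseline log-expression (assumption)
    disease = [baseline + bio_effect + e for e in noise_disease]
    control = [baseline + batch_effect + e for e in noise_control]
    return disease, control


def observed_difference(bio_effect, batch_effect):
    """Difference in group means, which is all the data can tell us."""
    disease, control = simulate(bio_effect, batch_effect)
    return mean(disease) - mean(control)


# Scenario A: genuine up-regulation of 2 log units, no study offset.
diff_a = observed_difference(bio_effect=2.0, batch_effect=0.0)
# Scenario B: no biological effect, but the external study reads 2 units lower.
diff_b = observed_difference(bio_effect=0.0, batch_effect=-2.0)

print(f"scenario A (biology only): {diff_a:.3f}")
print(f"scenario B (study offset only): {diff_b:.3f}")
```

Both scenarios give the same observed difference, so nothing in the data can tell them apart when disease status and study of origin coincide.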

cheers,

robert.


Hi Robert Castelo,

Thank you for the response. I was wondering whether I could perform GSVA separately on the three datasets (healthy blood, disease blood, disease tissue) and look at relative enrichment of gene sets across cell types e.g. perhaps IFN-stimulated genes are upregulated in myeloid cells in disease blood and tissue, but not in healthy blood.

Best wishes,
Lucy


If I understand you correctly, you are not then proposing to compare GSVA scores calculated from independent datasets, but to compare GSVA scores within each dataset, to obtain a list of differentially expressed gene sets per dataset, and then compare those lists of gene sets.
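As a rough sketch of that workflow, assume each dataset has already been tested on its own (for example, myeloid cells versus other cell types) and has produced a table of gene-set results. The gene-set names, directions and adjusted p-values below are made up purely for illustration, and `significant_up` is a hypothetical helper, not part of GSVA.

```python
# Hypothetical per-dataset results: gene set -> (direction, adjusted p-value).
# Every value here is invented for illustration only.
results = {
    "healthy_blood": {
        "IFN_RESPONSE": ("up", 0.40),
        "GLYCOLYSIS": ("up", 0.01),
    },
    "disease_blood": {
        "IFN_RESPONSE": ("up", 0.001),
        "GLYCOLYSIS": ("up", 0.02),
    },
    "disease_tissue": {
        "IFN_RESPONSE": ("up", 0.003),
        "GLYCOLYSIS": ("down", 0.30),
    },
}


def significant_up(table, alpha=0.05):
    """Names of gene sets called up-regulated at the given threshold."""
    return {
        name
        for name, (direction, padj) in table.items()
        if direction == "up" and padj < alpha
    }


# One list of up-regulated gene sets per dataset, each derived within that dataset.
up = {dataset: significant_up(table) for dataset, table in results.items()}

# Compare the lists, never the raw scores across datasets.
shared_in_disease = up["disease_blood"] & up["disease_tissue"]
disease_only = shared_in_disease - up["healthy_blood"]

print(sorted(disease_only))
```

Only the lists are compared here; no score from one dataset is ever set against a score from another.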

I'd say that comparing lists of differentially expressed gene sets from different datasets is as safe as comparing lists of differentially expressed genes from different datasets, provided you take into account that the statistical power of detection may differ across datasets. For instance, if the dataset of healthy individuals is less powered than the other datasets, you may fail to find in that dataset that IFN-stimulated genes are upregulated in myeloid cells, not because of a genuinely distinct biological mechanism, but because you lack sufficient statistical power to detect it.
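The effect of unequal power can be illustrated with a small simulation. This is a toy sketch under stated assumptions: one gene set score per sample, the same true effect (0.8 standard deviations) in every dataset, and a normal approximation to the two-sample test, which is somewhat anti-conservative for very small groups. `detection_rate` is a hypothetical helper name.

```python
import random
from statistics import mean, variance


def detection_rate(n_per_group, effect=0.8, n_sim=2000, seed=1):
    """Fraction of simulated studies in which a true effect is detected.

    Uses a Welch-type statistic compared against 1.96 (normal approximation).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        group_a = [rng.gauss(effect, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        std_err = ((variance(group_a) + variance(group_b)) / n_per_group) ** 0.5
        stat = (mean(group_a) - mean(group_b)) / std_err
        if abs(stat) > 1.96:
            hits += 1
    return hits / n_sim


# Same true effect in both settings; only the number of samples differs.
rate_small = detection_rate(n_per_group=5)
rate_large = detection_rate(n_per_group=30)

print(f"n = 5 per group:  detected in {rate_small:.0%} of simulations")
print(f"n = 30 per group: detected in {rate_large:.0%} of simulations")
```

The same effect is detected far less often in the smaller study, so a gene set missing from one dataset's list does not by itself show that the biology differs.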

robert.


Up-regulated is a comparative measure, meaning you have compared one thing to another, which is inadvisable in your study because you don't have any controls, and any controls you get from somewhere else will be completely confounded by technical differences.

Perhaps you are thinking that RNA-Seq values alone tell you something about the sample? That's not true. If you find that gene X has, on average, 150 counts per sample in healthy blood, that measure itself isn't meaningful. It's only meaningful if you also find that the counts are much lower (say on average 50 counts per sample) in diseased blood, which indicates that the disease is reducing the expression of the gene in the diseased blood.

But if the healthy blood samples come from lab Z, and were run at a different time, by a different technician, using different reagents, and possibly a different sequencer, then it's impossible to say if any differences are due to all those technical differences I just laid out or the biological differences you care to detect.

GSVA scores each gene relative to its expression distribution across the samples supplied in a single run, so scores computed separately for each dataset are not directly comparable with one another. You have to compare the groups within a dataset first.
