Question: How to create the colData for a complex experiment
Lluís Revilla Sancho (European Union) wrote, 21 months ago:

I have a dataset where for the same patient and time we have extracted different samples from different locations.

What would be the best way to encode this in MAE?

Patient  Time  Region  State of the region
A        0     A       Healthy
A        0     B       Injured
A        0     C       Injured
A        24    A       Healthy
A        24    B       Injured

The sampleMap provides a "many-to-one" mapping, but when the phenotypes belong to each sample, how should I store them? I have several variables related to the patient (sex, age of diagnosis, disease, C-reactive protein, treatment followed, antibiotics, ...) and some related mainly to the sample (date of extraction, region extracted, state of that region, Endoscopic Score of the region, type of sample, ...)

The only way I can think of is to use a combination of Patient, Time and Location as the ID of the samples, something like paste(Patient, Time, Location, sep = "_"), but that would duplicate the patient information in order to store the sample information correctly.
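A minimal sketch of that idea in base R, using the five rows from the table above. The `patients` table and its `Sex` value are hypothetical placeholders, added only to show how patient-level columns end up repeated on every sample row.

```r
# Sample-level table from the question (one row per biopsy).
samples <- data.frame(
  Patient = c("A", "A", "A", "A", "A"),
  Time    = c(0, 0, 0, 24, 24),
  Region  = c("A", "B", "C", "A", "B"),
  State   = c("Healthy", "Injured", "Injured", "Healthy", "Injured"),
  stringsAsFactors = FALSE
)

# Hypothetical patient-level table (placeholder value, not real data).
patients <- data.frame(
  Patient = "A",
  Sex     = "female",
  stringsAsFactors = FALSE
)

# sep = "_" joins the vectors element-wise, giving one ID per row;
# collapse = "_" would instead glue all rows into a single string.
samples$ID <- paste(samples$Patient, samples$Time, samples$Region, sep = "_")

# Joining on Patient repeats the patient-level columns on every sample row.
col_data <- merge(samples, patients, by = "Patient")
rownames(col_data) <- col_data$ID
```

The resulting `col_data` has one row per sample, with the patient's `Sex` copied onto each of them, which is the duplication I would like to avoid.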

Is there any better solution?

Answer: How to create the colData for a complex experiment
Levi Waldron (CUNY Graduate School of Public Health and Health Policy, New York, NY) wrote, 21 months ago:

Hi Lluís - there are different ways you could do this, but if you think of the five rows you showed as five different "biological units", it might make sense to keep them separate in the colData as shown. The main difference from the MAE perspective will be how you interact with the object when subsetting by column and reshaping through wideFormat() etc. If you collapse rows like you suggested, then MAE management functions like mergeReplicates() and duplicated() would treat those five measurements as duplicates. Why do you want to collapse those rows? I would be more inclined to keep them separate as you showed, but maybe I don't understand your motivation for having one row per patient in the colData.

Hi Levi, I didn't explain myself well, sorry.

I have some samples linked to a location (biopsies from 5 regions) and some that aren't (stools) [or rather, they are always from the same region]. I have two assays for the biopsies (RNA-seq and 16S-seq) and one assay for the stools (16S-seq).

My main goal is to understand the relationship between assays. However, the regions of the biopsies differ in how they behave, so the relationship between assays could be different depending on the region of the biopsies. At the same time, it is interesting to see whether there is a relationship between biopsies and stools that is common across patients (RNA-seq to 16S-seq, 16S-seq to 16S-seq or between all the assays). I was considering having just one row per patient in order to be able to see these common relationships between assays.

I hope I have explained myself a bit better. Many thanks

Reply written 21 months ago by Lluís Revilla Sancho
Answer: How to create the colData for a complex experiment
Levi Waldron (CUNY Graduate School of Public Health and Health Policy, New York, NY) wrote, 21 months ago:

I think I understand better now. I guess the biopsies would be labeled by intestinal location, so they are not exchangeable (for example location might be labeled stool, rectum, sigmoid, descending, transverse, ascending, caecum). So I see several potential ways to set this up:

1. separate rows in the MultiAssayExperiment colData for each site, with a column specifying location.

2. one row per patient in the colData, with sampling location as per-experiment colData variable.

3. one row per patient in the colData, with each biopsy site as a different ExperimentList element (like a different assay, with assay names reflecting body sites).
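As a rough illustration of option 3, here is a minimal sketch with toy data. Everything in it is hypothetical (patient IDs, column names, the experiment names `RNAseq_rectum`, `Micro16S_rectum` and `Micro16S_stool`, and the random values); it only shows the shape of the object, not a recommended naming scheme.

```r
library(MultiAssayExperiment)

# Hypothetical toy data: 2 patients, 3 features per experiment.
set.seed(1)
toy <- function(cols) {
  matrix(rnorm(3 * length(cols)), nrow = 3,
         dimnames = list(paste0("feature", 1:3), cols))
}

# Option 3: one ExperimentList element per assay-by-site combination.
experiments <- list(
  RNAseq_rectum   = toy(c("P1_rna_rectum", "P2_rna_rectum")),
  Micro16S_rectum = toy(c("P1_16s_rectum", "P2_16s_rectum")),
  Micro16S_stool  = toy(c("P1_16s_stool", "P2_16s_stool"))
)

# One row per patient; only patient-level variables live in the colData.
col_data <- S4Vectors::DataFrame(sex = c("F", "M"),
                                 row.names = c("P1", "P2"))

# The sampleMap links each experiment column (colname) to a patient (primary).
sample_map <- S4Vectors::DataFrame(
  assay   = factor(rep(names(experiments), each = 2)),
  primary = rep(c("P1", "P2"), times = 3),
  colname = unlist(lapply(experiments, colnames), use.names = FALSE)
)

mae <- MultiAssayExperiment(experiments, col_data, sample_map)
```

With this layout the body site is carried by the experiment name, so no patient-level information is repeated. Multiple time points per patient would show up as several columns of one experiment mapping to the same `primary`.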

I lean towards option 3, which I suspect will allow the simplest syntax for calculating simple correlations. For setting up regressions like 16S ~ RNA-seq + location, option 1 might be simplest. For simple correlations, you might use the assays() extractor to give a list of matrices to calculate correlations on with cor(). For regressions, I imagine using wideFormat() to integrate the assays and colData column for location into a single DataFrame. But if I were the data analyst here, I would probably start with 3 and see if something about it ends up being annoying, and if so think about doing it differently :).
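A sketch of what that could look like under option 3, again with hypothetical toy data (four made-up patients, random values, invented experiment and column names). The helper `by_patient()` is not part of MultiAssayExperiment; it is an illustrative function that uses the sampleMap to line up the columns of each matrix by patient before calling cor().

```r
library(MultiAssayExperiment)

# Hypothetical toy object: 4 patients, 3 features, one column per patient.
set.seed(1)
ids <- paste0("P", 1:4)
toy <- function(suffix) {
  matrix(rnorm(12), nrow = 3,
         dimnames = list(paste0("feature", 1:3), paste(ids, suffix, sep = "_")))
}
experiments <- list(
  RNAseq_rectum   = toy("rna_rectum"),
  Micro16S_rectum = toy("16s_rectum"),
  Micro16S_stool  = toy("16s_stool")
)
col_data <- S4Vectors::DataFrame(sex = c("F", "M", "F", "M"), row.names = ids)
sample_map <- S4Vectors::DataFrame(
  assay   = factor(rep(names(experiments), each = 4)),
  primary = rep(ids, times = 3),
  colname = unlist(lapply(experiments, colnames), use.names = FALSE)
)
mae <- MultiAssayExperiment(experiments, col_data, sample_map)

# Illustrative helper: rename columns to patient IDs and order them
# like the colData rows, so matrices from different experiments line up.
by_patient <- function(mae, name) {
  m  <- assays(mae)[[name]]
  sm <- sampleMap(mae)
  sm <- sm[sm$assay == name, ]
  colnames(m) <- sm$primary[match(colnames(m), sm$colname)]
  m[, rownames(colData(mae)), drop = FALSE]
}

# Feature-by-feature correlations across patients between two experiments.
rna   <- by_patient(mae, "RNAseq_rectum")
stool <- by_patient(mae, "Micro16S_stool")
cors  <- cor(t(rna), t(stool))

# One row per patient, with assay features and the chosen colData column
# side by side, as a starting point for regressions.
wide <- wideFormat(mae, colDataCols = "sex")
```

The helper assumes exactly one column per patient in each experiment; with several time points per patient the alignment step would need to account for those replicates first.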