Question: How to create the colData for a complex experiment
Lluís R (European Union) wrote, 11 weeks ago:

I have a dataset where for the same patient and time we have extracted different samples from different locations.

What would be the best way to encode this in MAE?

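| Patient | Time | Region | State of the region |
|---------|------|--------|---------------------|
| A       | 0    | A      | Healthy             |
| A       | 0    | B      | Injured             |
| A       | 0    | C      | Injured             |
| A       | 24   | A      | Healthy             |
| A       | 24   | B      | Injured             |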

The sampleMap provides a "many-to-one" mapping, but when the phenotypes belong to each sample, how should I store them? I have several variables related to the patient (sex, age at diagnosis, disease, C-reactive protein, treatment followed, antibiotics, ...) and some related mainly to the sample (date of extraction, region extracted, state of that region, endoscopic score of the region, type of sample, ...).

The only way I can think of is to use as ID a combination of Patient, Time and Location for each sample, something like paste(Patient, Time, Location, sep = "_"), but that would duplicate the patient-level information in order to store the sample-level information correctly.
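To make the trade-off concrete, here is a minimal base R sketch of the combined-ID idea, using the five rows from the table in the question. The `sex` column and the data frame names are hypothetical, added only to show how a patient-level variable ends up repeated on every sample row. Note that element-wise IDs need `sep = "_"`; `collapse = "_"` would glue all samples into one string.

```r
# Hypothetical sample sheet built from the five rows in the question.
samples <- data.frame(
  Patient = c("A", "A", "A", "A", "A"),
  Time    = c(0, 0, 0, 24, 24),
  Region  = c("A", "B", "C", "A", "B"),
  State   = c("Healthy", "Injured", "Injured", "Healthy", "Injured"),
  sex     = "F",  # assumed patient-level variable, recycled onto every row
  stringsAsFactors = FALSE
)

# sep = "_" builds one ID per row.
samples$ID <- paste(samples$Patient, samples$Time, samples$Region, sep = "_")

# collapse = "_" instead returns a single string for the whole vector.
collapsed <- paste(samples$Patient, samples$Time, samples$Region, collapse = "_")

# The patient-level column carries one distinct value but is stored five times.
n_sex_values <- length(unique(samples$sex))
```

With one row per sample the IDs are unique, at the cost of repeating every patient-level column once per sample.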

Is there any better solution?

Levi Waldron (CUNY Graduate School of Public Health and Health Policy, New York, NY) wrote, 10 weeks ago:

Hi Lluís - there are different ways you could do this, but if you think of the five rows you showed as five different "biological units", it might make sense to keep them separate in the colData as shown. The main difference from the MAE perspective will be how you interact with the object when subsetting by column and reshaping through wideFormat() etc. If you collapse rows like you suggested, then MAE management functions like mergeReplicates() and duplicated() would treat those five measurements as duplicates. Why do you want to collapse those rows? I would be more inclined to keep them separate as you showed, but maybe I don't understand your motivation for having one row per patient in the colData.


Hi Levi, I didn't explain myself well, sorry. 

I have some samples linked to a location (biopsies from 5 regions) and some that aren't (stools) [or rather, they are always from the same region]. I have two assays for the biopsies (RNA-seq and 16S-seq) and one assay for the stools (16S-seq).

My main goal is to understand the relationship between assays. However, the biopsy regions differ in how they behave, so the relationship between assays could depend on the region of the biopsy. At the same time, it is interesting to see whether the relationship between biopsies and stools (RNA-seq to 16S-seq, 16S-seq to 16S-seq, or between all the assays) is shared across patients. I was considering having just one row per patient in order to be able to see these common relationships between assays.

I hope I have explained myself a bit better. Many thanks

Reply written 10 weeks ago by Lluís R
Levi Waldron wrote, 10 weeks ago:

I think I understand better now. I guess the biopsies would be labeled by intestinal location, so they are not exchangeable (for example location might be labeled stool, rectum, sigmoid, descending, transverse, ascending, caecum).  So I see several potential ways to set up:

1. separate rows in the MultiAssayExperiment colData for each site, with a column specifying location.

2. one row per patient in the colData, with sampling location as per-experiment colData variable.

3. one row per patient in the colData, with each biopsy site as a different `ExperimentList` element (like a different assay, with assay names reflecting body sites).
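As a rough illustration of option 3, the sketch below builds the three pieces in base R only: a patient-level table with one row per patient, a named list with one matrix per assay and body site, and a map with `assay`, `primary` and `colname` columns linking each matrix column back to a patient. All patient names, site names, the `sex` column and the simulated values are hypothetical. These three objects are meant to play the roles of the `colData`, `experiments` and `sampleMap` arguments of the `MultiAssayExperiment()` constructor; the constructor call itself is left out so the sketch runs without Bioconductor installed.

```r
set.seed(1)
patients <- c("P1", "P2", "P3", "P4")  # hypothetical patient IDs

# Patient-level variables only: one row per patient, no duplication.
patient_data <- data.frame(
  sex = c("F", "M", "F", "M"),  # assumed variable
  row.names = patients
)

# Simulated matrix: features in rows, one column per patient sample.
make_assay <- function(prefix, n_features = 3) {
  m <- matrix(rnorm(n_features * length(patients)), nrow = n_features)
  dimnames(m) <- list(
    paste0("feature", seq_len(n_features)),
    paste(prefix, patients, sep = "_")
  )
  m
}

# One list element per assay and body site (names are hypothetical).
experiments <- list(
  rnaseq_sigmoid   = make_assay("rna_sig"),
  micro16S_sigmoid = make_assay("16S_sig"),
  micro16S_stool   = make_assay("16S_stool")
)

# Map every assay column ("colname") to its patient ("primary").
sample_map <- do.call(rbind, lapply(names(experiments), function(a) {
  data.frame(
    assay   = a,
    primary = patients,
    colname = colnames(experiments[[a]]),
    stringsAsFactors = FALSE
  )
}))
sample_map$assay <- factor(sample_map$assay)
```

Site-specific variables (state of the region, endoscopic score, ...) would then live with each site's experiment rather than in the patient table.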

I lean towards option 3, which I suspect will allow the simplest syntax for calculating simple correlations. For setting up regressions like 16S ~ RNA-seq + location, option 1 might be simplest. For simple correlations, you might use the `assays()` extractor to give a list of matrices to calculate correlations on with `cor()`. For regressions, I imagine using `wideFormat()` to integrate the assays and colData column for location into a single DataFrame. But if I were the data analyst here, I would probably start with 3 and see if something about it ends up being annoying, and if so think about doing it differently :). 
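The two analysis styles mentioned above might look roughly like this in base R. This is a sketch on simulated data, not the MultiAssayExperiment workflow itself: the plain list of matrices stands in for what `assays()` would return, and the long data frame stands in for a table reshaped with `wideFormat()`. Patient IDs, site names and feature names are all hypothetical.

```r
set.seed(42)
patients <- paste0("P", 1:6)        # hypothetical patient IDs
sites <- c("sigmoid", "caecum")     # hypothetical biopsy sites

# Option 3 style: one matrix per assay and site, columns ordered by patient.
new_matrix <- function() {
  matrix(rnorm(2 * length(patients)), nrow = 2,
         dimnames = list(c("feat1", "feat2"), patients))
}
assay_list <- list(
  rnaseq_sigmoid   = new_matrix(),
  micro16S_sigmoid = new_matrix(),
  micro16S_stool   = new_matrix()
)

# Feature-by-feature correlations across patients between two assays.
# cor() correlates columns, so transpose to put patients in rows.
site_cor <- cor(t(assay_list$rnaseq_sigmoid), t(assay_list$micro16S_sigmoid))

# Option 1 style: long table with one row per patient and site.
long <- expand.grid(patient = patients, location = sites,
                    stringsAsFactors = FALSE)
long$rnaseq   <- rnorm(nrow(long))
long$micro16S <- 0.5 * long$rnaseq + rnorm(nrow(long))

# Regression of the form 16S ~ RNA-seq + location.
fit <- lm(micro16S ~ rnaseq + location, data = long)
coef(fit)
```

The correlation step relies on the columns of both matrices being in the same patient order, which is worth checking explicitly on real data.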

