Help with defining groups
@phinney-brett-6324
United States

Hi everyone, thanks for the great software! I was wondering if you could give a short code example of defining groups after I read in a DIA-NN report.parquet file. I assume there is an easy way to link the file names to their conditions and replicates?

Cheers

Brett

@gordon-smyth
WEHI, Melbourne, Australia

Defining sample groups is separate from reading in the DIA-NN data, because the sample conditions and covariates are not necessarily coded into the file names. It is something that is done with ordinary R code rather than specifically by limpa. The same is true of any Bioconductor package that does differential expression or differential abundance analyses, like limma, edgeR or DESeq2, and you can see lots of examples in the case studies of those packages.

After you read in the DIA-NN peptide quants using limpa:

x <- readDIANN(...)

the sample names extracted from the DIA-NN report are available from colnames(x). If the sample names are informative, then you can often convert them easily into a condition factor.
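As a minimal sketch of that conversion, suppose the sample names follow a pattern such as Control_rep1. The names below are hypothetical stand-ins for colnames(x), and the pattern is an assumption; real DIA-NN run names will usually need a different regular expression.

```r
# Hypothetical sample names, standing in for colnames(x)
samples <- c("Control_rep1", "Control_rep2", "Control_rep3",
             "Treated_rep1", "Treated_rep2", "Treated_rep3")

# Condition: everything before the first underscore
condition <- factor(sub("_.*$", "", samples),
                    levels = c("Control", "Treated"))

# Replicate number: the digits after "_rep"
rep_id <- as.integer(sub("^.*_rep", "", samples))

# Check that the grouping looks right before using it
table(condition)
```

Setting the factor levels explicitly makes the first level (here Control) the reference group when the factor is later used in a model formula.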

In my work, I encourage my collaborators and my lab team to create an Excel spreadsheet giving all the sample annotation available. One column will give the sample file names while the other columns will give conditions and covariates. In the limma documentation, this is called the targets data.frame. The design matrix is then created with model.matrix() from the columns of the targets data.frame. My reasoning is that the biologists who prepared the samples must have such a spreadsheet, or its equivalent, as part of their sample preparation. They then create a unique sample ID when passing the samples on to the proteomics lab for mass spectrometry. The targets data.frame then links the sample IDs to the sample annotation.
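The targets approach might look roughly like the sketch below. The sample IDs, column names (SampleID, Condition, Batch) and the model formula are all hypothetical; in practice the data.frame would be read from the spreadsheet (for example with read.csv() after exporting to CSV), and samples would be colnames(x).

```r
# Hypothetical sample annotation, standing in for the spreadsheet contents
targets <- data.frame(
  SampleID  = c("S1", "S2", "S3", "S4", "S5", "S6"),
  Condition = c("Control", "Treated", "Control", "Treated", "Control", "Treated"),
  Batch     = c("A", "A", "B", "B", "C", "C")
)

# Stand-in for colnames(x); the column order may differ from the spreadsheet
samples <- c("S2", "S1", "S4", "S3", "S6", "S5")

# Reorder the rows of targets so that they line up with the data columns
m <- match(samples, targets$SampleID)
stopifnot(!anyNA(m))          # every sample must have an annotation row
targets <- targets[m, ]
rownames(targets) <- targets$SampleID

# Convert the annotation columns to factors, with Control as the reference
targets$Condition <- factor(targets$Condition, levels = c("Control", "Treated"))
targets$Batch <- factor(targets$Batch)

# Design matrix with a batch covariate and a treatment effect
design <- model.matrix(~ Batch + Condition, data = targets)
design
```

The match() step matters because the spreadsheet rows and the data columns are not guaranteed to be in the same order; the design matrix rows must correspond one-to-one with the columns of the expression object.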

The expression objects created by limpa optionally contain sample annotation in the targets component, which is a data.frame with one row for each sample.
