Question

How do you create a targets dataframe in edgeR?

Terra • 0

I am running differential expression on paired cancer samples from women who are healthy or obese.

I'm following '3.5 Comparisons both between and within subjects' and it references a targets dataframe.

Patient   Disease    Tissue
1         Healthy    Normal
1         Healthy    Cancer
2         Healthy    Normal
2         Healthy    Cancer
3         Obese      Normal
3         Obese      Cancer
4         Obese      Normal
4         Obese      Cancer

Is the targets dataframe different to the y$samples dataframe which contains the lib.size and norm.factors?

How do you create the targets dataframe so that it correctly reflects the file that is read in for each patient/sample?

edgeR RNA-SEQ • 1.3k views

updated 5.4 years ago by James W. MacDonald • written 5.4 years ago by Terra

score 3 · Accepted Answer · 2020-09-22

The targets data.frame in that example is different from the samples item of the DGEList, but it doesn't have to be. You could put everything in the samples data.frame instead, and that might be advantageous if you want to be able to easily fit models with a subset of the data.
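As a rough sketch of keeping the phenotype columns inside the DGEList, the code below rebuilds the table from the question and passes it through the `samples` argument of `DGEList()`. The count matrix and the sample names (`P1_Normal` and so on) are made up purely so the example runs, and the edgeR step is skipped if the package is not installed.

```r
# Targets table as shown in the question (the section 3.5 layout).
targets <- data.frame(
  Patient = factor(rep(1:4, each = 2)),
  Disease = factor(rep(c("Healthy", "Obese"), each = 4)),
  Tissue  = factor(rep(c("Normal", "Cancer"), times = 4),
                   levels = c("Normal", "Cancer"))
)
# Hypothetical sample names, used as row names of targets.
rownames(targets) <- paste0("P", targets$Patient, "_", targets$Tissue)

# Made-up counts (5 genes x 8 samples), only so the example runs.
set.seed(1)
counts <- matrix(rpois(5 * 8, lambda = 20), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), rownames(targets)))

if (requireNamespace("edgeR", quietly = TRUE)) {
  # The columns of targets are stored in y$samples alongside
  # group, lib.size and norm.factors.
  y <- edgeR::DGEList(counts = counts, samples = targets)
  print(y$samples)

  # Subsetting the DGEList subsets counts and sample information together.
  y_healthy <- y[, y$samples$Disease == "Healthy"]
  print(dim(y_healthy))
}
```

With the phenotypes stored this way, a design matrix for a subset can be built from `y_healthy$samples` without having to subset a separate targets table by hand.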

Your second question is hard to answer without resorting to a tautology. You create the targets data.frame by inserting the correct information, as it pertains to your experiment. In other words, the rows of the targets data.frame should correspond to the rows of the samples data.frame, which in turn correspond to the columns of the counts matrix.
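One way to check that correspondence, sketched in base R with invented sample names (`S1` to `S4`) and a toy count matrix whose columns are deliberately in a different order from the targets rows:

```r
# Toy targets table; Sample is a hypothetical identifier column.
targets <- data.frame(
  Sample = c("S1", "S2", "S3", "S4"),
  Tissue = c("Normal", "Cancer", "Normal", "Cancer"),
  stringsAsFactors = FALSE
)

# Toy counts whose columns are NOT in the same order as the targets rows.
counts <- matrix(1:8, nrow = 2,
                 dimnames = list(c("geneA", "geneB"),
                                 c("S3", "S1", "S4", "S2")))

# Reorder the targets rows to follow the columns of counts.
idx <- match(colnames(counts), targets$Sample)
stopifnot(!anyNA(idx))          # every count column must have a targets row
targets <- targets[idx, ]

# Fail loudly if the two objects are still out of step.
stopifnot(identical(targets$Sample, colnames(counts)))
```

Running a check like the final `stopifnot()` just before building the DGEList or the design matrix is cheap, and it turns a silent mismatch into an error.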

That sounds really simple, but it is something that has repeatedly tripped me up over my career. That is why things like a DGEList or a SummarizedExperiment are so useful: so long as you pay close attention when you create them and ensure that everything lines up correctly, they keep you from making silly errors when subsetting data, or when aligning annotations and phenotypes for output.

But it's up to you to ensure that you create a correct targets data.frame to begin with. In general I make the targets data.frame first, ensuring that the phenotypic data align correctly with the file names of the data I plan to import (usually by putting the file names into the targets data.frame itself), and then I read in the files using information from the targets data.frame. That way I minimize the chance of ending up with mismatched data.
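A minimal sketch of that workflow in base R follows. The file names, the two-column file layout (`gene`, `count`) and the temporary directory are all assumptions made so the example can run on its own; real count files will look different. edgeR's `readDGE()` is documented as accepting a data.frame that contains a `files` column, which would do the reading step in one call, but the loop here avoids depending on edgeR.

```r
# 1. Build targets first, with the file name of each sample in the table.
targets <- data.frame(
  Patient = rep(1:2, each = 2),
  Disease = "Healthy",
  Tissue  = rep(c("Normal", "Cancer"), times = 2),
  stringsAsFactors = FALSE
)
targets$files <- paste0("patient", targets$Patient, "_",
                        tolower(targets$Tissue), ".txt")

# 2. Write toy count files (hypothetical layout) so the example is runnable.
count_dir <- tempfile("counts_")
dir.create(count_dir)
genes <- paste0("gene", 1:3)
for (i in seq_len(nrow(targets))) {
  write.table(data.frame(gene = genes, count = i * 10 + 1:3),
              file = file.path(count_dir, targets$files[i]),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# 3. Read the files in the row order of targets, so column j of counts
#    always comes from row j of targets.
#    Assumes every file lists the same genes in the same order.
read_one <- function(f) {
  tab <- read.delim(file.path(count_dir, f), stringsAsFactors = FALSE)
  setNames(tab$count, tab$gene)
}
counts <- sapply(targets$files, read_one)
colnames(counts) <- sub("\\.txt$", "", targets$files)
rownames(targets) <- colnames(counts)

stopifnot(identical(rownames(targets), colnames(counts)))
```

Because the file names live in the same table as the phenotypes, reordering or dropping rows of `targets` before the read step changes which files are read, and the counts and phenotypes cannot drift apart.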