Question

deseq2 for TCGA tumor/normal paired sample from multiple patients

0

Entering edit mode

luren • 0

@luren-9141

Last seen 8.4 years ago

United States

Basically I want to run DEseq to analyze TCGA paired data. say in certain cancer type, there are 107 patients, 214 samples(paired). I used following code(pretty much default). By setting like this, I expect DEseq2 giving me a result using the pairing information as well as 107 patient as biological replicates, is this appropriate? I feel not confident because I use 107 different values for the factor-- patient.

library('DESeq2')
directory<-"tBRCA"
sampleFiles <-c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63","64","65","66","67","68","69","70","71","72","73","74","75","76","77","78","79","80","81","82","83","84","85","86","87","88","89","90","91","92","93","94","95","96","97","98","99","100","101","102","103","104","105","106","107" ....to "214")

#Above I use "1""2" to "214" as file name for 214 samples

samplePatient=rep(c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63","64","65","66","67","68","69","70","71","72","73","74","75","76","77","78","79","80","81","82","83","84","85","86","87","88","89","90","91","92","93","94","95","96","97","98","99","100","101","102","103","104","105","106","107"),2)

#Above I use "1" "2"... to "107" as patient name for 107 patients

sampleCondition<-rep(c("untreated","treated"),each=107)

sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition,patient=samplePatient)

ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory,design=~patient+condition)

colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, levels=c("untreated","treated"))

dds<-DESeq(ddsHTSeq)

res<-results(dds)
res<-res[order(res$padj),]
write.table(res, file = 'myfile.txt',sep='\t')
plotMA(dds,ylim=c(-2,2),main="DESeq2")

deseq2 • 2.8k views

ADD COMMENT • link updated 2.8 years ago by Michael Love 41k • written 8.4 years ago by luren • 0

1

Entering edit mode

Just a tip to make your life easier. You could declare your sampleFiles and samplePatient variables like so:

sampleFiles <- as.character(1:214)
samplePatient <- as.character(rep(1:107, 2))

ADD REPLY • link 8.4 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

Many thanks for your tip!!!!

ADD REPLY • link 8.4 years ago luren • 0

score 0 · Answer 1 · 2015-11-10

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 23 hours ago

United States

Yes that looks correct.

ADD COMMENT • link 8.4 years ago • updated 2.8 years ago Michael Love 41k

0

Entering edit mode

Thanks a lot for fast response! But this gives me more than 10k out of 20k genes with p adj 10^-6 and lower. I am using TCGA BRCA data(214 total, from 107 patient, paired, raw counts.) Is this normal?Do you have any suggestions to shrunk it?

ADD REPLY • link 8.4 years ago luren • 0

score 0 · Answer 2 · 2021-07-08

0

Entering edit mode

Neha • 0

@3fa075d2

Last seen 2.2 years ago

Finland

I am new in cancer dataset. I have readcount data from htseq for each patient normal and cancer stage. I have generated separate matrix for normal and cancer. How should I proceed to find expression data for further eQTL analysis.

In DesSEq2 I used read-count for normal patient separately and cancer separately or combined in one folder which is not possible due to same patient number.
I should use matrix for normal and cancer separately or combined matrix.

Kindly help me...how should I proceed.

ADD COMMENT • link 2.8 years ago Neha • 0

0

Entering edit mode

This is out of scope for support for DESeq2 software specifically. It sounds like you need general bioinformatic advice for pre-processing expression data for eQTL analysis. You might try posting a detailed question about your dataset and goals to https://www.biostars.org.

ADD REPLY • link 2.8 years ago Michael Love 41k