Question

How to call "DBA_CONDITION" for multiple conditions, and how to control the GLM program



arcucamila



Argentina

Hi everybody!

I'm just getting started with differential affinity analysis, but I've run into a series of problems (please forgive my English):

I want to run a differential affinity analysis on multiple H3K27ac ChIP-seq samples from cancer cell lines. I have many cell lines, and the samples come from different labs, so samples from a given cell line are not strictly replicates, and I don't have a true replicate for every sample (some replicates were of poor quality). Setting that aside for a moment, my first goal is to identify peaks with differential activity between cell subtypes: Luminal (Lu), Basal A (BaA) and Basal B (BaB). One option, shown in the table below, is to use the subtype as the condition, so that each condition has many "replicates": all the samples that belong to that subtype. I haven't settled on this. Alternatively, I could mix the labs and treat the samples from different labs as replicates of each cell line. In any case, I'm interested in the peaks, not so much in the individual cell lines.

I know this may sound a little weird, but bear with me, because my actual problem comes later.

So this is the table I made:

SampleID	Tissue	Factor	Condition	bamReads	ControlID	bamControl	Peaks	PeakCaller
BT549-Schor	BT549	H27K27Ac	BaB	/home/bam/BT549_Young	BT549-Young-c	/home/bamC/BT549_Young	/home/bed/BT549_Young	bed
BT549-Young	BT549	H27K27Ac	BaB	/home/bam/BT549_Young	BT549-Young-c	/home/bamC/BT549_Young	/home/bed/BT549_Young	bed
HCC1569	HCC1569	H27K27Ac	BaA	/home/bam/HCC1569	HCC1569-c	/home/bamC/HCC1569	/home/bed/HCC1569	bed
HCC1569-Schor	HCC1569	H27K27Ac	BaA	/home/bam/HCC1569	HCC1569-c	/home/bamC/HCC1569	/home/bed/HCC1569	bed
MDAMB231-Arc	MDAMB231	H27K27Ac	BaB	/home/bam/MDAMB231_Hardy	MDAMB231-Hardy.merged-c	/home/bamC/MDAMB231_Hardy	/home/bed/MDAMB231_Hardy	bed
MDAMB231-Hardy.merged	MDAMB231	H27K27Ac	BaB	/home/bam/MDAMB231_Hardy	MDAMB231-Hardy.merged-c	/home/bamC/MDAMB231_Hardy	/home/bed/MDAMB231_Hardy	bed
MDAMB468-Arc	MDAMB468	H27K27Ac	BaA	/home/bam/MDAMB468_Young	MDAMB468-Young-c	/home/bamC/MDAMB468_Young	/home/bed/MDAMB468_Young	bed
MDAMB468-Young	MDAMB468	H27K27Ac	BaA	/home/bam/MDAMB468_Young	MDAMB468-Young-c	/home/bamC/MDAMB468_Young	/home/bed/MDAMB468_Young	bed
MCF7-Hardy	MCF7	H27K27Ac	Lu	/home/bam/MCF7_Hardy	MCF7-Hardy-c	/home/bamC/MCF7_Hardy	/home/bed/MCF7_Hardy
MCF7-Schor	MCF7	H27K27Ac	Lu	/home/bam/MCF7_Schor	MCF7-Schor-c	/home/bamC/MCF7_Schor	/home/bed/MCF7_Schor
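A quick way to sanity-check a sample sheet before passing it to dba() is to read it in base R and list any empty fields. As pasted, the two MCF7 rows seem to have only 8 fields, with no PeakCaller value. The sketch below uses a subset of the rows above written comma-separated (as in the .csv file); it is plain base R, not part of DiffBind.

```r
# A subset of rows from the sample sheet above, comma-separated as in the
# .csv given to dba(). The two MCF7 rows have 8 fields instead of 9.
sheet_text <- "SampleID,Tissue,Factor,Condition,bamReads,ControlID,bamControl,Peaks,PeakCaller
HCC1569,HCC1569,H27K27Ac,BaA,/home/bam/HCC1569,HCC1569-c,/home/bamC/HCC1569,/home/bed/HCC1569,bed
MCF7-Hardy,MCF7,H27K27Ac,Lu,/home/bam/MCF7_Hardy,MCF7-Hardy-c,/home/bamC/MCF7_Hardy,/home/bed/MCF7_Hardy
MCF7-Schor,MCF7,H27K27Ac,Lu,/home/bam/MCF7_Schor,MCF7-Schor-c,/home/bamC/MCF7_Schor,/home/bed/MCF7_Schor"

# fill = TRUE pads short rows with empty fields instead of stopping
sheet <- read.csv(text = sheet_text, fill = TRUE, stringsAsFactors = FALSE)

# TRUE wherever a field is missing (NA) or empty ("")
is_blank <- is.na(sheet) | sheet == ""

# Samples with at least one empty field, and the columns affected
blank_rows <- sheet$SampleID[rowSums(is_blank) > 0]
blank_cols <- colnames(is_blank)[colSums(is_blank) > 0]
print(blank_rows)
print(blank_cols)
```

Running the same check on the full lines.name-type7.csv would show whether any other row is incomplete.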

And this is the script so far (note that I used a shortened table, with only four samples, just to test that it is read correctly):

> GeneClusterDBA<- dba(sampleSheet="lines.name-type7.csv")
MDAMB468-Young MDAMB468 H27K27Ac BaA NA bed
HCC1569 HCC1569 H27K27Ac BaA NA bed
MDAMB231-Hardy.merged MDAMB231 H27K27Ac BaB NA bed
BT549-Young BT549 H27K27Ac BaB NA bed

....
> GeneClusterDBA

4 Samples, 22384 sites in matrix (58575 total):
                     ID   Tissue   Factor Condition Caller Intervals
1        MDAMB468-Young MDAMB468 H27K27Ac       BaA    bed     34448
2               HCC1569 HCC1569 H27K27Ac       BaA    bed     32683
3 MDAMB231-Hardy.merged MDAMB231 H27K27Ac       BaB    bed      3091
4           BT549-Young    BT549 H27K27Ac       BaB    bed     30053

....
> GeneClusterDBA<-dba.count(GeneClusterDBA)
> GeneClusterDBA
4 Samples, 22384 sites in matrix:
                     ID   Tissue   Factor Condition Caller Intervals
1        MDAMB468-Young MDAMB468 H27K27Ac       BaA counts     22384
2               HCC1569 HCC1569 H27K27Ac       BaA counts     22384
3 MDAMB231-Hardy.merged MDAMB231 H27K27Ac       BaB counts     22384
4           BT549-Young    BT549 H27K27Ac       BaB counts     22384
FRiP
1 0.37
2 0.44
3 0.09
4 0.43

.....

First, I don't know whether the NAs that appear when the program reads the .csv are normal. Maybe there is something in the file that the program cannot read.
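One possible explanation, assumed here and not confirmed by the output, is that the NA marks a metadata column the sheet does not contain. The sketch compares the posted header with the metadata columns that, to my understanding, a DiffBind sample sheet can carry (Treatment and Replicate among them); check `?dba` in the installed version before relying on this list.

```r
# Header of the sample sheet as posted
posted_cols <- c("SampleID", "Tissue", "Factor", "Condition", "bamReads",
                 "ControlID", "bamControl", "Peaks", "PeakCaller")

# Metadata columns assumed to be recognised in a DiffBind sample sheet
# (assumption: verify against the dba() documentation for your version)
metadata_cols <- c("SampleID", "Tissue", "Factor", "Condition",
                   "Treatment", "Replicate")

# Metadata columns the posted sheet does not provide
missing_cols <- setdiff(metadata_cols, posted_cols)
print(missing_cols)
```

If this assumption holds, the single NA printed between Condition and the peak caller would correspond to one of these absent columns rather than to a file that failed to parse.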

Second, I'm not sure whether I should have named the condition column "DBA_CONDITION" instead of "Condition", so that I could call dba.contrast like this without getting a warning like "No contrast have been made.." (or something like that).

How can I use "DBA_CONDITION"?

GeneClusterDBA<-dba.contrast(GeneClusterDBA, categories=DBA_CONDITION, block=...)

The "..." after "block=" is there because I think I should use a blocking factor, but I don't really understand how to use it. If I want to compare Lu against BaB and BaA, then BaB against Lu and BaA, and then BaA against Lu and BaB, do I have to run three analyses, blocking first Lu, then BaB and finally BaA? Or can I do it in a single analysis? Can I make all the possible comparisons in the same analysis?
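Both readings of the comparisons above (pairwise, or each subtype against the other two pooled) give three comparisons drawn from the same set of samples. As far as I understand DiffBind, several contrasts can be added to one DBA object by calling dba.contrast() more than once, with one-vs-rest groups passed as sample masks through `group1`/`group2`; please verify this in the documentation. The base-R sketch below only enumerates the comparisons and their group sizes for the full 10-sample table, which also shows that Lu has just 2 samples (dba.contrast has a `minMembers` argument governing how small an automatically generated group may be).

```r
# Subtype of each of the 10 samples, in the order of the table above
condition <- c("BaB", "BaB", "BaA", "BaA", "BaB", "BaB",
               "BaA", "BaA", "Lu", "Lu")
levels_cond <- sort(unique(condition))   # "BaA" "BaB" "Lu"

# Option 1: every pairwise comparison between subtypes
pairwise <- combn(levels_cond, 2, simplify = FALSE)

# Option 2: each subtype against the other two pooled
one_vs_rest <- lapply(levels_cond, function(lv) {
  list(group1 = lv,
       group2 = setdiff(levels_cond, lv),
       n1 = sum(condition == lv),
       n2 = sum(condition != lv))
})
names(one_vs_rest) <- levels_cond

for (p in pairwise) cat(p[1], "vs", p[2], "\n")
for (lv in levels_cond) {
  x <- one_vs_rest[[lv]]
  cat(lv, "(n =", x$n1, ") vs", paste(x$group2, collapse = "+"),
      "(n =", x$n2, ")\n")
}
```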

Also, when using "block", do I have to specify a particular column name or something like that? How does the program know which factor I want to block on? Should I use a mask? Is that the only option?
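Conceptually, a blocking factor is an extra term in the linear model next to the factor of interest, e.g. a design like `~ Tissue + Condition`. The sketch below is a generic base-R illustration of that idea, not DiffBind's internal code. With the table above, blocking on cell line would not be usable: each cell line belongs to exactly one subtype, so the two factors are confounded and the design matrix loses rank.

```r
# Cell line (Tissue) and subtype (Condition) of the 10 samples above
samples <- data.frame(
  Tissue = factor(c("BT549", "BT549", "HCC1569", "HCC1569", "MDAMB231",
                    "MDAMB231", "MDAMB468", "MDAMB468", "MCF7", "MCF7")),
  Condition = factor(c("BaB", "BaB", "BaA", "BaA", "BaB", "BaB",
                       "BaA", "BaA", "Lu", "Lu"))
)

# Design with the factor of interest only: full rank
mm0 <- model.matrix(~ Condition, data = samples)

# Design blocked on cell line: the Condition columns are sums of Tissue
# columns, so some coefficients cannot be estimated
mm <- model.matrix(~ Tissue + Condition, data = samples)

cat("Condition only:     ", ncol(mm0), "columns, rank", qr(mm0)$rank, "\n")
cat("Tissue + Condition: ", ncol(mm), "columns, rank", qr(mm)$rank, "\n")
```

A factor that varies within each subtype (for example the lab, if it were recorded as its own column) would be the kind of variable that can be blocked on without this problem.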

Should I also add a column of replicates, using the first option I described above (the subtype as the condition, with its samples as replicates)? Will the program understand them even though the Tissue column has different names while many of them belong to the same condition?
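If a Replicate column is added, it can be generated rather than typed by hand. The sketch below numbers samples within each cell line (lab samples treated as replicates) and, alternatively, within each subtype. It assumes DiffBind expects a Replicate column holding a replicate number; the column `ReplicateByCondition` and the output filename are hypothetical and only illustrate the two options.

```r
# A few rows from the table above
sheet <- data.frame(
  SampleID  = c("BT549-Schor", "BT549-Young", "MDAMB231-Arc",
                "MDAMB231-Hardy.merged", "MCF7-Hardy", "MCF7-Schor"),
  Tissue    = c("BT549", "BT549", "MDAMB231", "MDAMB231", "MCF7", "MCF7"),
  Condition = c("BaB", "BaB", "BaB", "BaB", "Lu", "Lu"),
  stringsAsFactors = FALSE
)

# Option A: number samples 1, 2, ... within each cell line
sheet$Replicate <- ave(seq_len(nrow(sheet)), sheet$Tissue, FUN = seq_along)

# Option B (hypothetical column): number samples within each subtype
sheet$ReplicateByCondition <- ave(seq_len(nrow(sheet)), sheet$Condition,
                                  FUN = seq_along)
print(sheet)

# Hypothetical output file; uncomment to save the updated sheet
# write.csv(sheet, "samplesheet_with_replicates.csv", row.names = FALSE)
```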

Also, where can I read more about edgeR and DESeq2? I'd like to understand them better and perhaps learn how to change their parameters.

Sorry for all my basic questions; I'm really lost with this.

I would really appreciate your help.

Thank you in advance

Camila

Tags: cancer, deseq2, edger, CONDITIONS, DBA

8.0 years ago • arcucamila