-
edgeR
包的安装
edgeR
包是基于
Bioconductor
平台发布的,所以安装不能直接
用
es()
命令从
CRAN
上来下载
?
安装
:
?
# try http:// if https://
URLs are not supported
>
sour
ce
(
)
<
/p>
>
biocLite
(
< br>
)
数据导入
由于
edgeR
对测序结果的下游分析是依赖
count
< br>计数来进行基因差异
表达分析的,在这里使用的是
fea
tureCounts
来进行统计
`
.bam
`
文件中
Map
的结果
?
count
结果如下
:
?
>
lib
rary
(
edgeR
)
>
mydata
<-
p>
(
,
header
=TRUE
,
quote
=
't'
,
skip
=
1
)
>
sampleNames
<-
c
(
,
,
,
,
,
< br>
)
>
names
(
mydata
)[
7
:
12
]
<-
sampleNames
>
head
(
mydata<
/p>
)
Gene
idChrStartEndStrandLengthCA_1CA_2CA_3CC_1CC_2CC_3<
/p>
1
gene1314NW_13942
1.1
12571745
+
48900
0000
2
gene1315NW_
139421.1
21153452
+
1338000000
3
gene1
316NW_139421.1
38564680
+
825000000
4
gene1317NW_139421.1
48665435
-
570000000
5
gene1318NW_139421.1
60666836
-
771000000
6<
/p>
gene1319NW_139421.1
72949483
p>
+
2190000000
?
在这里我们只是需要
Geneid
和后
6
列的样本的
count
信息来组成矩
阵,所以要处理下
p>
>
countMatrix
<-
(
mydata
[
7
:
12
])
>
rownames
(
countMatrix
)
<-<
/p>
mydata
$$
Geneid
>
head
(
countMatrix
)
CA_1CA_2CA_3CC_1CC_2CC_3
gene1314
000000
gene1315
000000
gene1316
000000
gene1317
000000
gene1318
000000
gene1319
000000
*
要导入的矩阵由
< br>3v3
样本组成
(
三组生物学重
复
)
创建
DEGlist
>
group
<-
factor
(
c
(
,
,
,
,
p>
,
))
>
y
<-
DGEList
(
counts
=
countMatrix
,
gr
oup
=
group
)
>
y
Anobjectofclass
$$
counts
CA_1CA_2CA_3CC_1CC_2CC_3
gene1314
000000
gene1315
000000
gene1316
000000
gene1317
000000
gene1318
000000
14212
morerows...
$$
samples
s
CA_1CA_1
17885371
CA_2CA_2
18255461
CA_3CA_3
19030171
CC_1CC_1
18260421
CC_2CC_2
21244681
CC_3CC_3
20250631
过滤
?
过滤掉那些
count
结果都为<
/p>
0
的数据,这些没有表达的基因对结果的分
析没有用,过滤又两点好处
:
1
可以减少内存的压力
2
可以减少计算的压力
>
keep
<-
rowSu
ms
(
cpm
(
y
)
>
1
)
>=
2
><
/p>
y
<-
y
[
p>
keep
,,
=F
ALSE
]
>
y
Anobjectofclass
$$
counts
CA_1CA_2CA_3CC_1CC_2CC_3
gene1321
8194220
gene1322
231133
gene1323
2
gene1324
62
gene1325
322921587556
3877
morerows...
$$
samples
s
CA_1CA_1
17883621
CA_2CA_2
18253081
CA_3CA_3
19027961
CC_1CC_1
18258891
CC_2CC_2
21241551
CC_3CC_3
20247861
标准化处理
?
edgeR
采用的是
TMM
方法进行标准化处理
,
只有标准化处理后的数
据才
又可比性
>
y
<-
calcNormFac
tors
(
y
)
>
y
Anobjectofclass
$$
counts
CA_1CA_2CA_3CC_1CC_2CC_3
gene1321
8194220
gene1322
231133
gene1323
2
gene1324
62
gene1325
322921587556
3877
morerows...
$$
samples
s
CA_1CA_1
17883620.9553769
CA_2CA_
2
18253080.9052539
CA_3CA_3
19027960.9686232
p>
CC_1CC_1
18258890.9923455
CC_2CC_2
21241551.12751
78
CC_3CC_3
202478
61.0668754
设计矩阵
?
为什么要一个设计矩阵呢,
道理很简单,
有了一个设计矩阵才能够更好的
分组分析
>
subGroup
<-
factor
(
subs
tring
(
colnames
(
p>
countMatrix
),
4
,
4
))
>
design
<-
(
~
subGroup
+
group
)
< br>>
rownames
(
desi
gn
)
<-
colnames
(
y
)
>
design
(
Intercept
)
subGroup2subGroup3groupCC
< br>
CA_1
1000
CA_2
1100
CA_3
1010
CC_1
1001
CC_2
1101
CC_3
1011
< br>attr
(,
)
[
1
]
0112
attr
(,
)
attr
(,
)
$$
subGroup
[
1<
/p>
]
attr
(,
)
$$
group
[<
/p>
1
]
评估离散度
>
y
<-
estimateDisp
(
y
,
des
ign
,
robust
=TRUE
p>
)
>
y
$$
sion
[
1
]
0.0268
3622
#plot
>
plotBCV
(
y<
/p>
)
差异表达基因
>
fit
<-
glmQLFit
(<
/p>
y
,
design
,
robust
=TRUE
)
>
qlf
<-
p>
glmQLFTest
(
fit
)
>
topTags<
/p>
(
qlf
)
<
/p>
Coefficient
:
groupC
C
logFClogCPM
F
PValueFDR
gene7024
p>
-5.5156489.612809594.92326.431484e-442.49
6702e-40
gene6612
5.1302828.451143468.20601.557517e-393.023140e-36
p>
gene2743
4.3774925.
586773208.02683.488383e-264.513967e-23
gene12032
4.7343835.098148192
.93784.359649e-254.231040e-22
gene491
-2.73391010.412673190.98396.10
4188e-254.739291e-22
gene894
1
2.9971856.839106177.76146.332836e-244.
097345e-21
gene2611
-2.8469247.216173174.73321.099339e-236.096619e-2 1
gene6242
2.52912
59.897771169.26583.022914e-231.466869e-20
gene7252
3.7323156.1376701
88.20943.890569e-231.678132e-20
gene6125
2.8754236.569935160.31891.6
56083e-226.428914e-20
查看差异表达基因原始的
CMP
>
top
<-
rownames
(
topTags
(
qlf
))
>
cpm
(
y
)[
top
,]
CA_1CA_2CA_3CC_1CC_2CC_3
gene7024
1711.3830021405.8618991
480.12111533.1141837.1604029.62696
gene6612
17.55864912.10384826.585
753403.99298582.457961044.35046
gene2743
4.6823061.8155775.96823062.
9169487.26431114.34156
gene1
2032
1.7558652.4207702.71283265.6764647.
5987275.45617
gene491
2811.1397272059.4696692222.351938444.83381385.
38258253.68087
gene8941
23.99682024.81288824.415488131.35291244.6741 0225.90560
gene2611
245.821088310.463691225.16505243.0484326.3045539 .81123
gene6242
23
1.188880299.570228298.4115151348.298991343.6198821
91.93237
gene7252
9.36461313.3142325.42566492.71970108.55847181.9280
7
gene6125
23.4115
3214.52461729.841152145.70239160.75005185.16852
查看上调和下调基因的数目
>
summary
(
dt
<-
decideTestsDGE
(
qlf
))
[,
1
]
-1536
02793
1553
挑选出差异表达基因的名字
>
isDE
<-
l
(<
/p>
dt
)
>
p>
DEnames
<-
rownames
p>
(
y
)[
isDE
]
>
hea
d
(
DEnames
)
[
1
]
差异表达基因画图
>
plotSmear
(
qlf
,
=
DEnames
< br>)
>
abline
(
h
=
c
(
-1
,
1
< br>),
col
=
< br>)
DESeq2
包的安装
?
## try http:// if https://
URLs are not supported
>
sour
ce
(
)
安装
:
-
-
-
-
-
-
-
-
-
上一篇:科研用CCD基本参数概念
下一篇:牙体牙髓名解