关键词不能为空

当前您在: 主页 > 英语 >

edgeR-DESeq2分析RNA-seq差异表达

作者:高考题库网
来源:https://www.bjmy2z.cn/gaokao
2021-02-06 01:01
tags:

-

2021年2月6日发(作者:ambiguous)


edgeR


包的安装



edgeR


包是基于



Bioconductor



平台发布的,所以安装不能直接




es()



命令从


CRAN


上来下载



?



安装


:


?




# try http:// if https:// URLs are not supported


>


sour ce


(



)


< /p>


>


biocLite


(

< br>


)




数据导入



由于


edgeR


对测序结果的下游分析是依赖


count

< br>计数来进行基因差异


表达分析的,在这里使用的是


fea tureCounts



来进行统计




.bam




文件中


Map


的结果



?



count


结果如下


:


?



>


lib rary


(


edgeR


)



>


mydata


<-



(



,


header


=TRUE


,


quote


=


't'


,


skip


=


1


)



>


sampleNames


<-


c


(



,



,



,



,



,

< br>


)



>


names


(


mydata


)[


7


:


12


]


<-


sampleNames



>


head


(


mydata< /p>


)




Gene idChrStartEndStrandLengthCA_1CA_2CA_3CC_1CC_2CC_3< /p>



1


gene1314NW_13942 1.1


12571745


+


48900 0000



2


gene1315NW_ 139421.1


21153452


+


1338000000



3


gene1 316NW_139421.1


38564680


+

< p>
825000000



4


gene1317NW_139421.1


48665435


-


570000000



5


gene1318NW_139421.1


60666836


-


771000000



6< /p>


gene1319NW_139421.1


72949483


+


2190000000




?



在这里我们只是需要


Geneid


和后


6


列的样本的


count


信息来组成矩


阵,所以要处理下



>


countMatrix


<-

< p>


(


mydata


[


7


:


12


])



>


rownames


(


countMatrix


)


<-< /p>


mydata


$$


Geneid

< p>


>


head


(


countMatrix


)




CA_1CA_2CA_3CC_1CC_2CC_3



gene1314


000000



gene1315


000000



gene1316


000000



gene1317


000000



gene1318


000000



gene1319


000000




*


要导入的矩阵由

< br>3v3


样本组成


(


三组生物学重 复


)


创建


DEGlist



>


group


<-


factor


(


c


(



,



,



,



,



,



))

< p>


>


y


<-


DGEList


(


counts

=


countMatrix


,


gr oup


=


group


)



>


y




Anobjectofclass




$$


counts



CA_1CA_2CA_3CC_1CC_2CC_3



gene1314


000000



gene1315


000000



gene1316


000000



gene1317


000000



gene1318


000000



14212


morerows...




$$


samples



s



CA_1CA_1


17885371



CA_2CA_2


18255461



CA_3CA_3


19030171



CC_1CC_1


18260421



CC_2CC_2


21244681



CC_3CC_3


20250631




过滤



?



过滤掉那些


count


结果都为< /p>


0


的数据,这些没有表达的基因对结果的分


析没有用,过滤又两点好处


:


1


可以减少内存的压力


2


可以减少计算的压力




>


keep


<-


rowSu ms


(


cpm


(


y


)


>


1


)


>=


2



>< /p>


y


<-


y


[


keep


,,



=F ALSE


]



>


y




Anobjectofclass




$$


counts



CA_1CA_2CA_3CC_1CC_2CC_3



gene1321


8194220



gene1322


231133



gene1323


2



gene1324


62



gene1325


322921587556



3877


morerows...




$$


samples



s



CA_1CA_1


17883621



CA_2CA_2


18253081



CA_3CA_3


19027961



CC_1CC_1


18258891



CC_2CC_2


21241551



CC_3CC_3


20247861





标准化处理



?



edgeR


采用的是


TMM


方法进行标准化处理


,


只有标准化处理后的数 据才


又可比性



>


y


<-


calcNormFac tors


(


y


)



>


y




Anobjectofclass




$$


counts



CA_1CA_2CA_3CC_1CC_2CC_3



gene1321


8194220



gene1322


231133



gene1323


2



gene1324


62



gene1325


322921587556



3877


morerows...




$$


samples



s



CA_1CA_1


17883620.9553769



CA_2CA_ 2


18253080.9052539



CA_3CA_3


19027960.9686232



CC_1CC_1


18258890.9923455



CC_2CC_2


21241551.12751 78



CC_3CC_3


202478 61.0668754




设计矩阵



?



为什么要一个设计矩阵呢,


道理很简单,


有了一个设计矩阵才能够更好的


分组分析




>


subGroup


<-


factor


(


subs tring


(


colnames


(


countMatrix


),


4

< p>
,


4


))



>


design


<-



(


~


subGroup


+


group


)


< br>>


rownames


(


desi gn


)


<-


colnames


(


y


)



>


design




(


Intercept


)


subGroup2subGroup3groupCC

< br>


CA_1


1000



CA_2


1100



CA_3


1010



CC_1


1001



CC_2


1101



CC_3


1011


< br>attr


(,



)



[


1


]

0112



attr


(,



)



attr


(,



)


$$


subGroup



[


1< /p>


]





attr


(,



)


$$


group



[< /p>


1


]





评估离散度



>


y


<-


estimateDisp


(


y


,


des ign


,


robust


=TRUE


)



>


y


$$


sion




[


1


]


0.0268 3622




#plot

< p>
>


plotBCV


(


y< /p>


)





差异表达基因



>

fit


<-


glmQLFit


(< /p>


y


,


design


,


robust


=TRUE


)



>


qlf


<-


glmQLFTest


(


fit

< p>
)



>


topTags< /p>


(


qlf


)


< /p>


Coefficient


:


groupC C



logFClogCPM


F


PValueFDR



gene7024


-5.5156489.612809594.92326.431484e-442.49 6702e-40



gene6612


5.1302828.451143468.20601.557517e-393.023140e-36



gene2743


4.3774925. 586773208.02683.488383e-264.513967e-23



gene12032


4.7343835.098148192 .93784.359649e-254.231040e-22


gene491


-2.73391010.412673190.98396.10 4188e-254.739291e-22



gene894 1


2.9971856.839106177.76146.332836e-244. 097345e-21



gene2611

-2.8469247.216173174.73321.099339e-236.096619e-2 1



gene6242


2.52912 59.897771169.26583.022914e-231.466869e-20


gene7252


3.7323156.1376701 88.20943.890569e-231.678132e-20



gene6125


2.8754236.569935160.31891.6 56083e-226.428914e-20




查看差异表达基因原始的


CMP



>


top


<-


rownames


(


topTags


(


qlf


))



>


cpm


(


y


)[


top


,]



CA_1CA_2CA_3CC_1CC_2CC_3



gene7024


1711.3830021405.8618991 480.12111533.1141837.1604029.62696



gene6612


17.55864912.10384826.585 753403.99298582.457961044.35046



gene2743


4.6823061.8155775.96823062. 9169487.26431114.34156



gene1 2032


1.7558652.4207702.71283265.6764647. 5987275.45617



gene491


2811.1397272059.4696692222.351938444.83381385. 38258253.68087



gene8941

< p>
23.99682024.81288824.415488131.35291244.6741 0225.90560



gene2611

245.821088310.463691225.16505243.0484326.3045539 .81123



gene6242


23 1.188880299.570228298.4115151348.298991343.6198821 91.93237



gene7252


9.36461313.3142325.42566492.71970108.55847181.9280 7



gene6125


23.4115 3214.52461729.841152145.70239160.75005185.16852




查看上调和下调基因的数目



>


summary


(


dt


<-


decideTestsDGE


(


qlf


))



[,


1


]



-1536



02793



1553




挑选出差异表达基因的名字



>


isDE


<-


l


(< /p>


dt


)



>


DEnames


<-


rownames


(


y


)[


isDE


]



>


hea d


(


DEnames


)



[


1


]




差异表达基因画图



>


plotSmear


(


qlf


,



=


DEnames

< br>)



>


abline

< p>
(


h


=


c


(


-1


,


1

< br>),


col


=


< br>)


















DESeq2


包的安装



?




## try http:// if https:// URLs are not supported


>


sour ce


(



)



安装


:

-


-


-


-


-


-


-


-



本文更新与2021-02-06 01:01,由作者提供,不代表本网站立场,转载请注明出处:https://www.bjmy2z.cn/gaokao/604300.html

edgeR-DESeq2分析RNA-seq差异表达的相关文章