Hammersley–Clifford theorem

The Hammersley–Clifford theorem is a result in probability theory, mathematical statistics and statistical mechanics that gives necessary and sufficient conditions under which a positive probability distribution can be represented as a Markov network (also known as a Markov random field). It states that a probability distribution that has a positive mass or density satisfies one of the Markov properties with respect to an undirected graph G if and only if it is a Gibbs random field, that is, its density can be factorized over the cliques (or complete subgraphs) of the graph.
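The easy direction of the theorem can be checked numerically on a small example. The following sketch (with made-up clique potentials) builds a Gibbs random field on the three-node chain graph A - B - C, factorized over its cliques {A, B} and {B, C}, and verifies the corresponding Markov property, namely that A and C are conditionally independent given B:

```python
import itertools

# A minimal numeric check of the theorem's Markov direction (hypothetical
# clique potentials): a Gibbs random field on the chain graph A - B - C,
# factorized over the cliques {A, B} and {B, C}, must satisfy the Markov
# property that A and C are conditionally independent given B.

phi_ab = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 3.0}
phi_bc = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 4.0, (1, 1): 0.5}

states = list(itertools.product([0, 1], repeat=3))
unnorm = {(a, b, c): phi_ab[(a, b)] * phi_bc[(b, c)] for a, b, c in states}
Z = sum(unnorm.values())
p = {s: w / Z for s, w in unnorm.items()}          # positive Gibbs density

def marginal(fixed):
    """Sum p over all states agreeing with the partial assignment `fixed`."""
    return sum(pr for s, pr in p.items()
               if all(s[i] == v for i, v in fixed.items()))

# Verify p(a, c | b) = p(a | b) * p(c | b) for every configuration.
for a, b, c in states:
    pb = marginal({1: b})
    lhs = p[(a, b, c)] / pb
    rhs = (marginal({0: a, 1: b}) / pb) * (marginal({1: b, 2: c}) / pb)
    assert abs(lhs - rhs) < 1e-12

print("Markov property holds: A independent of C given B")
```

The converse direction (Markov implies Gibbs) is the substantive part of the theorem and requires positivity of the density.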
The relationship between Markov and Gibbs random fields was initiated by Roland Dobrushin[1] and Frank Spitzer[2] in the context of statistical mechanics. The theorem is named after John Hammersley and Peter Clifford, who proved the equivalence in an unpublished paper in 1971.[3][4] Simpler proofs using the inclusion–exclusion principle were given independently by Geoffrey Grimmett,[5] Preston[6] and Sherman[7] in 1973, with a further proof by Julian Besag in 1974.[8]
Notes

1. ^ Dobrushin, P. L. (1968), "The Description of a Random Field by Means of Conditional Probabilities and Conditions of Its Regularity", Theory of Probability and its Applications 13(2): 197–224, doi:10.1137/1113026.
2. ^ Spitzer, Frank (1971), "Markov Random Fields and Gibbs Ensembles", The American Mathematical Monthly 78(2): 142–154, doi:10.2307/2317621, JSTOR 2317621.
3. ^ Hammersley, J. M.; Clifford, P. (1971), Markov fields on finite graphs and lattices, unpublished.
4. ^ Clifford, P. (1990), "Markov random fields in statistics", in Grimmett, G. R.; Welsh, D. J. A., Disorder in Physical Systems: A Volume in Honour of John Hammersley, Oxford University Press, pp. 19–32, MR 1064553, retrieved 2009-05-04.
5. ^ Grimmett, G. R. (1973), "A theorem about random fields", Bulletin of the London Mathematical Society 5(1): 81–84, doi:10.1112/blms/5.1.81, MR 0329039.
6. ^ Preston, C. J. (1973), "Generalized Gibbs states and Markov random fields", Advances in Applied Probability 5(2): 242–261, doi:10.2307/1426035, JSTOR 1426035.
7. ^ Sherman, S. (1973), "Markov random fields and Gibbs random fields", Israel Journal of Mathematics 14(1): 92–103, doi:10.1007/BF02761538, MR 0321185.
8. ^ Besag, J. (1974), "Spatial interaction and the statistical analysis of lattice systems", Journal of the Royal Statistical Society, Series B (Methodological) 36(2): 192–236, JSTOR 2984812.
Further reading

- Bilmes, Jeff (Spring 2006), Handout 2: Hammersley–Clifford, course notes from University of Washington.
- Grimmett, Geoffrey, Probability on Graphs, Chapter 7.
- Helge, The Hammersley–Clifford Theorem and its Impact on Modern Statistics.
The first afternoon of the memorial session for Julian Besag in Bristol was an intense and at times emotional moment, where friends and colleagues of Julian shared memories, and the collection of tributes showed how much of a larger-than-life character he was: from his long-term and wide-ranging impact on statistics, to his very high expectations, both for himself and for others, leading to a total and uncompromising research ethic, to his passion for [extreme] sports and the outdoors. (The stories during and after dinner were of a more personal nature, but at least as enjoyable!) The talks on the second day showed how much and how deeply Julian had contributed to spatial statistics and agricultural experiments, to pseudo-likelihood, to Markov random fields and image analysis, and to MCMC methodology and practice. I hope I did not botch my presentation on the history of MCMC too much, while I found reading through the 1974, 1986 and 1993 Read Papers and their discussions an immensely rewarding experiment (I wish I had done it prior to completing our Statistical Science paper, but that was bound to be incomplete by nature!). Some interesting links made by the audience were the prior publication of proofs of the Hammersley–Clifford theorem in 1973 (by Grimmett, Preston, and Sherman, respectively), as well as the proposal of a Gibbs sampler by Brian Ripley as early as 1977 (even though Hastings did use Gibbs steps in one of his examples). Christophe Andrieu also pointed out to me a very early Monte Carlo review by John Halton in the 1970 SIAM Review, a review that I will read (and comment on) as soon as I can. All in all, I am quite glad I could take part in this memorial, and I am grateful to both Peters for organising it as a fitting tribute to Julian.
Markov Chain Monte Carlo (MCMC) methods are currently a very active field of research. MCMC methods are sampling methods, based on Markov chains which are ergodic with respect to the target probability distribution. The principle of adaptive methods is to optimize on the fly some design parameters of the algorithm with respect to a given criterion reflecting the sampler's performance (optimize the acceptance rate, optimize an importance sampling function, etc.). A postdoctoral position is open to work on the numerical analysis of adaptive MCMC methods: convergence, numerical efficiency, and the development and analysis of new algorithms. A particular emphasis will be given to applications in statistics and molecular dynamics. (Detailed description.) The position is funded by the French National Research Agency (ANR) through a 2009–2012 project; it will benefit from an interdisciplinary environment involving numerical analysts, statisticians and probabilists, and from strong interactions between the partners of the project ANR-08-BLAN-021.

In the most recent issue of
Statistical Science, the special topic includes a survey by [Martin Tanner] and Wing Wong on the emergence of MCMC Bayesian computation in the 1980's. This survey is more focused and more informative than our global history (also to appear in Statistical Science). In particular, it provides the authors' analysis as to why MCMC was delayed by ten years or so (or even more when considering that a Gibbs sampler as a simulation tool appears in both Hastings' (1970) and Besag's (1974) papers). They dismiss [our] concerns about computing power (I was running Monte Carlo simulations on my Apple IIe by 1986, and a single mean square error curve evaluation for a James–Stein type estimator would then take close to a weekend!) and Markov innumeracy, rather attributing the reluctance to a lack of confidence in the methods. This perspective remains debatable as, apart from Tony O'Hagan, who was then fighting against Monte Carlo methods as being un-Bayesian (1987, JRSS D), I do not remember any negative attitude at the time about simulation, and the immediate spread of MCMC methods from Alan Gelfand's and Adrian Smith's presentations of their 1990 paper shows on the contrary that the Bayesian community was ready for the move.
Another interesting point made in this historical survey is that Metropolis' and other Markov chain methods were first presented outside the simulation sections of books like Hammersley and Handscomb (1964), Rubinstein (1981) and Ripley (1987), perpetuating the impression that such methods were mostly optimisation-specific or niche tools. This is also why Besag's earlier works (not mentioned in this survey) did not get wider recognition until later. Something I was not aware of is the appearance of iterative adaptive importance sampling (population Monte Carlo) in the Bayesian literature of the 1980's, with proposals from Herman van Dijk, Adrian Smith, and others. The appendix about Smith et al. (1985), the 1987 special issue of JRSS D, and the computation contents of Valencia 3 (that I sadly missed for being in the Army!) is also quite informative about the perception of computational Bayesian statistics at that time.
A missing connection in this survey is Gilles Celeux and Jean Diebolt's stochastic EM (or SEM). As early as 1981, with Michel Broniatowski, they proposed a simulated version of EM for mixtures where the latent variable z was simulated from its conditional distribution rather than replaced with its expectation. This was the first half of the Gibbs sampler for mixtures we completed with Jean Diebolt about ten years later. (Also found in Gelman and King, 1990.) These authors did not get much recognition from the community, though, as they focused almost exclusively on mixtures, used simulation to produce a randomness that would escape the local mode attraction rather than targeting the posterior distribution, and did not analyse the Markovian nature of their algorithm until later, with the simulated annealing EM algorithm.
Probabilistic graphical models divide into directed and undirected models. Directed graphical models mainly include Bayesian networks and hidden Markov models; undirected graphical models mainly include Markov random field (MRF) models and conditional random field (CRF) models.

In 2001, Professor Lafferty of Carnegie Mellon University and co-authors (John Lafferty, Andrew McCallum, Fernando Pereira) proposed the CRF model for sequence data processing (Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data). This model directly models the posterior probability, which nicely solves the MRF model's problems of requiring complex likelihood modelling when multiple features are used and of being unable to exploit context information in the observed image. In 2003, Dr. Kumar extended the CRF model to 2-D lattice structures and began introducing it into image analysis, attracting considerable attention from the research community.
Given an observed image, estimate the corresponding label image: y is the observed image, x the unknown label image.

1. If one models the posterior probability directly (i.e., the first term in the formula), one obtains a discriminative (Discriminative) probabilistic framework. In particular, if the posterior is modelled directly by a Gibbs distribution, (x, y) is called a CRF, and the resulting model is the discriminative CRF model.

2. By modelling (x, y) jointly (i.e., the second term in the formula), one obtains a joint probabilistic framework. In particular, if the pair of random fields (x, y) is Markovian, i.e., the second term of the formula is a Gibbs distribution, then (x, y) is called a pairwise MRF (PMRF) [9].

3. If the posterior is modelled through p(x) and p(y|x) as in the formula, where p(y|x) is the model generating the observed image, the framework is called a generative (Generative) probabilistic framework. In particular, if the prior p(x) follows a Gibbs distribution, x is called an MRF [12], and the resulting model is the generative MRF model.
-- [面向图像标记的随机场模型研究 / Research on random field models for image labeling] By the Hammersley–Clifford theorem, the posterior probability of the label field follows a Gibbs distribution, where z(y, θ) is the normalizing function and φc is the potential function with parameters θ defined on clique c.
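The Gibbs posterior can be made concrete on a toy problem. The sketch below (hypothetical potentials and observations, not taken from the cited work) writes the clique potentials in energy form, p(x | y) = exp(-E(x, y; θ)) / z(y, θ), on a tiny 1-D "image" of three binary labels, computes the normalizing function z(y, θ) by exhaustive summation, and reads off the MAP labeling:

```python
import itertools
import math

# A toy sketch of the Gibbs posterior above (hypothetical potentials, written
# in energy form): p(x | y) = exp(-E(x, y; theta)) / z(y, theta), where E sums
# clique potentials phi_c. Sites form a 1-D chain of 3 binary labels.

y = [0.2, 0.9, 0.8]   # observed image (one value per site)
theta = 2.0           # potential-function parameter

def unary(xi, yi):
    """Single-site clique potential: penalize labels far from the data."""
    return theta * (xi - yi) ** 2

def pairwise(xi, xj):
    """Edge clique potential: penalize differing neighboring labels."""
    return 0.5 * abs(xi - xj)

def energy(x):
    return (sum(unary(xi, yi) for xi, yi in zip(x, y)) +
            sum(pairwise(x[i], x[i + 1]) for i in range(len(x) - 1)))

labelings = list(itertools.product([0, 1], repeat=3))
z = sum(math.exp(-energy(x)) for x in labelings)           # z(y, theta)
posterior = {x: math.exp(-energy(x)) / z for x in labelings}

best = max(posterior, key=posterior.get)                   # MAP labeling
print(best)   # the labeling with the highest posterior probability
```

With these hand-set numbers the MAP labeling follows the two large observations while the pairwise term discourages label changes; for real images the sum defining z is intractable and is handled by approximate inference.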
A key problem in CRF models is defining suitable potential functions. Developing extended CRF models of different forms is therefore a major current research direction. The specific technical routes are: first, extending the potential functions — by introducing more complex potentials, more use can be made of multiple features and context information; second, extending the model structure — by introducing more complex model structures, higher-level and more varied forms of context information can be used.

Extended potential functions:
(1) Logistic Regression (LR)
(2) Support Vector Machine (SVM)
(3) kernel functions
(4) Boost
(5) Probit

Extended model structures:
(1) Dynamic CRF models. The dynamic CRF (Dynamic CRF, DCRF) model performs multiple labeling tasks simultaneously on given observation data, so as to fully exploit the correlations between different types of labels.
(2) Hidden CRF models. Another class of extended graph structures introduces a transitional hidden-variable layer h between the observed image and the label image; the resulting model is the hidden CRF (Hidden Conditional Random Field, HCRF). The hidden layer gives the CRF model richer expressive power and allows substructures to be modelled. The hidden variables can be abstract or can have a clear physical meaning.
(3) Tree-structured CRF models. In the standard CRF graph structure, the correlations between labels are represented by the edges (edge) of the lattice.
(4) Mixture CRF models.
Assume finite history and stationarity: finite history means that each state depends only on a finite history; stationarity means that the relation between two states does not depend on time. Given an observation sequence {O1, O2, O3, ...}, each observation Oi corresponds to a hidden state sequence {S1, ...}.
An HMM solves three problems:
1. Compute the probability of the observation sequence: the forward algorithm suffices.
2. Given the observation sequence, compute the hidden state sequence with the largest probability: the Viterbi algorithm, with O(N*N*T) complexity.
3. Given the observation sequence and the state set, estimate the parameters A (state transition matrix) and B (emission probabilities): the EM algorithm, here the forward-backward algorithm.
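Problem 1 can be illustrated with a toy model (all probabilities hypothetical): the forward algorithm computes P(O) by propagating the forward variables alpha_t(s) through the chain.

```python
# A toy sketch of problem 1 (hypothetical probabilities): the forward
# algorithm computes P(O) for a 2-state, 2-symbol HMM.

start = [0.6, 0.4]                # initial state distribution
A = [[0.7, 0.3], [0.4, 0.6]]      # state transition matrix A[s][s']
B = [[0.9, 0.1], [0.2, 0.8]]      # emission probabilities B[s][o]

def forward(obs):
    """Return P(obs) by propagating the forward variables alpha_t(s)."""
    alpha = [start[s] * B[s][obs[0]] for s in range(2)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * A[r][s] for r in range(2)) * B[s][o]
                 for s in range(2)]
    return sum(alpha)

print(forward([0, 1, 0]))   # probability of observing the sequence 0, 1, 0
```

As a sanity check, summing forward(·) over all length-3 observation sequences gives exactly 1.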
Problem 2 resembles a sequence labeling problem:

Pr(O|S) = p(O1|S1) * p(O2|S2) * ... * p(On|Sn)
P(S) = p(S1|start) * p(S2|S1) * ... * p(Sn|Sn-1)
argmax_S P(S|O) = argmax_S Pr(O|S) * P(S) = argmax_S (... * p(Oi|Si) * p(Si|Si-1) * ...)
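The argmax above is exactly what the Viterbi algorithm computes, replacing the sums of the forward algorithm with maximizations and keeping backpointers. A toy sketch with hypothetical parameters:

```python
# A toy sketch of the argmax above (hypothetical parameters): the Viterbi
# algorithm maximizes prod_i p(O_i|S_i) * p(S_i|S_i-1) over state sequences
# in O(N*N*T) time, keeping backpointers to recover the best path.

start = [0.6, 0.4]                # p(S_1 | start)
A = [[0.7, 0.3], [0.4, 0.6]]      # p(S_i | S_i-1)
B = [[0.9, 0.1], [0.2, 0.8]]      # p(O_i | S_i)

def viterbi(obs):
    n = len(start)
    delta = [start[s] * B[s][obs[0]] for s in range(n)]   # best path scores
    psi = []                                              # backpointers
    for o in obs[1:]:
        prev = delta
        step = [max(range(n), key=lambda r: prev[r] * A[r][s])
                for s in range(n)]
        delta = [prev[step[s]] * A[step[s]][s] * B[s][o] for s in range(n)]
        psi.append(step)
    best = max(range(n), key=lambda s: delta[s])
    path = [best]
    for step in reversed(psi):                            # backtrack
        path.append(step[path[-1]])
    return list(reversed(path))

print(viterbi([0, 0, 1]))   # most probable hidden state sequence
```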
ME (maximum entropy): a classifier that classifies a given observation O. ME needs to extract the relevant features from O and to compute the corresponding weights w. Note: it mainly solves the problem of classifying the observation O, such as text classification, i.e., P(C=c|O).
MEMM (maximum-entropy Markov model): for sequence labeling problems, it combines ME and HMM, provides more features, and outperforms HMM by taking into account the observations near time t and their influence on the state.
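The ME classifier described above can be sketched as follows (features and weights here are hypothetical and hand-set rather than learned): binary (word, class) indicator features extracted from O are scored with weights w, and P(C=c|O) is the normalized exponential score.

```python
import math

# A minimal sketch of the ME idea above (features and weights are hypothetical
# and hand-set, not learned): binary (word, class) indicator features are
# scored with weights w, and P(C = c | O) is the normalized exponential score.

classes = ["sports", "politics"]
w = {("ball", "sports"): 1.5, ("vote", "politics"): 2.0,
     ("win", "sports"): 0.5, ("win", "politics"): 0.3}

def p_class(obs_words):
    """Return the maximum-entropy distribution P(C = c | O) over classes."""
    scores = {c: math.exp(sum(w.get((word, c), 0.0) for word in obs_words))
              for c in classes}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

dist = p_class(["ball", "win"])
print(max(dist, key=dist.get))   # class with the highest posterior
```

A MEMM chains one such conditional model per time step, conditioning each state on the previous state and on features of the nearby observations.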