Nomenclature for the description of sequence variations
J.T. den Dunnen, S.E. Antonarakis: Hum Genet 109(1): 121-124, 2001
Reproduced with kind permission from Prof. S. E. Antonarakis (last modified March 7, 2001)
Questions and comments regarding nomenclature should be directed to Professor Stylianos Antonarakis (rakis@) or Dr. Johan T. den Dunnen (ddunnen@). This page can also be found at the HGVS site.
Contents
• Introduction
• Recommendations
  o General
  o DNA level
  o RNA level
  o protein level
• Codons and encoded amino acids
  o genetic code
  o amino acid descriptions (one/three letter code)
Introduction
Recently, a nomenclature system has been suggested for the description of changes (mutations and polymorphisms) in DNA and protein sequences [Antonarakis, S.E. and the Nomenclature Working Group (1998) Recommendations for a nomenclature system for human gene mutations. Hum. Mutat. 11: 1-3]. These nomenclature recommendations have now been largely accepted and have stimulated the uniform and unequivocal description of sequence changes. However, the current rules do not yet cover all types of mutations, nor do they cover more complex mutations.
This document lists the existing recommendations and summarizes suggestions for the description of additional, more complex changes (shown in italics), based on a manuscript published in Human Mutation [den Dunnen, J.T. and Antonarakis, S.E. (2000) Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum. Mutat. 15: 7-12] (copy in PDF format).
Discussion of the advantages and disadvantages of these suggestions is necessary to continuously improve the designation of sequence changes. The consensus of these discussions will be posted here, and we invite investigators to communicate with us regarding these suggestions. Furthermore, we invite investigators to send us complicated cases not yet covered, with a suggestion of how to describe them (mail to ddunnen@ and rakis@). We hope these pages will be used as a guide to describe any sequence change, ultimately evolving into a uniformly accepted reference for mutation nomenclature.
General recommendations (suggestions extending the current recommendations are in italics)
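The term "variation" is used to prevent confusion with the terms "mutation" and "polymorphism", mutation meaning either "change" or "disease-causing change", and polymorphism meaning a change found at a frequency of 1% or higher in the population.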
The basic recommendation is to use systematic names to describe each sequence variation. For this, variations are described at the most basic level, i.e. the DNA level, using either a genomic or a cDNA reference sequence. A genomic reference sequence is preferred because it resolves difficult cases, including multiple transcription initiation sites (promoters), alternative splicing, the use of different poly-A addition signals, multiple translation initiation sites (ATG codons) and the occurrence of length variations. When, as in most cases, the entire genomic sequence is not known, a cDNA reference sequence should be used instead.
• sequence variations are described in relation to a reference sequence, whose accession number in a primary sequence database (GenBank, EMBL, DDBJ, SWISS-PROT) should be mentioned in the publication or database submission (e.g. M18533)
• tabular listings of the sequence variations described should contain columns for DNA, RNA and protein, and clearly indicate whether the changes were experimentally determined or only theoretically deduced
• to avoid confusion in the description of a sequence change, precede the description with a letter indicating the type of reference sequence used:
  o genomic sequence (e.g. g.76A>T)
  o cDNA sequence (e.g. c.76A>T)
  o mitochondrial sequence (e.g. m.76A>T) (from David Fung, Camperdown, Australia)
  o RNA sequence (e.g. r.76a>u)
  o protein sequence (e.g. p.K76A)
• to discriminate between the different levels (DNA, RNA or protein), each level has its own distinct description format:
  o at DNA level, in capitals, starting with a number referring to the first nucleotide affected (e.g. c.76A>T)
  o at RNA level, in lower case, starting with a number referring to the first nucleotide affected (e.g. r.76a>u)
  o at protein level, in capitals, starting with a letter referring to the first amino acid affected (one-letter code) (e.g. p.T26P)
• a range of affected residues is indicated by a "_" (underscore) separating the first and last residue affected (e.g. 76_78delACT)
  NOTE: current recommendations use the "-" (hyphen) (i.e. 76-78delACT)
• for deletions, duplications or insertions in short tandem repeats, the most 3' nucleotide is arbitrarily assigned as the nucleotide changed
• two sequence variations in one allele are listed between brackets, separated by a "+" (e.g. [76A>C + 83G>C])
  NOTE: current recommendations use ";" as a separator (i.e. [76A>C; 83G>C])
• sequence changes in different alleles (e.g. for recessive diseases) are listed between brackets, separated by a ";"
  NOTE: the current recommendation is [76A>C + 87delG]
• a unique identifier should be assigned to each mutation. The unique OMIM identifier can be used; otherwise, database curators should assign unique identifiers
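The composition rules in the list above (reference-type prefix, underscore ranges, case per level and brackets around changes in one allele) can be illustrated with a few string-building helpers. This is a minimal Python sketch, not an official tool: the names `PREFIXES`, `format_range`, `format_deletion`, `format_substitution` and `same_allele` are hypothetical, and the helpers follow the suggested conventions, with an option for the ";" separator of the current recommendations.

```python
from typing import Optional

# Hypothetical mapping from reference-sequence type to the prefix letter
# listed in the recommendations (g., c., m., r., p.).
PREFIXES = {
    "genomic": "g.",
    "cDNA": "c.",
    "mitochondrial": "m.",
    "RNA": "r.",
    "protein": "p.",
}


def format_range(first: int, last: int) -> str:
    """Join the first and last affected residue with "_" (suggested style)."""
    return str(first) if first == last else f"{first}_{last}"


def format_deletion(ref_type: str, first: int, last: int,
                    deleted: Optional[str] = None) -> str:
    """Build a deletion such as c.76_78delACT; listing the deleted bases is optional."""
    return f"{PREFIXES[ref_type]}{format_range(first, last)}del{deleted or ''}"


def format_substitution(ref_type: str, position: int, ref: str, alt: str) -> str:
    """Build a substitution: c.76A>T (DNA), r.76a>u (RNA) or p.K76A (protein)."""
    if ref_type == "protein":
        # protein level: amino acid letter first, then the position
        return f"p.{ref.upper()}{position}{alt.upper()}"
    if ref_type == "RNA":
        ref, alt = ref.lower(), alt.lower()  # RNA level is written in lower case
    else:
        ref, alt = ref.upper(), alt.upper()  # DNA level is written in capitals
    return f"{PREFIXES[ref_type]}{position}{ref}>{alt}"


def same_allele(*changes: str, current_style: bool = False) -> str:
    """List changes in one allele between brackets.

    The suggested style separates them with " + "; current_style=True uses "; ".
    """
    separator = "; " if current_style else " + "
    return "[" + separator.join(changes) + "]"


print(format_deletion("cDNA", 76, 78, "ACT"))    # c.76_78delACT
print(format_substitution("RNA", 76, "A", "U"))  # r.76a>u
print(same_allele("76A>C", "83G>C"))             # [76A>C + 83G>C]
```

Keeping the separator as a parameter makes it easy to emit either the suggested or the current form of the same description.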
DNA level
• nucleotides are designated by the bases (in upper case): A (adenine), C (cytosine), G (guanine) and T (thymine)
• nucleotide numbering:
  o nucleotide +1 is the A of the ATG translation initiation codon; the nucleotide 5' of +1 is numbered -1; there is no base 0
  o non-coding regions:
    - the nucleotide 5' of the ATG translation initiation codon is -1
    - the nucleotide 3' of the translation termination codon is *1
  o intronic nucleotides:
    - beginning of the intron: the number of the last nucleotide of the preceding exon, a plus sign and the position in the intron, e.g. 77+1G, 77+2T (when the intron number is known, the notation can also be written as IVS1+1G, IVS1+2T)
    - end of the intron: the number of the first nucleotide of the following exon, a minus sign and the position upstream in the intron, e.g. 78-2A, 78-1G (when the intron number is known, the notation can also be written as IVS1-2A, IVS1-1G)
  o for deletions, duplications or insertions in single-nucleotide (or amino acid) stretches or tandem repeats, the most 3' copy is arbitrarily assigned to have been changed (e.g. the change from ACTTTGTGCC to ACTTTGCC is described as 7_8delTG)
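The numbering rules above can be sketched as small conversion helpers, assuming a contiguous cDNA (mRNA) sequence numbered from 1 and known positions of the A of the ATG and of the last nucleotide of the termination codon. The names `c_position`, `intron_position` and `ivs_position`, and the toy coordinates, are hypothetical; the sketch does not handle intronic positions within untranslated regions.

```python
def c_position(tx_pos: int, cds_start: int, cds_end: int) -> str:
    """Convert a 1-based position in a cDNA/mRNA sequence to cDNA numbering.

    cds_start: position of the A of the ATG initiation codon (numbered 1).
    cds_end:   position of the last nucleotide of the termination codon.
    """
    if not 1 <= cds_start <= cds_end:
        raise ValueError("invalid coding-region coordinates")
    if tx_pos < cds_start:
        # 5' of the ATG: -1, -2, ...; there is no nucleotide 0
        return str(tx_pos - cds_start)
    if tx_pos > cds_end:
        # 3' of the termination codon: *1, *2, ...
        return f"*{tx_pos - cds_end}"
    return str(tx_pos - cds_start + 1)


def intron_position(exon_nt: int, offset: int) -> str:
    """Name an intronic nucleotide relative to a flanking exonic nucleotide.

    A positive offset counts from the last nucleotide of the preceding exon
    (e.g. 77+1); a negative offset counts back from the first nucleotide of
    the following exon (e.g. 78-2).
    """
    if offset == 0:
        raise ValueError("offset 0 is the exonic nucleotide itself")
    return f"{exon_nt}{offset:+d}"


def ivs_position(intron_number: int, offset: int) -> str:
    """Alternative IVS form, e.g. IVS1+1 or IVS1-2."""
    if offset == 0:
        raise ValueError("offset 0 is not an intronic nucleotide")
    return f"IVS{intron_number}{offset:+d}"


# Toy transcript (hypothetical): coding region from position 11 to 40.
print([c_position(p, 11, 40) for p in (9, 10, 11, 40, 41)])  # ['-2', '-1', '1', '30', '*1']
print(intron_position(77, 1), intron_position(78, -2), ivs_position(1, -1))  # 77+1 78-2 IVS1-1
```

The `{offset:+d}` format always writes the sign, so the same helper yields both the "+" form at the start of an intron and the "-" form at its end.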
Description of nucleotide changes
• substitutions are designated by a ">" character
  o 76A>C denotes that at nucleotide 76 an A is changed to a C
  o 88+1G>T (alternatively IVS2+1G>T) denotes the G to T substitution at nucleotide +1 of intron 2, the intron located between cDNA nucleotides 88 and 89
  o 89-2A>C (alternatively IVS2-2A>C) denotes the A to C substitution at nucleotide -2 of intron 2, the intron located between cDNA nucleotides 88 and 89
  NOTE: polymorphic variants are sometimes described as 76A/G, but this is not recommended!
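A description written with these conventions can be checked mechanically. The following Python parser is a hypothetical sketch (the class `Substitution`, the pattern `SUBSTITUTION` and the function `parse_substitution` are assumptions, not part of any published tool); it covers only the DNA-level substitution forms shown above, including the intronic and IVS alternatives, and rejects the discouraged "76A/G" style.

```python
import re
from typing import NamedTuple, Optional


class Substitution(NamedTuple):
    prefix: Optional[str]   # "g", "c" or "m" when a reference-type letter is given
    position: str           # e.g. "76", "-14", "*1" or "IVS2"
    offset: Optional[int]   # intronic offset, e.g. +1 or -2; None for exonic positions
    ref: str
    alt: str


# Hypothetical pattern covering only the DNA-level substitution forms above.
SUBSTITUTION = re.compile(
    r"(?:(?P<prefix>[gcm])\.)?"
    r"(?P<pos>IVS\d+|[-*]?\d+)"
    r"(?P<offset>[+-]\d+)?"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])"
)


def parse_substitution(text: str) -> Substitution:
    m = SUBSTITUTION.fullmatch(text)
    if m is None or m["ref"] == m["alt"]:
        # also rejects the discouraged "76A/G" style and no-change "substitutions"
        raise ValueError(f"not a recognised substitution: {text!r}")
    offset = int(m["offset"]) if m["offset"] else None
    return Substitution(m["prefix"], m["pos"], offset, m["ref"], m["alt"])


for text in ("76A>C", "c.76A>T", "88+1G>T", "IVS2+1G>T", "89-2A>C"):
    print(parse_substitution(text))
```

Using `fullmatch` rather than `match` ensures that trailing text (for example "76A>CG") causes a rejection instead of a partial parse.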
• deletions are designated by "del" after the nucleotide(s) flanking the deletion site
  o 76_78del (alternatively 76_78delACT) denotes an ACT deletion from nucleotides 76 to 78
  o 82_83del (alternatively 82_83delTG) denotes a TG deletion in the sequence ACTTTGTGCC (A is nucleotide 76), changing it to ACTTTGCC
  o IVS2_IVS5del (alternatively 88+?_923+? or EX3_5del) denotes a deletion of exons 3 to 5, starting at an unknown position in intron 2 (after nucleotide 88) and ending at an unknown position in intron 5 (after nucleotide 923)
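The 3' rule for deletions in repeats can be applied automatically by sliding the deleted block toward the 3' end as long as the resulting sequence stays the same. The Python sketch below uses hypothetical names (`shift_deletion_3prime`, `describe_deletion`) and assumes a plain sequence numbered consecutively from `first_number`; it does not handle the missing position 0, intronic offsets or uncertain (`?`) boundaries.

```python
from typing import Tuple


def shift_deletion_3prime(seq: str, start: int, end: int) -> Tuple[int, int]:
    """Move a deletion (1-based, inclusive) to its most 3' equivalent position."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("deletion must lie within the sequence")
    # Deleting start..end and start+1..end+1 yields the same sequence exactly
    # when the first deleted base equals the base just after the deleted block.
    while end < len(seq) and seq[start - 1] == seq[end]:
        start += 1
        end += 1
    return start, end


def describe_deletion(seq: str, start: int, end: int, first_number: int = 1) -> str:
    """Describe a deletion using the 3' rule; first_number is the number of seq[0]."""
    start, end = shift_deletion_3prime(seq, start, end)
    deleted = seq[start - 1:end]
    shift = first_number - 1
    if start == end:
        return f"{start + shift}del{deleted}"
    return f"{start + shift}_{end + shift}del{deleted}"


seq = "ACTTTGTGCC"
print(describe_deletion(seq, 5, 6))                   # 7_8delTG
print(describe_deletion(seq, 5, 6, first_number=76))  # 82_83delTG
print(describe_deletion(seq, 3, 3))                   # 5delT
```

A deletion first placed at the 5' copy of TG (positions 5 to 6) is moved to the 3' copy, reproducing the 7_8delTG and 82_83delTG descriptions; a single T deleted from the TTT stretch is likewise assigned to its last T.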