NCBI RefSeq Naming Format Explained in Detail
Posted on 23 April 2009 by 柳城
Categorized | Bioinformatics
Tags | NCBI, RefSeq, format
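NCBI RefSeq (the Reference Sequence database of the US National Center for Biotechnology Information) is currently regarded as the most authoritative sequence database in the world. NCBI's Reference Sequence project (RefSeq) provides reference sequence standards for the naturally occurring molecules of the central dogma, from chromosomes to mRNA to proteins. The RefSeq standards provide a foundation for the functional annotation of the human genome. They offer a stable reference point for mutation analysis, gene expression studies, and polymorphism discovery. Because some sequences come from aberrantly spliced transcripts, or from incorrect intron-exon splicing predicted computationally, the reference sequences in the database are continually being revised. Even so, NCBI RefSeq remains the most trusted mRNA sequence database for human genes.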
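General RefSeq naming format: the accession starts with a two-letter prefix followed by an underscore ('_'). This distinguishes RefSeq accessions from other GenBank accession formats.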
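The prefix rule above (two letters, then an underscore) can be checked mechanically. The following is a minimal sketch, not an official NCBI parser: the regular expression, the function name `split_refseq_accession`, and the handling of an optional `.version` suffix are all assumptions made for illustration, since this post only describes the prefix and the underscore.

```python
import re

# Assumption: two uppercase letters, an underscore, then letters/digits.
# The optional ".N" version suffix is also an assumption; this post does
# not describe it.
REFSEQ_PATTERN = re.compile(r"^([A-Z]{2})_([A-Z0-9]+)(?:\.(\d+))?$")


def split_refseq_accession(accession):
    """Return (prefix, body, version) for a RefSeq-style accession.

    Returns None when the string does not follow the
    two-letters-plus-underscore convention (for example, a plain
    GenBank-style accession with no underscore).
    """
    match = REFSEQ_PATTERN.match(accession.strip())
    if match is None:
        return None
    prefix, body, version = match.groups()
    # version is None when no ".N" suffix is present
    return (prefix, body, int(version) if version is not None else None)
```

For example, `split_refseq_accession("NM_123456")` yields the prefix `"NM"` and the body `"123456"`, while a string without the underscore is rejected. Whether a given prefix is actually a defined RefSeq prefix is a separate question that this check does not answer.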
| Accession | Molecule | Method (@) | Note | Summary |
|---|---|---|---|---|
| AC_123456 | Genomic | Mixed | Alternate complete genomic molecule. This prefix is used for records that are provided to reflect an alternate assembly or annotation. Primarily used for viral, prokaryotic records. | Genomic sequences, mainly viral and prokaryotic. |
| AP_123456 | Protein | Mixed | Protein products; alternate protein record. This prefix is used for records that are provided to reflect an alternate assembly or annotation. The AP_ prefix was originally designated for bacterial proteins but this usage was changed. | Protein sequences; AP_ was originally used only for bacterial proteins. |
| NC_123456 | Genomic | Mixed | Complete genomic molecules including genomes, chromosomes, organelles, plasmids. | Complete genome sequences, including organelles, plasmids, etc. |
| NG_123456 | Genomic | Mixed | Incomplete genomic region; supplied to support the NCBI genome annotation pipeline. Represents either non-transcribed pseudogenes, or larger regions representing a gene cluster that is difficult to annotate via automatic methods. | Incomplete genomic sequences. |
| NM_123456, NM_123456789 | mRNA | Mixed | Transcript products; mature messenger RNA (mRNA) transcripts. | Mature mRNA. |
| NP_123456, NP_123456789 | Protein | Mixed | Protein products; primarily full-length precursor products but may include some partial proteins and mature peptide products. | Full-length protein sequences, but may also include partial proteins or mature peptide sequences. |
| NR_123456 | RNA | Mixed | Non-coding transcripts including structural RNAs, transcribed pseudogenes, and others. | Non-coding RNA, pseudogenes, or others. |
| NT_123456 | Genomic | Automated | Intermediate genomic assemblies of BAC and/or Whole Genome Shotgun sequence data. | Genomic sequences obtained by BAC or shotgun sequencing. |
| NW_123456, NW_123456789 | Genomic | Automated | Intermediate genomic assemblies of BAC or Whole Genome Shotgun sequence data. | Genomic sequences obtained by BAC or shotgun sequencing. |
| NZ_ABCD12345678 | Genomic | Automated | A collection of whole genome shotgun sequence data for a project. Accessions are not tracked between releases. The first four characters following the underscore (e.g. 'ABCD') identify a genome project. | 'ABCD' denotes the specific genome project. |
| XM_123456, XM_123456789 | mRNA | Automated | Transcript products; model mRNA provided by a genome annotation process; sequence corresponds to the genomic contig. | Transcript sequences. |
| XP_123456, XP_123456789 | Protein | Automated | Protein products; model proteins provided by a genome annotation process; sequence corresponds to the genomic contig. | Protein sequences. |
| XR_123456 | RNA | Automated | Transcript products; model non-coding transcripts provided by a genome annotation process; sequence corresponds to the genomic contig. | Non-coding transcript sequences. |
| YP_123456, YP_123456789 | Protein | Mixed | Protein products; no corresponding transcript record provided. Primarily used for bacterial, viral, and mitochondrial records. | Protein sequences with no corresponding transcript sequence; used for bacteria, viruses, and mitochondria. |
| ZP_12345678 | Protein | Automated | Protein products; annotated on NZ_ accessions (often via computational methods). | Protein sequences derived from the corresponding nucleotide sequences that begin with NZ_. |
| NS_123456 | Genomic | Automated | Genomic records that represent an assembly which does not reflect the structure of a real biological molecule. The assembly may represent an unordered assembly of unplaced scaffolds, or it may represent an assembly of DNA sequences generated from a biological sample that may not represent a single organism. | A relatively complicated case. |
@ Method:
Mixed: indicates the process flow includes both automated processing and expert review for some of the records; curation analysis may be provided either by NCBI staff or collaborators. In short, these records have been manually checked by experts.
Automated: indicates records that are not individually reviewed; updates are released in bulk for a genome. In short, these records are automatically annotated.
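The prefix table and the Method footnote above lend themselves to a simple lookup. The sketch below transcribes the Molecule and Method columns of this post's table into a dictionary; the names `REFSEQ_PREFIXES` and `describe_accession` are hypothetical, and the mapping reflects the table as given in this 2009 post rather than the current state of the NCBI documentation.

```python
# Prefix -> (molecule type, method), transcribed from the table in this post.
# Assumption: the table is taken at face value; NCBI may have added, changed,
# or retired prefixes since it was written.
REFSEQ_PREFIXES = {
    "AC": ("Genomic", "Mixed"),
    "AP": ("Protein", "Mixed"),
    "NC": ("Genomic", "Mixed"),
    "NG": ("Genomic", "Mixed"),
    "NM": ("mRNA", "Mixed"),
    "NP": ("Protein", "Mixed"),
    "NR": ("RNA", "Mixed"),
    "NT": ("Genomic", "Automated"),
    "NW": ("Genomic", "Automated"),
    "NZ": ("Genomic", "Automated"),
    "XM": ("mRNA", "Automated"),
    "XP": ("Protein", "Automated"),
    "XR": ("RNA", "Automated"),
    "YP": ("Protein", "Mixed"),
    "ZP": ("Protein", "Automated"),
    "NS": ("Genomic", "Automated"),
}


def describe_accession(accession):
    """Return (molecule, method) for a RefSeq-style accession, or None.

    None is returned when there is no underscore, nothing follows the
    underscore, or the prefix is not in the table above.
    """
    prefix, separator, rest = accession.strip().partition("_")
    if not separator or not rest:
        return None
    return REFSEQ_PREFIXES.get(prefix)


def is_expert_reviewed_flow(accession):
    """True when the prefix's method is 'Mixed' (some records expert-reviewed)."""
    info = describe_accession(accession)
    return info is not None and info[1] == "Mixed"
```

Note that "Mixed" only means that the process flow includes expert review for some of the records, so `is_expert_reviewed_flow` says nothing definite about any individual record.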
For the original source of this article, see:
/379/