NCBI Common Terms (Abbreviations) Explained
3-D or 3D
Three-dimensional.
Accession number
An Accession number is a unique identifier given to a sequence when it is submitted to one of the DNA repositories (GenBank, EMBL, DDBJ). The initial deposition of a sequence record is referred to as version 1. If the sequence is updated, the version number is incremented, but the Accession number will remain constant.
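
The relationship between a constant accession and an incrementing version can be sketched in a few lines of Python. This is a minimal illustration, assuming the common dotted "accession.version" notation; the accession `AB000001` and the helper name `split_accession` are made up for the example and do not refer to a real record or library.

```python
from typing import Optional, Tuple


def split_accession(identifier: str) -> Tuple[str, Optional[int]]:
    """Split 'ACCESSION.VERSION' into (accession, version).

    A bare accession with no '.VERSION' suffix yields version None.
    """
    accession, dot, version = identifier.partition(".")
    return accession, (int(version) if dot else None)


# "AB000001" is a hypothetical accession used only for illustration.
original = split_accession("AB000001.1")
revised = split_accession("AB000001.2")

# The accession stays constant while the version is incremented.
print(original, revised)
```

Comparing the first element of each tuple shows whether two identifiers refer to the same record; comparing the second shows which revision is newer.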
allele
One of the variant forms of a gene at a particular locus on a chromosome. Different alleles produce variation in inherited characteristics such as hair color or blood type. In an individual, one form of the allele (the dominant one) may be expressed more than another form (the recessive one). When "genes" are considered simply as segments of a nucleotide sequence, allele refers to each of the possible alternative nucleotides at a specific position in the sequence. For example, a C/T polymorphism such as CCT[C/T]CCAT would have two alleles: C and T.
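
As a small illustration of the bracket notation in the example above, the following Python sketch expands a pattern such as CCT[C/T]CCAT into one sequence per allele. The function name `expand_alleles` is hypothetical, and the sketch assumes exactly one bracketed site per pattern.

```python
import re


def expand_alleles(pattern: str) -> dict:
    """Map each allele at the bracketed site to its full sequence."""
    match = re.search(r"\[([A-Za-z/]+)\]", pattern)
    if match is None:
        raise ValueError("no [X/Y] polymorphic site found")
    prefix = pattern[:match.start()]
    suffix = pattern[match.end():]
    # Substitute each alternative nucleotide at the polymorphic position.
    return {allele: prefix + allele + suffix
            for allele in match.group(1).split("/")}


alleles = expand_alleles("CCT[C/T]CCAT")
print(alleles)
```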
API
Application Programming Interface. An API is a set of routines that an application uses to request and carry out lower-level services performed by a computer's operating system. For computers running a graphical user interface, an API manages an application's windows, icons, menus, and dialog boxes.
ASN.1
Abstract Syntax Notation 1 is an international standard data-representation format used to achieve interoperability between computer platforms. It allows for the reliable exchange of data in terms of structure and content by computer and software systems of all types.
BAC
Bacterial Artificial Chromosome. A BAC is a large segment of DNA (100,000–200,000 bp) from another species cloned into bacteria. Once the foreign DNA has been cloned into the host bacteria, many copies of it can be made.
bit score
The value S′ is derived from the raw alignment score S in which the statistical properties of the scoring system used have been taken into account. By normalizing a raw score using the formula $S' = (\lambda S - \ln K) / \ln 2$, a "bit score" S′ is attained, which has a standard set of units, and where K and lambda (λ) are the statistical parameters of the scoring system. Because bit scores have been normalized with respect to the scoring system, they can be used to compare alignment scores from different searches.
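
The normalization can be written as a short function. This sketch assumes the standard form of the bit-score formula, S′ = (λS − ln K) / ln 2; the numeric values of λ and K below are illustrative placeholders, since the real values depend on the scoring matrix and gap penalties used in a given search.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw alignment score S into a bit score S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Illustrative parameter values only (assumed, not taken from the text).
LAMBDA = 0.267
K = 0.041

for raw in (50, 100, 200):
    print(raw, round(bit_score(raw, LAMBDA, K), 1))
```

Because λ and K are folded into the result, two bit scores computed under different scoring systems are on the same scale, which is what makes cross-search comparison possible.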
BLAST
Basic Local Alignment Search Tool (Altschul et al., J Mol Biol 215:403-410; 1990). A sequence comparison algorithm that is optimized for speed and used to search sequence databases for optimal local alignments to a query. See the BLAST chapter (Chapter 15) or the tutorial or the narrative guide to BLAST.
blastn
nucleotide–nucleotide BLAST. blastn takes nucleotide sequences in FASTA format, GenBank Accession numbers, or GI numbers and compares them against the NCBI Nucleotide databases.
blastp
protein–protein BLAST. blastp takes protein sequences in FASTA format, GenBank Accession numbers, or GI numbers and compares them against the NCBI Protein databases.
BLAT
A DNA/Protein sequence analysis program to quickly find sequences of 95% and greater similarity of length 40 bases or more. It may miss more divergent or shorter sequence alignments. BLAT on proteins finds sequences of 80% and greater similarity of length 20 amino acids or more. BLAT is not BLAST. (See the BLAT web page.)
BLOB
Binary Large Object (or binary data object). BLOB refers to a large piece of data, such as a bitmap. A BLOB is characterized by large field values, an unpredictable table size, and data that are formless from the perspective of a program. It is also a keyword designating the BLOB structure, which contains information about a block of data.
build
A run of the genome assembly and annotation process, or the set of products generated by that run.
CCAP
Cancer Chromosome Aberration Project. CCAP was designed to expedite the definition and detailed characterization of the distinct chromosomal alterations that are associated with malignant transformation. The project is a collaboration among the NCI, the NCBI, and numerous research labs.
CD
Conserved Domain. CD refers to a domain (a distinct functional and/or structural unit of a protein) that has been conserved during evolution. During evolution, changes at specific positions of an amino acid sequence in the protein have occurred in a way that preserves the physico-chemical properties of the original residues, and hence the structural and/or functional properties of that region of the protein.
CDART
Conserved Domain Architecture Retrieval Tool. When given a protein query sequence, CDART displays the functional domains that make up the protein and lists proteins with similar domain architectures. The functional domains for a sequence are found by comparing the protein sequence to a database of conserved domain alignments, CDD, using RPS-BLAST.
CDD
Conserved Domain Database. This database is a collection of sequence alignments and profiles representing protein domains conserved during molecular evolution.
cDNA
complementary DNA. A DNA sequence obtained by reverse transcription of a messenger RNA (mRNA) sequence.
CDS
coding region, coding sequence. CDS refers to the portion of a genomic DNA sequence that is translated, from the start codon to the stop codon, inclusively, if complete. A partial CDS lacks part of the complete CDS (it may lack either or both the start and stop codons). Successful translation of a CDS results in the synthesis of a protein.
CEPH
Centre d'Etude du Polymorphisme Humain
CGAP
Cancer Genome Anatomy Project. CGAP is an interdisciplinary program to identify the human genes expressed in different cancerous states, based on cDNA (EST) libraries, and to determine the molecular profiles of normal, precancerous, and malignant cells. The project is a collaboration among the NCI, the NCBI, and numerous research labs.
CGH
Comparative Genomic Hybridization. CGH is a fluorescent molecular cytogenetic technique that identifies chromosomal aberrations and maps these changes to metaphase chromosomes. CGH can be used to generate a map of DNA copy number changes in tumor genomes. CGH is based on quantitative two-color fluorescence in situ hybridization (FISH). DNA extracted from tumor cells is labeled in one color (e.g., green) and mixed in a 1:1 ratio with DNA from normal cells, which is labeled in a different color (e.g., red). The mixture is then applied to normal metaphase chromosomes. Portions of the genome that are equally represented in normal and tumor cells will appear orange, regions that are deleted in the tumor sample relative to the normal sample will appear red, and regions that are present in higher copy number in the tumor sample (because of amplification) will appear green. Special image analysis tools are necessary to quantitate the ratio of green-to-red fluorescence to determine whether a given region is more highly represented in the normal or in the tumor sample.
CGI
Common Gateway Interface. A mechanism that allows a Web server to run a program or script on the server and send the output to a Web browser.
cluster
A group that is created based on certain criteria. For example, a gene cluster may include a set of genes whose expression profiles are found to be similar according to certain criteria, or a cluster may refer to a group of clones that are related to each other by homology.
Cn3D
"See in 3-D" is a structure and sequence alignment viewer for NCBI databases. It allows viewing of 3-D structures and sequence–structure or structure–structure alignments. Cn3D can work as a helper application to the browser or as a client–server application that retrieves structure records from the Molecular Modeling Database (MMDB, see below) directly from the internet. The Cn3D homepage provides access to information on how to install the program, a tutorial to get started, and a comprehensive help document.
codon
Sequence of three nucleotides in DNA or mRNA that specifies a particular amino acid during protein synthesis; also called a triplet. Of the 64 possible codons, 3 are stop codons, which do not specify amino acids.
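
The count of 64 codons follows from four nucleotides taken three at a time (4 × 4 × 4). The sketch below enumerates them and separates out the three stop codons; it assumes the standard genetic code, in which the DNA-sense stop codons are TAA, TAG, and TGA.

```python
from itertools import product

# Stop codons of the standard genetic code, written as DNA (an assumption:
# some organisms and organelles use variant codes).
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Every ordered triplet over the four nucleotides.
all_codons = ["".join(triplet) for triplet in product("ACGT", repeat=3)]
sense_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

print(len(all_codons), len(sense_codons))
```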
COGs
Clusters of Orthologous Groups (of proteins) were delineated by comparing protein sequences from completely sequenced genomes. Each COG consists of individual proteins or groups of paralogs from at least three lineages and thus corresponds to an ancient conserved domain.
consensus sequence
The nucleotides or amino acids found most commonly at each position in the sequences of homologous DNAs, RNAs, or proteins.
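
A consensus can be computed column by column from sequences that are already aligned. The following is a minimal sketch under that assumption (equal-length, gap-free input); the function name `consensus` and the three toy sequences are invented for the example, and ties between equally common residues are not handled in any principled way.

```python
from collections import Counter


def consensus(aligned):
    """Return the most common residue at each column of the alignment."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(
        Counter(column).most_common(1)[0][0]  # most frequent residue
        for column in zip(*aligned)
    )


# Toy aligned sequences, made up for illustration.
result = consensus(["ACGT", "ACGA", "ACTT"])
print(result)
```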
contig
A contiguous segment of the genome made by joining overlapping clones or sequences. A clone contig consists of a group of cloned (copied) pieces of DNA representing overlapping regions of a particular chromosome. A sequence contig is an extended sequence created by merging primary sequences that overlap. A contig map shows the regions of a chromosome where contiguous DNA segments overlap. Contig maps provide the ability to study a complete and often large segment of the genome by examining a series of overlapping clones, which then provide an unbroken succession of information about that region.
Coriell
Coriell Institute of Aging Cell Repository
CPU
Central Processing Unit. The CPU is the computational and control unit of a computer, the device that interprets and executes instructions.
CSS
Cascading Style Sheets. CSS specify the formatting details that control the presentation and layout of HTML and XML elements. CSS can be used for describing the formatting behavior and text decoration of simply structured XML documents but cannot display structure that varies from the structure of the source data.
Cubby
A tool of Entrez, the Cubby stores search strategies that may be updated at any time, stores LinkOut preferences to specify which LinkOut providers have to be displayed in PubMed, and changes the default document delivery service.
DCMS
Data Creation and Maintenance System
DDBJ
DNA Data Bank of Japan
definition line
A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The definition line or description line is distinguished from the sequence data by a "greater than" (>) symbol in the first column (see example); also DEFLINE, as in a flatfile.
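
The layout described above (a ">" line followed by sequence lines) is simple enough to parse by hand. The sketch below is a minimal reader written for illustration, not the parser of any particular library; the record names and sequences in `example` are made up.

```python
def parse_fasta(text):
    """Return a list of (definition_line, sequence) tuples."""
    records = []
    defline, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new definition line closes the previous record, if any.
            if defline is not None:
                records.append((defline, "".join(chunks)))
            defline, chunks = line[1:], []
        elif defline is not None:
            chunks.append(line)
    if defline is not None:
        records.append((defline, "".join(chunks)))
    return records


# Hypothetical two-record input.
example = (
    ">seq1 hypothetical example record\n"
    "ACGTACGT\n"
    "TTGA\n"
    ">seq2 another made-up record\n"
    "GGCC\n"
)
records = parse_fasta(example)
print(records)
```

Sequence lines are concatenated, so line wrapping in the file does not affect the parsed sequence.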
DNA
Deoxyribonucleic acid is the chemical inside the nucleus of a cell that carries the genetic instructions for making living organisms. DNA is composed of two anti-parallel strands, each a linear polymer of nucleotides. Each nucleotide has a phosphate group linked by a phosphoester bond to a pentose (a five-carbon sugar molecule, deoxyribose), that in turn is linked to one of four organic bases, adenine, guanine, cytosine, or thymine, abbreviated A, G, C, and T, respectively. The bases are of two types: purines, which have two rings and are slightly larger (A and G); and pyrimidines, which have only one ring (C and T). Each nucleotide is joined to the next nucleotide in the chain by a covalent phosphodiester bond between the 5′ carbon of one deoxyribose group and the 3′ carbon of the next. DNA is a helical molecule with the sugar–phosphate backbone on the outside and the nucleotides extending toward the central axis. There is specific base-pairing between the bases on opposite strands in such a way that A always pairs with T and G always pairs with C.
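
The pairing rules (A with T, G with C) together with the anti-parallel orientation of the two strands mean that one strand determines the other. A minimal sketch of that relationship, using a hypothetical helper named `reverse_complement` and assuming the input contains only the letters A, C, G, and T:

```python
# Translation table encoding the pairing rules A<->T and G<->C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    # Complement each base, then reverse because the strands are anti-parallel.
    return sequence.upper().translate(COMPLEMENT)[::-1]


strand = "ATGC"
partner = reverse_complement(strand)
print(strand, partner)
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.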
domain
A "domain" refers to a discrete portion of a protein assumed to fold independently of the rest of the protein and which possesses its own function.
draft sequence
Draft sequence refers to DNA sequence that is not yet finished but is generally of high quality (i.e., an accuracy of greater than 90%). Draft sequence data are mostly in the form of 10,000 base pair-sized fragments, the approximate chromosomal locations of which are known. The following keywords are associated with draft sequence: phase 0, light-pass coverage of a clone, generally only 1× coverage; phase 1, 4–10× coverage of a BAC clone (order and orientation of the fragments are unknown); and phase 2, 4–10× coverage of a BAC clone (order and orientation of the fragments are known). Phase 3 refers to the completely finished sequence.
DTD
Document Type Definition. The DTD is an optional part of the prolog of an XML document that defines the rules of the document. It sets constraints for an XML document by specifying which elements are present in the document and the relationships between elements, e.g., which tags can contain other tags, the number and sequence of the tags, and attributes of the tags. The DTD helps to validate the data when the receiving application does not have a built-in description of the incoming data.
DUST
A program for filtering low-complexity regions from nucleic acid sequences.
E-value
Expect value. The E-value is a parameter that describes the number of hits one can "expect" to see by chance when searching a database of a particular size. It decreases exponentially with the score (S) that is assigned to a match between two sequences. Essentially, the E-value describes the random background noise that exists for matches between sequences. For example, an E-value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size, one might expect to see one match with a similar score simply by chance. This means that the lower the E-value, or the closer it is to "0", the higher is the "significance" of the match. However, it is important to note that matches to short query sequences can be virtually identical and still have relatively high E-values. This is because the calculation of the E-value also takes into account the length of the query sequence, and shorter sequences have a high probability of occurring in the database purely by chance. For more information, see the following tutorial.
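
The entry gives no formula for the E-value. A commonly cited form from Karlin-Altschul statistics is E = m × n × 2^(−S′), where S′ is the bit score, m the query length, and n the database length; that relation is brought in here as an assumption to make the exponential dependence concrete, and the lengths used below are arbitrary illustrative numbers.

```python
def expect_value(bit_score, query_len, db_len):
    """Expected number of chance hits scoring at least `bit_score`.

    Uses E = m * n * 2**(-S'), an assumed standard form not stated
    in the glossary entry itself.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Arbitrary illustrative sizes: a 100-residue query, a 10^8-residue database.
for score in (30, 40, 50):
    print(score, expect_value(score, 100, 10**8))
```

Each additional bit of score halves the E-value, while doubling the database size doubles it, which matches the entry's point that the E-value depends on the size of the search space as well as on the score.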
EC number
A number assigned to a type of enzyme according to a scheme of standardized enzyme nomenclature developed by the Enzyme Commission of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB). EC numbers may be found in ENZYME, the Enzyme nomenclature database, maintained at the ExPASy molecular biology server.
EMBL
European Molecular Biology Laboratory
Entrez
Entrez is a retrieval system for searching several linked databases. It provides access to the following NCBI databases: PubMed, GenBank, Protein, Structure, Genome, PopSet, OMIM, Taxonomy, Books, ProbeSet, 3D Domains, UniSTS, SNP, and CDD. (See the Entrez chapter or the Entrez web page.)
Entrez Gene
(Formerly known as LocusLink.) Entrez Gene provides tracked, unique identifiers for genes (GeneIDs) and reports information associated with those identifiers for unrestricted public use. (See the Entrez Gene chapter or web page.)
EST
Expressed Sequence Tag. ESTs are short (usually approximately 300–500 base pairs), single-pass sequence reads from cDNA. Typically, they are produced in large batches. They represent the genes expressed in a given tissue and/or at a given developmental stage. They are tags (some coding, others not) of expression for a given cDNA library. They are useful in identifying full-length genes and in mapping.
e-PCR
Electronic PCR is used to compare a query sequence to mapped sequence-tagged sites (STSs) to find a possible map location for the query sequence. e-PCR finds STSs in DNA sequences by searching for subsequences that closely match the PCR primers present in mapped markers. The subsequences must have the correct order, orientation, and spacing such that they could plausibly prime the amplification of a PCR product of the correct molecular weight.
epub citation
"Ahead-of-print" citation. PubMed now accepts citations from publishers for articles that have been published electronically ahead of the printed issue. PubMed displays the category "[epub ahead of print]" in the part of the citation where the volume and pagination would ordinarily display. For example: Proc Natl Acad Sci U S A. 2000 May 2 [epub ahead of print].
ExoFish
Exon Finding by Sequence Homology. Exofish is a tool based on homology searches for the rapid and reliable identification of human genes. It relies on the sequence of another vertebrate, the pufferfish Tetraodon nigroviridis (similar to Fugu), to detect conserved sequences with a very low background. The genome of T. nigroviridis is eight times more compact than the human genome and has been used in the comparative identification of human genes from the rough draft of the human genome (Roest Crollius et al., Nat Genet 25:235-238; 2000).
exon
Refers to the portion of a gene that codes for a part of that gene's mRNA. A gene may comprise many exons, some of which may include only protein-coding sequence; however, an exon may also include 5' or 3' untranslated sequence. Each exon codes for a specific portion of the complete protein. In some species (including humans), a gene's exons are separated by long regions of DNA (called introns or sometimes "junk DNA") that often have no apparent function but have been shown to encode small untranslated RNAs or regulatory information. (See also splice sites.)
exon-trapped
Exon trapping is a technique for cloning exon sequences from genomic DNA by selecting for functional splice sites, relying on the cellular splicing machinery. The genomic DNA containing the putative exon(s) is cloned into an exon-trap vector, which has a promoter, polyadenylation signals, and splice sites, and then transfected into a cell line. If there are functional splice sites in the genomic DNA fragment, the segments of DNA between the splice sites will be removed. Total RNA is isolated and reverse-transcribed. After cDNA synthesis and PCR amplification, the exon of interest is cloned.
ExPASy
Expert Protein Analysis System is a proteomics server of the Swiss Institute of Bioinformatics (SIB).
FASTA
The first widely used algorithm for similarity searching of protein and DNA sequence databases. The program looks for optimal local alignments by scanning the sequence for small matches called "words". Initially, the scores of segments in which there are multiple word hits are calculated ("init1"). Later, the scores of several segments may be summed to generate an "initn" score. An optimized alignment that includes gaps is shown in the output as "opt". The sensitivity and speed of the search are inversely related and controlled by the "k-tup" variable, which specifies the size of a "word" (Pearson and Lipman). Also refers to a format for a nucleic acid or protein sequence.
fingerprint
The pattern of bands on a gel produced by a clone when restricted by a particular enzyme, such as HindIII.
finished sequence
High-quality, low-error DNA sequence that is free of gaps. To qualify as a finished sequence, only a single error out of every 10,000 bases (i.e., an accuracy of 99.99%) is allowed.
FISH
Fluorescence in situ hybridization. In this technique, fluorescent molecules are used to label a DNA probe, which can then hybridize to a specific DNA sequence in a chromosome spread so that the site becomes visible through a microscope. FISH has been used to highlight the locations of genes, subchromosome regions, entire chromosomes, or specific DNA sequences. It has been used for mapping and the detection of genomic rearrangements, as well as studies on DNA replication.
flatfile or flat file
A flat file is a data file that contains records (each corresponding to a row in a table); however, these records have no structured relationships. To interpret these files, the format properties of the file should be known. For example, a database management system may allow the user to export data to a comma-delimited file. Such a file is called a flat file because it has no inherent information about the data, and interpretation requires additional information. Files in a database management system have more complex storage structures.
freeze
To copy changing data so as to preserve the dataset as it existed at a particular point in time. Also used to refer to the resulting set of frozen data.
FTP
File Transfer Protocol. A method of retrieving files over a network directly to the user's computer or to his/her home directory using a set of protocols that govern how the data are to be transported.
gap
A gap is a space introduced into an alignment to compensate for insertions and deletions in one sequence relative to another. To prevent the accumulation of too many gaps in an alignment, introduction of a gap causes the deduction of a fixed amount (the gap score) from the alignment score. Extension of the gap to encompass additional nucleotides or amino acids is also penalized in the scoring of an alignment. (See the figure for more information.)
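
The two-part penalty described above (a fixed deduction for opening a gap, plus a further deduction for each additional position) is often called an affine gap penalty. The sketch below shows the arithmetic; the values 5 and 2 are arbitrary placeholders, and some programs charge the extension cost for the first position as well, so this is one convention among several.

```python
def gap_penalty(length, gap_open=5, gap_extend=2):
    """Total deduction for a single gap spanning `length` positions.

    The first position costs `gap_open`; every additional position
    costs `gap_extend`. Both default values are illustrative only.
    """
    if length <= 0:
        return 0
    return gap_open + gap_extend * (length - 1)


one_long_gap = gap_penalty(3)
three_short_gaps = 3 * gap_penalty(1)
print(one_long_gap, three_short_gaps)
```

With these placeholder values a single three-position gap costs less than three separate one-position gaps, which is how the scoring discourages an alignment from accumulating many scattered gaps.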
GB
gigabytes
GBFF
GenBank Flat File. Refers to the .gbff file format.
GenBank
GenBank is a database of nucleotide sequences from more than 100,000 organisms. Records that are annotated with coding region features also include amino acid translations. GenBank belongs to an international collaboration of sequence databases that also includes EMBL and DDBJ. [See the GenBank chapter (Chapter 1) or the GenBank web page.]
GeneID
GeneID is a unique identifier that is assigned to a gene record in Entrez Gene. It is an integer and is species specific. In other words, the integer assigned to dystrophin in human is different from that in any other species. For genomes that had been represented in LocusLink, the GeneID is the same as the LocusID. The GeneID is reported in RefSeq records as a 'db_xref' (e.g., /db_xref="GeneID:856646" in GenBank format).
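
A GeneID can be pulled out of such a db_xref qualifier with a regular expression. This is a minimal sketch that only looks for the `/db_xref=GeneID:<integer>` pattern quoted in the entry (with or without surrounding double quotes); it is not a full GenBank flat-file parser, and the surrounding sample text is invented.

```python
import re

# Matches /db_xref=GeneID:123 and /db_xref="GeneID:123".
DB_XREF_GENEID = re.compile(r'/db_xref="?GeneID:(\d+)"?')


def extract_gene_ids(flatfile_text):
    """Return every GeneID found in db_xref qualifiers, as integers."""
    return [int(gene_id) for gene_id in DB_XREF_GENEID.findall(flatfile_text)]


# Invented feature-table fragment; only the GeneID value comes from the text.
sample = '''
     gene            1..1200
                     /db_xref="GeneID:856646"
'''
gene_ids = extract_gene_ids(sample)
print(gene_ids)
```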
genetic code
The instructions in a gene that tell the cell how to make a specific protein. A, T, G, and C are the "letters" of the DNA code; they stand for the chemicals adenine, thymine, guanine, and cytosine, respectively, that make up the nucleotide bases of DNA. Each gene's code combines the four chemicals in various ways to spell out three-letter "words" that specify which amino acid is needed at every position for making a protein.
GenomeScan
A gene identification algorithm that is used to identify exon–intron structures in genomic DNA sequence.
genotype
The genetic identity of an individual that does not show as outward characteristics. The genotype refers to the pair of alleles for a given region of the genome that an individual carries.
GEO
Gene Expression Omnibus. GEO is a gene expression data repository and online resource for the retrieval of gene expression data from any organism or artificial source. Many types of gene expression data from platform types, such as spotted microarray, high-density oligonucleotide array, hybridization filter, and serial analysis of gene expression (SAGE) data, are accepted, accessioned, and archived as a public dataset. [See the GEO chapter (Chapter 6) or the GEO web page.]
GI
The GenInfo Identifier is a sequence identification number for a nucleotide sequence. If a nucleotide sequence changes in any way, a new GI number will be assigned. A separate GI number is also assigned to each protein translation within a nucleotide sequence record, and a new GI is assigned if the protein translation changes in any way. GI sequence identifiers run parallel to the new accession.version system of sequence identifiers (see the description of Version).
GSS
Genome Survey Sequences are analogous to ESTs except that the sequences are genomic in origin, rather than cDNA (mRNA). The GSS division of GenBank contains (but is not limited to) the following types of data: random "single-pass read" genome survey sequences, cosmid/BAC/YAC end sequences, exon-trapped genomic sequences, and Alu-PCR sequences.
heterozygosity
The probability that a diploid individual will have two different alleles at a particular genome locus. These individuals are defined as heterozygous, whereas individuals who have two identical alleles at the locus are defined as homozygous. The probability can be estimated by sampling a representative number of individuals from the population and dividing the number of heterozygotes by the total number sampled.
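
The estimate described above is a simple proportion. In the sketch below each sampled individual is represented as a pair of alleles at one locus; the function name and the four-person sample are invented for illustration.

```python
def estimate_heterozygosity(genotypes):
    """Fraction of sampled individuals carrying two different alleles."""
    if not genotypes:
        raise ValueError("at least one sampled individual is required")
    heterozygotes = sum(1 for first, second in genotypes if first != second)
    return heterozygotes / len(genotypes)


# Made-up sample of four diploid individuals at a C/T site.
sample = [("C", "T"), ("C", "C"), ("T", "C"), ("T", "T")]
estimate = estimate_heterozygosity(sample)
print(estimate)
```

A sample this small gives only a rough estimate; the entry's requirement of a representative number of individuals is what makes the proportion meaningful.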
HIV
Human Immunodeficiency Virus. HIV-1 is a retrovirus that is recognized as the causative agent of AIDS (Acquired Immunodeficiency Syndrome).
HNPCC
Hereditary nonpolyposis colon cancer
homogeneously staining region
A region of the chromosome identified cytologically by DNA staining or the FISH technique because of the presence of multiple copies of a subchromosomal region resulting from amplification.
homologous
The term refers to similarity attributable to descent from a common ancestor. Homologous chromosomes are members of a pair of essentially identical chromosomes, each derived from one parent. They have the same or allelic genes with genetic loci arranged in the same order. Homologous chromosomes synapse during meiosis.
HTGS
High-Throughput Genomic Sequences. The source of HTGS are large-scale genome sequencing centers; unfinished sequences are in phases 0, 1, and 2, and finished sequences are in phase 3.
HTGS_CANCELLED
A keyword added to GenBank entries by sequencing centers to indicate that work has stopped on a clone and that the existing sequence will not be finished. Sequencing centers may stop work because the clone is redundant or for various other reasons.
HTGS_PHASE0, HTGS_PHASE1, HTGS_PHASE2, HTGS_PHASE3
Keywords added to GenBank entries by sequencing centers to indicate the status (phase) of the sequence (see phase definitions described under draft sequence).
HTML
Hypertext Markup Language. HTML is derived from SGML. It is a text-based mark-up language and is used primarily to display information using a web browser and to link pieces of information via hyperlinks. The tags used in an HTML document provide information only on how the content is to be displayed but do not provide information about the content they encompass.
HUP
Hold Until Published. HUP refers to the category for electronically submitted data that specifies when the data should be released to the public.
ICBN
International Code of Botanical Nomenclature
ICD
International Classification of Diseases
ICD-O-3
International Classification of Diseases for Oncology, 3rd edition
ICNB
International Code of Nomenclature of Bacteria
ICNCP
International Code of Nomenclature for Cultivated Plants
ICTV
International Committee on Taxonomy of Viruses
ICVCN
International Code of Virus Classification and Nomenclature
ICZN
International Code of Zoological Nomenclature
ideogram
A diagrammatic representation of the karyotype of an organism.
IMAGE Consortium
Integrated Molecular Analysis of Genomes and their Expression. A consortium of academic groups that share high-quality, arrayed cDNA libraries and place sequence, map, and expression data of the clones in these arrays into the public domain. With the use of this information, unique clones can be rearrayed to form a "master array", with the aim of ultimately having a representative cDNA from every gene in the genome under study. To date, human, mouse, rat, zebrafish, and Xenopus laevis genomes have been studied.
intron
Refers to that portion of the DNA sequence that is present in the primary transcript and that is removed by splicing during RNA processing and is not included in the mature, functional mRNA, rRNA, or tRNA. Also called an intervening sequence. (See also splice sites.)
ISAM
Indexed Sequential-Access Method. ISAM is a database access method. It allows data records in a database to be accessed either sequentially (in the order in which they were entered) or randomly (using an index). In the index, each record has a unique key that enables its rapid location. The key is the field used to reference the record.
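
The two access paths can be illustrated with a toy in-memory table: a list preserves entry order for sequential scans, and a dictionary acts as the index for keyed lookups. The class name `IsamTable` and its methods are hypothetical, and a real ISAM implementation works on disk files with considerably more machinery; this sketch only shows the idea.

```python
class IsamTable:
    """Toy table offering sequential and indexed (keyed) access."""

    def __init__(self):
        self._records = []  # records in the order they were entered
        self._index = {}    # unique key -> position in self._records

    def insert(self, key, record):
        if key in self._index:
            raise KeyError(f"duplicate key: {key!r}")
        self._index[key] = len(self._records)
        self._records.append(record)

    def scan(self):
        """Sequential access: yield records in entry order."""
        return iter(self._records)

    def get(self, key):
        """Random access: locate one record through the index."""
        return self._records[self._index[key]]


table = IsamTable()
table.insert("rec-b", "entered first")
table.insert("rec-a", "entered second")
print(list(table.scan()), table.get("rec-a"))
```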
ISCN
International System for Human Cytogenetic Nomenclature
ISO
International Organization for Standardization
ISSN
International Standard Serial Number. The ISSN is an eight-digit number that identifies periodical publications, including electronic serials.
karyotype
The particular chromosome complement of an individual or a related group of individuals, as defined by both the number and morphology of the chromosomes, usually in mitotic metaphase, and arranged by pairs according to the standard classification.
LANL
Los Alamos National Lab
LIMS
Laboratory Information Management Systems. LIMS comprise software that helps biological and chemical laboratories handle data generation, information management, and data archiving.
LinkOut
A registry service to create links from specific articles, journals, or biological data in Entrez to resources on external web sites. Third parties can provide a URL, resource name, brief description of their web sites, and specification of the NCBI data from which they would like to establish links. The specification can be written as a valid Boolean query to Entrez or as a list of identifiers for specific articles or sequences. Entrez PubMed users can then select which external links are visible in their searches through the NCBI Cubby service (see above). (See the LinkOut chapter or web page.)
locus
In a genomic context, locus refers to position on a chromosome. It may, therefore, refer to a marker, a gene, or any other landmark that can be described.
MACAW
Multiple Alignment Construction and Analysis Workbench. MACAW is a program for locating, analyzing, and editing blocks of localized sequence similarity among multiple sequences and linking them into a composite multiple alignment.
Map Viewer
The Map Viewer is a software component of Entrez Genomes that provides special browsing capabilities for a subset of organisms. It allows one to view and search an organism's complete genome, display chromosome maps, and zoom into progressively greater levels of detail, down to the sequence data for a region of interest. If multiple maps are available for a chromosome, it displays them aligned to each other based on shared marker and gene names and, for the sequence maps, based on a common sequence coordinate system. The organisms currently represented in the Map Viewer are listed in the Entrez Map Viewer help document, which provides general information on how to use that tool. The number and types of available maps vary by organism and are described in the "data and search tips" file.