Manual Reference Pages - bwa
1. NAME
bwa - Burrows-Wheeler Alignment Tool
2. CONTENTS
(1) Synopsis
(2) Description
(3) Commands And Options
(4) Sam Alignment Format
(5) Notes On Short-read Alignment
Alignment Accuracy
Estimating Insert Size Distribution
Memory Requirement
Speed
(6) Changes In Bwa-0.6
(7) License And Citation
(1) SYNOPSIS
bwa index
bwa mem >
bwa mem >
bwa aln
short_ > aln_
bwa samse aln_ short_ >
bwa sampe aln_ aln_ >
bwa bwasw long_ >
(2) DESCRIPTION
BWA is a software package for mapping low-divergence sequences against
a large reference genome, such as the human genome. It consists of
three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first
algorithm is designed for Illumina sequence reads up to 100bp, while
the other two are for longer sequences ranging from 70bp to 1Mbp.
BWA-MEM and BWA-SW share similar features such as long-read support
and split alignment, but BWA-MEM, which is the latest, is generally
recommended for high-quality queries as it is faster and more accurate.
BWA-MEM also has better performance than BWA-backtrack for 70-100bp
Illumina reads.
For all the algorithms, BWA first needs to construct the FM-index for
the reference genome (the index command). Alignment algorithms are
invoked with different sub-commands: aln/samse/sampe for BWA-backtrack,
bwasw for BWA-SW and mem for the BWA-MEM algorithm.
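As a rough illustration of how the sub-commands line up with the three algorithms, the sketch below builds the command lines for a single-end run. It only assembles argument lists and does not run BWA. The function name bwa_commands, the algorithm labels and every file name are placeholders invented for this example.

```python
def bwa_commands(algorithm, ref, reads):
    """Return the argv lists for indexing and single-end alignment.

    `ref` and `reads` are placeholder file names chosen by the caller.
    """
    index = ["bwa", "index", ref]  # the FM-index is built first in all cases
    if algorithm == "mem":
        return [index, ["bwa", "mem", ref, reads]]
    if algorithm == "bwasw":
        return [index, ["bwa", "bwasw", ref, reads]]
    if algorithm == "backtrack":
        # Hypothetical name for the intermediate file; `bwa aln` writes to
        # stdout, so the caller would redirect its output into this file.
        sai = reads + ".sai"
        return [
            index,
            ["bwa", "aln", ref, reads],
            ["bwa", "samse", ref, sai, reads],
        ]
    raise ValueError("unknown algorithm: " + algorithm)


for argv in bwa_commands("backtrack", "ref.fa", "reads.fq"):
    print(" ".join(argv))
```

For paired-end BWA-backtrack runs, aln would be invoked once per read file and sampe would replace samse.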
(3) COMMANDS AND OPTIONS
index
bwa index [-p prefix] [-a algoType] <>
Index database sequences in the FASTA format.
OPTIONS:
-p STR
Prefix of the output database [same as db filename]
-a STR
Algorithm for constructing BWT index. Available options are:
is
IS linear-time algorithm for constructing suffix array. It requires
5.37N memory where N is the size of the database. IS is moderately
fast, but does not work with databases larger than 2GB. IS is the
default algorithm due to its simplicity. The current code for the IS
algorithm was reimplemented by Yuta Mori.
bwtsw
Algorithm implemented in BWT-SW. This method works with the whole
human genome.
mem
bwa mem [-aCHMpP] [-t nThreads] [-k minSeedLen] [-w bandWidth]
[-d zDropoff] [-r seedSplitRatio] [-c maxOcc] [-A matchScore]
[-B mmPenalty] [-O gapOpenPen] [-E gapExtPen] [-L clipPen]
[-U unpairPen] [-R RGline] [-v verboseLevel] []
Align 70bp-1Mbp query sequences with the BWA-MEM algorithm. Briefly,
the algorithm works by seeding alignments with maximal exact matches
(MEMs) and then extending seeds with the affine-gap Smith-Waterman
algorithm (SW).
If the second query file is absent and option -p is not set, this
command treats the input reads as single-end. If the second file is
present, this command assumes the i-th read in the first file and the
i-th read in the second file constitute a read pair. If -p is used,
the command assumes the 2i-th and the (2i+1)-th reads in the first
file constitute a read pair (such an input file is said to be
interleaved). In this case, the second file is ignored. In the
paired-end mode, the mem command will infer the read orientation and
the insert size distribution from a batch of reads.
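The two pairing conventions described above can be sketched in a few lines of Python. This is an illustration only, not BWA's own code: reads are represented as plain strings, the function names are made up for the example, and counting is assumed to start at 0 so that reads 0 and 1 form the first interleaved pair.

```python
def pair_separate(reads, mates):
    """Pair the i-th read of the first input with the i-th read of the second."""
    return list(zip(reads, mates))


def pair_interleaved(reads):
    """Pair the 2i-th and (2i+1)-th reads of a single interleaved input."""
    return [(reads[2 * i], reads[2 * i + 1]) for i in range(len(reads) // 2)]


# Both layouts describe the same two read pairs.
print(pair_separate(["a/1", "b/1"], ["a/2", "b/2"]))
print(pair_interleaved(["a/1", "a/2", "b/1", "b/2"]))
```

In this sketch a trailing unpaired read is silently dropped; that is a simplification of the example and says nothing about how BWA handles such input.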
The BWA-MEM algorithm performs local alignment. It may produce
multiple primary alignments for different parts of a query sequence.
This is a crucial feature for long sequences. However, some tools such
as Picard’s markDuplicates do not work with split alignments. Consider
using option -M to flag shorter split hits as secondary.
OPTIONS:
-t INT
Number of threads [1]
-k INT
Minimum seed length. Matches shorter than INT will be missed. The
alignment speed is usually insensitive to this value unless it
deviates significantly from 20. [19]
-w INT
Band width. Essentially, gaps longer than INT will not be found. Note
that the maximum gap length is also affected by the scoring matrix and
the hit length, not solely determined by this option. [100]
-d INT
Off-diagonal X-dropoff (Z-dropoff). Stop extension when the difference
between the best and the current extension score is above
|i-j|*A+INT, where i and j are the current positions of the query and
reference, respectively, and A is the matching score. Z-dropoff is
similar to BLAST’s X-dropoff except that it doesn’t penalize gaps in
one of the sequences in the alignment. Z-dropoff not only avoids
unnecessary extension, but also reduces poor alignments inside a long
good alignment. [100]
-r FLOAT
Trigger re-seeding for a MEM longer than minSeedLen*FLOAT. This is a
key heuristic parameter for tuning the performance. A larger value
yields fewer seeds, which leads to faster alignment speed but lower
accuracy. [1.5]
-c INT
Discard a MEM if it has more than INT occurrences in the genome. This
is an insensitive parameter. [10000]
-P
In the paired-end mode, perform SW to rescue missing hits only but do
not try to find hits that fit a proper pair.
-A INT
Matching score. [1]
-B INT
Mismatch penalty. The sequence error rate is approximately:
{.75 * exp[-log(4) * B/A]}. [4]
-O INT
Gap open penalty. [6]
-E INT
Gap extension penalty. A gap of length k costs O + k*E (i.e. -O is for
opening a zero-length gap). [1]
-L INT
Clipping penalty. When performing SW extension, BWA-MEM keeps track of
the best score reaching the end of query. If this score is larger than
the best SW score minus the clipping penalty, clipping will not be
applied. Note that in this case, the SAM AS tag reports the best SW
score; clipping penalty is not deducted. [5]
-U INT
Penalty for an unpaired read pair. BWA-MEM scores an unpaired read
pair as scoreRead1+scoreRead2-INT and scores a paired one as
scoreRead1+scoreRead2-insertPenalty. It compares these two scores to
determine whether pairing should be forced. [9]
-p
Assume the first input query file is interleaved paired-end FASTA/Q.
See the command description for details.
-R STR
Complete read group header line. ‘\t’ can be used in STR and will be
converted to a TAB in the output SAM. The read group ID will be
attached to every read in the output. An example is
‘@RG\tID:foo\tSM:bar’. [null]
-T INT
Don’t output alignments with a score lower than INT. This option only
affects output. [30]
-a
Output all found alignments for single-end or unpaired paired-end
reads. These alignments will be flagged as secondary alignments.
-C
Append the FASTA/Q comment to the SAM output. This option can be used
to transfer read meta information (e.g. barcode) to the SAM output.
Note that the FASTA/Q comment (the string after a space in the header
line) must conform to the SAM spec (e.g. BC:Z:CGTAC). Malformed
comments lead to incorrect SAM output.
-H
Use hard clipping ’H’ in the SAM output. This option may dramatically
reduce the redundancy of output when mapping long contig or BAC
sequences.
-M
Mark shorter split hits as secondary (for Picard compatibility).
-v INT
Control the verbosity level of the output. This option is not yet
fully supported throughout BWA. Ideally, a value of 0 disables all
output to stderr; 1 outputs errors only; 2 outputs warnings and
errors; 3 outputs all normal messages; 4 or higher is for debugging.
When this option takes value 4, the output is not SAM. [3]
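Several of the options above are defined through small formulas. The sketch below writes three of them out in Python: the gap cost from -O and -E, the approximate error rate from -A and -B, and the Z-dropoff stopping test from -d. The function and parameter names are invented for this example and the defaults mirror the bracketed defaults listed above; this is not BWA's internal code.

```python
import math


def gap_cost(k, gap_open=6, gap_ext=1):
    """Cost of a gap of length k: O + k*E."""
    return gap_open + k * gap_ext


def approx_error_rate(match_score=1, mismatch_penalty=4):
    """Approximate sequence error rate: .75 * exp[-log(4) * B/A]."""
    return 0.75 * math.exp(-math.log(4) * mismatch_penalty / match_score)


def z_dropoff_reached(best, current, i, j, match_score=1, zdrop=100):
    """True when best - current is above |i-j|*A + INT.

    i and j are the current positions in the query and the reference.
    """
    return best - current > abs(i - j) * match_score + zdrop


print(gap_cost(3))                           # 9
print(approx_error_rate())                   # about 0.0029
print(z_dropoff_reached(200, 50, 10, 30))    # True
```

With the default scores, the error-rate expression evaluates to 0.75/256, which shows how a higher mismatch penalty relative to the matching score corresponds to cleaner reads.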
aln
bwa aln [-n maxDiff] [-o maxGapO] [-e maxGapE] [-d nDelTail]
[-i nIndelEnd] [-k maxSeedDiff] [-l seedLen] [-t nThrds] [-cRN]
[-M misMsc] [-O gapOsc] [-E gapEsc] [-q trimQual] <> <> > <>
Find the SA coordinates of the input reads. At most maxSeedDiff
differences are allowed in the first seedLen subsequence and at most
maxDiff differences are allowed in the whole sequence.
OPTIONS:
-n NUM
Maximum edit distance if the value is INT, or the fraction of missing
alignments given 2% uniform base error rate if FLOAT. In the latter
case, the maximum edit distance is automatically chosen for different
read lengths. [0.04]
-o INT
Maximum number of gap opens [1]
-e INT
Maximum number of gap extensions, -1 for k-difference mode
(disallowing long gaps) [-1]
-d INT
Disallow a long deletion within INT bp towards the 3’-end [16]
-i INT
Disallow an indel within INT bp towards the ends [5]
-l INT
Take the first INT subsequence as seed. If INT is longer than the
query sequence, seeding will be disabled. For long reads, this option
typically ranges from 25 to 35 for ‘-k 2’. [inf]
-k INT
Maximum edit distance in the seed [2]
-t INT
Number of threads (multi-threading mode) [1]
-M INT
Mismatch penalty. BWA will not search for suboptimal hits with a score
lower than (bestScore-misMsc). [3]
-O INT
Gap open penalty [11]
-E INT
Gap extension penalty [4]
-R INT
Proceed with suboptimal alignments if there are no more than INT
equally best hits. This option only affects paired-end mapping.
Increasing this threshold helps to improve the pairing accuracy at the
cost of speed, especially for short reads (~32bp).
-c
Reverse query but not complement it, which is required for alignment
in the color space. (Disabled since 0.6.x)
-N
Disable iterative search. All hits with no more than maxDiff
differences will be found. This mode is much slower than the default.
-q INT
Parameter for read trimming. BWA trims a read down to
argmax_x{sum_{i=x+1}^l(INT-q_i)} if q_l<INT where l is the original
read length. [0]
-I
The input is in the Illumina 1.3+ read format (quality equals
ASCII-64).
-B INT
Length of barcode starting from the 5’-end. When INT is positive, the
barcode of each read will be trimmed before mapping and will be
written at the BC SAM tag. For paired-end reads, the barcodes from
both ends are concatenated. [0]
-b
Specify that the input read sequence file is in the BAM format. For
paired-end data, two ends in a pair must be grouped together and
options -1 or -2 are usually applied to specify which end should be
mapped. Typical command lines for mapping paired-end data in the BAM
format are:
bwa aln -b1 >
bwa aln -b2 >
bwa sampe >
-0
When -b is specified, only use single-end reads in mapping.
-1
When -b is specified, only use the first read in a read pair in
mapping (skip single-end reads and the second reads).
-2
When -b is specified, only use the second read in a read pair in
mapping.
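The trimming rule given for the -q option can be read as a short scan from the 3’-end of the read. The sketch below evaluates the argmax formula literally over every candidate length x; it is not taken from the BWA source, which may apply extra limits such as a minimum read length. The function name trim_read is invented for the example, qualities are assumed to be already decoded to integers, and ties are assumed to be resolved in favor of the longer read.

```python
def trim_read(quals, threshold):
    """Return how many leading bases to keep after quality trimming.

    quals[0] holds q_1 and quals[-1] holds q_l; threshold is the INT
    passed to -q.
    """
    length = len(quals)
    # Trimming applies only when the last base is below the threshold.
    if length == 0 or quals[-1] >= threshold:
        return length
    best_x, best_sum, running = length, 0, 0
    # After processing index x, running == sum_{i=x+1}^{l} (threshold - q_i).
    for x in range(length - 1, -1, -1):
        running += threshold - quals[x]
        if running > best_sum:  # strict: ties keep the longer read
            best_sum, best_x = running, x
    return best_x


print(trim_read([30, 30, 30, 10, 5], 20))  # 3
```

In the usage line, the two low-quality bases at the 3’-end are removed and the first three bases are kept.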
samse
bwa samse [-n maxOcc] <> <> <> > <>
Generate alignments in the SAM format given single-end reads.
Repetitive hits will be randomly chosen.
OPTIONS:
-n INT
Maximum number of alignments to output in the XA tag for reads paired
properly. If a read has more than INT hits, the XA tag will not be
written. [3]
-r STR
Specify the read group in a format like ‘@RG\tID:foo\tSM:bar’. [null]
sampe
bwa sampe [-a maxInsSize] [-o maxOcc] [-n maxHitPaired] [-N maxHitDis]
[-P] <> <> <> <> <> > <>
Generate alignments in the SAM format given paired-end reads.
Repetitive read pairs will be placed randomly.