bwa usage guide


Posted on February 16, 2021 (author: 舒舒)

Manual Reference Pages

bwa (1)

NAME

bwa - Burrows-Wheeler Alignment Tool

CONTENTS

Synopsis

Description

Commands And Options

Sam Alignment Format

Notes On Short-read Alignment

Alignment Accuracy

Estimating Insert Size Distribution

Memory Requirement

Speed

Changes In Bwa-0.6

See Also

Author

License And Citation

History

SYNOPSIS

bwa index

Build the index.

bwa mem >

Single-end reads.

bwa mem >

Paired-end reads.

bwa aln short_ > aln_

bwa samse aln_ short_ >

bwa sampe aln_ aln_ >

bwa bwasw long_ >

DESCRIPTION

BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are for longer sequences ranging from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, which is the latest, is generally recommended for high-quality queries as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads.

For all the algorithms, BWA first needs to construct the FM-index for the reference genome (the index command). Alignment algorithms are invoked with different sub-commands: aln/samse/sampe for BWA-backtrack, bwasw for BWA-SW and mem for the BWA-MEM algorithm.

COMMANDS AND OPTIONS

index

bwa index [-p prefix] [-a algoType] <>

Index database sequences in the FASTA format.

OPTIONS:

-p

STR

Prefix of the output database [same as db filename]

-a

STR

Algorithm for constructing BWT index. Available options are:

IS (default)

Linear-time algorithm for constructing the suffix array. It requires 5.37N memory, where N is the size of the database. IS is moderately fast, but does not work with databases larger than 2GB. IS is the default algorithm due to its simplicity. The current code for the IS algorithm was reimplemented by Yuta Mori.

bwtsw

Algorithm implemented in BWT-SW. This method works with the whole human genome.
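The trade-off between the two -a choices can be written as a small decision helper. This is a minimal sketch, not part of BWA: `choose_index_algorithm` is hypothetical, and it assumes N is measured in bytes of sequence, that "2GB" means 2 × 1024³ bytes, and that IS is requested with the lowercase value `is`.

```python
from typing import Optional, Tuple

# Hypothetical helper, not part of BWA: suggest a value for `bwa index -a`.
IS_MEMORY_FACTOR = 5.37        # IS needs about 5.37N memory (stated above)
IS_SIZE_LIMIT = 2 * 1024 ** 3  # assumption: "2GB" read as 2 * 1024**3 bytes of sequence


def choose_index_algorithm(n: int) -> Tuple[str, Optional[float]]:
    """Return the suggested -a value and, for IS, its estimated memory in bytes."""
    if n <= IS_SIZE_LIMIT:
        # IS is the default and handles databases below the 2GB limit.
        return "is", IS_MEMORY_FACTOR * n  # assumption: IS is selected with "-a is"
    # bwtsw is stated to work with the whole human genome.
    return "bwtsw", None


print(choose_index_algorithm(5_000_000))      # a bacterial-sized reference
print(choose_index_algorithm(3_100_000_000))  # roughly human-genome sized
```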

mem

bwa mem [-aCHMpP] [-t nThreads] [-k minSeedLen] [-w bandWidth] [-d zDropoff] [-r seedSplitRatio] [-c maxOcc] [-A matchScore] [-B mmPenalty] [-O gapOpenPen] [-E gapExtPen] [-L clipPen] [-U unpairPen] [-R RGline] [-v verboseLevel] []

Align 70bp-1Mbp query sequences with the BWA-MEM algorithm. Briefly, the algorithm works by seeding alignments with maximal exact matches (MEMs) and then extending seeds with the affine-gap Smith-Waterman algorithm (SW).
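To make the seeding step concrete, the toy function below lists maximal exact matches (exact matches that cannot be extended to the left or right) between a query and a reference by brute force, keeping those at least `min_len` long in the spirit of -k. It illustrates the definition only: the name is hypothetical, the quadratic scan is nothing like BWA-MEM's FM-index-based seeding, and the SW extension step is not shown.

```python
from typing import List, Tuple


def naive_mems(query: str, ref: str, min_len: int) -> List[Tuple[int, int, int]]:
    """Return (query_start, ref_start, length) for every maximal exact match.

    Brute force over all start pairs; for illustration only.
    """
    mems = []
    for qi in range(len(query)):
        for ri in range(len(ref)):
            # Skip starts that could be extended one base to the left.
            if qi > 0 and ri > 0 and query[qi - 1] == ref[ri - 1]:
                continue
            # Extend to the right as far as the two sequences agree.
            length = 0
            while (qi + length < len(query) and ri + length < len(ref)
                   and query[qi + length] == ref[ri + length]):
                length += 1
            if length >= min_len:
                mems.append((qi, ri, length))
    return mems


print(naive_mems("ACGTAC", "TTACGTACTT", 4))  # prints [(0, 2, 6)]
```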

If the second (mates) query file is absent and option -p is not set, this command regards the input reads as single-end. If the second file is present, this command assumes that the i-th read in the first file and the i-th read in the second file constitute a read pair. If -p is used, the command assumes that the 2i-th and the (2i+1)-th reads in the first file constitute a read pair (such an input file is said to be interleaved). In this case, the second file is ignored. In the paired-end mode, the mem command will infer the read orientation and the insert size distribution from a batch of reads.
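The two pairing conventions just described can be made concrete in a few lines. This sketch works on already-parsed read names rather than real FASTA/Q files, counts reads from 0, and uses hypothetical function names; it is not how BWA-MEM reads its input.

```python
from typing import Iterator, Sequence, Tuple

Pair = Tuple[str, str]


def pairs_from_two_files(first: Sequence[str], second: Sequence[str]) -> Iterator[Pair]:
    """Two input files: the i-th read of each file forms a pair."""
    if len(first) != len(second):
        raise ValueError("both files must hold the same number of reads")
    return iter(zip(first, second))


def pairs_from_interleaved(reads: Sequence[str]) -> Iterator[Pair]:
    """Option -p: reads 2i and 2i+1 of one file form a pair (i counted from 0)."""
    if len(reads) % 2 != 0:
        raise ValueError("an interleaved file must hold an even number of reads")
    return ((reads[2 * i], reads[2 * i + 1]) for i in range(len(reads) // 2))


print(list(pairs_from_interleaved(["r1/1", "r1/2", "r2/1", "r2/2"])))
# prints [('r1/1', 'r1/2'), ('r2/1', 'r2/2')]
```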

The BWA-MEM algorithm performs local alignment. It may produce multiple primary alignments for different parts of a query sequence. This is a crucial feature for long sequences. However, some tools such as Picard's markDuplicates do not work with split alignments. One may consider using option -M to flag shorter split hits as secondary.

OPTIONS:

-t

INT

Number of threads [1]

-k

INT

Minimum seed length. Matches shorter than INT will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20. [19]

-w

INT

Band width. Essentially, gaps longer than INT will not be found. Note that the maximum gap length is also affected by the scoring matrix and the hit length, not solely determined by this option. [100]

-d

INT

Off-diagonal X-dropoff (Z-dropoff). Stop extension when the difference between the best and the current extension score is above |i-j|*A+INT, where i and j are the current positions of the query and reference, respectively, and A is the matching score. Z-dropoff is similar to BLAST's X-dropoff except that it doesn't penalize gaps in one of the sequences in the alignment. Z-dropoff not only avoids unnecessary extension, but also reduces poor alignments inside a long good alignment. [100]
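The stopping rule above can be written as a single predicate. This sketch transcribes the formula literally, with i and j as the query and reference positions and defaults matching -A 1 and -d 100; how BWA-MEM tracks those positions internally is not described here, and the function name is hypothetical.

```python
def z_dropoff_triggered(best: int, current: int, i: int, j: int,
                        match_score: int = 1, z_drop: int = 100) -> bool:
    """True when extension should stop: best - current > |i - j| * A + Z.

    best, current: best and current extension scores.
    i, j: current positions in the query and the reference.
    match_score: the -A value; z_drop: the -d value.
    """
    return best - current > abs(i - j) * match_score + z_drop


print(z_dropoff_triggered(150, 40, 200, 200))  # on the diagonal: 110 > 100
print(z_dropoff_triggered(150, 40, 230, 200))  # 30 bases off-diagonal: 110 <= 130
```

On the diagonal (i equal to j) the rule is a plain X-dropoff; each base of offset between i and j raises the allowance by A, which is one way to read the statement that Z-dropoff does not penalize gaps in one of the sequences.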

-r

FLOAT

Trigger re-seeding for a MEM longer than minSeedLen*FLOAT. This is a key heuristic parameter for tuning the performance. A larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy. [1.5]
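As a worked example of the -r rule: with the default minimum seed length of 19 (-k) and the default ratio of 1.5, re-seeding is triggered only for MEMs longer than 19 × 1.5 = 28.5bp, that is, MEMs of 29bp or more. The hypothetical helper below simply encodes that comparison.

```python
def triggers_reseed(mem_len: int, min_seed_len: int = 19, ratio: float = 1.5) -> bool:
    """True when a MEM is longer than minSeedLen * FLOAT (the -k and -r values)."""
    return mem_len > min_seed_len * ratio


print([n for n in range(25, 33) if triggers_reseed(n)])  # prints [29, 30, 31, 32]
```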

-c

INT

Discard a MEM if it has more than INT occurrences in the genome. This is an insensitive parameter. [10000]

-P

In the paired-end mode, perform SW to rescue missing hits only, but do not try to find hits that fit a proper pair.

-A

INT

Matching score. [1]

-B

INT

Mismatch penalty. The sequence error rate is approximately: {.75 * exp[-log(4) * B/A]}. [4]
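Plugging the defaults into this approximation shows what error rate the scoring implicitly assumes. A minimal sketch, taking log as the natural logarithm so that exp(-log(4) * B/A) equals 4^(-B/A); the function name is hypothetical.

```python
import math


def implied_error_rate(mismatch_penalty: float = 4, match_score: float = 1) -> float:
    """Approximate error rate {.75 * exp[-log(4) * B/A]} for -B (B) and -A (A)."""
    return 0.75 * math.exp(-math.log(4) * mismatch_penalty / match_score)


print(f"{implied_error_rate():.4%}")   # defaults B=4, A=1
print(f"{implied_error_rate(3):.4%}")  # a softer mismatch penalty
```

Under the defaults this gives 0.75/256, about 0.29%; lowering -B to 3 raises it to 0.75/64, about 1.2%.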

-O

INT

Gap open penalty. [6]

-E

INT

Gap extension penalty. A gap of length k costs O + k*E (i.e. -O is for opening a zero-length gap). [1]

-L

INT

Clipping penalty. When performing SW extension, BWA-MEM keeps track of the best score reaching the end of query. If this score is larger than the best SW score minus the clipping penalty, clipping will not be applied. Note that in this case, the SAM AS tag reports the best SW score; clipping penalty is not deducted. [5]
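The scoring scheme behind -A, -B, -O and -E can be illustrated with a textbook affine-gap Smith-Waterman (Gotoh) score computation, in which a gap of length k costs O + k*E as stated above. This is a minimal sketch of local alignment scoring only: it ignores banding (-w), Z-dropoff (-d) and the clipping penalty (-L), it is not BWA-MEM's own implementation, and the function name is hypothetical.

```python
def affine_sw_score(query: str, ref: str, match: int = 1, mismatch: int = 4,
                    gap_open: int = 6, gap_ext: int = 1) -> int:
    """Best local alignment score where a gap of length k costs gap_open + k * gap_ext.

    Defaults mirror -A 1, -B 4, -O 6 and -E 1.
    """
    n, m = len(query), len(ref)
    neg_inf = float("-inf")
    # H: best local score of an alignment ending at (i, j).
    # E: same, but ending with a gap in the query (reference base unaligned).
    # F: same, but ending with a gap in the reference (query base unaligned).
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Opening a gap charges gap_open + gap_ext; extending charges gap_ext.
            E[i][j] = max(H[i][j - 1] - gap_open - gap_ext, E[i][j - 1] - gap_ext)
            F[i][j] = max(H[i - 1][j] - gap_open - gap_ext, F[i - 1][j] - gap_ext)
            s = match if query[i - 1] == ref[j - 1] else -mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


x, y = "ACGTCAGTCA", "TGACTGCATG"
print(affine_sw_score(x + y, x + "G" + y))  # prints 13
```

Here the gapped alignment keeps all 20 matches and pays 6 + 1 = 7 for the single-base gap, giving 13, whereas either ungapped half alone scores only 10.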

-U

INT

Penalty for an unpaired read pair. BWA-MEM scores an unpaired read pair as scoreRead1+scoreRead2-INT and scores a paired one as scoreRead1+scoreRead2-insertPenalty. It compares these two scores to determine whether it should force pairing. [9]
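The -U comparison can be sketched as below. The page does not say how insertPenalty is computed, so it is passed in as a number; we also assume the paired placement may use different (for example suboptimal) hits than the best unpaired ones, which is why the two score pairs are separate arguments. The function name and the tie-breaking rule are our assumptions.

```python
from typing import Sequence


def prefer_pairing(pair_scores: Sequence[float], unpaired_scores: Sequence[float],
                   insert_penalty: float, unpair_penalty: float = 9) -> bool:
    """Decide between a paired placement and two independent (unpaired) hits.

    paired   = scoreRead1 + scoreRead2 - insertPenalty  (hits used in the pair)
    unpaired = scoreRead1 + scoreRead2 - unpairPen      (best single hits)
    Assumption: ties go to the unpaired placement.
    """
    paired = sum(pair_scores) - insert_penalty
    unpaired = sum(unpaired_scores) - unpair_penalty
    return paired > unpaired


print(prefer_pairing((100, 95), (100, 100), insert_penalty=2))  # 193 vs 191
```

For example, a pair scoring 100 + 95 with an insert penalty of 2 (total 193) beats two unpaired best hits scoring 100 + 100 under the default -U 9 (total 191).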

-p

Assume the first input query file is interleaved paired-end FASTA/Q.

See the command description for details.

-R

STR

Complete read group header line. '\t' can be used in STR and will be converted to a TAB in the output SAM. The read group ID will be attached to every read in the output. An example is '@RG\tID:foo\tSM:bar'. [null]
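The '\t' convention means the option value is typed with a literal backslash followed by t, which BWA-MEM turns into real TAB characters. A minimal illustration of that conversion and of extracting the ID (the value attached to each read); both helper names are hypothetical.

```python
def expand_rg_line(rg: str) -> str:
    """Replace each literal backslash-t pair with a TAB, as described for -R."""
    return rg.replace("\\t", "\t")


def read_group_id(rg: str) -> str:
    """Return the ID field of a read group header line."""
    for field in expand_rg_line(rg).split("\t")[1:]:
        if field.startswith("ID:"):
            return field[len("ID:"):]
    raise ValueError("read group line has no ID field")


typed = r"@RG\tID:foo\tSM:bar"            # what one passes to -R (backslash + t)
print(expand_rg_line(typed).split("\t"))  # prints ['@RG', 'ID:foo', 'SM:bar']
print(read_group_id(typed))               # prints foo
```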

-T

INT

Don't output alignments with score lower than INT. This option only affects output. [30]

-a

Output all found alignments for single-end or unpaired paired-end reads. These alignments will be flagged as secondary alignments.

-C

Append FASTA/Q comment to SAM output. This option can be used to transfer read meta information (e.g. barcode) to the SAM output. Note that the FASTA/Q comment (the string after a space in the header line) must conform to the SAM spec (e.g. BC:Z:CGTAC). Malformed comments lead to incorrect SAM output.
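Because a malformed comment produces incorrect SAM output, it can be worth checking headers before using -C. The function below is a hypothetical pre-flight check, not part of BWA. It assumes the comment is the text after the first space and may contain several TAB-separated fields, and it tests only the TAG:TYPE:VALUE shape from the SAM specification plus the value syntax of the i and Z types, not every rule of the spec.

```python
import re

# SAM optional-field shape TAG:TYPE:VALUE.
_FIELD = re.compile(r"[A-Za-z][A-Za-z0-9]:[AifZHB]:.*")
_INT = re.compile(r"[-+]?[0-9]+")
_PRINTABLE = re.compile(r"[ !-~]*")


def comment_is_sam_compatible(header_line: str) -> bool:
    """Check the FASTA/Q comment (text after the first space) before using -C."""
    parts = header_line.split(" ", 1)
    if len(parts) == 1:
        return True  # no comment to check
    for field in parts[1].split("\t"):
        if not _FIELD.fullmatch(field):
            return False
        tag_type, value = field[3], field[5:]
        if tag_type == "i" and not _INT.fullmatch(value):
            return False
        if tag_type == "Z" and not _PRINTABLE.fullmatch(value):
            return False
    return True


print(comment_is_sam_compatible("@read1 BC:Z:CGTAC"))    # prints True
print(comment_is_sam_compatible("@read1 barcode=CGTAC")) # prints False
```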

-H

Use hard clipping 'H' in the SAM output. This option may dramatically reduce the redundancy of output when mapping long contig or BAC sequences.

-M

Mark shorter split hits as secondary (for Picard compatibility).

-v

INT

Control the verbose level of the output. This option has not been fully supported throughout BWA. Ideally, a value of 0 disables all output to stderr; 1 outputs errors only; 2 outputs warnings and errors; 3 outputs all normal messages; 4 or higher is for debugging. When this option takes the value 4, the output is not SAM. [3]

aln

bwa aln [-n maxDiff] [-o maxGapO] [-e maxGapE] [-d nDelTail] [-i nIndelEnd] [-k maxSeedDiff] [-l seedLen] [-t nThrds] [-cRN] [-M misMsc] [-O gapOsc] [-E gapEsc] [-q trimQual] <> <> > <>

Find the coordinates of the input reads. A maximum of maxSeedDiff differences is allowed in the first seedLen subsequence, and a maximum of maxDiff differences is allowed in the whole sequence.

OPTIONS:

-n

NUM

Maximum edit distance if the value is INT, or the fraction of missing alignments given a 2% uniform base error rate if FLOAT. In the latter case, the maximum edit distance is automatically chosen for different read lengths. [0.04]
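The FLOAT form of -n can be illustrated by modelling the number of sequencing errors in an l-bp read as Poisson with mean 0.02l and picking the smallest edit distance whose tail probability falls below the given fraction. This is a sketch under that modelling assumption (the page only says the distance is "automatically chosen"), and the function name is hypothetical.

```python
import math


def max_edit_distance(read_len: int, missing_fraction: float = 0.04,
                      base_error: float = 0.02) -> int:
    """Smallest k >= 1 with P(more than k errors) < missing_fraction.

    Assumes errors per read are Poisson with mean read_len * base_error.
    """
    lam = read_len * base_error
    term = math.exp(-lam)  # P(0 errors)
    cumulative = term
    for k in range(1, 1000):
        term *= lam / k    # P(exactly k errors)
        cumulative += term
        if 1.0 - cumulative < missing_fraction:
            return k
    raise ValueError("no edit distance found below the search limit")


for length in (36, 38, 100):
    print(length, max_edit_distance(length))
```

Under this model, 36bp reads get a maximum edit distance of 2, 38bp reads 3 and 100bp reads 5.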

-o

INT

Maximum number of gap opens [1]

-e

INT

Maximum number of gap extensions, -1 for k-difference mode (disallowing long gaps) [-1]

-d

INT

Disallow a long deletion within INT bp towards the 3'-end [16]

-i

INT

Disallow an indel within INT bp towards the ends [5]

-l

INT

Take the first INT subsequence as seed. If INT is larger than the query sequence, seeding will be disabled. For long reads, this option typically ranges from 25 to 35 for '-k 2'. [inf]

-k

INT

Maximum edit distance in the seed [2]

-t

INT

Number of threads (multi-threading mode) [1]

-M

INT

Mismatch penalty. BWA will not search for suboptimal hits with a score lower than (bestScore-misMsc). [3]

-O

INT

Gap open penalty [11]

-E

INT

Gap extension penalty [4]

-R

INT

Proceed with suboptimal alignments if there are no more than INT equally best hits. This option only affects paired-end mapping. Increasing this threshold helps to improve the pairing accuracy at the cost of speed, especially for short reads (~32bp).

-c

Reverse query but not complement it, which is required for alignment in the color space. (Disabled since 0.6.x)

-N

Disable iterative search. All hits with no more than maxDiff differences will be found. This mode is much slower than the default.

-q

INT

Parameter for read trimming. BWA trims a read down to argmax_x{sum_{i=x+1}^l(INT-q_i)} if q_l<INT, where l is the original read length. [0]
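The trimming rule can be transcribed directly: scan candidate cut points x from the 3' end and keep the one that maximizes the summed shortfall of quality below INT over the removed bases. This is a minimal sketch of the formula as written, with Phred scores as integers and a hypothetical function name. BWA's own implementation may differ in details (for example, a minimum retained length), and breaking ties toward the longer read is our choice.

```python
from typing import Sequence


def trimmed_length(quals: Sequence[int], threshold: int) -> int:
    """Kept length x = argmax_x sum_{i=x+1}^{l} (threshold - q_i), applied if q_l < threshold.

    The formula uses 1-based positions; quals[x] here is q_{x+1}.
    """
    l = len(quals)
    if threshold <= 0 or l == 0 or quals[-1] >= threshold:
        return l  # no trimming
    best_x, best_sum, running = l, 0, 0
    # Walk x from l-1 down to 0, adding the term for base x+1 each time.
    for x in range(l - 1, -1, -1):
        running += threshold - quals[x]
        if running > best_sum:  # strict: ties keep the longer read
            best_x, best_sum = x, running
    return best_x


print(trimmed_length([30, 30, 30, 30, 10, 5, 2], 20))  # prints 4
```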

-I

The input is in the Illumina 1.3+ read format (quality equals ASCII-64).

-B

INT

Length of barcode starting from the 5'-end. When INT is positive, the barcode of each read will be trimmed before mapping and will be written to the barcode SAM tag. For paired-end reads, the barcodes from both ends are concatenated. [0]

-b

Specify that the input read sequence file is in the BAM format. For paired-end data, the two ends of a pair must be grouped together, and options -1 or -2 are usually applied to specify which end should be mapped. Typical command lines for mapping paired-end data in the BAM format are:

bwa aln -b1 >

bwa aln -b2 >

bwa sampe >

-0

When -b is specified, only use single-end reads in mapping.

-1

When -b is specified, only use the first read in a read pair in mapping (skip single-end reads and the second reads).

-2

When -b is specified, only use the second read in a read pair in mapping.

samse

bwa samse [-n maxOcc] <> <> <> > <>

Generate alignments in the SAM format given single-end reads. Repetitive hits will be randomly chosen.

OPTIONS:

-n

INT

Maximum number of alignments to output in the XA tag for reads paired properly. If a read has more than INT hits, the XA tag will not be written. [3]

-r

STR

Specify the read group in a format like '@RG\tID:foo\tSM:bar'. [null]

sampe

bwa sampe [-a maxInsSize] [-o maxOcc] [-n maxHitPaired] [-N maxHitDis] [-P] <> <> <> <> <> > <>

Generate alignments in the SAM format given paired-end reads. Repetitive read pairs will be placed randomly.

OPTIONS:

-a

INT

Maximum insert size for a read pair to be considered as being mapped properly. Since 0.4.5, this option is only used when there are not enough good alignments to infer the distribution of insert sizes. [500]

-o

INT

Maximum occurrences of a read for pairing. A read with more occurrences will be treated as a single-end read. Reducing this parameter helps faster pairing. [100000]

-P

Load the entire FM-index into memory to reduce disk operations (base-space reads only). With this option, at least 1.25N bytes of memory are required, where N is the length of the genome.

-n

INT

Maximum number of alignments to output in the XA tag for reads paired properly. If a read has more than INT hits, the XA tag will not be written. [3]

-N

INT

Maximum number of alignments to output in the XA tag for discordant read pairs (excluding singletons). If a read has more than INT hits, the XA tag will not be written. [10]

-r

STR

Specify the read group in a format like '@RG\tID:foo\tSM:bar'. [null]

bwasw

bwa bwasw [-a matchScore] [-b mmPen] [-q gapOpenPen] [-r gapExtPen] [-t nThreads] [-w bandWidth] [-T thres] [-s hspIntv] [-z zBest] [-N nHspRev] [-c thresCoef] <> <> []

Align the query sequences in the input query file. When the optional mate file is present, perform paired-end alignment. The paired-end mode only works for reads from Illumina short-insert libraries. In the paired-end mode, BWA-SW may still output split alignments, but they are all marked as not properly paired; the mate positions will not be written if the mate has multiple local hits.

OPTIONS:

-a

INT

Score of a match [1]

-b

INT

Mismatch penalty [3]

-q

INT

Gap open penalty [5]

-r

INT

Gap extension penalty. The penalty for a contiguous gap of size k is q+k*r. [2]

-t

INT

Number of threads in the multi-threading mode [1]

-w

INT

Band width in the banded alignment [33]

-T

INT

Minimum score threshold divided by a [37]

-z

INT

Z-best heuristics. Higher -z increases accuracy at the cost of speed. [1]

-c

FLOAT

Coefficient for threshold adjustment according to query length. Given an l-long query, the threshold for a hit to be retained is a*max{T,c*log(l)}. [5.5]
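A quick calculation shows when the length-dependent term in a*max{T,c*log(l)} takes over. This sketch uses the defaults listed above (a = 1, T = 37, c = 5.5) and assumes log is the natural logarithm; the function name is hypothetical.

```python
import math


def bwasw_min_score(query_len: int, match_score: int = 1, t: float = 37,
                    coef: float = 5.5) -> float:
    """Score a BWA-SW hit must reach: a * max{T, c * log(l)} (natural log assumed)."""
    return match_score * max(t, coef * math.log(query_len))


for length in (100, 1000, 10000):
    print(length, round(bwasw_min_score(length), 2))
```

Under these assumptions c*log(l) exceeds T only for queries longer than about 835bp (e^(37/5.5) is roughly 835), so shorter queries share the flat threshold of 37.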

-s

INT

Maximum SA interval size for initiating a seed. Higher -s increases accuracy at the cost of speed. [3]

-N

INT

Minimum number of seeds supporting the resultant alignment to skip reverse alignment. [5]

SAM ALIGNMENT FORMAT

The output of the 'aln' command is binary and designed for BWA use only. BWA outputs the final alignment in the SAM (Sequence Alignment/Map) format. Each line consists of:

Col  Field  Description

1    QNAME  Query (pair) NAME
2    FLAG   bitwise FLAG
3    RNAME  Reference sequence NAME
4    POS    1-based leftmost POSition/coordinate of clipped sequence
5    MAPQ   MAPping Quality (Phred-scaled)
6    CIGAR  extended CIGAR string
7    MRNM   Mate Reference sequence NaMe ('=' if same as RNAME)
8    MPOS   1-based Mate POSition
9    ISIZE  Inferred insert SIZE
10   SEQ    query SEQuence on the same strand as the reference
11   QUAL   query QUALity (ASCII-33 gives the Phred base quality)
12   OPT    variable OPTional fields in the format TAG:VTYPE:VALUE
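The column layout above maps directly onto a tab-split of each alignment line. A minimal parser sketch, assuming well-formed input and keeping optional fields as raw VTYPE:VALUE strings; the record type and function names are hypothetical, and the sample line is invented.

```python
from typing import Dict, NamedTuple


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int             # 1-based leftmost position
    mapq: int
    cigar: str
    mrnm: str            # '=' if the mate maps to the same reference
    mpos: int            # 1-based mate position
    isize: int
    seq: str
    qual: str            # ASCII-33 encoded Phred qualities
    opt: Dict[str, str]  # TAG -> "VTYPE:VALUE"


def parse_sam_line(line: str) -> SamRecord:
    """Split one SAM alignment line into its 11 fixed columns plus optional fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("expected at least 11 tab-separated columns")
    opt = {}
    for field in cols[11:]:
        tag, vtype, value = field.split(":", 2)
        opt[tag] = f"{vtype}:{value}"
    return SamRecord(cols[0], int(cols[1]), cols[2], int(cols[3]), int(cols[4]),
                     cols[5], cols[6], int(cols[7]), int(cols[8]), cols[9], cols[10], opt)


rec = parse_sam_line("r1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII\tNM:i:0")
print(rec.rname, rec.pos, rec.opt)  # prints chr1 100 {'NM': 'i:0'}
```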

Each bit in the FLAG field is defined as:

Flag    Description

0x0001  the read is paired in sequencing
0x0002  the read is mapped in a proper pair
0x0004  the query sequence itself is unmapped
0x0008  the mate is unmapped
0x0010  strand of the query (1 for reverse)
0x0020  strand of the mate
0x0040  the read is the first read in a pair
0x0080  the read is the second read in a pair
0x0100  the alignment is not primary
0x0200  QC failure
0x0400  optical or PCR duplicate
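Because FLAG is a bitwise combination, testing a value against each bit in the table recovers its meaning. A small decoder sketch with descriptions copied from the table; the names are hypothetical.

```python
from typing import List

# Bit meanings copied from the FLAG table above.
FLAG_BITS = {
    0x0001: "the read is paired in sequencing",
    0x0002: "the read is mapped in a proper pair",
    0x0004: "the query sequence itself is unmapped",
    0x0008: "the mate is unmapped",
    0x0010: "strand of the query (1 for reverse)",
    0x0020: "strand of the mate",
    0x0040: "the read is the first read in a pair",
    0x0080: "the read is the second read in a pair",
    0x0100: "the alignment is not primary",
    0x0200: "QC failure",
    0x0400: "optical or PCR duplicate",
}


def decode_flag(flag: int) -> List[str]:
    """List the descriptions of all bits set in a FLAG value."""
    return [desc for bit, desc in FLAG_BITS.items() if flag & bit]


def is_reverse_strand(flag: int) -> bool:
    """Bit 0x0010 is set when the query is on the reverse strand."""
    return bool(flag & 0x0010)


for meaning in decode_flag(99):
    print(meaning)
```

For example, 99 (0x63 = 0x40 + 0x20 + 0x2 + 0x1) decodes to a paired read in a proper pair, first in its pair, with the mate-strand bit set.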

Please check <> for the format specification and the tools for post-processing the alignment.

BWA generates the following optional fields. Tags starting with ‘X’ are specific to BWA.

Tag

Meaning

Edit distance

Mismatching positions/bases

Alignment score

Barcode sequence

Number of best hits

Number of suboptimal hits found by BWA

Number of ambiguous bases in the reference

Number of mismatches in the alignment

Number of gap opens

Number of gap extensions

Type: Unique/Repeat/N/Mate-sw

Alternative hits; format: (chr,pos,CIGAR,NM;)*

Suboptimal alignment score

Support from forward/reverse alignment

Number of supporting seeds

Note that the gap-open and gap-extension counts are generated by the BWT search, while the CIGAR string is generated by Smith-Waterman alignment. These two …
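The alternative-hits format (chr,pos,CIGAR,NM;)* listed in the tag table can be split with a few lines of code. This is a sketch with hypothetical names and an invented sample value; reading a leading '+' or '-' on pos as the strand is an assumption not stated in this page, so the sign is simply kept by int().

```python
from typing import List, NamedTuple


class AltHit(NamedTuple):
    chrom: str
    pos: int    # int() drops a leading '+' and keeps '-'; sign-as-strand is an assumption
    cigar: str
    nm: int


def parse_alt_hits(value: str) -> List[AltHit]:
    """Parse a string of the form (chr,pos,CIGAR,NM;)*."""
    hits = []
    for entry in value.split(";"):
        if not entry:
            continue  # the string ends with ';', leaving an empty last piece
        chrom, pos, cigar, nm = entry.split(",")
        hits.append(AltHit(chrom, int(pos), cigar, int(nm)))
    return hits


for hit in parse_alt_hits("chr1,+12345,76M,0;chr2,-678,76M,1;"):
    print(hit)
```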

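This article was updated on 2021-02-16 07:29 and was provided by the author; it does not represent the position of this website. Please credit the source when reposting: https://www.bjmy2z.cn/gaokao/657839.html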
