BWA Operation Manual

Published February 16, 2021 (author: 下雨天)

Manual Reference Pages - bwa

NAME

bwa - Burrows-Wheeler Alignment Tool

CONTENTS

(1) Synopsis

(2) Description

(3) Commands And Options

(4) SAM Alignment Format

(5) Notes On Short-read Alignment

Alignment Accuracy

Estimating Insert Size Distribution

Memory Requirement

Speed

(6) Changes In BWA-0.6

(7) License And Citation

(1)

SYNOPSIS

bwa index

bwa mem >

bwa aln short_ > aln_

bwa samse aln_ short_ >

bwa sampe aln_ aln_ >

bwa bwasw long_ >

(2)

DESCRIPTION

BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are for longer sequences ranging from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, which is the latest, is generally recommended for high-quality queries as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads.

For all the algorithms, BWA first needs to construct the FM-index for the reference genome (the index command). Alignment algorithms are invoked with different sub-commands: aln/samse/sampe for BWA-backtrack, bwasw for BWA-SW and mem for the BWA-MEM algorithm.

(3)

COMMANDS AND OPTIONS

index

bwa index [-p prefix] [-a algoType] <>

Index database sequences in the FASTA format.

OPTIONS:

-p STR    Prefix of the output database [same as db filename]

-a STR    Algorithm for constructing BWT index. Available options are:

is    IS linear-time algorithm for constructing suffix array. It requires 5.37N memory where N is the size of the database. IS is moderately fast, but does not work with databases larger than 2GB. IS is the default algorithm due to its simplicity. The current code for the IS algorithm was reimplemented by Yuta Mori.

bwtsw    Algorithm implemented in BWT-SW. This method works with the whole human genome.
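As a rough illustration of the 5.37N figure, the sketch below estimates the memory the IS algorithm would need for a database of a given size and picks between the two algorithm names. The function names, and the reading of "2GB" as 2 * 10^9 units of database size, are assumptions made for this example and are not part of BWA.

```python
IS_MEMORY_FACTOR = 5.37            # memory per unit of database size, per the text
ASSUMED_IS_LIMIT = 2 * 10**9       # assumption: "2GB" read as 2e9


def is_memory_estimate(db_size: int) -> float:
    """Estimated memory for IS, in the same unit as db_size."""
    return IS_MEMORY_FACTOR * db_size


def suggest_algorithm(db_size: int, limit: int = ASSUMED_IS_LIMIT) -> str:
    """Return 'is' for databases under the assumed limit, else 'bwtsw'."""
    return "is" if db_size < limit else "bwtsw"


if __name__ == "__main__":
    # A hypothetical 5 Mbp bacterial genome versus a ~3.1 Gbp human genome.
    for size in (5_000_000, 3_100_000_000):
        print(size, round(is_memory_estimate(size)), suggest_algorithm(size))
```

Under these assumptions a human-sized database falls on the bwtsw side, which matches the note that bwtsw works with the whole human genome.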

mem

bwa mem [-aCHMpP] [-t nThreads] [-k minSeedLen] [-w bandWidth] [-d zDropoff] [-r seedSplitRatio] [-c maxOcc] [-A matchScore] [-B mmPenalty] [-O gapOpenPen] [-E gapExtPen] [-L clipPen] [-U unpairPen] [-R RGline] [-v verboseLevel] []

Align 70bp-1Mbp query sequences with the BWA-MEM algorithm. Briefly, the algorithm works by seeding alignments with maximal exact matches (MEMs) and then extending seeds with the affine-gap Smith-Waterman algorithm (SW).

If the second query file is absent and option -p is not set, this command regards input reads as single-end. If the second query file is present, this command assumes the i-th read in the first file and the i-th read in the second file constitute a read pair. If -p is used, the command assumes the 2i-th and the (2i+1)-th read in the first file constitute a read pair (such an input file is said to be interleaved). In this case, the second file is ignored. In the paired-end mode, the mem command will infer the read orientation and the insert size distribution from a batch of reads.

The BWA-MEM algorithm performs local alignment. It may produce multiple primary alignments for different parts of a query sequence. This is a crucial feature for long sequences. However, some tools such as Picard's markDuplicates do not work with split alignments. One may consider using option -M to flag shorter split hits as secondary.
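The pairing rules in the command description can be sketched as follows. This is a minimal illustration on lists of read names, not BWA's own code; the function name `pair_reads` and its arguments are hypothetical.

```python
from typing import List, Optional, Tuple


def pair_reads(reads: List[str],
               mates: Optional[List[str]] = None,
               interleaved: bool = False) -> List[Tuple[str, ...]]:
    """Group read names as single-end, two-file pairs, or interleaved pairs."""
    if interleaved:
        # With -p: reads 2i and 2i+1 of the first file form a pair,
        # and any second file is ignored.
        return [(reads[i], reads[i + 1]) for i in range(0, len(reads) - 1, 2)]
    if mates is None:
        # No second file and no -p: every read is single-end.
        return [(r,) for r in reads]
    # Two files: the i-th read of each file forms a pair.
    return list(zip(reads, mates))


if __name__ == "__main__":
    print(pair_reads(["r1", "r2"]))
    print(pair_reads(["r1", "r2"], ["m1", "m2"]))
    print(pair_reads(["a/1", "a/2", "b/1", "b/2"], interleaved=True))
```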

OPTIONS:

-t INT    Number of threads [1]

-k INT    Minimum seed length. Matches shorter than INT will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20. [19]

-w INT    Band width. Essentially, gaps longer than INT will not be found. Note that the maximum gap length is also affected by the scoring matrix and the hit length, not solely determined by this option. [100]

-d INT    Off-diagonal X-dropoff (Z-dropoff). Stop extension when the difference between the best and the current extension score is above |i-j|*A+INT, where i and j are the current positions of the query and reference, respectively, and A is the matching score. Z-dropoff is similar to BLAST's X-dropoff except that it doesn't penalize gaps in one of the sequences in the alignment. Z-dropoff not only avoids unnecessary extension, but also reduces poor alignments inside a long good alignment. [100]
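The stopping rule for -d can be written as a single comparison. The sketch below only restates the inequality quoted above; the function name and argument names are hypothetical, and the defaults are the documented defaults for -A and -d.

```python
def zdrop_stop(best_score: int, current_score: int, i: int, j: int,
               match_score: int = 1, zdrop: int = 100) -> bool:
    """True when best - current exceeds |i-j|*A + INT, so extension stops."""
    return best_score - current_score > abs(i - j) * match_score + zdrop


if __name__ == "__main__":
    # On the diagonal (i == j) a drop of 110 exceeds the default 100.
    print(zdrop_stop(200, 90, 50, 50))
    # 20 positions off the diagonal, the same drop is tolerated (110 <= 120).
    print(zdrop_stop(200, 90, 50, 70))
```

The |i-j|*A term is what keeps a long gap in one sequence from triggering the cutoff by itself.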

-r FLOAT    Trigger re-seeding for a MEM longer than minSeedLen*FLOAT. This is a key heuristic parameter for tuning the performance. Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy. [1.5]
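With the defaults, the re-seeding threshold is 19 * 1.5 = 28.5, so MEMs of 29bp or longer are re-seeded. A minimal sketch of that check, with a hypothetical function name and the documented defaults for -k and -r:

```python
def needs_reseeding(mem_length: int, min_seed_len: int = 19,
                    split_ratio: float = 1.5) -> bool:
    """True when a MEM is longer than minSeedLen * FLOAT."""
    return mem_length > min_seed_len * split_ratio


if __name__ == "__main__":
    for length in (28, 29, 40):
        print(length, needs_reseeding(length))
```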

-c INT    Discard a MEM if it has more than INT occurrences in the genome. This is an insensitive parameter. [10000]

-P    In the paired-end mode, perform SW to rescue missing hits only but do not try to find hits that fit a proper pair.

-A INT    Matching score. [1]

-B INT    Mismatch penalty. The sequence error rate is approximately: {.75 * exp[-log(4) * B/A]}. [4]
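To see what the formula gives in practice, the sketch below evaluates it for the default scores. The function name is hypothetical; the expression is the one quoted for -B, with log taken as the natural logarithm.

```python
import math


def implied_error_rate(match_score: float = 1, mismatch_penalty: float = 4) -> float:
    """Evaluate .75 * exp[-log(4) * B/A] for matching score A and mismatch penalty B."""
    return 0.75 * math.exp(-math.log(4) * mismatch_penalty / match_score)


if __name__ == "__main__":
    # exp(-log(4) * 4) equals 4**-4, so the defaults give 0.75 / 256.
    print(implied_error_rate())
    print(implied_error_rate(mismatch_penalty=2))
```

With A=1 and B=4 the approximate error rate is about 0.3%; lowering B to 2 raises it to about 4.7%.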

-O INT    Gap open penalty. [6]

-E INT    Gap extension penalty. A gap of length k costs O + k*E (i.e. -O is for opening a zero-length gap). [1]
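The affine gap cost can be checked with a few values. This is a small sketch with a hypothetical function name, using the documented mem defaults for -O and -E:

```python
def gap_cost(length: int, gap_open: int = 6, gap_ext: int = 1) -> int:
    """Cost of a gap of the given length: O + k*E."""
    return gap_open + length * gap_ext


if __name__ == "__main__":
    for k in (0, 1, 10):
        print(k, gap_cost(k))
```

A one-base gap therefore costs 7 with the defaults, not 6, because the open penalty covers a zero-length gap.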

-L INT    Clipping penalty. When performing SW extension, BWA-MEM keeps track of the best score reaching the end of query. If this score is larger than the best SW score minus the clipping penalty, clipping will not be applied. Note that in this case, the SAM AS tag reports the best SW score; clipping penalty is not deducted. [5]
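The clipping decision described for -L reduces to one comparison between two scores. The sketch below is an illustration of that comparison only; the function name and the example scores are hypothetical.

```python
def extends_to_query_end(end_score: int, best_sw_score: int,
                         clip_penalty: int = 5) -> bool:
    """True when clipping is not applied: end score > best SW score - penalty."""
    return end_score > best_sw_score - clip_penalty


if __name__ == "__main__":
    # An end-to-end score of 96 against a best local score of 100: no clipping.
    print(extends_to_query_end(96, 100))
    # At 95 the end-to-end score is no longer larger than 100 - 5, so the read is clipped.
    print(extends_to_query_end(95, 100))
```

A larger penalty makes end-to-end alignment more likely, since the end-to-end score may fall further below the best local score before clipping is applied.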

-U INT    Penalty for an unpaired read pair. BWA-MEM scores an unpaired read pair as scoreRead1+scoreRead2-INT and scores a paired one as scoreRead1+scoreRead2-insertPenalty. It compares these two scores to determine whether we should force pairing. [9]
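The two scores compared for -U can be computed side by side. In the sketch below, `insert_penalty` is treated as a given input because the text does not define how it is derived, and choosing pairing when the paired score is at least the unpaired score is an assumption about tie handling. All names are hypothetical.

```python
from typing import Tuple


def pair_scores(score_read1: int, score_read2: int, insert_penalty: int,
                unpaired_penalty: int = 9) -> Tuple[int, int]:
    """Return (unpaired_score, paired_score) as defined for -U."""
    unpaired = score_read1 + score_read2 - unpaired_penalty
    paired = score_read1 + score_read2 - insert_penalty
    return unpaired, paired


def prefers_pairing(score_read1: int, score_read2: int, insert_penalty: int,
                    unpaired_penalty: int = 9) -> bool:
    """Assumed rule: pair when the paired score is not worse than the unpaired one."""
    unpaired, paired = pair_scores(score_read1, score_read2,
                                   insert_penalty, unpaired_penalty)
    return paired >= unpaired


if __name__ == "__main__":
    print(pair_scores(100, 90, insert_penalty=5), prefers_pairing(100, 90, 5))
    print(pair_scores(100, 90, insert_penalty=20), prefers_pairing(100, 90, 20))
```

Since both scores share scoreRead1+scoreRead2, the comparison comes down to whether the insert penalty exceeds the -U value.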

-p    Assume the first input query file is interleaved paired-end FASTA/Q. See the command description for details.

-R STR    Complete read group header line. '\t' can be used in STR and will be converted to a TAB in the output SAM. The read group ID will be attached to every read in the output. An example is '@RG\tID:foo\tSM:bar'. [null]
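The conversion described for -R, where a literal backslash followed by t in STR becomes a TAB, can be sketched as below. This illustrates the described behaviour and is not BWA's own parsing code; both function names are hypothetical.

```python
from typing import Optional


def expand_read_group(rg_line: str) -> str:
    """Replace each literal backslash-t sequence with a real TAB character."""
    return rg_line.replace("\\t", "\t")


def read_group_id(rg_line: str) -> Optional[str]:
    """Return the value of the ID field, or None if the line has no ID."""
    for field in expand_read_group(rg_line).split("\t")[1:]:
        if field.startswith("ID:"):
            return field[3:]
    return None


if __name__ == "__main__":
    line = r"@RG\tID:foo\tSM:bar"   # as it would be typed on the command line
    print(repr(expand_read_group(line)))
    print(read_group_id(line))
```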

-T INT    Don't output alignment with score lower than INT. This option only affects output. [30]

-a    Output all found alignments for single-end or unpaired paired-end reads. These alignments will be flagged as secondary alignments.

-C    Append FASTA/Q comment to SAM output. This option can be used to transfer read meta information (e.g. barcode) to the SAM output. Note that the FASTA/Q comment (the string after a space in the header line) must conform to the SAM spec (e.g. BC:Z:CGTAC). Malformed comments lead to incorrect SAM output.

-H    Use hard clipping 'H' in the SAM output. This option may dramatically reduce the redundancy of output when mapping long contig or BAC sequences.

-M    Mark shorter split hits as secondary (for Picard compatibility).

-v INT    Control the verbose level of the output. This option has not been fully supported throughout BWA. Ideally, a value of 0 disables all output to stderr; 1 outputs errors only; 2 outputs warnings and errors; 3 outputs all normal messages; 4 or higher is for debugging. When this option takes value 4, the output is not SAM. [3]

aln

bwa aln [-n maxDiff] [-o maxGapO] [-e maxGapE] [-d nDelTail] [-i nIndelEnd] [-k maxSeedDiff] [-l seedLen] [-t nThrds] [-cRN] [-M misMsc] [-O gapOsc] [-E gapEsc] [-q trimQual]

Find the SA coordinates of the input reads. A maximum of maxSeedDiff differences is allowed in the first seedLen subsequence and a maximum of maxDiff differences is allowed in the whole sequence.

OPTIONS:

-n NUM    Maximum edit distance if the value is INT, or the fraction of missing alignments given 2% uniform base error rate if FLOAT. In the latter case, the maximum edit distance is automatically chosen for different read lengths. [0.04]

-o INT    Maximum number of gap opens [1]

-e INT    Maximum number of gap extensions, -1 for k-difference mode (disallowing long gaps) [-1]

-d INT    Disallow a long deletion within INT bp towards the 3'-end [16]

-i INT    Disallow an indel within INT bp towards the ends [5]

-l INT    Take the first INT subsequence as seed. If INT is larger than the query sequence, seeding will be disabled. For long reads, this option typically ranges from 25 to 35 for '-k 2'. [inf]

-k INT    Maximum edit distance in the seed [2]

-t INT    Number of threads (multi-threading mode) [1]

-M INT    Mismatch penalty. BWA will not search for suboptimal hits with score lower than (bestScore-misMsc). [3]

-O INT    Gap open penalty [11]

-E INT    Gap extension penalty [4]

-R INT    Proceed with suboptimal alignments if there are no more than INT equally best hits. This option only affects paired-end mapping. Increasing this threshold helps to improve the pairing accuracy at the cost of speed, especially for short reads (~32bp).

-c    Reverse query but not complement it, which is required for alignment in the color space. (Disabled since 0.6.x)

-N    Disable iterative search. All hits with no more than maxDiff differences will be found. This mode is much slower than the default.

-q INT    Parameter for read trimming. BWA trims a read down to argmax_x{sum_{i=x+1}^l(INT-q_i)} if q_l<INT, where l is the original read length. [0]
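The trimming formula for -q can be evaluated with one backward pass over the base qualities. The sketch below implements only the argmax expression; it is not BWA's own routine, which may apply extra conditions. The function name is hypothetical, and keeping the longer read when two values of x tie is an assumption.

```python
from typing import Sequence


def trimmed_length(quals: Sequence[int], threshold: int) -> int:
    """Return x maximizing the sum of (threshold - q_i) over i in x+1..l."""
    length = len(quals)
    # No trimming unless the last base quality is below the threshold.
    if length == 0 or quals[-1] >= threshold:
        return length
    best_x, best_sum, running = length, 0, 0
    # quals[x] is the 1-based term i = x + 1, so after adding it `running`
    # equals the sum for keeping exactly x bases.
    for x in range(length - 1, -1, -1):
        running += threshold - quals[x]
        if running > best_sum:        # strict '>' keeps the longer read on ties
            best_sum, best_x = running, x
    return best_x


if __name__ == "__main__":
    # Two low-quality bases at the 3' end are removed with a threshold of 20.
    print(trimmed_length([30, 30, 30, 10, 10], 20))
```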

-I    The input is in the Illumina 1.3+ read format (quality equals ASCII-64).
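"Quality equals ASCII-64" means each quality character encodes its score as the character code minus 64. The sketch below decodes such a string and, as an extra step not described in this manual, re-encodes it with the offset of 33 used by the Sanger FASTQ convention. The function names are hypothetical.

```python
from typing import List


def illumina13_qualities(qual_string: str) -> List[int]:
    """Decode Illumina 1.3+ quality characters: score = ASCII code - 64."""
    return [ord(c) - 64 for c in qual_string]


def to_sanger(qual_string: str) -> str:
    """Re-encode the same scores with the Sanger offset of 33."""
    return "".join(chr(q + 33) for q in illumina13_qualities(qual_string))


if __name__ == "__main__":
    print(illumina13_qualities("hhB@"))
    print(to_sanger("hhB@"))
```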

-B INT    Length of barcode starting from the 5'-end. When INT is positive, the barcode of each read will be trimmed before mapping and will be written at the BC SAM tag. For paired-end reads, the barcodes from both ends are concatenated. [0]
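The barcode handling described for -B can be sketched on plain sequence strings. This is an illustration only: the function names are hypothetical, and putting the first read's barcode before the second's when concatenating is an assumption, since the text does not state the order.

```python
from typing import Tuple


def split_barcode(seq: str, barcode_len: int) -> Tuple[str, str]:
    """Return (barcode, remaining read) for a barcode at the 5' end."""
    if barcode_len > 0:
        return seq[:barcode_len], seq[barcode_len:]
    return "", seq


def paired_barcode(seq1: str, seq2: str, barcode_len: int) -> str:
    """Concatenate the barcodes of both ends (read 1 first, by assumption)."""
    return split_barcode(seq1, barcode_len)[0] + split_barcode(seq2, barcode_len)[0]


if __name__ == "__main__":
    print(split_barcode("ACGTTTGA", 3))
    print(paired_barcode("ACGTT", "TGCAA", 2))
```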

-b    Specify that the input read sequence file is in the BAM format. For paired-end data, the two ends in a pair must be grouped together and options -1 or -2 are usually applied to specify which end should be mapped. Typical command lines for mapping paired-end data in the BAM format are:

bwa aln -b1 >

bwa aln -b2 >

bwa sampe >

-0    When -b is specified, only use single-end reads in mapping.

-1    When -b is specified, only use the first read in a read pair in mapping (skip single-end reads and the second reads).

-2    When -b is specified, only use the second read in a read pair in mapping.

samse

bwa samse [-n maxOcc] <> <> <> > <>

Generate alignments in the SAM format given single-end reads. Repetitive hits will be randomly chosen.

OPTIONS:

-n INT    Maximum number of alignments to output in the XA tag for reads paired properly. If a read has more than INT hits, the XA tag will not be written. [3]

-r STR    Specify the read group in a format like '@RG\tID:foo\tSM:bar'. [null]

sampe

bwa sampe [-a maxInsSize] [-o maxOcc] [-n maxHitPaired] [-N maxHitDis] [-P] <> <> <> <> <> > <>

Generate alignments in the SAM format given paired-end reads. Repetitive read pairs will be placed randomly.

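This article was updated on 2021-02-16 07:30. It was provided by the author and does not represent the position of this website. When reposting, please cite the source: https://www.bjmy2z.cn/gaokao/657841.html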
