BWA Alignment Software Manual

Author: 高考题库网

Source: https://www.bjmy2z.cn/gaokao

2021-02-10 03:44

Posted on February 10, 2021 (author: fishman)

Manual Reference Pages - bwa (1)

NAME

bwa - Burrows-Wheeler Alignment Tool

CONTENTS

Synopsis

Description

Commands And Options

Sam Alignment Format

Notes On Short-read Alignment

Alignment Accuracy

Estimating Insert Size Distribution

Memory Requirement

Speed

Notes On Long-read Alignment

See Also

Author

License And Citation

History

SYNOPSIS

bwa index -a bwtsw

bwa aln short_ > aln_

bwa samse aln_ short_ >

bwa sampe aln_ aln_ >

bwa bwasw long_ >

DESCRIPTION

BWA is a fast, lightweight tool that aligns relatively short sequences (queries) to a sequence database (target), such as the human reference genome. It implements two different algorithms, both based on the Burrows-Wheeler Transform (BWT). The first algorithm is designed for short queries up to ~200bp with a low error rate (<3%). It does gapped global alignment w.r.t. queries, supports paired-end reads, and is one of the fastest short read alignment algorithms to date while also visiting suboptimal hits. The second algorithm, BWA-SW, is designed for long reads with more errors. It performs heuristic Smith-Waterman-like alignment to find high-scoring local hits (and thus chimera). On low-error short queries, BWA-SW is slower and less accurate than the first algorithm, but on long queries, it is better.

For both algorithms, the database file in the FASTA format must first be indexed with the ‘index’ command, which typically takes a few hours. The first algorithm is implemented via the ‘aln’ command, which finds the suffix array (SA) coordinates of good hits of each individual read, and the ‘samse/sampe’ command, which converts SA coordinates to chromosomal coordinates and pairs reads (for ‘sampe’). The second algorithm is invoked by the ‘bwasw’ command. It works for single-end reads only.
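The order of the commands described above can be sketched in Python as a small helper that only builds the command lines as strings and does not run anything. All file names (ref.fa, the .fq, .sai and .sam names) are hypothetical placeholders, and the argument order (reference first, then the intermediate alignment files, then the reads) is an assumption rather than something this page spells out.

```python
from typing import List, Optional


def short_read_commands(ref: str, reads1: str, reads2: Optional[str] = None) -> List[str]:
    """Build (but do not run) command lines for the first algorithm.

    File names such as aln_1.sai and aln.sam are hypothetical placeholders.
    """
    cmds = [f"bwa index -a bwtsw {ref}"]                  # index the FASTA database once
    cmds.append(f"bwa aln {ref} {reads1} > aln_1.sai")    # SA coordinates for the first read file
    if reads2 is None:
        # single-end: convert SA coordinates to chromosomal coordinates
        cmds.append(f"bwa samse {ref} aln_1.sai {reads1} > aln.sam")
    else:
        # paired-end: align the second read file, then pair with sampe
        cmds.append(f"bwa aln {ref} {reads2} > aln_2.sai")
        cmds.append(f"bwa sampe {ref} aln_1.sai aln_2.sai {reads1} {reads2} > aln.sam")
    return cmds


def long_read_commands(ref: str, reads: str) -> List[str]:
    """Build command lines for the second algorithm (BWA-SW, single-end only)."""
    return [f"bwa index -a bwtsw {ref}", f"bwa bwasw {ref} {reads} > aln.sam"]


if __name__ == "__main__":
    for line in short_read_commands("ref.fa", "reads_1.fq", "reads_2.fq"):
        print(line)
```

The index step appears in both helpers only for completeness; in practice the index is built once per database and reused.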

COMMANDS AND OPTIONS

index

bwa index [-p prefix] [-a algoType] [-c] <>

Index database sequences in the FASTA format.

OPTIONS:

-c        Build color-space index. The input FASTA should be in nucleotide space.

-p STR    Prefix of the output database [same as db filename]

-a STR    Algorithm for constructing BWT index. Available options are:

          is      IS linear-time algorithm for constructing suffix array. It requires 5.37N memory where N is the size of the database. IS is moderately fast, but does not work with databases larger than 2GB. IS is the default algorithm due to its simplicity. The current code for the IS algorithm was reimplemented by Yuta Mori.

          bwtsw   Algorithm implemented in BWT-SW. This method works with the whole human genome, but it does not work with databases smaller than 10MB and it is usually slower than IS.
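The size limits quoted for the two construction algorithms can be turned into a small selection helper. This is a sketch under stated assumptions: "2GB" is read as 2 * 10^9 bytes, the 5.37N figure is read as bytes, and the lowercase value "is" for the -a option is assumed.

```python
TWO_GB = 2 * 10**9  # assumption: "2GB" interpreted as decimal gigabytes


def choose_index_algorithm(db_bytes: int) -> str:
    """Pick a value for 'bwa index -a' from the database size in bytes."""
    if db_bytes <= TWO_GB:
        return "is"      # default algorithm; covers databases under 10MB, where bwtsw fails
    return "bwtsw"       # required above the IS size limit


def is_memory_bytes(db_bytes: int) -> float:
    """Memory needed by the IS algorithm: 5.37N, N being the database size."""
    return 5.37 * db_bytes


if __name__ == "__main__":
    for size in (5_000_000, 500_000_000, 3_100_000_000):
        print(size, choose_index_algorithm(size))
```

For databases between 10MB and 2GB both algorithms apply; the sketch prefers IS there because the text describes bwtsw as usually slower.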

aln

bwa aln [-n maxDiff] [-o maxGapO] [-e maxGapE] [-d nDelTail] [-i nIndelEnd] [-k maxSeedDiff] [-l seedLen] [-t nThrds] [-cRN] [-M misMsc] [-O gapOsc] [-E gapEsc] [-q trimQual] <> <> > <>

Find the SA coordinates of the input reads. Maximum maxSeedDiff differences are allowed in the first seedLen subsequence and maximum maxDiff differences are allowed in the whole sequence.

OPTIONS:

-n NUM    Maximum edit distance if the value is INT, or the fraction of missing alignments given 2% uniform base error rate if FLOAT. In the latter case, the maximum edit distance is automatically chosen for different read lengths. [0.04]

-o INT    Maximum number of gap opens [1]

-e INT    Maximum number of gap extensions, -1 for k-difference mode (disallowing long gaps) [-1]

-d INT    Disallow a long deletion within INT bp towards the 3’-end [16]

-i INT    Disallow an indel within INT bp towards the ends [5]

-l INT    Take the first INT subsequence as seed. If INT is larger than the query sequence, seeding will be disabled. For long reads, this option typically ranges from 25 to 35 for ‘-k 2’. [inf]

-k INT    Maximum edit distance in the seed [2]

-t INT    Number of threads (multi-threading mode) [1]

-M INT    Mismatch penalty. BWA will not search for suboptimal hits with a score lower than (bestScore-misMsc). [3]

-O INT    Gap open penalty [11]

-E INT    Gap extension penalty [4]

-R INT    Proceed with suboptimal alignments if there are no more than INT equally best hits. This option only affects paired-end mapping. Increasing this threshold helps to improve the pairing accuracy at the cost of speed, especially for short reads (~32bp).

-c        Reverse query but not complement it, which is required for alignment in the color space.

-N        Disable iterative search. All hits with no more than maxDiff differences will be found. This mode is much slower than the default.

-q INT    Parameter for read trimming. BWA trims a read down to argmax_x{sum_{i=x+1}^l(INT-q_i)} if q_l<INT where l is the original read length. [0]

-I        The input is in the Illumina 1.3+ read format (quality equals ASCII-64).

-B INT    Length of barcode starting from the 5’-end. When INT is positive, the barcode of each read will be trimmed before mapping and will be written at the BC SAM tag. For paired-end reads, the barcodes from both ends are concatenated. [0]

-b        Specify that the input read sequence file is in the BAM format. For paired-end data, two ends in a pair must be grouped together and options -1 or -2 are usually applied to specify which end should be mapped. Typical command lines for mapping paired-end data in the BAM format are:

          bwa aln -b1 >

          bwa aln -b2 >

          bwa sampe >

-0        When -b is specified, only use single-end reads in mapping.

-1        When -b is specified, only use the first read in a read pair in mapping (skip single-end reads and the second reads).

-2        When -b is specified, only use the second read in a read pair in mapping.
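Two of the options above, -n with a FLOAT value and -q, describe small computations that can be sketched in Python. Both functions are illustrative readings of the text, not BWA's source code. For -n, the sketch assumes the number of errors in a read follows a Poisson distribution with mean (read length * 0.02) and picks the smallest edit distance whose tail probability drops below the given fraction. For -q, the sketch evaluates the argmax formula literally, assumes that ties keep the longer read, and ignores any minimum read length BWA may enforce.

```python
import math
from typing import Sequence


def max_diff_for_length(read_len: int, missing_frac: float = 0.04,
                        err_rate: float = 0.02) -> int:
    """Smallest k with P(errors > k) < missing_frac under a Poisson model (assumption)."""
    lam = read_len * err_rate
    term = math.exp(-lam)          # P(X = 0)
    cumulative = term              # P(X <= k)
    k = 0
    # the k < read_len guard avoids looping forever if exp(-lam) underflows to 0
    while 1.0 - cumulative >= missing_frac and k < read_len:
        k += 1
        term *= lam / k            # P(X = k) from P(X = k - 1)
        cumulative += term
    return k


def trimmed_length(quals: Sequence[int], threshold: int) -> int:
    """Length x maximizing sum_{i=x+1}^{l} (threshold - q_i); no trimming unless q_l < threshold."""
    l = len(quals)
    if l == 0 or quals[-1] >= threshold:
        return l
    best_x, best_sum, running = l, 0, 0
    for x in range(l - 1, -1, -1):
        running += threshold - quals[x]   # quals[x] is q_{x+1} in the 1-based formula
        if running > best_sum:            # strict '>' keeps the longer read on ties
            best_sum, best_x = running, x
    return best_x


if __name__ == "__main__":
    for length in (32, 100):
        print(length, max_diff_for_length(length))
    print(trimmed_length([30, 30, 30, 10, 5], 20))
```

With the default threshold of 0 the trimming function returns the full read length, which matches the idea that -q 0 disables trimming.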

samse

bwa samse [-n maxOcc] <> <> <> > <>

Generate alignments in the SAM format given single-end reads. Repetitive hits will be randomly chosen.

OPTIONS:

-n INT    Maximum number of alignments to output in the XA tag for reads paired properly. If a read has more than INT hits, the XA tag will not be written. [3]

-r STR    Specify the read group in a format like ‘@RG\tID:foo\tSM:bar’. [null]

sampe

bwa sampe [-a maxInsSize] [-o maxOcc] [-n maxHitPaired] [-N maxHitDis] [-P] <> > <>

Generate alignments in the SAM format given paired-end reads. Repetitive read pairs will be placed randomly.

OPTIONS:

-a INT    Maximum insert size for a read pair to be considered being mapped properly. Since 0.4.5, this option is only used when there are not enough good alignments to infer the distribution of insert sizes. [500]

-o INT    Maximum occurrences of a read for pairing. A read with more occurrences will be treated as a single-end read. Reducing this parameter speeds up pairing. [100000]

-P        Load the entire FM-index into memory to reduce disk operations (base-space reads only). With this option, at least 1.25N bytes of memory are required, where N is the length of the genome.

-n INT    Maximum number of alignments to output in the XA tag for reads paired properly. If a read has more than INT hits, the XA tag will not be written. [3]

-N INT    Maximum number of alignments to output in the XA tag for discordant read pairs (excluding singletons). If a read has more than INT hits, the XA tag will not be written. [10]

-r STR    Specify the read group in a format like ‘@RG\tID:foo\tSM:bar’. [null]
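The memory figure given for -P is simple arithmetic and can be checked before deciding whether to pass the flag. The helper names and the idea of comparing against an available-memory figure are illustrative assumptions; only the 1.25N lower bound comes from the text.

```python
def fm_index_min_bytes(genome_len: int) -> float:
    """Lower bound on memory for 'sampe -P': 1.25 bytes per base of the genome."""
    return 1.25 * genome_len


def can_preload_index(genome_len: int, available_bytes: int) -> bool:
    """True if the stated minimum fits in the available memory (hypothetical check)."""
    return available_bytes >= fm_index_min_bytes(genome_len)


if __name__ == "__main__":
    genome = 3_000_000_000  # roughly human-sized genome, used only as an example
    print(fm_index_min_bytes(genome) / 1e9, "GB minimum")
    print(can_preload_index(genome, 8 * 10**9))
```

Since 1.25N is stated as a minimum, real usage may be higher, so the check is best treated as a necessary condition rather than a guarantee.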

bwasw

bwa bwasw [-a matchScore] [-b mmPen] [-q gapOpenPen] [-r gapExtPen] [-t nThreads] [-w bandWidth] [-T thres] [-s hspIntv] [-z zBest] [-N nHspRev] [-c thresCoef] <> <>

Align query sequences in the <> file.

OPTIONS:

-a INT    Score of a match [1]

-b INT    Mismatch penalty [3]

-q INT    Gap open penalty [5]

-r INT    Gap extension penalty. The penalty for a contiguous gap of size k is q+k*r. [2]

-t INT    Number of threads in the multi-threading mode [1]

-w INT    Band width in the banded alignment [33]

-T INT    Minimum score threshold divided by a [37]

-c FLOAT  Coefficient for threshold adjustment according to query length. Given an l-long query, the threshold for a hit to be retained is a*max{T,c*log(l)}. [5.5]

-z INT    Z-best heuristics. Higher -z increases accuracy at the cost of speed. [1]

-s INT    Maximum SA interval size for initiating a seed. Higher -s increases accuracy at the cost of speed. [3]

-N INT    Minimum number of seeds supporting the resultant alignment to skip reverse alignment. [5]
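The two formulas in the bwasw options, the gap penalty q+k*r and the hit threshold a*max{T,c*log(l)}, can be evaluated directly. The sketch below uses the default values listed above and assumes that log is the natural logarithm, which the text does not state.

```python
import math


def gap_penalty(k: int, q: int = 5, r: int = 2) -> int:
    """Penalty for a contiguous gap of size k: q + k*r."""
    return q + k * r


def hit_threshold(query_len: int, a: int = 1, T: int = 37, c: float = 5.5) -> float:
    """Score threshold a*max{T, c*log(l)}; natural log is an assumption."""
    return a * max(T, c * math.log(query_len))


if __name__ == "__main__":
    print(gap_penalty(3))
    for length in (100, 1000, 10000):
        print(length, round(hit_threshold(length), 2))
```

Under these assumptions the length-dependent term only exceeds the default T of 37 for queries of roughly a thousand bases or more, so shorter queries are governed by -T alone.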

SAM ALIGNMENT FORMAT

The output of the ‘aln’ command is binary and designed for BWA use only. BWA outputs the final alignment in the SAM (Sequence Alignment/Map) format. Each line consists of:

Col   Field   Description

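This article was last updated on 2021-02-10 03:44 and was provided by the author; it does not represent the position of this website. When reposting, please credit the source: https://www.bjmy2z.cn/gaokao/626115.html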
