BWA Alignment Software Manual

Author: 高考题库网
Source: https://www.bjmy2z.cn/gaokao
2021-02-10 03:44
Posted on February 10, 2021 (author: fishman)


Manual Reference Pages - bwa (1)




NAME


bwa - Burrows-Wheeler Alignment Tool


CONTENTS


Synopsis



Description



Commands And Options



Sam Alignment Format



Notes On Short-read Alignment



Alignment Accuracy



Estimating Insert Size Distribution



Memory Requirement



Speed



Notes On Long-read Alignment



See Also



Author



License And Citation



History



SYNOPSIS


bwa index -a bwtsw


bwa aln short_ > aln_


bwa samse aln_ short_ >


bwa sampe aln_ aln_ >


bwa bwasw long_ >


DESCRIPTION


BWA is a fast, lightweight tool that aligns relatively short sequences (queries) to a sequence database (target), such as the human reference genome. It implements two different algorithms, both based on the Burrows-Wheeler Transform (BWT). The first algorithm is designed for short queries up to ~200bp with low error rate (<3%). It does gapped global alignment w.r.t. queries, supports paired-end reads, and is one of the fastest short read alignment algorithms to date while also visiting suboptimal hits. The second algorithm, BWA-SW, is designed for long reads with more errors. It performs heuristic Smith-Waterman-like alignment to find high-scoring local hits (and thus chimera). On low-error short queries, BWA-SW is slower and less accurate than the first algorithm, but on long queries, it is better.


For both algorithms, the database file in the FASTA format must first be indexed with the ‘index’ command, which typically takes a few hours. The first algorithm is implemented via the ‘aln’ command, which finds the suffix array (SA) coordinates of good hits of each individual read, and the ‘samse/sampe’ command, which converts SA coordinates to chromosomal coordinates and pairs reads (for ‘sampe’). The second algorithm is invoked by the ‘bwasw’ command. It works for single-end reads only.
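The short-read workflow described above (index once, then ‘aln’ followed by ‘samse’) can be sketched as a small Python helper that only assembles the command lines and does not run them. The file names `ref.fa`, `reads.fq`, `reads.sai` and `out.sam` are hypothetical placeholders, and the argument order is an assumption based on common BWA usage rather than something this page spells out in full.

```python
import shlex


def short_read_commands(ref, reads, sai, sam):
    """Return shell command lines for single-end short-read alignment.

    All four file names are supplied by the caller; nothing is executed here.
    """
    q = shlex.quote
    return [
        # Index the FASTA database (done once per reference).
        f"bwa index -a bwtsw {q(ref)}",
        # Find SA coordinates of good hits; the output is BWA's binary format.
        f"bwa aln {q(ref)} {q(reads)} > {q(sai)}",
        # Convert SA coordinates to chromosomal coordinates and write SAM.
        f"bwa samse {q(ref)} {q(sai)} {q(reads)} > {q(sam)}",
    ]


if __name__ == "__main__":
    for line in short_read_commands("ref.fa", "reads.fq", "reads.sai", "out.sam"):
        print(line)
```

Keeping the commands as data makes it easy to log or review them before handing them to a shell.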


COMMANDS AND OPTIONS


index



bwa index [-p prefix] [-a algoType] [-c] <>


Index database sequences in the FASTA format.


OPTIONS:





-c        Build color-space index. The input FASTA should be in nucleotide space.

-p STR    Prefix of the output database [same as db filename]

-a STR    Algorithm for constructing BWT index. Available options are:

          is      IS linear-time algorithm for constructing suffix array. It requires 5.37N memory, where N is the size of the database. IS is moderately fast, but does not work with databases larger than 2GB. IS is the default algorithm due to its simplicity. The current code for the IS algorithm was reimplemented by Yuta Mori.

          bwtsw   Algorithm implemented in BWT-SW. This method works with the whole human genome, but it does not work with databases smaller than 10MB and it is usually slower than IS.
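The size limits quoted for the two index algorithms can be turned into a simple selection rule. The sketch below is illustrative only: the function names are hypothetical, and treating "GB" as 2^30 bytes is an assumption, since the text does not say which unit convention is meant.

```python
GB = 1024 ** 3  # assumption: binary gigabyte


def choose_index_algorithm(db_size_bytes):
    """Suggest a value for `bwa index -a` from the database size."""
    if db_size_bytes > 2 * GB:
        # IS does not work with databases larger than 2GB.
        return "bwtsw"
    # Below 2GB, IS is the default; bwtsw is usually slower and does not
    # work at all for databases smaller than 10MB.
    return "is"


def is_memory_estimate(db_size_bytes):
    """Approximate memory needed by IS: 5.37N for a database of size N."""
    return 5.37 * db_size_bytes
```

For example, a 100MB database would be indexed with `is` and would need roughly 537MB of memory under this estimate.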






aln



bwa aln [-n maxDiff] [-o maxGapO] [-e maxGapE] [-d nDelTail] [-i nIndelEnd] [-k maxSeedDiff] [-l seedLen] [-t nThrds] [-cRN] [-M misMsc] [-O gapOsc] [-E gapEsc] [-q trimQual] <> <> > <>


Find the SA coordinates of the input reads. Maximum maxSeedDiff differences are allowed in the first seedLen subsequence and maximum maxDiff differences are allowed in the whole sequence.
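The two limits can be read as a pair of checks on a candidate alignment. The following is a simplified sketch, assuming each difference is recorded as a single 0-based position in the read; the function name and this representation are hypothetical and are not how BWA stores hits internally.

```python
def within_limits(diff_positions, seed_len, max_seed_diff, max_diff):
    """Check the seed and whole-read difference limits.

    diff_positions: 0-based read positions where the alignment differs
    from the reference (mismatches or gaps, one entry per difference).
    """
    # Differences falling inside the first seed_len bases of the read.
    in_seed = sum(1 for p in diff_positions if p < seed_len)
    return in_seed <= max_seed_diff and len(diff_positions) <= max_diff
```

With a 32bp seed, at most 2 seed differences and at most 3 overall, a read differing at positions 3 and 40 passes, while one differing at positions 1, 2 and 3 fails the seed check.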


OPTIONS:





-n NUM    Maximum edit distance if the value is INT, or the fraction of missing alignments given 2% uniform base error rate if FLOAT. In the latter case, the maximum edit distance is automatically chosen for different read lengths. [0.04]

-o INT    Maximum number of gap opens [1]

-e INT    Maximum number of gap extensions, -1 for k-difference mode (disallowing long gaps) [-1]

-d INT    Disallow a long deletion within INT bp towards the 3’-end [16]

-i INT    Disallow an indel within INT bp towards the ends [5]

-l INT    Take the first INT subsequence as seed. If INT is larger than the query sequence, seeding will be disabled. For long reads, this option typically ranges from 25 to 35 for ‘-k 2’. [inf]

-k INT    Maximum edit distance in the seed [2]

-t INT    Number of threads (multi-threading mode) [1]

-M INT    Mismatch penalty. BWA will not search for suboptimal hits with a score lower than (bestScore-misMsc). [3]

-O INT    Gap open penalty [11]

-E INT    Gap extension penalty [4]


-R INT    Proceed with suboptimal alignments if there are no more than INT equally best hits. This option only affects paired-end mapping. Increasing this threshold helps to improve the pairing accuracy at the cost of speed, especially for short reads (~32bp).

-c        Reverse query but not complement it, which is required for alignment in the color space.

-N        Disable iterative search. All hits with no more than maxDiff differences will be found. This mode is much slower than the default.

-q INT    Parameter for read trimming. BWA trims a read down to argmax_x{sum_{i=x+1}^l(INT-q_i)} if q_l<INT, where l is the original read length. [0]


-I        The input is in the Illumina 1.3+ read format (quality equals ASCII-64).

-B INT    Length of barcode starting from the 5’-end. When INT is positive, the barcode of each read will be trimmed before mapping and will be written at the BC SAM tag. For paired-end reads, the barcodes from both ends are concatenated. [0]

-b        Specify that the input read sequence file is in the BAM format. For paired-end data, two ends in a pair must be grouped together and options -1 or -2 are usually applied to specify which end should be mapped. Typical command lines for mapping paired-end data in the BAM format are:

          bwa aln -b1 >
          bwa aln -b2 >
          bwa sampe >

-0        When -b is specified, only use single-end reads in mapping.

-1        When -b is specified, only use the first read in a read pair in mapping (skip single-end reads and the second reads).

-2        When -b is specified, only use the second read in a read pair in mapping.
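The trimming rule given for -q, argmax_x{sum_{i=x+1}^l(INT-q_i)}, can be evaluated with one pass from the 3’ end of the read. The sketch below is a direct reading of that formula and not BWA's own code: it assumes trimming applies only when the last base quality is below INT, takes qualities as plain integers, and breaks ties in favour of the longer read. The real implementation may differ in such details.

```python
def trim_length(quals, threshold):
    """Return the read length x kept after quality trimming.

    quals: per-base quality values q_1..q_l as integers.
    threshold: the INT given to -q.
    """
    l = len(quals)
    # Assumed condition: no trimming unless the last base is below threshold.
    if l == 0 or quals[-1] >= threshold:
        return l
    best_x, best_sum, running = l, 0, 0
    # Walk from the 3' end; after processing index x, `running` equals
    # sum_{i=x+1}^{l} (threshold - q_i) in the 1-based notation above.
    for x in range(l - 1, -1, -1):
        running += threshold - quals[x]
        if running > best_sum:
            best_sum, best_x = running, x
    return best_x
```

For qualities 30, 30, 30, 10, 5 and a threshold of 20, the partial sums from the 3’ end are 15, 25, 15, 5 and -5, so the maximum is reached at x = 3 and the last two bases are removed.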




samse



bwa samse [-n maxOcc] <> <> <> > <>


Generate alignments in the SAM format given single-end reads. Repetitive hits will be randomly chosen.


OPTIONS:





-n INT    Maximum number of alignments to output in the XA tag for reads paired properly. If a read has more than INT hits, the XA tag will not be written. [3]

-r STR    Specify the read group in a format like ‘@RG\tID:foo\tSM:bar’. [null]
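A read group string for -r is usually written with a literal backslash followed by `t` between fields, which the program then interprets as a tab. The helper below is a hypothetical convenience for building such a string; the function name is not part of BWA, and the escape handling is an assumption about how the option is parsed.

```python
def read_group_arg(rg_id, sample):
    """Build an '@RG' string with literal backslash-t separators."""
    sep = "\\t"  # two characters: a backslash and the letter t
    return "@RG" + sep + "ID:" + rg_id + sep + "SM:" + sample


if __name__ == "__main__":
    # Prints the string exactly as it would be passed after -r.
    print(read_group_arg("foo", "bar"))
```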





sampe



bwa sampe [-a maxInsSize] [-o maxOcc] [-n maxHitPaired] [-N maxHitDis] [-P] <> <> <> <> <> > <>


Generate alignments in the SAM format given paired-end reads. Repetitive read pairs will be placed randomly.


OPTIONS:





-a INT    Maximum insert size for a read pair to be considered as being mapped properly. Since 0.4.5, this option is only used when there are not enough good alignments to infer the distribution of insert sizes. [500]

-o INT    Maximum occurrences of a read for pairing. A read with more occurrences will be treated as a single-end read. Reducing this parameter helps faster pairing. [100000]

-P        Load the entire FM-index into memory to reduce disk operations (base-space reads only). With this option, at least 1.25N bytes of memory are required, where N is the length of the genome.

-n INT    Maximum number of alignments to output in the XA tag for reads paired properly. If a read has more than INT hits, the XA tag will not be written. [3]

-N INT    Maximum number of alignments to output in the XA tag for discordant read pairs (excluding singletons). If a read has more than INT hits, the XA tag will not be written. [10]

-r STR    Specify the read group in a format like ‘@RG\tID:foo\tSM:bar’. [null]
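The memory requirement stated for -P (at least 1.25N bytes for a genome of length N) is easy to estimate up front. This is a back-of-the-envelope sketch; the function name and the example genome length of 3.1 billion bases are hypothetical, chosen only for illustration.

```python
def fm_index_memory_bytes(genome_length):
    """Lower bound on memory for `sampe -P`: 1.25 bytes per base."""
    return 1.25 * genome_length


if __name__ == "__main__":
    n = 3_100_000_000  # hypothetical genome length in bases
    gib = fm_index_memory_bytes(n) / 1024 ** 3
    print(f"at least {gib:.2f} GiB")
```

Checking this figure against available RAM before enabling -P avoids swapping, which would defeat the purpose of loading the index into memory.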





bwasw



bwa bwasw [-a matchScore] [-b mmPen] [-q gapOpenPen] [-r gapExtPen] [-t nThreads] [-w bandWidth] [-T thres] [-s hspIntv] [-z zBest] [-N nHspRev] [-c thresCoef] <> <>


Align query sequences in the <> file.


OPTIONS:





-a INT    Score of a match [1]

-b INT    Mismatch penalty [3]

-q INT    Gap open penalty [5]

-r INT    Gap extension penalty. The penalty for a contiguous gap of size k is q+k*r. [2]

-t INT    Number of threads in the multi-threading mode [1]

-w INT    Band width in the banded alignment [33]

-T INT    Minimum score threshold divided by a [37]

-c FLOAT  Coefficient for threshold adjustment according to query length. Given an l-long query, the threshold for a hit to be retained is a*max{T,c*log(l)}. [5.5]

-z INT    Z-best heuristics. Higher -z increases accuracy at the cost of speed. [1]

-s INT    Maximum SA interval size for initiating a seed. Higher -s increases accuracy at the cost of speed. [3]

-N INT    Minimum number of seeds supporting the resultant alignment to skip reverse alignment. [5]
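The two formulas in the option list, the gap penalty q+k*r and the hit threshold a*max{T,c*log(l)}, can be worked through numerically with the default values shown in brackets. This is a minimal sketch: the function names are hypothetical, and using the natural logarithm is an assumption, since the text does not state the base.

```python
import math


def gap_penalty(k, q=5, r=2):
    """Penalty for a contiguous gap of size k: q + k*r."""
    return q + k * r


def hit_threshold(query_len, a=1, T=37, c=5.5):
    """Score a hit must reach to be retained: a * max(T, c*log(l)).

    Assumes log is the natural logarithm.
    """
    return a * max(T, c * math.log(query_len))


if __name__ == "__main__":
    print(gap_penalty(3))         # a 3-base gap with the defaults
    print(hit_threshold(100))     # short query: the fixed T dominates
    print(hit_threshold(10000))   # long query: the length term dominates
```

With the defaults, the length-dependent term only exceeds T=37 once the query is longer than roughly 830 bases, so -c mainly matters for long reads.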






SAM ALIGNMENT FORMAT


The output of the ‘aln’ command is binary and designed for BWA use only. BWA outputs the final alignment in the SAM (Sequence Alignment/Map) format. Each line consists of:


Col    Field    Description
--------------------------------



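This article was updated on 2021-02-10 03:44. It was provided by the author and does not represent the position of this website. Please credit the source when reposting: https://www.bjmy2z.cn/gaokao/626115.html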
