
BWA Operation Manual

Author: 高考题库网
Source: https://www.bjmy2z.cn/gaokao
2021-02-16 07:30

Posted 16 February 2021 (author: 下雨天)


Manual Reference Pages - bwa



1. NAME


bwa - Burrows-Wheeler Alignment Tool



2. CONTENTS


(1) Synopsis
(2) Description
(3) Commands And Options
(4) SAM Alignment Format
(5) Notes On Short-read Alignment
    Alignment Accuracy
    Estimating Insert Size Distribution
    Memory Requirement
    Speed
(6) Changes In BWA-0.6
(7) License And Citation



(1) SYNOPSIS


bwa index


bwa mem >


bwa mem >


bwa aln short_ > aln_


bwa samse aln_ short_ >


bwa sampe aln_ aln_ >


bwa bwasw long_ >



(2) DESCRIPTION


BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are for longer sequences ranging from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, which is the latest, is generally recommended for high-quality queries as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads.



For all the algorithms, BWA first needs to construct the FM-index for the reference genome (the index command). Alignment algorithms are invoked with different sub-commands: aln/samse/sampe for BWA-backtrack, bwasw for BWA-SW and mem for the BWA-MEM algorithm.
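As a rough illustration of the guidance above, the following Python sketch maps a typical read length to the sub-commands named in this section. The function name and the use of 70bp as a hard cut-off are assumptions made for this example only; BWA itself does not pick an algorithm for you, and reads of 70-100bp can be handled by either family.

```python
def suggest_subcommands(read_len_bp):
    """Return the bwa sub-commands suggested by the description above.

    Hypothetical helper: the 70bp threshold comes from the statement that
    BWA-MEM and BWA-SW target 70bp-1Mbp queries and that BWA-MEM performs
    better than BWA-backtrack for 70-100bp Illumina reads.
    """
    if read_len_bp <= 0:
        raise ValueError("read length must be positive")
    if read_len_bp < 70:
        # Short Illumina reads: BWA-backtrack (aln, then samse or sampe).
        return ["aln", "samse/sampe"]
    # 70bp and longer: BWA-MEM is the generally recommended algorithm.
    return ["mem"]


print(suggest_subcommands(36))
print(suggest_subcommands(150))
```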



(3) COMMANDS AND OPTIONS


index



bwa index [-p prefix] [-a algoType] <>


Index database sequences in the FASTA format.



OPTIONS:


-p STR  Prefix of the output database [same as db filename]

-a STR  Algorithm for constructing BWT index. Available options are:

        is      IS linear-time algorithm for constructing suffix array. It requires 5.37N memory where N is the size of the database. IS is moderately fast, but does not work with databases larger than 2GB. IS is the default algorithm due to its simplicity. The current code for the IS algorithm was reimplemented by Yuta Mori.

        bwtsw   Algorithm implemented in BWT-SW. This method works with the whole human genome.
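The choice between the two -a values can be sketched in Python as below. This is only an illustration under stated assumptions: the helper names are hypothetical, "2GB" is read as 2 GiB, and N is treated as a plain number in whatever unit the database size is measured in, since the text above does not fix the unit.

```python
TWO_GB = 2 * 1024**3  # assumption: "2GB" interpreted as 2 GiB


def choose_index_algorithm(db_size):
    """Pick a value for `bwa index -a` from the database size N."""
    # IS does not work with databases larger than 2GB; bwtsw works
    # with the whole human genome.
    return "is" if db_size <= TWO_GB else "bwtsw"


def is_memory_estimate(db_size):
    """Memory needed by the IS algorithm: 5.37N for a database of size N."""
    return 5.37 * db_size


print(choose_index_algorithm(100_000_000))
print(choose_index_algorithm(3_100_000_000))
```

The memory estimate is returned in the same unit as the input, so a database of size N = 1000 yields roughly 5370.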



mem



bwa mem [-aCHMpP] [-t nThreads] [-k minSeedLen] [-w bandWidth] [-d zDropoff] [-r seedSplitRatio] [-c maxOcc] [-A matchScore] [-B mmPenalty] [-O gapOpenPen] [-E gapExtPen] [-L clipPen] [-U unpairPen] [-R RGline] [-v verboseLevel] []


Align 70bp-1Mbp query sequences with the BWA-MEM algorithm. Briefly, the algorithm works by seeding alignments with maximal exact matches (MEMs) and then extending seeds with the affine-gap Smith-Waterman algorithm (SW).



If the second query file is absent and option -p is not set, this command regards the input reads as single-end. If the second file is present, this command assumes the i-th read in the first file and the i-th read in the second file constitute a read pair. If -p is used, the command assumes the 2i-th and the (2i+1)-th reads in the first file constitute a read pair (such an input file is said to be interleaved). In this case, the second file is ignored. In the paired-end mode, the mem command will infer the read orientation and the insert size distribution from a batch of reads.
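The three input modes described above (single-end, two files, interleaved) can be illustrated with a short Python sketch. The function `pair_reads` and its arguments are hypothetical and stand in for the first and second read files; the sketch shows only how reads are grouped, not how BWA parses FASTA/Q.

```python
def pair_reads(reads, mates=None, interleaved=False):
    """Group reads into single-end or paired-end records.

    reads, mates: lists of read records (any objects), standing in for
    the first and second query files. interleaved mirrors option -p.
    """
    if interleaved:
        # -p: the 2i-th and (2i+1)-th reads of the first file form a
        # pair; the second file is ignored.
        if len(reads) % 2 != 0:
            raise ValueError("interleaved input needs an even number of reads")
        return [(reads[i], reads[i + 1]) for i in range(0, len(reads), 2)]
    if mates is None:
        # No second file and no -p: single-end reads.
        return [(r,) for r in reads]
    if len(reads) != len(mates):
        raise ValueError("both files must hold the same number of reads")
    # The i-th read of the first file pairs with the i-th of the second.
    return list(zip(reads, mates))


print(pair_reads(["a1", "a2", "b1", "b2"], interleaved=True))
```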



The BWA-MEM algorithm performs local alignment. It may produce multiple primary alignments for different parts of a query sequence. This is a crucial feature for long sequences. However, some tools such as Picard’s markDuplicates do not work with split alignments. One may consider using option -M to flag shorter split hits as secondary.



OPTIONS:


-t INT  Number of threads [1]

-k INT  Minimum seed length. Matches shorter than INT will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20. [19]


-w INT  Band width. Essentially, gaps longer than INT will not be found. Note that the maximum gap length is also affected by the scoring matrix and the hit length, not solely determined by this option. [100]


-d INT  Off-diagonal X-dropoff (Z-dropoff). Stop extension when the difference between the best and the current extension score is above |i-j|*A+INT, where i and j are the current positions of the query and reference, respectively, and A is the matching score. Z-dropoff is similar to BLAST’s X-dropoff except that it doesn’t penalize gaps in one of the sequences in the alignment. Z-dropoff not only avoids unnecessary extension, but also reduces poor alignments inside a long good alignment. [100]


-r FLOAT  Trigger re-seeding for a MEM longer than minSeedLen*FLOAT. This is a key heuristic parameter for tuning the performance. A larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy. [1.5]


-c INT  Discard a MEM if it has more than INT occurrences in the genome. This is an insensitive parameter. [10000]


-P      In the paired-end mode, perform SW to rescue missing hits only but do not try to find hits that fit a proper pair.


-A INT  Matching score. [1]

-B INT  Mismatch penalty. The sequence error rate is approximately: {.75 * exp[-log(4) * B/A]}. [4]

-O INT  Gap open penalty. [6]

-E INT  Gap extension penalty. A gap of length k costs O + k*E (i.e. -O is for opening a zero-length gap). [1]


-L INT  Clipping penalty. When performing SW extension, BWA-MEM keeps track of the best score reaching the end of query. If this score is larger than the best SW score minus the clipping penalty, clipping will not be applied. Note that in this case, the SAM AS tag reports the best SW score; clipping penalty is not deducted. [5]


-U INT  Penalty for an unpaired read pair. BWA-MEM scores an unpaired read pair as scoreRead1+scoreRead2-INT and scores a paired one as scoreRead1+scoreRead2-insertPenalty. It compares these two scores to determine whether we should force pairing. [9]


-p      Assume the first input query file is interleaved paired-end FASTA/Q. See the command description for details.


-R STR  Complete read group header line. ’\t’ can be used in STR and will be converted to a TAB in the output SAM. The read group ID will be attached to every read in the output. An example is ’@RG\tID:foo\tSM:bar’. [null]



-T INT  Don’t output alignments with a score lower than INT. This option only affects output. [30]

-a      Output all found alignments for single-end or unpaired paired-end reads. These alignments will be flagged as secondary alignments.


-C      Append FASTA/Q comment to SAM output. This option can be used to transfer read meta information (e.g. barcode) to the SAM output. Note that the FASTA/Q comment (the string after a space in the header line) must conform to the SAM spec (e.g. BC:Z:CGTAC). Malformed comments lead to incorrect SAM output.


-H      Use hard clipping ’H’ in the SAM output. This option may dramatically reduce the redundancy of output when mapping long contig or BAC sequences.


-M      Mark shorter split hits as secondary (for Picard compatibility).


-v INT  Control the verbose level of the output. This option has not been fully supported throughout BWA. Ideally, a value of 0 disables all output to stderr; 1 outputs errors only; 2 outputs warnings and errors; 3 outputs all normal messages; 4 or higher is for debugging. When this option takes value 4, the output is not SAM. [3]
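Several of the mem options above are defined by small formulas (-O/-E, -B, -r, -d, -L and -U). The Python sketch below restates them as functions so that the arithmetic is easy to check. All function and parameter names are hypothetical; the default values are the bracketed defaults from the option list. insertPenalty is not defined in this section, so it is taken as an input, and the behaviour when two scores tie is an assumption of the sketch.

```python
import math


def gap_cost(k, gap_open=6, gap_ext=1):
    """-O/-E: a gap of length k costs O + k*E."""
    return gap_open + k * gap_ext


def approx_error_rate(match=1, mismatch=4):
    """-B: error rate is approximately .75 * exp[-log(4) * B/A]."""
    return 0.75 * math.exp(-math.log(4) * mismatch / match)


def reseed_triggered(mem_len, min_seed_len=19, split_ratio=1.5):
    """-r: re-seeding is triggered for a MEM longer than minSeedLen*FLOAT."""
    return mem_len > min_seed_len * split_ratio


def zdrop_stop(best, current, i, j, match=1, zdrop=100):
    """-d: stop extension when best - current is above |i-j|*A + INT."""
    return best - current > abs(i - j) * match + zdrop


def keep_unclipped(end_score, best_sw, clip_pen=5):
    """-L: no clipping if the score reaching the end of the query is
    larger than the best SW score minus the clipping penalty."""
    return end_score > best_sw - clip_pen


def prefer_unpaired(score1, score2, insert_penalty, unpair_pen=9):
    """-U: compare the unpaired score with the paired score.

    Returns True when the unpaired score is strictly higher (tie
    handling is an assumption of this sketch).
    """
    unpaired = score1 + score2 - unpair_pen
    paired = score1 + score2 - insert_penalty
    return unpaired > paired


print(gap_cost(3))
print(approx_error_rate())
```

With the defaults A=1 and B=4, the error-rate expression reduces to 0.75 * 4^-4 = 0.75/256, or about 0.3%.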



aln



bwa aln [-n maxDiff] [-o maxGapO] [-e maxGapE] [-d nDelTail] [-i nIndelEnd] [-k maxSeedDiff] [-l seedLen] [-t nThrds] [-cRN] [-M misMsc] [-O gapOsc] [-E gapEsc] [-q trimQual] <> <> > <>


Find the SA coordinates of the input reads. At most maxSeedDiff differences are allowed in the first seedLen bases, and at most maxDiff differences are allowed in the whole sequence.


OPTIONS:


-n NUM  Maximum edit distance if the value is INT, or the fraction of missing alignments given 2% uniform base error rate if FLOAT. In the latter case, the maximum edit distance is automatically chosen for different read lengths. [0.04]


-o INT  Maximum number of gap opens [1]

-e INT  Maximum number of gap extensions, -1 for k-difference mode (disallowing long gaps) [-1]

-d INT  Disallow a long deletion within INT bp towards the 3’-end [16]

-i INT  Disallow an indel within INT bp towards the ends [5]


-l INT  Take the first INT subsequence as seed. If INT is larger than the query sequence, seeding will be disabled. For long reads, this option is typically set between 25 and 35 for ‘-k 2’. [inf]



-k INT  Maximum edit distance in the seed [2]

-t INT  Number of threads (multi-threading mode) [1]


-M INT  Mismatch penalty. BWA will not search for suboptimal hits with a score lower than (bestScore-misMsc). [3]

-O INT  Gap open penalty [11]

-E INT  Gap extension penalty [4]


-R INT  Proceed with suboptimal alignments if there are no more than INT equally best hits. This option only affects paired-end mapping. Increasing this threshold helps to improve the pairing accuracy at the cost of speed, especially for short reads (~32bp).


-c      Reverse query but not complement it, which is required for alignment in the color space. (Disabled since 0.6.x)

-N      Disable iterative search. All hits with no more than maxDiff differences will be found. This mode is much slower than the default.


-q INT  Parameter for read trimming. BWA trims a read down to argmax_x{sum_{i=x+1}^l(INT-q_i)} if q_l<INT where l is the original read length. [0]


-I      The input is in the Illumina 1.3+ read format (quality equals ASCII-64).


-B INT  Length of barcode starting from the 5’-end. When INT is positive, the barcode of each read will be trimmed before mapping and will be written at the BC SAM tag. For paired-end reads, the barcodes from both ends are concatenated. [0]


-b      Specify that the input read sequence file is in the BAM format. For paired-end data, two ends in a pair must be grouped together and options -1 or -2 are usually applied to specify which end should be mapped. Typical command lines for mapping paired-end data in the BAM format are:


bwa aln -b1 >



bwa aln -b2 >



bwa sampe >


-0      When -b is specified, only use single-end reads in mapping.

-1      When -b is specified, only use the first read in a read pair in mapping (skip single-end reads and the second reads).

-2      When -b is specified, only use the second read in a read pair in mapping.
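Three of the aln options above transform a read before mapping: -I (quality decoding), -q (quality trimming) and -B (barcode removal). The Python sketch below restates them. The function names are hypothetical. The trimming function is a direct reading of the argmax expression given for -q, under the assumption that trimming applies only when the last base quality is below the threshold, and with ties broken in favour of the longer read; it is not a transcription of BWA's source code, which may apply further limits.

```python
def phred64_to_quals(qual_string):
    """-I: Illumina 1.3+ format, where quality equals ASCII-64."""
    return [ord(c) - 64 for c in qual_string]


def trimmed_length(quals, trim_qual):
    """-q: return argmax_x{sum_{i=x+1}^l(INT-q_i)}, the length kept.

    quals holds q_1..q_l. The read is left untouched unless q_l < INT.
    """
    l = len(quals)
    if l == 0 or quals[-1] >= trim_qual:
        return l
    best_x, best_sum, running = l, 0, 0
    # Walk x from l-1 down to 0; each step adds one more base to the
    # trimmed tail, so `running` equals sum_{i=x+1}^l (INT - q_i).
    for x in range(l - 1, -1, -1):
        running += trim_qual - quals[x]
        if running > best_sum:
            best_x, best_sum = x, running
    return best_x


def split_barcode(seq, barcode_len):
    """-B: cut INT bases from the 5'-end; they are destined for the BC tag."""
    if barcode_len <= 0:
        return "", seq
    return seq[:barcode_len], seq[barcode_len:]


quals = [30, 30, 30, 10, 5, 2]
print(trimmed_length(quals, 20))
print(split_barcode("ACGTTTGA", 3))
```

For paired-end reads, the two barcodes returned by `split_barcode` would be concatenated before being written to the BC tag, as the -B description states.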



samse



bwa samse [-n maxOcc] <> <> <> > <>


Generate alignments in the SAM format given single-end reads. Repetitive hits will be randomly chosen.



OPTIONS:


-n INT  Maximum number of alignments to output in the XA tag for reads paired properly. If a read has more than INT hits, the XA tag will not be written. [3]

-r STR  Specify the read group in a format like ‘@RG\tID:foo\tSM:bar’. [null]



sampe



bwa sampe [-a maxInsSize] [-o maxOcc] [-n maxHitPaired] [-N maxHitDis] [-P] <> <> <> <> <> > <>


Generate alignments in the SAM format given paired-end reads. Repetitive read pairs will be placed randomly.


Updated 2021-02-16 07:30. Source: https://www.bjmy2z.cn/gaokao/657841.html