关键词不能为空

当前您在: 主页 > 英语 >

bwa使用说明

作者:高考题库网
来源:https://www.bjmy2z.cn/gaokao
2021-02-16 07:29
tags:

-

2021年2月16日发(作者:舒舒)


Manual Reference Pages-


bwa (1)


NAME


bwa - Burrows-Wheeler Alignment Tool



CONTENTS


Synopsis


Description


Commands And Options


Sam Alignment Format


Notes On Short-read Alignment






Alignment Accuracy






Estimating Insert Size Distribution






Memory Requirement






Speed


Changes In Bwa-0.6


See Also


Author


License And Citation


History


SYNOPSIS


bwa index


构建索引



bwa mem >


单端测序



bwa mem >


双端测序



bwa aln short_ > aln_



bwa samse aln_ short_ >



bwa sampe aln_ aln_ >



bwa bwasw long_ >



DESCRIPTION


BWA


is


a


software


package


for


mapping


low- divergent


sequences


against


a


large


reference


genome,


such


as


the


human


genome.


It


consists


of


three


algorithms:


BWA-backtrack, BWA-SW and BWA-MEM.


The first algorithm is designed for Illumina


sequence reads up to 100bp, while the rest two for longer sequences ranged from


70bp to 1Mbp


. BWA-MEM and BWA-SW share similar features such as long- read support


and


split


alignment,


but


BWA-MEM,


which


is


the


latest,


is


generally


recommended


for


high-quality


queries


as


it


is


faster


and


more


accurate.


BWA-MEM


also


has


better


performance than BWA- backtrack for 70-100bp Illumina reads.



For


all


the


algorithms,


BWA


first


needs


to


construct


the


FM- index


for


the


reference


genome


(the


index



command).


Alignment


algorithms


are


invoked


with


different


sub-commands:


aln


/


samse


/


sampe


for BWA-backtrack,


bwasw


for BWA-SW and


mem


for


the BWA- MEM algorithm.



COMMANDS AND OPTIONS


index




bwa index [-p prefix] [-a algoType] <>



Index database sequences in the FASTA format.



OPTIONS:




-p


STR



Prefix of the output database [same as db filename]



-a


STR



Algorithm for constructing BWT index. Available options are:



Is


(


默认


)


IS linear-time algorithm for constructing suffix array. It


requires 5.37N memory where N is the size of the database.


IS is moderately fast, but does not work with database


larger than 2GB


. IS is the default algorithm due to its


simplicity. The current codes for IS algorithm are


reimplemented by Yuta Mori.



bwtsw



Algorithm implemented in BWT-SW. This method works


with the whole human genome.



mem




bwa mem


[


-aCHMpP


] [


-t



nThreads


] [


-k



minSeedLen


] [


-w



bandWidth


] [


-d



zDropoff


] [


-r



seedSplitRatio


] [


-c



maxOcc


] [


-A



matchScore


] [


-B



mmPenalty


] [


-O



gapOpenPen


] [


-E



gapExtPen


] [


-L



clipPen


] [


-U



unpairPen


] [


-R



RGline


] [


-v



verboseLevel


]





[



]



Align


70bp-1Mbp


query


sequences


with


the


BWA-MEM


algorithm.



Briefly,


the algorithm works by seeding alignments with maximal exact matches (MEMs)


and then extending seeds with the affine-gap Smith-Waterman algorithm (SW).



If




file


is


absent


and


option


-p



is


not


set,


this


command


regards


input


reads are single-end. If



is present, this command assumes the


i


-th read


in



and the


i


-th read in



constitute a read pair. If


-p


is used, the


command assumes the 2


i


-th and the (2


i


+1)-th read in



constitute a read


pair (such input file is said to be interleaved). In this case,



is ignored. In


the paired-end mode, the


mem


command will infer the read orientation and the


insert size distribution from a batch of reads.



The


BWA-MEM


algorithm


performs


local


alignment.


It


may


produce


multiple


primary


alignments


for


different


part


of


a


query


sequence.


This


is


a


crucial


feature


for


long


sequences.


However,


some


tools


such


as


Picard’s


markDuplicates


does


not


work


with


split


alignments.


One


may


consider


to


use


option


-M


to flag shorter split hits as secondary.



OPTIONS:




-t


INT




-k


INT




Number of threads [1]



Minimum seed length. Matches shorter than


INT


will be missed. The


alignment speed is usually insensitive to this value unless it


significantly deviates 20. [19]



-w


INT




Band width. Essentially, gaps longer than


INT


will not be found. Note


that the maximum gap length is also affected by the scoring matrix


and the hit length, not solely determined by this option. [100]



-d


INT




Off-diagonal X-dropoff (Z-dropoff). Stop extension when the


difference between the best and the current extension score is


above |

< p>
i


-


j


|*


A


+


INT


, where


i


and


j


are the current positions of the


query and reference, respectively, and


A


is the matching score.


Z-


dropoff is similar to BLAST’s X


-dropoff except


that it doesn’t


penalize gaps in one of the sequences in the alignment. Z-dropoff


not only avoids unnecessary extension, but also reduces poor


alignments inside a long good alignment. [100]



-r


FLOAT



Trigger re-seeding for a MEM longer than


minSeedLen


*


F LOAT


. This


is a key heuristic parameter for tuning the performance. Larger value


yields fewer seeds, which leads to faster alignment speed but lower


accuracy. [1.5]



-c


INT




-P




Discard a MEM if it has more than


INT


occurence in the genome.


This is an insensitive parameter. [10000]



In the paired-end mode, perform SW to rescue missing hits only but


do not try to find hits that fit a proper pair.



-A


INT




Matching score. [1]



-B


INT




Mismatch penalty. The sequence error rate is approximately: {.75 *


exp[-log(4) * B/A]}. [4]



-O


INT




Gap open penalty. [6]



-E


INT




-L


INT




Gap extension penalty. A gap of length k costs O + k*E (i.e.


-O


is for


opening a zero-length gap). [1]



Clipping penalty. When performing SW extension, BWA-MEM keeps


track of the best score reaching the end of query. If this score is


larger than the best SW score minus the clipping penalty, clipping


will not be applied. Note that in this case, the SAM AS tag reports


the best SW score; clipping penalty is not deducted. [5]



-U


INT




Penalty for an unpaired read pair. BWA-MEM scores an unpaired


read pair as scoreRead1+scoreRead2-


INT


and scores a paired as


scoreRead1+scoreRead2-insertPenalty. It compares these two


scores to determine whether we should force pairing. [9]



-p




Assume the first input query file is interleaved paired-end FASTA/Q.


See the command description for details.



-R


STR




Complete read


group header line. ’



t’ can be used in


STR


and will be


converted to a TAB in the output SAM. The read group ID will be


attached to every read in the output. An example


is ’@RG


tID:foo


tSM:bar’. [null]



-T


INT




-a




-C




Don’t output alignment with score lower than


INT


. This option only


affects output. [30]



Output all found alignments for single- end or unpaired paired-end


reads. These alignments will be flagged as secondary alignments.



Append append FASTA/Q comment to SAM output. This option can


be used to transfer read meta information (e.g. barcode) to the SAM


output. Note that the FASTA/Q comment (the string after a space in


the header line) must conform the SAM spec (e.g. BC:Z:CGTAC).


Malformated comments lead to incorrect SAM output.



-H




Use hard clipping ’H’ in the SAM output. This option may


dramatically reduce the redundancy of output when mapping long


contig or BAC sequences.



-M




-v


INT




Mark shorter split hits as secondary (for Picard compatibility).



Control the verbose level of the output. This option has not been


fully supported throughout BWA. Ideally, a value 0 for disabling all


the output to stderr; 1 for outputting errors only; 2 for warnings and


errors; 3 for all normal messages; 4 or higher for debugging. When


this option takes value 4, the output is not SAM. [3]



aln




bwa aln [-n maxDiff] [-o maxGapO] [-e maxGapE] [-d nDelTail] [-i nIndelEnd] [-k


maxSeedDiff] [-l seedLen] [-t nThrds] [-cRN] [-M misMsc] [-O gapOsc] [-E


gapEsc] [-q trimQual] <> <> >< >



Find


the


SA


coordinates


of


the


input


reads.


Maximum


maxSeedDiff



differences


are allowed in the first


seedLen


subsequence and maximum


maxDiff


differences


are allowed in the whole sequence.



OPTIONS:




-n


NUM




M


aximum edit distance if the value is INT, or the fraction of missing


alignments given 2% uniform base error rate if FLOAT. In the latter


case, the maximum edit distance is automatically chosen for different


read lengths. [0.04]



-o


INT




Maximum number of gap opens [1]



-e


INT




Maximum number of gap extensions, -1 for k-difference mode


(disallowing long gaps) [-1]



-d


INT




Disallow a long deletion within INT bp towards the 3’


-end [16]



-i


INT




Disallow an indel within INT bp towards the ends [5]



-l


INT




Take the first INT subsequence as seed. If INT is larger than the query


sequence, seeding will be disabled. For long reads, this option is


typically ranged from 25 to 35 for ‘


-


k 2’. [inf]



-k


INT




Maximum edit distance in the seed [2]



-t


INT




Number of threads (multi-threading mode) [1]



-M


INT




Mismatch penalty. BWA will not search for suboptimal hits with a


score lower than (bestScore-misMsc). [3]



-O


INT




Gap open penalty [11]



-E


INT




Gap extension penalty [4]



-R


INT




Proceed with suboptimal alignments if there are no more than INT


equally best hits. This option only affects paired-end mapping.


Increasing this threshold helps to improve the pairing accuracy at the


cost of speed, especially for short reads (~32bp).



-c




-N




Reverse query but not complement it, which is required for alignment


in the color space. (Disabled since 0.6.x)



Disable iterative search. All hits with no more than


maxDiff



differences will be found. This mode is much slower than the default.



-q


INT




Parameter for read trimming. BWA trims a read down to


argmax_x{sum_{i=x+1}^l(INT-q_i)} if q_l


read length. [0]



-I




The input is in the Illumina 1.3+ read format (quality equals ASCII-64).



-B


INT




Length of barcode start


ing from the 5’


-end. When


INT


is positive, the


barcode of each read will be trimmed before mapping and will be


written at the


BC


SAM tag. For paired-end reads, the barcode from


both ends are concatenated. [0]



-b




Specify the input read sequence file is the BAM format. For


paired-end data, two ends in a pair must be grouped together and


options


-1


or


-2


are usually applied to specify which end should be


mapped. Typical command lines for mapping pair-end data in the


BAM format are:



bwa aln -b1 >



bwa aln -b2 >



bwa sampe >



-0




-1




-2




When


-b


is specified, only use single-end reads in mapping.



When


-b


is specified, only use the first read in a read pair in mapping


(skip single-end reads and the second reads).



When


-b


is specified, only use the second read in a read pair in


mapping.



samse




bwa samse [-n maxOcc] <> <> <> > <>



Generate alignments in the


SAM format given single-end reads. Repetitive hits


will be randomly chosen.



OPTIONS:




-n


INT



Maximum number of alignments to output in the XA tag for reads


paired properly. If a read has more than INT hits, the XA tag will not be


written. [3]



-r


STR




Specify the read group in a format like ‘@RG


tID:foo


tSM:bar’. [null]



sampe




bwa sampe [-a maxInsSize] [-o maxOcc] [-n maxHitPaired] [-N maxHitDis] [-P]


<> <> <> <> <> > <>



Generate alignments in the SAM format given paired-end reads. Repetitive read


pairs will be placed randomly.



OPTIONS:




-a


INT



Maximum insert size for a read pair to be considered being mapped


properly. Since 0.4.5, this option is only used when there are not


enough good alignment to infer the distribution of insert sizes. [500]



-o


INT



Maximum occurrences of a read for pairing. A read with more


occurrneces will be treated as a single-end read. Reducing this


parameter helps faster pairing. [100000]



-P




Load the entire FM-index into memory to reduce disk operations


(base-space reads only). With this option, at least 1.25N bytes of


memory are required, where N is the length of the genome.



-n


INT



Maximum number of alignments to output in the XA tag for reads


paired properly. If a read has more than INT hits, the XA tag will not be


written. [3]



-N


INT



Maximum number of alignments to output in the XA tag for


disconcordant read pairs (excluding singletons). If a read has more


than INT hits, the XA tag will not be written. [10]



-r


STR




Specify the read group in a format like ‘@RG


tID:foo


tSM:bar’. [null]



bwasw




bwa bwasw [-a matchScore] [-b mmPen] [-q gapOpenPen] [-r gapExtPen] [-t


nThreads] [-w bandWidth] [-T thres] [-s hspIntv] [-z zBest] [-N nHspRev] [-c


thresCoef] <> <> []



Align


query


sequences


in


the




file.


When




is


present,


perform


paired-end


alignment.


The


paired-end


mode


only


works


for


reads


Illumina


short-insert


libraries.


In


the


paired-end


mode,


BWA-SW


may


still


output


split


alignments


but


they


are


all


marked


as


not


properly


paired;


the


mate


positions


will not be written if the mate has multiple local hits.



OPTIONS:




-a


INT




-b


INT




-q


INT




-r


INT




-t


INT




Score of a match [1]



Mismatch penalty [3]



Gap open penalty [5]



Gap extension penalty. The penalty for a contiguous gap of size k is


q+k*r. [2]



Number of threads in the multi- threading mode [1]



-w


INT




Band width in the banded alignment [33]



-T


INT




Minimum score threshold divided by a [37]



Given an l-long query, the threshold for a hit to be retained is


a*max{T,c*log(l)}. [5.5]



-z


INT




Z-best heuristics. Higher -z increases accuracy at the cost of speed.


-c


FLOAT



Coefficient for threshold adjustment according to query length.


[1]



-s


INT




Maximum SA interval size for initiating a seed. Higher -s increases


accuracy at the cost of speed. [3]



-N


INT




Minimum number of seeds supporting the resultant alignment to


skip reverse alignment. [5]



SAM ALIGNMENT FORMAT


The output of the


‘aln’


command is binary and designed for BWA use only. BWA outputs


the final alignment in the SAM (Sequence Alignment/Map) format. Each line consists of:



Col



Field



Description



1


QNAME


Query (pair) NAME


2


FLAG


bitwise FLAG


3


RNAME


Reference sequence NAME


4


POS


5


MAPQ


6


CIAGR


1-based leftmost POSition/coordinate of clipped sequence


MAPping Quality (Phred-scaled)


extended CIGAR string


7


MRNM


Mate Reference sequence NaMe (‘=’ if same as RNAME)



8


MPOS


9


ISIZE


10


SEQ


11


QUAL


12


OPT


1-based Mate POSistion


Inferred insert SIZE


query SEQuence on the same strand as the reference


query QUALity (ASCII-33 gives the Phred base quality)


variable OPTional fields in the format TAG:VTYPE:VALUE


Each bit in the FLAG field is defined as:



Chr



Flag



Description



p


P


u


0x0001


the read is paired in sequencing


0x0002


the read is mapped in a proper pair


0x0004


the query sequence itself is unmapped


U


r


R


1


2


s


f


d


0x0008


the mate is unmapped


0x0010


strand of the query (1 for reverse)


0x0020


strand of the mate


0x0040


the read is the first read in a pair



0x0080


the read is the second read in a pair



0x0100


the alignment is not primary



0x0200


QC failure


0x0400


optical or PCR duplicate


The Please


check <



> for the format specification and the


tools for post-processing the alignment.



BWA generates the following optional fields. Tags starting with ‘X’ are specific to BWA.



Tag



Meaning



NM



Edit distance


MD



Mismatching positions/bases


AS



Alignment score


BC



Barcode sequence


X0



Number of best hits


X1



Number of suboptimal hits found by BWA


XN



Number of ambiguous bases in the referenece


XM



Number of mismatches in the alignment


XO



Number of gap opens


XG



Number of gap extentions


XT



Type: Unique/Repeat/N/Mate-sw


XA



Alternative hits; format: (chr,pos,CIGAR,NM;)*


XS



Suboptimal alignment score


XF



Support from forward/reverse alignment


XE



Number of supporting seeds


Note


that


XO


and


XG


are


generated


by


BWT


search


while


the


CIGAR


string


by


Smith-Waterman


alignment.


These


two


tags


may


be


inconsistent


with the


CIGAR


string.


This is not a bug.



NOTES ON SHORT-READ ALIGNMENT




Alignment Accuracy


When


seeding


is


disabled,


BWA


guarantees


to


find


an


alignment


containing


maximum


maxDiff


differences


including


maxGapO


gap opens which do not occur within


nIndelEnd



bp towards either end of the query. Longer gaps may be found if


maxGapE


is positive, but


it is not guaranteed to


find


all hits. When seeding


is enabled, BWA


further requires that


the first


seedLen


subsequence contains no more than


maxSeedDiff


differences.



When gapped alignment is disabled, BWA is expected to generate the same alignment as


Eland


version


1,


the


Illumina


alignment


program.


However,


as


BWA


change


‘N’


in


the


database sequence to random nucleotides, hits to these random sequences will also


be


counted.


As


a


consequence,


BWA


may


mark


a


unique


hit


as


a


repeat,


if


the


random


sequences


happen


to


be


identical


to


the


sequences


which


should


be


unqiue


in


the


database.



By default, if the best hit is not highly repetitive (controlled by -R), BWA also finds all hits


contains one more mismatch; otherwise, BWA finds all equally best hits only. Base quality


is NOT considered in evaluating hits. In the paired-end mode, BWA pairs all hits it found. It


further performs Smith- Waterman alignment for unmapped reads to rescue reads with a


high erro rate, and for high-quality anomalous pairs to fix potential alignment errors.





Estimating Insert Size Distribution


BWA estimates the insert size distribution per 256*1024 read pairs. It first collects pairs of


reads with both ends mapped with a single-end quality 20 or higher and then calculates


median


(Q2),


lower


and


higher


quartile


(Q1


and


Q3).


It


estimates


the


mean


and


the


variance


of


the


insert


size


distribution


from


pairs


whose


insert


sizes


are


within


interval


[Q1-2(Q3-Q1),


Q3+2(Q3-Q1)].


The


maximum


distance


x


for


a


pair


considered


to


be


properly paired (SAM flag 0x2) is calculated by solving equation Phi((x-mu)/sigma)=x/L*p0,


where mu is the mean, sigma is the standard error of the insert size distribution, L is the


length of the genome, p0 is prior of anomalous pair and Phi() is the standard cumulative


distribution function. For mapping Illumina short-insert reads to the human genome, x is


about 6-7 sigma away from the mean. Quartiles, mean, variance and x will be printed to


the standard error output.





Memory Requirement


With bwtsw algorithm, 5GB memory is required for indexing the complete human genome


sequences.


For


short


reads,


the


aln



command


uses


~3.2GB


memory


and


the


sampe



command uses ~5.4GB.





Speed


Indexing


the


human


genome


sequences


takes


3


hours


with


bwtsw


algorithm.


Indexing


smaller genomes with IS algorithms is faster, but requires more memory.



The speed of alignment is largely determined by the error rate of the query sequences (r).


Firstly, BWA runs much faster for near perfect hits than for hits with many differences, and


it stops searching for a hit with l+2 differences if a


l-difference hit


is found. This means


BWA


will


be


very


slow


if


r


is


high


because


in


this


case


BWA


has


to


visit


hits


with


many


differences


and


looking


for


these


hits


is


expensive.


Secondly,


the


alignment


algorithm


behind


makes


the


speed


sensitive


to


[k


log(N)/m],


where


k


is


the


maximum


allowed


differences, N the size of database and m the length of a query. In practice, we choose k


w.r.t. r and therefore r is the leading factor. I would not recommend to use BWA on data


with r>0.02.



Pairing


is


slower


for


shorter


reads.


This


is


mainly


because


shorter


reads


have


more


spurious hits and converting SA coordinates to chromosomal coordinates are very costly.



CHANGES IN BWA-0.6


Since version 0.6, BWA has been able to work with a reference genome longer than 4GB.


This


feature


makes


it


possible


to


integrate


the


forward


and


reverse


complemented


genome in one FM-index, which speeds up both BWA-short and BWA-SW. As a tradeoff,


BWA uses more memory because it has to keep all positions and ranks in 64-bit integers,


twice larger than 32-bit integers used in the previous versions.



The latest BWA-SW also works for paired-end reads longer than 100bp. In comparison to


BWA-short, BWA-SW tends to be more accurate for highly unique reads and more robust


to relative long INDELs and structural variants. Nonetheless, BWA-short usually has higher


power


to


distinguish


the


optimal


hit


from


many


suboptimal


hits.


The


choice


of


the


mapping algorithm may depend on the application.



SEE ALSO


BWA website <



>, Samtools website<



>


AUTHOR


Heng Li at the Sanger Institute wrote the key source codes and integrated the following


codes for BWT construction: bwtsw<


/~ckwong3/bwtsw


/>, implemented


by Chi-Kwong Wong at the University of Hong Kong and IS<


/sais


> originally proposed by Nong Ge<


/nong


/> at the Sun Yat-Sen University and implemented by


Yuta Mori.



LICENSE AND CITATION


The full BWA package


is distributed under GPLv3 as it uses source codes from BWT-SW


which is covered by GPL. Sorting, hash table, BWT and IS libraries are distributed under the


MIT license.



If you use the BWA-backtrack algorithm, please cite the following paper:



Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler


transform. Bioinformatics, 25, 1754-1760. [PMID: 19451168]



If you use the BWA-SW algorithm, please cite:



Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler


transform. Bioinformatics, 26, 589-595. [PMID: 20080505]



If you use the fastmap component of BWA, please cite:



Li H. (2012) Exploring single-sample SNP and INDEL calling with whole-genome de novo


assembly. Bioinformatics, 28, 1838-1844. [PMID: 22569178]



The BWA-MEM algorithm has not been published yet.



HISTORY


BWA is largely influenced by BWT-SW. It uses source codes from BWT-SW and mimics its


binary file formats; BWA-SW resembles BWT-SW in several ways. The initial idea about


BWT-based alignment also came from the group who developed BWT-SW. At the same


time, BWA is different enough from BWT- SW. The short-read alignment algorithm bears


no similarity to Smith-Waterman algorithm any more. While BWA-SW learns from


BWT-SW, it introduces heuristics that can hardly be applied to the original algorithm. In all,


BWA does not guarantee to find all local hits as what BWT-SW is designed to do, but it is


much faster than BWT-SW on both short and long query sequences.



I started to write the first piece of codes on 24 May 2008 and got the initial stable version


on 02 June 2008. During this period, I was acquainted that Professor Tak-Wah Lam, the


first


author


of


BWT-SW


paper,


was


collaborating


with


Beijing


Genomics


Institute


on


SOAP2, the successor to SOAP (Short Oligonucleotide Analysis Package). SOAP2 has come


out


in


November


2008.


According


to


the


SourceForge


download


page,


the


third


BWT-based short read aligner,


bowtie, was first released


in August 2008. At the time of


writing


this


manual,


at


least


three


more


BWT- based


short-read


aligners


are


being


implemented.



The BWA-SW algorithm is a new component of BWA. It was conceived in November 2008


and implemented ten months later.



The BWA-MEM algorithm is based on an algorithm finding super-maximal exact matches


(SMEMs),


which


was


first


published


with


the


fermi


assembler


paper


in


2012.


I


first


implemented the basic SMEM algorithm in the


fastmap


command for an experiment and


then


extended


the


basic


algorithm


and


added


the


extension


part


in


Feburary


2013


to


make BWA-MEM a fully featured mapper.





samtools



Utilities for the Sequence Alignment/Map (SAM) format



SYNOPSIS


samtools view -bt ref_ -o



samtools sort -T /tmp/ -o



samtools index



samtools idxstats



samtools view chr2:20,100,000-20,200,000



samtools merge



samtools faidx



samtools fixmate



samtools mpileup -C50 -gf -r chr3:1,000-2,000



samtools tview



samtools flags PAIRED,UNMAP,MUNMAP



samtools bam2fq >



DESCRIPTION


Samtools is a set of utilities that manipulate alignments in the BAM format. It imports from


and


exports


to


the


SAM


(Sequence


Alignment/Map)


format,


does


sorting,


merging


and


indexing, and allows to retrieve reads in any regions swiftly.



Samtools is designed to work on a stream. It regards an input file `-' as the standard input


(stdin) and an output file `-' as the standard output (stdout). Several commands can thus


be combined with Unix pipes. Samtools always output warning and error messages to the


standard error output (stderr).



Samtools is also able to open a BAM (not SAM) file on a remote FTP or HTTP server if the


BAM


file


name


starts


with


`ftp://'


or


`http://'.


Samtools


checks


the


current


working


directory for the index file and will download the index upon absence. Samtools does not


retrieve the entire alignment file unless it is asked to do so.



COMMANDS AND OPTIONS


view


samtools view [


options


]



|



|



[


region


...]



With no options


or regions specified, prints all alignments


in the specified input


alignment file (in SAM, BAM, or CRAM format) to standard output in SAM format


(with no header).



You may specify one or more space- separated region specifications after the input


filename to restrict output to


only those alignments which overlap the specified


region(s).


Use


of


region


specifications


requires


a


coordinate-sorted


and


indexed


input file (in BAM or CRAM format).



The


-b


,


-C


,


-1


,


-u


,


-h


,


-H


,


and


-c



options


change


the


output


format


from


the


default of headerless SAM, and the


-o


and


-U


options set the output file name(s).



The


-t


and


-T


options provide additional reference data. One of these two options


is required when SAM input does not contain @SQ


headers, and the


-T


option is


required whenever writing CRAM output.



The


-L


,


-r


,


-R


,


-q


,


-l


,


-m


,


-f


,


and


-F



options


filter


the


alignments


that


will


be


included in the output to only those alignments that match certain criteria.



The


-x


,


-B


, and


-s


options modify the data which is contained in each alignment.



Finally, the


-@


option can


be used to allocate additional threads to be used for


compression, and the


-?


option requests a long help message.



REGIONS:


Regions


can


be


specified


as:


RNAME[:STARTPOS[-ENDPOS]]


and


all


position


coordinates are 1-based.



Important note: when multiple regions are given, some alignments may be output


multiple times if they overlap more than one of the specified regions.



Examples of region specifications:



`chr1'


Output all alignments mapped to the reference sequence named `chr1' (i.e. @SQ


SN:chr1) .



`chr2:1000000'


The region on chr2 beginning at base position 1,000,000 and ending at the end of


the chromosome.



`chr3:1000-2000'


The 1001bp region on chr3 beginning at base position 1,000 and ending at base


position 2,000 (including both end positions).



OPTIONS:




-b


Output in the BAM format.



-C


Output in the CRAM format (requires -T).



-1


Enable fast BAM compression (implies -b).



-u


Output


uncompressed


BAM.


This


option


saves


time


spent


on


compression/decompression


and


is


thus


preferred


when


the


output


is


piped


to


another samtools command.



-h


Include the header in the output.



-H


Output the header only.



-c


Instead of printing the alignments, only count them and print the total number. All


filter options, such as


-f


,


-F


, and


-q


, are taken into account.



-?


Output long help and exit immediately.



-o


FILE



Output to


FILE [stdout].




-U


FILE



Write alignments that are


not


selected by the various filter options to


FILE


. When


this


option


is


used,


all


alignments


(or


all


alignments


intersecting


the


regions



specified) are written to either the output file or this file, but never both.



-t


FILE



A tab- delimited


FILE


. Each line must contain the reference name in the first column


and


the


length


of


the


reference


in


the


second


column,


with


one


line


for


each


distinct


reference.


Any


additional


fields


beyond


the


second


column


are


ignored.


This file


also


defines the


order of the reference sequences in sorting. If you run:


`samtools


faidx


<>',


the


resulting


index


file


<>.fai



can


be


used


as


this


FILE


.



-T


FILE



A


FASTA


format


reference


FILE


,


optionally


compressed


by


bgzip



and


ideally


indexed by


samtools



faidx


. If an index


is not present, one


will be


generated for


you.



-L


FILE


-


-


-


-


-


-


-


-



本文更新与2021-02-16 07:29,由作者提供,不代表本网站立场,转载请注明出处:https://www.bjmy2z.cn/gaokao/657839.html

bwa使用说明的相关文章