Parsing BLAST XML output using Bio.SearchIO

Biopython's Bio.SearchIO package parses output from BLAST and other sequence search programs, and it is intended to eventually replace Bio.Blast.NCBIXML. General information about SearchIO is available in its official documentation and on Biopython’s wiki.

In this example, I use SearchIO to parse BLAST output in XML format and extract specific contents. The correspondence between elements in the BLAST XML output and attributes for each SearchIO object is available here.
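
To make that correspondence concrete, here is a minimal sketch that reads the same seven fields straight from the XML with the standard library, without Biopython. The XML fragment is a made-up, heavily trimmed example and `extract_rows` is a hypothetical helper name. The pairing of elements to SearchIO attributes in the comments reflects my reading of the SearchIO documentation. SearchIO may also substitute a value taken from the definition line when BLAST has generated a placeholder ID, so the two approaches are not guaranteed to agree on every file.

```python
import xml.etree.ElementTree as ET

# Made-up, trimmed BLAST XML fragment; real output has many more elements.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-ID>query1</Iteration_query-ID>
      <Iteration_query-def>example query sequence</Iteration_query-def>
      <Iteration_query-len>120</Iteration_query-len>
      <Iteration_hits>
        <Hit>
          <Hit_id>subject1</Hit_id>
          <Hit_def>example subject sequence</Hit_def>
          <Hit_len>300</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>1e-20</Hsp_evalue>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def extract_rows(xml_text):
    """Return one tuple of raw text fields per <Hsp> element."""
    rows = []
    root = ET.fromstring(xml_text)
    for iteration in root.iter("Iteration"):
        query_fields = (
            iteration.findtext("Iteration_query-ID"),   # ~ qres.id
            iteration.findtext("Iteration_query-def"),  # ~ qres.description
            iteration.findtext("Iteration_query-len"),  # ~ qres.seq_len
        )
        for hit in iteration.iter("Hit"):
            hit_fields = (
                hit.findtext("Hit_id"),    # ~ hit.id
                hit.findtext("Hit_def"),   # ~ hit.description
                hit.findtext("Hit_len"),   # ~ hit.seq_len
            )
            for hsp in hit.iter("Hsp"):
                evalue = hsp.findtext("Hsp_evalue")  # ~ hsp.evalue
                rows.append(query_fields + hit_fields + (evalue,))
    return rows


rows = extract_rows(SAMPLE_XML)
for row in rows:
    print("\t".join(row))
```

Unlike SearchIO, this sketch returns every value as a string and does no type conversion, so it is only meant to show where each column comes from.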

#!/usr/bin/env python

"""
Parse one or more BLAST XML files and write selected fields from each HSP
to a tab-delimited text file.

Usage: python parse_blast_xml.py outfile.txt blastoutput.xml [blastoutput2.xml ...]
"""

import sys

from Bio import SearchIO

with open(sys.argv[1], 'w') as out:
    # Header row; one data row is written per HSP.
    out.write("Query Name\tQuery Definition\tQuery Length\tHit ID\tHit Definition\tHit Length\teValue\n")
    for xml_file in sys.argv[2:]:
        # Use a context manager so each input handle is closed after parsing.
        with open(xml_file) as result_handle:
            for qres in SearchIO.parse(result_handle, 'blast-xml'):
                for hit in qres.hits:
                    for hsp in hit.hsps:
                        fields = [qres.id, qres.description, str(qres.seq_len), hit.id, hit.description,
                                  str(hit.seq_len), str(hsp.evalue)]
                        out.write("\t".join(fields) + "\n")
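
The script writes one row per HSP, so a hit with several HSPs appears on several lines with identical query and hit columns. If only the best-scoring alignment per hit is wanted, the rows can be reduced afterwards. The following is a minimal sketch under that assumption. `best_row_per_hit` is a hypothetical helper, and it works on plain `(query_id, hit_id, evalue)` tuples with numeric e-values rather than on SearchIO objects.

```python
def best_row_per_hit(rows):
    """Keep the row with the lowest e-value for each (query, hit) pair.

    rows: iterable of (query_id, hit_id, evalue) tuples, evalue as a float.
    Pairs are returned in the order they were first seen (Python 3.7+ dicts).
    """
    best = {}
    for query_id, hit_id, evalue in rows:
        key = (query_id, hit_id)
        # Replace the stored row only when this one has a strictly lower e-value.
        if key not in best or evalue < best[key][2]:
            best[key] = (query_id, hit_id, evalue)
    return list(best.values())


example_rows = [
    ("q1", "h1", 1e-5),
    ("q1", "h1", 1e-30),
    ("q1", "h2", 0.01),
]
filtered = best_row_per_hit(example_rows)
for row in filtered:
    print(row)
```

Inside the parsing loop itself, the same effect could be had by writing only `min(hit.hsps, key=lambda h: h.evalue)` for each hit instead of iterating over every HSP.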