Bioinformatics Codelets: 4) LAPIS: Extracting all hit sequences from BLAST results (for hits with only one line of sbjct sequence)

This LAPIS code converts BLAST results into FASTA format and gets rid of double hits along the way. It only works if each hit has a single line of Sbjct sequence.
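Whether a BLAST report meets the single-line restriction can be checked up front. The Python sketch below is not part of the LAPIS code; it is a rough helper that assumes every alignment begins with a "Score =" line, as in the sample output in this post, and the function name is made up for illustration. It returns the ">" lines of hits in which a single alignment carries more than one Sbjct line.

```python
def hits_with_wrapped_sbjct(blast_text):
    """Return the '>' lines of hits where one alignment has 2+ Sbjct lines.

    Assumption: each alignment starts with a 'Score =' line.
    """
    flagged = []
    current_hit = None   # most recent '>' line seen
    sbjct_count = 0      # Sbjct lines seen in the current alignment
    for line in blast_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(">"):
            current_hit = stripped
            sbjct_count = 0
        elif stripped.startswith("Score ="):
            # A new alignment (possibly a double hit) starts here.
            sbjct_count = 0
        elif stripped.lower().startswith("sbjct"):
            sbjct_count += 1
            if (sbjct_count == 2 and current_hit is not None
                    and current_hit not in flagged):
                flagged.append(current_hit)
    return flagged
```

An empty list suggests the report is safe to process with the steps in this post; any hit listed would lose part of its sequence.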

Steps


1) extract {from line starting with > to line starting with sbjct} (this is to get rid of double hits)
2) extract {line either starting > or starting sbjct} (the pattern needs "either" before "or")
3) omit {number in line starting with sbjct}
4) omit {spaces in line starting with sbjct}
5) omit sbjct:
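For readers without LAPIS, the five steps above can be approximated in Python. This is only a sketch of the same idea, not the original code: the function name blast_to_fasta is made up for illustration, and it assumes the plain-text BLAST layout shown in the output examples in this post (a ">" line for each hit and "Sbjct:" lines for the subject sequence).

```python
import re

def blast_to_fasta(blast_text):
    """Keep each '>' line and the first Sbjct sequence that follows it."""
    out = []
    in_hit = False  # True between a '>' line and its first Sbjct line (step 1)
    for line in blast_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(">"):
            in_hit = True
            out.append(stripped)           # step 2: keep the '>' line
        elif in_hit and stripped.lower().startswith("sbjct"):
            # Step 5: drop the "Sbjct:" label.
            seq = re.sub(r"^sbjct:?", "", stripped, flags=re.IGNORECASE)
            # Steps 3 and 4: drop the coordinates and the spaces.
            seq = re.sub(r"[\d\s]", "", seq)
            out.append(seq)
            in_hit = False                 # later Sbjct lines are double hits
    return "\n".join(out)
```

Like the LAPIS version, this keeps only the first Sbjct line after each ">" line, so it has the same limitation for hits whose sequence wraps over several lines.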

Example output for each of the steps above:

a) Output of code: extract {from line starting with > to line starting with sbjct}

>gi|190193559|dbj|BAG48486.1| gag protein [Human immunodeficiency virus 1]
Length = 154

Score = 50.7 bits (112), Expect = 5e-09
Identities = 15/15 (100%), Positives = 15/15 (100%)

Query: 1 CRQILGQLQPSLQTG 15
CRQILGQLQPSLQTG
Sbjct: 57 CRQILGQLQPSLQTG 71

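Everything from a line starting with ">" to the next line starting with "sbjct" is extracted. Any further alignments of the same hit (double hits) fall outside this range and are dropped.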

b) Output of code: extract {line either starting > or starting sbjct}

>gi|190193559|dbj|BAG48486.1| gag protein [Human immunodeficiency virus 1]
Sbjct: 57 CRQILGQLQPSLQTG 71

Only lines starting with ">" or "sbjct" are extracted.

c) Output of code: omit {number in line starting with sbjct}

>gi|190193559|dbj|BAG48486.1| gag protein [Human immunodeficiency virus 1]
Sbjct: CRQILGQLQPSLQTG

Numbers in lines starting with "sbjct" are removed.

d) Output of code: omit {spaces in line starting with sbjct}

>gi|190193559|dbj|BAG48486.1| gag protein [Human immunodeficiency virus 1]
Sbjct:CRQILGQLQPSLQTG

Spaces in lines starting with "sbjct" are removed.

e) Output of code: omit sbjct:

>gi|190193559|dbj|BAG48486.1| gag protein [Human immunodeficiency virus 1]
CRQILGQLQPSLQTG

The label "sbjct:" is removed, leaving the hit sequences in FASTA format.

Code by: Benben & Asif M. Khan
Blog Post by: BenBen
Post Edited by: BenBen

Wednesday, March 4, 2009
