Steps
1. Keep only the lines containing the FASTA description and the lines containing the subject sequences: “line containing > or line containing sbjct” --> Tools --> Extract
2. To get rid of the numbers in the line containing sbjct: “digits in line containing sbjct” --> Tools --> Omit
3. To get rid of the sbjct label: “sbjct:” --> Tools --> Omit
4. To get rid of dashes: “-” --> Tools --> Omit
5. To get rid of the extra spaces in the lines containing sequences: “spaces not in line containing >” --> Tools --> Omit
6. Optionally, to trim the description line so that it contains only the GI:
a. “from second | in line containing > to start of linebreak” --> Tools --> Omit
Example output corresponding to the steps above:
Before
>gi|126385999|gb|CP000521.1| Acinetobacter baumannii ATCC 17978, complete genome
Length = 3976747
Score = 570 bits (1470), Expect = e-163
Identities = 284/284 (100%), Positives = 284/284 (100%)
Frame = -2
Query: 1 LNFKFNFISLMNIKALLLITSAIFISACSPYIVTANPNHSASKSDEKAEKIKNLFNEAHT 60
LNFKFNFISLMNIKALLLITSAIFISACSPYIVTANPNHSASKSDEKAEKIKNLFNEAHT
Sbjct: 1766322 LNFKFNFISLMNIKALLLITSAIFISACSPYIVTANPNHSASKSDEKAEKIKNLFNEAHT 1766143
After
>gi|126385999
LNFKFNFISLMNIKALLLITSAIFISACSPYIVTANPNHSASKSDEKAEKIKNLFNEAHT
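For readers who do not have the same tool at hand, the steps above can be roughly sketched in Python. This is only an illustration under some assumptions, not the method used in the post: the function name `blast_to_fasta` and its `gi_only` flag are made up for this example, lines are matched by how they start (`>` or `Sbjct`, case-insensitive) rather than by "containing", and each Sbjct line is written out as its own sequence line, as in the steps above.

```python
import re

def blast_to_fasta(text, gi_only=False):
    """Reduce plain-text BLAST output to description and subject-sequence lines.

    Hypothetical helper: the name and the gi_only flag are assumptions
    made for this sketch.
    """
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(">"):
            # Step 1: keep the description line.
            if gi_only:
                # Step 6: drop everything from the second "|" onward.
                stripped = "|".join(stripped.split("|")[:2])
            out.append(stripped)
        elif stripped.lower().startswith("sbjct"):
            # Step 3: remove the "Sbjct:" label.
            seq = re.sub(r"(?i)^sbjct:?", "", stripped)
            # Steps 2, 4 and 5: remove digits, dashes and spaces.
            seq = re.sub(r"[\d\s-]", "", seq)
            out.append(seq)
        # Every other line (Length, Score, Query, ...) is discarded.
    return "\n".join(out)
```

Calling `blast_to_fasta(text, gi_only=True)` on the "Before" block should give the two lines shown under "After". If a hit spans several Sbjct lines, each one becomes a separate output line and may need to be joined afterwards.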