Tuesday, April 7, 2009

14) Perl script for separating a fasta file into individual sequence fasta files

This Perl script splits a fasta file into one fasta file per sequence. When run, it prompts for the input file name (without the .fasta extension) and the total number of sequences in that file. It is useful for leave-one-out testing, where each sequence in turn is held out as the query and the remaining sequences serve as the database.
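If the total number of sequences is not known in advance, it can be obtained by counting header lines. The following is a minimal sketch, assuming every record starts with a line beginning with ">"; the helper name count_fasta_records is hypothetical and not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script):
# counts fasta records by counting lines that start with ">".
sub count_fasta_records {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /^>/;
    }
    close $fh;
    return $count;
}

# Usage: perl count_records.pl input.fasta
print count_fasta_records($ARGV[0]), "\n" if @ARGV;
```

The printed number is the value to enter at the "total no. of sequences" prompt.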

Output: Each sequence is cut out from the original fasta file and saved to its own file (query1.fasta, query2.fasta, etc.). For each query, the remaining sequences are saved to a matching file (out1.fasta, out2.fasta, etc.), so outN.fasta holds every sequence except the one in queryN.fasta.

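Note: This script was extracted from a subroutine of a larger program, so there may be bugs, e.g. undefined variables. It also assumes that each record occupies exactly two lines (a header line followed by a single sequence line). Please check and debug before using.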

#!/usr/bin/perl
use strict;
use warnings;

# Assumes each record is exactly two lines: a ">" header and one sequence line.

print "Please enter filename (without extension): ";
my $input = <STDIN>;
chomp($input);

print "Please enter total no. of sequences in fasta file: ";
my $training_seq = <STDIN>;
chomp($training_seq);

# Read the whole input file once instead of reopening it for every sequence
open(my $in, '<', "$input.fasta") or die "Error opening $input.fasta: $!";
my @lines = <$in>;
close $in;

for my $filenumber (1 .. $training_seq) {
    my $start = 2 * ($filenumber - 1);   # index of this record's header line
    last if $start >= @lines;            # stop if fewer records than the count entered

    open(my $query, '>', "query$filenumber.fasta") or die "Can't open query$filenumber.fasta: $!";
    open(my $out, '>', "out$filenumber.fasta") or die "Can't open out$filenumber.fasta: $!";

    # Print query sequence (header + sequence line) to query file
    print {$query} @lines[$start, $start + 1];

    # Delete query sequence from a copy of the data, then print the rest to database file
    my @rest = @lines;
    splice(@rest, $start, 2);
    print {$out} @rest;

    close $query;
    close $out;
}
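
The script above relies on every record being exactly two lines and on the user supplying the sequence count. As an alternative, here is a rough sketch that groups lines into records by their ">" header, so wrapped (multi-line) sequences are handled and the count is derived from the file. The helper names read_fasta_records and leave_one_out and the optional output directory argument are my own assumptions, not part of the original subroutine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read a fasta file into a list of records.
# Each record is the header line plus all sequence lines up to the next ">".
sub read_fasta_records {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;            # strip Unix or Windows line ending
        next if $line eq '';             # skip blank lines
        if ($line =~ /^>/) {
            push @records, "$line\n";    # start a new record
        } elsif (@records) {
            $records[-1] .= "$line\n";   # append sequence line to current record
        }
    }
    close $fh;
    return @records;
}

# Hypothetical helper: write queryN.fasta and outN.fasta for every record.
# Returns the number of records processed.
sub leave_one_out {
    my ($path, $dir) = @_;
    $dir = '.' unless defined $dir;      # assumed default: current directory
    my @records = read_fasta_records($path);
    for my $i (0 .. $#records) {
        my $n = $i + 1;
        open(my $query, '>', "$dir/query$n.fasta") or die "Can't open query$n.fasta: $!";
        print {$query} $records[$i];
        close $query;

        open(my $out, '>', "$dir/out$n.fasta") or die "Can't open out$n.fasta: $!";
        print {$out} @records[grep { $_ != $i } 0 .. $#records];
        close $out;
    }
    return scalar @records;
}

# Usage: perl split_fasta.pl input.fasta
leave_one_out($ARGV[0]) if @ARGV;
```

Unlike the original, this version takes the full file name (including the extension) on the command line rather than prompting for it.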
