Tuesday, April 7, 2009

15) Perl script to randomly shuffle sequences in a fasta file

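This Perl script randomly shuffles the order of sequences in a fasta file. When run, it prompts for your input filename (without the .fasta extension) and the total no. of sequences in the file. It expects each sequence to take exactly two lines: a header line followed by a single sequence line.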

Output: A file named xxx_shuffled.fasta will be produced, where xxx is the input filename.

Note: Again, this script was cut out of a sub-routine, so there may be bugs in variable names and definitions.
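
If you do not know the total no. of sequences asked for at the prompt, it can be obtained by counting header lines. The snippet below is only a minimal sketch, assuming every sequence starts with a header line beginning with ">"; the sub-routine name count_fasta_sequences is made up for illustration and is not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the header lines (lines starting with '>')
# in a fasta file, which equals the number of sequences.
sub count_fasta_sequences {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /^>/;
    }
    close($fh);
    return $count;
}

# Usage: perl count_seqs.pl xxx.fasta
print count_fasta_sequences($ARGV[0]), "\n" if @ARGV;
```

The number printed is the value to enter when prompted for the total no. of sequences.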

#!/usr/bin/perl
use strict;
use warnings;

print "Please enter filename (without extension): ";
my $input = <STDIN>;
chomp ($input);

print "Please enter total no. of sequences in fasta file: ";
my $header_size = <STDIN>;
chomp ($header_size);
die "No. of sequences must be a positive whole number!" unless $header_size =~ /^[1-9]\d*$/;

open (INFILE, "<", "$input.fasta") or die "Error opening input file for shuffling!";
open (SHUFFLED, ">", $input."_shuffled.fasta") or die "Error creating shuffled output file!";

# Each sequence is expected to take exactly 2 lines: a header line, then a sequence line
my @line = <INFILE>;
die "Input file has fewer than $header_size sequences!" if @line < 2*$header_size;

my @array;  # headers
my @array2; # sequences
my $index = 0;

for (my $i = 0; $i < $header_size; $i++) {
    $array[$i] = $line[$index];
    $array[$i] =~ s/\s+$//; # strip the trailing newline and whitespace
    $index++;
    $array2[$i] = $line[$index];
    $array2[$i] =~ s/\s+$//;
    $index++;
}

# Fisher-Yates shuffle; headers and sequences are swapped together so they stay paired
for (my $i = $header_size - 1; $i > 0; $i--) {
    my $j = int rand ($i+1);
    next if $i == $j;
    @array[$i,$j] = @array[$j,$i];
    @array2[$i,$j] = @array2[$j,$i];
}

for (my $index2 = 0; $index2 < $header_size; $index2++) {
    print SHUFFLED "$array[$index2]\n";
    print SHUFFLED "$array2[$index2]\n";
}
close(INFILE);
close(SHUFFLED);
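
For files where a sequence is wrapped over several lines, the two-lines-per-sequence approach does not apply. The following is a rough sketch of an alternative, not the method used in this post: it groups lines into records at each ">" header and shuffles whole records with shuffle from the core List::Util module, so no sequence count has to be entered. The sub-routine names read_fasta_records and shuffle_fasta are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Read records from a filehandle. Each record is an array ref holding the
# header line followed by all of its sequence lines.
sub read_fasta_records {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;               # strip Unix or Windows line ending
        next if $line eq '';                # skip blank lines
        if ($line =~ /^>/) {
            push @records, [$line];         # a header starts a new record
        } elsif (@records) {
            push @{ $records[-1] }, $line;  # sequence line of the current record
        }
    }
    return @records;
}

# Hypothetical helper: shuffle the records of $in and write them to $out.
# Returns the number of records written.
sub shuffle_fasta {
    my ($in, $out) = @_;
    open(my $ifh, '<', $in) or die "Cannot open $in: $!";
    my @records = read_fasta_records($ifh);
    close($ifh);
    open(my $ofh, '>', $out) or die "Cannot create $out: $!";
    print {$ofh} join("\n", @$_), "\n" for shuffle(@records);
    close($ofh);
    return scalar @records;
}

# Usage: perl shuffle_records.pl xxx   (reads xxx.fasta, writes xxx_shuffled.fasta)
if (@ARGV) {
    my $base = shift @ARGV;
    my $n = shuffle_fasta("$base.fasta", "${base}_shuffled.fasta");
    print "Shuffled $n sequences\n";
}
```

Because each record is kept as one unit, a header can never be separated from its sequence lines, whatever the line wrapping of the input.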

1 comment:

  1. Hello, I know this is an old post, but I received the following error when I try to execute the script and I would really appreciate any troubleshooting advice (Thanks!):

    Modification of non-creatable array value attempted, subscript -578914 at fasta_corrector.pl line 40, line 578914.

    578914 is 2x the number of sequences in the file, FYI.
