Bioinformatics Codelets: 8) Perl Script for converting matched amino acid residues to dots

The following is a code to process an alignment input to an output whereby the amino acid residues or the nucleotides of all protein or DNA sequences variant to an index protein or DNA sequence are represented as dots, leaving the variable residues or nucleotides intact. This is similar to the previous method, however, this is a perl script which is suitable for matching longer index protein sequences.

Input

ASIFTANTINWEE
ASAFTANTINWEE
ASEETENTANWAA

Process Code


#!/usr/bin/perl 
# Usage benbo.pl  
$L=$ARGV[0]-1;
 
# Open file for reading
open (FILE, "$ARGV[1]") or die "cannot open file: $!";

# Store everything in the file into array @raw_data
@raw_data=;

# print the Target Line
print $raw_data[$L];
 
# trim off newline from the Target Line
chomp($raw_data[$L]);
@linearray = split("", $raw_data[$L] );
 
# test each line of the input file, as read into the rawdata array
foreach $line (@raw_data)
{
chomp($line);

# take line in question and split into an array of single chr
@linetest = split("", $line );

# initialise the character count
$nn=0;

# analyse each character in the current Test line
foreach $linechr (@linetest) {
 
# check if  Test character matches Target character
if ($linechr =~ m/$linearray[$nn]/) {

# if yes, print dot
print "."; 
} else  {

# if no, print the non-matching chr in the Test sequence
print "$linechr"; 
} # if

# increment the character count of the Target line
$nn=$nn+1;
} #foreach linechr

# once each line is completed, print new line, and standby to reset $nn
print "\n";
# run through every line of the sequence input file
} # foreach
 
close(FILE);

Usage Notes:

1) To run the script assuming the code is saved in a file called benbo.pl: ./benbo.pl 1 xx1.txt, where "1" is the line number of the index sequence and "xx1.txt" is the input file. All the characters of the index sequence are shown in the output.

Output

ASIFTANTINWEE
..A..........
..EE.E..A..AA

Code by: Dr. Tan Tin Wee
Blog Post by: BenBen

Bioinformatics Codelets

Wednesday, March 11, 2009

8) Perl Script for converting matched amino acid residues to dots

No comments:

Post a Comment

Related Sites

Contributors