Wednesday, March 11, 2009

7) UNIX Shell Script for converting matched amino acid residues to dots

 
The following is a code to process an alignment input to an output whereby the amino acid residues or the nucleotides of all protein or DNA sequences variant to an index protein or DNA sequence are represented as dots, leaving the variable residues or nucleotides intact.

Input

ASIFTANTINWEE
ASAFTANTINWEE
ASEETENTANWAA


Process Code

#!/bin/bash
# grab the sequence in linenumber $1 in sequence file $2
seq=`head -n $1 $2 |tail -1`

# take sequence, break it down to each residue and iterate
(
for i in `echo $seq | fold -1`
do

# Generate the sed script
echo "s_^\($dot\)${i}_\1._"

# Set up the dot for the last sed command
dot=".$dot"
done

# after finishing generating the sed script for each column
# construct the substitution for the sequence at linenumber $1
echo "$1s+$dot+$seq+" | sed -e 's/\./\\./g'
) > /tmp/y
# temp file created

# use the dynamically created temp file as program for sed to
# apply on the sequence file $2
sed -f /tmp/y $2
# remove temp file created
rm /tmp/y

exit
# Example of the temporary file /tmp/y template created dynamically
# slashdot=`echo $dot | sed -e 's/\./\\\\./g'`
n=11
n=`expr $n + 1`
s/^\(\)V/./
s/^\(.\)D/\1./
s_^\(..\)R_\1._
s_^\(...\)F_\1._
s_^\(....\)Y_\1._
s_^\(.....\)K_\1._
s_^\(......\)T_\1._
s_^\(.......\)L_\1._
s_^\(........\)R_\1._
s_^\(.........\)A_\1._
s_^\(..........\)E_\1._
s_^\(...........\)Q_\1._
s_^\(............\)A_\1._
s_^\(.............\)S_\1._
s_^\(..............\)Q_\1._
1s+\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.+VDRFYKTLRAEQASQ+


Usage Notes:

1) To run the script assuming the code is saved in a file called benbo.sh: ./benbo.sh 1 xx1.txt, where "1" is the line number of the index sequence and "xx1.txt" is the input file. All the characters of the index sequence are shown in the output.

Output

ASIFTANTINWEE
..A..........
..EE.E..A..AA


Code by: Dr. Tan Tin Wee
Blog Post by: BenBen
Post Edited by: Asif M. Khan

1 comment:

  1. Thanks Asif for editing this post at 2.30am yesterday.

    ReplyDelete