<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5583061135147499656</id><updated>2011-12-03T09:40:36.932-08:00</updated><title type='text'>Bioinformatics Codelets</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Asif M. Khan</name><uri>http://www.blogger.com/profile/08184259065143974222</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>26</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-8488585682059255408</id><published>2099-03-20T15:27:00.000-07:00</published><updated>2011-03-20T15:27:53.919-07:00</updated><title type='text'>All Posts</title><content type='html'>&lt;div&gt;24) &lt;a href="http://bioinfocodelets.blogspot.com/2009/10/unix-one-liner-to-split-one-line.html"&gt;Unix one-liner to split one-line sequences into FASTA formatted files&lt;/a&gt;&lt;/div&gt;&lt;div&gt;23) &lt;a 
href="http://bioinfocodelets.blogspot.com/2009/10/simple-perl-script-to-generate-header.html"&gt;Simple perl script to generate a header line for each sequence in a fasta file&lt;/a&gt;&lt;/div&gt;&lt;div&gt;22) &lt;a href="http://bioinfocodelets.blogspot.com/2009/09/22-perl-version-of-avana-benana.html"&gt;Perl version of AVANA: BENANA!!!!!&lt;/a&gt;&lt;/div&gt;&lt;div&gt;21) &lt;a href="http://bioinfocodelets.blogspot.com/2009/05/21-lapis-extracting-all-hit-sequences.html"&gt;LAPIS: Extracting all hit sequences from BLAST results (for hits with one or more lines of sbjct sequence)&lt;/a&gt;&lt;/div&gt;&lt;div&gt;20) &lt;a href="http://bioinfocodelets.blogspot.com/2009/04/20-plotting-entropy-plot-of-msa.html"&gt;Plotting Entropy Plot of MSA&lt;/a&gt;&lt;/div&gt;&lt;div&gt;19) &lt;a href="http://bioinfocodelets.blogspot.com/2009/04/calculating-entropy-of-msa-by-variable.html"&gt;Calculating Entropy of MSA by variable windowsize&lt;/a&gt;&lt;/div&gt;&lt;div&gt;18) &lt;a href="http://bioinfocodelets.blogspot.com/2009/04/18-doing-maths-in-unix.html"&gt;Doing maths in Unix&lt;/a&gt;&lt;/div&gt;&lt;div&gt;17) &lt;a href="http://bioinfocodelets.blogspot.com/2009/04/17-basic-shell-command-to-delimit.html"&gt;Basic shell command to delimit protein/DNA sequence&lt;/a&gt;&lt;/div&gt;&lt;div&gt;16) &lt;a href="http://bioinfocodelets.blogspot.com/2009/04/16-basic-shell-command-to-delimit.html"&gt;Basic shell command to count number of fields in a file&lt;/a&gt;&lt;/div&gt;&lt;div&gt;15) &lt;a href="http://bioinfocodelets.blogspot.com/2009/04/perl-script-to-randomly-shuffle.html"&gt;Perl script to randomly shuffle sequences in a fasta file&lt;/a&gt;&lt;/div&gt;&lt;div&gt;14) &lt;a href="http://bioinfocodelets.blogspot.com/2009/04/perl-script-for-separating-fasta-file.html"&gt;Perl script for separating a fasta file into individual sequence fasta files&lt;/a&gt;&lt;/div&gt;&lt;div&gt;13) &lt;a 
href="http://bioinfocodelets.blogspot.com/2009/04/perl-script-for-splitting-fasta-file.html"&gt;Perl script for splitting a fasta file into several small files&lt;/a&gt;&lt;/div&gt;&lt;div&gt;12) &lt;a href="http://bioinfocodelets.blogspot.com/2009/04/unix-shell-script-converting-blast2seq.html"&gt;UNIX Shell Script - Converting blast2seq results into FASTA&lt;/a&gt;&lt;/div&gt;&lt;div&gt;11) &lt;a href="http://bioinfocodelets.blogspot.com/2009/03/11-script-converting-blast-output-to.html"&gt;Script converting BLAST output to FASTA format&lt;/a&gt;&lt;/div&gt;&lt;div&gt;10) &lt;a href="http://bioinfocodelets.blogspot.com/2009/03/10-script-converting-acii-files-from.html"&gt;Script converting ASCII files from DOS to UNIX format&lt;/a&gt;&lt;/div&gt;&lt;div&gt;9) &lt;a href="http://bioinfocodelets.blogspot.com/2009/03/9-unix-shell-script-automating-benbo.html"&gt;UNIX Shell Script - automating the benbo script for more than one input file&lt;/a&gt;&lt;/div&gt;&lt;div&gt;8) &lt;a href="http://bioinfocodelets.blogspot.com/2009/03/8-perl-script-for-converting-matched.html"&gt;Perl Script for converting matched amino acid residues to dots&lt;/a&gt;&lt;/div&gt;&lt;div&gt;7) &lt;a href="http://bioinfocodelets.blogspot.com/2009/03/unix-shell-script-for-converting.html"&gt;UNIX Shell Script for converting matched amino acid residues to dots&lt;/a&gt;&lt;/div&gt;&lt;div&gt;6) &lt;a href="http://bioinfocodelets.blogspot.com/2009/03/blast-result-to-fasta-file-script.html"&gt;Process hit sequences in BLAST output files into FASTA formatted sequences&lt;/a&gt;&lt;/div&gt;&lt;div&gt;5) &lt;a href="http://bioinfocodelets.blogspot.com/2009/03/lapis-extracting-all-sbjct-sequences-15.html"&gt;LAPIS: Extracting all sbjct sequences of a particular length from BLAST results&lt;/a&gt;&lt;/div&gt;&lt;div&gt;4) &lt;a href="http://bioinfocodelets.blogspot.com/2009/03/lapis-converting-blast-results-into.html"&gt;LAPIS: Extracting all hit sequences from BLAST results (for hits with only one 
line of sbjct sequence)&lt;/a&gt;&lt;/div&gt;&lt;div&gt;3) &lt;a href="http://bioinfocodelets.blogspot.com/2009/03/unix-shell-script-for-hiv-standalone.html"&gt;UNIX Shell Script for HIV Standalone Blast&lt;/a&gt;&lt;/div&gt;&lt;div&gt;2) &lt;a href="http://bioinfocodelets.blogspot.com/2009/03/unix-shell-script-for-wnv-netblast-root.html"&gt;UNIX Shell Script for WNV Netblast&lt;/a&gt;&lt;/div&gt;&lt;div&gt;1) &lt;a href="http://bioinfocodelets.blogspot.com/2009/03/unix-shell-script-for-mass-alignments.html"&gt;UNIX Shell Script for Mass Alignments using Muscle&lt;/a&gt;&lt;/div&gt;&lt;div&gt;0) &lt;a href="http://bioinfocodelets.blogspot.com/2009/03/0-understanding-unix-environment.html"&gt;Understanding Unix Environment&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-8488585682059255408?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/8488585682059255408'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/8488585682059255408'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2011/03/all-posts.html' title='All Posts'/><author><name>Asif M. 
Khan</name><uri>http://www.blogger.com/profile/08184259065143974222</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-3569958202940341348</id><published>2009-10-26T08:18:00.000-07:00</published><updated>2009-11-01T00:40:17.778-07:00</updated><title type='text'>24) Unix one-liner to split one-line sequences into FASTA formatted files</title><content type='html'>In Unix, one line of code is enough to transform a text file containing x sequences, one sequence per line, into a FASTA formatted file of x sequences with the header &gt;n, where n is the line number of the sequence.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;$ pr -n:3 -t -T inputfile.txt | sed 's/^/&gt;/' | tr ":" "\n"&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;pr is the print-formatting command.&lt;br /&gt;-n:3 numbers each line using 3 digits (use more digits if you have more than 999 lines),&lt;br /&gt;and : separates the line number from the sequence&lt;br /&gt;-t switches off the page header&lt;br /&gt;-T switches off pagination&lt;br /&gt;&lt;br /&gt;sed 's/^/&gt;/' inserts a &gt; at the beginning of each line,&lt;br /&gt;and&lt;br /&gt;tr replaces the separator ":" with a newline character "\n"&lt;br /&gt;&lt;br /&gt;If you want to wrap the sequence to a width of 60 characters:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;$ pr -n:3 -t -T input.txt | sed 's/^/&gt;/' | tr  ":" "\n" | fold -w 60&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;where the command&lt;br /&gt;fold -w 60 wraps each line at a width of 60 characters.&lt;br /&gt;&lt;br /&gt;By changing -n:3 to -n:4 you can change the number of digits used for the line number&lt;br /&gt;By adjusting -w 60, you can change the line width of your sequences&lt;br /&gt;By substituting s/^/&gt;/ with s/^[ ]*/&gt;/ 
you can remove the leading spaces,&lt;br /&gt;or you can add a prefix in front of the number, e.g.&lt;br /&gt;&lt;br /&gt;s/^[ ]*/&gt;NP/&lt;br /&gt;&lt;br /&gt;And if you want to split each line into a separate FASTA formatted file with a filename corresponding to its number:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;$ pr -n:3 -t -T input.txt | sed 's/^[ ]*/&gt;NP/' | tr  ":" "\n" | \&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;   fold -w 60 | csplit -f NP - "/&gt;/" "{*}"&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;where csplit is told to name the split files with the&lt;br /&gt;prefix NP ("-f NP"),&lt;br /&gt;to read its input from standard input ("-"),&lt;br /&gt;to split the input at each line matching the regular expression "/&gt;/",&lt;br /&gt;and to repeat this until the end of the input ("{*}")&lt;br /&gt;&lt;br /&gt;Say for 107 lines of sequences, you will get 108 files, NP00 to NP107, where the contents of NP01 are&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt; &gt;NP1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  tataatatcagtatatctat&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;with the sequence folded to 60 characters per line. (The header is &gt;NP1 rather than &gt;NP01 because pr pads line numbers with spaces, not zeros, and the sed command strips those spaces.)&lt;br /&gt;NP00 is an empty file and can be ignored.&lt;br /&gt;&lt;br /&gt;So, just one line of code. Five Unix commands (pr, sed, tr, fold and csplit) do the job really fast&lt;br /&gt;because they are pipelined. 
Therefore, for large input files, the first output files appear as soon as they are finished, so you can start reading them almost immediately after you start the program.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-3569958202940341348?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/3569958202940341348/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/10/unix-one-liner-to-split-one-line.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/3569958202940341348'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/3569958202940341348'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/10/unix-one-liner-to-split-one-line.html' title='24) Unix one-liner to split one-line sequences into FASTA formatted files'/><author><name>Tan Tin Wee</name><uri>http://www.blogger.com/profile/09860421598703757905</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-7133646832941900300</id><published>2009-10-20T20:05:00.000-07:00</published><updated>2009-11-01T00:40:06.168-07:00</updated><title type='text'>23) Simple perl script to generate a header line for each sequence in a fasta file</title><content type='html'>A simple script to generate a header line (starting with &gt;) for sequences without headers in a fasta file. 
I have included plenty of comments in the script for people who are interested in learning Perl basics.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Output:&lt;/span&gt; An output file, xxx_processed.fasta, will be produced&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Note: &lt;/span&gt; There are 2 conditions for the script to work&lt;br /&gt;&lt;br /&gt;             1. Sequences must be formatted so that each line contains only 1 sequence&lt;br /&gt;             2. Sequences must be saved in a file with the .fasta extension (the script can, of course, be modified to&lt;br /&gt;                 accept other extensions)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-weight: bold;"&gt;Code (without comments)&lt;/span&gt;&lt;/span&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;#!/usr/bin/perl&lt;br /&gt;&lt;br /&gt;print "\n NOTE: Output will be filename_processed.fasta";&lt;br /&gt;print "\nPlease enter filename (without .fasta extension): ";&lt;br /&gt;&lt;br /&gt;$input = &lt;&gt;;&lt;br /&gt;chomp ($input);&lt;br /&gt;&lt;br /&gt;$number=1;&lt;br /&gt;&lt;br /&gt;open (INFILE, "$input.fasta") or die "Cannot open infile!";&lt;br /&gt;open (OUT, "&gt;"."$input"."_processed.fasta") or die "Cannot open outfile!";&lt;br /&gt;&lt;br /&gt;while ($line=&amp;lt;INFILE&amp;gt;)&lt;br /&gt;{&lt;br /&gt;&lt;br /&gt;print OUT "&gt;$number\n";&lt;br /&gt;print OUT $line;&lt;br /&gt;&lt;br /&gt;$number++;&lt;br /&gt;&lt;br /&gt;}&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-weight: bold;"&gt;Code (with comments)&lt;/span&gt;&lt;/span&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;#!/usr/bin/perl&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic; 
color: rgb(255, 0, 0);"&gt;# Text to print at the command line (\n means print a new line)&lt;/span&gt;&lt;br /&gt;print "\n NOTE: Output will be filename_processed.fasta";&lt;br /&gt;print "\nPlease enter filename (without .fasta extension): ";&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(255, 0, 0);"&gt;# Read the file name typed at the command line into the variable $input; chomp removes the trailing newline&lt;/span&gt;&lt;br /&gt;$input = &lt;&gt;;&lt;br /&gt;chomp ($input);&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(255, 0, 0);"&gt;# Set the value of the variable $number to 1&lt;/span&gt;&lt;br /&gt;$number=1;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(255, 0, 0);"&gt;# Open the input file; if it cannot be opened, print the error message "Cannot open infile!" and exit&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(255, 0, 0);"&gt;# INFILE is the reference (filehandle) for your input file&lt;/span&gt;&lt;br /&gt;open (INFILE, "$input.fasta") or die "Cannot open infile!";&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(255, 0, 0);"&gt;# Create the output file; the symbol &gt; creates a new file, overwriting any existing file with the same name&lt;/span&gt;&lt;br /&gt;open (OUT, "&gt;"."$input"."_processed.fasta") or die "Cannot open outfile!";&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(255, 0, 0);"&gt;# Read the input file (INFILE) one line at a time into the variable $line&lt;/span&gt;&lt;br /&gt;while ($line=&amp;lt;INFILE&amp;gt;)&lt;br /&gt;{&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(255, 0, 0);"&gt;# Print to the output file (OUT) a FASTA header: the symbol &gt;, followed by the value of $number and a new line (customize the header here if you want to)&lt;/span&gt;&lt;br /&gt;print OUT "&gt;$number\n";&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(255, 0, 0);"&gt;# Print out the current 
line from the input file&lt;/span&gt;&lt;br /&gt;print OUT $line;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(255, 0, 0);"&gt;# Increment $number, i.e. $number = $number + 1&lt;/span&gt;&lt;br /&gt;$number++;&lt;br /&gt;}&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-7133646832941900300?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/7133646832941900300/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/10/simple-perl-script-to-generate-header.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/7133646832941900300'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/7133646832941900300'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/10/simple-perl-script-to-generate-header.html' title='23) Simple perl script to generate a header line for each sequence in a fasta file'/><author><name>Jean</name><uri>http://www.blogger.com/profile/03118751984437716151</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-7389132211565853341</id><published>2009-09-22T23:22:00.000-07:00</published><updated>2009-09-23T03:50:20.676-07:00</updated><title type='text'>22) Perl version of AVANA: BENANA!!!!!</title><content type='html'>The life-changing program is here.....Ben's version of AVANA!!!&lt;br /&gt;&lt;br /&gt;Past AVANA users 
will know that when there are real gaps in a window (e.g. a nonamer window), AVANA shifts its sequence to cover the real gap(s). Furthermore, AVANA excludes any nonamers with padded gaps from its computations. BENANA solves both issues: it does not shift amino acids to cover any gaps, and it includes padded gaps in its computations.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Things to note:&lt;br /&gt;&lt;br /&gt;1) When you run BENANA, it will prompt you to enter your input file. Please note that BENANA only accepts input files in .taln format.&lt;br /&gt;&lt;br /&gt;2) Next, define the window size that you are interested in. If you enter anything other than a number, the program will terminate. If you do not enter anything, the program uses a default window size of 9.&lt;br /&gt;&lt;br /&gt;3) Please ignore _output1.csv and _output2.csv, as they are intermediate files used for processing.&lt;br /&gt;&lt;br /&gt;4) It is important to note that full gaps are not considered variants and are excluded from the variant list.&lt;br /&gt;&lt;br /&gt;5) Output files: _intraserotype_position.csv, _intraserotype_variant_list_sorted.csv, BENANA_intraserotype_diversity_results.csv&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;What BENANA does:&lt;br /&gt;&lt;br /&gt;For example, if your taln input file contains:&lt;br /&gt;&lt;br /&gt;AAAACCCCDDDD&lt;br /&gt;AAAANNNNMMMM&lt;br /&gt;CCCCNNNNMMMM&lt;br /&gt;&lt;br /&gt;It first generates two files, output1.csv and output2.csv, which are used for processing (you may ignore them).&lt;br /&gt;&lt;br /&gt;It then generates a peptide list at each window position, using the window size that the user has defined (default: nonamer window). 
In the above example, 4 files are automatically generated:&lt;br /&gt;&lt;br /&gt;1-9_intraserotype_position.csv:&lt;br /&gt;&lt;br /&gt;1,AAAACCCCD,9&lt;br /&gt;1,AAAANNNNM,9&lt;br /&gt;1,CCCCNNNNM,9&lt;br /&gt;&lt;br /&gt;2-10_intraserotype_position.csv:&lt;br /&gt;&lt;br /&gt;2,AAACCCCDD,10&lt;br /&gt;2,AAANNNNMM,10&lt;br /&gt;2,CCCNNNNMM,10&lt;br /&gt;&lt;br /&gt;3-11_intraserotype_position.csv:&lt;br /&gt;&lt;br /&gt;3,AACCCCDDD,11&lt;br /&gt;3,AANNNNMMM,11&lt;br /&gt;3,CCNNNNMMM,11&lt;br /&gt;&lt;br /&gt;4-12_intraserotype_position.csv:&lt;br /&gt;&lt;br /&gt;4,ACCCCDDDD,12&lt;br /&gt;4,ANNNNMMMM,12&lt;br /&gt;4,CNNNNMMMM,12&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;From each peptide list at each position, BENANA also generates a variant list just like AVANA, with the predominant peptide displayed at the top and the count of each peptide in the alignment. In the case where there is no predominant peptide, the first peptide in the alignment will be displayed at the top:&lt;br /&gt;&lt;br /&gt;1-9_intraserotype_variant_list_sorted.csv:&lt;br /&gt;&lt;br /&gt;AAAACCCCD,1&lt;br /&gt;AAAANNNNM,1&lt;br /&gt;CCCCNNNNM,1&lt;br /&gt;&lt;br /&gt;2-10_intraserotype_variant_list_sorted.csv:&lt;br /&gt;&lt;br /&gt;AAACCCCDD,1&lt;br /&gt;AAANNNNMM,1&lt;br /&gt;CCCNNNNMM,1&lt;br /&gt;&lt;br /&gt;3-11_intraserotype_variant_list_sorted.csv: &lt;br /&gt;&lt;br /&gt;AACCCCDDD,1&lt;br /&gt;AANNNNMMM,1&lt;br /&gt;CCNNNNMMM,1&lt;br /&gt;&lt;br /&gt;4-12_intraserotype_variant_list_sorted.csv:&lt;br /&gt;&lt;br /&gt;ACCCCDDDD,1&lt;br /&gt;ANNNNMMMM,1&lt;br /&gt;CNNNNMMMM,1&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Lastly, BENANA will generate the diversity file just like AVANA did. However, it does not compute entropy values at each position.&lt;br /&gt;&lt;br /&gt;Diversity File:&lt;br /&gt;&lt;br /&gt;StartPos,Predominant Peptide,EndPos,% Representation,No. of Variants,Total no. 
of Sequences with Valid Variants,Support %&lt;br /&gt;1,CCCCNNNNM,9,33.33%,2,3,100.00%&lt;br /&gt;2,AAACCCCDD,10,33.33%,2,3,100.00%&lt;br /&gt;3,CCNNNNMMM,11,33.33%,2,3,100.00%&lt;br /&gt;4,ANNNNMMMM,12,33.33%,2,3,100.00%&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;BENANA_open_source_v1.0.pl is downloadable from this &lt;a href="http://dl.getdropbox.com/u/900575/BENANA_open_source_v1.0.pl"&gt;link&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;© 2009 ^BeNBeN^. All rights reserved.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-7389132211565853341?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/7389132211565853341/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/09/22-perl-version-of-avana-benana.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/7389132211565853341'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/7389132211565853341'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/09/22-perl-version-of-avana-benana.html' title='22) Perl version of AVANA: BENANA!!!!!'/><author><name>^BeNBeN^</name><uri>http://www.blogger.com/profile/11629878814774576609</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-7033197633802701749</id><published>2009-05-31T09:58:00.000-07:00</published><updated>2009-05-31T10:39:48.532-07:00</updated><title type='text'>21) LAPIS: Extracting all hit sequences from 
BLAST results (for hits with one or more lines of sbjct sequence)</title><content type='html'>The LAPIS code below extracts all hit sequences from BLAST results, even when a hit has more than one line of sbjct sequence. This is in contrast to post (4), which only works if there is exactly one line of sbjct sequence for each BLAST hit. The code in this post also shows how to clean up the description line so that it contains only the GI number.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Steps&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;1. Select the lines containing the FASTA description together with the lines containing the subject sequences: “line containing &gt; or line containing sbjct” --&gt; Tools --&gt; Extract&lt;br /&gt;&lt;br /&gt;2. To get rid of the numbers in the lines containing sbjct: “digits in line containing sbjct” --&gt; Tools --&gt; Omit&lt;br /&gt;&lt;br /&gt;3. To get rid of sbjct: “sbjct:” --&gt; Extract --&gt; Omit&lt;br /&gt;&lt;br /&gt;4. To get rid of dashes: type “-” --&gt; Extract --&gt; Omit&lt;br /&gt;&lt;br /&gt;5. To get rid of the extra spaces in the lines containing sequences: “spaces not in line containing &gt;” --&gt; Tools --&gt; Omit&lt;br /&gt;&lt;br /&gt;6. If you want to clean up the description line so that it only contains the GI:&lt;br /&gt;a. 
From second | in line containing &gt; to start of linebreak&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Example output for the steps above:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Before&lt;/b&gt;&lt;br /&gt;&lt;span style="font-family:Courier new; font-size:9pt;"&gt;&lt;br /&gt;&gt;gi|126385999|gb|CP000521.1| Acinetobacter baumannii ATCC 17978, complete genome&lt;br /&gt;Length = 3976747&lt;br /&gt;&lt;br /&gt;Score = 570 bits (1470), Expect = e-163&lt;br /&gt;Identities = 284/284 (100%), Positives = 284/284 (100%)&lt;br /&gt;Frame = -2&lt;br /&gt;&lt;br /&gt;Query: 1 &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;LNFKFNFISLMNIKALLLITSAIFISACSPYIVTANPNHSASKSDEKAEKIKNLFNEAHT 60&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;LNFKFNFISLMNIKALLLITSAIFISACSPYIVTANPNHSASKSDEKAEKIKNLFNEAHT&lt;br /&gt;Sbjct: 1766322 LNFKFNFISLMNIKALLLITSAIFISACSPYIVTANPNHSASKSDEKAEKIKNLFNEAHT 1766143&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;After&lt;/b&gt;&lt;br /&gt;&lt;span style="font-family:Courier new; font-size:9pt;"&gt;&lt;br /&gt;&gt;gi|126385999&lt;br /&gt;LNFKFNFISLMNIKALLLITSAIFISACSPYIVTANPNHSASKSDEKAEKIKNLFNEAHT&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-7033197633802701749?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/7033197633802701749/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/05/21-lapis-extracting-all-hit-sequences.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/7033197633802701749'/><link rel='self' 
type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/7033197633802701749'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/05/21-lapis-extracting-all-hit-sequences.html' title='21) LAPIS: Extracting all hit sequences from BLAST results (for hits with one or more lines of sbjct sequence)'/><author><name>Asif M. Khan</name><uri>http://www.blogger.com/profile/08184259065143974222</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-2208723466814599358</id><published>2009-04-21T01:29:00.001-07:00</published><updated>2009-04-21T02:06:49.027-07:00</updated><title type='text'>20) Plotting Entropy Plot of MSA</title><content type='html'>Say the window size is 3:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;$ ent2  3  inputfile | grep Position | sed 's/.*=\(.*\) .*=\(.*\)/\1  \2 /'&lt;/div&gt;&lt;div&gt;&lt;div&gt;or &lt;/div&gt;&lt;div&gt;$ ent2  3  inputfile | grep Position | sed -e 's/Position=//' -e 's/Entropy=//'&lt;/div&gt;&lt;div&gt;or&lt;/div&gt;&lt;div&gt;$ ent2  3  inputfile | grep Position | awk -F"=| " '{print $2, $4}'&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;(this awk script demonstrates how to use the field separator option -F &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;to define either = or a blank space " " as a delimiter, so that&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;$1 is Position&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;delimiter =   &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-style: 
italic;"&gt;$2 is the position number &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;$3 is Entropy&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;$4 is the entropy value)&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;1  0.954885&lt;br /&gt;&lt;/div&gt;&lt;div&gt;2  0.460826&lt;/div&gt;&lt;div&gt;3  0.228055&lt;/div&gt;&lt;div&gt;4  0&lt;/div&gt;&lt;div&gt;5  0&lt;/div&gt;&lt;div&gt;6  0&lt;/div&gt;&lt;div&gt;7  0&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;$ ent2  3  inputfile | grep Position | sed -e 's/Position=//' -e 's/Entropy=//' &gt; ent.dat&lt;/div&gt;&lt;div&gt;$ echo "&lt;/div&gt;&lt;div&gt;set term jpeg&lt;/div&gt;&lt;div&gt;set output 'x.jpg'&lt;/div&gt;&lt;div&gt;plot 'ent.dat' with line " | gnuplot&lt;/div&gt;&lt;div&gt;$ konqueror x.jpg&lt;/div&gt;&lt;div&gt; will display the x.jpg file.&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-2208723466814599358?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/2208723466814599358/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/04/20-plotting-entropy-plot-of-msa.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/2208723466814599358'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/2208723466814599358'/><link rel='alternate' type='text/html' 
href='http://bioinfocodelets.blogspot.com/2009/04/20-plotting-entropy-plot-of-msa.html' title='20) Plotting Entropy Plot of MSA'/><author><name>Tan Tin Wee</name><uri>http://www.blogger.com/profile/09860421598703757905</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-8293813960790545919</id><published>2009-04-21T01:18:00.000-07:00</published><updated>2009-04-21T01:28:44.903-07:00</updated><title type='text'>19) Calculating Entropy of MSA by variable windowsize</title><content type='html'>&lt;div&gt;The short version&lt;/div&gt;&lt;div&gt;&lt;div&gt;Save the script as "entropy2":&lt;/div&gt;&lt;div&gt;&lt;code&gt;&lt;div&gt;#!/bin/bash&lt;/div&gt;&lt;div&gt;L=`head -1 $2 |  wc -c | awk '{print $1}' | sed 's/.*/&amp;amp;-1/' | bc`&lt;/div&gt;&lt;div&gt;x=`expr $L - $1 + 1`&lt;/div&gt;&lt;div&gt;n=`cat $2 |wc -l `&lt;/div&gt;&lt;div&gt;&lt;div&gt;w=$1&lt;/div&gt;&lt;div&gt;for i in `seq $x`&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;do&lt;/div&gt;&lt;div&gt;j=`expr $i + $w - 1`&lt;br /&gt;&lt;/div&gt;&lt;div&gt;cat $2 | cut -c$i-$j| sort | uniq -c&lt;/div&gt;&lt;div&gt;echo -n "Position=$i "&lt;/div&gt;&lt;div&gt;cat $2 | cut -c$i-$j| sort | uniq -c | awk -v n="$n" '{ p = $1 / n; h = -p * log(p) / log(2); H = h + H } END {print "Entropy=" H }'&lt;/div&gt;&lt;div&gt;echo -------&lt;/div&gt;&lt;div&gt;done&lt;/div&gt;&lt;/code&gt;&lt;div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Example input file: input.seq&lt;/div&gt;&lt;div&gt;ASIFKHANT&lt;/div&gt;&lt;div&gt;BSIFKHANT&lt;/div&gt;&lt;div&gt;CTIFKHANT&lt;/div&gt;&lt;div&gt;DSIFKHANT&lt;/div&gt;&lt;div&gt;BSIFKHANT&lt;/div&gt;&lt;div&gt;DSAFKHANT&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;$ chmod +x entropy2&lt;/div&gt;&lt;div&gt;$ ./entropy2 2 input.seq&lt;/div&gt;&lt;/div&gt;
&lt;div&gt;&lt;div&gt; &lt;/div&gt;&lt;div&gt;     1 AS&lt;br /&gt;&lt;/div&gt;&lt;div&gt;      2 BS&lt;/div&gt;&lt;div&gt;      1 CT&lt;/div&gt;&lt;div&gt;      2 DS&lt;/div&gt;&lt;div&gt;Position=1 Entropy=1.9183&lt;/div&gt;&lt;div&gt;-------&lt;/div&gt;&lt;div&gt;      1 SA&lt;/div&gt;&lt;div&gt;      4 SI&lt;/div&gt;&lt;div&gt;      1 TI&lt;/div&gt;&lt;div&gt;Position=2 Entropy=1.25163&lt;/div&gt;&lt;div&gt;-------&lt;/div&gt;&lt;div&gt;      1 AF&lt;/div&gt;&lt;div&gt;      5 IF&lt;/div&gt;&lt;div&gt;Position=3 Entropy=0.650022&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The long version&lt;/div&gt;&lt;div&gt;&lt;code&gt;&lt;br /&gt;&lt;/code&gt;&lt;/div&gt;&lt;code&gt;&lt;div&gt;#!/bin/bash&lt;/div&gt;&lt;div&gt;# Usage:  prog &amp;lt;windowsize&amp;gt; &amp;lt;inputfile&amp;gt;&lt;/div&gt;&lt;div&gt;# This program computes the entropy, for windows of size w, of a&lt;/div&gt;&lt;div&gt;# multiple sequence alignment with gaps marked with "-" only.&lt;/div&gt;&lt;div&gt;# Date: 22 April 2009&lt;/div&gt;&lt;div&gt;# ToDo:&lt;/div&gt;&lt;div&gt;# does not trap missing $1&lt;/div&gt;&lt;div&gt;# does not handle blank or missing data or missing line&lt;/div&gt;&lt;div&gt;# works for window sizes w = 1 onwards; w cannot be 0 or less.&lt;/div&gt;
&lt;div&gt;# need to check the equation for any errors&lt;/div&gt;&lt;div&gt;# assumes that the data input is well formatted according to msa output&lt;/div&gt;&lt;div&gt;# does not trap data input errors and may still generate erroneous output&lt;/div&gt;&lt;div&gt;# does not cater for description tag of the clustal MSA output&lt;/div&gt;&lt;div&gt;# only handles one concatenated sequence per line&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;input=$2&lt;/div&gt;&lt;div&gt;# Assumes the first line is the longest&lt;/div&gt;&lt;div&gt;# L = number of column positions to be computed&lt;/div&gt;&lt;div&gt;L=`head -1 $2 |  wc -c | awk '{print $1}' | sed 's/.*/&amp;amp;-1/' | bc`&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;# x - number of windows&lt;/div&gt;&lt;div&gt;x=`expr $L - $1 + 1`&lt;/div&gt;&lt;div&gt;echo Window Size: $1 size&lt;/div&gt;&lt;div&gt;echo No of Windows: $x windows&lt;/div&gt;&lt;div&gt;# number of sequences (=rows, since only one concatenated sequence per line allowed)&lt;/div&gt;&lt;div&gt;n=`cat $input |wc -l `&lt;/div&gt;&lt;div&gt;echo No of Lines: $n lines&lt;/div&gt;&lt;div&gt;w=$1&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;# i = current residue position begin&lt;/div&gt;&lt;div&gt;for i in `seq $x`&lt;/div&gt;&lt;div&gt;do&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;# j = current residue position end&lt;/div&gt;&lt;div&gt;j=`expr $i + $w - 1`&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;# print aligned sequences within window and count them&lt;/div&gt;&lt;div&gt;cat $input | cut -c$i-$j| sort | uniq -c&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;# Report position number&lt;/div&gt;&lt;div&gt;echo -n "Position=$i "&lt;/div&gt;&lt;div&gt;# Compute Entropy in bits of these sequences:&lt;/div&gt;&lt;div&gt;# H = -(sum of p * log2(p)) over the distinct window strings, where p = count / n&lt;/div&gt;&lt;div&gt;cat $input | cut -c$i-$j| sort | uniq -c | awk -v n="$n" '{ p = $1 / n; h = -p * log(p) / log(2); H = h + H } END {print "Entropy=" H }'&lt;/div&gt;
&lt;div&gt;echo -------&lt;/div&gt;&lt;div&gt;done&lt;/div&gt;&lt;/code&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-8293813960790545919?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/8293813960790545919/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/04/calculating-entropy-of-msa-by-variable.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/8293813960790545919'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/8293813960790545919'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/04/calculating-entropy-of-msa-by-variable.html' title='19) Calculating Entropy of MSA by variable windowsize'/><author><name>Tan Tin Wee</name><uri>http://www.blogger.com/profile/09860421598703757905</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-4357147347968848690</id><published>2009-04-19T21:22:00.000-07:00</published><updated>2009-04-19T21:25:05.046-07:00</updated><title type='text'>18) Doing maths in Unix</title><content type='html'>To perform calculations in Unix, type "&lt;span class="Apple-style-span" style="font-style: italic;"&gt;bc&lt;/span&gt;" to start the calculator program. Once in the program, type an expression (e.g. 
3+2).&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Alternatively, to perform calculations straight from the command line, key in:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 51, 51);"&gt;echo "3+2" | bc&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-4357147347968848690?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/4357147347968848690/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/04/18-doing-maths-in-unix.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/4357147347968848690'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/4357147347968848690'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/04/18-doing-maths-in-unix.html' title='18) Doing maths in Unix'/><author><name>Jean</name><uri>http://www.blogger.com/profile/03118751984437716151</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-6463437931028762622</id><published>2009-04-19T20:51:00.000-07:00</published><updated>2009-04-19T21:12:11.897-07:00</updated><title type='text'>17) Basic shell command to delimit protein/DNA sequence</title><content type='html'>The '&lt;span class="Apple-style-span" 
style="font-style: italic;"&gt;sed&lt;/span&gt;' shell command delimits protein/DNA sequences so that each base/residue is separated by a comma or another specified symbol:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 51, 51);"&gt;sed 's/./&amp;amp;,/g' input.txt &gt; output.txt&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;Note:&lt;/span&gt;&lt;/span&gt; '&lt;span class="Apple-style-span" style="font-style: italic;"&gt;.&lt;/span&gt;' matches any single character, and in the replacement '&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&amp;amp;&lt;/span&gt;' stands for the matched text, so every character is replaced by itself followed by a comma (this also leaves a trailing comma at the end of each line). This is a basic command and can be extended in many ways to perform more sophisticated tasks.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-6463437931028762622?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/6463437931028762622/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/04/17-basic-shell-command-to-delimit.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/6463437931028762622'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/6463437931028762622'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/04/17-basic-shell-command-to-delimit.html' title='17) Basic shell command to delimit protein/DNA 
sequence'/><author><name>Jean</name><uri>http://www.blogger.com/profile/03118751984437716151</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-5839500756075785173</id><published>2009-04-19T20:40:00.000-07:00</published><updated>2009-04-19T21:12:57.858-07:00</updated><title type='text'>16) Basic shell command to count number of fields in a file</title><content type='html'>Use awk's built-in NF variable, which holds the number of fields (separated by whitespace by default) in the current line. This prints the field count of every line in a file:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 51, 51);"&gt;awk '{print NF}' input.txt&lt;/span&gt;&lt;/span&gt; &lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-5839500756075785173?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/5839500756075785173/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/04/16-basic-shell-command-to-delimit.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/5839500756075785173'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/5839500756075785173'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/04/16-basic-shell-command-to-delimit.html' title='16) Basic shell command to count number of fields in a 
file'/><author><name>Jean</name><uri>http://www.blogger.com/profile/03118751984437716151</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-4199449496656815908</id><published>2009-04-07T23:25:00.000-07:00</published><updated>2009-04-07T23:42:47.577-07:00</updated><title type='text'>15) Perl script to randomly shuffle sequences in a fasta file</title><content type='html'>This perl script randomly shuffles the order of sequences in a fasta file. Upon execution, specify your input file (without .fasta extension) and total no. of sequences.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;Output:&lt;/span&gt;&lt;/span&gt;  Output file &lt;span class="Apple-style-span" style="font-style: italic;"&gt;xxx_shuffled.fasta&lt;/span&gt; will be produced&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;Note:&lt;/span&gt;&lt;/span&gt; Again, this script is cut out from a sub-routine and there may be bugs in terms of variable names and definitions.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="color: rgb(51, 51, 51);  line-height: 20px; font-size:13px;"&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-style: italic; "&gt;#! 
/usr/bin/perl&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;use strict;&lt;/div&gt;&lt;div&gt;use warnings;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;print "Please enter filename (without extension): ";&lt;/div&gt;&lt;div&gt;my $input = &amp;lt;STDIN&amp;gt;;&lt;/div&gt;&lt;div&gt;chomp ($input);&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;print "Please enter total no. of sequences in fasta file: ";&lt;/div&gt;&lt;div&gt;my $orig_size = &amp;lt;STDIN&amp;gt;;&lt;/div&gt;&lt;div&gt;chomp ($orig_size);&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;open (INFILE, "&amp;lt;", "$input.fasta") or die "Error opening input file for shuffling!";&lt;/div&gt;&lt;div&gt;open (SHUFFLED, "&gt;", "$input"."_shuffled.fasta") or die "Error creating shuffled output file!";&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;# Assumes each sequence takes exactly two lines: a header line followed by a one-line sequence&lt;/div&gt;&lt;div&gt;my @line = &amp;lt;INFILE&amp;gt;;&lt;/div&gt;&lt;div&gt;chomp (@line);&lt;/div&gt;&lt;div&gt;die "Expected at least ", 2 * $orig_size, " lines in $input.fasta\n" if @line &amp;lt; 2 * $orig_size;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;my @array;  # header lines&lt;/div&gt;&lt;div&gt;my @array2; # sequence lines&lt;/div&gt;&lt;div&gt;for (my $i = 0; $i &amp;lt; $orig_size; $i++) {&lt;/div&gt;&lt;div&gt;$array[$i] = $line[2 * $i];&lt;/div&gt;&lt;div&gt;$array2[$i] = $line[2 * $i + 1];&lt;/div&gt;&lt;div&gt;}&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;# Fisher-Yates shuffle; header and sequence are swapped together so each pair stays matched&lt;/div&gt;&lt;div&gt;for (my $i = $orig_size - 1; $i &gt; 0; $i--) {&lt;/div&gt;&lt;div&gt;my $j = int rand ($i + 1);&lt;/div&gt;&lt;div&gt;next if $i == $j;&lt;/div&gt;&lt;div&gt;@array[$i,$j] = @array[$j,$i];&lt;/div&gt;&lt;div&gt;@array2[$i,$j] = @array2[$j,$i];&lt;/div&gt;&lt;div&gt;}&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;for (my $index2 = 0; $index2 &amp;lt; $orig_size; $index2++) {&lt;/div&gt;&lt;div&gt;print SHUFFLED "$array[$index2]\n";&lt;/div&gt;&lt;div&gt;print SHUFFLED "$array2[$index2]\n";&lt;/div&gt;&lt;div&gt;}&lt;/div&gt;&lt;div&gt;close(INFILE);&lt;/div&gt;&lt;div&gt;close(SHUFFLED);&lt;/div&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-4199449496656815908?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' 
type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/4199449496656815908/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/04/perl-script-to-randomly-shuffle.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/4199449496656815908'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/4199449496656815908'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/04/perl-script-to-randomly-shuffle.html' title='15) Perl script to randomly shuffle sequences in a fasta file'/><author><name>Jean</name><uri>http://www.blogger.com/profile/03118751984437716151</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-1586042936622237037</id><published>2009-04-07T23:07:00.000-07:00</published><updated>2009-04-07T23:44:05.388-07:00</updated><title type='text'>14) Perl script for separating a fasta file into individual sequence fasta files</title><content type='html'>This Perl script separates a fasta file into individual sequence fasta files. Upon execution, users have to specify the input file (without .fasta extension) and the total number of sequences in their input file. 
Useful for leave-one-out testing.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;Output:&lt;/span&gt;&lt;/span&gt; Individual sequences will be cut out from the original fasta file and saved under file names &lt;span class="Apple-style-span" style="font-style: italic;"&gt;query&lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;1.fasta&lt;/span&gt;, &lt;span class="Apple-style-span" style="font-style: italic;"&gt;query&lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;2.fasta &lt;/span&gt;etc. In addition, the remaining sequences will be saved in a separate fasta file (file names &lt;span class="Apple-style-span" style="font-style: italic;"&gt;out1.fasta&lt;/span&gt;, &lt;span class="Apple-style-span" style="font-style: italic;"&gt;out2.fasta&lt;/span&gt; etc.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;Note:&lt;/span&gt;&lt;/span&gt; This script is cut out from a sub-routine and there may be bugs e.g. undefined variables. Please check and debug before using.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="line-height: 20px; "&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;#! 
/usr/bin/perl&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="line-height: 20px; "&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="line-height: 20px; "&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;print "Please enter filename (without extension): ";&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;$input = &lt;&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;chomp ($input);&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;print "Please enter total no. 
of sequences in fasta file: ";&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;$training_seq = &lt;&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;chomp ($training_seq);&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;my $no = 0;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;my $query = 0;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;my $filenumber = 1;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br 
/&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;while ($filenumber&lt;=$training_seq) { &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;print $training_seq;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;open (INFILE, "$input.fasta") or die "Error opening input file!"; &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;open (QUERY, "&gt;query$filenumber.fasta") or die "Can't open query($filenumber)!";&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;open (OUT, "&gt;out$filenumber.fasta") or die "Can't open 
out($filenumber)!";&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;while (@line=&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse; color: rgb(51, 51, 51);  -webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; "&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;lt;INFILE&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0);  -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; "&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;) {&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;# Print query sequence to query file&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;print QUERY 
$line[$query++];&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;print QUERY $line[$query];&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;# Delete query sequence from database file&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;delete $line[$no++];&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;delete $line[$no];&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" 
style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;# Print sequences to database file&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;print OUT @line;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;close INFILE;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;close QUERY;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;close OUT;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: 
italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;$filenumber++;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;$no++;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;$query++;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-1586042936622237037?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/1586042936622237037/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/04/perl-script-for-separating-fasta-file.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/1586042936622237037'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/1586042936622237037'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/04/perl-script-for-separating-fasta-file.html' title='14) Perl script for separating a fasta file into individual sequence fasta 
files'/><author><name>Jean</name><uri>http://www.blogger.com/profile/03118751984437716151</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-120333446941880526</id><published>2009-04-07T22:46:00.000-07:00</published><updated>2009-04-07T23:45:29.977-07:00</updated><title type='text'>13) Perl script for splitting a fasta file into several small files</title><content type='html'>This script splits a fasta file into several smaller fasta files, each containing a user-specified number of sequences. When run, it prompts for the name of the original fasta file (without the .fasta extension) and for the number of sequences to put in each file.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;Output:&lt;/span&gt;&lt;/span&gt; An intermediate chomped fasta file, xxx_chomped.fasta (where xxx is the input file name), with each sequence on a single line, is produced first; the small fasta files are named xxx_1.fasta, xxx_2.fasta, etc.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-style: italic; font-weight: bold; "&gt;Note: &lt;/span&gt;Each subroutine can be run independently as a Perl script by copying it into another .pl file, together with the input prompts that set the variables it uses, and removing the lines &lt;span class="Apple-style-span" style="font-style: italic;"&gt;sub xxx {&lt;/span&gt; and&lt;span class="Apple-style-span" style="font-style: italic;"&gt; }&lt;/span&gt;. If you do not want to run all of the subroutines, comment out the calls to those you want to skip, e.g. 
&lt;span class="Apple-style-span" style="font-style: italic;"&gt;#split_fasta();&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;#! /usr/bin/perl&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;print "Please enter filename (without extension): ";&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;$input = &lt;&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;chomp ($input);&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;print "Please enter no. 
of sequences you want in each file: ";&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;$upper_limit = &lt;&gt;+1; # one more than the no. of sequences per file&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;chomp ($upper_limit);&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;chomp_fasta();&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;split_fasta();&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;#--------------------------------------------------------------------------------&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span 
class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;# chomp_fasta: Merges all the sequence lines in a fasta file into one line, so&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;#              each sequence in the fasta file will have one header line and &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;#              one sequence line only&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;#--------------------------------------------------------------------------------&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;sub chomp_fasta {&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" 
style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;open (INFILE, "$input.fasta") or die "Cannot open infile!";&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;open (OUT, "&gt;"."$input"."_chomped.fasta") or die "Cannot open outfile!";&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;while ($line=&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse; color: rgb(51, 51, 51);  -webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; "&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;lt;INFILE&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;infile&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;) { # Please remove the spaces&lt;/span&gt;&lt;/span&gt;&lt;/infile&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" 
style="font-size: small;"&gt;if ($line=~/&gt;/) {&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;print OUT "\n$line";&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;else {&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;chomp ($line);&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;print OUT "$line";&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;close OUT;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;#--------------------------------------------------------------------------------&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span 
class="Apple-style-span" style="font-size: small;"&gt;# split_fasta: Splits a fasta file into several small files according to the &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;#              specified no. of sequences in each file&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;#--------------------------------------------------------------------------------&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;sub split_fasta {&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;$count = 0;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;$number = 
1;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;open (INFILE, "$input"."_chomped.fasta") or die "Cannot open infile!";&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;open (OUT, "&gt;"."$input"."_"."$number".".fasta") or die "Cannot open outfile!";&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;while ($line=&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse; color: rgb(51, 51, 51);  -webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; "&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;lt;INFILE&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0);  -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; "&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: 
small;"&gt;) {&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;if ($line=~/&gt;/) {&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;$count++;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;if ($count==1) {&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;print OUT "$line";&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br 
/&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;elsif ($count&lt;$upper_limit) { #change this value to change upper limit&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;print OUT "\n$line";&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span 
class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;else {&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;chomp ($line);&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;print OUT "$line";&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;#--------------------------------------------------------------------------------&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: 
small;"&gt;# Note: The header &gt;xxx when $count=3 has already been evaluated in the previous&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;#       regex loop but is not printed out. So this if loop must print out the&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;#       header &gt;xxx in a new file manually and reset the count back to 1 instead&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;#       of 0 &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;#--------------------------------------------------------------------------------&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" 
style="font-size: small;"&gt;if ($count==$upper_limit) { #change this value to change upper limit&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;close OUT;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;$number++;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;open (OUT, "&gt;"."$input"."_"."$number".".fasta") or die "Cannot open outfile!";&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;print OUT "$line";&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;$count = 1;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" 
style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-120333446941880526?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/120333446941880526/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/04/perl-script-for-splitting-fasta-file.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/120333446941880526'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/120333446941880526'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/04/perl-script-for-splitting-fasta-file.html' title='13) Perl script for splitting a fasta 
file into several small files'/><author><name>Jean</name><uri>http://www.blogger.com/profile/03118751984437716151</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-6564756324383811364</id><published>2009-04-03T02:07:00.000-07:00</published><updated>2009-04-03T02:21:31.874-07:00</updated><title type='text'>12) UNIX Shell Script - Converting blast2seq results into FASTA</title><content type='html'>This is a modification of the shell script written by Murali (see Entry 11) that converts a blast2seq result (blastp) to FASTA format by extracting all sbjct sequences. It was especially useful for building my dengue database (see http://benlogbook.blogspot.com for more details), where I used it as a polyprotein separator to split out the sequence of each protein subtype. However, double hits still have to be removed from your blast2seq file. &lt;br /&gt;&lt;br /&gt;For my dengue database, I have sample sequences for each protein subtype (C, prM, E, NS1, NS2a, NS2b, NS3, NS4a, NS4b, NS5) in FASTA format, and I paste them all into the query text box. Next, I paste the polyprotein, which contains the full protein sequence, into the sbjct text box. The e-value threshold is set to 0.1 to prevent double hits from appearing. After saving the blast2seq results as a text file, I run this script to obtain the sequence of each protein subtype. 
Note that the results file must be placed in a folder called "input" before running the script; the script itself can be placed anywhere except inside the "input" folder.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Process Code&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;#!/bin/bash&lt;br /&gt;mkdir -p output&lt;br /&gt;ls input | sed 's/\.txt$//' &gt; index.txt&lt;br /&gt;# lists all the files in the folder called input, removes the .txt extension and places the names in a file called index.txt&lt;br /&gt;# the extension of the BLAST output is assumed to be .txt. If your BLAST output has a different extension, change the code to match it.&lt;br /&gt;printf "The following lines in the following BLAST output files had gaps" &gt; logfile.txt&lt;br /&gt;# creates the logfile&lt;br /&gt;for num in `cat index.txt`&lt;br /&gt;# generates a for-loop where the variable $num takes the name of one input file in each iteration&lt;br /&gt;do cat input/$num.txt | awk '/&gt;/ {printf "\n\n%s\n", $0} /Sbjct/ {printf "%s", $3}' | sed 's/-//g' &gt; output/$num.fasta&lt;br /&gt;# takes the sequence identifiers and the subject sequences, removes the gaps from the subject sequences and writes them to an output file of the same name in the output folder&lt;br /&gt;printf "\n" &gt;&gt; logfile.txt&lt;br /&gt;# generates a new line and updates the logfile&lt;br /&gt;printf "%s" "$num" &gt;&gt; logfile.txt&lt;br /&gt;# generates a new entry for that input file&lt;br /&gt;printf "\n" &gt;&gt; logfile.txt&lt;br /&gt;# generates a new line&lt;br /&gt;cat input/$num.txt | grep -n Sbjct | sed 's/Sbjct//g' | awk '/-/ {print $1," " ,$3}' &gt;&gt; logfile.txt&lt;br /&gt;# takes an input file, numbers the lines, prints the line number and the subject sequence for each subject line with gaps, and updates the logfile&lt;br /&gt;done&lt;br /&gt;# end of loop&lt;br /&gt;printf "\n" &gt;&gt; logfile.txt&lt;br /&gt;printf "Gaps have been removed in output" &gt;&gt; logfile.txt&lt;br 
/&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Code by: Murali and BenBen&lt;br /&gt;Blog Post by: BenBen&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-6564756324383811364?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/6564756324383811364/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/04/unix-shell-script-converting-blast2seq.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/6564756324383811364'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/6564756324383811364'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/04/unix-shell-script-converting-blast2seq.html' title='12) UNIX Shell Script - Converting blast2seq results into FASTA'/><author><name>^BeNBeN^</name><uri>http://www.blogger.com/profile/11629878814774576609</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-5269730824512936733</id><published>2009-03-21T22:24:00.000-07:00</published><updated>2009-03-22T00:10:37.229-07:00</updated><title type='text'>11) Script converting BLAST output to FASTA format</title><content type='html'>&lt;p&gt;The following script takes all the BLAST output files in a folder called input, processes them, removes gaps and writes the output files to a folder called output. 
It also generates a logfile that records which lines in each input file contained gaps.&lt;/p&gt; &lt;p&gt;&lt;code&gt;&lt;br /&gt;#!/bin/bash&lt;br /&gt;mkdir -p output&lt;br /&gt;ls input | sed 's/\.txt$//' &gt; index.txt&lt;br /&gt;# lists all the files in the folder called input, removes the .txt extension and places the names in a file called index.txt&lt;br /&gt;# the extension of the BLAST output is assumed to be .txt. If your BLAST output has a different extension, change the code to match it.&lt;br /&gt;printf "The following lines in the following BLAST output files had gaps" &gt; logfile.txt&lt;br /&gt;# creates the logfile&lt;br /&gt;for num in `cat index.txt`&lt;br /&gt;# generates a for-loop where the variable $num takes the name of one input file in each iteration&lt;br /&gt;do cat input/$num.txt | awk '/&gt;/ {print $0} /Sbjct:/ {print $3}' | sed 's/-//g' &gt; output/$num.fasta&lt;br /&gt;# takes the sequence identifiers and the subject sequences, removes the gaps from the subject sequences and writes them to an output file of the same name in the output folder&lt;br /&gt;printf "\n" &gt;&gt; logfile.txt&lt;br /&gt;# generates a new line and updates the logfile&lt;br /&gt;printf "%s" "$num" &gt;&gt; logfile.txt&lt;br /&gt;# generates a new entry for that input file&lt;br /&gt;printf "\n" &gt;&gt; logfile.txt&lt;br /&gt;# generates a new line&lt;br /&gt;cat input/$num.txt | grep -n Sbjct | sed 's/Sbjct://g' | awk '/-/ {print $1,"   " ,$3}' &gt;&gt; logfile.txt&lt;br /&gt;# takes an input file, numbers the lines, prints the line number and the subject sequence for each subject line with gaps, and updates the logfile&lt;br /&gt;done&lt;br /&gt;# end of loop&lt;br /&gt;printf "\n" &gt;&gt; logfile.txt&lt;br /&gt;printf "Gaps have been removed in output" &gt;&gt; logfile.txt&lt;br /&gt;&lt;/code&gt;&lt;/p&gt;&lt;p&gt;To run it, copy the code into a text editor and save it as a .sh file. 
Create a directory called input, place all your BLAST output files in it (and nothing else), and run the script from the parent directory of input. &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-5269730824512936733?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/5269730824512936733/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/11-script-converting-blast-output-to.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/5269730824512936733'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/5269730824512936733'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/11-script-converting-blast-output-to.html' title='11) Script converting BLAST output to FASTA format'/><author><name>Murali</name><uri>http://www.blogger.com/profile/08036249483538443818</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_6O9moFERz_Q/SiTLZFerPKI/AAAAAAAAAA0/O5HDum5JFWs/S220/murali+anna+2.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-7341249692494957589</id><published>2009-03-21T21:07:00.000-07:00</published><updated>2009-03-24T04:25:33.764-07:00</updated><title type='text'>10) Script converting ASCII files from DOS to UNIX format</title><content type='html'>&lt;p&gt;Shell scripts written in Windows text editors such as Notepad, Notepad2 or Notepad++ can fail with errors when executed in a Unix shell. 
This is a common problem for users who run a Unix environment on a Windows system, since their scripts are often written with Windows text editors. The cause is the line endings: in DOS format, each line ends with two characters, a carriage return followed by a line feed, whereas Unix uses only the line feed. To convert your file to Unix line endings, run the following command in the shell:&lt;/p&gt;&lt;p&gt; &lt;code&gt;tr -d '\15' &amp;lt; infile &amp;gt; outfile&lt;/code&gt;&lt;/p&gt;&lt;p&gt;The above command deletes every carriage return (octal \15, i.e. ASCII 13) from the file. Make sure that infile and outfile have different names, and remember to include the file extensions. Once you have made outfile executable (for example with chmod +x outfile), it can be run in a Unix shell.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-7341249692494957589?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/7341249692494957589/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/10-script-converting-acii-files-from.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/7341249692494957589'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/7341249692494957589'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/10-script-converting-acii-files-from.html' title='10) Script converting ASCII files from DOS to UNIX format'/><author><name>Murali</name><uri>http://www.blogger.com/profile/08036249483538443818</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='25' height='32' src='http://2.bp.blogspot.com/_6O9moFERz_Q/SiTLZFerPKI/AAAAAAAAAA0/O5HDum5JFWs/S220/murali+anna+2.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-3989224406954942270</id><published>2009-03-11T18:57:00.000-07:00</published><updated>2009-03-11T19:17:27.911-07:00</updated><title type='text'>9) UNIX Shell Script - automating the benbo script for more than one input file</title><content type='html'>This is a simple shell script that runs benbo on more than one input file.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Process Code&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;#!/bin/bash&lt;br /&gt;for i in xx*&lt;br /&gt;do&lt;br /&gt;# skip output files left over from a previous run&lt;br /&gt;case "$i" in *_out.txt) continue ;; esac&lt;br /&gt;./benbo.sh 1 "$i" &gt; "${i}_out.txt"&lt;br /&gt;done &lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Usage Notes:&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;1) To run the script, save the code in a file called benbo2.sh, make it executable (chmod +x benbo2.sh) and run ./benbo2.sh. &lt;br /&gt;&lt;br /&gt;2) This script runs benbo.sh on every file in the current folder whose name begins with "xx" (for example xx1.txt), so all input files must be named accordingly. 
&lt;br /&gt;&lt;br /&gt;3) To use the Perl version of benbo instead, replace ./benbo.sh with ./benbo.pl in the above code.&lt;br /&gt;&lt;br /&gt;4) This script assumes that the index protein/nucleotide sequence is on the first line of each input file. If your index sequence is on a different line, change "1" in the above code to that line number.&lt;br /&gt;&lt;br /&gt;5) The output files are written to the same folder and named "input file name"_out.txt&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Code by: BenBen&lt;br /&gt;Blog Post by: BenBen&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-3989224406954942270?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/3989224406954942270/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/9-unix-shell-script-automating-benbo.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/3989224406954942270'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/3989224406954942270'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/9-unix-shell-script-automating-benbo.html' title='9) UNIX Shell Script - automating the benbo script for more than one input file'/><author><name>^BeNBeN^</name><uri>http://www.blogger.com/profile/11629878814774576609</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-8998119270741873615</id><published>2009-03-11T18:44:00.000-07:00</published><updated>2009-03-11T18:56:46.404-07:00</updated><title type='text'>8) Perl Script for converting matched amino acid residues to dots</title><content type='html'>&amp;nbsp;&lt;br /&gt;The following code processes an alignment so that, in every protein or DNA sequence, each amino acid residue or nucleotide identical to the one at the same position in an index sequence is shown as a dot, while the variable residues or nucleotides are left intact. This is similar to the previous method, but it is a Perl script, which is suitable for longer index protein sequences.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Input&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;font face="courier new"&gt;&lt;br /&gt;ASIFTANTINWEE&lt;br /&gt;ASAFTANTINWEE&lt;br /&gt;ASEETENTANWAA&lt;br /&gt;&lt;/font&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Process Code&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;#!/usr/bin/perl &lt;br /&gt;# Usage: benbo.pl &amp;lt;linenumber&amp;gt; &amp;lt;sequencefilename&amp;gt;&lt;br /&gt;$L=$ARGV[0]-1;&lt;br /&gt; &lt;br /&gt;# Open file for reading&lt;br /&gt;open (FILE, "$ARGV[1]") or die "cannot open file: $!";&lt;br /&gt;&lt;br /&gt;# Store everything in the file into array @raw_data&lt;br /&gt;@raw_data=&amp;lt;FILE&amp;gt;;&lt;br /&gt;&lt;br /&gt;# print the Target Line&lt;br /&gt;print $raw_data[$L];&lt;br /&gt; &lt;br /&gt;# trim off newline from the Target Line&lt;br /&gt;chomp($raw_data[$L]);&lt;br /&gt;@linearray = split("", $raw_data[$L] );&lt;br /&gt; &lt;br /&gt;# test each line of the input file, as read into the rawdata array&lt;br /&gt;foreach $line (@raw_data)&lt;br /&gt;{&lt;br /&gt;chomp($line);&lt;br /&gt;&lt;br /&gt;# take line in question and split into an array of single chr&lt;br 
/&gt;@linetest = split("", $line );&lt;br /&gt;&lt;br /&gt;# initialise the character count&lt;br /&gt;$nn=0;&lt;br /&gt;&lt;br /&gt;# analyse each character in the current Test line&lt;br /&gt;foreach $linechr (@linetest) {&lt;br /&gt; &lt;br /&gt;# check if the Test character is identical to the Target character at the same position&lt;br /&gt;# (a string comparison, so characters such as "." or "*" are not treated as regex metacharacters)&lt;br /&gt;if (defined $linearray[$nn] and $linechr eq $linearray[$nn]) {&lt;br /&gt;&lt;br /&gt;# if yes, print dot&lt;br /&gt;print "."; &lt;br /&gt;} else  {&lt;br /&gt;&lt;br /&gt;# if no, print the non-matching chr in the Test sequence&lt;br /&gt;print "$linechr"; &lt;br /&gt;} # if&lt;br /&gt;&lt;br /&gt;# increment the character count of the Target line&lt;br /&gt;$nn=$nn+1;&lt;br /&gt;} #foreach linechr&lt;br /&gt;&lt;br /&gt;# once each line is completed, print a new line; $nn is reset at the start of the next line&lt;br /&gt;print "\n";&lt;br /&gt;# run through every line of the sequence input file&lt;br /&gt;} # foreach&lt;br /&gt; &lt;br /&gt;close(FILE);&lt;br /&gt;&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Usage Notes:&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;1) To run the script, save the code in a file called benbo.pl and run: ./benbo.pl 1 xx1.txt, where "1" is the line number of the index sequence and "xx1.txt" is the input file. The index sequence is first printed in full; every line of the input file then follows, so the index sequence itself also appears once as a row of dots.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Output&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;font face="courier new"&gt;&lt;br /&gt;ASIFTANTINWEE&lt;br /&gt;.............&lt;br /&gt;..A..........&lt;br /&gt;..EE.E..A..AA&lt;br /&gt;&lt;/font&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;i&gt; Code by: Dr. 
Tan Tin Wee&lt;/i&gt;&lt;br /&gt;&lt;i&gt; Blog Post by: BenBen&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-8998119270741873615?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/8998119270741873615/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/8-perl-script-for-converting-matched.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/8998119270741873615'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/8998119270741873615'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/8-perl-script-for-converting-matched.html' title='8) Perl Script for converting matched amino acid residues to dots'/><author><name>^BeNBeN^</name><uri>http://www.blogger.com/profile/11629878814774576609</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-5593262464022435492</id><published>2009-03-11T02:30:00.000-07:00</published><updated>2009-03-11T08:50:57.791-07:00</updated><title type='text'>7) UNIX Shell Script for converting matched amino acid residues to dots</title><content type='html'>&amp;nbsp;&lt;br /&gt;The following code processes an alignment so that, in every protein or DNA sequence, each amino acid residue or nucleotide identical to the one at the same position in an index sequence is shown as a dot, while the variable residues or nucleotides are left 
intact.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Input&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;font face="courier new"&gt;&lt;br /&gt;ASIFTANTINWEE&lt;br /&gt;ASAFTANTINWEE&lt;br /&gt;ASEETENTANWAA&lt;br /&gt;&lt;/font&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Process Code&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;#!/bin/bash&lt;br /&gt;# grab the sequence in linenumber $1 in sequence file $2&lt;br /&gt;seq=`head -n "$1" "$2" | tail -n 1`&lt;br /&gt;# create a private temporary file to hold the generated sed script&lt;br /&gt;tmp=`mktemp`&lt;br /&gt;# start with an empty prefix of dots&lt;br /&gt;dot=""&lt;br /&gt;&lt;br /&gt;# take sequence, break it down to each residue and iterate&lt;br /&gt;(&lt;br /&gt;for i in `echo $seq | fold -w 1`&lt;br /&gt;do&lt;br /&gt;&lt;br /&gt;#  Generate the sed script&lt;br /&gt;echo "s_^\($dot\)${i}_\1._"&lt;br /&gt;&lt;br /&gt;# Set up the dot for the last sed command&lt;br /&gt;dot=".$dot"&lt;br /&gt;done&lt;br /&gt;&lt;br /&gt;# after finishing generating the sed script for each column&lt;br /&gt;# construct the substitution for the sequence at linenumber $1&lt;br /&gt;echo "$1s+$dot+$seq+" | sed -e 's/\./\\./g'&lt;br /&gt;) &gt; "$tmp"&lt;br /&gt;# temp file created&lt;br /&gt;&lt;br /&gt;# use the dynamically created temp file as program for sed to&lt;br /&gt;# apply on the sequence file $2&lt;br /&gt;sed -f "$tmp" "$2"&lt;br /&gt;# remove temp file created&lt;br /&gt;rm -f "$tmp"&lt;br /&gt;&lt;br /&gt;exit&lt;br /&gt;# Example of the sed script written to the temporary file for the index sequence VDRFYKTLRAEQASQ on line 1&lt;br /&gt;# slashdot=`echo $dot | sed -e 's/\./\\\\./g'`&lt;br /&gt;s_^\(\)V_\1._&lt;br /&gt;s_^\(.\)D_\1._&lt;br /&gt;s_^\(..\)R_\1._&lt;br /&gt;s_^\(...\)F_\1._&lt;br /&gt;s_^\(....\)Y_\1._&lt;br /&gt;s_^\(.....\)K_\1._&lt;br /&gt;s_^\(......\)T_\1._&lt;br /&gt;s_^\(.......\)L_\1._&lt;br /&gt;s_^\(........\)R_\1._&lt;br /&gt;s_^\(.........\)A_\1._&lt;br /&gt;s_^\(..........\)E_\1._&lt;br /&gt;s_^\(...........\)Q_\1._&lt;br /&gt;s_^\(............\)A_\1._&lt;br /&gt;s_^\(.............\)S_\1._&lt;br /&gt;s_^\(..............\)Q_\1._&lt;br 
/&gt;1s+\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.+VDRFYKTLRAEQASQ+&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Usage Notes:&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;1) To run the script, save the code in a file called benbo.sh and run: ./benbo.sh 1 xx1.txt, where "1" is the line number of the index sequence and "xx1.txt" is the input file. The index sequence itself is shown in full in the output.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Output&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;font face="courier new"&gt;&lt;br /&gt;ASIFTANTINWEE&lt;br /&gt;..A..........&lt;br /&gt;..EE.E..A..AA&lt;br /&gt;&lt;/font&gt;&lt;br /&gt;&lt;br /&gt;&lt;i&gt; Code by: Dr. Tan Tin Wee&lt;/i&gt;&lt;br /&gt;&lt;i&gt; Blog Post by: BenBen&lt;/i&gt;&lt;br /&gt;&lt;i&gt; Post Edited by: Asif M. Khan&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-5593262464022435492?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/5593262464022435492/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/unix-shell-script-for-converting.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/5593262464022435492'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/5593262464022435492'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/unix-shell-script-for-converting.html' title='7) UNIX Shell Script for converting matched amino acid residues to 
dots'/><author><name>^BeNBeN^</name><uri>http://www.blogger.com/profile/11629878814774576609</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-2419585548460283690</id><published>2009-03-10T00:33:00.000-07:00</published><updated>2009-03-11T08:52:15.008-07:00</updated><title type='text'>6) Process hit sequences in BLAST output files into FASTA formatted sequences</title><content type='html'>&amp;nbsp;&lt;br /&gt;LAPIS is problematic with large input files because of its limited memory.&lt;br /&gt;We end up having to break up the file and process each part separately (often having to restart LAPIS every time the memory runs out). The following code converts all the hit sequences in BLAST output files into FASTA formatted sequences.&lt;br /&gt;&lt;br /&gt;Follow these steps:&lt;br /&gt;&lt;br /&gt;1. Copy the following code into a text editor.&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;#!/bin/bash&lt;br /&gt;mkdir -p output;&lt;br /&gt;for num in `seq 4`;&lt;br /&gt;do for pii in `seq -w 2 11`;&lt;br /&gt;do cat DENV$num"o"$pii.txt | sed 's/&gt;/33ab&gt;/g' | sed 's/Sbjct:/33abSbjct:/g' | grep '33ab' | sed 's/33ab//g' | sed 's/Sbjct://g' | sed 's/[ ][0-9]/33ab/g' | sed 's/33ab/ /g' | sed 's/[ ][0-9]/33ab/g' | sed 's/33ab/ /g' | sed 's/[ ][0-9]/33ab/g' | sed 's/33ab/ /g' | sed 's/[ ][0-9]/33ab/g' | sed 's/33ab/ /g' | sed 's/ //g' &gt; output/DENV$num$pii.fasta;&lt;br /&gt;done;&lt;br /&gt;done&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;2. (Optional) Delete the line breaks so that the commands are on one line, separated by semicolons ";" with a space after each semicolon. Keep #!/bin/bash on its own first line, because bash treats everything after # as a comment. This step is not needed if the file is saved with Unix line endings (see Entry 10).&lt;br /&gt;&lt;br /&gt;3. 
Save the code in the text editor as extract.sh (or any name you like).&lt;br /&gt;&lt;br /&gt;4. Run it in a bash environment, for example from the Linux Konsole, with "bash extract.sh".&lt;br /&gt;&lt;br /&gt;Note:&lt;br /&gt;&lt;br /&gt;i. My BLAST results were named in the form DENV$num"o"$pii.txt, e.g. DENV1o02.txt, so I use two nested for loops over the variables $num and $pii: $num steps through DENV1o02, DENV2o02, etc., and $pii steps through DENV1o02, DENV1o03, etc. For convenience, name your files abcd1, abcd2, etc. so that the for loops can run the program over a number of files.&lt;br /&gt;&lt;br /&gt;ii. I use &lt;b&gt;sed&lt;/b&gt; to add a dummy identifier "33ab" to "&gt;" and "Sbjct", so that grep can extract these lines. I then use sed to remove "33ab", then "Sbjct:", then the subject sequence numbers and the spaces. Unfortunately, all one- to four-digit numbers are removed as well (including those in the sequence title line), except the gi and accession numbers.&lt;br /&gt;&lt;br /&gt;iii. The processed files are created in a folder called output. Make sure that no other folder of that name already exists, since files in it with the same names would be overwritten. Alternatively, rename the output folder in the code.&lt;br /&gt;&lt;br /&gt;&lt;i&gt; Code by: Murali&lt;/i&gt;&lt;br /&gt;&lt;i&gt; Blog Post by: Murali&lt;/i&gt;&lt;br /&gt;&lt;i&gt; Post Edited by: Asif M. 
Khan&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-2419585548460283690?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/2419585548460283690/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/blast-result-to-fasta-file-script.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/2419585548460283690'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/2419585548460283690'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/blast-result-to-fasta-file-script.html' title='6) Process hit sequences in BLAST output files into FASTA formatted sequences'/><author><name>Murali</name><uri>http://www.blogger.com/profile/08036249483538443818</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_6O9moFERz_Q/SiTLZFerPKI/AAAAAAAAAA0/O5HDum5JFWs/S220/murali+anna+2.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-7158080843141822102</id><published>2009-03-04T23:14:00.000-08:00</published><updated>2009-05-31T10:26:16.285-07:00</updated><title type='text'>5) LAPIS: Extracting all sbjct sequences of a particular length from BLAST results</title><content type='html'>This LAPIS code extracts all sbjct sequences that are 15 amino acids long and, at the same time, gets rid of double BLAST hits. 
This code only works if each hit has only one line of sbjct sequence.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Steps&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;1) extract {from line starting with &gt; to line starting with sbjct} # (this is to get rid of double hits)&lt;br /&gt;2) extract {from line containing "/15" to line containing sbjct}&lt;br /&gt;3) extract {line starting with sbjct}&lt;br /&gt;4) omit sbjct:&lt;br /&gt;5) omit number&lt;br /&gt;6) omit spaces&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Output examples corresponding to the above codes:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;a) Output of code: &lt;code&gt; extract {from line starting with &gt; to line starting with sbjct}&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:Courier new;"&gt;&lt;br /&gt;&gt;gi|190193559|dbj|BAG48486.1| gag protein [Human immunodeficiency virus 1]&lt;br /&gt;        Length = 154&lt;br /&gt;&lt;br /&gt;Score = 50.7 bits (112), Expect = 5e-09&lt;br /&gt;Identities = 15/15 (100%), Positives = 15/15 (100%)&lt;br /&gt;&lt;br /&gt;Query: 1  CRQILGQLQPSLQTG 15&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;CRQILGQLQPSLQTG&lt;br /&gt;Sbjct: 57 CRQILGQLQPSLQTG 71&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;b) Output of code: &lt;code&gt; extract {from line containing "/15" to line containing sbjct}&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:Courier new;"&gt;&lt;br /&gt;Identities = 15/15 (100%), Positives = 15/15 (100%)&lt;br /&gt;&lt;br /&gt;Query: 1  CRQILGQLQPSLQTG 15&lt;br /&gt;        CRQILGQLQPSLQTG&lt;br /&gt;Sbjct: 57 CRQILGQLQPSLQTG 71&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;c) Output of code: &lt;code&gt; extract {line starting with sbjct}&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:Courier new;"&gt;&lt;br /&gt;Sbjct: 57 CRQILGQLQPSLQTG 71&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;d) Output of codes: &lt;code&gt; i) omit 
sbjct: ii) omit number iii) omit spaces&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:Courier new;"&gt;&lt;br /&gt;CRQILGQLQPSLQTG&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;i&gt; Code by: Benben &amp;amp; Asif M. Khan&lt;/i&gt;&lt;br /&gt;&lt;i&gt; Blog Post by: BenBen&lt;/i&gt;&lt;br /&gt;&lt;i&gt; Post Edited by: Asif M. Khan&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-7158080843141822102?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/7158080843141822102/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/lapis-extracting-all-sbjct-sequences-15.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/7158080843141822102'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/7158080843141822102'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/lapis-extracting-all-sbjct-sequences-15.html' title='5) LAPIS: Extracting all sbjct sequences of a particular length from BLAST results'/><author><name>^BeNBeN^</name><uri>http://www.blogger.com/profile/11629878814774576609</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-4825397635731120243</id><published>2009-03-04T23:04:00.000-08:00</published><updated>2009-05-31T10:23:38.533-07:00</updated><title type='text'>4) LAPIS: Extracting all hit sequences from BLAST results (for hits with only 
one line of sbjct sequence)</title><content type='html'>This LAPIS code converts BLAST results into FASTA format (and gets rid of double hits). This code only works if each hit has only one line of sbjct sequence.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Steps&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;1) extract {from line starting with &gt; to line starting with sbjct} (this is to get rid of double hits)&lt;br /&gt;2) extract {line either starting &gt; or starting sbjct} ("or" must be after "either")&lt;br /&gt;3) omit {number in line starting with sbjct}&lt;br /&gt;4) omit {spaces in line starting with sbjct}&lt;br /&gt;5) omit sbjct:&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Output examples corresponding to the above codes:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;a) Output of code: &lt;code&gt;extract {from line starting with &gt; to line starting with sbjct}&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:Courier new;"&gt;&lt;br /&gt;&gt;gi|190193559|dbj|BAG48486.1| gag protein [Human immunodeficiency virus 1]&lt;br /&gt;     Length = 154&lt;br /&gt;&lt;br /&gt;Score = 50.7 bits (112), Expect = 5e-09&lt;br /&gt;Identities = 15/15 (100%), Positives = 15/15 (100%)&lt;br /&gt;&lt;br /&gt;Query: 1  CRQILGQLQPSLQTG 15&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;CRQILGQLQPSLQTG&lt;br /&gt;Sbjct: 57 CRQILGQLQPSLQTG 71&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;Everything from a line starting with "&gt;" down to the first following line starting with "sbjct" is extracted.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;b) Output of code: &lt;code&gt;extract {line either starting &gt; or starting sbjct}&lt;/code&gt;&lt;br /&gt;&lt;span style="font-family:Courier new;"&gt;&lt;br /&gt;&gt;gi|190193559|dbj|BAG48486.1| gag protein [Human immunodeficiency virus 1]&lt;br /&gt;Sbjct: 57 CRQILGQLQPSLQTG 71&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;Only the lines starting with "&gt;" or "sbjct" are kept.&lt;br /&gt;&lt;br /&gt;&lt;br 
/&gt;c) Output of code: &lt;code&gt;omit {number in line starting with sbjct}&lt;/code&gt;&lt;br /&gt;&lt;span style="font-family:Courier new;"&gt;&lt;br /&gt;&gt;gi|190193559|dbj|BAG48486.1| gag protein [Human immunodeficiency virus 1]&lt;br /&gt;Sbjct: CRQILGQLQPSLQTG&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;Numbers in lines starting with "sbjct" are removed.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;d) Output of code: &lt;code&gt;omit {spaces in line starting with sbjct}&lt;/code&gt;&lt;br /&gt;&lt;span style="font-family:Courier new;"&gt;&lt;br /&gt;&gt;gi|190193559|dbj|BAG48486.1| gag protein [Human immunodeficiency virus 1]&lt;br /&gt;Sbjct:CRQILGQLQPSLQTG&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;Spaces in lines starting with "sbjct" are removed.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;e) Output of code: &lt;code&gt;omit sbjct:&lt;/code&gt;&lt;br /&gt;&lt;span style="font-family:Courier new;"&gt;&lt;br /&gt;&gt;gi|190193559|dbj|BAG48486.1| gag protein [Human immunodeficiency virus 1]&lt;br /&gt;CRQILGQLQPSLQTG&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;The word "sbjct:" is removed, leaving the sequences in FASTA format.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;i&gt; Code by: Benben &amp;amp; Asif M. 
Khan&lt;/i&gt;&lt;br /&gt;&lt;i&gt; Blog Post by: BenBen&lt;/i&gt;&lt;br /&gt;&lt;i&gt; Post Edited by: BenBen&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-4825397635731120243?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/4825397635731120243/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/lapis-converting-blast-results-into.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/4825397635731120243'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/4825397635731120243'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/lapis-converting-blast-results-into.html' title='4) LAPIS: Extracting all hit sequences from BLAST results (for hits with only one line of sbjct sequence)'/><author><name>^BeNBeN^</name><uri>http://www.blogger.com/profile/11629878814774576609</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-1433396265361305004</id><published>2009-03-04T22:58:00.000-08:00</published><updated>2009-03-25T00:01:41.437-07:00</updated><title type='text'>3) UNIX Shell Script for HIV Standalone Blast</title><content type='html'>This is a simple script for using standalone blast against the HIV database to generate blast results for each HIV T-cell epitope.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span 
style="font-weight:bold;"&gt;HIVstandaloneblast.sh&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Process Code&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;for i in xx*&lt;br /&gt;do&lt;br /&gt;blastall -p blastp -d HIV_db1.fasta -i $i -F F -e 200000 -I T -v 0 -b 20000 -f 11 -W 2 -T F -C F -M PAM30 -o $i.out&lt;br /&gt;done&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;i&gt; Code by: BenBen &amp; Asif M. Khan&lt;/i&gt;&lt;br /&gt;&lt;i&gt; Blog Post by: BenBen&lt;/i&gt;&lt;br /&gt;&lt;i&gt; Post Edited by: BenBen&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-1433396265361305004?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/1433396265361305004/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/unix-shell-script-for-hiv-standalone.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/1433396265361305004'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/1433396265361305004'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/unix-shell-script-for-hiv-standalone.html' title='3) UNIX Shell Script for HIV Standalone Blast'/><author><name>^BeNBeN^</name><uri>http://www.blogger.com/profile/11629878814774576609</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-9093866756935775192</id><published>2009-03-04T22:52:00.000-08:00</published><updated>2009-03-25T00:01:19.122-07:00</updated><title type='text'>2) UNIX Shell Script for WNV Netblast</title><content type='html'>This is a simple shell script that uses netblast to search all organisms except WNV and artificial sequences, generating BLAST results for each WNV T-cell epitope.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;WNVNetblast.sh&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Process Code&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;for i in xx*&lt;br /&gt;do&lt;br /&gt;./blastcl3 -p blastp -d nr -i $i -F F -e 200000 -I T -v 0 -b 20000 -f 11 -W 2 -T F -u "Root[ORGN] NOT txid11082[Organism:exp] NOT txid81077[ORGN]" -C F -M PAM30 -o $i.out&lt;br /&gt;done&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;i&gt; Code by: BenBen &amp; Asif M. 
Khan&lt;/i&gt;&lt;br /&gt;&lt;i&gt; Blog Post by: BenBen&lt;/i&gt;&lt;br /&gt;&lt;i&gt; Post Edited by: BenBen&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-9093866756935775192?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/9093866756935775192/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/unix-shell-script-for-wnv-netblast-root.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/9093866756935775192'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/9093866756935775192'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/unix-shell-script-for-wnv-netblast-root.html' title='2) UNIX Shell Script for WNV Netblast'/><author><name>^BeNBeN^</name><uri>http://www.blogger.com/profile/11629878814774576609</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-5260163103542122426</id><published>2009-03-04T22:09:00.000-08:00</published><updated>2009-03-25T00:00:57.015-07:00</updated><title type='text'>1) UNIX Shell Script for Mass Alignments using Muscle</title><content type='html'>This is a simple shell script for running MUSCLE automatically on one or more files to be aligned.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Muscle.sh&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Process Code&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;for i in 
xx*&lt;br /&gt;do&lt;br /&gt;muscle -in $i -out $i"_align".fasta&lt;br /&gt;done&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Usage Notes:&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;1) All the FASTA files you want to align must have names starting with "xx" (e.g. xx1, xx2, etc.), because the loop only processes files matching xx*.&lt;br /&gt;&lt;br /&gt;2) Each output file is named after its input file with "_align.fasta" appended (e.g. xx1_align.fasta).&lt;br /&gt;&lt;br /&gt;&lt;i&gt; Code by: BenBen&lt;/i&gt;&lt;br /&gt;&lt;i&gt; Blog Post by: BenBen&lt;/i&gt;&lt;br /&gt;&lt;i&gt; Post Edited by: BenBen&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-5260163103542122426?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/5260163103542122426/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/unix-shell-script-for-mass-alignments.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/5260163103542122426'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/5260163103542122426'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/unix-shell-script-for-mass-alignments.html' title='1) UNIX Shell Script for Mass Alignments using Muscle'/><author><name>^BeNBeN^</name><uri>http://www.blogger.com/profile/11629878814774576609</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5583061135147499656.post-820021472049692020</id><published>2009-03-04T22:08:00.000-08:00</published><updated>2009-12-07T20:18:23.258-08:00</updated><title type='text'>0) Understanding Unix Environment</title><content type='html'>&lt;span style="font-style: italic; font-weight: bold;"&gt;{This is still a work in progress...}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Unix File System&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The Unix File System is a hierarchical file system.&lt;br /&gt;&lt;br /&gt;&lt;code&gt; / &lt;/code&gt; is the root of the system&lt;br /&gt;&lt;code&gt; /var &lt;/code&gt; is a directory in root&lt;br /&gt;&lt;code&gt; /var/www &lt;/code&gt; is the full path of the sub-directory "www" in "/var"&lt;br /&gt;&lt;code&gt; /var/www/htdocs &lt;/code&gt; is the full path of the sub-sub-directory "htdocs" in "/var/www", and so on...&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Some examples&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;1. To find out which directory you are currently in: &lt;code&gt; pwd &lt;/code&gt;&lt;br /&gt;&lt;br /&gt;# It prints the path of the present working directory.&lt;br /&gt;&lt;br /&gt;2. Navigate: &lt;code&gt; cd &lt;/code&gt;&lt;br /&gt;# To move into a certain directory, for example:&lt;br /&gt;&lt;code&gt;cd /var/www/htdocs/db&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;3. To list the contents of a directory: &lt;code&gt; ls &lt;/code&gt;&lt;br /&gt;# To get a long directory listing: &lt;code&gt; ls -l &lt;/code&gt;&lt;br /&gt;&lt;br /&gt;4. Getting the source code of a web page: &lt;code&gt;lynx -source &amp;lt;httpwebaddress&amp;gt;&lt;/code&gt;&lt;br /&gt;# Example&lt;br /&gt;lynx -source http://aps.unmc.edu/AP/database/query_output.php?ID=00286&lt;br /&gt;# lynx is a text-based browser. -source tells lynx NOT to render the page, but to output its raw HTML source code.&lt;br /&gt;&lt;br /&gt;5. 
Unix command history&lt;br /&gt;# Just press the Up arrow key to see your previous commands&lt;br /&gt;# Whatever you do, double-check a command before you run it&lt;br /&gt;&lt;br /&gt;6. Save a web page to your folder&lt;br /&gt;# Example: to save the source of a web page into the directory you are currently in, key in&lt;br /&gt;lynx -source internet_address.html &gt; your_pagename.html&lt;br /&gt;&lt;br /&gt;7. Change mode&lt;br /&gt;# &lt;code&gt;chmod&lt;/code&gt; changes the permissions of a file and takes several arguments&lt;br /&gt;# +x makes a file executable (-x removes execute permission)&lt;br /&gt;# &lt;code&gt;chmod +x programfile&lt;/code&gt; will make the program executable rather than just a plain text file.&lt;br /&gt;&lt;br /&gt;Questions&lt;br /&gt;&lt;br /&gt;1. Why type the full path /var/www/htdocs/db/myprogram instead of just myprogram?&lt;br /&gt;# This is because whenever you execute a program, the Unix shell needs to find the path of that program.&lt;br /&gt;# In this case, we spell out the full directory path so that the shell knows where that program is.&lt;br /&gt;&lt;br /&gt;2. Why, then, can we use the &lt;code&gt;lynx&lt;/code&gt; command without putting in its full path?&lt;br /&gt;# This is because the directory containing lynx is already in the shell's list of known paths.&lt;br /&gt;# Later on, we will show you how to see the current list of known paths, e.g. echo $PATH&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Command Line Arguments&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;1. Command line command without arguments&lt;br /&gt;# For example, &lt;code&gt;pwd&lt;/code&gt; doesn't need to take arguments&lt;br /&gt;&lt;br /&gt;2. Command line command with one argument&lt;br /&gt;# For example, &lt;code&gt;ls -l&lt;/code&gt; takes one argument, the flag -l (for "long"), to give a long listing.&lt;br /&gt;&lt;br /&gt;3. 
Command line command with two arguments&lt;br /&gt;# For example, &lt;code&gt;lynx -source http://www.google.com&lt;/code&gt;&lt;br /&gt;# lynx is the command&lt;br /&gt;# -source is the first argument&lt;br /&gt;# http://www.google.com is the second argument&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Loop Control&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;How do we make the numbers increase automatically, without typing in each number ourselves?&lt;br /&gt;&lt;br /&gt;1. Turn example 6 into a program that creates multiple files&lt;br /&gt;(1) Pseudocode for &lt;code&gt;my_program&lt;/code&gt;:&lt;br /&gt;for (i=0;i&amp;lt;100;i++) {lynx -source internet_address[i].html &gt; your_pagename[i].html}&lt;br /&gt;&lt;br /&gt;(2) Make the file executable: chmod +x my_program&lt;br /&gt;(3) Run it by keying in [path]/my_program&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Learning Unix Commands&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Environment Control&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;# marks a comment&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;1) cd d # Change to directory d&lt;br /&gt;&lt;/pre&gt;&lt;pre&gt;2) mkdir d #Create new directory d&lt;br /&gt;&lt;/pre&gt;&lt;pre&gt;3) rmdir d #Remove directory d (it must be empty)&lt;br /&gt;&lt;br /&gt;4) rm -r existingdirectories # Deletes the existing directory named 'existingdirectories' and all directories and files below it.&lt;br /&gt;&lt;br /&gt;5) rm -rf existingdirectories # Deletes the existing directory named 'existingdirectories' and all directories and files below it, without asking for confirmation, even in cases where rm would normally prompt you.&lt;br /&gt;The r stands for recursive and the f for force.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Be careful with the "rm -r" command. Don't run it as root, and stay away from the root directory if you plan on running it.&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;pre&gt;6) mv f1 [f2...] 
 d #Move file(s) f1 [f2...] to directory d&lt;br /&gt;&lt;/pre&gt;&lt;pre&gt;7) mv d1 d2 #Rename directory d1 as d2&lt;br /&gt;&lt;/pre&gt;&lt;pre&gt;8) passwd #Change password&lt;br /&gt;&lt;/pre&gt;&lt;pre&gt;9) alias name1 name2 #Create command alias (csh/tcsh)&lt;br /&gt;&lt;/pre&gt;&lt;pre&gt;10) alias name1="name2" #Create command alias (ksh/bash)&lt;br /&gt;&lt;/pre&gt;&lt;pre&gt;11) unalias name1 [name2...] #Remove command alias(es)&lt;br /&gt;&lt;/pre&gt;&lt;pre&gt;12) ssh nd #Log in securely to remote node nd&lt;br /&gt;&lt;/pre&gt;&lt;pre&gt;13) exit #End terminal session&lt;br /&gt;&lt;/pre&gt;&lt;pre&gt;14) setenv name v #Set environment variable to value v (csh/tcsh)&lt;br /&gt;&lt;/pre&gt;&lt;pre&gt;15) export name="v" #Set environment variable to value v (ksh/bash)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;b&gt;Miscellaneous commands&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;1)&lt;pre&gt; ls &lt;/pre&gt;# lists the contents of a directory&lt;br /&gt;&lt;br /&gt;2)&lt;pre&gt;cp &lt;/pre&gt;# copies files&lt;br /&gt;&lt;br /&gt;3)&lt;pre&gt;wc &lt;/pre&gt;# word count (counts lines, words and characters)&lt;br /&gt;&lt;br /&gt;4)&lt;pre&gt;ls -al | grep root | awk '{print "file: " $9, $8}' &lt;/pre&gt;# keeps only the lines of the long listing that contain "root" (e.g. files owned by root) and prints the 9th column (the name) and the 8th column (the time or year) of each&lt;br /&gt;&lt;br /&gt;5)&lt;pre&gt;wc -l &lt;/pre&gt; # counts the number of lines&lt;br /&gt;&lt;br /&gt;6)&lt;pre&gt; &gt; &lt;/pre&gt; # redirects the output into a file (creating it, or overwriting it if it already exists)&lt;br /&gt;&lt;br /&gt;7)&lt;pre&gt; ls -al | grep root | awk '{print "file: " $9, $8}' &gt; x.file &lt;/pre&gt; # writes the output to a file, in this case called x.file&lt;br /&gt;&lt;br /&gt;8)&lt;pre&gt; pico x.file &lt;/pre&gt;# opens x.file in pico, a simple text editor; to exit pico, press Ctrl+X&lt;br /&gt;&lt;br /&gt;9)&lt;pre&gt; vi x.file &lt;/pre&gt; # opens x.file in vi, another text editor&lt;br /&gt;&lt;br /&gt;10)&lt;pre&gt; :q &lt;/pre&gt; # quits vi, whose commands are rather obscure&lt;br /&gt;&lt;br /&gt;11)&lt;pre&gt; :q! 
 &lt;/pre&gt; # quits vi without saving any changes&lt;br /&gt;&lt;br /&gt;12)&lt;pre&gt; pwd &lt;/pre&gt; # prints the current working directory&lt;br /&gt;&lt;br /&gt;13)&lt;pre&gt; cd &lt;/pre&gt; # changes the directory&lt;br /&gt;&lt;br /&gt;14)&lt;pre&gt; Tab key &lt;/pre&gt; # auto-completes command and file names as you type&lt;br /&gt;&lt;br /&gt;15)&lt;pre&gt; Up and Down arrow keys &lt;/pre&gt; # scroll through the commands typed earlier&lt;br /&gt;&lt;br /&gt;16)&lt;pre&gt; echo "ls -al /root/Desktop/ | awk '{print \$5 \"b\", \$9}'" &gt; tinwee &lt;/pre&gt; # writes this one-line command into a file called tinwee (the file is not executable yet)&lt;br /&gt;&lt;br /&gt;17)&lt;pre&gt; ./tinwee &lt;/pre&gt;# tries to run tinwee, but fails with "Permission denied" because the file is not executable&lt;br /&gt;&lt;br /&gt;18)&lt;pre&gt; chmod 755 tinwee &lt;/pre&gt; # gives you (the owner) read, write and execute permission and everyone else read and execute permission, so tinwee can now be run&lt;br /&gt;&lt;br /&gt;19)&lt;pre&gt; pico tinwee &lt;/pre&gt; # edit the code to make the program more generic&lt;br /&gt;&lt;br /&gt;20)&lt;pre&gt; ls -al $1 | awk '{print $5 "bytes", $9}' &lt;/pre&gt; # replace the hardwired directory with $1, the first command line argument, to make the code generic&lt;br /&gt;&lt;br /&gt;21)&lt;pre&gt; ./tinwee / &lt;/pre&gt; # takes / (the root directory) as $1; note that this shell $1 is different from awk's $1, which is the first column&lt;br /&gt;&lt;br /&gt;22)&lt;pre&gt; ln -s tinwee t &lt;/pre&gt; # creates a short name t for tinwee as a symbolic (soft) link; make sure the name is not already in use&lt;br /&gt;&lt;br /&gt;23)&lt;pre&gt; echo $PATH | sed 's/:/\n/g' &lt;/pre&gt; # shows each directory in your search path on its own line&lt;br /&gt;&lt;br /&gt;24)&lt;pre&gt; echo $PATH | sed 's/:/\n/g' |more &lt;/pre&gt;# shows the output one screen at a time&lt;br /&gt;&lt;br /&gt;25)&lt;pre&gt; echo $PATH | sed 's/:/\n/g' |less &lt;/pre&gt; # like more, but also lets you scroll back up&lt;br /&gt;&lt;br /&gt;26)&lt;pre&gt; echo $PATH | sed 's/:/\n/g' |cat &lt;/pre&gt;# shows everything at once&lt;br /&gt;&lt;br /&gt;27)&lt;pre&gt; echo $PATH | sed 's/:/\n/g' |tac &lt;/pre&gt;# shows the lines in reverse order&lt;br /&gt;&lt;br 
/&gt;28)&lt;pre&gt; echo $PATH | sed 's/:/\n/g' | rev | less &lt;/pre&gt;# rev reverses the characters within each line (it does not reverse the order of the lines), and less then shows the result page by page&lt;br /&gt;&lt;br /&gt;29)&lt;pre&gt; q &lt;/pre&gt; # quits more or less while you are looking at the output&lt;br /&gt;&lt;br /&gt;30)&lt;pre&gt; Ctrl+C &lt;/pre&gt; # interrupts (stops) a running command&lt;br /&gt;&lt;br /&gt;31) Create a simple reverse-complement tool&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;echo "ATGCTTA" | rev | tr "ATCG" "TAGC" &lt;/pre&gt;&lt;br /&gt;# rev reverses the sequence and tr replaces each base with its complement (A with T, C with G), printing TAAGCAT; tr is case-sensitive, so the letters must match the case of the sequence&lt;br /&gt;&lt;br /&gt;32)&lt;pre&gt;alias &lt;/pre&gt;# unlike ln, does not link a file; the shell simply substitutes the alias for the command name&lt;br /&gt;&lt;br /&gt;33)&lt;pre&gt; &gt;&gt; &lt;/pre&gt;# appends the output to the end of a file (creating the file if it does not exist)&lt;br /&gt;&lt;br /&gt;34)&lt;pre&gt; &gt; &lt;/pre&gt;# writes the output to a new file (overwriting the file if it already exists)&lt;br /&gt;&lt;br /&gt;35) Create a database of your own&lt;br /&gt;&lt;pre&gt;lynx http://aps.unmc.edu/AP/database/query_output.php?ID=00286 &gt; AP00286.html &lt;/pre&gt;# saves the web page to AP00286.html, but not its HTML source code&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;lynx -source http://aps.unmc.edu/AP/database/query_output.php?ID=00286 &gt; AP00286.html&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;# saves the web page to AP00286.html, and this time it saves the source code&lt;br /&gt;echo "lynx -source http://aps.unmc.edu/AP/database/query_output.php?ID=00286 &gt; AP00286.html" &gt;getapd&lt;br /&gt;# writes this command into a file called getapd&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;pico getapd &lt;/pre&gt;# open the getapd file in pico&lt;br /&gt;change &lt;pre&gt;lynx -source http://aps.unmc.edu/AP/database/query_output.php?ID=00286 &gt; AP00286.html&lt;/pre&gt; to&lt;br /&gt;&lt;pre&gt;lynx -source http://aps.unmc.edu/AP/database/query_output.php?ID=0$1 &gt; AP0$1.html &lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;# this makes the code generic: $1 is replaced by the first argument you give to getapd&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;./getapd 0287 &lt;/pre&gt;# after making getapd executable with chmod +x getapd, type this at the command prompt to download record 00287 without typing the whole 
command&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;pico getapd &lt;/pre&gt;# open the getapd file again to make it loop over all the records&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;for i in `seq -w 1137`&lt;br /&gt;do&lt;br /&gt;lynx -source http://aps.unmc.edu/AP/database/query_output.php?ID=0$i &gt; AP0$i.html&lt;br /&gt;sleep 3 # pause 3 seconds between downloads&lt;br /&gt;done&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;# seq -w 1137 counts from 0001 to 1137, padding with zeros so that every number has the same width&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;./getapd &lt;/pre&gt;# run getapd to download all the records automatically&lt;br /&gt;&lt;br /&gt;36) &lt;pre&gt;" " &lt;/pre&gt;# double quotes are used when you want variables to be evaluated. For example, echo "$PATH" expands the dollar sign and prints the value of the PATH variable&lt;br /&gt;37) &lt;pre&gt;' ' &lt;/pre&gt;# single quotes are used when you want to print special characters without evaluating them. For example, echo '$PATH' prints $PATH literally&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5583061135147499656-820021472049692020?l=bioinfocodelets.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bioinfocodelets.blogspot.com/feeds/820021472049692020/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/0-understanding-unix-environment.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/820021472049692020'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5583061135147499656/posts/default/820021472049692020'/><link rel='alternate' type='text/html' href='http://bioinfocodelets.blogspot.com/2009/03/0-understanding-unix-environment.html' title='0) Understanding Unix Environment'/><author><name>Asif M. 
Khan</name><uri>http://www.blogger.com/profile/08184259065143974222</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
