Tuesday, October 20, 2009

23) Simple Perl script to generate a header line for each sequence in a fasta file

A simple script that generates a header line (starting with >) for each sequence in a fasta file whose sequences have no headers. I have included plenty of comments in the script for people who are interested in learning Perl basics.

Output: A file named xxx_processed.fasta will be produced, where xxx is the input file name without its extension

Note: There are 2 conditions for the script to work

1. Each sequence must sit on exactly one line, because the script writes a header before every line of the input (including wrapped or blank lines)
2. The input file must be saved with a .fasta extension (of course the script can be modified to take any other extension)

Code (without comments):

#!/usr/bin/perl

print "\n NOTE: Output will be filename_processed.fasta";
print "\nPlease enter filename (without .fasta extension): ";

$input = <>;
chomp ($input);

$number=1;

open (INFILE, "$input.fasta") or die "Cannot open infile!";
open (OUT, ">"."$input"."_processed.fasta") or die "Cannot open outfile!";

while ($line=<INFILE>)
{

print OUT ">$number\n";
print OUT $line;

$number++;

}


Code (with comments):

#!/usr/bin/perl

# Text to print at command line (\n means print a new line)
print "\n NOTE: Output will be filename_processed.fasta";
print "\nPlease enter filename (without .fasta extension): ";

# Read the file name typed at the prompt into the variable $input, then remove the trailing newline with chomp
$input = <>;
chomp ($input);

# Setting the value of variable, $number, to 1
$number=1;

# Opening the input file, generate error message "Cannot open infile!" if there's an error
# INFILE is the reference (filehandle) for your input file
open (INFILE, "$input.fasta") or die "Cannot open infile!";

# Creating the output file, the symbol > opens it for writing (the file is created if missing and overwritten if it already exists)
open (OUT, ">"."$input"."_processed.fasta") or die "Cannot open outfile!";

# Read the input file (INFILE) one line at a time, passing each line to the variable $line
while ($line=<INFILE>)
{

# Print the fasta header to the output file (OUT): the symbol >, followed by the variable $number and a new line
# You can customize the header text here if you want to
print OUT ">$number\n";

# Print out current line from input file
print OUT $line;

# Increase the value of $number by 1 (same as $number=$number+1)
$number++;
}
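
Condition 2 above mentions that the script can be modified to take other extensions. Here is a minimal sketch of one way to do that, assuming the file name is passed as a command-line argument instead of typed at a prompt. The subroutine names processed_name and add_headers are made up for this illustration and are not part of the original script. It also uses "use strict", lexical filehandles and the three-argument form of open, which are common in current Perl code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: insert "_processed" before the extension,
# whatever that extension is (reads.fa -> reads_processed.fa).
# A name without an extension just gets "_processed" appended.
sub processed_name {
    my ($path) = @_;
    if ($path =~ /^(.+)(\.[^.\/\\]+)$/) {
        return $1 . "_processed" . $2;
    }
    return $path . "_processed";
}

# Hypothetical helper: same logic as the loop in the script above.
# Writes a numbered header before every input line and returns the
# number of headers written.
sub add_headers {
    my ($in_path, $out_path) = @_;

    open(my $in,  '<', $in_path)  or die "Cannot open $in_path: $!";
    open(my $out, '>', $out_path) or die "Cannot open $out_path: $!";

    my $number = 1;
    while (my $line = <$in>) {
        print $out ">$number\n";
        print $out $line;
        $number++;
    }

    close($in);
    close($out) or die "Cannot close $out_path: $!";
    return $number - 1;
}

# Only run when a file name is given on the command line
if (@ARGV) {
    my $in_path  = $ARGV[0];
    my $out_path = processed_name($in_path);
    my $count    = add_headers($in_path, $out_path);
    print "Wrote $count sequences to $out_path\n";
}
```

If this is saved as, say, add_headers.pl (the name is arbitrary), running "perl add_headers.pl reads.fa" should produce reads_processed.fa. Condition 1 still applies: each sequence must be on a single line.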
