Sunday, April 19, 2009

17) Basic shell command to delimit protein/DNA sequence

The 'sed' shell command can delimit protein/DNA sequences so that each base/residue is separated by a comma or another specified symbol:

sed 's/./&,/g' input.txt > output.txt

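Note: '.' matches any single character on a line, and '&' in the replacement stands for the matched text, so each character is written back followed by a comma. To use a different delimiter, change the comma after '&'. Be aware that the last character of each line also receives a trailing delimiter, and that every line is processed, including header lines such as FASTA lines starting with '>'. This is a basic command and can be improved in many ways to perform more sophisticated tasks.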
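As one possible improvement, the sketch below skips header lines and removes the trailing delimiter. It assumes the input is FASTA-style (header lines start with '>') and that the delimiter is a plain comma. The function name 'delimit_seq' and the sample sequence are made up for illustration and are not part of the original command.

```shell
# Hypothetical helper: reads sequences on standard input and writes
# the delimited version to standard output.
delimit_seq() {
    # First expression: on every line that does NOT start with '>',
    # append a comma after each character.
    # Second expression: on those same lines, drop the trailing comma.
    sed -e '/^>/!s/./&,/g' -e '/^>/!s/,$//'
}

# Small made-up example: the header is kept as-is, the sequence is delimited.
printf '>seq1\nACGT\n' | delimit_seq
# prints:
# >seq1
# A,C,G,T
```

To run it on files as in the original command: delimit_seq < input.txt > output.txt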
