linux - Bash: Read in file, edit line, output to new file

August 15, 2014

I am new to Linux and new to scripting. I am working in a Linux environment using bash. I need to do the following things:
1. Read a txt file line by line
2. Delete the first line
3. Remove the middle part of each line after the first
4. Copy the changes to a new txt file

Each line after the first has 3 sections: the first ends in .pdf, the third begins with r0, and the middle section has no consistency.

Example of 2 lines in the file:

r01234567_high transcript_01234567.pdf  high school transcript  r01234567
r01891023_application_01891023127.pdf   application r01891023

Here is what I have so far. I'm reading the file, printing it to the screen, and copying it to a file.

#! /bin/bash
cd /usr/local/bin;
#echo "list of files:";
#ls;
for index in *.txt;
do
echo "file: ${index}";
echo "reading..."
exec<${index}
value=0
while read line
do
   #value='expr ${value} +1';
   echo ${line};
done
echo "read done for ${index}";
cp ${index} /usr/local/bin/test2;
echo "file ${index} moved to test2";
done

So my question is: how can I delete the middle bit of each line, after .pdf and before r0...?

Updated answer, assuming tab delimiter

Since there is a tab delimiter, this is a cinch for awk. Borrowing from my deleted answer and @geek1011's deleted answer:

awk -F"\t" '{print $1, $NF}' infile.txt

Here awk splits each record in the file by tab, then prints the first field, $1, and the last field, $NF. NF is a built-in awk variable holding the record's number of fields; prepending a dollar sign to it says "the value of the last field in the record".
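
To cover the whole task (skip the first line and write the result to a new file), the tab-delimited one-liner could be extended roughly as follows. This is only a sketch: the temporary directory, the file names infile.txt and outfile.txt, and the header text are made up for illustration, and it assumes the three sections really are separated by single tabs.

```shell
#!/bin/bash
# Work in a throwaway directory (hypothetical location and file names).
workdir=$(mktemp -d)

# Build a small tab-delimited sample: a made-up header plus the two example lines.
printf 'file\tdescription\tid\n' > "$workdir/infile.txt"
printf 'r01234567_high transcript_01234567.pdf\thigh school transcript\tr01234567\n' >> "$workdir/infile.txt"
printf 'r01891023_application_01891023127.pdf\tapplication\tr01891023\n' >> "$workdir/infile.txt"

# NR>1 skips the header; -F sets tab as the input separator and
# OFS keeps tab as the separator between the two printed fields.
awk -F'\t' -v OFS='\t' 'NR>1 {print $1, $NF}' "$workdir/infile.txt" > "$workdir/outfile.txt"

cat "$workdir/outfile.txt"
```

Because the split happens on tabs only, the space inside "r01234567_high transcript_01234567.pdf" is left alone.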

Original answer, assuming space delimiter

Leaving this here in case someone has the space-delimited nonsense that I originally assumed.

You can use awk instead of using bash to read through the file:

awk 'NR>1{firstrec=""; for(i=1; $i!~/pdf/; ++i) firstrec=firstrec" "$i} NR>1{print firstrec,$i,$NF}' yourfile.txt

awk reads files line by line and processes each record it comes across. Fields are delimited automatically by white space. The first field is $1, the second is $2, and so on. awk has built-in variables; here we use NF, which is the number of fields contained in the record, and NR, which is the number of the record currently being processed.
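
As a quick illustration of NR, NF, $1 and $NF, here is a small sketch that prints all four for every record. The three sample records are made up for the example.

```shell
#!/bin/bash
# Three whitespace-separated sample records (made up for illustration).
sample=$(printf 'header line\na b c\nx y\n')

# For each record print its number (NR), its field count (NF),
# its first field ($1) and its last field ($NF).
summary=$(printf '%s\n' "$sample" | awk '{print NR, NF, $1, $NF}')
printf '%s\n' "$summary"
```

Each output line starts with the record number and field count, followed by the first and last field of that record.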

This script does the following:

If the record number is greater than 1 (not the header) then
Loop through each field (separated by white space here) until we find a field that has "pdf" in it ($i!~/pdf/). Store everything we find up until that field in a variable called firstrec, separated by a space (firstrec=firstrec" "$i).
Print out firstrec, then print out whatever field we stopped iterating on (the one that contains "pdf"), which is $i, and finally print out the last field in the record, which is $NF (print firstrec,$i,$NF).
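
The steps above can be sketched as a commented, multi-line awk program. This is a variant rather than the one-liner itself: it clears firstrec for every record and stops at the last field if no field contains "pdf", both of which are assumptions about the desired behavior. The header text in the sample data is hypothetical.

```shell
#!/bin/bash
# Space-delimited sample: a made-up header plus the two example lines.
input=$(printf '%s\n' \
  'name description id' \
  'r01234567_high transcript_01234567.pdf high school transcript r01234567' \
  'r01891023_application_01891023127.pdf application r01891023')

result=$(printf '%s\n' "$input" | awk '
NR > 1 {                          # skip the header record
    firstrec = ""                 # start fresh for every record
    # collect fields until one contains "pdf" (or only the last field is left)
    for (i = 1; i < NF && $i !~ /pdf/; ++i)
        firstrec = firstrec $i " "
    print firstrec $i, $NF        # rebuilt file name, then the last field
}')
printf '%s\n' "$result"
```

Without the i < NF guard, a line with no "pdf" field would keep the loop running past the last field, since fields beyond NF are empty and never match.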

You can direct the output to a file:

awk 'NR>1{firstrec=""; for(i=1; $i!~/pdf/; ++i) firstrec=firstrec" "$i} NR>1{print firstrec,$i,$NF}' yourfile.txt > outfile.txt

sed may be a cleaner way of going about this, since, if your pdf file name has more than one space separating its characters, you will lose the multiple spaces with the awk approach.
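
The answer does not give a sed command, so the following is only one possible sketch of that idea. It assumes the last section is a single r0... token at the end of the line and that the sections are separated by spaces or tabs; the header text in the sample data is hypothetical. Note the double space inside the first file name, which survives untouched.

```shell
#!/bin/bash
# Sample with a made-up header; the first file name contains two spaces in a row.
input=$(printf '%s\n' \
  'name description id' \
  'r01234567_high  transcript_01234567.pdf  high school transcript  r01234567' \
  'r01891023_application_01891023127.pdf   application r01891023')

# 1d drops the header; the substitution keeps ".pdf" and the trailing r0... token
# and replaces everything between them with a single space.
cleaned=$(printf '%s\n' "$input" |
  sed -e '1d' \
      -e 's/\(\.pdf\)[[:space:]].*[[:space:]]\(r0[^[:space:]]*\)[[:space:]]*$/\1 \2/')
printf '%s\n' "$cleaned"
```

To write the result to a new file instead of a variable, the same sed command can read from the input file and be redirected with > outfile.txt.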
