linux - Bash: Read in file, edit line, output to new file
I am new to Linux and new to scripting. I am working in a Linux environment using bash. I need to do the following things:
1. Read a txt file line by line
2. Delete the first line
3. Remove the middle part of each line after the first
4. Copy the changes to a new txt file
Each line after the first has 3 sections: the first ends in .pdf, the third begins with r0, and the middle section has no consistency.
Example of 2 lines in the file:
r01234567_high transcript_01234567.pdf high school transcript r01234567
r01891023_application_01891023127.pdf application r01891023
Here is what I have so far. I'm just reading the file, printing it to the screen, and copying it to a file.
#!/bin/bash
cd /usr/local/bin;
#echo "list of files:";
#ls;
for index in *.txt;
do
    echo "file: ${index}";
    echo "reading..."
    exec<${index}
    value=0
    while read line
    do
        #value='expr ${value} +1';
        echo ${line};
    done
    echo "read done ${index}";
    cp ${index} /usr/local/bin/test2;
    echo "file ${index} moved to test2";
done
So my question is, how can I delete the middle bit of each line, after the .pdf and before the r0...?
Updated answer assuming tab delimiter
Since there is a tab delimiter, this is a cinch with awk. Borrowing from my deleted answer and @geek1011's deleted answer:
awk -F"\t" '{print $1, $NF}' infile.txt
Here awk splits each record in the file by tab, then prints the first field, $1, and the last field, $NF. NF is a built-in awk variable holding the record's number of fields; prepending a dollar sign says "the value of the last field in the record".
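As a rough sketch of how the command above could cover the whole task (dropping the first line and writing a new file), the example below adds NR>1 and an output redirect. The file names, the temporary directory, and the header text are assumptions made up for illustration; the two data lines come from the question, with tabs assumed between the three sections.

```shell
# Hypothetical sample data: a made-up header plus the question's two lines,
# with tabs assumed between the three sections.
workdir=$(mktemp -d)
printf 'file\tdescription\tid\n' > "$workdir/infile.txt"
printf 'r01234567_high transcript_01234567.pdf\thigh school transcript\tr01234567\n' >> "$workdir/infile.txt"
printf 'r01891023_application_01891023127.pdf\tapplication\tr01891023\n' >> "$workdir/infile.txt"

# NR>1 skips the first line; OFS='\t' keeps the output tab-delimited
# (without it, print joins the two fields with a single space).
awk -F'\t' -v OFS='\t' 'NR>1 {print $1, $NF}' "$workdir/infile.txt" > "$workdir/outfile.txt"
cat "$workdir/outfile.txt"
```

Because the field separator is a tab, the space inside "r01234567_high transcript_01234567.pdf" stays part of $1 and survives untouched.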
Original answer assuming space delimiter
Leaving this here in case someone has the space-delimited nonsense that I originally assumed.
You can use awk instead of using bash to read through the file:
awk 'NR>1{firstrec=""; for(i=1; $i!~/pdf/; ++i) firstrec=firstrec" "$i} NR>1{print firstrec,$i,$NF}' yourfile.txt
awk reads files line by line and processes each record it comes across. Fields are delimited automatically by white space. The first field is $1, the second is $2, and so on. awk has built-in variables; here we use NF, which is the number of fields contained in the record, and NR, which is the number of the record currently being processed.
This script does the following:
- If the record number is greater than 1 (not the header), then
- Loop through each field (separated by white space here) until we find a field that has "pdf" in it ($i!~/pdf/). Store everything we find up until that field in a variable called firstrec, separated by a space (firstrec=firstrec" "$i).
- Print out firstrec, then print out whatever field we stopped iterating on (the one that contains "pdf"), $i, and finally print out the last field in the record, $NF (print firstrec,$i,$NF).
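The steps above can be sketched as a multi-line awk program, which may be easier to follow than the one-liner. This is only an illustration under assumptions: the header text, file names, and temporary directory are invented, and the sep variable and the i <= NF guard are additions of this sketch (they avoid a leading space and stop the loop on a line with no "pdf" field).

```shell
# Hypothetical sample file: a made-up header plus the question's two
# example lines, space-delimited.
workdir=$(mktemp -d)
cat > "$workdir/yourfile.txt" <<'EOF'
file description id
r01234567_high transcript_01234567.pdf high school transcript r01234567
r01891023_application_01891023127.pdf application r01891023
EOF

awk 'NR > 1 {
    firstrec = ""; sep = ""          # start fresh for every record
    # Collect the fields that come before the one containing "pdf".
    for (i = 1; i <= NF && $i !~ /pdf/; ++i) {
        firstrec = firstrec sep $i
        sep = " "
    }
    # Leading fields (if any), then the pdf field, then the last field.
    print firstrec sep $i, $NF
}' "$workdir/yourfile.txt" > "$workdir/outfile.txt"
cat "$workdir/outfile.txt"
```

With the sample above, the first data line comes out as "r01234567_high transcript_01234567.pdf r01234567" and the second as "r01891023_application_01891023127.pdf r01891023".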
You can direct this to a file:
awk 'NR>1{firstrec=""; for(i=1; $i!~/pdf/; ++i) firstrec=firstrec" "$i} NR>1{print firstrec,$i,$NF}' yourfile.txt > outfile.txt
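sed may be a cleaner way of going about this since, if your pdf file name has more than one space separating its characters, awk rebuilds the name from individual fields joined by a single space and you will lose the multiple spaces.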
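One possible sed version is sketched below; it is a guess at what such a command could look like, not something taken from the original answer. It assumes every data line ends with a whitespace-free token starting with r0, that ".pdf" is followed by whitespace, and that the header text, file names, and the double space in the first file name are made up for illustration.

```shell
# Hypothetical sample file; note the two spaces inside the first pdf name.
workdir=$(mktemp -d)
cat > "$workdir/yourfile.txt" <<'EOF'
file description id
r01234567_high  transcript_01234567.pdf high school transcript r01234567
r01891023_application_01891023127.pdf application r01891023
EOF

# 1d deletes the first line. The substitution keeps ".pdf" and the final
# r0... token, and replaces everything between them with a single space.
sed -E '1d; s/(\.pdf)[[:space:]].*[[:space:]](r0[^[:space:]]*)$/\1 \2/' \
    "$workdir/yourfile.txt" > "$workdir/outfile.txt"
cat "$workdir/outfile.txt"
```

Since sed edits the line as text rather than splitting it into fields, the double space inside the first file name is preserved. A line with nothing between the .pdf name and the r0 token does not match the pattern and is left unchanged.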