python - How would I collapse rows based on a value in a column? -
i’ll describe mean in more detail here. suppose have data sheet looks this:
+-----------+---------+---------+---------+---------+---------+---------+--------------+ | | person1 | person2 | person4 | person4 | person5 | person6 | city | +-----------+---------+---------+---------+---------+---------+---------+--------------+ | january | - | - | yes | - | yes | - | sanfrancisco | | febuary | yes | - | - | - | - | - | sanfrancisco | | march | - | - | - | - | - | - | sanfrancisco | | april | - | - | - | - | - | - | newyork | | may | yes | - | - | - | - | - | newyork | | june | - | - | - | - | - | - | newyork | | july | - | - | - | - | yes | - | newyork | | august | - | - | - | - | - | - | newyork | | september | - | - | - | - | - | - | miami | | november | - | - | - | - | - | yes | miami | | december | - | - | - | - | - | - | miami | +-----------+---------+---------+---------+---------+---------+---------+--------------+ ignoring ascii stackoverflow formatting, it’s simple spreadsheet tracks 6 people based on city they’ve been in months.
what want know is, people have visited cities. condensing list this:
+---------+---------+---------+---------+---------+---------+--------------+ | person1 | person2 | person4 | person4 | person5 | person6 | city | +---------+---------+---------+---------+---------+---------+--------------+ | yes | - | yes | - | yes | - | sanfrancisco | | yes | - | - | - | yes | - | newyork | | - | - | - | - | - | yes | miami | +---------+---------+---------+---------+---------+---------+--------------+ each row 1 city, , contains people have visited it. there optimum way this, or rather, there sort of tr(squeeze)/sed tool this? if had code this, optimum logic be?
$ cat tst.awk function prt() { if ( prev != "" ) { (i=2;i<=nf;i++) { printf "%s%s", vals[i], (i<nf ? ofs : ors) } } delete vals } begin { fs=ofs="," } $nf != prev { prt() } { (i=1;i<=nf;i++) { vals[i] = (vals[i] ~ /[[:alpha:]]/ ? vals[i] : $i) } prev = $nf } end { prt() } $ awk -f tst.awk file person1,person2,person4,person4,person5,person6,city yes,-,yes,-,yes,-,sanfrancisco yes,-,-,-,yes,-,newyork -,-,-,-,-,yes,miami the above assumes input format csv this:
$ cat file month,person1,person2,person4,person4,person5,person6,city january,-,-,yes,-,yes,-,sanfrancisco febuary,yes,-,-,-,-,-,sanfrancisco march,-,-,-,-,-,-,sanfrancisco april,-,-,-,-,-,-,newyork may,yes,-,-,-,-,-,newyork june,-,-,-,-,-,-,newyork july,-,-,-,-,yes,-,newyork august,-,-,-,-,-,-,newyork september,-,-,-,-,-,-,miami november,-,-,-,-,-,yes,miami december,-,-,-,-,-,-,miami and want csv output.
Comments
Post a Comment