Re: uniq without sort <-------------- GURU NEEDED

on 29.01.2008 18:06:35 by Abigail

gnuist006@gmail.com (gnuist006@gmail.com) wrote on VCCLX September
MCMXCIII:
++ This is a tough problem, and needs a guru.
++
++ I know it is very easy to find uniq or non-uniq lines if you scramble
++ all of them and sort them. It's trivial:
++
++ echo -e "a\nc\nd\nb\nc\nd" | sort | uniq
++
++ $ echo -e "a\nc\nd\nb\nc\nd"
++ a
++ c
++ d
++ b
++ c
++ d
++
++ $ echo -e "a\nc\nd\nb\nc\nd"|sort|uniq
++ a
++ b
++ c
++ d
++
++
++ So it is TRIVIAL with sort.
++
++ I want uniq without sorting the initial order.
++
++ The algorithm is this: for every line, check whether an identical
++ line appears above it. If so, ignore it. If not, output it. I am
++ sure I can spend some time to write this in C, but what is the
++ solution using the shell? This way I can get an output that preserves
++ the order of first occurrence. It is needed in many problems.

This solution uses sort, but doesn't sort the output.

nl -b a -s ':' your_file | sort -t ':' -k 2 -u | sort -n | cut -d ':' -f 2-

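To make the stages of that pipeline easier to follow, here is a minimal sketch that wraps it in a function and annotates each step. The function name uniq_keep_order is made up for illustration. The sketch assumes the field separator is passed to sort with -t ':' (so that the text after the line number is the key), that nl numbers blank lines too (-b a), and that sort -u keeps the first line of each group of duplicates. GNU sort documents that behaviour; POSIX does not say which duplicate survives, so other implementations may differ.

```shell
# Hypothetical helper: reads lines on stdin, writes each distinct line once,
# in order of first occurrence. Assumes GNU-style "sort -u" (keeps the first
# line of each run of equal keys).
uniq_keep_order() {
    nl -b a -s ':' |        # 1. prefix every line, blank ones too, with "<number>:"
    sort -t ':' -k 2 -u |   # 2. one line per distinct text (key = field 2 to end of line)
    sort -n |               # 3. back to input order, using the leading line number
    cut -d ':' -f 2-        # 4. drop the number; any ':' inside the text is kept
}

printf 'a\nc\nd\nb\nc\nd\n' | uniq_keep_order
```

With GNU sort this should print a, c, d, b, one per line. Because the key runs from field 2 to the end of the line and cut uses -f 2-, lines that themselves contain ':' should pass through unchanged.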

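For comparison, the algorithm described in the question (output a line only if no identical line appeared above it) can also be sketched directly, without sort, by remembering the lines already seen. This uses awk, which is not part of the solution above, and the function name uniq_first_seen is made up for illustration.

```shell
# Hypothetical helper: prints a line only the first time it appears.
# seen[] is an awk associative array keyed by the whole line ($0).
# seen[$0]++ evaluates to 0 on first sight, so !seen[$0]++ is true
# exactly once per distinct line, and awk's default action prints it.
uniq_first_seen() {
    awk '!seen[$0]++'
}

printf 'a\nc\nd\nb\nc\nd\n' | uniq_first_seen
```

This keeps every distinct line in memory, whereas the sort-based pipeline can fall back on sort's temporary files for very large inputs.
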
Abigail