Re: uniq without sort <-------------- GURU NEEDED
On 25.01.2008 06:04:18 by Dann Corbit
On Jan 24, 6:45 pm, gnuist...@gmail.com wrote:
> This is a tough problem, and needs a guru.
>
> I know it is very easy to find unique or duplicate lines if you scramble
> all of them and sort them. It's trivial:
>
> echo -e "a\nc\nd\nb\nc\nd" | sort | uniq
>
> $ echo -e "a\nc\nd\nb\nc\nd"
> a
> c
> d
> b
> c
> d
>
> $ echo -e "a\nc\nd\nb\nc\nd"|sort|uniq
> a
> b
> c
> d
>
> So it is TRIVIAL with sort.
>
> I want uniq without sorting, i.e. preserving the initial order.
>
> The algorithm is this. For every line, look above if there is another
> line like it. If so, then ignore it. If not, then output it. I am
> sure I could spend some time writing this in C, but what is the
> solution using the shell? This way I can get an output that preserves the
> order of first occurrence. It is needed in many problems.
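The look-above algorithm from the question can be written directly in the shell. This is a minimal sketch, assuming a shell environment with `mktemp` and `grep`; the function name `uniq_naive` is hypothetical. Each line is checked against a temporary file of the lines already printed, so it takes O(n^2) time and is only practical for small inputs.

```shell
# Hypothetical helper: print each stdin line the first time it appears,
# keeping the original input order.
uniq_naive() {
    # Temporary file holding every line printed so far ("the lines above").
    seen_file=$(mktemp) || return 1
    # IFS= and -r keep leading blanks and backslashes intact; the
    # [ -n "$line" ] test also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # -F: fixed string, -x: whole-line match, -q: quiet.
        # "--" protects lines that start with "-".
        if ! grep -Fxq -- "$line" "$seen_file"; then
            printf '%s\n' "$line"
            printf '%s\n' "$line" >> "$seen_file"
        fi
    done
    rm -f "$seen_file"
}

# Usage: prints a, c, d, b on separate lines.
printf 'a\nc\nd\nb\nc\nd\n' | uniq_naive
```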
You have no C question here that I can discern.
Read the file once, building a hash table. Each entry in the hash
table has 2 fields:
A. The hash code
B. The string
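If the string is already in the table, ignore it. Otherwise, insert it
and print it right away.
Printing at insertion time keeps the order of first occurrence;
iterating over the hash table afterwards would dump the strings in
bucket order, not input order, unless the entries are also linked
together in insertion order.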
No sorting is required.
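A minimal sketch of the hash-table approach from the shell, assuming a POSIX `awk` is available: awk's associative arrays serve as the hash table (awk computes the hash internally, so only the string is stored explicitly). The function name `uniq_awk` is hypothetical. It prints each line when it is first inserted instead of looping over the array at the end, because the order of `for (key in array)` in awk is unspecified and would lose the input order.

```shell
# Hypothetical helper: arguments are passed to awk as input files;
# with no arguments it reads stdin.
uniq_awk() {
    # seen[$0]++ yields the previous count (0 the first time), so the
    # pattern !seen[$0]++ is true only for a line's first occurrence,
    # and awk's default action prints that line immediately.
    awk '!seen[$0]++' "$@"
}

# Usage: prints a, c, d, b on separate lines.
printf 'a\nc\nd\nb\nc\nd\n' | uniq_awk
```

This runs in roughly linear time, at the cost of keeping every distinct line in memory.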