Re: uniq without sort <-------------- GURU NEEDED

on 25.01.2008 17:41:28 by John

On Jan 24, 6:45 pm, gnuist...@gmail.com wrote:
> This is a tough problem, and needs a guru.
>
> I know it is very easy to find uniq or non-uniq lines if you scramble
> all of them and sort them. It's trivial:
>
> echo -e "a\nc\nd\nb\nc\nd" | sort | uniq
>
> $ echo -e "a\nc\nd\nb\nc\nd"
> a
> c
> d
> b
> c
> d
>
> $ echo -e "a\nc\nd\nb\nc\nd"|sort|uniq
> a
> b
> c
> d
>
> So it is TRIVIAL with sort.
>
> I want uniq without sorting the initial order.
>
> The algorithm is this. For every line, look above if there is another
> line like it. If so, then ignore it. If not, then output it. I am
> sure, I can spend some time to write this in C. But what is the
> solution using shell ? This way I can get an output that preserves the
> order of first occurrence. It is needed in many problems.
>
> Thanks to the star who can help
> gnuist

Just use AWK. Here's how I would do it:

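#!/bin/awk -f

# Print each input line only the first time it appears, keeping input order.
BEGIN {
    fi = 0                        # number of distinct lines stored so far
}

{
    found = 0
    for (i = 0; i < fi; i++) {    # linear scan of the lines already printed
        if (flist[i] == $0) {
            found = 1
            break                 # no need to keep looking after a match
        }
    }
    if (found == 0) {             # first occurrence: remember it and print it
        flist[fi] = $0 ""         # store as a string so "1" and "1.0" stay distinct
        print $0
        fi++
    }
}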

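It's fast enough for modest inputs, and AWK's line-oriented design makes
it a good fit for the task. Note that every line is compared against all
the distinct lines seen so far, so it slows down as the number of distinct
lines grows. Hopefully this little program meets your requirements!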

John

Re: uniq without sort <-------------- GURU NEEDED

on 25.01.2008 18:10:01 by Ed Morton

On 1/25/2008 10:41 AM, John wrote:
> On Jan 24, 6:45 pm, gnuist...@gmail.com wrote:
>
>>This is a tough problem, and needs a guru.
>>
>>I know it is very easy to find uniq or non-uniq lines if you scramble
>>all of them and sort them. It's trivial:
>>
>>echo -e "a\nc\nd\nb\nc\nd" | sort | uniq
>>
>>$ echo -e "a\nc\nd\nb\nc\nd"
>>a
>>c
>>d
>>b
>>c
>>d
>>
>>$ echo -e "a\nc\nd\nb\nc\nd"|sort|uniq
>>a
>>b
>>c
>>d
>>
>>So it is TRIVIAL with sort.
>>
>>I want uniq without sorting the initial order.
>>
>>The algorithm is this. For every line, look above if there is another
>>line like it. If so, then ignore it. If not, then output it. I am
>>sure, I can spend some time to write this in C. But what is the
>>solution using shell ? This way I can get an output that preserves the
>>order of first occurrence. It is needed in many problems.
>>
>>Thanks to the star who can help
>>gnuist
>
>
> Just use AWK. Here's how I would do it:
>
> #!/bin/awk
>
> BEGIN {
> fi = 0
> }
>
> { found = 0
> for (i = 0; i < fi; i++) {
> if (flist[i] == $0) {
> found = 1
> }
> }
> if (found == 0) {
> flist[fi] = $0
> print $0
> fi++
> }
> }
>
> It's actually pretty fast and AWK's line-oriented design makes it
> perfect for the task. Hopefully this little program meets your
> requirements!

"little" :-). Try this instead:

!a[$0]++
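
For anyone reading along, here is a rough sketch of how that one-liner could be run from the shell and why it works. The function name uniq_keep_order is only an illustrative wrapper and is not something from this thread; the awk program inside it is the one-liner above, unchanged.

```shell
# Hypothetical wrapper name, used here only for illustration.
#
# How the awk program works:
#   a[$0]     is an associative array keyed by the whole line; an unseen
#             key starts out empty, which counts as 0.
#   a[$0]++   yields the count *before* incrementing, so it is 0 the first
#             time a line appears and non-zero on every repeat.
#   !a[$0]++  is therefore true only on the first occurrence, and a pattern
#             with no action prints the current line.
uniq_keep_order() {
    awk '!a[$0]++' "$@"
}

# Same sample input as in the original question.
printf 'a\nc\nd\nb\nc\nd\n' | uniq_keep_order
# prints:
# a
# c
# d
# b
```

Because the lookup is a single array access rather than a scan over earlier lines, the work per line does not grow with the number of distinct lines, at the cost of keeping every distinct line in memory.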

Regards,

Ed.