combining files

on 07.09.2007 14:58:51 by ginger.m.griffin

Hello

I have several text files in which the ending records of one file
overlap with the beginning records of another. I'd like to combine
two of the files so that the records are continuous, which means that
the overlap has to be removed from one of them.
What's a good way to do this?

For instance, if I have the two text files
file1.txt
09/06/07 01:23:49 PM,1189113829,0,0,000170
09/06/07 01:25:29 PM,1189113929,100,1.66,000138
09/06/07 01:25:44 PM,1189113944,115,1.91,000135
09/06/07 01:26:04 PM,1189113964,135,2.25,000148
09/06/07 01:27:19 PM,1189114039,210,3.50,000116

file2.txt
09/06/07 01:25:44 PM,1189113944,115,1.91,000135
09/06/07 01:26:04 PM,1189113964,135,2.25,000148
09/06/07 01:27:19 PM,1189114039,210,3.50,000116
09/06/07 01:27:42 PM,1189114062,233,3.88,000114
09/06/07 01:27:52 PM,1189114072,243,4.05,000119
09/06/07 01:29:26 PM,1189114166,337,5.61,000105

They overlap in the last three lines of file1.txt and the first three
lines of file2.txt. I'd like to bandage these two together to get:

09/06/07 01:23:49 PM,1189113829,0,0,000170
09/06/07 01:25:29 PM,1189113929,100,1.66,000138
09/06/07 01:25:44 PM,1189113944,115,1.91,000135
09/06/07 01:26:04 PM,1189113964,135,2.25,000148
09/06/07 01:27:19 PM,1189114039,210,3.50,000116
09/06/07 01:27:42 PM,1189114062,233,3.88,000114
09/06/07 01:27:52 PM,1189114072,243,4.05,000119
09/06/07 01:29:26 PM,1189114166,337,5.61,000105

thanks!

Re: combining files

on 07.09.2007 15:02:00 by Miles

On Sep 7, 7:58 am, gin_g wrote:
> Hello
>
> I have several text files which in which many of the files ending
> records overlap with the beginning records of another file. I'd like
> to combine two of the files so that the records are continuous, and
> this means that the overlap in one of the files needs to be removed.
> What's a good way to do this?
>
> For instance, if I have the two text files
> file1.txt
> 09/06/07 01:23:49 PM,1189113829,0,0,000170
> 09/06/07 01:25:29 PM,1189113929,100,1.66,000138
> 09/06/07 01:25:44 PM,1189113944,115,1.91,000135
> 09/06/07 01:26:04 PM,1189113964,135,2.25,000148
> 09/06/07 01:27:19 PM,1189114039,210,3.50,000116
>
> file2.txt
> 09/06/07 01:25:44 PM,1189113944,115,1.91,000135
> 09/06/07 01:26:04 PM,1189113964,135,2.25,000148
> 09/06/07 01:27:19 PM,1189114039,210,3.50,000116
> 09/06/07 01:27:42 PM,1189114062,233,3.88,000114
> 09/06/07 01:27:52 PM,1189114072,243,4.05,000119
> 09/06/07 01:29:26 PM,1189114166,337,5.61,000105
>
> They overlap in the last three lines of file1.txt and the first three
> lines of file2.txt . I'd like to bandage these two together to get:
>
> 09/06/07 01:23:49 PM,1189113829,0,0,000170
> 09/06/07 01:25:29 PM,1189113929,100,1.66,000138
> 09/06/07 01:25:44 PM,1189113944,115,1.91,000135
> 09/06/07 01:26:04 PM,1189113964,135,2.25,000148
> 09/06/07 01:27:19 PM,1189114039,210,3.50,000116
> 09/06/07 01:27:42 PM,1189114062,233,3.88,000114
> 09/06/07 01:27:52 PM,1189114072,243,4.05,000119
> 09/06/07 01:29:26 PM,1189114166,337,5.61,000105
>
> thanks!

cat file1 file2 | sort -u
09/06/07 01:23:49 PM,1189113829,0,0,000170
09/06/07 01:25:29 PM,1189113929,100,1.66,000138
09/06/07 01:25:44 PM,1189113944,115,1.91,000135
09/06/07 01:26:04 PM,1189113964,135,2.25,000148
09/06/07 01:27:19 PM,1189114039,210,3.50,000116
09/06/07 01:27:42 PM,1189114062,233,3.88,000114
09/06/07 01:27:52 PM,1189114072,243,4.05,000119
09/06/07 01:29:26 PM,1189114166,337,5.61,000105

Re: combining files

on 07.09.2007 15:06:09 by Jeroen van Nieuwenhuizen

On Fri, 07 Sep 2007 05:58:51 -0700
somebody claiming to be gin_g wrote:
>
> Hello
>
> I have several text files which in which many of the files ending
> records overlap with the beginning records of another file. I'd like
> to combine two of the files so that the records are continuous, and
> this means that the overlap in one of the files needs to be removed.
> What's a good way to do this?
>
> For instance, if I have the two text files
> file1.txt
> 09/06/07 01:23:49 PM,1189113829,0,0,000170
> 09/06/07 01:25:29 PM,1189113929,100,1.66,000138
> 09/06/07 01:25:44 PM,1189113944,115,1.91,000135
> 09/06/07 01:26:04 PM,1189113964,135,2.25,000148
> 09/06/07 01:27:19 PM,1189114039,210,3.50,000116
>
> file2.txt
> 09/06/07 01:25:44 PM,1189113944,115,1.91,000135
> 09/06/07 01:26:04 PM,1189113964,135,2.25,000148
> 09/06/07 01:27:19 PM,1189114039,210,3.50,000116
> 09/06/07 01:27:42 PM,1189114062,233,3.88,000114
> 09/06/07 01:27:52 PM,1189114072,243,4.05,000119
> 09/06/07 01:29:26 PM,1189114166,337,5.61,000105
>
> They overlap in the last three lines of file1.txt and the first three
> lines of file2.txt . I'd like to bandage these two together to get:
>
> 09/06/07 01:23:49 PM,1189113829,0,0,000170
> 09/06/07 01:25:29 PM,1189113929,100,1.66,000138
> 09/06/07 01:25:44 PM,1189113944,115,1.91,000135
> 09/06/07 01:26:04 PM,1189113964,135,2.25,000148
> 09/06/07 01:27:19 PM,1189114039,210,3.50,000116
> 09/06/07 01:27:42 PM,1189114062,233,3.88,000114
> 09/06/07 01:27:52 PM,1189114072,243,4.05,000119
> 09/06/07 01:29:26 PM,1189114166,337,5.61,000105

sort file1.txt file2.txt | uniq

Kind regards,

Jeroen.




--
ir. Jeroen van Nieuwenhuizen
Email: jnieuwen [at] jeroen [dot] se
I know I'm not perfect but I can smile

Re: combining files

on 07.09.2007 15:53:02 by Glenn Jackman

At 2007-09-07 08:58AM, "gin_g" wrote:
> I have several text files which in which many of the files ending
> records overlap with the beginning records of another file. I'd like
> to combine two of the files so that the records are continuous, and
> this means that the overlap in one of the files needs to be removed.
> What's a good way to do this?

At 2007-09-07 09:02AM, "Miles" wrote:
> cat file1 file2 | sort -u

At 2007-09-07 09:06AM, "Jeroen van Nieuwenhuizen" wrote:
> sort file1.txt file2.txt | uniq

How about:
sort -u file1 file2
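
One thing to watch with any of the sort-based variants: they order the lines as plain text, which matches chronological order in the sample but would not across an AM/PM boundary. A minimal sketch that sorts on the second field instead, assuming that field is a Unix timestamp and is unique per record (the sample suggests so, but that is an assumption); merge_by_epoch is a hypothetical helper name:

```sh
# merge_by_epoch FILE... : merge the files, ordering numerically by the
# second comma-separated field and keeping one line per distinct value
# of that field.
# Assumption: field 2 is an epoch timestamp that is unique per record.
merge_by_epoch() {
    sort -t, -k2,2n -u "$@"
}
```

Usage would be: merge_by_epoch file1.txt file2.txt > combined.txt. Note that with a key given, -u treats two lines as duplicates when their keys compare equal, even if the rest of the line differs.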

--
Glenn Jackman
"You can only be young once. But you can always be immature." -- Dave Barry

Re: combining files

on 07.09.2007 16:57:35 by Jeroen van Nieuwenhuizen

On 7 Sep 2007 13:53:02 GMT
somebody claiming to be Glenn Jackman wrote:
> At 2007-09-07 08:58AM, "gin_g" wrote:
>> I have several text files which in which many of the files ending
>> records overlap with the beginning records of another file. I'd like
>> to combine two of the files so that the records are continuous, and
>> this means that the overlap in one of the files needs to be removed.
>> What's a good way to do this?
>
> At 2007-09-07 09:02AM, "Miles" wrote:
>> cat file1 file2 | sort -u
>
> At 2007-09-07 09:06AM, "Jeroen van Nieuwenhuizen" wrote:
>> sort file1.txt file2.txt | uniq
>
> How about:
> sort -u file1 file2

This does not work on Solaris 9, for example. That's why I always use
the | uniq construct.

Kind regards,

Jeroen.

--
ir. Jeroen van Nieuwenhuizen
Email: jnieuwen [at] jeroen [dot] se
I know I'm not perfect but I can smile

Re: combining files

on 07.09.2007 17:34:35 by Stephane CHAZELAS

2007-09-07, 14:57(+00), Jeroen van Nieuwenhuizen:
[...]
>> How about:
>> sort -u file1 file2
>
> Does not work on solaris 9 for example. thats why I always use
> the | uniq construct.
[...]

It should work on Solaris; it's just that, as with many other things,
you need to make sure you're in a POSIX environment. The
Unix/POSIX-conformant sort is in /usr/xpg4/bin on Solaris.
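
A minimal sketch of one way to get such an environment from inside a script, assuming that getconf PATH on the target system reports the directories holding the standard-conforming utilities (worth verifying on the box in question); posix_sort_u is a hypothetical helper name:

```sh
# posix_sort_u FILE... : run sort -u with the standard utilities searched
# first. The subshell keeps the PATH change from leaking into the caller.
# Assumption: getconf PATH lists the directories of the conforming tools.
posix_sort_u() {
    (
        PATH=$(getconf PATH):$PATH
        export PATH
        sort -u "$@"
    )
}
```

Usage would be: posix_sort_u file1.txt file2.txt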

--
Stéphane

Re: combining files

on 07.09.2007 18:27:16 by Dummy

gin_g wrote:
>
> I have several text files which in which many of the files ending
> records overlap with the beginning records of another file. I'd like
> to combine two of the files so that the records are continuous, and
> this means that the overlap in one of the files needs to be removed.
> What's a good way to do this?
>
> For instance, if I have the two text files
> file1.txt
> 09/06/07 01:23:49 PM,1189113829,0,0,000170
> 09/06/07 01:25:29 PM,1189113929,100,1.66,000138
> 09/06/07 01:25:44 PM,1189113944,115,1.91,000135
> 09/06/07 01:26:04 PM,1189113964,135,2.25,000148
> 09/06/07 01:27:19 PM,1189114039,210,3.50,000116
>
> file2.txt
> 09/06/07 01:25:44 PM,1189113944,115,1.91,000135
> 09/06/07 01:26:04 PM,1189113964,135,2.25,000148
> 09/06/07 01:27:19 PM,1189114039,210,3.50,000116
> 09/06/07 01:27:42 PM,1189114062,233,3.88,000114
> 09/06/07 01:27:52 PM,1189114072,243,4.05,000119
> 09/06/07 01:29:26 PM,1189114166,337,5.61,000105
>
> They overlap in the last three lines of file1.txt and the first three
> lines of file2.txt . I'd like to bandage these two together to get:
>
> 09/06/07 01:23:49 PM,1189113829,0,0,000170
> 09/06/07 01:25:29 PM,1189113929,100,1.66,000138
> 09/06/07 01:25:44 PM,1189113944,115,1.91,000135
> 09/06/07 01:26:04 PM,1189113964,135,2.25,000148
> 09/06/07 01:27:19 PM,1189114039,210,3.50,000116
> 09/06/07 01:27:42 PM,1189114062,233,3.88,000114
> 09/06/07 01:27:52 PM,1189114072,243,4.05,000119
> 09/06/07 01:29:26 PM,1189114166,337,5.61,000105

perl -ne'$x{$_}++||print' file1.txt file2.txt
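
Spelled out, the one-liner keeps a hash of the lines already seen and prints a line only on its first appearance, so the original order is kept and no sorting is needed. A rough equivalent as a shell function wrapping awk, for illustration only; the name dedupe_keep_order is made up here:

```sh
# dedupe_keep_order FILE... : print each distinct line once, in order of
# first appearance across all the files given.
dedupe_keep_order() {
    awk '
        !($0 in seen) {    # this line has not been met before
            seen[$0] = 1   # remember it
            print          # and pass it through
        }
    ' "$@"
}
```

Usage would be: dedupe_keep_order file1.txt file2.txt > combined.txt. Every line is held in memory once, and any repeated line is dropped, whether or not it sits in the overlapping region.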


John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Re: combining files

on 07.09.2007 18:47:48 by William James

On Sep 7, 7:58 am, gin_g wrote:
> Hello
>
> I have several text files which in which many of the files ending
> records overlap with the beginning records of another file. I'd like
> to combine two of the files so that the records are continuous, and
> this means that the overlap in one of the files needs to be removed.
> What's a good way to do this?
>
> For instance, if I have the two text files
> file1.txt
> 09/06/07 01:23:49 PM,1189113829,0,0,000170
> 09/06/07 01:25:29 PM,1189113929,100,1.66,000138
> 09/06/07 01:25:44 PM,1189113944,115,1.91,000135
> 09/06/07 01:26:04 PM,1189113964,135,2.25,000148
> 09/06/07 01:27:19 PM,1189114039,210,3.50,000116
>
> file2.txt
> 09/06/07 01:25:44 PM,1189113944,115,1.91,000135
> 09/06/07 01:26:04 PM,1189113964,135,2.25,000148
> 09/06/07 01:27:19 PM,1189114039,210,3.50,000116
> 09/06/07 01:27:42 PM,1189114062,233,3.88,000114
> 09/06/07 01:27:52 PM,1189114072,243,4.05,000119
> 09/06/07 01:29:26 PM,1189114166,337,5.61,000105
>
> They overlap in the last three lines of file1.txt and the first three
> lines of file2.txt . I'd like to bandage these two together to get:
>
> 09/06/07 01:23:49 PM,1189113829,0,0,000170
> 09/06/07 01:25:29 PM,1189113929,100,1.66,000138
> 09/06/07 01:25:44 PM,1189113944,115,1.91,000135
> 09/06/07 01:26:04 PM,1189113964,135,2.25,000148
> 09/06/07 01:27:19 PM,1189114039,210,3.50,000116
> 09/06/07 01:27:42 PM,1189114062,233,3.88,000114
> 09/06/07 01:27:52 PM,1189114072,243,4.05,000119
> 09/06/07 01:29:26 PM,1189114166,337,5.61,000105
>
> thanks!

awk '!a[$0]++' file1.txt file2.txt

or

ruby -e 'puts ARGF.to_a.uniq' file1.txt file2.txt
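
Both of these drop every repeated line, wherever it occurs. If the files can contain legitimately identical records, a stricter approach is to remove only the block where the end of the first file equals the start of the second. A sketch under that reading of the problem, holding both files in memory; splice_overlap is a hypothetical name:

```sh
# splice_overlap FILE1 FILE2 : print FILE1 followed by FILE2, leaving out
# the longest leading block of FILE2 that repeats the tail of FILE1.
splice_overlap() {
    awk '
        NR == FNR { a[++n] = $0; next }   # lines of the first file
                  { b[++m] = $0 }         # lines of the second file
        END {
            max = (n < m) ? n : m
            # try the longest possible overlap first, then shorter ones
            for (k = max; k > 0; k--) {
                ok = 1
                for (i = 1; i <= k; i++)
                    if (a[n - k + i] != b[i]) { ok = 0; break }
                if (ok) break
            }
            # k is now the overlap length (0 if there is none)
            for (i = 1; i <= n; i++) print a[i]
            for (i = k + 1; i <= m; i++) print b[i]
        }
    ' "$1" "$2"
}
```

Usage would be: splice_overlap file1.txt file2.txt > combined.txt. The search is quadratic in the worst case, which should not matter for overlaps of a few lines.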

Re: combining files

on 07.09.2007 21:15:21 by Jeroen van Nieuwenhuizen

On Fri, 07 Sep 2007 15:34:35 GMT
somebody claiming to be Stephane CHAZELAS wrote:
> 2007-09-07, 14:57(+00), Jeroen van Nieuwenhuizen:
> [...]
>>> How about:
>>> sort -u file1 file2
>>
>> Does not work on solaris 9 for example. thats why I always use
>> the | uniq construct.
> [...]
>
> Should work on Solaris, it's just that for many other things,
> you need to make sure you're in a POSIX environment. The
> Unix/POSIX conformant sort is in /usr/xpg4/bin on Solaris.

You're absolutely right when you say that it can be done on a Solaris
installation, but not without making assumptions about the environment.
I should of course have stated that, instead of saying that Solaris 9
does not support it.

Kind regards,

Jeroen.

--
ir. Jeroen van Nieuwenhuizen
Email: jnieuwen [at] jeroen [dot] se
I know I'm not perfect but I can smile