Confused query question
am 18.08.2005 20:44:04 von kjanuski
------_=_NextPart_001_01C5A424.CEEA59D0
Content-Type: text/plain;
charset="iso-8859-1"
I think I may have gotten over my head in terms of my database knowledge in
trying to gather some data from apache access_logs by using myqsl.
The problem is that the log includes a username to go with an IP when the
user logs in to the web application. Later the log includes errors for
certain IP addresses. I'd like to run a query that shows the errors and the
user.
To do this I've set up one table, errors, with this structure:
IP Date Error ID
with the ID being unique and created through autoincrement.
The other table, users, is problematic and has had varying structures. This
is the current one:
IP Date User ID
with the ID being unique and created through autoincrement.
Since the user may logon many times there could be multiple entries for a
certain user at a certain IP. There also could be a different user at that
IP. So my thought was to include the Date in the table but to truncate the
minute and second. So the date in the second table is something like
"2005-08-18:07".
I then thought I could run a query that searched for errors.IP, errors.Date,
errors.Error, users.User where errors.IP = user.IP and errors.Date like
users.Date. If I skip trying to match dates then I get all of the users who
have used that IP at any time. So I'd thought I'd limit it by saying only
those users who have the same IP but who logged it at a time that was LIKE
the time of the error. This would not be perfect since it would miss
somebody who logged in at "2005-08-18:11:01:00" but who got an error at
"2005-08-18:12:00:00". But it should get anyone who had an error between 11
and 12 a.m. on 2005-08-18. The problem is that I don't know how to put the
value of user.Date in a like statement, e.g.
"where errors.Date like '%users.Date%'.
I realize that for anybody who is familiar with databases that what I've
done probably breaks a lot of smart database rules and best practices. This
is the first time I've found a need to make this sort of query. But I'd
appreciate any thoughts on it. I'm hoping for some way to use the like
syntax. But it may be that I just need to redesign the tables. I'd prefer
not to do this since this is all being done to solve a temporary problem,
find out who is getting errors at a certain web page based on time, ip and
query text. So it's a bit of a throwaway database and query and I don't want
to spend too much time on it.
Thanks for any thoughts,
Ken
I'm still using Mysql 4.0.1 by the way
------_=_NextPart_001_01C5A424.CEEA59D0--
RE: Confused query question
am 18.08.2005 21:02:09 von paulo.urcid
Ken, if you're using apache you might want to try Webalizer. I'm not sure if
it covers all your needs but basically this tool gives you reports from
analyzing web server logs http://www.mrunix.net/webalizer/
Just a thought...
-----Mensaje original-----
De: Januski, Ken [mailto:kjanuski@phillynews.com]
Enviado el: Jueves, 18 de Agosto de 2005 01:44 p.m.
Para: win32@lists.mysql.com
Asunto: Confused query question
I think I may have gotten over my head in terms of my database knowledge in
trying to gather some data from apache access_logs by using myqsl.
The problem is that the log includes a username to go with an IP when the
user logs in to the web application. Later the log includes errors for
certain IP addresses. I'd like to run a query that shows the errors and the
user.
To do this I've set up one table, errors, with this structure:
IP Date Error ID
with the ID being unique and created through autoincrement.
The other table, users, is problematic and has had varying structures. This
is the current one:
IP Date User ID
with the ID being unique and created through autoincrement.
Since the user may logon many times there could be multiple entries for a
certain user at a certain IP. There also could be a different user at that
IP. So my thought was to include the Date in the table but to truncate the
minute and second. So the date in the second table is something like
"2005-08-18:07".
I then thought I could run a query that searched for errors.IP, errors.Date,
errors.Error, users.User where errors.IP = user.IP and errors.Date like
users.Date. If I skip trying to match dates then I get all of the users who
have used that IP at any time. So I'd thought I'd limit it by saying only
those users who have the same IP but who logged it at a time that was LIKE
the time of the error. This would not be perfect since it would miss
somebody who logged in at "2005-08-18:11:01:00" but who got an error at
"2005-08-18:12:00:00". But it should get anyone who had an error between 11
and 12 a.m. on 2005-08-18. The problem is that I don't know how to put the
value of user.Date in a like statement, e.g.
"where errors.Date like '%users.Date%'.
I realize that for anybody who is familiar with databases that what I've
done probably breaks a lot of smart database rules and best practices. This
is the first time I've found a need to make this sort of query. But I'd
appreciate any thoughts on it. I'm hoping for some way to use the like
syntax. But it may be that I just need to redesign the tables. I'd prefer
not to do this since this is all being done to solve a temporary problem,
find out who is getting errors at a certain web page based on time, ip and
query text. So it's a bit of a throwaway database and query and I don't want
to spend too much time on it.
Thanks for any thoughts,
Ken
I'm still using Mysql 4.0.1 by the way
--
MySQL Windows Mailing List
For list archives: http://lists.mysql.com/win32
To unsubscribe: http://lists.mysql.com/win32?unsub=gcdmw-win32@m.gmane.org
RE: Confused query question
am 18.08.2005 21:09:57 von kjanuski
------_=_NextPart_001_01C5A428.6C5230A0
Content-Type: text/plain;
charset="iso-8859-1"
Thanks Urcid,
Once I get this problem solved I probably won't be reading the logs anymore
so I don't want to bother with any new software. And I think even if I did I
might need to use the complex parsing that I've already come up with. I just
need to figure out how to get something usable into mysql. Of course I would
learn something new about databases and mysql if I could figure out the
correct way to do it:-)
Ken
-----Original Message-----
From: Urcid Pliego, Paulo [mailto:paulo.urcid@vw.com.mx]
Sent: Thursday, August 18, 2005 3:02 PM
To: 'Januski, Ken'; win32@lists.mysql.com
Subject: RE: Confused query question
Ken, if you're using apache you might want to try Webalizer. I'm not sure if
it covers all your needs but basically this tool gives you reports from
analyzing web server logs http://www.mrunix.net/webalizer/
Just a thought...
-----Mensaje original-----
De: Januski, Ken [mailto:kjanuski@phillynews.com]
Enviado el: Jueves, 18 de Agosto de 2005 01:44 p.m.
Para: win32@lists.mysql.com
Asunto: Confused query question
I think I may have gotten over my head in terms of my database knowledge in
trying to gather some data from apache access_logs by using myqsl.
The problem is that the log includes a username to go with an IP when the
user logs in to the web application. Later the log includes errors for
certain IP addresses. I'd like to run a query that shows the errors and the
user.
To do this I've set up one table, errors, with this structure:
IP Date Error ID
with the ID being unique and created through autoincrement.
The other table, users, is problematic and has had varying structures. This
is the current one:
IP Date User ID
with the ID being unique and created through autoincrement.
Since the user may logon many times there could be multiple entries for a
certain user at a certain IP. There also could be a different user at that
IP. So my thought was to include the Date in the table but to truncate the
minute and second. So the date in the second table is something like
"2005-08-18:07".
I then thought I could run a query that searched for errors.IP, errors.Date,
errors.Error, users.User where errors.IP = user.IP and errors.Date like
users.Date. If I skip trying to match dates then I get all of the users who
have used that IP at any time. So I'd thought I'd limit it by saying only
those users who have the same IP but who logged it at a time that was LIKE
the time of the error. This would not be perfect since it would miss
somebody who logged in at "2005-08-18:11:01:00" but who got an error at
"2005-08-18:12:00:00". But it should get anyone who had an error between 11
and 12 a.m. on 2005-08-18. The problem is that I don't know how to put the
value of user.Date in a like statement, e.g.
"where errors.Date like '%users.Date%'.
I realize that for anybody who is familiar with databases that what I've
done probably breaks a lot of smart database rules and best practices. This
is the first time I've found a need to make this sort of query. But I'd
appreciate any thoughts on it. I'm hoping for some way to use the like
syntax. But it may be that I just need to redesign the tables. I'd prefer
not to do this since this is all being done to solve a temporary problem,
find out who is getting errors at a certain web page based on time, ip and
query text. So it's a bit of a throwaway database and query and I don't want
to spend too much time on it.
Thanks for any thoughts,
Ken
I'm still using Mysql 4.0.1 by the way
------_=_NextPart_001_01C5A428.6C5230A0--
Re: Confused query question
am 18.08.2005 21:15:21 von SGreen
--=_alternative 006A218B85257061_=
Content-Type: text/plain; charset="US-ASCII"
"Januski, Ken" wrote on 08/18/2005 02:44:04 PM:
> I think I may have gotten over my head in terms of my database knowledge
in
> trying to gather some data from apache access_logs by using myqsl.
>
> The problem is that the log includes a username to go with an IP when
the
> user logs in to the web application. Later the log includes errors for
> certain IP addresses. I'd like to run a query that shows the errors and
the
> user.
>
> To do this I've set up one table, errors, with this structure:
> IP Date Error ID
> with the ID being unique and created through autoincrement.
>
> The other table, users, is problematic and has had varying structures.
This
> is the current one:
> IP Date User ID
> with the ID being unique and created through autoincrement.
>
> Since the user may logon many times there could be multiple entries for
a
> certain user at a certain IP. There also could be a different user at
that
> IP. So my thought was to include the Date in the table but to truncate
the
> minute and second. So the date in the second table is something like
> "2005-08-18:07".
>
> I then thought I could run a query that searched for errors.IP,
errors.Date,
> errors.Error, users.User where errors.IP = user.IP and errors.Date like
> users.Date. If I skip trying to match dates then I get all of the users
who
> have used that IP at any time. So I'd thought I'd limit it by saying
only
> those users who have the same IP but who logged it at a time that was
LIKE
> the time of the error. This would not be perfect since it would miss
> somebody who logged in at "2005-08-18:11:01:00" but who got an error at
> "2005-08-18:12:00:00". But it should get anyone who had an error between
11
> and 12 a.m. on 2005-08-18. The problem is that I don't know how to put
the
> value of user.Date in a like statement, e.g.
> "where errors.Date like '%users.Date%'.
>
> I realize that for anybody who is familiar with databases that what I've
> done probably breaks a lot of smart database rules and best practices.
This
> is the first time I've found a need to make this sort of query. But I'd
> appreciate any thoughts on it. I'm hoping for some way to use the like
> syntax. But it may be that I just need to redesign the tables. I'd
prefer
> not to do this since this is all being done to solve a temporary
problem,
> find out who is getting errors at a certain web page based on time, ip
and
> query text. So it's a bit of a throwaway database and query and I don't
want
> to spend too much time on it.
>
> Thanks for any thoughts,
>
> Ken
>
> I'm still using Mysql 4.0.1 by the way
OK Ken,
You have a version of the "maximum of a group" problem. You need to
identify which user from a specific IP address has the last login
(MAX(DATE) value).
Now, to make this work properly, you need to STOP trimming your login
dates and allow it to be a full DATETIME value. Try this definition as a
Logins table:
CREATE TABLE logins (
IP varchar(15) not null
, logindate datetime not null
, user varchar(75) not null
, primary key (ip, logindate, user)
);
/* Please notice that logindate is NOT a varchar. The primary key (PK) on
(ip, logindate, user) is intentionally in that order. */
/* to find the most recent logins for all of your errors (this could take
a while): */
CREATE TEMPORARY TABLE tmpLastLogin (key (error_id))
SELECT e.id as error_id, max(l.logindate) as logindate
FROM errors e
INNER JOIN logins l
on e.IP = l.IP
and e.date > l.logindate;
/* Now we have enough information to build your desired report (which
error belongs to whom): */
SELECT e.id, e.ip, e.error, l.user
FROM errors e
INNER JOIN tmpLastLogin tll
on tll.error_id = e.id
INNER JOIN logins l
on l.ip = e.ip
and l.logindate = tll.logindate;
/* and because people login all the time, you need to drop your temp table
to get ready for the next time you need to run this query */
DROP TEMPORARY TABLE tmpLastLogin;
This can be very fast if you already know the error you want to research
as the way I wrote the query it will process ALL of your errors at the
same time.
Does what I wrote make sense? I would be very happy to go into greater
detail if you have any questions.
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
--=_alternative 006A218B85257061_=--
RE: Confused query question
am 18.08.2005 21:29:47 von kjanuski
------_=_NextPart_001_01C5A42B.32056310
Content-Type: text/plain;
charset="iso-8859-1"
Thanks Shawn,
This sounds very much like what I need, and shows just how limited my
database knowledge is (!), not to mention how good yours is. Thanks very
much for the speedy reply. I'll give it a try and let you know the results.
Ken
-----Original Message-----
From: SGreen@unimin.com [mailto:SGreen@unimin.com]
Sent: Thursday, August 18, 2005 3:15 PM
To: Januski, Ken
Cc: win32@lists.mysql.com
Subject: Re: Confused query question
"Januski, Ken" wrote on 08/18/2005 02:44:04 PM:
> I think I may have gotten over my head in terms of my database knowledge
in
> trying to gather some data from apache access_logs by using myqsl.
>
> The problem is that the log includes a username to go with an IP when the
> user logs in to the web application. Later the log includes errors for
> certain IP addresses. I'd like to run a query that shows the errors and
the
> user.
>
> To do this I've set up one table, errors, with this structure:
> IP Date Error ID
> with the ID being unique and created through autoincrement.
>
> The other table, users, is problematic and has had varying structures.
This
> is the current one:
> IP Date User ID
> with the ID being unique and created through autoincrement.
>
> Since the user may logon many times there could be multiple entries for a
> certain user at a certain IP. There also could be a different user at that
> IP. So my thought was to include the Date in the table but to truncate the
> minute and second. So the date in the second table is something like
> "2005-08-18:07".
>
> I then thought I could run a query that searched for errors.IP,
errors.Date,
> errors.Error, users.User where errors.IP = user.IP and errors.Date like
> users.Date. If I skip trying to match dates then I get all of the users
who
> have used that IP at any time. So I'd thought I'd limit it by saying only
> those users who have the same IP but who logged it at a time that was LIKE
> the time of the error. This would not be perfect since it would miss
> somebody who logged in at "2005-08-18:11:01:00" but who got an error at
> "2005-08-18:12:00:00". But it should get anyone who had an error between
11
> and 12 a.m. on 2005-08-18. The problem is that I don't know how to put the
> value of user.Date in a like statement, e.g.
> "where errors.Date like '%users.Date%'.
>
> I realize that for anybody who is familiar with databases that what I've
> done probably breaks a lot of smart database rules and best practices.
This
> is the first time I've found a need to make this sort of query. But I'd
> appreciate any thoughts on it. I'm hoping for some way to use the like
> syntax. But it may be that I just need to redesign the tables. I'd prefer
> not to do this since this is all being done to solve a temporary problem,
> find out who is getting errors at a certain web page based on time, ip and
> query text. So it's a bit of a throwaway database and query and I don't
want
> to spend too much time on it.
>
> Thanks for any thoughts,
>
> Ken
>
> I'm still using Mysql 4.0.1 by the way
OK Ken,
You have a version of the "maximum of a group" problem. You need to identify
which user from a specific IP address has the last login (MAX(DATE) value).
Now, to make this work properly, you need to STOP trimming your login dates
and allow it to be a full DATETIME value. Try this definition as a Logins
table:
CREATE TABLE logins (
IP varchar(15) not null
, logindate datetime not null
, user varchar(75) not null
, primary key (ip, logindate, user)
);
/* Please notice that logindate is NOT a varchar. The primary key (PK) on
(ip, logindate, user) is intentionally in that order. */
/* to find the most recent logins for all of your errors (this could take a
while): */
CREATE TEMPORARY TABLE tmpLastLogin (key (error_id))
SELECT e.id as error_id, max(l.logindate) as logindate
FROM errors e
INNER JOIN logins l
on e.IP = l.IP
and e.date > l.logindate;
/* Now we have enough information to build your desired report (which error
belongs to whom): */
SELECT e.id, e.ip, e.error, l.user
FROM errors e
INNER JOIN tmpLastLogin tll
on tll.error_id = e.id
INNER JOIN logins l
on l.ip = e.ip
and l.logindate = tll.logindate;
/* and because people login all the time, you need to drop your temp table
to get ready for the next time you need to run this query */
DROP TEMPORARY TABLE tmpLastLogin;
This can be very fast if you already know the error you want to research as
the way I wrote the query it will process ALL of your errors at the same
time.
Does what I wrote make sense? I would be very happy to go into greater
detail if you have any questions.
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
------_=_NextPart_001_01C5A42B.32056310--
RE: Confused query question
am 19.08.2005 00:08:30 von kjanuski
------_=_NextPart_001_01C5A441.5E1633B0
Content-Type: text/plain;
charset="iso-8859-1"
Shawn,
This worked, sort of. There were 117 entries in the errors table but only 50
rows returned at the end of the query that you suggested. But this has been
a great help in getting me started. I'll see if I can figure out the
discrepancy and if not I'll get back to you.
Thanks again,
Ken
-----Original Message-----
From: SGreen@unimin.com [mailto:SGreen@unimin.com]
Sent: Thursday, August 18, 2005 3:15 PM
To: Januski, Ken
Cc: win32@lists.mysql.com
Subject: Re: Confused query question
"Januski, Ken" wrote on 08/18/2005 02:44:04 PM:
> I think I may have gotten over my head in terms of my database knowledge
in
> trying to gather some data from apache access_logs by using myqsl.
>
> The problem is that the log includes a username to go with an IP when the
> user logs in to the web application. Later the log includes errors for
> certain IP addresses. I'd like to run a query that shows the errors and
the
> user.
>
> To do this I've set up one table, errors, with this structure:
> IP Date Error ID
> with the ID being unique and created through autoincrement.
>
> The other table, users, is problematic and has had varying structures.
This
> is the current one:
> IP Date User ID
> with the ID being unique and created through autoincrement.
>
> Since the user may logon many times there could be multiple entries for a
> certain user at a certain IP. There also could be a different user at that
> IP. So my thought was to include the Date in the table but to truncate the
> minute and second. So the date in the second table is something like
> "2005-08-18:07".
>
> I then thought I could run a query that searched for errors.IP,
errors.Date,
> errors.Error, users.User where errors.IP = user.IP and errors.Date like
> users.Date. If I skip trying to match dates then I get all of the users
who
> have used that IP at any time. So I'd thought I'd limit it by saying only
> those users who have the same IP but who logged it at a time that was LIKE
> the time of the error. This would not be perfect since it would miss
> somebody who logged in at "2005-08-18:11:01:00" but who got an error at
> "2005-08-18:12:00:00". But it should get anyone who had an error between
11
> and 12 a.m. on 2005-08-18. The problem is that I don't know how to put the
> value of user.Date in a like statement, e.g.
> "where errors.Date like '%users.Date%'.
>
> I realize that for anybody who is familiar with databases that what I've
> done probably breaks a lot of smart database rules and best practices.
This
> is the first time I've found a need to make this sort of query. But I'd
> appreciate any thoughts on it. I'm hoping for some way to use the like
> syntax. But it may be that I just need to redesign the tables. I'd prefer
> not to do this since this is all being done to solve a temporary problem,
> find out who is getting errors at a certain web page based on time, ip and
> query text. So it's a bit of a throwaway database and query and I don't
want
> to spend too much time on it.
>
> Thanks for any thoughts,
>
> Ken
>
> I'm still using Mysql 4.0.1 by the way
OK Ken,
You have a version of the "maximum of a group" problem. You need to identify
which user from a specific IP address has the last login (MAX(DATE) value).
Now, to make this work properly, you need to STOP trimming your login dates
and allow it to be a full DATETIME value. Try this definition as a Logins
table:
CREATE TABLE logins (
IP varchar(15) not null
, logindate datetime not null
, user varchar(75) not null
, primary key (ip, logindate, user)
);
/* Please notice that logindate is NOT a varchar. The primary key (PK) on
(ip, logindate, user) is intentionally in that order. */
/* to find the most recent logins for all of your errors (this could take a
while): */
CREATE TEMPORARY TABLE tmpLastLogin (key (error_id))
SELECT e.id as error_id, max(l.logindate) as logindate
FROM errors e
INNER JOIN logins l
on e.IP = l.IP
and e.date > l.logindate;
/* Now we have enough information to build your desired report (which error
belongs to whom): */
SELECT e.id, e.ip, e.error, l.user
FROM errors e
INNER JOIN tmpLastLogin tll
on tll.error_id = e.id
INNER JOIN logins l
on l.ip = e.ip
and l.logindate = tll.logindate;
/* and because people login all the time, you need to drop your temp table
to get ready for the next time you need to run this query */
DROP TEMPORARY TABLE tmpLastLogin;
This can be very fast if you already know the error you want to research as
the way I wrote the query it will process ALL of your errors at the same
time.
Does what I wrote make sense? I would be very happy to go into greater
detail if you have any questions.
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
------_=_NextPart_001_01C5A441.5E1633B0--
RE: Confused query question
am 19.08.2005 00:20:47 von kjanuski
------_=_NextPart_001_01C5A443.153A17E0
Content-Type: text/plain;
charset="iso-8859-1"
Aha! I wasn't quite sure why the sample code that you sent used error_id
when it was the ips I was concerned with. Once I changed error_id to
error_ip it worked just fine.
I think this has solved the problem and been a great learning experience to
boot.
Thanks again,
Ken
-----Original Message-----
From: Januski, Ken [mailto:kjanuski@phillynews.com]
Sent: Thursday, August 18, 2005 6:09 PM
To: SGreen@unimin.com
Cc: win32@lists.mysql.com
Subject: RE: Confused query question
Shawn,
This worked, sort of. There were 117 entries in the errors table but only 50
rows returned at the end of the query that you suggested. But this has been
a great help in getting me started. I'll see if I can figure out the
discrepancy and if not I'll get back to you.
Thanks again,
Ken
-----Original Message-----
From: SGreen@unimin.com [mailto:SGreen@unimin.com]
Sent: Thursday, August 18, 2005 3:15 PM
To: Januski, Ken
Cc: win32@lists.mysql.com
Subject: Re: Confused query question
"Januski, Ken" wrote on 08/18/2005 02:44:04 PM:
> I think I may have gotten over my head in terms of my database knowledge
in
> trying to gather some data from apache access_logs by using myqsl.
>
> The problem is that the log includes a username to go with an IP when the
> user logs in to the web application. Later the log includes errors for
> certain IP addresses. I'd like to run a query that shows the errors and
the
> user.
>
> To do this I've set up one table, errors, with this structure:
> IP Date Error ID
> with the ID being unique and created through autoincrement.
>
> The other table, users, is problematic and has had varying structures.
This
> is the current one:
> IP Date User ID
> with the ID being unique and created through autoincrement.
>
> Since the user may logon many times there could be multiple entries for a
> certain user at a certain IP. There also could be a different user at that
> IP. So my thought was to include the Date in the table but to truncate the
> minute and second. So the date in the second table is something like
> "2005-08-18:07".
>
> I then thought I could run a query that searched for errors.IP,
errors.Date,
> errors.Error, users.User where errors.IP = user.IP and errors.Date like
> users.Date. If I skip trying to match dates then I get all of the users
who
> have used that IP at any time. So I'd thought I'd limit it by saying only
> those users who have the same IP but who logged it at a time that was LIKE
> the time of the error. This would not be perfect since it would miss
> somebody who logged in at "2005-08-18:11:01:00" but who got an error at
> "2005-08-18:12:00:00". But it should get anyone who had an error between
11
> and 12 a.m. on 2005-08-18. The problem is that I don't know how to put the
> value of user.Date in a like statement, e.g.
> "where errors.Date like '%users.Date%'.
>
> I realize that for anybody who is familiar with databases that what I've
> done probably breaks a lot of smart database rules and best practices.
This
> is the first time I've found a need to make this sort of query. But I'd
> appreciate any thoughts on it. I'm hoping for some way to use the like
> syntax. But it may be that I just need to redesign the tables. I'd prefer
> not to do this since this is all being done to solve a temporary problem,
> find out who is getting errors at a certain web page based on time, ip and
> query text. So it's a bit of a throwaway database and query and I don't
want
> to spend too much time on it.
>
> Thanks for any thoughts,
>
> Ken
>
> I'm still using Mysql 4.0.1 by the way
OK Ken,
You have a version of the "maximum of a group" problem. You need to identify
which user from a specific IP address has the last login (MAX(DATE) value).
Now, to make this work properly, you need to STOP trimming your login dates
and allow it to be a full DATETIME value. Try this definition as a Logins
table:
CREATE TABLE logins (
IP varchar(15) not null
, logindate datetime not null
, user varchar(75) not null
, primary key (ip, logindate, user)
);
/* Please notice that logindate is NOT a varchar. The primary key (PK) on
(ip, logindate, user) is intentionally in that order. */
/* to find the most recent logins for all of your errors (this could take a
while): */
CREATE TEMPORARY TABLE tmpLastLogin (key (error_id))
SELECT e.id as error_id, max(l.logindate) as logindate
FROM errors e
INNER JOIN logins l
on e.IP = l.IP
and e.date > l.logindate;
/* Now we have enough information to build your desired report (which error
belongs to whom): */
SELECT e.id, e.ip, e.error, l.user
FROM errors e
INNER JOIN tmpLastLogin tll
on tll.error_id = e.id
INNER JOIN logins l
on l.ip = e.ip
and l.logindate = tll.logindate;
/* and because people login all the time, you need to drop your temp table
to get ready for the next time you need to run this query */
DROP TEMPORARY TABLE tmpLastLogin;
This can be very fast if you already know the error you want to research as
the way I wrote the query it will process ALL of your errors at the same
time.
Does what I wrote make sense? I would be very happy to go into greater
detail if you have any questions.
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
------_=_NextPart_001_01C5A443.153A17E0--
RE: Confused query question
am 19.08.2005 14:58:41 von SGreen
--=_alternative 0047A4F585257062_=
Content-Type: text/plain; charset="US-ASCII"
I am not sure that changing anything else to link by IP is the correct way
to go. You can potentially have several errors from the same IP. Each
error from an IP address can be from different users who logged in at
different times. That is why I tried to isolate the LAST logintime for
each error (matching the IP of the error to the IP of the login). This
should have been the data stored in the temporary table along with the ID
of the error in question. Please check that we are getting the correct
data into tmpLastLogin.
The final query was built to match the errors table with the logins table
(matching login to error by IP) while limiting the logins to only those
logins whose logindate matched that in tmpLastLogin for the error we were
looking up.
Please keep me informed to your results.
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
"Januski, Ken" wrote on 08/18/2005 06:20:47 PM:
> Aha! I wasn't quite sure why the sample code that you sent used error_id
> when it was the ips I was concerned with. Once I changed error_id to
> error_ip it worked just fine.
>
> I think this has solved the problem and been a great learning experience
to
> boot.
>
> Thanks again,
>
> Ken
>
>
> -----Original Message-----
> From: Januski, Ken [mailto:kjanuski@phillynews.com]
> Sent: Thursday, August 18, 2005 6:09 PM
> To: SGreen@unimin.com
> Cc: win32@lists.mysql.com
> Subject: RE: Confused query question
>
>
> Shawn,
>
> This worked, sort of. There were 117 entries in the errors table but
only 50
> rows returned at the end of the query that you suggested. But this has
been
> a great help in getting me started. I'll see if I can figure out the
> discrepancy and if not I'll get back to you.
>
> Thanks again,
>
> Ken
>
> -----Original Message-----
> From: SGreen@unimin.com [mailto:SGreen@unimin.com]
> Sent: Thursday, August 18, 2005 3:15 PM
> To: Januski, Ken
> Cc: win32@lists.mysql.com
> Subject: Re: Confused query question
>
>
>
>
> "Januski, Ken" wrote on 08/18/2005 02:44:04
PM:
>
> > I think I may have gotten over my head in terms of my database
knowledge
> in
> > trying to gather some data from apache access_logs by using myqsl.
> >
> > The problem is that the log includes a username to go with an IP when
the
> > user logs in to the web application. Later the log includes errors for
> > certain IP addresses. I'd like to run a query that shows the errors
and
> the
> > user.
> >
> > To do this I've set up one table, errors, with this structure:
> > IP Date Error ID
> > with the ID being unique and created through autoincrement.
> >
> > The other table, users, is problematic and has had varying structures.
> This
> > is the current one:
> > IP Date User ID
> > with the ID being unique and created through autoincrement.
> >
> > Since the user may logon many times there could be multiple entries
for a
> > certain user at a certain IP. There also could be a different user at
that
> > IP. So my thought was to include the Date in the table but to truncate
the
> > minute and second. So the date in the second table is something like
> > "2005-08-18:07".
> >
> > I then thought I could run a query that searched for errors.IP,
> errors.Date,
> > errors.Error, users.User where errors.IP = user.IP and errors.Date
like
> > users.Date. If I skip trying to match dates then I get all of the
users
> who
> > have used that IP at any time. So I'd thought I'd limit it by saying
only
> > those users who have the same IP but who logged it at a time that was
LIKE
> > the time of the error. This would not be perfect since it would miss
> > somebody who logged in at "2005-08-18:11:01:00" but who got an error
at
> > "2005-08-18:12:00:00". But it should get anyone who had an error
between
> 11
> > and 12 a.m. on 2005-08-18. The problem is that I don't know how to put
the
> > value of user.Date in a like statement, e.g.
> > "where errors.Date like '%users.Date%'.
> >
> > I realize that for anybody who is familiar with databases that what
I've
> > done probably breaks a lot of smart database rules and best practices.
> This
> > is the first time I've found a need to make this sort of query. But
I'd
> > appreciate any thoughts on it. I'm hoping for some way to use the like
> > syntax. But it may be that I just need to redesign the tables. I'd
prefer
> > not to do this since this is all being done to solve a temporary
problem,
> > find out who is getting errors at a certain web page based on time, ip
and
> > query text. So it's a bit of a throwaway database and query and I
don't
> want
> > to spend too much time on it.
> >
> > Thanks for any thoughts,
> >
> > Ken
> >
> > I'm still using Mysql 4.0.1 by the way
>
>
> OK Ken,
>
> You have a version of the "maximum of a group" problem. You need to
identify
> which user from a specific IP address has the last login (MAX(DATE)
value).
>
> Now, to make this work properly, you need to STOP trimming your login
dates
> and allow it to be a full DATETIME value. Try this definition as a
Logins
> table:
>
> CREATE TABLE logins (
> IP varchar(15) not null
> , logindate datetime not null
> , user varchar(75) not null
> , primary key (ip, logindate, user)
> );
>
> /* Please notice that logindate is NOT a varchar. The primary key (PK)
on
> (ip, logindate, user) is intentionally in that order. */
>
> /* to find the most recent logins for all of your errors (this could
take a
> while): */
>
> CREATE TEMPORARY TABLE tmpLastLogin (key (error_id))
> SELECT e.id as error_id, max(l.logindate) as logindate
> FROM errors e
> INNER JOIN logins l
> on e.IP = l.IP
> and e.date > l.logindate;
>
> /* Now we have enough information to build your desired report (which
error
> belongs to whom): */
>
> SELECT e.id, e.ip, e.error, l.user
> FROM errors e
> INNER JOIN tmpLastLogin tll
> on tll.error_id = e.id
> INNER JOIN logins l
> on l.ip = e.ip
> and l.logindate = tll.logindate;
>
> /* and because people login all the time, you need to drop your temp
table
> to get ready for the next time you need to run this query */
>
> DROP TEMPORARY TABLE tmpLastLogin;
>
>
> This can be very fast if you already know the error you want to research
as
> the way I wrote the query it will process ALL of your errors at the same
> time.
>
> Does what I wrote make sense? I would be very happy to go into greater
> detail if you have any questions.
>
> Shawn Green
> Database Administrator
> Unimin Corporation - Spruce Pine
>
>
>
--=_alternative 0047A4F585257062_=--
RE: Confused query question
am 19.08.2005 19:18:15 von kjanuski
------_=_NextPart_001_01C5A4E1.FBFA2A00
Content-Type: text/plain;
charset="iso-8859-1"
Hi Shawn,
I've gone over this again and I can't figure out why the temporary table
cares about the errors id. It seems to me that it ought to care about the
errors ip. Because of this and the fact that I was also getting unexpected
results I thought that you must really have meant errors ip in the temp
table.
Could you clarify for me why you use errors id?
Thanks again,
Ken
-----Original Message-----
From: SGreen@unimin.com [mailto:SGreen@unimin.com]
Sent: Friday, August 19, 2005 8:59 AM
To: Januski, Ken
Cc: win32@lists.mysql.com
Subject: RE: Confused query question
I am not sure that changing anything else to link by IP is the correct way
to go. You can potentially have several errors from the same IP. Each error
from an IP address can be from different users who logged in at different
times. That is why I tried to isolate the LAST logintime for each error
(matching the IP of the error to the IP of the login). This should have been
the data stored in the temporary table along with the ID of the error in
question. Please check that we are getting the correct data into
tmpLastLogin.
The final query was built to match the errors table with the logins table
(matching login to error by IP) while limiting the logins to only those
logins whose logindate matched that in tmpLastLogin for the error we were
looking up.
Please keep me informed to your results.
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
"Januski, Ken" wrote on 08/18/2005 06:20:47 PM:
> Aha! I wasn't quite sure why the sample code that you sent used error_id
> when it was the ips I was concerned with. Once I changed error_id to
> error_ip it worked just fine.
>
> I think this has solved the problem and been a great learning experience
to
> boot.
>
> Thanks again,
>
> Ken
>
>
> -----Original Message-----
> From: Januski, Ken [mailto:kjanuski@phillynews.com]
> Sent: Thursday, August 18, 2005 6:09 PM
> To: SGreen@unimin.com
> Cc: win32@lists.mysql.com
> Subject: RE: Confused query question
>
>
> Shawn,
>
> This worked, sort of. There were 117 entries in the errors table but only
50
> rows returned at the end of the query that you suggested. But this has
been
> a great help in getting me started. I'll see if I can figure out the
> discrepancy and if not I'll get back to you.
>
> Thanks again,
>
> Ken
>
> -----Original Message-----
> From: SGreen@unimin.com [mailto:SGreen@unimin.com]
> Sent: Thursday, August 18, 2005 3:15 PM
> To: Januski, Ken
> Cc: win32@lists.mysql.com
> Subject: Re: Confused query question
>
>
>
>
> "Januski, Ken" wrote on 08/18/2005 02:44:04 PM:
>
> > I think I may have gotten over my head in terms of my database knowledge
> in
> > trying to gather some data from apache access_logs by using myqsl.
> >
> > The problem is that the log includes a username to go with an IP when
the
> > user logs in to the web application. Later the log includes errors for
> > certain IP addresses. I'd like to run a query that shows the errors and
> the
> > user.
> >
> > To do this I've set up one table, errors, with this structure:
> > IP Date Error ID
> > with the ID being unique and created through autoincrement.
> >
> > The other table, users, is problematic and has had varying structures.
> This
> > is the current one:
> > IP Date User ID
> > with the ID being unique and created through autoincrement.
> >
> > Since the user may logon many times there could be multiple entries for
a
> > certain user at a certain IP. There also could be a different user at
that
> > IP. So my thought was to include the Date in the table but to truncate
the
> > minute and second. So the date in the second table is something like
> > "2005-08-18:07".
> >
> > I then thought I could run a query that searched for errors.IP,
> errors.Date,
> > errors.Error, users.User where errors.IP = user.IP and errors.Date like
> > users.Date. If I skip trying to match dates then I get all of the users
> who
> > have used that IP at any time. So I'd thought I'd limit it by saying
only
> > those users who have the same IP but who logged it at a time that was
LIKE
> > the time of the error. This would not be perfect since it would miss
> > somebody who logged in at "2005-08-18:11:01:00" but who got an error at
> > "2005-08-18:12:00:00". But it should get anyone who had an error between
> 11
> > and 12 a.m. on 2005-08-18. The problem is that I don't know how to put
the
> > value of user.Date in a like statement, e.g.
> > "where errors.Date like '%users.Date%'.
> >
> > I realize that for anybody who is familiar with databases that what I've
> > done probably breaks a lot of smart database rules and best practices.
> This
> > is the first time I've found a need to make this sort of query. But I'd
> > appreciate any thoughts on it. I'm hoping for some way to use the like
> > syntax. But it may be that I just need to redesign the tables. I'd
prefer
> > not to do this since this is all being done to solve a temporary
problem,
> > find out who is getting errors at a certain web page based on time, ip
and
> > query text. So it's a bit of a throwaway database and query and I don't
> want
> > to spend too much time on it.
> >
> > Thanks for any thoughts,
> >
> > Ken
> >
> > I'm still using Mysql 4.0.1 by the way
>
>
> OK Ken,
>
> You have a version of the "maximum of a group" problem. You need to
identify
> which user from a specific IP address has the last login (MAX(DATE)
value).
>
> Now, to make this work properly, you need to STOP trimming your login
dates
> and allow it to be a full DATETIME value. Try this definition as a Logins
> table:
>
> CREATE TABLE logins (
> IP varchar(15) not null
> , logindate datetime not null
> , user varchar(75) not null
> , primary key (ip, logindate, user)
> );
>
> /* Please notice that logindate is NOT a varchar. The primary key (PK) on
> (ip, logindate, user) is intentionally in that order. */
>
> /* to find the most recent logins for all of your errors (this could take
a
> while): */
>
> CREATE TEMPORARY TABLE tmpLastLogin (key (error_id))
> SELECT e.id as error_id, max(l.logindate) as logindate
> FROM errors e
> INNER JOIN logins l
> on e.IP = l.IP
> and e.date > l.logindate;
>
> /* Now we have enough information to build your desired report (which
error
> belongs to whom): */
>
> SELECT e.id, e.ip, e.error, l.user
> FROM errors e
> INNER JOIN tmpLastLogin tll
> on tll.error_id = e.id
> INNER JOIN logins l
> on l.ip = e.ip
> and l.logindate = tll.logindate;
>
> /* and because people login all the time, you need to drop your temp table
> to get ready for the next time you need to run this query */
>
> DROP TEMPORARY TABLE tmpLastLogin;
>
>
> This can be very fast if you already know the error you want to research
as
> the way I wrote the query it will process ALL of your errors at the same
> time.
>
> Does what I wrote make sense? I would be very happy to go into greater
> detail if you have any questions.
>
> Shawn Green
> Database Administrator
> Unimin Corporation - Spruce Pine
>
>
>
------_=_NextPart_001_01C5A4E1.FBFA2A00--
RE: Confused query question
am 19.08.2005 19:41:47 von SGreen
--=_alternative 0061901185257062_=
Content-Type: text/plain; charset="US-ASCII"
No problem!
I deduced from your first post that you have parsed the Apache logs and
have generated some tables of information. One of them is an errors table
which uniquely identifies (with its ID field) an error. It also has on it
fields to say what the error is, when it happened, and what IP address had
the problem. Based on that description, I believe that you should
probably wind up with more than one error from the same IP address. It can
be the same error or a different error, that doesn't matter. What matters
is WHEN it occurred. Based on who logged in from that IP *last* before the
error happed is the user who received the error. I imagine some error
table entries that could look like this (concentrating on just one IP
address for now):
-errors table-
(id, name, date, ip)
1 error 1 2005-08-05 16:15:00 10.1.1.1
2 error 2 2005-08-05 16:15:05 10.1.1.1
15 error 1 2005-08-05 16:15:10 10.1.1.1
45 error 2 2005-08-05 16:15:20 10.1.1.1
101 error 3 2005-08-05 16:16:08 10.1.1.1
145 error 2 2005-08-05 16:16:00 10.1.1.1
178 error 2 2005-08-05 16:31:20 10.1.1.1
182 error 2 2005-08-05 16:56:00 10.1.1.1
183 error 2 2005-08-05 17:05:05 10.1.1.1
204 error 2 2005-08-05 17:15:48 10.1.1.1
Now, in order to know who was getting each error, we need to correlate the
datetime the error occurs with the most recent datetime from the logins
table (still focusing on a single IP)
-logins-
(ip, logindate, user)
10.1.1.1 2005-08-05 16:00:00 user_blue
10.1.1.1 2005-08-05 16:15:30 user_red
10.1.1.1 2005-08-05 16:31:00 user_blue
10.1.1.1 2005-08-05 17:04:50 user_red
If we run my original query on this data, we should end up with records in
tmpLastLogin that look like this
-tmpLastLogin-
(error_id, logindate)
1 2005-08-05 16:00:00
2 2005-08-05 16:00:00
15 2005-08-05 16:00:00
45 2005-08-05 16:00:00
101 2005-08-05 16:00:30
145 2005-08-05 16:00:30
178 2005-08-05 16:31:00
182 2005-08-05 16:31:00
183 2005-08-05 17:04:50
204 2005-08-05 17:04:50
That tells us that when (for each error) the user last logged in. Since we
know the IP address from the error itself and we now know the logintime we
need to match against, we can use those two pieces of information to look
into the logins table to uniquely identify a user. (gurus --- please hold
off, I see what original design flaw I created. I will correct it later
after we get past this issue).
Does that help you to see why I stuck with the error.id and not the IP
address? You wanted to match errors to users and different users can share
the same IP address so sticking with IP address would have just muddled
the entire picture. We use the IP address in the FINAL query along with
the most recent login time to pick out a particular user from the logins
table. It doesn't belong in tmpLastLogin.
Have I made sense or just confused you more?
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
"Januski, Ken" wrote on 08/19/2005 01:18:15 PM:
> Hi Shawn,
>
> I've gone over this again and I can't figure out why the temporary table
> cares about the errors id. It seems to me that it ought to care about
the
> errors ip. Because of this and the fact that I was also getting
unexpected
> results I thought that you must really have meant errors ip in the temp
> table.
>
> Could you clarify for me why you use errors id?
>
> Thanks again,
>
> Ken
>
> -----Original Message-----
> From: SGreen@unimin.com [mailto:SGreen@unimin.com]
> Sent: Friday, August 19, 2005 8:59 AM
> To: Januski, Ken
> Cc: win32@lists.mysql.com
> Subject: RE: Confused query question
>
>
>
> I am not sure that changing anything else to link by IP is the correct
way
> to go. You can potentially have several errors from the same IP. Each
error
> from an IP address can be from different users who logged in at
different
> times. That is why I tried to isolate the LAST logintime for each error
> (matching the IP of the error to the IP of the login). This should have
been
> the data stored in the temporary table along with the ID of the error in
> question. Please check that we are getting the correct data into
> tmpLastLogin.
>
> The final query was built to match the errors table with the logins
table
> (matching login to error by IP) while limiting the logins to only those
> logins whose logindate matched that in tmpLastLogin for the error we
were
> looking up.
>
> Please keep me informed to your results.
>
> Shawn Green
> Database Administrator
> Unimin Corporation - Spruce Pine
>
>
> "Januski, Ken" wrote on 08/18/2005 06:20:47
PM:
>
> > Aha! I wasn't quite sure why the sample code that you sent used
error_id
> > when it was the ips I was concerned with. Once I changed error_id to
> > error_ip it worked just fine.
> >
> > I think this has solved the problem and been a great learning
experience
> to
> > boot.
> >
> > Thanks again,
> >
> > Ken
> >
> >
> > -----Original Message-----
> > From: Januski, Ken [mailto:kjanuski@phillynews.com]
> > Sent: Thursday, August 18, 2005 6:09 PM
> > To: SGreen@unimin.com
> > Cc: win32@lists.mysql.com
> > Subject: RE: Confused query question
> >
> >
> > Shawn,
> >
> > This worked, sort of. There were 117 entries in the errors table but
only
> 50
> > rows returned at the end of the query that you suggested. But this has
> been
> > a great help in getting me started. I'll see if I can figure out the
> > discrepancy and if not I'll get back to you.
> >
> > Thanks again,
> >
> > Ken
> >
> > -----Original Message-----
> > From: SGreen@unimin.com [mailto:SGreen@unimin.com]
> > Sent: Thursday, August 18, 2005 3:15 PM
> > To: Januski, Ken
> > Cc: win32@lists.mysql.com
> > Subject: Re: Confused query question
> >
> >
> >
> >
> > "Januski, Ken" wrote on 08/18/2005 02:44:04
PM:
> >
> > > I think I may have gotten over my head in terms of my database
knowledge
> > in
> > > trying to gather some data from apache access_logs by using myqsl.
> > >
> > > The problem is that the log includes a username to go with an IP
when
> the
> > > user logs in to the web application. Later the log includes errors
for
> > > certain IP addresses. I'd like to run a query that shows the errors
and
> > the
> > > user.
> > >
> > > To do this I've set up one table, errors, with this structure:
> > > IP Date Error ID
> > > with the ID being unique and created through autoincrement.
> > >
> > > The other table, users, is problematic and has had varying
structures.
> > This
> > > is the current one:
> > > IP Date User ID
> > > with the ID being unique and created through autoincrement.
> > >
> > > Since the user may logon many times there could be multiple entries
for
> a
> > > certain user at a certain IP. There also could be a different user
at
> that
> > > IP. So my thought was to include the Date in the table but to
truncate
> the
> > > minute and second. So the date in the second table is something like
> > > "2005-08-18:07".
> > >
> > > I then thought I could run a query that searched for errors.IP,
> > errors.Date,
> > > errors.Error, users.User where errors.IP = user.IP and errors.Date
like
> > > users.Date. If I skip trying to match dates then I get all of the
users
> > who
> > > have used that IP at any time. So I'd thought I'd limit it by saying
> only
> > > those users who have the same IP but who logged it at a time that
was
> LIKE
> > > the time of the error. This would not be perfect since it would miss
> > > somebody who logged in at "2005-08-18:11:01:00" but who got an error
at
> > > "2005-08-18:12:00:00". But it should get anyone who had an error
between
> > 11
> > > and 12 a.m. on 2005-08-18. The problem is that I don't know how to
put
> the
> > > value of user.Date in a like statement, e.g.
> > > "where errors.Date like '%users.Date%'.
> > >
> > > I realize that for anybody who is familiar with databases that what
I've
> > > done probably breaks a lot of smart database rules and best
practices.
> > This
> > > is the first time I've found a need to make this sort of query. But
I'd
> > > appreciate any thoughts on it. I'm hoping for some way to use the
like
> > > syntax. But it may be that I just need to redesign the tables. I'd
> prefer
> > > not to do this since this is all being done to solve a temporary
> problem,
> > > find out who is getting errors at a certain web page based on time,
ip
> and
> > > query text. So it's a bit of a throwaway database and query and I
don't
> > want
> > > to spend too much time on it.
> > >
> > > Thanks for any thoughts,
> > >
> > > Ken
> > >
> > > I'm still using Mysql 4.0.1 by the way
> >
> >
> > OK Ken,
> >
> > You have a version of the "maximum of a group" problem. You need to
> identify
> > which user from a specific IP address has the last login (MAX(DATE)
> value).
> >
> > Now, to make this work properly, you need to STOP trimming your login
> dates
> > and allow it to be a full DATETIME value. Try this definition as a
Logins
> > table:
> >
> > CREATE TABLE logins (
> > IP varchar(15) not null
> > , logindate datetime not null
> > , user varchar(75) not null
> > , primary key (ip, logindate, user)
> > );
> >
> > /* Please notice that logindate is NOT a varchar. The primary key (PK)
on
> > (ip, logindate, user) is intentionally in that order. */
> >
> > /* to find the most recent logins for all of your errors (this could
take
> a
> > while): */
> >
> > CREATE TEMPORARY TABLE tmpLastLogin (key (error_id))
> > SELECT e.id as error_id, max(l.logindate) as logindate
> > FROM errors e
> > INNER JOIN logins l
> > on e.IP = l.IP
> > and e.date > l.logindate;
> >
> > /* Now we have enough information to build your desired report (which
> error
> > belongs to whom): */
> >
> > SELECT e.id, e.ip, e.error, l.user
> > FROM errors e
> > INNER JOIN tmpLastLogin tll
> > on tll.error_id = e.id
> > INNER JOIN logins l
> > on l.ip = e.ip
> > and l.logindate = tll.logindate;
> >
> > /* and because people login all the time, you need to drop your temp
table
> > to get ready for the next time you need to run this query */
> >
> > DROP TEMPORARY TABLE tmpLastLogin;
> >
> >
> > This can be very fast if you already know the error you want to
research
> as
> > the way I wrote the query it will process ALL of your errors at the
same
> > time.
> >
> > Does what I wrote make sense? I would be very happy to go into greater
> > detail if you have any questions.
> >
> > Shawn Green
> > Database Administrator
> > Unimin Corporation - Spruce Pine
> >
> >
> >
>
>
--=_alternative 0061901185257062_=--