tracking a possible bug

am 13.09.2007 00:37:39 von Martin Richard

Hello,

I have clients running an installation script. The script is run as root
(to be able to create the target directory and install the init script) and
reads /etc/inittab to get the default runlevel. I've had one customer claim
that the script _deleted his inittab file_, which we would have put down to
user error... until we found 2 other, unrelated machines with wiped inittab
files where the script had been used in the past. We have absolutely no
proof that our script is related to the inittab file problem.

So we have:

- 3 emptied /etc/inittab files (the file still exists, but has 0 length).
Same as if we'd opened it with mode '>' and closed it without writing.
- 3 different machines: 1 HP-UX 11iv2 (PA-RISC), 1 Linux (Fedora Core 2 /
x86), 1 Linux (custom from-scratch kernel/distro under VMware)
- The HP-UX box is at Company A, the 2 Linux boxes are at an unrelated
Company B. No shared admins to account for the same typo / custom admin script
- Perl 5.8 on each (5.8.3 ActiveState build 809 on the HP-UX box)
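
For reference, the symptom in the first bullet is easy to produce on purpose. The following is only a minimal sketch, run against a scratch file created with File::Temp (an assumption for illustration, not part of the installer), showing that a '>' open followed by a close leaves a 0-length file behind:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Work on a scratch file, never on the real /etc/inittab.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "id:3:initdefault:\n";
close($tmp_fh) or die "close failed: $!";

my $size_before = -s $tmp_name;

# Opening with '>' truncates the file immediately, before any write.
open(my $fh, '>', $tmp_name) or die "open failed: $!";
close($fh) or die "close failed: $!";

# -s yields a false value for an empty file.
my $size_after = -s $tmp_name;

print "before: $size_before bytes, after: ", ($size_after || 0), " bytes\n";
```

The file still exists afterwards, which matches what was seen on the three machines; it does not show that the installer ever performs such an open.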

We can't reproduce the "bug" anywhere, even after 25,000 iterations. 3
different setups with the same symptoms make us think it can't just be
coincidence, but as long as we can't reproduce it, that doesn't help... So I
thought of posting the related code to see if anything looks suspicious.
It's snipped from the full source, but everything related to the inittab
file is there. The only thing we could think of is a weird race condition
from the usage of open (?), but since we can't reproduce the darn effect we
have no idea what to look for.

So any pointers are appreciated. If we can't find anything to 'fix', we'll
have a hard time getting that customer's trust back...

Martin

#--------------8<---Snip!-------------------------------
require 5.6.0;

use strict;

use Config;
use Getopt::Std;
use File::Basename;
use File::Spec::Functions ':ALL';
use File::Copy;
use File::Path;
use File::Find;

use constant INIT_TAB_FILE => '/etc/inittab';

my $levels = undef;

sub error($) {
    my $msg = shift;
    ilog($msg);
    warn "Error: $msg\n";
    exit 1;
}

sub ilog($) {
    my $msg = shift;
    open(my $fh, '>>', catfile('/tmp', 'install.log'));
    print $fh "$msg\n";
    close($fh);
}


### much snipped

my $init_tab_file = INIT_TAB_FILE;

open(my $fh, '<', $init_tab_file)
    or error("Failed to open file '$init_tab_file' --> $!");

while (<$fh>) {
    $levels = $1 if /^[^#][^:]*:(\d):initdefault:/
}

close($fh);

error("Cannot find 'initdefault' definition in file '$init_tab_file'")
    unless defined($levels);

print "Default runlevel: $levels\n";

_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

RE: tracking a possible bug

am 13.09.2007 02:43:00 von Justin Allegakoen

---------------8<----------------
I've had one customer claim that the script _deleted his inittab file_.
Which we would have put under user error.. until we had 2 other, unrelated
machines, with wiped inittab files and where the script had been used in
the past. We have absolutely no proof our script is related to the inittab
file problem.

---------------8<----------------

---------------8<----------------
The only thing we could think of is a weird race condition from the usage
of open (?), but since we can't reproduce the darn effect we have no idea
what to look for..

So any pointers appreciated.. If we can't find anything to 'fix' we'll have
a hard time getting that customer's trust back...
---------------8<----------------

You have at least one open statement in there without an 'or die'. I'm not
sure how much code you have since you snipped a lot, but an easy way to
terminate on open errors would be for you to use Fatal:

use Fatal qw(open close);

Opening files that are locked by other processes results in the behaviour
that you describe.
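
As an alternative to Fatal, the unchecked open in the logging helper could be checked by hand. This is only a sketch: `ilog_checked` and the log file name are hypothetical, and the real script may well prefer to carry on when logging fails.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catfile tmpdir);

# Hypothetical variant of ilog() that refuses to continue silently
# when the log file cannot be opened, written, or closed.
sub ilog_checked {
    my ($msg, $log_file) = @_;
    open(my $fh, '>>', $log_file)
        or die "Failed to open log file '$log_file' --> $!\n";
    print {$fh} "$msg\n"
        or die "Failed to write to log file '$log_file' --> $!\n";
    close($fh)
        or die "Failed to close log file '$log_file' --> $!\n";
    return 1;
}

# Assumed scratch location for the example only.
my $log_file = catfile(tmpdir(), "install-sketch-$$.log");
ilog_checked('first message',  $log_file);
ilog_checked('second message', $log_file);
```

Because the file is opened in append mode, the second call adds a line instead of replacing the first one.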

Cheers,
Just in


Re: tracking a possible bug

am 13.09.2007 21:33:20 von Martin Richard

On 9/12/07, Justin Allegakoen wrote:
> You have at least one open statement in there without an 'or die', I'm not
> sure how much code you have since you snipped a lot, but an easy way to
> terminate on open errors would be for you to use Fatal:-
>
> use Fatal qw(open close);
>
> Opening files that are locked by other processes results in the behaviour
> that you describe.

On 9/12/07, Thurn, Martin wrote:
> Did you try 25,000 "clean" iterations, or some "dirty" (i.e. run many
> concurrent copies, kill in the middle, etc.) ??
> Slurping the whole file rather than testing line by line should
> shorten the time holding the file open, which is always good.


I'm now running multiple copies concurrently, and have added Fatal
to the mix. I also started another loop that opens inittab in append
mode, asks for an exclusive lock ( flock($fh,2); ), and sleeps a while
before unlocking and closing it.
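
A loop of that kind might look roughly like the sketch below. The helper name `hold_lock`, the scratch file and the timings are assumptions for illustration; the real test targets inittab itself.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Hypothetical helper: hold an exclusive advisory lock on $file for
# $seconds, then release it. Append mode never truncates the file.
sub hold_lock {
    my ($file, $seconds) = @_;
    open(my $fh, '>>', $file) or die "open '$file' failed: $!";
    flock($fh, LOCK_EX)       or die "flock '$file' failed: $!";
    sleep($seconds);
    flock($fh, LOCK_UN)       or die "unlock '$file' failed: $!";
    close($fh)                or die "close '$file' failed: $!";
    return 1;
}

# Scratch stand-in for /etc/inittab.
my ($tmp_fh, $scratch) = tempfile(UNLINK => 1);
print {$tmp_fh} "id:3:initdefault:\n";
close($tmp_fh) or die "close failed: $!";

my $size_before = -s $scratch;
hold_lock($scratch, 1) for 1 .. 2;
my $size_after = -s $scratch;

print "size before: $size_before, size after: $size_after\n";
```

LOCK_EX from Fcntl is the symbolic name for the exclusive-lock mode passed to flock. The lock is advisory, so a reader that never calls flock is not blocked by it.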

So far no errors, which is how it should be considering the simplicity of
that code anyway. Just wondering what freak accident could have
happened on those servers.

If anyone has any other weird scenarios I could test, send them my way
:-) Maybe playing with readdir and such? Since what I'm trying to end up
with is a 0-byte file, it could be either a file opened in write '>'
mode, or a deleted file that was created fresh.

Martin

RE: tracking a possible bug

am 14.09.2007 11:20:20 von Brian Raven

From: activeperl-bounces@listserv.ActiveState.com
[mailto:activeperl-bounces@listserv.ActiveState.com] On Behalf Of Martin
Richard
Sent: 12 September 2007 23:38
To: activeperl@listserv.activestate.com
Subject: tracking a possible bug

> Hello,
>
> I have clients running an installation script. The script is ran as root
> (to be able to create the target directory and install the init script)
> and reads /etc/inittab to get the default runlevel. I've had one customer
> claim that the script _deleted his inittab file_. Which we would have put
> under user error.. until we had 2 other, unrelated machines, with wiped
> inittab files and where the script had been used in the past. We have
> absolutely no proof our script is related to the inittab file problem.


I would suggest splitting out the functionality from your script that
really has to be run as root, and running the rest with no more privileges
than it actually needs. You should not need root privilege just to read
inittab, for example. Write access to /etc and /etc/inittab is normally
reserved for root (you should probably check that on the systems where
you run your script), so if your only access to that file (and
directory) is as an unprivileged user, then it is unlikely to be your
script that is deleting or truncating the file.
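
As a rough sketch of the read-only part, the function below uses the same pattern as the posted code but can be run by any user who may read the file. The name `default_runlevel` and the scratch file are assumptions for illustration, not part of the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the default runlevel from an inittab-style
# file, opening it strictly read-only. It needs no root privilege as
# long as the file is readable by the calling user.
sub default_runlevel {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Failed to open '$file' --> $!\n";
    my $level;
    while (my $line = <$fh>) {
        $level = $1 if $line =~ /^[^#][^:]*:(\d):initdefault:/;
    }
    close($fh);
    return $level;
}

# Scratch file standing in for /etc/inittab.
my ($tmp_fh, $scratch) = tempfile(UNLINK => 1);
print {$tmp_fh} "# sample\nid:5:initdefault:\nsi::sysinit:/etc/rc.sysinit\n";
close($tmp_fh) or die "close failed: $!";

my $level     = default_runlevel($scratch);
my $can_write = -w $scratch ? 'yes' : 'no';

print "runlevel $level, writable by this user: $can_write\n";
```

Run against the real /etc/inittab as an unprivileged user, the `-w` test would be expected to report that the file is not writable, which is the check suggested above.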

I wish you luck. It can be pretty frustrating trying to debug a problem
that is difficult/impossible to reproduce. More so if it is not your
code that is causing the problem.

HTH

--
Brian Raven


Re: tracking a possible bug

am 05.10.2007 20:13:22 von Martin Richard

It happened again... apparently someone ran our script and had their
inittab file wiped. It's really killing me.

Here's the complete script this time. If anyone has any idea WHAT
could be happening in the way of a freak accident, any comments are
welcome!


---------------


require 5.6.0;

use strict;

use Config;
use Getopt::Std;
use File::Basename;
use File::Spec::Functions ':ALL';
use File::Copy;
use File::Path;
use File::Find;

use constant INIT_FILE_NAME => 'irpeinit';

use constant INIT_TAB_FILE => '/etc/inittab';

use constant RC_PATH_LIST => ('/etc/rc.d', '/etc', '/sbin');

my $path;
my $addresses;
my $user;
my $group;

$Getopt::Std::STANDARD_HELP_VERSION = 1;

my $base_dir = rel2abs(dirname($0));
my $src_dir = catdir($base_dir, 'src');

my $levels;
my $priority;
my $path;
my $addresses;
my $user;
my $group;

my $uid;
my $gid;

my $init_file;
my @rc_files;

my %options;

getopts('l:n:p:a:u:g:i', \%options);

$levels = $options{l};
$priority = $options{n};
$path = $options{p};
$addresses = $options{a};
$user = $options{u};
$group = $options{g};

HELP_MESSAGE() unless defined($path);
HELP_MESSAGE() unless defined($addresses);
HELP_MESSAGE() unless defined($user);
HELP_MESSAGE() unless defined($group);

$path =~ s/\/+$//;

error("Invalid run level")
    if defined($levels) && $levels !~ /^[[:alnum:]]+(,[[:alnum:]]+)*$/i;

error("Invalid priority") if defined($priority) && $priority !~ /^\d+$/;

error("Install directory already exists") if -e $path;

error("Invalid IP address")
    unless $addresses =~ /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(,\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})*$/;

if (defined($user)) {
    $uid = $user;

    if ($user !~ /^\d+$/) {
        $uid = getpwnam($user) or error("User name '$user' does not exist");
    }
}

if (defined($group)) {
    $gid = $group;

    if ($group !~ /^\d+$/) {
        $gid = getgrnam($group) or error("Group name '$group' does not exist");
    }
}

set_init() unless $options{i};

info("Creating install directory");

eval {
    mkpath($path);
};

error("Failed to create install directory --> $@") if $@;

perm($path, 1);

find(\&insfile, $src_dir);

config();

init();

info("Installation successful");

info("Starting agent...");

system(catfile($path, 'bin', 'irpectl') . ' start')
    and error("Failed to start application --> $?");

exit;

sub set_init() {
    unless (defined($levels)) {
        my $init_tab_file = INIT_TAB_FILE;

        open(my $fh, '<', $init_tab_file)
            or error("Failed to open file '$init_tab_file' --> $!");

        while (<$fh>) {
            $levels = $1 if /^[^#][^:]*:(\d):initdefault:/
        }

        close($fh);

        error("Cannot find 'initdefault' definition in file '$init_tab_file'")
            unless defined($levels);
    }

    my $rc_path;

    foreach my $dir (RC_PATH_LIST) {
        next unless -d $dir;

        opendir(my $dh, $dir) or fatal("Failed to open directory '$dir' --> $!");

        my @dirs = readdir($dh);

        closedir($dh);

        if (grep(/^rc\d\.d$/ && -d catdir($dir, $_), @dirs)) {
            $rc_path = $dir;
            last;
        }
    }

    error("Cannot find run level init directories location (" . join(',', RC_PATH_LIST) . ")")
        unless defined($rc_path);

    my $init_dir = catdir($rc_path, 'init.d');

    $init_dir = $rc_path unless -d $init_dir;

    $init_file = catfile($init_dir, INIT_FILE_NAME);

    error("Startup script '$init_file' already exists") if -e $init_file;

    $priority = '80' unless defined($priority);

    foreach my $level (split(',', $levels)) {
        my $rc_dir = catdir($rc_path, "rc$level.d");
        my $rc_file = catfile($rc_dir, "S$priority" . INIT_FILE_NAME);

        error("Run level directory '$rc_dir' does not exist") unless -d $rc_dir;
        error("Startup script link '$rc_file' already exists")
            if -e $rc_file || -l $rc_file;

        push @rc_files, $rc_file;
    }
}

sub perm($;$) {
    my $file = shift;
    my $exec = shift;

    my $mode = ($exec ? 0755 : 0644);

    chmod($mode, $file) or error("Failed to change attributes of file '$file'");
    chown($uid, $gid, $file) or error("Failed to change owner of file '$file'");
}

sub insfile {
    my $file = $File::Find::name;

    my $rel_file = substr($file, length($src_dir));

    return unless length($rel_file);

    my (undef, $head_dir) = File::Spec->splitdir($rel_file);

    my $dest = $path . $rel_file;

    if (-d $file) {
        info("Creating directory '$dest'");

        mkdir($dest) or error("Failed to create directory '$dest' --> $!");

        perm($dest, 1);
    }
    else {
        info("Copying file '$dest'");

        copy($file, $dest) or error("Failed to copy file '$dest' --> $!");

        perm($dest, $head_dir eq 'bin' || $head_dir eq 'plugin');
    }
}

sub config() {
    info("Generating config file");

    my $config_file = catfile($path, 'irpe.cfg');

    open(CFGIN, '<', catfile($base_dir, 'config', 'irpe.cfg.in'))
        or error("Failed to open input config file --> $!");
    open(CFGOUT, '>', $config_file)
        or error("Failed to open output config file --> $!");

    my $user_cfg = (defined($user) ? "user=$user" : '');
    my $group_cfg = (defined($group) ? "group=$group" : '');

    while (<CFGIN>) {
        s/\@ALLOWED_HOSTS@/$addresses/;
        s/\@USER_CFG@/$user_cfg/;
        s/\@GROUP_CFG@/$group_cfg/;

        print CFGOUT $_;
    }

    close CFGIN;
    close CFGOUT;

    perm($config_file);
}

sub init() {
    info("Generating init script");

    my $src_init_file = catfile($base_dir, 'init', INIT_FILE_NAME);

    open(my $ifh, '<', catfile($base_dir, 'init', INIT_FILE_NAME . '.in'))
        or error("Failed to open input init script file --> $!");
    open(my $ofh, '>', $src_init_file)
        or error("Failed to open output init script file --> $!");

    while (<$ifh>) {
        s/\@PATH@/$path/;

        print $ofh $_;
    }

    close($ifh);
    close($ofh);

    if ($options{i}) {
        info("Note: init script '$src_init_file' not installed, please do it manually");

        return;
    }

    copy($src_init_file, $init_file)
        or error("Failed to copy init script file '$init_file' --> $!");

    chmod(0750, $init_file)
        or error("Failed to change attributes of init script file '$init_file'");

    foreach my $rc_file (@rc_files) {
        symlink($init_file, $rc_file)
            or error("Failed to create symbolic link '$rc_file' to init script '$init_file' --> $!");
    }
}

sub ilog($) {
    my $msg = shift;

    open(my $fh, '>>', catfile($base_dir, 'install.log'));

    print $fh "$msg\n";

    close($fh);
}

sub info($) {
    my $msg = shift;

    ilog($msg);

    print "$msg\n";
}

sub error($) {
    my $msg = shift;

    ilog($msg);

    warn "Error: $msg\n";

    exit 1;
}

sub VERSION_MESSAGE() {}

sub HELP_MESSAGE() {
    print "Usage: install [-i] [-l <level>[,<level>...]] [-n <priority>] -p <path> -a <address>[,<address>...] -u <user> -g <group>\n";
    print "Options:\n";
    print " level : run level to install the startup script (init default is used if '-l' option is omitted)\n";
    print " priority : init startup priority number (default is 80)\n";
    print " path : install directory\n";
    print " address : IP address allowed to connect\n";
    print " user : system user running the application\n";
    print " group : system group running the application\n";
    print "Use option -i to disable installation of init startup script.\n";
    print "\n";

    exit 2;
}

Re: tracking a possible bug

am 05.10.2007 23:00:00 von Todd Beverly

Martin Richard wrote:
> It happened again.. apparently someone ran our script and had its
> inittab file wiped.. It's really killing me..
>
> Here's the complete script time time.. if anyone has any idea WHAT
> could be happening in the way of a freak accident, any comments
> welcome...!


I couldn't find anything wrong with the script. Why don't you make a
copy of /etc/inittab in some temporary location first and then open
the copy? If nothing else, it would double your chances of having a
non-zero backup of inittab available.
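
Something along these lines, perhaps. It is only a sketch: `snapshot_file`, the 'inittab-XXXXXX' template and the temporary directories are made up for the example, and a scratch file stands in for /etc/inittab.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempfile tempdir);

# Hypothetical helper: copy $source into $backup_dir and return the
# path of the copy, so that later code only ever opens the copy.
sub snapshot_file {
    my ($source, $backup_dir) = @_;
    my ($fh, $copy_path) = tempfile('inittab-XXXXXX', DIR => $backup_dir);
    close($fh) or die "close failed: $!";
    copy($source, $copy_path)
        or die "Failed to copy '$source' to '$copy_path' --> $!\n";
    return $copy_path;
}

# Scratch source file and backup directory for the example.
my ($src_fh, $source) = tempfile(UNLINK => 1);
print {$src_fh} "id:3:initdefault:\n";
close($src_fh) or die "close failed: $!";
my $backup_dir = tempdir(CLEANUP => 1);

my $copy_path = snapshot_file($source, $backup_dir);

# Read the runlevel from the copy, leaving the source untouched.
open(my $fh, '<', $copy_path) or die "Failed to open '$copy_path' --> $!\n";
my $levels;
while (<$fh>) {
    $levels = $1 if /^[^#][^:]*:(\d):initdefault:/;
}
close($fh);

print "Default runlevel (from copy): $levels\n";
```

In a real installer the copy would presumably be kept rather than cleaned up, since its main value is as a backup.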