how to parse from the end?

on 20.04.2008 02:42:03 by ela

The following rule sets are generated by a program and are not readily
human-readable. I would like to reformat the rules into a table-like layout
(supposing I know all attributes in advance), e.g.

        code of (CA)n allele1  code of (CA)n allele2  LTC4Si_24  CTLA41i_22  APOB1c_29  IL61i_16  PAI12c_54  ...  outcome  support
Rule1   <=2                    <=1                    1 or 3     1           1 or 3                               0        29; 0.621
Rule2   <=2                    <=1                    1 or 3     1           2                                    1        5; 1.0
.....
Rulen   <=2                    >1                                                       1/3       1/3             0        52; 0.692
....

I plan to use the [ ] part as the delimiter, since every line contains it,
and then parse backwards from there to pick out the values (e.g. 2, 1, 3,
MISSING), the operators (<=, =, >, IS) and the rule enders (=>). Finally, I
would count the number of '\t' characters at the beginning of each line to
get the level of the rule. The reason I think of parsing from the end is that
a space is hard to use as a delimiter here: attribute names such as
"code of (CA)n allele1" contain spaces themselves.
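To make the idea concrete, here is a rough sketch in Python of what "parsing from the end" could look like. It is only an illustration under my own assumptions: the names TAIL, TERM, parse_line and split_condition are made up, nesting is assumed to be marked by leading tabs, and alternatives inside a condition are assumed to be joined by " or ". The regular expression is anchored at the end of the line, so whatever precedes the [ Mode: ... ] part is taken as the condition, spaces included.

```python
import re

# Anchored at the END of the line: "[ Mode: m ]", an optional "=> outcome",
# then "(count)" or "(count; confidence)".
TAIL = re.compile(
    r"\[\s*Mode:\s*(?P<mode>\S+)\s*\]"
    r"(?:\s*=>\s*(?P<outcome>\S+))?"
    r"\s*\((?P<count>\d+)(?:;\s*(?P<conf>[\d.]+))?\)\s*$"
)

# One term of a condition: attribute (may contain spaces), operator, value.
TERM = re.compile(r"^(?P<attr>.+?)\s*(?P<op><=|>=|<|>|=|IS)\s+(?P<value>\S+)$")


def parse_line(line):
    """Split one rule line into depth, condition text and trailing fields."""
    stripped = line.lstrip("\t")
    depth = len(line) - len(stripped)  # assumption: one tab per nesting level
    m = TAIL.search(stripped)
    if m is None:
        raise ValueError(f"unrecognised rule line: {line!r}")
    return {
        "depth": depth,
        "condition": stripped[:m.start()].strip(),
        "mode": m["mode"],
        "outcome": m["outcome"],  # None when the line has no "=>"
        "count": int(m["count"]),
        "confidence": float(m["conf"]) if m["conf"] else None,
    }


def split_condition(condition):
    """Break 'A = 1 or A = 3' into (attribute, operator, value) triples."""
    terms = []
    for part in condition.split(" or "):
        m = TERM.match(part.strip())
        if m is None:
            raise ValueError(f"unrecognised term: {part!r}")
        terms.append((m["attr"], m["op"], m["value"]))
    return terms
```

Splitting on " or " would break if an attribute name itself contained that substring; none of the names in my data do, but that is an assumption worth checking on the full file.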

code of (CA)n allele1 <= 2 [ Mode: 0 ] (241)
code of (CA)n allele2 <= 1 [ Mode: 0 ] (61)
LTC4Si_24 = 1 or LTC4Si_24 = 3 [ Mode: 0 ] (44)
CTLA41i_22 = 1 [ Mode: 0 ] (34)
APOB1c_29 = 1 or APOB1c_29 = 3 [ Mode: 0 ] => 0 (29; 0.621)
APOB1c_29 = 2 [ Mode: 1 ] => 1 (5; 1.0)
CTLA41i_22 = 2 [ Mode: 0 ] => 0 (10; 1.0)
LTC4Si_24 = 2 [ Mode: 1 ] => 1 (17; 0.765)
code of (CA)n allele2 > 1 [ Mode: 0 ] (180)
IL61i_16 = 1 or IL61i_16 = 3 [ Mode: 0 ] (107)
PAI12c_54 = 1 or PAI12c_54 = 3 [ Mode: 0 ] => 0 (52; 0.692)
PAI12c_54 = 2 [ Mode: 0 ] (55)
LDLRc_46 = 1 or LDLRc_46 = 2 or LDLRc_46 IS MISSING [ Mode: 0 ] => 0 (48; 0.958)
LDLRc_46 = 3 [ Mode: 0 ] => 0 (7; 0.571)
IL61i_16 = 2 or IL61i_16 IS MISSING [ Mode: 0 ] (73)
F72c_52 = 1 [ Mode: 0 ] => 0 (59; 0.525)
F72c_52 = 2 or F72c_52 IS MISSING [ Mode: 0 ] => 0 (14; 0.929)
code of (CA)n allele1 IS MISSING [ Mode: 0 ] (128)
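
For turning the tree into one table row per rule, something along these lines might work. Again this is only a sketch in Python with made-up names (LINE, rules_from_tree, sample). It assumes each nesting level is one leading tab (the tabs do not show in the listing above, so the indentation in the sample is my reconstruction from the first rules of the table) and that a line containing "=>" ends a rule. The conditions on the current branch are kept in a list indexed by depth, and a row is emitted whenever a rule ender is reached.

```python
import re

# Leading tabs, condition text, "[ Mode: m ]", optional "=> outcome", "(support)".
LINE = re.compile(
    r"^(?P<tabs>\t*)(?P<cond>.*?)\s*\[\s*Mode:\s*\S+\s*\]"
    r"(?:\s*=>\s*(?P<outcome>\S+))?\s*\((?P<support>[^)]*)\)\s*$"
)


def rules_from_tree(lines):
    """Return (conditions, outcome, support) for every line that ends a rule."""
    path = []   # path[d] is the condition at depth d on the current branch
    rules = []
    for line in lines:
        m = LINE.match(line.rstrip("\n"))
        if m is None:
            continue  # skip blank or unrecognised lines
        depth = len(m["tabs"])
        del path[depth:]        # forget conditions from deeper or sibling branches
        path.append(m["cond"])
        if m["outcome"] is not None:
            rules.append((list(path), m["outcome"], m["support"]))
    return rules


# Indentation below is assumed, not copied from the program's real output.
sample = [
    "code of (CA)n allele1 <= 2 [ Mode: 0 ] (241)",
    "\tcode of (CA)n allele2 <= 1 [ Mode: 0 ] (61)",
    "\t\tLTC4Si_24 = 1 or LTC4Si_24 = 3 [ Mode: 0 ] (44)",
    "\t\t\tCTLA41i_22 = 1 [ Mode: 0 ] (34)",
    "\t\t\t\tAPOB1c_29 = 1 or APOB1c_29 = 3 [ Mode: 0 ] => 0 (29; 0.621)",
    "\t\t\t\tAPOB1c_29 = 2 [ Mode: 1 ] => 1 (5; 1.0)",
    "\t\t\tCTLA41i_22 = 2 [ Mode: 0 ] => 0 (10; 1.0)",
]

rules = rules_from_tree(sample)
for number, (conditions, outcome, support) in enumerate(rules, start=1):
    print(f"Rule{number}", " AND ".join(conditions), "=>", outcome, f"({support})")
```

Each tuple holds the full chain of conditions from the root down to the rule ender, so filling the table columns is then a matter of looking up which attribute each condition refers to.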