#1: text parsing

Posted on 2008-01-21 21:34:50 by Carolyn Marenger

Can someone point me in the direction of some good documentation on text
parsing?

I want to take a bunch of text files (rtf), read them in and dump the
contents in a database. The files are effectively a flat file database,
with I suspect some fairly intricate programming needed to process the
files. Unfortunately, they are laid out for human readability, not data
conversion.

Thanks, Carolyn

#2: Re: text parsing

Posted on 2008-01-21 22:17:15 by McKirahan

A few questions.

How many is a "bunch"?
What would the target database be -- MySQL?
What table and column structures do you envision?
Perhaps simply a single table with two columns:
filename (key) and a memo field containing the data?
What is the purpose behind doing this?

#3: Re: text parsing

Posted on 2008-01-21 22:17:34 by Manuel Lemos

Hello,

You may want to try this RTF parser class:

http://www.phpclasses.org/rtfparseclass

--

Regards,
Manuel Lemos

PHP professionals looking for PHP jobs
http://www.phpclasses.org/professionals/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/

#4: Re: text parsing

Posted on 2008-01-22 14:13:34 by Carolyn Marenger

A few answers

A bunch is about a dozen: basically one large file that was broken into
sixteen subsets by the initial letter of each record.

The target database would be MySQL.

I haven't looked too closely at the data, but I think one main table
would work, with a few linked tables for categories that can hold more
than one piece of data. There are about 25 categories per record.
Eventually additional structure would be added around the imported data,
but that isn't relevant to importing the data itself. (I will confirm
this before beginning to code.)
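To make the "one main table plus linked tables" idea concrete, here is a minimal JavaScript sketch (not from the thread). It assumes each monster has already been parsed into an object of the form `{ name, fields }`, and it assumes, purely hypothetically, that only the `Feats` and `Environment` categories hold comma-separated lists. Those values go into one generic linked table of `(monster, category, value)` rows; every other category stays a column on the main row.

```javascript
// Hypothetical choice: categories assumed to hold comma-separated lists.
const MULTI_VALUED = new Set(["Feats", "Environment"]);

// Split one parsed monster ({ name, fields }) into a main-table row and
// linked-table rows of the form { monster, category, value }.
function normalize(record) {
  const main = { name: record.name };
  const linked = [];
  for (const [category, value] of Object.entries(record.fields)) {
    if (MULTI_VALUED.has(category)) {
      // Naive split: would break on commas inside parentheses.
      for (const item of value.split(",")) {
        const v = item.trim();
        if (v !== "") linked.push({ monster: record.name, category, value: v });
      }
    } else {
      main[category] = value; // single-valued: one column on the main row
    }
  }
  return { main, linked };
}
```

A single generic linked table keeps the importer short; one table per multi-valued category would allow stricter MySQL column types at the cost of more code.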

The purpose: I am a D&D fan and I run games. I would like to be able to
reference the material and automate much of the process so I don't have
to lug around and consult 20 lbs of books.

Thanks, Carolyn

#5: Re: text parsing

Posted on 2008-01-22 14:14:20 by Carolyn Marenger

I am signing up, so I can check it out.

Thanks, Carolyn

#6: Re: text parsing

Posted on 2008-01-22 16:16:04 by McKirahan

Any chance the RTF files are online so I could look at them?

Are they perhaps from http://www.wizards.com/default.asp?x=d20/article/srd35 ?
The archive http://www.wizards.com/d20/files/v35/SRD.zip contains 88 RTF files.


Also, I gather, this might be a one-time effort; correct?

Not what you requested but ...

I've developed a VBScript solution that takes the following approach:
for a given folder, each RTF file is opened in MS-Word and saved as a
text file; each text file is then read and its contents saved in an
MS-Access database table with 3 columns: id (AutoNumber), file, data.

Using those 86 RTF files it created a 10MB MS-Access database.
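The same batch step can be sketched in Node.js (a rough equivalent, not McKirahan's VBScript). It assumes the RTF files have already been saved as plain text, for example via Word's Save As, into one folder. The function name `loadTextFiles` is hypothetical; it returns one row per file with the three columns described above (`id`, `file`, `data`), ready to be inserted into whichever database is used. It also assumes the text files are UTF-8; if Word saved them in a Windows code page, the `"utf8"` argument would need to change (Node also accepts `"latin1"`).

```javascript
const fs = require("fs");
const path = require("path");

// Read every .txt file in `dir` and return one row per file,
// mirroring the id / file / data columns of the Access table.
function loadTextFiles(dir) {
  return fs
    .readdirSync(dir)
    .filter((name) => name.toLowerCase().endsWith(".txt")) // skip the .rtf originals etc.
    .sort() // predictable order, so ids are assigned consistently
    .map((name, i) => ({
      id: i + 1, // stands in for the AutoNumber column
      file: name,
      data: fs.readFileSync(path.join(dir, name), "utf8"), // assumed encoding
    }));
}
```

For example, `loadTextFiles("./srd-text")` (a hypothetical folder name) would yield rows such as `{ id: 1, file: "MonstersB-C.txt", data: "..." }`.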

#7: Re: text parsing

Posted on 2008-01-23 14:03:28 by Carolyn Marenger

Yes, they are online. Yes, you can look at them. Yes, those are the
files, except that I only care about the 16 monster files. Yes, this is
a one-time effort.

My goal is to create an encounter generation program, where I key in
climate, geography, season, encounter level, time of day and proximity to
civilization, and the application gives me a suggested random encounter
suited to the scenario. For example, if the party were wandering around
the city sewers on a hot summer night, they might encounter a pack of
giant rats led by a wererat. I would then want the program to
determine how many rats there are, how many hit points each has, and any
other pertinent variable data, including what weapons and treasure the
wererat is carrying and using.
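The "how many hit points each" step is straightforward once the Hit Dice entry is available as text. A minimal sketch, assuming entries always look like the samples quoted later in the thread (`15d8+78 (145 hp)`, `1d10+1 (6 hp)`): parse the dice expression, compute the average, and roll a fresh value. The function names are hypothetical. In those two samples the listed hit points equal the average rounded down (15 × 4.5 + 78 = 145.5, 1 × 5.5 + 1 = 6.5), which makes a handy sanity check during import.

```javascript
// Parse a Hit Dice entry such as "15d8+78 (145 hp)" or "1d10+1 (6 hp)".
// Assumed shape: <count>d<sides>, an optional +/- modifier, and an
// optional "(<n> hp)" giving the listed value. Returns null otherwise.
function parseHitDice(text) {
  const m = /^(\d+)d(\d+)\s*(?:([+-])\s*(\d+))?\s*(?:\((\d+)\s*hp\))?/.exec(text.trim());
  if (m === null) return null;
  return {
    count: Number(m[1]),
    sides: Number(m[2]),
    modifier: m[3] === undefined ? 0 : (m[3] === "-" ? -1 : 1) * Number(m[4]),
    listedHp: m[5] === undefined ? null : Number(m[5]),
  };
}

// Average hit points: each die averages (sides + 1) / 2; round down.
function averageHp({ count, sides, modifier }) {
  return Math.floor((count * (sides + 1)) / 2 + modifier);
}

// Roll hit points for one creature. `rng` defaults to Math.random and
// must return a number in [0, 1); pass a fixed function for testing.
function rollHp({ count, sides, modifier }, rng = Math.random) {
  let total = modifier;
  for (let i = 0; i < count; i++) {
    total += 1 + Math.floor(rng() * sides); // one die: 1..sides
  }
  return total;
}
```

For a pack of giant rats, calling `rollHp(parseHitDice(...))` once per rat gives each its own total; `rollHp(parseHitDice("1d10+1 (6 hp)"))` always lands between 2 and 11.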

Having the RTFs loaded into a database, as your script does, would
enable faster searches, but it would not go the next step and perform the
various calculations based on the search results. It is a good start,
but if the conversion has stripped any of the RTF encoding, it may be
harder to have a script find the various 'fields'.

Thanks, Carolyn

#8: Re: text parsing

Posted on 2008-01-23 14:53:46 by McKirahan

I counted 17 "Monster" prefixed files.

My version creates ".txt" files which do strip "the rtf encoding".

An alternative version creates ".htm" files, which retain the
formatting you want; I don't think you really want all of the
"rtf encoding" unless you fully understand the specification
(search on "rtf specification").
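To illustrate why working from raw RTF is awkward, here is a deliberately naive JavaScript stripper (a sketch, not a real RTF parser; the name `stripRtf` is hypothetical). It turns `\par`, `\cell` and `\row` into line breaks and deletes every other control word and all braces. That is enough for the table fragments Carolyn posts later in the thread, but it does not handle hex escapes (`\'e9`), Unicode escapes, escaped braces, control symbols, or ignorable destinations such as the font table.

```javascript
// Deliberately naive RTF-to-text conversion, for illustration only.
// Not handled: \'hh hex escapes, \uN Unicode, escaped braces (\{ \}),
// control symbols, and ignorable destinations such as {\fonttbl ...}.
function stripRtf(rtf) {
  return rtf
    .replace(/\\(?:par|cell|row)\b ?/g, "\n") // paragraph, cell and row ends become line breaks
    .replace(/\\[a-z]+-?\d* ?/gi, "")        // every other control word (and its space delimiter) is dropped
    .replace(/[{}]/g, "")                    // group braces carry no text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "")
    .join("\n");
}
```

Anything beyond this quickly means implementing the RTF specification, which is why converting to text or HTML first, or using an existing parser, is the more practical route.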

Perhaps, as an intermediate step, you would like all of the
"Monster" rtfs converted to HTML and made available via
an interface to open one or more for viewing.

As HTML files they consume 7.5MB.

#9: Re: text parsing

Posted on 2008-01-23 15:38:34 by Carolyn Marenger

There are a couple of monster-prefixed files that are not listings of
monsters but other information, such as monsters as characters.
Anyway, the exact number of files is not overly important.

I just did a little test, and looking at the files, I think the easiest
to work with may indeed be the text file.

Here is an example to illustrate: I am pulling the monster name, type
and hit dice from each file format.

in rtf...
{
\par }{\fs36
\par DELVER
\par }\trowd \trgaph108\trleft-108\trbrdrh\brdrs\brdrw10
\trftsWidth1\trautofit1\trpaddl108\trpaddr108\trpaddfl3\trpaddfr3
\clvertalt\clbrdrb\brdrs\brdrw10 \cltxlrtb\clftsWidth1
\cellx1969\clvertalt\clbrdrb\brdrs\brdrw10
\cltxlrtb\clftsWidth3\clwWidth4871
\cellx6840\pard \ql \li0\ri0\nowidctlpar\intbl\faauto\rin0\lin0 {\b\fs20
}{\b\fs19 \cell }{\fs20 Huge Aberration}{\fs19 \cell }\pard \ql
\li0\ri0\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0
{\fs19 \trowd \trgaph108\trleft-108\trbrdrh
\brdrs\brdrw10
\trftsWidth1\trautofit1\trpaddl108\trpaddr108\trpaddfl3\trpaddfr3
\clvertalt\clbrdrb\brdrs\brdrw10 \cltxlrtb\clftsWidth1
\cellx1969\clvertalt\clbrdrb\brdrs\brdrw10
\cltxlrtb\clftsWidth3\clwWidth4871 \cellx6840\row }\trowd
\trgaph108\trleft-108\trbrdrh\brdrs\brdrw10
\trftsWidth1\trautofit1\trpaddl108\trpaddr108\trpaddfl3\trpaddfr3
\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10
\cltxlrtb\clftsWidth1 \cellx1969\clvertalt\clbrdrt\brdrs\brdrw10
\clbrdrb\brdrs\brdrw10
\cltxlrtb\clftsWidth3\clwWidth4871 \cellx6840\pard \ql
\li0\ri0\nowidctlpar\intbl\faauto\rin0\lin0 {\b\fs20 Hit Dice:}{\b\fs19
\cell }{\fs20 15d8+78 (145 hp)}{\fs19 \cell }\pard \ql

----------
in .html...
<P STYLE="page-break-after: avoid"><FONT SIZE=5>DARKMANTLE</FONT></P>
<TABLE WIDTH=410 BORDER=1 BORDERCOLOR="#000000" CELLPADDING=7
CELLSPACING=0 FRAME=VOID RULES=ROWS>
<COL WIDTH=124>
<COL WIDTH=258>
<TR VALIGN=TOP>

<TD WIDTH=124>
<P CLASS="western">
</P>
</TD>
<TD WIDTH=258>
<P CLASS="western"><FONT SIZE=2>Small Magical Beast</FONT></P>
</TD>
</TR>
<TR VALIGN=TOP>

<TD WIDTH=124>
<P CLASS="western"><FONT SIZE=2><B>Hit Dice:</B></FONT></P>
</TD>
<TD WIDTH=258>
<P CLASS="western"><FONT SIZE=2>1d10+1 (6 hp)</FONT></P>
</TD>
</TR>

---------
in .txt...

DARKMANTLE

Small Magical Beast
Hit Dice:
1d10+1 (6 hp)


--------

So, looking at that and assuming the rest will be similar, the text
version looks the easiest to deal with. If document styles such as
'title', 'heading' and 'subheading' had been used, maybe not, but in
this case each line seems to hold either a field heading or field
data. There are exceptions, of course, particularly lines denoting a
category of monster.
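The line-oriented rule described above can be sketched as a small parser for the `.txt` output. This is an illustration only; it assumes (from the DARKMANTLE sample, not verified against every file) that an ALL-CAPS line starts a new monster, the next non-blank line is its type, and after that a line ending in `:` is a field heading whose value is the following line. Lines that fit none of these rules are kept in `extra` rather than silently dropped. Category headings, the exception noted above, would probably also match the ALL-CAPS rule and need separate handling.

```javascript
// True for a line such as "DARKMANTLE": capitals, spaces and a little
// punctuation, at least two characters long (an assumed convention).
function isNameLine(line) {
  return /^[A-Z][A-Z ,'()-]+$/.test(line);
}

// Parse the plain-text export of one monster file into records of the
// form { name, type, fields: { heading: value }, extra: [lines] }.
function parseMonsters(text) {
  const records = [];
  let current = null;        // record being filled in
  let pendingHeading = null; // last "Heading:" line still waiting for its value

  for (const raw of text.split(/\r?\n/)) { // Word text files use CRLF
    const line = raw.trim();
    if (line === "") continue; // blank lines carry no data

    if (isNameLine(line)) {
      current = { name: line, type: null, fields: {}, extra: [] };
      records.push(current);
      pendingHeading = null;
    } else if (current === null) {
      continue; // introductory text before the first monster
    } else if (current.type === null) {
      current.type = line; // e.g. "Small Magical Beast"
    } else if (line.endsWith(":")) {
      pendingHeading = line.slice(0, -1).trim(); // "Hit Dice:" -> "Hit Dice"
    } else if (pendingHeading !== null) {
      current.fields[pendingHeading] = line;
      pendingHeading = null;
    } else {
      current.extra.push(line); // prose or an unexpected layout
    }
  }
  return records;
}
```

Each record's `fields` keys would then correspond to the roughly 25 categories per monster, ready for the MySQL import.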

That does bring me a little closer to achieving my goal. Thanks for
the assistance. :)

Carolyn

#10: Re: text parsing

Posted on 2008-01-23 16:08:02 by McKirahan

So I gather you have what you need.

I'd suggest just manually converting the files (via MS-Word Save-As)
rather than automating that part of the process since it's a one-time
effort and there aren't that many files.

Below is a page that will list and allow selection of the "Monster"
files via a dropdown, with the selected page displayed in an <iframe>.
The <select> is on the right to allow quicker access to the scroll bar.
Save it as "Monster.htm" and put it in the same folder as the
"Monster" files saved as Web pages (i.e., with a ".htm" extension).
Double-click the filename in Windows Explorer or create a
desktop shortcut to it for quicker access.

Watch for word-wrap.

<html>
<head>
<title>Monster.htm</title>
<script type="text/javascript">
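// Show the chosen file name above the frame and load that file into it.
function monster(that) {
var what = that.value;
if (what == "") return; // ignore the blank first entry and the separator
document.getElementById("id_picked").innerHTML = what;
document.getElementById("id_iframe").src = what;
}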
</script>
<style type="text/css">
.font { font-family:Arial; font-size:8pt }
.zero { margin:0px; padding:0px }
</style>
</head>
<body class="zero">
<form action="" method="get" class="zero">
<table align="center" border="0" cellpadding="0" cellspacing="0"
width="100">
<tr valign="top">
<th>
<span id="id_picked" class="font"></span><br>
<iframe id="id_iframe" width="860" height="600"></iframe>
</th>
<td>&nbsp;</td>
<td class="font">
&nbsp; &nbsp; &nbsp; <b>Monster Files:</b><br>
<select class="font" size="19" id="id_select" onchange="monster(this)">
<option value="">
<option value="MonstersIntro-A.htm">Monsters Intro-A
<option value="MonstersB-C.htm">Monsters B-C
<option value="MonstersD-De.htm">Monsters D-De
<option value="MonstersDi-Do.htm">Monsters Di-Do
<option value="MonstersDr-Dw.htm">Monsters Dr-Dw
<option value="MonstersE-F.htm">Monsters E-F
<option value="MonstersG.htm">Monsters G
<option value="MonstersH-I.htm">Monsters H-I
<option value="MonstersK-L.htm">Monsters K-L
<option value="MonstersM-N.htm">Monsters M-N
<option value="MonstersO-R.htm">Monsters O-R
<option value="MonstersS.htm">Monsters S
<option value="MonstersT-Z.htm">Monsters T-Z
<option value=""> - - - - - - - - - - - - -
<option value="MonsterFeats.htm">Monster Feats
<option value="MonstersAnimals.htm">Monsters Animals
<option value="MonstersasRaces.htm">Monsters as Races
<option value="MonstersVermin.htm">Monsters Vermin
</select>
</td>
</tr>
</table>
</form>
</body>
</html>

#11: Re: text parsing

Posted on 2008-01-23 16:30:54 by Courtney

Carolyn Marenger wrote:
> McKirahan wrote:
>> Any chance the RTF files are online so I could look at them?
>>
>> Perhaps http://www.wizards.com/default.asp?x=d20/article/srd35?
>> http://www.wizards.com/d20/files/v35/SRD.zip contains 88 RTF files.
>>
>>
>> Also, I gather, this might be a one-time effort; correct?
>>
>> Not what you requested but ...
>>
>> I've developed a VBScript solution that takes the following approach:
>> for a given folder, each RTF file is opened in MS-Word and saved
>> as a text file which is opened and read then saved in an MS-Access
>> database table containing 3 columns: id (AutoNumber), file, data.
>>
>> Using those 86 RTF files it created a 10MB MS-Access database.
>>
>
> Yes, they are online. Yes, you can look at them. Yes, those are the
> files, except I only care about the 16 monster files. Yes, this is a
> one-time effort.
>
> My goal is to create an encounter generation program, where I key in
> climate, geography, season, encounter level, time of day and proximity to
> civilization, and the application gives me a suggested random encounter
> suited to the scenario. For example, if the party was wandering around
> the city sewers on a hot summer night, they might encounter a pack of
> giant rats being led by a wererat.

Only if

1/. It was Los Angeles

2/. They had all taken too many mind-enhancing drugs.

Otherwise it's likely to be Weil's disease, at the most interesting ;-)

> I would then want the program to
> determine how many rats, how many hit points each, and any other
> pertinent variable data, including what weapons and treasure the wererat
> was carrying and using.
>
> Having the RTFs loaded into a database, as your script does, would
> enable faster searches, but it would not go the next step and perform
> the various calculations based on the results of the searches. It is a
> good start, but if it has stripped any of the RTF encoding, it may be
> harder for a script to find the various 'fields'.
>

Go full database surely. The art is to define the 'monster' table with
extensibility for all the monster classes one might encounter.
When doing ANYTHING based on a database, the most important thing is to
spend time designing table layouts. And write a data dictionary. And
keep it up to date.

> Thanks, Carolyn
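For the "how many hit points each" part of the goal quoted above, the SRD's own Hit Dice notation can be rolled directly. Hit Dice entries in the sample files look like `1d10+1 (6 hp)` and `15d8+78 (145 hp)`: N dice with S sides each, plus a modifier. The sketch below is a hypothetical JavaScript illustration; the function names and the `rand` parameter are assumptions. It also assumes, checked only against those two samples, that the figure in parentheses is the average roll rounded down.

```javascript
// Minimal sketch (hypothetical helpers): read SRD "Hit Dice" text such as
// "1d10+1 (6 hp)" and either roll it or compute its average.
// NdS+M means: roll N dice with S sides each and add the modifier M.

function parseHitDice(text) {
  const match = /^\s*(\d+)d(\d+)\s*([+-]\s*\d+)?/.exec(text);
  if (!match) {
    throw new Error("Unrecognised hit dice: " + text);
  }
  return {
    count: Number(match[1]),
    sides: Number(match[2]),
    // "+78" -> 78, "- 2" -> -2, missing -> 0
    modifier: match[3] ? Number(match[3].replace(/\s+/g, "")) : 0,
  };
}

// Roll the dice. `rand` defaults to Math.random but can be replaced by a
// fixed or seeded source for repeatable results.
function rollHitPoints(text, rand = Math.random) {
  const { count, sides, modifier } = parseHitDice(text);
  let total = modifier;
  for (let i = 0; i < count; i++) {
    total += 1 + Math.floor(rand() * sides); // one die: 1..sides
  }
  return total;
}

// Average roll, rounded down. For both samples in the thread this matches
// the figure in parentheses: 1d10+1 -> 6, 15d8+78 -> 145.
function averageHitPoints(text) {
  const { count, sides, modifier } = parseHitDice(text);
  return Math.floor((count * (sides + 1)) / 2) + modifier;
}

// Example: hit points for four creatures with 1d10+1 Hit Dice.
const pack = Array.from({ length: 4 }, () => rollHitPoints("1d10+1"));
console.log(pack, averageHitPoints("15d8+78 (145 hp)"));
```

Passing a fixed `rand` (for example `() => 0`, which makes every die come up 1) gives repeatable results when testing the encounter generator.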

#12: Re: text parsing

Posted on 2008-01-24 11:51:21 by Carolyn Marenger

McKirahan wrote:
> So I gather you have what you need.
>
> I'd suggest just manually converting the files (via MS-Word Save-As)
> rather than automating that part of the process since it's a one-time
> effort and there aren't that many files.
>
> Below is a page that lists the "Monster" files in a dropdown and
> displays the selected file in an <iframe>. The <select> is on the
> right to allow quicker access to the scroll bar.
> Save it as "Monster.htm" in the same folder as the "Monster" files
> saved as Web pages (i.e. with a ".htm" extension).
> Double-click the filename in Windows Explorer, or create a
> desktop shortcut to it for quicker access.
>

I was going to do the conversion manually, with OpenOffice. Using Word
would cost too much, as I would have to buy it. I do have a Windows box
to install it on (for games and website testing), but other than that I
use Linux and OpenOffice. The web page you just gave me works fine
either way. Thanks!

Carolyn

#13: Re: text parsing

Posted on 2008-01-24 11:53:47 by Carolyn Marenger

The Natural Philosopher wrote:
> Go full database surely. The art is to define the 'monster' table with
> extensibility for all the monster classes one might encounter.
> When doing ANYTHING based on a database, the most important thing is to
> spend time designing table layouts. And write a data dictionary. And
> keep it up to date.
>

That I know. Can you recommend any software for documenting the
database design? Should I stick to ye old word processor?

Thanks, Carolyn
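Besides a word processor, one option, sketched here as an assumption rather than anything the posters suggested, is to keep the data dictionary itself as data and generate both the MySQL DDL and the readable documentation from it, so "keep it up to date" happens whenever the tables change. The table and column names are hypothetical. `monster_field` follows the extensibility advice quoted above by holding any "Heading: value" pair in its own row.

```javascript
// Minimal sketch (hypothetical schema): the data dictionary is kept as data,
// and both the MySQL CREATE TABLE statements and a readable description are
// generated from it, so the documentation cannot drift from the tables.
// Table and column names are illustrative only.
const dictionary = {
  monster: {
    description: "One row per monster entry in the SRD monster files.",
    primaryKey: "id",
    columns: [
      { name: "id", type: "INT UNSIGNED NOT NULL AUTO_INCREMENT", note: "Surrogate key" },
      { name: "name", type: "VARCHAR(100) NOT NULL", note: "Monster name, e.g. DARKMANTLE" },
      { name: "size_type", type: "VARCHAR(100)", note: "e.g. Small Magical Beast" },
    ],
  },
  monster_field: {
    description: "Any other 'Heading: value' pair, so new kinds of field need no schema change.",
    primaryKey: "monster_id, heading",
    columns: [
      { name: "monster_id", type: "INT UNSIGNED NOT NULL", note: "References monster.id" },
      { name: "heading", type: "VARCHAR(100) NOT NULL", note: "e.g. Hit Dice" },
      { name: "field_value", type: "TEXT", note: "e.g. 1d10+1 (6 hp)" },
    ],
  },
};

// Emit one CREATE TABLE statement per table.
function toDDL(dict) {
  return Object.entries(dict)
    .map(([table, def]) => {
      const lines = def.columns.map((c) => `  ${c.name} ${c.type}`);
      lines.push(`  PRIMARY KEY (${def.primaryKey})`);
      return `CREATE TABLE ${table} (\n${lines.join(",\n")}\n);`;
    })
    .join("\n\n");
}

// Emit a plain-text data dictionary for the same tables.
function toText(dict) {
  return Object.entries(dict)
    .map(([table, def]) =>
      [
        `${table}: ${def.description}`,
        ...def.columns.map((c) => `  ${c.name} (${c.type}): ${c.note}`),
      ].join("\n")
    )
    .join("\n\n");
}

console.log(toDDL(dictionary));
console.log(toText(dictionary));
```

If a heading can occur more than once for the same monster, the primary key of `monster_field` would need an extra sequence column.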

#14: Re: text parsing

Posted on 2008-01-24 12:49:22 by Jerry Stuckle

Carolyn Marenger wrote:
> The Natural Philosopher wrote:
>> Go full database surely. The art is to define the 'monster' table with
>> extensibility for all the monster classes one might encounter.
>> When doing ANYTHING based on a database, the most important thing is
>> to spend time designing table layouts. And write a data dictionary.
>> And keep it up to date.
>>
>
> That I know. Can you recommend any software for documenting the
> database design? Should I stick to ye old word processor?
>
> Thanks, Carolyn
>

Try comp.lang.mysql. They've got all kinds of suggestions on database
stuff there.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
