New Module: Text::Stripper
on 14.06.2007 23:05:43 by Marcus Beranek
Hi all,
I have written a small module which I would like to put on CPAN.
I use this module (well, actually it's just a single function) very
often. So, in case anyone else is interested, here are the details:
NAME
Text::Stripper - a module for shortening text
DESCRIPTION
Text::Stripper shortens text and avoids cutting the text in the middle
of a word.
DETAILS
Motivation
There may be situations where you have a reasonably long text in your
Perl application that should be displayed to the user. But you may not
want to print all of it, because it would take up too much space on
the screen. So you might want to display a shortened version of the
information and let the user decide whether to view the full text.
In many cases, a simple "print substr($text, 0, 50).'...';" will be
sufficient. Unfortunately, nearly every use of this approach cuts the
text in the middle of a word, so you might get phrases like
"This is an a..." or similar. For most users, text truncated this way is
hard to read and leaves room for misinterpreting the cut-off word.
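To make the problem concrete, here is the naive substr approach applied to a sample sentence. The sentence and the 12-character limit are made up for this illustration; they are not part of Text::Stripper.

```perl
use strict;
use warnings;

# Naive truncation: keep the first 12 characters and append three dots.
# Sample sentence chosen only to illustrate the mid-word cut.
my $text  = "This is an abstract text";
my $short = substr($text, 0, 12) . '...';
print "$short\n";    # prints "This is an a..."
```

The cut lands right after the "a" of "abstract", which is exactly the kind of output described above.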
A cleaner solution for the user is to print "This is an..." or "This is
an abstract...". This way, the user isn't left wondering what the "a..."
stands for. This is where Text::Stripper comes in.
The stripof function
The module Text::Stripper consists of a single function named "stripof".
You give "stripof" a text, a "length" and a "tolerance", and it returns
the text shortened to at least "length" characters, using at most
"tolerance" additional characters to complete the word(s) at the cut.
Breakpoints
The "stripof" function tries to find all possible "breakpoints" in the
text and cuts the text at an appropriate position. It considers the
following characters to be "breakpoints":
' ', '\t', '.', ',', ';', ':', '!', '-', '?', '\n', '\r',
'/', '|', '(', ')'
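As a rough illustration of how these breakpoints could be located, the following sketch collects their positions with a single regex character class. The helper name breakpoint_positions is hypothetical and is not taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Stripper): return the 0-based
# positions of every breakpoint character listed above.
sub breakpoint_positions {
    my ($text) = @_;
    my @positions;
    # Character class: space, tab, . , ; : ! - ? \n \r / | ( )
    # The "-" is escaped so it is not read as a character range.
    while ($text =~ m{[ \t.,;:!\-?\n\r/|()]}g) {
        push @positions, $-[0];    # $-[0] = start offset of the last match
    }
    return @positions;
}

my @bp = breakpoint_positions("Lorem ipsum dolor sit amet,");
print "@bp\n";    # prints "5 11 17 21 26"
```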
Modes
There are two modes in which the stripof function can operate:
*maximum-mode*: try to find the latest possible breakpoint
*minimum-mode*: try to find the first possible breakpoint
See the EXAMPLES section for more details.
Optionally, you can tell "stripof" to append three dots to the text
to indicate that it was shortened.
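For readers who want to see the idea in code, here is a minimal sketch of how length, tolerance, the two modes and the optional dots could fit together. It is not the module's actual implementation: the name stripof_sketch is hypothetical, the argument order is inferred from the EXAMPLES section below, and the search window (breakpoint positions from "length" up to "length + tolerance - 1", cutting just before the chosen breakpoint) was chosen because it reproduces those three examples. The fallback when no breakpoint is found is purely an assumption.

```perl
use strict;
use warnings;

# Hypothetical re-implementation for illustration only; the real
# Text::Stripper::stripof may differ. Assumed argument order:
# text, length, tolerance, maximum-mode flag, add-dots flag.
sub stripof_sketch {
    my ($text, $length, $tolerance, $max_mode, $add_dots) = @_;

    # Nothing to shorten if the text already fits.
    return $text if length($text) <= $length;

    # Collect breakpoint positions inside the window
    # [length, length + tolerance - 1].
    my @candidates;
    while ($text =~ m{[ \t.,;:!\-?\n\r/|()]}g) {
        my $p = $-[0];
        push @candidates, $p if $p >= $length && $p < $length + $tolerance;
    }

    # maximum-mode takes the latest breakpoint, minimum-mode the first.
    # Assumption: with no breakpoint in the window, cut hard at "length".
    my $cut = @candidates
        ? ($max_mode ? $candidates[-1] : $candidates[0])
        : $length;

    my $short = substr($text, 0, $cut);    # cut just before the breakpoint
    $short .= '...' if $add_dots;
    return $short;
}

my $text = "Lorem ipsum dolor sit amet, consectetur, adipisci velit";
print stripof_sketch($text, 30, 10, 1, 1), "\n";  # Lorem ipsum dolor sit amet, consectetur...
print stripof_sketch($text, 25, 14, 1, 1), "\n";  # Lorem ipsum dolor sit amet,...
print stripof_sketch($text, 20, 10, 0, 1), "\n";  # Lorem ipsum dolor sit...
```

Cutting before the breakpoint (rather than after it) is what makes the first example end in "consectetur..." instead of "consectetur,...".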
EXAMPLES
use Text::Stripper qw(stripof);
my $text = "Lorem ipsum dolor sit amet, consectetur, adipisci velit";
print stripof( $text, 30, 10, 1, 1 );
# prints "Lorem ipsum dolor sit amet, consectetur..."
# min. 30 chars., max. 40 chars., use last breakpoint, add dots
print stripof( $text, 25, 14, 1, 1 );
# prints "Lorem ipsum dolor sit amet,..."
# min. 25 chars., max. 39 chars., use last breakpoint, add dots
print stripof( $text, 20, 10, 0, 1 );
# prints "Lorem ipsum dolor sit..."
# min. 20 chars., max. 30 chars., use first breakpoint, add dots
If you want to give it a try, the tarball is here:
http://www.beranek.de/downloads/Text-Stripper-1.16.tar.gz
Any comments are welcome.
Best regards,
Marcus