Strip out CSS

on 19.07.2007 18:49:11 by M

When saving web pages, I'd like to strip out all CSS and just leave the raw
HTML intact. Some web developer toolbars will strip out the CSS, but for
some reason they won't let you save the page this way. Any tools that can do
this?

(PS: Yes, I know I could manually delete any style sheets but would like to
automate this process. Bonus points if it can strip out inline styles as
well.)

M

Re: Strip out CSS

on 19.07.2007 23:31:33 by dorayme

"M" wrote:

> When saving web pages, I'd like to strip out all CSS and just leave the raw
> HTML intact. Some web developer toolbars will strip out the CSS, but for
> some reason they won't let you save the page this way. Any tools that can do
> this?
>
> (PS: Yes, I know I could manually delete any style sheets but would like to
> automate this process. Bonus points if it can strip out inline styles as
> well.)
>
> M

Give an example of one url you would like to do this to.

--
dorayme

Re: Strip out CSS

on 20.07.2007 01:43:21 by M

"dorayme" wrote in message
news:doraymeRidThis-CCE468.07313320072007@news-vip.optusnet.com.au...
> "M" wrote:

> Give an example of one url you would like to do this to.

Not sure why this is relevant but, hey, if it leads to something. . . As an
example:

http://niftytutorials.com/basics/transform-your-photos-into-a-beautiful-mosaic/1/

Essentially, I just want to save barebones articles with any relevant
images. I don't want Google ads, sidebars, irrelevant banner images, forms,
search boxes, background images, scripts, etc.

Sometimes the website is gracious enough to offer a print version which gets
rid of most of this stuff.

I have a Notetab script which does most of what I want but wanted to see if
something else out there is better at it.

M

Re: Strip out CSS

on 20.07.2007 02:11:54 by jmatt

On Jul 20, 12:49 am, "M" wrote:
> When saving web pages, I'd like to strip out all CSS and just leave the raw
> HTML intact. Some web developer toolbars will strip out the CSS, but for
> some reason they won't let you save the page this way. Any tools that can do
> this?

What browser are you using?

Re: Strip out CSS

on 20.07.2007 02:28:55 by dorayme

"M" wrote:

> "dorayme" wrote in message
> news:doraymeRidThis-CCE468.07313320072007@news-vip.optusnet.com.au...
> > "M" wrote:
>
> > Give an example of one url you would like to do this to.
>
> Not sure why this is relevant but, hey, if it leads to something. . . As an
> example:
>
> http://niftytutorials.com/basics/transform-your-photos-into-a-beautiful-mosaic/1/
>
> Essentially, I just want to save barebones articles with any relevant
> images. I don't want Google ads, sidebars, irrelevant banner images, forms,
> search boxes, background images, scripts, etc.
>
> Sometimes the website is gracious enough to offer a print version which gets
> rid of most of this stuff.
>
> I have a Notetab script which does most of what I want but wanted to see if
> something else out there is better at it.
>
> M

It is tricky to fashion a general facility to distinguish between
relevant and irrelevant images, as you can imagine. The best I can
suggest is this: open the page in FF (equipped with free developer
tools) and turn off all CSS, and probably JavaScript too. Save as
webpage. Open the saved page in a browser. If it is still too rich for
you, just delete the associated folder which contains all the images
and other stuff (or inspect the folder and get rid of things
selectively, but this is not what you want to do). I am afraid
there is nothing as intelligent as you for this job.

--
dorayme

Re: Strip out CSS

on 20.07.2007 03:15:09 by M

jmatt wrote in message
news:1184890314.971583.226820@m37g2000prh.googlegroups.com...
> On Jul 20, 12:49 am, "M" wrote:
>> When saving web pages, I'd like to strip out all CSS and just leave the
>> raw
>> HTML intact. Some web developer toolbars will strip out the CSS, but for
>> some reason they won't let you save the page this way. Any tools that can
>> do
>> this?
>
> What browser are you using?

Normally I use FF, but I'd use IE if there was a tool for it.

M

Re: Strip out CSS

on 20.07.2007 03:48:52 by jmatt

On Jul 20, 9:15 am, "M" wrote:

> Normally I use FF, but I'd use IE if there was a tool for it.

Have you tried Greasemonkey?

https://addons.mozilla.org/en-US/firefox/addon/748

http://www.greasespot.net/

Re: Strip out CSS

on 20.07.2007 04:27:12 by jmatt

On Jul 20, 12:49 am, "M" wrote:

This may give you what you want.

At the top of the Firefox browser select VIEW, PAGE STYLE, then NO
STYLE.

This will strip any web page that you're viewing of all CSS styling.

Re: Strip out CSS

on 20.07.2007 06:16:11 by M

jmatt wrote in message
news:1184898432.752744.38660@d30g2000prg.googlegroups.com...
> On Jul 20, 12:49 am, "M" wrote:
>
> This may give you what you want.
>
> At the top of the Firefox browser select VIEW, PAGE STYLE, then NO
> STYLE.
>
> This will strip any web page that you're viewing of all CSS styling.

Yes, I know. However when you save the de-"css"-esified page, all the CSS is
still saved with it. When you open the saved page again, all the CSS shows
up again. It's the same with the web developer toolbars -- they let you
turn off the CSS to view the page but they don't let you save the modified
page. :(

M

Re: Strip out CSS

on 20.07.2007 06:22:16 by jmatt

On Jul 20, 12:49 am, "M" wrote:

Have a look at this one.

Stylish
https://addons.mozilla.org/en-US/firefox/addon/2108
https://addons.mozilla.org/en-US/firefox/search?q=style&status=4
http://dev.upian.com/hotlinks/tag/greasemonkey?tag=greasemonkey&n=4
Firefox Extension for managing user styles. Stylish is to CSS what
Greasemonkey is to JavaScript. Stylish allows you to easily manage user
styles for the application UI, all websites, or only certain websites.
Stylish is better than using userChrome.css/userContent.css because
styles are applied immediately instead of requiring a restart.

Re: Strip out CSS

on 20.07.2007 06:50:59 by Susan Bugher

M wrote:

> When saving web pages, I'd like to strip out all CSS and just leave the raw
> HTML intact. Some web developer toolbars will strip out the CSS, but for
> some reason they won't let you save the page this way. Any tools that can do
> this?

I have a hunch those toolbars don't "strip out" anything. ISTM more
likely they just ignore it.

copied from another post:

"Essentially, I just want to save barebones articles with any relevant
images. I don't want Google ads, sidebars, irrelevant banner images,
forms, search boxes, background images, scripts, etc."

Have you looked at Net Picker?

Program: Net Picker
Company: 100share.com
Ware: (Freeware)
http://www.netpicker.net/
http://www.netpicker.net/netpicker.html

"NetPicker allows you to select and save a portion of the web page by
dragging it from your browser. NetPicker can save all the useful format
like image, table or font style, and organize your collection in a vivid
tree structure. You can even write down your comments in the original
page at any time. you can drag each item node on the tree view to a
new position for a better arrangement. Select NEW to insert a new item;
NEW SUBITEM to add a subitem; Press F2 to edit the item title."

Susan
--
Posted to alt.comp.freeware
Search alt.comp.freeware (or read it online):
http://www.google.com/advanced_group_search?q=+group:alt.comp.freeware
Pricelessware & ACF: http://www.pricelesswarehome.org
Pricelessware: http://www.pricelessware.org (not maintained)

Re: Strip out CSS

on 20.07.2007 07:06:02 by dorayme

"M" wrote:

> jmatt wrote in message
> news:1184898432.752744.38660@d30g2000prg.googlegroups.com...
> > On Jul 20, 12:49 am, "M" wrote:
> >
> > This may give you what you want.
> >
> > At the top of the Firefox browser select VIEW, PAGE STYLE, then NO
> > STYLE.
> >
> > This will strip any web page that you're viewing of all CSS styling.
>
> Yes, I know. However when you save the de-"css"-esified page, all the CSS is
> still saved with it. When you open the saved page again, all the CSS shows
> up again. It's the same with the web developer toolbars -- they let you
> turn off the CSS to view the page but they don't let you save the modified
> page. :(
>
> M

See my post, css did not activate in the saved html using the
technique I outlined.

--
dorayme

Re: Strip out CSS

on 20.07.2007 07:32:45 by jmatt

On Jul 20, 12:49 am, "M" wrote:

Another to look at.

CSS Spy
http://www.softpedia.com/get/Internet/WEB-Design/HTML-Editors/CSS-Spy.shtml
http://www.aktive.com.ar/aktive/default.aspx?SC=SOFT&ID=CSSSPY&lang=en

Re: Strip out CSS

on 20.07.2007 09:01:35 by jmm-list-gn

M wrote:
>
>> Give an example of one url you would like to do this to.
>
> Not sure why this is relevant but, hey, if it leads to something. . . As an
> example:
>
> http://niftytutorials.com/basics/transform-your-photos-into-a-beautiful-mosaic/1/
>
> Essentially, I just want to save barebones articles with any relevant
> images. I don't want Google ads, sidebars, irrelevant banner images, forms,
> search boxes, background images, scripts, etc.
>
CSS is the least of the problem, then. In most cases you can ignore
anything between <style> and </style>, or style="inline_styling". Poof! No CSS!
But the rest of the stuff? I doubt you'll find anything that can
distinguish between a "desirable" image and an "undesirable" one.
You can reduce the amount of crud received by the browser by using a
filtering proxy like Squid.
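
As a rough illustration of dropping <style> blocks and inline styles from a saved page, here is a minimal Python sketch. It is only an approximation: regular expressions are a blunt tool for HTML, the helper name strip_css is made up for this example, and the stylesheet pattern only catches <link> tags whose rel value starts with "stylesheet". It leaves id and class attributes alone.

```python
import re

# Hypothetical helper for illustration; regexes only approximate HTML parsing.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL)
STYLESHEET_LINK = re.compile(
    r"<link\b[^>]*\brel\s*=\s*[\"']?stylesheet\b[^>]*>", re.IGNORECASE
)
# Requires whitespace before "style" so names like data-style are left alone.
INLINE_STYLE = re.compile(r"\s+style\s*=\s*(\"[^\"]*\"|'[^']*')", re.IGNORECASE)


def strip_css(html: str) -> str:
    """Remove <style> blocks, stylesheet <link> tags and inline style attributes."""
    html = STYLE_BLOCK.sub("", html)
    html = STYLESHEET_LINK.sub("", html)
    html = INLINE_STYLE.sub("", html)
    return html


if __name__ == "__main__":
    sample = '<p style="color: blue" class="x">Hi</p>'
    print(strip_css(sample))
```

To process a saved page, one would read the file into a string, pass it through strip_css, and write the result to a new file.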

--
jmm (hyphen) list (at) sohnen-moe (dot) com
(Remove .AXSPAMGN for email)

Re: Strip out CSS

on 20.07.2007 09:08:40 by Ben C

On 2007-07-19, M wrote:
> "dorayme" wrote in message
> news:doraymeRidThis-CCE468.07313320072007@news-vip.optusnet.com.au...
>> "M" wrote:
>
>> Give an example of one url you would like to do this to.
>
> Not sure why this is relevant but, hey, if it leads to something. . . As an
> example:
>
> http://niftytutorials.com/basics/transform-your-photos-into-a-beautiful-mosaic/1/
>
> Essentially, I just want to save barebones articles with any relevant
> images. I don't want Google ads, sidebars, irrelevant banner images, forms,
> search boxes, background images, scripts, etc.
>
> Sometimes the website is gracious enough to offer a print version which gets
> rid of most of this stuff.
>
> I have a Notetab script which does most of what I want but wanted to see if
> something else out there is better at it.

If you want to get a lot of stuff out of one particular site a script
using curl and BeautifulSoup (which is a Python module) may be the way
to go, especially if the content has class or id attributes in it that
you can use to latch onto the bits you want.

I use this method for TV listings and traffic news.
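
The idea of latching onto an id attribute can be sketched roughly as follows. To stay dependency-free, this example swaps in Python's standard-library html.parser for BeautifulSoup and leaves out the fetching step (curl). The class and function names are invented for illustration, and the nesting counter assumes reasonably well-formed markup where every non-void tag is closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ElementTextExtractor(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # 0 means we are outside the target element
        self.done = False    # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_text(html, target_id):
    """Return the whitespace-normalised text of the element with the given id."""
    parser = ElementTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

With a page whose article body sits in an element such as id="article" (a hypothetical id; real sites will differ), extract_text(page, "article") returns just that element's text and ignores sidebars and footers.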

Re: Strip out CSS

on 20.07.2007 18:19:50 by M

"dorayme" wrote in message
news:doraymeRidThis-1A16F6.10285520072007@news-vip.optusnet.com.au...
> "M" wrote:

> Best I can
> suggest is this, open in FF (equipped with free developer tools)
> and turn off all css and probably javascript too. Save as
> webpage.

I did this (via the View | Page Style | No style) but FF still saves with
the CSS intact. When you open the saved page, there is all the CSS again. Am
I doing this wrong?

> Open the saved in a browser. If too rich for you still,
> just delete the associated folder which contains all the images
> and other stuff,

That is what I have been doing, combined with Notetab text editing and
Scrapbook's DOM editor. It would be nice to have one easy-to-use tool to do
all this. (I sometimes use Amaya for very busy pages. . .)

M

Re: Strip out CSS

on 20.07.2007 18:19:50 by M

"Susan Bugher" wrote in message
news:5gat6bF3fhqmpU1@mid.individual.net...
>M wrote:
>

> Have you looked at Net Picker?

I have used it. IIRC it converts everything to HTML 3.2. Also not sure that
it would be any quicker than what I'm doing now, what with all the selective
dragging and dropping.

M

Re: Strip out CSS

on 20.07.2007 18:19:50 by M

jmatt wrote in message
news:1184909565.979162.291680@e16g2000pri.googlegroups.com...
> On Jul 20, 12:49 am, "M" wrote:
>
> Another to look at.
>
> CSS Spy

I'm unclear from the description. Does it strip out CSS in bulk? And does it
deal with scripts, iframes, ad tables, etc?

M

Regular expression evaluator [Was Re: Strip out CSS]

on 20.07.2007 18:19:51 by M

I thank all for some of your suggestions but most of them deal with CSS and
not the bigger issue of scripts, ads, irrelevant sidebars (tables or divs),
etc. Maybe I'm coming at this the wrong way.

As I mentioned, Notetab's script language does most stuff for me. In order
to strip out CSS though I need to strip out phrases like:
id="something"
class="something"
style="bunch of css attributes"

I've been playing around with Notetab's (v4.95) regular expression search
and replace but I can't seem to find a combination that finds the above
expressions.

Is there a regular expression program that will break this down for me? For
example, the program RegEx Coach lets you enter your text, then test various
regular expressions. The results are highlighted in real time in the text
you entered.

I need something that works IN REVERSE. i.e. I enter text, highlight the
expression I want removed, then it tells me the regular expression needed to
achieve that.

Anything like that out there?

(PS, yes, I know that removing either the stylesheet or the embedded styles
will render any id and class calls irrelevant. However, there are times when
I need them intact, so it would be nice to have the option. . .)

M

Re: Regular expression evaluator [Was Re: Strip out CSS]

on 20.07.2007 18:35:18 by Ben C

On 2007-07-20, M wrote:
> I thank all for some of your suggestions but most of them deal with CSS and
> not the bigger issue of scripts, ads, irrelevant sidebars (tables or divs),
> etc. Maybe I'm coming at this the wrong way.
>
> As I mentioned, Notetab's script language does most stuff for me. In order
> to strip out CSS though I need to strip out phrases like:
> id="something"
> class="something"
> style="bunch of css attributes"

> I've been playing around with Notetab's (v4.95) regular expression search
> and replace but I can't seem to find a combination that finds the above
> expressions.

(style|id|class)=".*?"

is your basic regexp for that in PCRE, which I think is what Notetab
uses. Not too difficult.

It reads 'style or id or class followed by =" and then everything up to
the next "'
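
For anyone who wants to try that pattern outside Notetab, here is a minimal sketch using Python's re module. One thing is added that is not in the pattern above: a leading \s+ so that the attribute must be preceded by whitespace. That removes the space in front of the attribute and keeps "id" from matching inside longer names such as valid or data-id. The function name strip_attrs is made up for this example.

```python
import re

# Same alternation as (style|id|class)=".*?", plus required leading whitespace.
ATTR = re.compile(r'\s+(?:style|id|class)=".*?"')


def strip_attrs(html):
    """Remove double-quoted style, id and class attributes from markup."""
    return ATTR.sub("", html)


if __name__ == "__main__":
    print(strip_attrs('<div id="main" class="wide"><p style="color:red">Hi</p></div>'))
```

The lazy .*? stops at the first closing quote, so each attribute is removed separately. Values in single quotes or without quotes are not handled by this sketch.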

> Is there a regular expression program that will break this down for me? For
> example, the program RegEx Coach lets you enter your text, then test various
> regular expressions. The results are highlighted in real time in the text
> you entered.
>
> I need something that works IN REVERSE. i.e. I enter text, highlight the
> expression I want removed, then it tells me the regular expression needed to
> achieve that.

That's very difficult for the program to know-- there are a vast number
of ways to match a given bit of highlighted text, how is the program
supposed to know which of them you want?

> Anything like that out there?

Honestly it's easier just to read the manual. The Python docs have a
very clear explanation of PCRE syntax.

http://docs.python.org/lib/re-syntax.html

Re: Regular expression evaluator

on 20.07.2007 19:59:08 by unknown

Post removed (X-No-Archive: yes)

Re: Regular expression evaluator [Was Re: Strip out CSS]

on 20.07.2007 22:09:17 by cfajohnson

On 2007-07-20, M wrote:
> I thank all for some of your suggestions but most of them deal with CSS and
> not the bigger issue of scripts, ads, irrelevant sidebars (tables or divs),
> etc. Maybe I'm coming at this the wrong way.
>
> As I mentioned, Notetab's script language does most stuff for me. In order
> to strip out CSS though I need to strip out phrases like:
> id="something"
> class="something"
> style="bunch of css attributes"
>
> I've been playing around with Notetab's (v4.95) regular expression search
> and replace but I can't seem to find a combination that finds the above
> expressions.
>
> Is there a regular expression program that will break this down for me? For
> example, the program RegEx Coach lets you enter your text, then test various
> regular expressions. The results are highlighted in real time in the text
> you entered.

I have no idea how standard Notetab's regular expression syntax is,
but this would match embedded style in *nix utilities:

style="[^"]*"

For example, with sed, this will remove all of them, so long as there
are no quotes within the style values themselves (the g flag replaces
every match on a line, not just the first):

sed 's/style="[^"]*"//g' index.html > newindex.html

> I need something that works IN REVERSE. i.e. I enter text, highlight the
> expression I want removed, then it tells me the regular expression needed to
> achieve that.
>
> Anything like that out there?

No. If you wanted to match the 12345 in abc12345def, the regex
could be any of:

abc\(123[0-9]5*\)def
abc\(1234[0-9]*\)def
abc\([0-9]*\)def
[a-z][a-z][a-z]\([0-9]*\)[a-z][a-z][a-z]
[a-z]bc\([0-9]*\)de[a-z]

... and an infinite number of other expressions.
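
The same point can be checked quickly in Python. This is only a sketch: the patterns are the ones listed above, rewritten in Python syntax (groups are plain parentheses rather than \( \)), and the helper name captured is invented for the example.

```python
import re

text = "abc12345def"

# Python-syntax versions of the sed-style patterns listed above.
patterns = [
    r"abc(123[0-9]5*)def",
    r"abc(1234[0-9]*)def",
    r"abc([0-9]*)def",
    r"[a-z][a-z][a-z]([0-9]*)[a-z][a-z][a-z]",
    r"[a-z]bc([0-9]*)de[a-z]",
]


def captured(pattern, s):
    """Return the first capture group if the pattern matches, else None."""
    m = re.search(pattern, s)
    return m.group(1) if m else None


# Every pattern captures the same digits from the sample text...
results = [captured(p, text) for p in patterns]

# ...but they disagree on other input, which is why a tool cannot guess
# which one the user meant from a single highlighted example.
other = [captured(p, "abc999def") for p in patterns]
```

Here results holds the same captured string five times, while other mixes matches and None values.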


--
Chris F.A. Johnson
===================================================================
Author:
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)

Re: Strip out CSS

on 20.07.2007 22:29:34 by dorayme

In article ,
"M" wrote:

> "dorayme" wrote in message
> news:doraymeRidThis-1A16F6.10285520072007@news-vip.optusnet.com.au...
> > "M" wrote:
>
> > Best I can
> > suggest is this, open in FF (equipped with free developer tools)
> > and turn off all css and probably javascript too. Save as
> > webpage.
>
> I did this (via the View | Page Style | No style) but FF still saves with
> the CSS intact. When you open the saved page, there is all the CSS again. Am
> I doing this wrong?
>

I did it via the developer tools menu, perhaps that was the
difference.

--
dorayme

Re: Regular expression evaluator [Was Re: Strip out CSS]

on 20.07.2007 23:08:15 by Neredbojias

Well bust mah britches and call me cheeky, on Fri, 20 Jul 2007 16:19:51
GMT M scribed:

> I thank all for some of your suggestions but most of them deal with
> CSS and not the bigger issue of scripts, ads, irrelevant sidebars
> (tables or divs), etc. Maybe I'm coming at this the wrong way.
>
> As I mentioned, Notetab's script language does most stuff for me. In
> order to strip out CSS though I need to strip out phrases like:
> id="something"
> class="something"
> style="bunch of css attributes"
>
> I've been playing around with Notetab's (v4.95) regular expression
> search and replace but I can't seem to find a combination that finds
> the above expressions.
>
> Is there a regular expression program that will break this down for
> me? For example, the program RegEx Coach lets you enter your text,
> then test various regular expressions. The results are highlighted in
> real time in the text you entered.
>
> I need something that works IN REVERSE. i.e. I enter text, highlight
> the expression I want removed, then it tells me the regular expression
> needed to achieve that.
>
> Anything like that out there?
>
> (PS, yes, I know that removing either the stylesheet or the embedded
> styles will render any id and class calls irrelevant. However, there
> are times when I need them intact, so it would be nice to have the
> option. . .)

Why not change "<style>" and "</style>" to opening and closing
comment delimiters, respectively, then just reverse "class" and "id" for
all inline styles? Of course, the html itself (and j/s) would have to be
devoid of "id" calls.

--
Neredbojias
Half lies are worth twice as much as whole lies.

Re: Regular expression evaluator [Was Re: Strip out CSS]

on 20.07.2007 23:10:36 by Neredbojias

Well bust mah britches and call me cheeky, on Fri, 20 Jul 2007 20:09:17
GMT Chris F.A. Johnson scribed:

> On 2007-07-20, M wrote:
>> I thank all for some of your suggestions but most of them deal with
>> CSS and not the bigger issue of scripts, ads, irrelevant sidebars
>> (tables or divs), etc. Maybe I'm coming at this the wrong way.
>>
>> As I mentioned, Notetab's script language does most stuff for me. In
>> order to strip out CSS though I need to strip out phrases like:
>> id="something"
>> class="something"
>> style="bunch of css attributes"
>>
>> I've been playing around with Notetab's (v4.95) regular expression
>> search and replace but I can't seem to find a combination that finds
>> the above expressions.
>>
>> Is there a regular expression program that will break this down for
>> me? For example, the program RegEx Coach lets you enter your text,
>> then test various regular expressions. The results are highlighted in
>> real time in the text you entered.
>
> I have no idea how standard Notetab's regular expression syntax is,
> but this would match embedded style in *nix utilities:
>
> style="[^"]*"

That would have been my suggestion but I don't think Notetab's regexes work
the same way.

--
Neredbojias
Half lies are worth twice as much as whole lies.

Re: Regular expression evaluator [Was Re: Strip out CSS]

on 21.07.2007 00:11:59 by M

"Neredbojias" wrote in message
news:Xns99738FD2ABB56nanopandaneredbojias@198.186.190.161...

> Why not change "<style>" and "</style>" to opening and closing
> comment delimiters, respectively,

That's an idea. . .

> then just reverse "class" and "id" for
> all inline styles? Of course, the html itself (and j/s) would have to be
> devoid of "id" calls.

Sorry, not getting what you mean here.

M

Re: Regular expression evaluator [Was Re: Strip out CSS]

on 21.07.2007 02:39:45 by Neredbojias

Well bust mah britches and call me cheeky, on Fri, 20 Jul 2007 22:11:59
GMT M scribed:

> "Neredbojias" wrote in message
> news:Xns99738FD2ABB56nanopandaneredbojias@198.186.190.161...
>
>> Why not change "<style>" and "</style>" to opening and
>> closing comment delimiters, respectively,
>
> That's an idea. . .
>
>> then just reverse "class" and "id" for
>> all inline styles? Of course, the html itself (and j/s) would have
>> to be devoid of "id" calls.
>
> Sorry, not getting what you mean here.

Well, it is a, er, "stretching" (half-baked) idea, but if you use regexes
to change "class" to "id" and vice-versa, there'll be no css with those
(renamed) names and...

Yeah. It sounded good before.

--
Neredbojias
Half lies are worth twice as much as whole lies.