Re: [Sheflug] quick Reg exp exercise



And Lo! The Great Prophet "Chris Johnson" uttered these words of wisdom:
> 
> The first one is that all html tags should be lowercase, at least that's
> what I think the W3C error is getting at.  Is there a reg exp that will
> search through all my files and exchange <XX> for <xx> and </XX> for </xx>
> without messing up variable names (is PHP case sensitive) (eg <XX
> color=$MyColour> I know colour should be set in css files but I couldn't
> think of another example)
> 
> I could do <XX> for <xx> but not doing the $XxXxx as well will be testing.
> 

I had to resort to perl, as I couldn't come up with a suitable sed or awk 
alternative -- perl's /e modifier lets you run code in the replacement, so 
it's a simple job of capturing the element name and calling lc() on it. 
What I'll try and do if I get bored is a vim macro to do summat similar[1] :)

There are three substitutions:
	- The first matches <ELEMENT ...>           i.e., up to the space.
	- The second matches <ELEMENT>              no attributes
	- The third matches </ELEMENT>              closing tag


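#!/usr/bin/perl
use strict;
use warnings;

while (<>) {
   # /e evaluates the replacement as code; /g repeats it along the line.
   s/<([A-Za-z0-9]+) /"<" . lc($1) . " "/ge;       # <ELEMENT ...>
   s/<([A-Za-z0-9]+)>/"<" . lc($1) . ">"/ge;       # <ELEMENT>
   s/<\/([A-Za-z0-9]+)>/"<\/" . lc($1) . ">"/ge;   # </ELEMENT>
   print;
}
## End of script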

Save, chmod, and call as:
	/path/to/script.pl < input.html > output.html

Caveats: This is the first perl I've written in about a year, so there may 
be an even more elegant solution. Secondly, I don't know if I've caught every 
case where an element needs lowercasing (e.g., if the element name is followed 
by a tab rather than a space, that element won't change - but that's a simple 
thing to rectify and left as an exercise for the reader, ditto catching the 
"/>" terminator in XML). Attribute names are left alone too, and since only 
the element name is touched, things like $MyColour keep their case.
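
For anyone who'd rather not do the exercise, here's one way the tab and "/>" 
cases might be folded into a single substitution. Treat it as a rough sketch 
rather than a tested replacement for the script above: lower_tags() is just a 
name I've made up, and the sample lines are invented for illustration. The 
lookahead only insists that the element name is followed by whitespace, "/" 
or ">", so the rest of the tag (attributes, PHP variables) is never touched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# lower_tags() is a hypothetical helper, not part of the original script.
# It lowercases the element name in opening, closing and self-closing tags.
sub lower_tags {
    my ($line) = @_;
    # $1 = optional "/" of a closing tag, $2 = the element name.
    # (?=[\s/>]) checks the next character without consuming it, so a
    # space, tab, newline, "/" or ">" after the name all qualify.
    $line =~ s{<(/?)([A-Za-z][A-Za-z0-9]*)(?=[\s/>])}{"<$1" . lc($2)}ge;
    return $line;
}

# Made-up sample lines covering the cases mentioned in the caveats.
my @samples = (
    '<FONT COLOR=$MyColour>',   # attribute and PHP variable stay as-is
    "<TD\tALIGN=left>",         # tab after the element name
    '</TABLE>',                 # closing tag
    '<BR/>',                    # XML-style terminator
);

print lower_tags($_), "\n" for @samples;
```

To use it as a filter like the original, the sample loop could be swapped 
for "print lower_tags($_) while <>;". Like the original, it has no way of 
telling a tag from a less-than comparison in PHP code, so check the output.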

Enjoy,

Chris...
[1] although that said, saving this as a script (e.g., golow.pl) and 
doing :%!./golow.pl within vim will do much the same thing...

-- 
\ Chris Johnson                 \ NP: Icon of Coil - Floorkiller
 \ cej [at] nightwolf.org.uk          \  
  \ http://cej.nightwolf.org.uk/  \ 
   \ http://redclaw.org.uk/        ~---------------------------------------



___________________________________________________________________

Sheffield Linux User's Group -
http://www.sheflug.co.uk/mailfaq.html

  GNU the choice of a complete generation.