Re: How to use replaceregexp in multi-line context?

Discussion:

Oliver Ashoff

2006-05-17 10:41:01 UTC

Hi David!

Inserting a new line character

&#10;

in the regular expression so that
line endings/transitions are matched
may help.
After that, my multi-line regexps work.

I wonder if the ant regexp multi-line mode
is platform dependent as far as the line endings
are concerned?
Or if I missed something. :)

Regards,

Oliver

-----Original Message-----
Sent: Wednesday, 17 May 2006 11:59
To: Ant Apache User Group
Subject: How to use replaceregexp in multi-line context?
Dear members,
I would like to use the Ant task replaceregexp in order to
replace tablespace information in my SQL scripts. My input:
CREATE INDEX IX_2_lra_country
PCTFREE 10
INITRANS 2
MAXTRANS 255
STORAGE (
INITIAL 524288
NEXT 524288
PCTINCREASE 0
MINEXTENTS 2
MAXEXTENTS 2147483645
)
;
I have found that the following regular expression will match and
replace all of this with the semi-colon symbol (";").
<replaceregexp byline = "false" flags = "g"
file="${sql.dir}/oracle/lra-create-index-oracle.sql"
replace=";">
</replaceregexp>
When I test this regular expression it works (but not under
CREATE INDEX IX_2_lra_country
It doesn't work with Ant because the text the regular expression has to
find is spread over more than one line. If I put it all on one
line, the regular expression matches. I don't understand
why, because \s includes the newline character too. I have also
tried flags="sg" without success, as well as
byline="true".
Do you have any suggestions?
Thanks in advance,
David

Oliver Ashoff

2006-05-18 16:42:57 UTC

Hello David!

Sorry, my explanation was a little bit short, I guess. ;)

I wondered, as you did, why replaceregexp works fine in
(1) one-line mode
but not in
(2) multi-line mode.

So, let's see how the inputs differ!
Outline:

(1) <any character sequence without any line feed><an optional
line feed>

(2) <any character sequence without any line feed><a line feed>
<any character sequence without any line feed><a line feed>
....
<any character sequence without any line feed><an optional
line feed>

So, I thought the line feeds could be the problem, because not all line
feeds are equal :)
To be more precise, not all 'line end markers' (characters or character
sequences that mark the end of a line) are equal.
For an explanation for 'line feed' see
http://en.wikipedia.org/wiki/Line_Feed

How to insert a 'line feed' in a regular expression?

I tried '\n', as you did. But that did not help.

The problem could be that the line end marker used in the input is 2
characters!
For example: <carriage return><line feed>=<ASCII 13><ASCII 10>

Hence, inserting an extra ASCII 10 encoded as "&#10;" in the regular
expression solved
the problem! Hurrah! :)

For 'Numerical Character References' see
http://www.w3.org/MarkUp/html3/latin1.html (&#10; Line feed)
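The difference between a bare line feed and a carriage return plus line feed can be tried out in plain Java. This is only a sketch of java.util.regex behaviour, not of Ant itself; it assumes that Ant hands the pattern to that engine, which is common on recent JDKs but is not stated in this thread. The class name and sample strings are made up for illustration. A "\n" in the Java string literal puts a real ASCII 10 into the pattern, much like &#10; does in an XML attribute.

```java
import java.util.regex.Pattern;

// Illustration only: shows how java.util.regex treats LF and CR LF line ends.
final class LineEndingDemo {
    // Two lines joined by a bare line feed (ASCII 10).
    static final Pattern LF_ONLY = Pattern.compile("first.*\nsecond");
    // Two lines joined by an optional carriage return followed by a line feed.
    static final Pattern CRLF_OR_LF = Pattern.compile("first.*\r?\nsecond");

    // True if the pattern is found anywhere in the input.
    static boolean found(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        String unix = "first line\nsecond line";
        String windows = "first line\r\nsecond line";
        System.out.println(found(LF_ONLY, unix));       // true
        // false: by default "." does not match \r, so nothing consumes the carriage return
        System.out.println(found(LF_ONLY, windows));
        System.out.println(found(CRLF_OR_LF, unix));    // true
        System.out.println(found(CRLF_OR_LF, windows)); // true
    }
}
```

If the input files may come from different platforms, writing the line break as \r?\n is one way to make the pattern independent of the line end marker.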

Example: Consider the following lines of an input file:

<entry name="statistics.enabled">
<value>true</value>
</entry>

Now, we want to replace the string "true" by "false". But this should be
done only for the entry "statistics.enabled". So, I use a regular expression
that matches the above three lines:

<replaceregexp byline="false" flags="m">
<regexp
pattern="(.*&lt;entry.*name=&quot;statistics.enabled&quot;.*&gt;.*&#10;.*&lt;value&gt;).*(&lt;/value&gt;.*&#10;.*&lt;/entry&gt;.*$)" />
<substitution expression="\1false\2" />
<fileset dir="${etc.dir}" includes="config.xml" />
</replaceregexp>

As you can see, I inserted the sequence &#10; twice
so that the 'line breaks' are recognized. Without that it does not work.
Additionally, I use further 'Numerical Character References' for the
characters '<' and '>'.
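The three-line match can be checked outside Ant with a small Java program. This is a sketch under assumptions: it uses java.util.regex directly, writes the line break as \r?\n so that both LF and CR LF inputs match (a change from the pattern in this message), and uses $1/$2 where Ant's substitution uses \1/\2. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Illustration only: the three-line pattern from this message, tried in plain Java.
final class StatisticsEntryDemo {
    // MULTILINE corresponds to flags="m"; it lets the trailing $ match at a line end.
    static final Pattern ENTRY = Pattern.compile(
            "(.*<entry.*name=\"statistics.enabled\".*>.*\r?\n.*<value>)"
            + ".*"
            + "(</value>.*\r?\n.*</entry>.*$)",
            Pattern.MULTILINE);

    // Replaces the value of the statistics.enabled entry with "false".
    static String disableStatistics(String xml) {
        return ENTRY.matcher(xml).replaceAll("$1false$2");
    }

    public static void main(String[] args) {
        String input = "<entry name=\"statistics.enabled\">\n"
                + "<value>true</value>\n"
                + "</entry>\n";
        System.out.print(disableStatistics(input));
    }
}
```

An entry with a different name is left untouched, because the name attribute is part of the first group.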

I guess you have got the crucial point now. ;)
I did not investigate your regular expression because I assume that you
know your regular expression
and only missed the trick of inserting the "&#10;" character sequence.
I don't know if there is another solution, perhaps a smarter one.
But this is at least an acceptable workaround for me.

If you don't succeed, let me know. Then I will try to give you further
assistance... ;)

Cheers, Oliver

________________________________

From: David [mailto:***@yahoo.es]
Sent: Thursday, 18 May 2006 16:50
To: Oliver Ashoff; Ant Apache User Group
Subject: Re:How to use replaceregexp in multi-line context?

Dear Oliver,

Thanks for your interest in my problem. Concerning your
comment, I don't understand it very well; could you please be a little
more explicit?

As far as I understand, you mean including the newline
character in the match expression. I tested this too, without
success:

<replaceregexp byline = "false" flags = "g"
file="${sql.dir}/oracle/lra-create-index-oracle.sql"
match="^[\s]*PCTFREE[\sa-zA-Z0-9@\n]*;"
replace=";">

doesn't work, even after adding the \n character. I have also tried using
the Java property
${line.separator}:

<replaceregexp byline = "false" flags = "g"
file="${sql.dir}/oracle/lra-create-index-oracle.sql"

match="^[\s]*PCTFREE[\sa-zA-Z0-9@${line.separator}]*;"
replace=";">

Both versions run under Ant, but the input file doesn't
change.

I have a simple example that works in a multi-line context, but there I
don't have to specify the list of allowed characters in the match
expression:

<replaceregexp byline = "false" flags = "gs">
<regexp pattern = "${CVI.begin}(.*)${CVI.end}"/>
<substitution expression =
"${CVI.begin}${nl}${CVI.body.java}${CVI.end}"/>
<fileset dir = ".">
<exclude name="**/*.properties"/>
<patternset refid = "java.patternset"/>
</fileset>
</replaceregexp>

where:
CVI.begin = @BEGIN_CONTROL_VERSION_INFO@
CVI.end = @END_CONTROL_VERSION_INFO@

and ${nl} = ${line.separator}. With this piece of code we
delete the contents of the CVI block, for example:

@BEGIN_CONTROL_VERSION_INFO@
Control Version Information

================================================================================
$Log: DynamicInstance.java,v $
Revision 1.4 2004/10/04 19:24:03 UF367151
Checkstyle test passed.
Revision 1.3 2004/09/14 17:56:48 UF367151

================================================================================
@END_CONTROL_VERSION_INFO@

This case is easy because the end token is on a new line and
is a string rather than a single character as in my case (";"), so we can
specify the "s" flag and "eat" everything with the .* pattern (including the
newlines, because the "s" option allows that).
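The effect of the "s" flag described here can be reproduced in plain Java, where the corresponding option is Pattern.DOTALL. This is a minimal sketch assuming java.util.regex semantics; the class name, method name and sample body text are hypothetical, and the Ant properties are replaced by Java constants.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: with DOTALL, ".*" also consumes line breaks.
final class DotAllDemo {
    static final String BEGIN = "@BEGIN_CONTROL_VERSION_INFO@";
    static final String END = "@END_CONTROL_VERSION_INFO@";

    // Replaces everything between the two markers with newBody.
    // Pass Pattern.DOTALL for the "s" behaviour, or 0 to switch it off.
    static String replaceBody(String source, String newBody, int flags) {
        Pattern block = Pattern.compile(
                Pattern.quote(BEGIN) + "(.*)" + Pattern.quote(END), flags);
        String replacement = Matcher.quoteReplacement(BEGIN + "\n" + newBody + END);
        return block.matcher(source).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String source = BEGIN + "\nold line 1\nold line 2\n" + END;
        // With DOTALL the whole block body is replaced.
        System.out.println(replaceBody(source, "new body\n", Pattern.DOTALL));
        // Without it, ".*" stops at the first line break and nothing matches.
        System.out.println(replaceBody(source, "new body\n", 0).equals(source));
    }
}
```

Note that the greedy .* would run from the first begin marker to the last end marker if a file contained several blocks; a reluctant .*? is the usual way to avoid that.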

Please let me know if you have any suggestions,

Thanks,

David


Brian Agnew

2006-05-18 17:01:16 UTC

If you're trying to replace stuff in XML but only for particular nodes
(e.g. in the below you're scoping on an attribute value) then I'd suggest:

http://www.oopsconsultancy.com/software/xmltask/

and doing:
<replace path="/entry[@name='statistics.enabled']/value/text()"
withText="false"/>
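The same XPath can be tried without xmltask, using only the XML classes that ship with the JDK. This is a sketch of the path-based idea, not of xmltask's implementation; the class and method names are hypothetical, and the entry is assumed to be the document root, as the path above implies.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustration only: select the text node by XPath and overwrite it.
final class XPathReplaceDemo {
    static final String PATH = "/entry[@name='statistics.enabled']/value/text()";

    // Returns the entry's text after the replacement, or null if the path matched nothing.
    static String setValue(String xml, String newText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node text = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(PATH, doc, XPathConstants.NODE);
            if (text == null) {
                return null;
            }
            text.setNodeValue(newText);
            return doc.getDocumentElement().getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalStateException("could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<entry name=\"statistics.enabled\">\n<value>true</value>\n</entry>";
        System.out.println(setValue(xml, "false")); // false
    }
}
```

Because the selection works on the parsed document, line endings and attribute order in the file do not matter here.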

Brian


--
Brian Agnew http://www.oopsconsultancy.com
OOPS Consultancy Ltd brian @ oopsconsultancy.com
Tel: +44 (0)7720 397526
Fax: +44 (0)20 8682 0012

David

2006-05-19 13:52:36 UTC

Dear Oliver,

Your solution works! So my regular expression:

<replaceregexp byline = "false" flags = "mg"
file="${sql.dir}/oracle/lra-create-index-oracle.sql"
match="^[\s]*PCTFREE[\sa-zA-Z0-9@&#10;]*;"
replace=";">
</replaceregexp>

eats the multiple lines.

I don't understand well why you need to put the &#10; in twice, as you described. In my case it works as above.
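One possible contributing factor, not confirmed anywhere in this thread: the working version also changes the flags from "g" to "mg". In java.util.regex (assumed here to be the engine behind replaceregexp), ^ matches only at the very start of the input unless multi-line mode is on, and \s already covers both carriage return and line feed. The sketch below uses a trimmed, hypothetical version of the SQL input without the STORAGE ( ... ) block, because parentheses are not in the character class; the class and method names are made up.

```java
import java.util.regex.Pattern;

// Illustration only: the effect of the "m" flag on a pattern that starts with ^.
final class MultilineAnchorDemo {
    // Same expression as in the match attribute, without the extra line feed.
    static final String REGEX = "^[\\s]*PCTFREE[\\sa-zA-Z0-9@]*;";

    // Replaces every match with ";". Pass Pattern.MULTILINE for flags="mg", 0 for flags="g".
    static String strip(String sql, int flags) {
        return Pattern.compile(REGEX, flags).matcher(sql).replaceAll(";");
    }

    public static void main(String[] args) {
        String sql = "CREATE INDEX IX_2_lra_country\n"
                + "PCTFREE 10\n"
                + "INITRANS 2\n"
                + "MAXTRANS 255\n"
                + ";\n";
        // With MULTILINE, ^ matches at the start of the PCTFREE line.
        System.out.print(strip(sql, Pattern.MULTILINE));
        // Without it, ^ matches only before "CREATE", so nothing is replaced.
        System.out.println(strip(sql, 0).equals(sql));
    }
}
```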

Thanks a lot,

David
