Regex problem. Finding the file name embedded in a windows path

BenDupre
Posts: 4
Joined: Mon Jun 08, 2020 9:02 pm

Post by BenDupre »

Hello,

I am using this regex
[^\\]+$
with an xpath
//Image/@file
to find file names embedded in full Windows style paths contained in the file attribute of the Image elements in my document.
There are 26 of these in the doc. oXygen only returns one of them when the FIND ALL button is pressed.
Any idea why it's behaving like this?
Ben Dupre
"The greatest problem with communication is the illusion that it has been achieved." -- GB Shaw
Radu
Posts: 9283
Joined: Fri Jul 09, 2004 5:18 pm

Re: Regex problem. Finding the file name embedded in a windows path

Post by Radu »

Hi Ben,

From what I remember, using XPath expressions like "//Image/@file" in the Find/Replace dialog creates filtered intervals in which the find is performed. The filtered intervals include the attribute name, the quotes and the value. So for an XML element like:

<Image file="a\b\c"/>
the search interval would be:

file="a\b\c"
I do not know regexp that well, but from what I see $ is defined as:
"$ matches the position before the first newline in the string."
But there are no newlines in these filtered intervals.
Maybe instead you could search for quoted values like:

"[^\\]+"
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
BenDupre
Posts: 4
Joined: Mon Jun 08, 2020 9:02 pm

Re: Regex problem. Finding the file name embedded in a windows path

Post by BenDupre »

Thanks for the suggestions.
When I use the XPath, it returns everything within the quotes, and the quotes as well.
$ in regex is meant to indicate the end of the string, so the pattern can match backwards from it; that is the key to getting the last member of the sequence. If it is expecting a newline in there, it will not find one. I think I have the regex worked out, but I am still looking for the answer as to why the find routine grabs one matching value in the middle of the document and stops. This smells buggy to me.
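
To make the quote issue concrete, here is a minimal Java sketch that treats one filtered interval as the whole input. The class name IntervalMatchSketch, the helper firstMatch and the second pattern are hypothetical illustrations, not something taken from oXygen; whether the Find/Replace dialog evaluates them the same way is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IntervalMatchSketch {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // One filtered interval as described in the thread: file="a\b\c"
        String interval = "file=\"a\\b\\c\"";

        // Original pattern [^\\]+$ : the closing quote ends up in the match.
        System.out.println(firstMatch("[^\\\\]+$", interval));

        // Hypothetical variant (?<=[\\"])[^\\"]+(?=") : the last path segment,
        // preceded by a backslash or quote and followed by the closing quote.
        System.out.println(firstMatch("(?<=[\\\\\"])[^\\\\\"]+(?=\")", interval));
    }
}
```

The lookbehind and lookahead keep the backslash and the quotes out of the match, so the variant does not rely on $ at all. It only makes sense when the search is limited to a single attribute interval.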
Ben
adrian
Posts: 2867
Joined: Tue May 17, 2005 4:01 pm

Re: Regex problem. Finding the file name embedded in a windows path

Post by adrian »

Hello Ben,

If you are familiar with Perl regex, I see why you identify $ as the end of the string. However, Oxygen uses the Java implementation of regex, which, even though it is based on the Perl one, does not give ^ and $ the same meaning.
See java.util.regex.Pattern
Boundary matchers
^ The beginning of a line
$ The end of a line
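
The behavior of $ can be checked directly against java.util.regex. In plain Java, $ matches only at the end of the input (or before a final line terminator) unless Pattern.MULTILINE is set, in which case it matches at the end of every line. The sketch below uses two made-up paths and a hypothetical helper findAll; whether Oxygen compiles Find/Replace patterns with this flag is not stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarAnchorSketch {
    // Collects every match of regex in input, compiled with the given flags.
    static List<String> findAll(String regex, int flags, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Two illustrative Windows paths on separate lines (invented sample data).
        String text = "C:\\img\\a.png\nC:\\img\\b.png";
        String regex = "[^\\\\]+$"; // the pattern from the first post: [^\\]+$

        // Default flags: $ anchors at the end of the whole input only.
        System.out.println(findAll(regex, 0, text));                 // [b.png]

        // MULTILINE: $ anchors at the end of each line.
        System.out.println(findAll(regex, Pattern.MULTILINE, text)); // [a.png, b.png]
    }
}
```

Note that [^\\] also matches newline characters, so without MULTILINE the pattern can only succeed on the very last path in the input.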
Looking at the Oxygen docs (Search and Find/Replace Features > Regular Expressions Syntax), I realize that this specific difference between Java and Perl 5 regex was not highlighted.
The documentation team mostly took the "Comparison to Perl 5" section from the Java docs, but that also seems to omit this specific difference. I've added a documentation issue to address this.
BenDupre wrote: "I am still looking for the answer as to why the find routine grabs one matching value in the middle of the document and stops. This smells buggy to me."
If you'd like to report a bug, I would like to request more details, perhaps a small sample file where you can reproduce the issue.
Please take into consideration the implementation difference that I mentioned between Java and Perl 5 regex.
If you'd like to keep this private, please send an email with the issue to support@oxygenxml.com or use the Technical Support form.

Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com