Line end characters being removed

Posted: Fri Sep 30, 2011 4:14 pm
by dancj
Hi - I'm just evaluating oXyGen XML Developer for my company and I've found the following issue.

If I format some XML that has line end characters within an element, it removes the line ends (unless, it seems, the tag name is "address").

So:
<?xml version="1.0" encoding="UTF-8"?>
<unit>
<address1>123 Somewhere Street
Sometown Norway
EX2 7HY</address1>
</unit>

Becomes:
<?xml version="1.0" encoding="UTF-8"?>
<unit>
<address1>123 Somewhere Street Sometown Norway EX2 7HY</address1>
</unit>

Is there any fix for this?

Thanks

Dan

Re: Line end characters being removed

Posted: Fri Sep 30, 2011 4:36 pm
by dancj
Since posting that I have found Options/Preferences/Editor/Format/XML, where you can specify element names and XPath expressions that don't get formatted. But having to pre-warn the app about every field that I don't want messed up seems very dangerous, and putting //* in as one of the options just stops the formatting from working at all.

Re: Line end characters being removed

Posted: Fri Sep 30, 2011 4:51 pm
by sorin_ristache
Hello,

The Format and Indent action breaks the line only when it exceeds the maximum length specified in the option Line Width - Format and Indent from Options -> Preferences -> Editor / Format. That means if the line is shorter it is joined with the next one.
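The joining behaviour described above can be imitated with a rough Python sketch. This is only an illustration of "collapse whitespace, then break where the line width is exceeded", not oXygen's actual implementation. The `reflow_text` helper and the `LINE_WIDTH` value of 100 are assumptions made for the example.

```python
import textwrap

# Assumed value for illustration only; the real limit is whatever the
# "Line Width - Format and Indent" option is set to.
LINE_WIDTH = 100


def reflow_text(text: str, width: int = LINE_WIDTH) -> str:
    """Collapse runs of whitespace, then wrap only where a line exceeds width."""
    collapsed = " ".join(text.split())
    return textwrap.fill(collapsed, width=width)


address = "123 Somewhere Street\nSometown Norway\nEX2 7HY"
reflowed = reflow_text(address)
print(reflowed)
```

Because the three address lines together are shorter than the assumed width, they end up on a single line, which matches the result reported in the first post.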

You have 2 options for preserving the text nodes from the address1 element:
  • add the attribute xml:space="preserve" to the address1 element
  • add the element name (address1) to the Preserve space list of elements from Options -> Preferences -> Editor / Format / XML
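For the first option, a minimal sketch of adding the attribute programmatically, assuming Python's standard `xml.etree.ElementTree` is available (the script is an illustration and not part of oXygen). The `xml` prefix is bound to a fixed namespace URI, so the attribute is set through that URI.

```python
import xml.etree.ElementTree as ET

# The reserved namespace that the "xml:" prefix is always bound to.
XML_NS = "http://www.w3.org/XML/1998/namespace"

source = (
    "<unit>\n"
    "<address1>123 Somewhere Street\n"
    "Sometown Norway\n"
    "EX2 7HY</address1>\n"
    "</unit>"
)

root = ET.fromstring(source)
address = root.find("address1")

# Mark the element so that whitespace-aware tools leave its text alone.
address.set(f"{{{XML_NS}}}space", "preserve")

marked = ET.tostring(root, encoding="unicode")
print(marked)
```

The text of `address1` keeps its line ends, and the serialized element carries `xml:space="preserve"`.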


Regards,
Sorin

Re: Line end characters being removed

Posted: Fri Sep 30, 2011 4:57 pm
by dancj
Thanks for the reply.

Unfortunately both of those options mean I can't just stick an unknown piece of XML and format it without danger of changing the XML.

I think that's going to be a deal breaker for us.

Re: Line end characters being removed

Posted: Fri Sep 30, 2011 4:59 pm
by sorin_ristache
dancj wrote:having to rely on pre-warning the app about any fields that I don't want messed up seems very dangerous
Formatting an XML document applies the rules set in the Format and Format / XML preferences panels. You want an exception to the normal formatting process for some elements, which is why you have to mark those elements explicitly.


Regards,
Sorin

Re: Line end characters being removed

Posted: Fri Sep 30, 2011 5:04 pm
by sorin_ristache
dancj wrote:Thanks for the reply.

Unfortunately both of those options mean I can't just stick an unknown piece of XML and format it without danger of changing the XML.

I think that's going to be a deal breaker for us.
Please give us some examples of the expected result for the formatting action. Do you want to preserve all text nodes of the XML document? If yes, how would you want to re-format the nodes by running the Format and Indent action?


Thank you,
Sorin

Re: Line end characters being removed

Posted: Fri Sep 30, 2011 5:11 pm
by dancj
sorin wrote:
dancj wrote:Thanks for the reply.

Unfortunately both of those options mean I can't just stick an unknown piece of XML and format it without danger of changing the XML.

I think that's going to be a deal breaker for us.
Please give us some examples of the expected result for the formatting action. Do you want to preserve all text nodes of the XML document? If yes, how would you want to re-format the nodes by running the Format and Indent action?


Thank you,
Sorin
I would expect:

<?xml version="1.0" encoding="UTF-8"?>
<unit><address1>123 Somewhere Street
Sometown Norway
EX2 7HY</address1><anotherNode><childNode>aa</childNode></anotherNode></unit>
to become:

<?xml version="1.0" encoding="UTF-8"?>
<unit>
<address1>123 Somewhere Street
Sometown Norway
EX2 7HY</address1>
<anotherNode>
<childNode>aa</childNode>
</anotherNode>
</unit>
So basically the content text-elements get left alone, but the space between elements gets adjusted to make the XML easier to read. I thought this was pretty standard behaviour for XML editors.
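The behaviour asked for here (adjust whitespace between elements, leave text content alone) can be sketched with Python's standard library, assuming Python 3.9 or later for `ET.indent`. It is shown only as a point of comparison and says nothing about how oXygen formats internally.

```python
import xml.etree.ElementTree as ET

source = (
    "<unit><address1>123 Somewhere Street\n"
    "Sometown Norway\n"
    "EX2 7HY</address1><anotherNode><childNode>aa</childNode></anotherNode></unit>"
)

root = ET.fromstring(source)
before = root.find("address1").text

# ET.indent only rewrites whitespace-only text between elements;
# text inside leaf elements such as address1 is not modified.
ET.indent(root, space="  ")

after = root.find("address1").text
formatted = ET.tostring(root, encoding="unicode")
print(formatted)
```

After the call, the child elements sit on their own indented lines while the address text still contains its original line ends.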

Re: Line end characters being removed

Posted: Fri Sep 30, 2011 5:12 pm
by dancj
That should say "content of text-elements"

Re: Line end characters being removed

Posted: Fri Sep 30, 2011 5:18 pm
by sorin_ristache
Removing whitespace such as line ends, tabs and spaces is called normalization of an XML document and does not change the canonical form of the document. Is it important for you to preserve all text nodes? In that case you can specify //text() in the Preserve space list from Options -> Preferences -> Editor / Format / XML.


Regards,
Sorin

Re: Line end characters being removed

Posted: Fri Sep 30, 2011 5:34 pm
by dancj
sorin wrote:Removing whitespaces like end of line, tab, space is called normalization of an XML document and does not change the canonical form of the document.
I'm not sure about "canonical form" but if you do it within text nodes you're changing the data contained in the XML
sorin wrote:Is it important for you to preserve all text nodes? In such a case you can specify //text() in the Preserve space list from Options -> Preferences -> Editor / Format / XML.
I just tried that. Unfortunately it didn't work. Does it rely on the XML having an XSD that specifies that the element is a text datatype? Is it not enough just to have text contained in the element?
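The point about the data changing can be checked with a small sketch, assuming Python 3.8 or later for `ET.canonicalize` (the standard library's Canonical XML 2.0 serializer, used here with its default options). It compares the original and the joined version both as parsed text and in canonical form.

```python
import xml.etree.ElementTree as ET

original = (
    "<unit><address1>123 Somewhere Street\n"
    "Sometown Norway\n"
    "EX2 7HY</address1></unit>"
)
joined = (
    "<unit><address1>123 Somewhere Street Sometown Norway EX2 7HY"
    "</address1></unit>"
)

# Compare the character data a consuming application would receive.
text_a = ET.fromstring(original).find("address1").text
text_b = ET.fromstring(joined).find("address1").text
same_text = text_a == text_b

# Compare the canonical serializations; whitespace in character content
# is retained by canonicalization under the default options.
same_c14n = ET.canonicalize(original) == ET.canonicalize(joined)

print(same_text, same_c14n)
```

Both comparisons come out unequal, so an application reading `address1` would see different content after the line ends are joined.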

Re: Line end characters being removed

Posted: Fri Sep 30, 2011 5:40 pm
by sorin_ristache
I am sorry, only a subset of the XPath language is supported for the Preserve space list. You have to specify the element names.


Regards,
Sorin

Re: Line end characters being removed

Posted: Fri Sep 30, 2011 5:47 pm
by dancj
sorin wrote:I am sorry, only a subset of the XPath language is supported for the Preserve space list. You have to specify the element names.


Regards,
Sorin
Okay, thanks.

Re: Line end characters being removed

Posted: Mon Oct 03, 2011 1:05 pm
by george
Hi Dan,

What you want is accomplished with the "Preserve text as it is" option from Options -> Preferences -> Editor / Format / XML.

Best Regards,
George

Re: Line end characters being removed

Posted: Mon Oct 03, 2011 1:22 pm
by dancj
Thanks. That is a lot better, but it does still insert indents into the text - so I get:

<?xml version="1.0" encoding="UTF-8"?>
<unit>
<address1>123 Somewhere Street
Sometown Norway
EX2 7HY</address1>
<anotherNode>
<childNode>aa</childNode>
</anotherNode>
</unit>
instead of:

<?xml version="1.0" encoding="UTF-8"?>
<unit>
<address1>123 Somewhere Street
Sometown Norway
EX2 7HY</address1>
<anotherNode>
<childNode>aa</childNode>
</anotherNode>
</unit>

Re: Line end characters being removed

Posted: Mon Oct 03, 2011 2:35 pm
by george
Hi,

Can you try to use the "Reset defaults" on that page and then set the "Preserve text as it is" option? My tests show that the following document should remain unchanged after format and indent:

<?xml version="1.0" encoding="UTF-8"?>
<unit>
<address1>123 Somewhere Street
Sometown Norway
EX2 7HY</address1>
<anotherNode>
<childNode>aa</childNode>
</anotherNode>
</unit>
Best Regards,
George

Re: Line end characters being removed

Posted: Mon Oct 03, 2011 2:44 pm
by sorin_ristache
dancj wrote:Thanks. That is a lot better, but it does still insert indents into the text - so I get:
I cannot reproduce the problem. If I select the option "Preserve text as it is" in Preferences -> Editor / Format / XML, only the XML tags are re-indented, not the text that appears between the XML tags. Please use this online form to send us a sample XML document that reproduces the problem. Please also include your user preferences, which you can export from the menu Options -> Export Global Options.


Regards,
Sorin

Re: Line end characters being removed

Posted: Mon Oct 03, 2011 3:27 pm
by dancj
Ah - it wasn't the formatting. It put the indents in because "Indent on paste - sections with number of lines less than 300" on the same page was ticked.

With that unticked it all works well.

Thanks

Dan