Minifying XML documents (not HTML)

forevermaat
Posts: 1
Joined: Tue May 17, 2022 4:14 pm

Post by forevermaat »

I was curious whether Oxygen XML Editor has some kind of minifying tool that would allow an entire XML document, or just the highlighted text, to be "minified", with options to remove all line breaks and place all text on a single line.

A web search turned up the "Minifying HTML documents" help page. Ironically, the minifying function seems to be available, but only for HTML and not for other document types. Maybe I am missing something?
Radu
Posts: 9414
Joined: Fri Jul 09, 2004 5:18 pm

Re: Minifying XML documents (not HTML)

Post by Radu »

Hi,

If you open an XML document in Oxygen in the Text editing mode you can select its entire contents, then right click and choose "Source->Join and normalize lines".
Can I ask what your use case is?

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
jeankaplansky
Posts: 24
Joined: Tue Jun 08, 2021 8:22 pm

Re: Minifying XML documents (not HTML)

Post by jeankaplansky »

Hi, Radu -
Can we run "Join and Normalize lines" over a folder or subfolder's worth of XML files? Can I set this up in a configuration or script file?

Use case: I have a bunch of HTML5 that requires ASCII-level cleanup. We have downstream tools outside the XML ecosystem that don't handle whitespace per the XML specification. Right now, we're going through the files either with RegEx or by opening each file in text mode and joining and normalizing them one by one. The latter isn't efficient for a project with thousands of files; we'd much rather run this on an entire project at once. Better yet, I'd like to make it part of a linting routine that runs over our files before we send them for ingestion by downstream processes. Ideally, I'd like to keep this part of the technology in the XML ecosystem. I know I can go to node and likely find something that will do the joining and normalizing, but I'm also trying to keep my XML tools as a "stake in the sand" before we hand our source content over to downstream processes that are incapable of processing XML files as XML.

Please advise.

Thanks!
-Jean
Jean Kaplansky
Kaplan North America
jean.kaplansky at kaplan dot com
Radu
Posts: 9414
Joined: Fri Jul 09, 2004 5:18 pm

Re: Minifying XML documents (not HTML)

Post by Radu »

Hello Jean,
We do not have the functionality to run "Join and Normalize Lines" in batch mode. So the workaround would indeed be to use our "Find/Replace in Files" dialog with a regular-expression search for something like "(\s+)", replacing each match with a single space.
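To illustrate what that search and replace does to a document, here is a minimal Python sketch. It is not Oxygen's implementation of "Join and Normalize Lines"; the function name `collapse_whitespace` and the sample string are made up for the example. Note that a plain-text replacement like this also collapses whitespace inside attribute values, CDATA sections, and elements such as `pre`, so it is only safe where that whitespace is not significant.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace (spaces, tabs, line breaks) with one space."""
    return re.sub(r"\s+", " ", text)


# Hypothetical sample input, only for illustration.
sample = "<p>\n    Some   text\n\twith <b>markup</b>\n</p>\n"
print(collapse_whitespace(sample))
```

The result is a single line in which every whitespace run, including the trailing line break, has become one space.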
Ideally, such problems would be fixed closer to the script that does the actual processing. If that script cannot handle new lines and cannot be fixed, maybe it could have a pre-processing stage that removes the new lines. That way, instead of forcing all users to deliver the content in a particular form, the problem is fixed closer to its source.
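A pre-processing stage of that kind could look roughly like the Python sketch below, which walks a folder and collapses whitespace in every matching file before the downstream script runs. The function names, the `*.xml` pattern, and the UTF-8 encoding are assumptions made for the example, and the files are treated as plain text rather than parsed as XML.

```python
import re
from pathlib import Path


def normalize_file(path: Path) -> bool:
    """Collapse whitespace runs in one file; return True if the file changed."""
    # Assumption: files are UTF-8 encoded.
    original = path.read_text(encoding="utf-8")
    normalized = re.sub(r"\s+", " ", original).strip()
    if normalized == original:
        return False
    path.write_text(normalized, encoding="utf-8")
    return True


def normalize_tree(root: Path, pattern: str = "*.xml") -> list[Path]:
    """Normalize every matching file under root; return the files rewritten."""
    return [p for p in sorted(root.rglob(pattern)) if normalize_file(p)]
```

Calling `normalize_tree(Path("some/project/folder"))` rewrites the files in place and returns the paths that changed, so it is worth running on a copy or under version control first.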
Regards,
Radu
jeankaplansky
Posts: 24
Joined: Tue Jun 08, 2021 8:22 pm

Re: Minifying XML documents (not HTML)

Post by jeankaplansky »

We currently use RegEx to clean up our XML in a couple of different ways using the Find and Replace in Files dialog, mostly to remove vestiges of old markup or to fix errors. CMD+j is my team's favorite keyboard shortcut. Sometimes we get undefined characters in place of spaces that need to be changed. The normalize part, getting rid of the hidden characters in the underlying ASCII file, is crucial to our downstream processes.

I'm starting to think about how we might do a pre-processing step in XSLT via XProc.

Is there a wishlist for noting that at least one company needs a batch join & normalize feature in Oxygen desktop?

Thanks!
Radu
Posts: 9414
Joined: Fri Jul 09, 2004 5:18 pm

Re: Minifying XML documents (not HTML)

Post by Radu »

Hi Jean,
I added an internal issue based on this thread, pasting the issue ID below for future reference:
EXM-54638 Batch "Join and Normalize Lines"
If we manage to implement it in a future version we'll update this thread.
I think if I were to have the same problem I would probably try to create a custom XSLT-based XML refactoring script and apply it on multiple files from Oxygen:
https://www.oxygenxml.com/doc/versions/ ... tions.html
Regards,
Radu
jeankaplansky
Posts: 24
Joined: Tue Jun 08, 2021 8:22 pm

Re: Minifying XML documents (not HTML)

Post by jeankaplansky »

Thank you! Normalizing files in batch is a big want on our internal wishlist.
I may play with an AI Positron custom action to see what I can see...
Radu
Posts: 9414
Joined: Fri Jul 09, 2004 5:18 pm

Re: Minifying XML documents (not HTML)

Post by Radu »

Hi Jean,
I added a custom XSLT-based refactoring action named "Join and normalize consecutive spaces" to this sample GitHub project for you; it joins and normalizes XML documents:
https://github.com/oxygenxml/dita-refac ... e%20spaces
The action uses a regular expression inside the XSLT stylesheet to match multiple spaces and replace them with a single one.
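For readers who want to see the general idea outside of XSLT, the Python sketch below applies the same kind of regular expression to text nodes only, leaving markup and attribute values alone. It is an analogue written for illustration, not the stylesheet from the linked project; the function name and sample document are made up. It uses the standard library's ElementTree, which discards comments and the DOCTYPE when parsing and may rewrite namespace prefixes, so it is not a drop-in replacement for the refactoring action.

```python
import re
import xml.etree.ElementTree as ET

_WS = re.compile(r"\s+")


def normalize_text_nodes(xml_source: str) -> str:
    """Collapse whitespace runs in element text and tails, not in attributes."""
    root = ET.fromstring(xml_source)
    for elem in root.iter():
        if elem.text:
            elem.text = _WS.sub(" ", elem.text)
        if elem.tail:
            elem.tail = _WS.sub(" ", elem.tail)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample input, only for illustration.
doc = '<p  class="a   b">Some\n   text <b>bold</b>\n  here</p>'
print(normalize_text_nodes(doc))
```

Because the document is parsed first, the spaces inside the `class` attribute value survive, while line breaks and repeated spaces in the running text are reduced to single spaces.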
Regards,
Radu