How to enforce the encoding in the output html

Tarja Koski
Posts: 7
Joined: Wed Apr 25, 2018 2:24 pm

Post by Tarja Koski »

Hello,
I'm using Oxygen XML Editor to create htmlhelp (CHM) output from DITA source files. I am using a customized htmlhelp plugin based on DITA-OT 2.5.4. It worked fine with Oxygen XML Editor 23.1. I have now installed Oxygen XML Editor 26.1 and modified my htmlhelp plugin so that it works with the latest DITA-OT included in Oxygen XML Editor 26.1. I can successfully build the CHM file and everything looks fine.

There is only one problem. My source DITA files are in Finnish, so there are a lot of Scandinavian characters. They are displayed correctly in the htmlhelp viewer, in the table of contents, and on the Index tab. But when I type a word containing Scandinavian characters in the text field on the Search tab and press Enter, the result is "No topics found" even though such topics exist.

The generated HTML files now seem to have charset=utf-8, whereas previously, with the older DITA-OT, they had charset=iso-8859-1. Could this be the reason why the full-text search cannot find the Scandinavian characters? Is there a way I could force my htmlhelp plugin to set the character encoding to iso-8859-1 and generate the HTML files accordingly?
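To check which encoding the generated files actually declare, something like the following rough Python sketch could be used. It only reads the first `charset=...` declaration near the top of each file and does not verify that the bytes are really encoded that way. The function names and the output folder name in the usage comment are assumptions made for illustration; they are not part of DITA-OT or Oxygen.

```python
import re
from pathlib import Path

# Matches charset=utf-8, charset="iso-8859-1", charset = 'UTF-8', and so on.
CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def declared_charset(html_bytes):
    """Return the first charset declared in the first 4 KB, lowercased, or None."""
    match = CHARSET_RE.search(html_bytes[:4096])
    return match.group(1).decode("ascii").lower() if match else None


def charsets_in_dir(out_dir):
    """Map every .htm/.html file below out_dir to its declared charset."""
    return {
        str(path): declared_charset(path.read_bytes())
        for path in sorted(Path(out_dir).rglob("*.htm*"))
    }


# Hypothetical usage; "out/htmlhelp" is an assumed output folder:
# for name, charset in charsets_in_dir("out/htmlhelp").items():
#     print(charset, name)
```

Running this before and after a change to the publishing setup shows whether the declared charset changed.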

Any help would be much appreciated, thank you.
Radu
Posts: 9283
Joined: Fri Jul 09, 2004 5:18 pm

Re: How to enforce the encoding in the output html

Post by Radu »

Hello Tarja,
Tarja Koski wrote: "The generated html files now seem to have charset=utf-8, when previously with the older DITA-OT it was charset=iso-8859-1. Could this be the reason why the full-text search cannot find the scandinavian characters? Is there a way I could enforce my htmlhelp plugin to set the character encoding to iso-8859-1 and generate the html files according to that?"
I think you are right about this. Oxygen applies some patches to the DITA Open Toolkit engine, and one of them, created a long time ago, attempts to use UTF-8 for the generated HTML files. If I recall correctly, it was meant to fix a problem with generating CHM files containing Greek letters. I have added an internal issue to remove this patch, as it also seems to cause problems for some of our Chinese users.
A possible hackish workaround:
- Close Oxygen.
- If you install a tool like 7-Zip, you can use it to open the JAR library "OXYGEN_INSTALL_DIR/frameworks/dita/DITA-OT/plugins/com.oxygenxml.dost.patches/lib/oxygen-dost-patches.jar". Inside the JAR there is a file at the path "org/dita/dost/util/codepages.xml". Remove the "codepages.xml" file and then save the JAR archive.
- Then start Oxygen and try to publish again.
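
For anyone who would rather script the JAR edit than use 7-Zip, the steps above could be sketched in Python roughly as follows. This is only an illustration and not an official Oxygen tool: the function name `remove_jar_entry` and the `.bak` backup copy are assumptions added here, and `OXYGEN_INSTALL_DIR` remains a placeholder for the actual installation folder. Oxygen should be closed before running it.

```python
import shutil
import zipfile
from pathlib import Path


def remove_jar_entry(jar_path, entry_name):
    """Rewrite the JAR without entry_name. Return True if the entry was found."""
    jar_path = Path(jar_path)
    backup_path = jar_path.with_name(jar_path.name + ".bak")
    tmp_path = jar_path.with_name(jar_path.name + ".tmp")

    removed = False
    with zipfile.ZipFile(jar_path) as src, \
            zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename == entry_name:
                removed = True  # skip this entry, copy everything else
                continue
            dst.writestr(item, src.read(item.filename))

    if removed:
        shutil.copy2(jar_path, backup_path)  # keep the original JAR as .bak
        tmp_path.replace(jar_path)
    else:
        tmp_path.unlink()  # nothing to change, discard the temporary copy
    return removed


# Hypothetical usage; replace OXYGEN_INSTALL_DIR with the real folder:
# remove_jar_entry(
#     "OXYGEN_INSTALL_DIR/frameworks/dita/DITA-OT/plugins/"
#     "com.oxygenxml.dost.patches/lib/oxygen-dost-patches.jar",
#     "org/dita/dost/util/codepages.xml",
# )
```

The ZIP format has no in-place delete, so the sketch copies every other entry into a new archive and then swaps the files. Restoring the `.bak` copy undoes the change.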

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
Tarja Koski
Posts: 7
Joined: Wed Apr 25, 2018 2:24 pm

Re: How to enforce the encoding in the output html

Post by Tarja Koski »

Hello Radu,
This worked! The html files now have charset=iso-8859-1, the Scandinavian characters are displayed correctly and the full-text search finds them.
Thank you very much!
Radu
Posts: 9283
Joined: Fri Jul 09, 2004 5:18 pm

Re: How to enforce the encoding in the output html

Post by Radu »

Hi Tarja,
Great, thanks for the feedback. The official fix will be included in the DITA-OT bundled with Oxygen 27 (November this year).
Regards,
Radu
julien_lacour
Posts: 601
Joined: Wed Oct 16, 2019 3:47 pm

Re: How to enforce the encoding in the output html

Post by julien_lacour »

Hello,

Oxygen 27.0 is now available. In this version, the encoding for Scandinavian characters has been fixed.

Regards,
Julien
Tarja Koski
Posts: 7
Joined: Wed Apr 25, 2018 2:24 pm

Re: How to enforce the encoding in the output html

Post by Tarja Koski »

Good to hear, thank you!