utf-8 to Windows-1252 encoding with XSL

Post by 6thChild »

Hi everyone,

I have an XSL transformation that reads an XML file encoded in UTF-8 and writes a text file that must be encoded in Windows-1252.
So I put the following line in my stylesheet:
<xsl:output method="text" encoding="Windows-1252" />

Everything worked fine until the input contained a character that does not exist in Windows-1252.
The transformation then stops with a fatal error.

I can't find a way to specify that characters unavailable in Windows-1252 should simply be skipped, and I have no idea how to solve this problem.
Any ideas?

Thanks for your help!

Guillaume

Post by adrian »

Hello,

If it's for specific characters, you can use xsl:character-map:

<xsl:character-map name="a">
  <xsl:output-character character="&lt;" string="&amp;lt;"/>
  <xsl:output-character character="&gt;" string="&amp;gt;"/>
</xsl:character-map>
<xsl:output method="text" use-character-maps="a"/>
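Applied to the encoding problem in this thread, a character map could replace individual characters that Windows-1252 lacks with an ASCII fallback instead of dropping them. The following is only a sketch, assuming an XSLT 2.0 processor; the map name cp1252-fallbacks and the three characters chosen are hypothetical examples, not taken from the original data. Both elements are top-level children of xsl:stylesheet.

```xml
<!-- Hypothetical map: each listed character is written as the given string
     during serialization, so it never reaches the Windows-1252 encoder. -->
<xsl:character-map name="cp1252-fallbacks">
  <!-- U+2192 RIGHTWARDS ARROW -->
  <xsl:output-character character="&#x2192;" string="->"/>
  <!-- U+2264 LESS-THAN OR EQUAL TO -->
  <xsl:output-character character="&#x2264;" string="&lt;="/>
  <!-- U+2260 NOT EQUAL TO -->
  <xsl:output-character character="&#x2260;" string="!="/>
</xsl:character-map>

<xsl:output method="text" encoding="Windows-1252"
            use-character-maps="cp1252-fallbacks"/>
```

Any unencodable character that is not listed in the map would still raise the serialization error, which is why this only suits a small, known set of characters.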
But in this case you need to cover a large character range (U+0100 -> U+FFFD), so this example is better suited:
http://stackoverflow.com/questions/1079 ... iven-range
Use the range 256-65533 (65534/U+FFFE and 65535/U+FFFF are not allowed in XML).
e.g.

<xsl:template match="text()">
  <xsl:analyze-string select="." regex="[&#256;-&#65533;]">
    <xsl:matching-substring>
      <!-- Skip -->
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>
Regards,
Adrian

Later Edit:
Skipped the matching chars in code above.

Post by 6thChild »

Hello,

It worked, thank you very much, Adrian!

I wrote the following line to remove the characters that are not available in Windows-1252:
<xsl:value-of select="replace(., '[&#256;-&#8363;]|[&#8365;-&#65533;]', '')"/>
I kept the euro sign € (&#8364;), which Windows-1252 can encode.
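The two ranges in that line also strip other characters that Windows-1252 can encode (for example Œ, curly quotes and dashes), and they do not cover characters above U+FFFD. A stricter variant could list what the code page does contain and remove everything else. This is a minimal sketch, assuming an XSLT 2.0 processor; the variable name not-cp1252 is hypothetical, and the character list assumes the usual Windows-1252 layout (ASCII, U+00A0 to U+00FF, and the 27 printable characters mapped to bytes 0x80-0x9F).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="Windows-1252"/>

  <!-- Hypothetical name. Negated character class: matches every character
       that is NOT assumed to be encodable in Windows-1252. -->
  <xsl:variable name="not-cp1252"
      select="'[^&#x9;&#xA;&#xD;&#x20;-&#x7F;&#xA0;-&#xFF;&#x152;&#x153;&#x160;&#x161;&#x178;&#x17D;&#x17E;&#x192;&#x2C6;&#x2DC;&#x2013;&#x2014;&#x2018;-&#x201A;&#x201C;-&#x201E;&#x2020;-&#x2022;&#x2026;&#x2030;&#x2039;&#x203A;&#x20AC;&#x2122;]'"/>

  <!-- Drop every matching character from each text node; elements are
       handled by the built-in templates, which recurse into their children. -->
  <xsl:template match="text()">
    <xsl:value-of select="replace(., $not-cp1252, '')"/>
  </xsl:template>

</xsl:stylesheet>
```

Because the class is negated, the C1 control range U+0080 to U+009F and any character beyond the Basic Multilingual Plane are removed as well. Attribute values are not covered by this template and would need the same replace() call wherever they are written out.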

Regards,
Guillaume