utf-8 to Windows-1252 encoding with XSL
Questions about transforming XML with XSLT and FOP go here.
-
- Posts: 2
- Joined: Tue Feb 26, 2013 6:39 pm
Hi everyone,
I have an XSL transformation that reads an XML file encoded in UTF-8 and writes a text file that must be encoded in Windows-1252.
So I wrote the following line in my transformation:
<xsl:output method="text" encoding="Windows-1252" />
Everything worked fine until the input contained a character that does not exist in Windows-1252.
This causes a fatal error.
I can't find a way to specify that any character unavailable in Windows-1252 should be skipped, and I have no idea how to solve this problem.
Any ideas?
Thanks for your help!
Guillaume
-
- Posts: 2879
- Joined: Tue May 17, 2005 4:01 pm
Re: utf-8 to Windows-1252 encoding with XSL
Hello,
If it's for specific characters, you can use xsl:character-map:
Code:
<xsl:character-map name="a">
  <xsl:output-character character="&lt;" string="&lt;"/>
  <xsl:output-character character="&gt;" string="&gt;"/>
</xsl:character-map>
<xsl:output method="text" use-character-maps="a"/>
But in this case you need to cover a large character range (U+0100 -> U+FFFD), so this example is better suited:
http://stackoverflow.com/questions/1079 ... iven-range
Use the range 256-65533 (65534/U+FFFE and 65535/U+FFFF are not allowed in XML).
e.g.
Code:
<xsl:template match="text()">
  <xsl:analyze-string select="." regex="[&#256;-&#65533;]">
    <xsl:matching-substring>
      <!-- Skip -->
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>
Regards,
Adrian
Later Edit:
Skipped the matching chars in code above.
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
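For the specific-characters case mentioned above, a character map could also substitute an ASCII fallback instead of dropping the character. The following is a minimal sketch, assuming an XSLT 2.0 processor. The map name cp1252-fallbacks and the two substitutions are hypothetical, picked only for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical map: each listed character is replaced at
       serialization time by a string that Windows-1252 can encode. -->
  <xsl:character-map name="cp1252-fallbacks">
    <!-- U+2192 RIGHTWARDS ARROW becomes an ASCII arrow -->
    <xsl:output-character character="&#x2192;" string="-&gt;"/>
    <!-- U+0142 LATIN SMALL LETTER L WITH STROKE becomes a plain l -->
    <xsl:output-character character="&#x0142;" string="l"/>
  </xsl:character-map>

  <xsl:output method="text" encoding="Windows-1252"
      use-character-maps="cp1252-fallbacks"/>

  <!-- No templates are declared, so the built-in rules copy all text nodes. -->
</xsl:stylesheet>
```

Characters that are not listed in the map are serialized unchanged, so any other character missing from Windows-1252 would presumably still raise the serialization error. That is why the range-based approach fits better when the set of offending characters is not known in advance.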
-
- Posts: 2
- Joined: Tue Feb 26, 2013 6:39 pm
Re: utf-8 to Windows-1252 encoding with XSL
Hello,
It worked, thank you very much, Adrian!
I wrote the following line to remove the characters that aren't available in Windows-1252:
<xsl:value-of select="replace(., '[&#x100;-&#x20AB;]|[&#x20AD;-&#xFFFD;]', '')"/>
I kept the euro sign (€, U+20AC), which Windows-1252 can encode.
Regards,
Guillaume
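Note that Windows-1252 contains 27 characters above U+00FF (the euro sign plus Œ, œ, Š, š, Ÿ, Ž, ž, ƒ, curly quotes, dashes, and a few others), and a plain range check from U+0100 upward removes most of them too. The sketch below, which assumes an XSLT 2.0 processor, uses a negated character class instead so that everything Windows-1252 can encode is kept. The variable name not-cp1252 is hypothetical, and excluding the C1 control range U+0080 to U+009F is a cautious assumption rather than a requirement.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="Windows-1252"/>

  <!-- Hypothetical variable: a negated class that matches every
       character NOT listed, i.e. everything outside Windows-1252.
       Kept: tab, LF, CR, U+0020 to U+007F, U+00A0 to U+00FF, and the
       27 characters that Windows-1252 maps to bytes 0x80 to 0x9F. -->
  <xsl:variable name="not-cp1252" select="concat(
      '[^\t\n\r&#x20;-&#x7F;&#xA0;-&#xFF;',
      '&#x152;&#x153;&#x160;&#x161;&#x178;&#x17D;&#x17E;&#x192;&#x2C6;&#x2DC;',
      '&#x2013;&#x2014;&#x2018;-&#x201A;&#x201C;-&#x201E;&#x2020;-&#x2022;',
      '&#x2026;&#x2030;&#x2039;&#x203A;&#x20AC;&#x2122;]')"/>

  <!-- Remove every unsupported character from each text node. -->
  <xsl:template match="text()">
    <xsl:value-of select="replace(., $not-cp1252, '')"/>
  </xsl:template>
</xsl:stylesheet>
```

Because the class is negated, it should also remove characters beyond U+FFFD (for example emoji in the supplementary planes), which a range that stops at 65533 does not cover.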