DOCTYPE not allowed in content,
Hey, thanks for looking!
I have some lovely MIL-STD-40051 xml that is DTD 6.5 compliant.
I usually work in Arbortext but I'm trying Oxygen again.
I have 200+ individual xml documents, and a "wrapper" file that references them all as entities for publication.
XML from the wrapper:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE production PUBLIC "-//USA-DOD//DTD -1/2C TM Assembly REV C 6.5 20200930//EN" "../40051c_6_5/40051c_6_5.dtd" [
<!ENTITY g00001 SYSTEM "../xmlfiles/g00001.xml">
<!ENTITY g00003 SYSTEM "../xmlfiles/g00003.xml">
<!ENTITY g00006 SYSTEM "../xmlfiles/g00006.xml">
<!ENTITY o00001 SYSTEM "../xmlfiles/o00001.xml">
<!ENTITY o00002 SYSTEM "../xmlfiles/o00002.xml">
<!ENTITY o00003 SYSTEM "../xmlfiles/o00003.xml">
<!ENTITY o00004 SYSTEM "../xmlfiles/o00004.xml">
<!ENTITY o00005 SYSTEM "../xmlfiles/o00005.xml">
<!ENTITY o00006 SYSTEM "../xmlfiles/o00006.xml">
]>
I'll spare you the rest of the list; all the separate documents are declared.
In Arbortext, each document is imported by reference:
<!-- gim -->
<!-- <!ELEMENT gim (titlepg, ((ginfowp, (bdar-geninfowp | (descwp+, thrywp*) | dmwr_introwp)) | (softginfowp, softsumwp, softeffectwp*, softdiffversionwp*) | (genmaint_ginfowp, descwp) | (pm-ginfowp) | (pms-ginfowp)))> -->
<gim chap-toc="no" chngno="0" revno="0"><titlepg maintlvl="operator">
<name>BIG GREEN TRUCK</name>
</titlepg>
<!-- Intro and theory of operation g00001 -->&g00001;
<!-- Equipment Description and Data g00006 -->&g00006;
<!-- Theory of Operation g00003 -->&g00003;
</gim>
<!-- opim -->
<!-- <!ELEMENT opim (titlepg, ((ctrlindwp+, opusualwp+, opunuwp+, emergencywp*, stowagewp*, eqploadwp*) | dmwr_operationalreqwp*))> -->
<opim chap-toc="no" chngno="0" revno="0"><titlepg maintlvl="operator">
<name>BIG GREEN TRUCK</name>
</titlepg>
<!-- Chapter 2 - Operator Instructions -->
<!-- DESCRIPTION AND USE OF OPERATOR CONTROLS AND INDICATORS per 40051-->
<!-- instrument panel -->&o00001;
<!-- aux panel -->&o00059;
<!-- center console -->&o00091;
<!-- steering col -->&o00092;
<!-- floor -->&o00093;
<!-- seat -->&o00095;
<!-- door -->&o00094;
When I try to validate in Oxygen Developer, it stops at the first entity reference, &g00001;,
reporting a fatal error as identified by Xerces:
A DOCTYPE is not allowed in content.
The error is coming from the doctype in the first file being referenced as an entity.
And it's legit, every one of those ~150 XML documents has a DOCTYPE declaration.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ginfowp PUBLIC "-//USA-DOD//DTD -1/2C TM Assembly REV C 6.5 20200930//EN" "../40051c_6_5/40051c_6_5.dtd">
<ginfowp wpno="g00001">
Arbortext does not report any of this as an error and will happily generate a PDF from the "wrapper" document using the XSL style sheets provided with the DTD. I've been working with AT since version 5. I keep trying to cut over to Oxygen, but the learning curve is kind of steep and I keep getting frustrated.
There is probably some simple solution, but I cannot for the life of me find it.
The other issue I am seeing is that when Oxygen validates individual documents, it flags references to other documents as validation errors.
A generic link to another document <xref wpid="m00001"> should go to XML document m00001 when published, but does not go anywhere when considering the file by itself, and that is OK.
I did figure out how to ignore that validation error (I think).
Any tips on how to get Oxygen to check those cross-document links would be swell, too.
Thanks in advance for any help.
Dan
Re: DOCTYPE not allowed in content,
Hello Dan,
According to the XML specification:
https://www.w3.org/TR/xml/#intern-replacement
an external entity reference must be expanded to its exact content in the XML document.
I understand from your description that Arbortext skips the DOCTYPE declaration of the referenced file when it expands the entity reference. I understand why this is useful, but it is not correct according to the XML specification; it is probably something only Arbortext does.
This particular problem with entity references to files containing DOCTYPE declarations is also described here, and the workaround given there is to use xi:include instead of entity references:
https://www.oxygenxml.com/doc/versions/ ... ities.html
Other than that, I consider Oxygen's behavior correct according to the XML specification.
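For comparison, the only way to keep plain entity references would be to strip the DOCTYPE from every module. Below is a minimal sketch of g00001.xml in that form, assuming the file is only ever pulled in through the wrapper. Its first line is then read as a text declaration, which an external parsed entity may start with, while a DOCTYPE may not appear there at all. The cost is that the file can no longer be validated on its own, and the graphic entity declarations each module currently carries in its DOCTYPE would have to move into the wrapper.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: g00001.xml rewritten as an external parsed entity.
     The line above is a text declaration (allowed in an entity);
     a DOCTYPE declaration is not allowed here. -->
<ginfowp wpno="g00001">
  <!-- work package content unchanged -->
</ginfowp>
```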
Oxygen uses the Apache Xerces parser to parse and validate XML documents. The XSLT processors bundled with Oxygen use the same parser.
Regarding the other issue (links to other documents reported as errors when a single module is validated): the module file by itself is invalid because it has an IDREF to a missing ID, but it is valid in the context of the larger XML file that includes all the other files. Oxygen's Main Files support may help with this: https://www.oxygenxml.com/doc/ug-editor ... iting.html
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
Re: DOCTYPE not allowed in content,
I am entirely willing to believe that Oxygen is parsing correctly and that this is an error.
I am also willing to believe that Arbortext ignoring it is a kludge for the 40051 user base.
The color difference in the link in the first referenced article was subtle and I missed it the first time through. Second time around, I caught it.
https://www.w3.org/TR/xinclude-11/
<xi:include href="a.xml" xpointer="a1"
xmlns:xi="http://www.w3.org/2001/XInclude"/>
Arbortext/WC uses something similar when used with the Windchill CMS, which is probably doing better parsing.
I will check what we are doing in that environment, and give it a try here. I'll follow up with results.
Thanks so much for the pointer!
Dan
Re: DOCTYPE not allowed in content,
Yay, progress, sort of.
Swapped out the entity references for xi:include.
That solved my initial "DOCTYPE is not allowed in content" error. YAY.
<!-- <!ELEMENT gim (titlepg, ((ginfowp, (bdar-geninfowp | (descwp+, thrywp*) | dmwr_introwp)) | (softginfowp, softsumwp, softeffectwp*, softdiffversionwp*) | (genmaint_ginfowp, descwp) | (pm-ginfowp) | (pms-ginfowp)))> -->
<gim chap-toc="no" chngno="0" revno="0">
<titlepg maintlvl="operator"><name>BIG GREEN TRUCK</name></titlepg>
<xi:include href="./xmlfiles/g00001.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
<xi:include href="./xmlfiles/g00006.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
<xi:include href="./xmlfiles/g00003.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
</gim>
New problem:
Attribute "xml:base" is not allowed to appear in element "ctrlindwp".
https://xerces.apache.org/xerces2-j/faq ... html#faq-3
"According to the specification for XInclude, processors must add an xml:base attribute to elements included from locations with a different base URI."
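To make that FAQ statement concrete, here is a rough, illustrative sketch of what the validator ends up seeing once the first xi:include in the wrapper above has been expanded. Element and attribute names come from the files shown in this thread; the exact xml:base value is processor-dependent and is only a guess here. As far as I can tell from the XInclude base URI rules, the comparison is between the URIs of the two files, not their folders, which would explain why moving the wrapper next to the modules changes nothing.

```xml
<!-- Illustrative only: <gim> after
     <xi:include href="./g00001.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
     has been replaced by the root element of g00001.xml. -->
<gim chap-toc="no" chngno="0" revno="0">
  <titlepg maintlvl="operator"><name>BIG GREEN TRUCK</name></titlepg>
  <!-- xml:base is added by base URI fixup; a DTD that does not
       declare it on ginfowp reports it as not allowed -->
  <ginfowp wpno="g00001" xml:base="g00001.xml">
    <!-- rest of g00001.xml -->
  </ginfowp>
</gim>
```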
If I understand that FAQ entry, I could fix this in the xml schema (XSD?) if I had a schema, but I don't, there is just a DTD.
Apparently there is also a way to turn this off in Xerces? But this is all new territory and I cannot figure it out.
I moved the wrapper file into the same directory as all the other XML files, same thing:
<gim chap-toc="no" chngno="0" revno="0">
<titlepg maintlvl="operator"><name>BIG GREEN TRUCK</name></titlepg>
<xi:include href="./g00001.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
<xi:include href="./g00006.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
<xi:include href="./g00003.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
</gim>
I am also getting this message about every graphic:
ENTITY "G00006_01" is not unparsed.
Graphics are all listed in the DOCTYPE declaration in each file:
<!ENTITY G00006_01 SYSTEM "../Graphics/G-Introductory/G00006_01.svg" NDATA SVG>
and then referenced in the document:
<figure><title>Title text, yadda yadda.</title><graphic boardno="G00006_01"></graphic></figure>
Again, any help would be appreciated. I am reasonably clever but I lack the background knowledge.
Thanks,
Dan
Re: DOCTYPE not allowed in content,
Hello Dan,
So:
When the main XML document containing the xi:includes gets validated or processed using XSLT, the xi:includes are expanded in place. During expansion, the XML processor adds an "xml:base" attribute to each expanded top-level element that was originally defined in the smaller xi:included XML file, so that it can later compute relative references correctly. To avoid the "xml:base is not allowed" validation error, your DTD would need to declare the "xml:base" attribute as valid on all elements, or at least on the elements that are usually top-level elements in the xi:included files. For example, in the ATTLIST definition of the element "ginfowp" you need to add something like:
xml:base CDATA #IMPLIED
The alternative would be to go to the Oxygen Preferences->"XML / XML Parser" page and disable the "Base URI fixup" checkbox. Oxygen should then no longer generate those hidden xml:base attributes when validating or processing XML files containing xi:includes. The downside is that if a module XML file contains a relative reference to some other location, that reference will appear in the processed master XML document exactly as it was written, without the xml:base that would have indicated which folder it should be resolved against.
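A minimal sketch of that DTD change, assuming you would rather not edit the shipped 40051 DTD: the same declarations can go into the internal subset of the wrapper's DOCTYPE, because XML merges multiple ATTLIST declarations for the same element type. This assumes Oxygen validates the expanded wrapper against the wrapper's own DOCTYPE, which the error message suggests. The element list is an assumption taken from the content models quoted earlier in the thread; add one line for every element type that is the root of an included module.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE production PUBLIC "-//USA-DOD//DTD -1/2C TM Assembly REV C 6.5 20200930//EN" "../40051c_6_5/40051c_6_5.dtd" [
<!-- Declare xml:base on the root elements of the xi:included modules.
     These ATTLISTs are merged with the ones in the external DTD. -->
<!ATTLIST ginfowp   xml:base CDATA #IMPLIED>
<!ATTLIST descwp    xml:base CDATA #IMPLIED>
<!ATTLIST thrywp    xml:base CDATA #IMPLIED>
<!ATTLIST ctrlindwp xml:base CDATA #IMPLIED>
<!ATTLIST opusualwp xml:base CDATA #IMPLIED>
<!ATTLIST opunuwp   xml:base CDATA #IMPLIED>
]>
<!-- rest of the wrapper (production, gim, opim, xi:includes) unchanged -->
```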
I managed to reproduce the graphics problem (ENTITY "G00006_01" is not unparsed) on my side.
According to the xi:include specs:
https://www.w3.org/TR/xinclude/#unparsed-entities
"Any unparsed entity information item appearing in the references property of an attribute on the included items or any descendant thereof is added to the unparsed entities property of the result infoset's document information item, if it is not a duplicate of an existing member. Duplicates do not appear in the result infoset.
Unparsed entity items with the same name, system identifier, public identifier, declaration base URI, notation name, and notation are considered to be duplicate. An application may also be able to detect that unparsed entities are duplicate through other means. For instance, the URI resulting from combining the system identifier and the declaration base URI is the same."
So I interpret this to mean that if the XML module defines an unparsed entity, then when it gets included in the master XML document, that unparsed entity should also be declared in the master XML document's DOCTYPE.
But the Apache Xerces XML parser that we are using does not seem to follow the specification in this regard.
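Until that is sorted out in the parser, one workaround that may be worth trying (an untested sketch, not a confirmed fix) is to do by hand what the XInclude specification describes: repeat the graphic entity declarations of the included modules in the wrapper's internal subset, so that boardno="G00006_01" matches an unparsed entity declared in the assembled document. This assumes the SVG notation is already declared by the 40051 DTD (the modules' own NDATA SVG declarations rely on it) and that the wrapper sits in the same folder as the modules, since a SYSTEM path is resolved relative to the file that declares the entity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE production PUBLIC "-//USA-DOD//DTD -1/2C TM Assembly REV C 6.5 20200930//EN" "../40051c_6_5/40051c_6_5.dtd" [
<!-- Copied from the DOCTYPE of g00006.xml. Repeat for every graphic
     entity used by the included modules; SYSTEM paths resolve
     relative to this wrapper file. -->
<!ENTITY G00006_01 SYSTEM "../Graphics/G-Introductory/G00006_01.svg" NDATA SVG>
]>
<!-- rest of the wrapper unchanged -->
```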
I added an internal issue to see if we can possibly better analyze the problem and maybe patch the parser to behave closer to the specs, pasting the issue ID below for future reference:
EXM-54482 Unparsed entities are not added to larger infoset when resolving xi:includes
To help set a priority for this internal issue: is your general goal to migrate your editing and publishing from Arbortext to Oxygen? About how many people on your side would be using Oxygen if that became feasible?
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com