
XSL multiple file input

Posted: Mon Apr 12, 2010 12:45 pm
by ra0543
Is there a way to use XSLT over multiple input files (specified by location or wildcard) to produce a single output file?

For instance, I have a directory with a varying number of XML files, all of which have the same XML structure:

FILE1.xml
<a><header>some metadata A</header><content>file contents</content></a>

FILE2.xml
<a><header>some metadata B</header><content>file contents</content></a>

FILE3.xml
<a><header>some metadata C</header><content>file contents</content></a>

FILE4.xml
<a><header>some metadata D</header><content>file contents</content></a>


I'd like to run a transformation across whatever files happen to be in the directory at the time and get a list of the file names (or full paths, I don't mind which) together with some of their XML content, e.g. the <header> elements:

FILENAME HEADER INFO
FILE1.xml some metadata A
FILE2.xml some metadata B
FILE3.xml some metadata C
FILE4.xml some metadata D

The contents of the directory change fairly often, so if, for instance, this file is added:

FILE5.xml
<a><header>some metadata E</header><content>file contents</content></a>

I'd rather not have to change anything in the transformation (or the scenario) to get the new result:

FILENAME HEADER INFO
FILE1.xml some metadata A
FILE2.xml some metadata B
FILE3.xml some metadata C
FILE4.xml some metadata D
FILE5.xml some metadata E

Any suggestions on how to do it (or indeed advice that it's not possible) would be gratefully received.

Re: XSL multiple file input

Posted: Mon Apr 12, 2010 1:12 pm
by george
You can do that with XSLT 2.0. For example, if you select Saxon 9 as the XSLT engine, the following should give you a list of all the XML files in the same folder as the stylesheet:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:text>FILENAME HEADER INFO</xsl:text>
    <!-- '.' is resolved against the stylesheet location; select=*.xml filters the file names -->
    <xsl:for-each select="collection('.?select=*.xml')">
      <xsl:text>&#10;</xsl:text>
      <!-- full URI of the current document, e.g. file:/.../FILE1.xml -->
      <xsl:value-of select="document-uri(.)"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="/a/header"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
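
If only the file name is wanted (as in the FILENAME column of the question) rather than the full file: URI, a variation on the stylesheet above could take the part of document-uri(.) after the last '/' with tokenize() and sort the rows by it. This is a sketch, assuming the same Saxon 9 setup and the <a><header> structure from the question:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:text>FILENAME HEADER INFO</xsl:text>
    <!-- same collection as above: every *.xml file next to the stylesheet -->
    <xsl:for-each select="collection('.?select=*.xml')">
      <!-- sort by the bare file name, i.e. the text after the last '/' of the URI -->
      <xsl:sort select="tokenize(document-uri(.), '/')[last()]"/>
      <xsl:text>&#10;</xsl:text>
      <xsl:value-of select="tokenize(document-uri(.), '/')[last()]"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="/a/header"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>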

Best Regards,
George

Re: XSL multiple file input

Posted: Mon Apr 12, 2010 2:36 pm
by george
One more thing: for more information on the syntax that Saxon 9 accepts in the collection() URI (such as the ?select= part), please see
http://www.saxonica.com/documentation/s ... tions.html
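
As a sketch of that syntax, assuming the Saxon 9 query parameters recurse and on-error, separated by semicolons (check the names against the documentation for your Saxon release; ../A is only a placeholder folder), the following would list every .xml file under ../A including its subfolders, skipping files that are not well-formed:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Saxon-specific collection URI (assumed parameters):
         select filters file names, recurse=yes descends into subfolders,
         on-error=ignore skips documents that fail to parse -->
    <xsl:for-each select="collection('../A?select=*.xml;recurse=yes;on-error=ignore')">
      <xsl:value-of select="document-uri(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>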

Best Regards,
George

Re: XSL multiple file input

Posted: Thu Apr 29, 2010 5:00 pm
by ra0543
Thanks, that's really helpful. I have a follow-up question or two about doing more complex things with this:

(1) Can I combine collections with the usual union operator in select expressions, as in the following?

Code:

select="collection('../A?select=*.xml')|collection('../B?select=*.xml')"
This seems to work OK, but I just wanted to be sure.

(2) When processing collections, is it possible to use xsl:for-each-group rather than xsl:for-each? I'd like to group the documents in a collection by their declared schema (i.e. by the xsi:noNamespaceSchemaLocation attribute on each document's root element), so that I process all the XML files with one schema together, then all those with the next schema, and so on.

I wondered about:

Code:

<xsl:for-each-group select="collection('../A?select=*.xml')|collection('../B?select=*.xml')" group-by="//*/@xsi:noNamespaceSchemaLocation">
But I only seem to get the first document per group when I then output <xsl:value-of select="current-group()"/>.

(3) As a result, I don't seem to have a group of files that I can process together with another <xsl:for-each>. Inside each group I'd also like to process the files one at a time (what <xsl:for-each select=???> selects each file in turn?), ideally sorted by file name alone, irrespective of the path to the file (what <xsl:sort select=???> does that?).

Re: XSL multiple file input

Posted: Fri Apr 30, 2010 11:44 am
by george
The following stylesheet works OK:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="2.0">
  <xsl:output indent="yes"/>
  <xsl:template match="/">
    <test>
      <files>
        <xsl:for-each select="collection('../a?select=*.xml')|collection('../b?select=*.xml')">
          <file>
            <xsl:value-of select="document-uri(.)"/>
          </file>
        </xsl:for-each>
      </files>
      <grouped>
        <xsl:for-each-group select="collection('../a?select=*.xml')|collection('../b?select=*.xml')"
          group-by="/*/@xsi:noNamespaceSchemaLocation">
          <group schema="{current-grouping-key()}">
            <xsl:for-each select="current-group()">
              <file>
                <xsl:value-of select="document-uri(.)"/>
              </file>
            </xsl:for-each>
          </group>
        </xsl:for-each-group>
      </grouped>
    </test>
  </xsl:template>
</xsl:stylesheet>

On my test files it gives:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<test xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <files>
    <file>file:/Users/george/test/xslt/a/test.xml</file>
    <file>file:/Users/george/test/xslt/b/test.xml</file>
    <file>file:/Users/george/test/xslt/b/testb.xml</file>
  </files>
  <grouped>
    <group schema="test.xsd">
      <file>file:/Users/george/test/xslt/a/test.xml</file>
      <file>file:/Users/george/test/xslt/b/test.xml</file>
    </group>
    <group schema="testb.xsd">
      <file>file:/Users/george/test/xslt/b/testb.xml</file>
    </group>
  </grouped>
</test>
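
The stylesheet above does not cover the sorting asked about in (3). As a sketch (not tested here), current-group() selects the files of the group one at a time, and an xsl:sort on the last '/'-separated segment of document-uri(.) orders them by file name regardless of folder; the xsl:sort placed directly inside xsl:for-each-group orders the groups themselves by schema location. The name attribute on <file> is only an illustrative addition:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="2.0">
  <xsl:output indent="yes"/>
  <xsl:template match="/">
    <grouped>
      <xsl:for-each-group select="collection('../a?select=*.xml')|collection('../b?select=*.xml')"
        group-by="/*/@xsi:noNamespaceSchemaLocation">
        <!-- orders the groups themselves by schema location -->
        <xsl:sort select="current-grouping-key()"/>
        <group schema="{current-grouping-key()}">
          <!-- current-group() holds the documents of this group, one per iteration -->
          <xsl:for-each select="current-group()">
            <!-- sort key: the file name alone, i.e. the text after the last '/' of the URI -->
            <xsl:sort select="tokenize(document-uri(.), '/')[last()]"/>
            <file name="{tokenize(document-uri(.), '/')[last()]}">
              <xsl:value-of select="document-uri(.)"/>
            </file>
          </xsl:for-each>
        </group>
      </xsl:for-each-group>
    </grouped>
  </xsl:template>
</xsl:stylesheet>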

Best Regards,
George

Re: XSL multiple file input

Posted: Fri Apr 30, 2010 2:02 pm
by george
Please note that the above does not resolve the schema file paths. In the example, test.xml in both a and b refers to test.xsd, but these are two different schemas: one in folder a and the other in folder b. To correctly group the files that refer to the same schema file, you can use something like the following:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="2.0">
  <xsl:output indent="yes"/>
  <xsl:template match="/">
    <test>
      <files>
        <xsl:for-each select="collection('../a?select=*.xml')|collection('../b?select=*.xml')">
          <file>
            <xsl:value-of select="document-uri(.)"/>
          </file>
        </xsl:for-each>
      </files>
      <grouped>
        <!-- document() resolves the schema location against the base URI of the
             attribute (i.e. of the XML file) and loads the schema; document-uri()
             then gives its absolute URI, which serves as the grouping key -->
        <xsl:for-each-group select="collection('../a?select=*.xml')|collection('../b?select=*.xml')"
          group-by="document-uri(document(/*/@xsi:noNamespaceSchemaLocation))">
          <group schema="{current-grouping-key()}">
            <xsl:for-each select="current-group()">
              <file>
                <xsl:value-of select="document-uri(.)"/>
              </file>
            </xsl:for-each>
          </group>
        </xsl:for-each-group>
      </grouped>
    </test>
  </xsl:template>
</xsl:stylesheet>
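
An alternative, not what is posted above, is to resolve the schema location with the XPath 2.0 function resolve-uri() instead of document(). This is a sketch, assuming the schema locations are relative URIs such as test.xsd: it computes the absolute schema URI against the base URI of each file's root element without reading the schema, so the schema file does not even need to exist. Files without the attribute get an empty grouping key and therefore fall into no group:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="2.0">
  <xsl:output indent="yes"/>
  <xsl:template match="/">
    <grouped>
      <!-- resolve-uri() turns the relative schema location into an absolute URI,
           using the base URI of the root element; the schema is never loaded -->
      <xsl:for-each-group select="collection('../a?select=*.xml')|collection('../b?select=*.xml')"
        group-by="resolve-uri(/*/@xsi:noNamespaceSchemaLocation, base-uri(/*))">
        <group schema="{current-grouping-key()}">
          <xsl:for-each select="current-group()">
            <file>
              <xsl:value-of select="document-uri(.)"/>
            </file>
          </xsl:for-each>
        </group>
      </xsl:for-each-group>
    </grouped>
  </xsl:template>
</xsl:stylesheet>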

Best Regards,
George