Xquery and large files (>1gb); cmd line?
-
- Posts: 7
- Joined: Sun Dec 05, 2010 2:43 am
Xquery and large files (>1gb); cmd line?
Hi,
I am trying to run an XQuery on a large XML file. I have been processing it mainly from the command line inside a bash shell. Opening a nearly 2 GB file takes a while, even in vi.
I have Oxygen installed on a Windows machine, so I was under the impression that I would be able to use saxon9ee from the command line, using
Code:
java -cp "c:/program files/oxygen XML Editor 12/lib/saxon9ee.jar" net.sf.saxon.Query -s:infile.xml -o:outfile.xml -q:query.xq
However, it returns the error
Code:
License File saxon-license.lic not found
I am now going to endure the long wait associated with the 1.85 GB file in Oxygen and try the processing through the application; however, I anticipate a processing error. I shall report back.
(By the way, I cannot give -Xmx more than 1024m without Oxygen refusing to open.)
I have several collections I will need to run this query against, this being the smallest.
-
- Posts: 2879
- Joined: Tue May 17, 2005 4:01 pm
Re: Xquery and large files (>1gb); cmd line?
Hello,
I'm afraid you won't be able to use the Saxon libraries (saxon9ee.jar) from Oxygen on the command line. These are only licensed to be used inside Oxygen.
Also, opening a 1-2 GB file in Oxygen would almost certainly result in an out-of-memory error.
If you have a 64-bit OS and a lot of memory (>= 4 GB) and want to put them to good use, then download and unpack the "All platforms" distribution of Oxygen and a 64-bit JRE from Oracle: http://www.oracle.com/technetwork/java/ ... index.html
To launch Oxygen, use the command line launcher scripts:
* On Windows run oxygen.bat.
* On Mac OS X run oxygenMac.sh.
* On Unix, Linux, Solaris, etc. run oxygen.sh.
To adjust the memory, edit the corresponding launcher script and set the "-Xmx" argument to an appropriate value (e.g. -Xmx3072m).
Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
-
- Posts: 9421
- Joined: Fri Jul 09, 2004 5:18 pm
Re: Xquery and large files (>1gb); cmd line?
Hi,
Another approach is to process the XML file with Saxon SA without opening it in Oxygen:
In the Oxygen Project view, add the XQuery file which you want to use for processing.
Right-click it and choose Transform->Configure Transformation scenario.
Create a new XQuery-type transformation scenario for it and then use it to transform the XML source.
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
-
- Posts: 7
- Joined: Sun Dec 05, 2010 2:43 am
Re: Xquery and large files (>1gb); cmd line?
> I'm afraid you won't be able to use the Saxon libraries (saxon9ee.jar) from Oxygen in the command line. These are only licensed to be used inside Oxygen.
That is really too bad; it would be nice if a license were included with my Oxygen license purchase.
> Also, opening a 1-2GB file in Oxygen would most certainly result in an out-of-memory error.
> If you have a 64-bit OS and a lot of memory (>= 4 GB) and want to put them to good use then download and unpack the "All platforms" distribution of Oxygen and a 64-bit JRE from Oracle: http://www.oracle.com/technetwork/java/ ... index.html
> To launch Oxygen use the command line launcher scripts:
> * On Windows run oxygen.bat.
> * On Mac OS X run oxygenMac.sh.
> * On Unix, Linux, Solaris, etc. run oxygen.sh.
> To adjust the memory, edit the corresponding launcher script and adjust the "-Xmx" argument to an appropriate value (e.g. -Xmx3072m).
Thank you, I am aware of all of this; I alluded to it when I mentioned that I cannot give Oxygen more than 1024m. I guess I was hoping I was missing something.

> Another approach to process the XML file with Saxon SA without opening it in Oxygen:
> In the Oxygen Project view add the XQuery file which you want to use for processing.
> Right click it, choose Transform->Configure Transformation scenario.
> Create a new XQuery type transformation scenario for it and then transform with it the XML source.
Right, I am aware of this approach, but in my case it would also result in an out-of-memory error.
I have found a partial solution. For using Saxon from the command line, Michael Kay offers a 30-day trial license for Saxon. However, he does address large files on page one of the documentation on Saxon and XQuery, where he says:
> Saxon is an in-memory processor. Unless you can take advantage of streaming, Saxon is designed to process source documents that fit in memory. Saxon has been used successfully to process source documents of 100Mbytes or more without streaming, but if you attempt anything this large, you need to be aware (a) that you will need to allocate sufficient memory to the Java VM (at least 5 times the size of the source document), and (b) that complex FLWOR expressions may be very time-consuming to execute (In this scenario, Saxon-EE is recommended, because it has a more powerful optimizer for complex joins).
I must admit, I haven't fully explored streaming, but from my understanding it amounts to splitting the input into smaller files, which would be useful in a simple transformation, but I am counting distinct pairs.
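For what it's worth, counting distinct pairs does not necessarily require the whole document in memory. The following is only a rough sketch of an event-based (streaming) parse in Python, not a Saxon feature: the element names `record`, `a` and `b` are hypothetical placeholders, since the real structure of the file is not shown in this thread, and it assumes each record is a direct child of the root element.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_pairs(source, record_tag="record", first_tag="a", second_tag="b"):
    """Stream an XML document and count (first, second) text pairs per record.

    All tag names are hypothetical placeholders for the real element names.
    Assumes every record element is a direct child of the root element.
    """
    counts = Counter()
    # Request "start" events as well, so the first event hands us the root.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            pair = (elem.findtext(first_tag), elem.findtext(second_tag))
            counts[pair] += 1
            # Drop already-processed records so memory use stays roughly flat.
            root.clear()
    return counts


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file; pass a file path instead.
    sample = (
        "<root>"
        "<record><a>x</a><b>y</b></record>"
        "<record><a>x</a><b>y</b></record>"
        "<record><a>é</a><b>z</b></record>"
        "</root>"
    ).encode("utf-8")
    for pair, n in count_pairs(io.BytesIO(sample)).most_common():
        print(n, pair)
```

Memory then grows with the number of distinct pairs rather than with the size of the file, which may or may not be acceptable depending on the data.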
I have currently parsed this into a CSV file, loaded it into a database, and run some SQL against it. The SQL I ran essentially builds a frequency table; however, I think using uniq -c in bash would be a simpler option for what I want. But uniq -c does not work right with UTF-8. I could change the encoding, but a lot of my foreign words would come out badly. Further, both the Perl script I have and iconv in bash result in errors when converting from UTF-8 to ASCII.
I wonder if Python or Ruby offer an alternative. Alas, this is outside the scope of Oxygen discussion.
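As a possible starting point on the Python side, here is a minimal sketch of a `uniq -c`-style frequency table that keeps the text as UTF-8 throughout, so no conversion to ASCII is needed. The function name `frequency_table` and the sample data are made up for illustration; a real run would read the CSV with something like `open(path, encoding="utf-8")` instead of the in-memory sample.

```python
import io
from collections import Counter


def frequency_table(lines):
    """Count identical lines, like `sort | uniq -c`, without sorting first."""
    counts = Counter(line.rstrip("\n") for line in lines)
    # Most frequent first; ties are ordered by the text for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # In-memory stand-in for a UTF-8 CSV file with one pair per line.
    sample = io.StringIO("café,thé\nnaïve,mot\ncafé,thé\n")
    for text, count in frequency_table(sample):
        print(f"{count:7d} {text}")
```

Unlike `uniq -c`, this does not require sorted input, because the counts are kept in a dictionary keyed by the full line.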
Thank you for the help and support!