Which Java XML parsing method to use when rewriting an XML file?
Edited a little for clarity.
I'm writing a Java application that takes an XML file and rewrites it if the information in the file needs to be updated. An example of the XML file is below:
<!DOCTYPE book PUBLIC "mydtd.dtd" [
<!ENTITY % ent SYSTEM "entities.ent">
%ent;
]>
<book id="exdoc" label="beta" lang="en">
  <title>Example document</title>
  <bookinfo>
    <authorgroup>
      <author>
        <firstname>George</firstname>
        <surname>Washington</surname>
      </author>
      <author>
        <firstname>Barbara</firstname>
        <surname>Bush</surname>
      </author>
    </authorgroup>
    <pubsnumber>e12345</pubsnumber>
    <releaseinfo/>
    <pubdate>March 2016</pubdate>
    <copyright>
      <year>2012, 2016</year>
      <holder>Company and/or affiliates. All rights reserved.</holder>
    </copyright>
    <xi:include xmlns:xi="http://www.w3.org/2001/xinclude" href="abstract.xml" parse="xml"/>
    <xi:include xmlns:xi="http://www.w3.org/2001/xinclude" href="legal.xml" parse="xml"/>
  </bookinfo>
  <xi:include xmlns:xi="http://www.w3.org/2001/xinclude" href="preface.xml" parse="xml"/>
  ...
I need to grab some nodes and check their information, and if the information is incorrect, update the node to have the correct text. I might also want to add or remove nodes as needed.
For example, in the copyright node, I might need to change the copyright year to list the most recent year. Or, I might need to add a writer element.
At the moment, I create an instance of a SAX parser, validate the XML file to create a Document instance (which in turn resolves the entities), read the nodes from the document, and update the text with the setTextContent() method. I then take the resulting Document at the end of the updates for a particular file and use a DOMSource and a TransformerFactory to output the file:
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File(uri));
transformer.transform(source, result);
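For reference, the whole read, update, and write cycle described above could be sketched roughly as follows. This is only an illustration under assumptions: the class name `CopyrightUpdater`, the method `updateYear`, and the XPath `/book/bookinfo/copyright/year` are hypothetical, and the XML is passed in as a string instead of being read from and written to a file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not taken from the original application.
final class CopyrightUpdater {

    /** Parses the XML, replaces the text of the copyright year node, and serializes the DOM. */
    static String updateYear(String xml, String newYear) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // Locate the node to check; the path is an assumption based on the sample file.
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node year = (Node) xpath.evaluate(
                    "/book/bookinfo/copyright/year", doc, XPathConstants.NODE);
            if (year != null) {
                year.setTextContent(newYear);
            }

            // Identity transform: writes the (possibly modified) DOM back out.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException | TransformerException e) {
            throw new IllegalStateException("Could not rewrite XML", e);
        }
    }
}
```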
This presents some limitations, though, that I want to get around. For one, if inline text has a text entity like &something;, I want to keep the entity as is. At the moment, the entity resolves to its text when the file is rewritten.
So for example, if I have
<!ENTITY something "something">
and the file has something like:
<para>Here's &something; there.</para>
then when I rewrite it, I want it to still say:
<para>Here's &something; there.</para>
But the entity resolves and the file becomes:
<para>Here's something there.</para>
I'm not sure how to set up the EntityResolver class such that it doesn't automatically resolve these entities when I read the nodes, without breaking the rest of my code. I have a class that uses XPath to pull information from the doc and compare it to information about the XML file recorded in a database, so I can't simply not set an EntityResolver, otherwise the XPath expression breaks entirely.
I suppose I could have a separate parser for reading/writing the XML file and the SAX parser that's necessary to grab the info for our database, but I want to keep this as clean as possible.
Any help is appreciated...
Unfortunately, you cannot tell the transformation engine not to expand entity references. That expansion happens when the XML is parsed, so the references are already lost by the time the XML content is being transformed.
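A small sketch can make this visible. It assumes a default, non-validating `DocumentBuilderFactory` and an entity declared in an internal DTD subset; the class name `EntityExpansionDemo` is made up for illustration. By the time the DOM exists, the text of the element already contains the replacement text, so nothing downstream of the parser gets to see the reference.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, for illustration only.
final class EntityExpansionDemo {

    /** Parses a tiny document and returns the text the DOM holds for <para>. */
    static String parsedParaText() {
        String xml = "<!DOCTYPE para [<!ENTITY something \"something\">]>"
                + "<para>Here's &something; there.</para>";
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // The entity reference has already been replaced by its value here.
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }
}
```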
What might work for you is a multi-stage transformation scenario:
- Replace the entity references with entity-reference-like tokens, as Michael Kay suggested, i.e. replace
&something;
with
¶something;
- Perform the transformation and adjust the content as needed. It won't expand entity references and will preserve the entity-reference-like tokens. If you need the entities resolved in order to verify the entities' information, you can load the original XML doc (with expanded entities) and cross-reference between the two documents.
- Change the entity-reference-like tokens in the transformed output back to entity references with a find/replace.
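The three steps above might be sketched like this. It is a rough illustration, not a definitive implementation: the class `EntityTokens` and its methods are hypothetical, the regular expressions only approximate XML names, and the approach assumes the ¶ character does not otherwise appear in front of a name in your documents. The five predefined entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`) and numeric character references are deliberately left alone so the masked text still parses.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper illustrating the mask / transform / unmask idea.
final class EntityTokens {
    // Rough approximation of an XML name; an assumption, not the full XML grammar.
    private static final String NAME = "([A-Za-z_:][\\w.:-]*)";
    // General entity references, excluding the five predefined XML entities.
    private static final Pattern ENTITY_REF =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos);)" + NAME + ";");
    // The token, either as a literal pilcrow or as the character reference
    // a serializer might emit for it.
    private static final Pattern TOKEN = Pattern.compile("(?:\u00B6|&#182;)" + NAME + ";");

    /** Step 1: turn &name; into the token form so the parser leaves it alone. */
    static String mask(String xml) {
        return ENTITY_REF.matcher(xml).replaceAll("\u00B6$1;");
    }

    /** Step 3: turn the token form back into &name;. */
    static String unmask(String xml) {
        return TOKEN.matcher(xml).replaceAll("&$1;");
    }

    /** Steps 1 to 3 around a sample edit: set the text of the first <year> element. */
    static String rewriteYear(String xml, String newYear) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(mask(xml))));

            // Step 2: adjust content on the masked document.
            NodeList years = doc.getElementsByTagName("year");
            if (years.getLength() > 0) {
                years.item(0).setTextContent(newYear);
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return unmask(out.toString());
        } catch (ParserConfigurationException | SAXException | IOException
                | TransformerException e) {
            throw new IllegalStateException("Could not rewrite XML", e);
        }
    }
}
```

For instance, `EntityTokens.rewriteYear(xml, "2012, 2016")` would update the year while leaving `&something;` in the paragraph text untouched. Because the masking is a plain text substitution, it also applies inside comments, CDATA sections, and any internal DTD subset, which may or may not be what you want.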