PHP Warning :: DOMDocument::loadXML(): xmlParseEntityRef: no name in Entity, line: 78

Project: RUcore SOLR Searching and Indexing
Version: 8.1
Component: Code
Category: bug report
Priority: normal
Assigned: chadmills
Status: closed
Description

[10-May-2016 10:34:36 America/New_York] PHP Warning: DOMDocument::loadXML(): xmlParseEntityRef: no name in Entity, line: 78 in /mellon/htdocs/dlr/EDIT/INT/solrfilter-api.php on line 885
[10-May-2016 10:34:47 America/New_York] PHP Warning: DOMDocument::loadXML(): xmlParseEntityRef: no name in Entity, line: 78 in /mellon/htdocs/dlr/EDIT/INT/solrfilter-api.php on line 885

Comments

#1

Is there a PID for the object generating this? This is the rather general error I now get whenever there is bad data that fails XML parsing. It could be bad or broken multi-byte characters.
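To help tie a warning like this back to a specific record, one option is to have libxml collect its errors instead of printing them, then log each one with its line and column. This is only a sketch: the helper name xml_parse_errors and the sample string are made up for illustration and are not part of solrfilter-api.php. A bare ampersand in text is one common trigger for "xmlParseEntityRef: no name", though nothing in this report confirms that is the cause here.

```php
<?php
// Hypothetical helper (not part of solrfilter-api.php): parse XML and
// return libxml's error list instead of emitting PHP warnings.
function xml_parse_errors($xmldata)
{
    $previous = libxml_use_internal_errors(true); // collect errors quietly
    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->loadXML($xmldata);
    $errors = libxml_get_errors();                // array of LibXMLError objects
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the prior setting
    return $errors;
}

// Illustrative input only: an unescaped "&" inside element text.
$bad = "<doc>\n<title>Maps & Atlases</title>\n</doc>";
foreach (xml_parse_errors($bad) as $e) {
    // LibXMLError exposes message, line and column.
    printf("line %d, column %d: %s\n", $e->line, $e->column, trim($e->message));
}
```

Logging the object identifier next to these messages at the call site would give the context that the bare PHP warning lacks.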

#2

Sorry, but I have no idea. I was tailing the PHP error log and this popped up, with no context provided. You might have to check against the Apache access log for more info.

#3

I'm sure I can find some on rep-test at least. In general I need a filter that performs a function like Perl's utf8 function, such as
$string = utf8($string);
I've been trying various mb_ functions in PHP, but have not found something robust enough to replace utf8.
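A rough sketch of what such a filter might look like with the mbstring extension is below. It is not claimed to be equivalent to the Perl function mentioned above, and the function name clean_utf8_for_xml is hypothetical. It assumes the input is meant to be UTF-8: malformed byte sequences are replaced by mbstring's substitute character, and code points that XML 1.0 does not allow are dropped.

```php
<?php
// Hypothetical filter; the name is illustrative and not from the codebase.
function clean_utf8_for_xml($string)
{
    // Re-encoding UTF-8 to UTF-8 makes mbstring replace malformed byte
    // sequences with its substitute character ("?" by default).
    if (!mb_check_encoding($string, 'UTF-8')) {
        $string = mb_convert_encoding($string, 'UTF-8', 'UTF-8');
    }
    // Remove code points outside the XML 1.0 character range
    // (this drops most ASCII control characters).
    return preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $string
    );
}

// Illustrative input: valid "café", a truncated multi-byte sequence,
// and a control character that XML 1.0 forbids.
$dirty = "caf\xC3\xA9 \xE2\x82 broken\x01";
$clean = clean_utf8_for_xml($dirty);
var_dump(mb_check_encoding($clean, 'UTF-8'));
```

Note that this only addresses encoding and illegal characters. It does not escape markup characters such as a bare "&", which need htmlspecialchars() or DOM text nodes instead.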

#4

While testing another issue I saw this in the /solr/logs/solr-0.log from the same time.

<record>
<date>2016-05-10T10:34:47</date>
<millis>1462890887800</millis>
<sequence>931</sequence>
<logger>org.apache.solr.core.SolrCore</logger>
<level>SEVERE</level>
<class>org.apache.solr.common.SolrException</class>
<method>log</method>
<thread>12</thread>
<message>org.apache.solr.common.SolrException: Illegal processing instruction target ("xml"); xml (case insensitive) is reserved by the specs.
at [row,col {unknown-source}]: [1,385]
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: com.ctc.wstx.exc.WstxParsingException: Illegal processing instruction target ("xml"); xml (case insensitive) is reserved by the specs.
at [row,col {unknown-source}]: [1,385]
at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:630)
at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:467)
at com.ctc.wstx.sr.BasicStreamReader.readPIPrimary(BasicStreamReader.java:3919)
at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2055)
at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2647)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:98)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
... 22 more
</message>
</record>

#5

I've seen that too. I'm not sure what it means. It has only started showing up with the PHP version, and I wonder if it has something to do with a misreading of the new errors thrown by PHP. I'd like to try /mellon/includes/classes/php/string_cleaner/class.string.cleaner.php to deal with the bad character or utf8 errors, and hope that fixing those might get the Solr complaints to go away as well.
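One thing that may be worth ruling out: an XML parser raises "Illegal processing instruction target" when an <?xml ...?> declaration appears anywhere other than the very start of the document, and the reported position [1,385] is well past the start. The thread does not establish that this is the cause. As a sketch under that assumption, the snippet below shows how a declaration can end up in the middle of an update message when a serialized DOMDocument is embedded, and how passing a node to saveXML() avoids it. The field and wrapper markup is invented for illustration.

```php
<?php
// Illustrative fragment only; not taken from the indexing code.
$inner = new DOMDocument('1.0', 'UTF-8');
$inner->loadXML('<field name="title">Example</field>');

// saveXML() with no argument serializes the whole document,
// including the leading XML declaration.
$withDecl = $inner->saveXML();

// Passing a node serializes only that node, with no declaration,
// so it is safe to embed inside another document.
$fragment = $inner->saveXML($inner->documentElement);

$update = '<add><doc>' . $fragment . '</doc></add>';
echo $update, "\n";
```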

#6

Here is an example of how to implement that class.

https://github.com/RutgersUniversityLibraries/stringCleaner/blob/master/...

#7

Still seeing loads of errors in the php_error log. If I may, I would suggest the following.

1) When instantiating a new XML DOM object include the version and encoding parameters.

$inputdom = new DomDocument('1.0', 'UTF-8');

2) Next, suppress warnings as follows:

$inputdom->@loadXML($xmldata, LIBXML_NOERROR);

3) Then check that $inputdom is valid, which means converting the one-liner from:

$inputdom->loadXML($xmldata) or exit("Could not parse the xmldata");

to

if (!$inputdom){
exit("Could not parse the xmldata");
}
.....

So, all put together:

$inputdom = new DomDocument('1.0', 'UTF-8');
$inputdom->@loadXML($xmldata, LIBXML_NOERROR);
if (!$inputdom){
exit("Could not parse the xmldata");
}

#8

I'm running it now (57% so far with no errors, though these come near the end). I am trying to catch the loadXML problem before it hits and exit with an error, which seems to work on command line tests. I'll know soon if there are no errors in this morning's run. I'm holding off if I can on suppressing errors with @. We'll see.

#9

I see a few more errors in the run. Perhaps I'll need to suppress the DOMDocument::loadXML() after all. I'll do that and rerun the portalcron.

#10

I'm running a new test; I had to use
@$inputdom->loadXML($xmldata, LIBXML_NOERROR);
since
$inputdom->@loadXML($xmldata, LIBXML_NOERROR);
threw a PHP parse error.
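
For reference, a minimal sketch of the working placement is below. It also tests the return value of loadXML(), which is false when parsing fails; testing the $inputdom object itself would not detect a failure, because a DOMDocument object is always truthy. The wrapper name load_xml_or_null and the sample strings are hypothetical.

```php
<?php
// Hypothetical wrapper: returns a DOMDocument on success, null on failure.
function load_xml_or_null($xmldata)
{
    $inputdom = new DOMDocument('1.0', 'UTF-8');
    // The @ operator goes in front of the whole expression.
    // loadXML() returns false if the data cannot be parsed.
    $ok = @$inputdom->loadXML($xmldata, LIBXML_NOERROR);
    return $ok ? $inputdom : null;
}

// Illustrative inputs: one well-formed, one with a bare ampersand.
$good = load_xml_or_null('<doc><title>Maps &amp; Atlases</title></doc>');
$bad  = load_xml_or_null('<doc><title>Maps & Atlases</title></doc>');

if ($bad === null) {
    // In the real script this is where exit() or error logging would go.
    echo "Could not parse the xmldata\n";
}
```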

#11

Yes, sorry about the misplaced @ sign.

#12

Assigned to: triggs » chadmills
Status: active » test

The latest run of portalcron just before lunch produced no new errors. The last loadXML error is 16-May-2016 11:26:06, before the last run of the cron. The 4 o'clock cron should verify this.

#13

Status: test » fixed

#14

Status: fixed » closed
