Error generating search xml file

Project: RUcore Workflow Management System (WMS)
Version: 8.2.3
Component: File Upload Module
Category: bug report
Priority: critical
Assigned: rmarker
Status: closed
Description

After uploading more than 100 TIFF files and successfully generating djvu, pdf, and thumbjpeg (and smap) files, I have been unable to generate a searchable text (xml) file; I get the error messages below. This is happening on both production and test.

Production Collection error message:
Searchxml (xml) - 3:
error opening ocr server url: http://ocrserver.scc-net.rutgers.edu/ocrtmp/pres-1-00078769.djvu
Error converting /workarea/rucore10001600001/78769/djvu/pres-1-00078769.djvu to /workarea/rucore10001600001/78769/xml/ocr-3-00078769.xml

Graduate School - New Brunswick Electronic Theses and Dissertations
Working Title: HISTORIC DISSERTATION - RJM - Combined source-channel coding using multilevel/phase modulation (I will remove the ALL CAPS portion before ingest)
WMS 78769

Test Collection error message:
Searchxml (xml) - 3:
error opening ocr server url: http://ocrtest.scc-net.rutgers.edu/ocrtmp/pres-1-00010338.djvu
Error converting /mellon/htdocs/openwms/test/test_data/upload_area/rucore00000000252/10338/djvu/pres-1-00010338.djvu to /mellon/htdocs/openwms/test/test_data/upload_area/rucore00000000252/10338/xml/ocr-3-00010338.xml

Marker Test Collection
Title: HISTORIC DISSERTATION - RJM - Combined source-channel coding using multilevel/phase modulation
WMS 10338
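
Both messages fail at the same step: opening the djvu URL on the OCR server. A quick way to separate a server outage from a problem with the file itself is to request the URL directly. The following is a minimal sketch, not WMS code; the function name `ocr_url_reachable` and its `opener` parameter are hypothetical, and it only reports whether the URL can be fetched.

```python
import urllib.error
import urllib.request


def ocr_url_reachable(url, timeout=10, opener=urllib.request.urlopen):
    """Try to open `url` and return (ok, detail).

    `opener` is injectable so the check can be exercised without a network;
    by default it is the standard library's urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 404 or 500).
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout, and similar.
        return False, str(exc)


if __name__ == "__main__":
    # URL taken from the production error message above.
    url = "http://ocrserver.scc-net.rutgers.edu/ocrtmp/pres-1-00078769.djvu"
    print(ocr_url_reachable(url))
```

An HTTP error status would suggest the server is up but the temporary djvu file is missing or unreadable, while a connection error would point at the server or network.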

Comments

#1

Version: 7.7.1 » 8.1

#2

Assigned to: yuyang » rmarker
Status: active » test

From the error message, it looks like the djvu OCR server was having problems at the time of the upload (March 2016). Now that we have removed djvu from the file policies, everything goes through the ABBYY server for OCR. I tested the same batch of files (112 files) with the same file processing instructions, minus djvu, and the process completed without any issues. I ran the test on rep-test, where djvu is no longer an option. I think we should not spend time trying to figure out what happened in March, but move on to using ABBYY as it is set up on rep-test (and will be on rep-staging and rep-prod). Please double-check by uploading a similar batch and seeing what happens. -YY

#3

Leaving this here to test on staging R8.1

#4

Status: test » fixed

We ran various tests with more than 100 TIFF files and did not run into any problems. Marking this as "Fixed."

#5

Assigned to: rmarker » yuyang
Status: fixed » active

Staging, R8.1: still a problem. I uploaded all files for a 100-page (tif/tiff) book, but processing got stuck after the techMD was created.
When I tried to generate a thumbnail, it generated the smap but no thumbnail.
I was also unable to generate a PDF or searchable XML.

#6

What is the record system ID? -YY

#7

Assigned to: yuyang » rmarker
Status: active » test

The ABBYY disk was full. Nick switched to a different, bigger disk; please test again. -YY
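
A full disk of this kind could be caught before a large batch is submitted. The following is a minimal sketch using Python's standard `shutil.disk_usage`; the function name `has_free_space`, the example path, and the 5 GiB threshold are all assumptions for illustration, not values from the ABBYY setup.

```python
import shutil


def has_free_space(path, min_free_bytes):
    """Return True if the filesystem holding `path` has at least
    `min_free_bytes` of free space."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free >= min_free_bytes


if __name__ == "__main__":
    # Hypothetical working directory and threshold; adjust to the real setup.
    work_dir = "."
    threshold = 5 * 1024**3  # 5 GiB
    if not has_free_space(work_dir, threshold):
        print(f"Low disk space on {work_dir}; large OCR batches may fail.")
```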

#8

Version: 8.1 » 8.2.3
Status: test » closed
