XML text error for any Word documents submitted in Faculty Deposit

Project:RUcore Workflow Management System (WMS)
Component:Faculty Deposit
Category:bug report

Submitted three Word documents in the older (.doc) format, which resulted in this error message for the XML (text) file:


Error converting /mellon/htdocs/openwms/test/test_data/upload_area/rucore30001900001/4273/master/original-1-00004273.doc /mellon/htdocs/openwms/test/test_data/upload_area/rucore30001900001/4273/master/original-2-00004273.doc /mellon/htdocs/openwms/test/test_data/upload_area/rucore30001900001/4273/master/original-3-00004273.doc to /mellon/htdocs/openwms/test/test_data/upload_area/rucore30001900001/4273/xml/ocr-1-00004273.xml



Title:XML text error when submitting multiple Word docs in Faculty Deposit» XML text error for any Word documents submitted in Faculty Deposit

Searchable XML is not being created for a single Word doc (older version or Office 2007) or for multiple Word docs. Here is another error message:


Error converting /mellon/htdocs/openwms/test/test_data/upload_area/rucore30001900001/4268/master/original-1-00004268.doc to /mellon/htdocs/openwms/test/test_data/upload_area/rucore30001900001/4268/xml/ocr-1-00004268.xml


Status:active» postponed

Rhonda, I have tested the three Word files you sent me. I tested by uploading each file individually, then uploading them all at once in faculty submission. All 4 tests completed successfully with no errors. I am wondering if the errors you saw were due to too many requests being sent to the server while other people were also testing. Could you please test again using the same files and see if you can reproduce the problem? Please make sure you clear your browser's cache before testing. -YY


Status:postponed» active

No XML is being created for *any* Word, Excel, or PowerPoint documents submitted through Faculty Deposit. Here are some example error messages from the log. No tar file is created and no XML text is created. Please work with Yang to fix this.

WMS 4312 (multiple Excel)

Error converting /mellon/htdocs/openwms/test/test_data/upload_area/rucore30013800001/4312/master/original-1-00004312.xls /mellon/htdocs/openwms/test/test_data/upload_area/rucore30013800001/4312/master/original-2-00004312.xls /mellon/htdocs/openwms/test/test_data/upload_area/rucore30013800001/4312/master/original-3-00004312.xls to /mellon/htdocs/openwms/test/test_data/upload_area/rucore30013800001/4312/xml/ocr-1-00004312.xml

WMS 4316 (multiple older PowerPoint)

Error converting /mellon/htdocs/openwms/test/test_data/upload_area/rucore30013800001/4316/master/original-1-00004316.ppt to /mellon/htdocs/openwms/test/test_data/upload_area/rucore30013800001/4316/xml/ocr-1-00004316.xml

WMS 4324 (Office 2007 PowerPoint)

Error converting /mellon/htdocs/openwms/test/test_data/upload_area/rucore30013800001/4324/master/original-1-00004324.pptx to /mellon/htdocs/openwms/test/test_data/upload_area/rucore30013800001/4324/xml/ocr-1-00004324.xml


Priority:normal» critical


Assigned to:jgeng» yuyang

I tested uploading a Word document from WMS. I selected the options to create the PDF, Thumbnail JPEG, and Search XML. The PDF and Thumbnail JPEG were created successfully, but the Search XML failed. The process ended at this point and no tar file was created.

This is a critical issue. Because the status is still reported as OK, people may not notice that the archival file is missing and may ingest objects without it.
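To illustrate the risk described above, here is a minimal sketch of a status check that only reports OK when every expected output file actually exists and is non-empty. This is not the WMS code: the function names, the output labels (pdf, thumbnail, search_xml, archive_tar), and the file names are all hypothetical and chosen only for the example.

```python
import os
import tempfile


def missing_outputs(expected):
    """Return the labels of expected outputs whose files are absent or empty."""
    return [
        label
        for label, path in expected.items()
        if not (os.path.isfile(path) and os.path.getsize(path) > 0)
    ]


def upload_status(expected):
    """Report OK only if nothing is missing; otherwise name what is missing."""
    missing = missing_outputs(expected)
    if missing:
        return "FAILED: missing " + ", ".join(missing)
    return "OK"


# Demo with a temporary directory standing in for an upload area.
# Only the PDF and thumbnail are written, mirroring the failure in this report.
with tempfile.TemporaryDirectory() as upload_dir:
    expected = {
        "pdf": os.path.join(upload_dir, "example.pdf"),
        "thumbnail": os.path.join(upload_dir, "example.jpg"),
        "search_xml": os.path.join(upload_dir, "example.xml"),
        "archive_tar": os.path.join(upload_dir, "example.tar"),
    }
    for label in ("pdf", "thumbnail"):
        with open(expected[label], "wb") as handle:
            handle.write(b"placeholder")
    print(upload_status(expected))
```

With a check along these lines, a failed Search XML step would surface as a failed status instead of an OK that hides the missing archival file.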


Probably the same problem as /node/1508


Status:active» postponed

This is most likely a file policy configuration issue. The OCR XML can only be generated from DjVu or PDF, nothing else. But on the test site, the policy listed all possible file types as source files for OCR, which is incorrect. Please correct the policy and test again. If this solves the problem, all the other policies need to be checked for the same problem. -YY
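For checking the other policies, something like the following sketch could flag any policy that lists an unsupported source type for the OCR XML. The only rule taken from this thread is that OCR XML comes from DjVu or PDF. The dictionary layout, the "ocr" key, and the function name are assumptions made for illustration and do not reflect the actual WMS policy format.

```python
# Rule from this thread: OCR XML can only be generated from DjVu or PDF.
VALID_OCR_SOURCES = {"djvu", "pdf"}


def invalid_ocr_sources(policy):
    """Return the source types listed for OCR that cannot produce OCR XML.

    `policy` is a hypothetical mapping of derivative name to a list of
    source file extensions; it is not the real WMS policy structure.
    """
    listed = {ext.lower().lstrip(".") for ext in policy.get("ocr", [])}
    return sorted(listed - VALID_OCR_SOURCES)


# A policy like the one described for the test site, listing every file type.
test_site_policy = {"ocr": ["doc", "xls", "ppt", "pptx", "pdf", "djvu"]}
# The same policy after correction.
corrected_policy = {"ocr": ["pdf", "djvu"]}

print(invalid_ocr_sources(test_site_policy))  # ['doc', 'ppt', 'pptx', 'xls']
print(invalid_ocr_sources(corrected_policy))  # []
```

Running the same check over every policy would give a quick list of which ones still need to be corrected.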


Assigned to:yuyang» rmarker
Status:postponed» test

Yang and I have updated the file policy and tested uploading a Word file. The file upload process completed OK: the PDF, Thumbnail JPEG, and Search XML files were created, and a tar file was also created.


Assigned to:rmarker» ananthan
Status:test» fixed

Tested with both Word and Excel formats. Verified fixed.


Status:fixed» closed

See Rhonda's comment.
