Presentation files not being generated for large (100+ page) book object requiring OCR and PDF generation - test on staging

Project: RUcore Workflow Management System (WMS)
Version: 8.2.3
Component: File Upload Module
Category: bug report
Priority: normal
Assigned: rmarker
Status: closed
Description

As part of the test plan, a requirement was to upload and ingest a 100+ page document. I created WMS ID #10537, "Test Book record for OCR ibb", in the Release 8.1 test collection (Chrome).

In this object, TIFF files were uploaded, and the system was instructed to generate a PDF, SearchXML, Searchxml_coord, and Thumbnail.

All files uploaded successfully, but no presentation files appear to have been generated, and an "Object X missing presentation file" error appears for each file uploaded.

Will try again with a smaller batch and advise on status.

Comments

#1

Tried again and confirmed the same issue, this time with a 122-page document. This second item is WMS ID# 10546 in the same collection.

#2

Not sure if it is directly related, but based on this error in the log:

[10-May-2016 17:20:28 America/New_York] PHP Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 72 bytes) in /mellon/includes/classes/php/string_cleaner/class.string.cleaner.php on line 201

In php.ini I doubled the memory limit:

;memory_limit = 256M ; Maximum amount of memory a script may consume (128MB)
memory_limit = 512M ; Maximum amount of memory a script may consume (128MB)

and restarted Apache
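
As a quick sanity check on the numbers above, the byte count in the fatal error can be converted into the units php.ini uses. This is only a small arithmetic sketch; the function name is made up for illustration. It relies on the php.ini "M" suffix meaning 1024 * 1024 bytes.

```python
# Convert the byte count from the PHP fatal error into php.ini "M" units.
BYTES_PER_MIB = 1024 * 1024  # php.ini shorthand: 1M = 1048576 bytes


def bytes_to_mib(n_bytes: int) -> float:
    """Return the size in MiB (hypothetical helper, not part of WMS)."""
    return n_bytes / BYTES_PER_MIB


exhausted_limit = 268435456  # value reported in the error log
print(bytes_to_mib(exhausted_limit))  # 256.0, i.e. the old memory_limit = 256M
print(512 * BYTES_PER_MIB)            # 536870912, the new limit in bytes
```

So the error was raised exactly at the old 256M limit, which suggests the script hit the configured ceiling rather than the machine running out of memory.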

#3

If this class is the cause, then since it is my class, I'll need to know how it is being used in this case to see if there is a leak. Right now it looks to me like the class was asked to parse a very large string.

#4

This class "class.string.cleaner" is not used in WMS. Since Jeffery was talking about using this class, I suspect he may be doing something to test his code. I'll forward the error message Dave reported (#2) to him. In the meantime, I am looking into the cause of #1. -YY

#5

If it's any help, the output of the file processing window gives the following:

Presentation (pdf) - 1:
Error Fetching http headers
Error converting....

and then lists all of the TIFF files I uploaded. This appears after it sits for some time at the PDF generation step. I'm unsure whether actual work is going on or whether it is waiting on a timeout.

#6

Assigned to: yuyang » ibeard
Status: active » test

Isaiah, the file "Index_074.tif" is corrupted, which is why your multi-file run didn't go through. I don't know if any other files have the same problem. Please run the test again using only the "good" files. In the meantime, I need to explore whether I can get a more detailed failure message from ABBYY to show on the WMS screen. -YY
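
One rough way to screen a batch for other obviously damaged files is to check that each file starts with a valid TIFF header. The sketch below uses only the Python standard library; the function names and the directory argument are hypothetical, and it is not how WMS or ABBYY validate files. It only catches files whose first four bytes are wrong, so a file that passes can still be corrupt further in.

```python
import struct
from pathlib import Path


def looks_like_tiff(data: bytes) -> bool:
    """Return True if the first 4 bytes form a classic TIFF or BigTIFF header."""
    if len(data) < 4:
        return False
    order = data[:2]
    if order == b"II":      # little-endian byte order
        fmt = "<H"
    elif order == b"MM":    # big-endian byte order
        fmt = ">H"
    else:
        return False
    (magic,) = struct.unpack(fmt, data[2:4])
    return magic in (42, 43)  # 42 = classic TIFF, 43 = BigTIFF


def find_suspect_files(directory: str) -> list:
    """List names of *.tif / *.tiff files in `directory` that fail the header check."""
    suspects = []
    for path in sorted(Path(directory).glob("*.tif*")):
        with open(path, "rb") as fh:
            if not looks_like_tiff(fh.read(4)):
                suspects.append(path.name)
    return suspects
```

Running find_suspect_files on a local copy of the upload folder would give a short list of files worth opening by hand before retrying the batch.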

#7

I replaced the file and was able to get an 85-page document to work, but a large, real 500-page document did not go through. The failed item is WMS ID# 10656. The document has been used in testing before and is also an actual document in RUcore, so if one of the TIFF files on the storage space is corrupted, we may have a bigger problem here.

I will work to see if I can find a threshold where these issues start to happen.
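
If failures depended only on page count (an assumption that may not hold, for example if server load also matters), the threshold could be narrowed down by bisection between a size known to work and a size known to fail. The sketch below is illustrative only: `succeeds` is a hypothetical stand-in for "upload a document of this many pages and see whether processing completes".

```python
from typing import Callable


def find_failure_threshold(lo: int, hi: int, succeeds: Callable[[int], bool]) -> int:
    """Return the smallest page count in (lo, hi] that fails.

    Assumes `lo` is known to succeed, `hi` is known to fail, and that
    every size below the threshold succeeds while every size at or above it fails.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if succeeds(mid):
            lo = mid   # mid works, so the threshold is above it
        else:
            hi = mid   # mid fails, so the threshold is at or below it
    return hi


# Example with a made-up threshold of 300 pages, starting from the
# 85-page success and 500-page failure reported above.
print(find_failure_threshold(85, 500, lambda pages: pages < 300))  # 300
```

Starting from 85 and 500 pages, this needs at most nine test runs instead of trying every size.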

#8

I tried uploading a 1000-page book. The upload finished, but ABBYY failed (Error Fetching http headers). This may have something to do with the ABBYY server's SOAP configuration, or something else may have timed out during the process. Yes, it would be nice to know what the current threshold is, whether it is the number of pages or the total file size. -YY

#9

I tried an 800-page document and a 600-page document, and received errors. The object is WMS ID# 10693. The latest errors kept referencing /mellon/htdocs/openwms/test/test_data/upload_area/rucore00000000432/10693/master/original-646-00010693.tiff. However, I retrieved this file and found it to be a valid TIFF file.

It had a lot of Syriac text on it with a few English characters, but at worst I would assume ABBYY would just not recognize anything.

#10

Isaiah, I went into record #10693 in WMS last night and continued your file process (letting the system generate the presentation PDF, search XML, and thumbnail, since the files were already uploaded). The process finished in three and a half hours with no issues. I suspect your problem in #9 was due to ABBYY being overloaded by the testers. When you get a chance, could you please test your 800-page document? You may want to start the process before you leave for home to avoid the congestion. -YY

#11

Title: Presentation files not being generated for large (100+ page) book object requiring OCR and PDF generation » Presentation files not being generated for large (100+ page) book object requiring OCR and PDF generation - test on staging

Test this on the Staging server. This problem seems to occur when the ABBYY server is busy. When run during off-peak hours, the process seems to complete.

#12

Assigned to: ibeard » yuyang
Status: test » active

Not fixed on staging. See also issue #3427.

#13

Assigned to: yuyang » rmarker
Status: active » test

The ABBYY disk was full. Nick switched to a different, larger disk; please test again. -YY
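
Since a full disk turned out to be a cause here, a pre-flight free-space check on the processing host might surface this earlier. The following is a minimal sketch using Python's standard library; the function name, path, and threshold are hypothetical, and it would have to run on the machine that owns the disk (here, the ABBYY server), not on the WMS host.

```python
import shutil


def has_free_space(path: str, min_free_bytes: int) -> bool:
    """Return True if the filesystem containing `path` has at least `min_free_bytes` free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free >= min_free_bytes


# Example: require 10 GiB free before starting a large OCR job.
# The path "." and the 10 GiB figure are placeholders for illustration.
if not has_free_space(".", 10 * 1024**3):
    print("Low disk space: large OCR jobs may fail")
```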

#14

Version: 8.1 » 8.2.3
Status: test » closed
