Test creation of text layer from a system-generated PDF

Project:RUcore Workflow Management System (WMS)
Component:File Upload Module
Category:bug report

Use the files in Voorhess/Voorhees_1957* for testing. The project team experienced intermittent problems.



Status:active» test


Status:test» active

Test failed in the Marker Test Collection, on the record titled: test creation of text layer from a system-generated PDF
WMS 6963 (rep-devel)
I got this error report:

Presentation (pdf) - 1:
getCurrentJobStatus::Failure:JobID-29d99a46-f55b-451b-97cf-4fa77c10d9bf666102 failed.
Error converting /mellon/htdocs/openwms/test/test_data/upload_area/rucore00000000252/6963/master/derived-1-00006963.tif /mellon/htdocs/openwms/test/test_data/upload_area/rucore00000000252/6963/master
[lines deleted]
/mellon/htdocs/openwms/test/test_data/upload_area/rucore00000000252/6963/master/derived-27-00006963.tif to /mellon/htdocs/openwms/test/test_data/upload_area/rucore00000000252/6963/pdf/pres-1-00006963.pdf
2013-05-31 12:20:58


Assigned to:rmarker» yuyang


This specific PDF file has a large map on the last page. It seems that the PDF server in its current state cannot handle it; I am not sure whether this is due to the overall file size or to that specific map. Chad has sent an email to the PDF server company asking for help. We can still test with other PDF files, but keep this PDF server issue in mind. -YY


Status:active» test

It turns out that the PDF server, when doing OCR, has a limit on the PDF page size (max < 28" x 28"). The last page of the test PDF file is over this limit. Until we have a better OCR engine, the best we can do is check the page size before doing OCR. If the size is over the limit, WMS stops the process and displays an error message indicating which pages need rework. The user then has to re-create the PDF file according to the specs before coming back to WMS and uploading it again. This checking mechanism has been added to WMS and needs testing. -YY
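
For testers who want to reason about which pages should trip the check, here is a minimal sketch of a page-size pre-check in Python. It is not the WMS code. The function names, the error message wording, and the input format (a list of page width/height pairs in PDF points, 72 points per inch) are all assumptions made for illustration. It also assumes that a page is over the limit when either side is 28 inches or more.

```python
from typing import List, Optional, Tuple

POINTS_PER_INCH = 72.0   # default PDF unit; assumes no custom UserUnit on the page
MAX_PAGE_INCHES = 28.0   # limit reported in this bug; assumed to apply to each side


def oversized_pages(page_sizes_pt: List[Tuple[float, float]],
                    limit_in: float = MAX_PAGE_INCHES) -> List[int]:
    """Return 1-based numbers of pages whose width or height is at or over the limit."""
    limit_pt = limit_in * POINTS_PER_INCH
    return [
        number
        for number, (width, height) in enumerate(page_sizes_pt, start=1)
        if width >= limit_pt or height >= limit_pt
    ]


def check_before_ocr(page_sizes_pt: List[Tuple[float, float]]) -> Optional[str]:
    """Return a (hypothetical) error message naming the bad pages, or None if OCR may proceed."""
    bad = oversized_pages(page_sizes_pt)
    if not bad:
        return None
    pages = ", ".join(str(number) for number in bad)
    return (f"OCR not started: page(s) {pages} are at or over the "
            f"{MAX_PAGE_INCHES:g} x {MAX_PAGE_INCHES:g} inch limit. "
            "Re-create the PDF and upload it again.")


if __name__ == "__main__":
    # Two letter-size pages followed by a 36 x 24 inch map page (made-up sizes).
    sizes = [(612.0, 792.0), (612.0, 792.0), (2592.0, 1728.0)]
    print(check_before_ocr(sizes))
```

With the made-up sizes above, only the third page is reported, which mirrors the case described in this report where the map on the last page is the one over the limit.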


Status:test» active

I uploaded the files referred to in this report. WMS did not generate a PDF and did not produce any error messages. The process ran for over 2 hours without succeeding. Since the PDF was not created, I could not test the search XML.


Status:active» test

The issue in #6 is probably due to the problem found in the file upload module when people select server upload and specify a list or range of file names. This specific issue is fixed. Test again. If you still do not see an error message about the PDF size limit, reopen the bug. -YY


Version:7.2» 7.4
Status:test» active

Voorhees TIFF to PDF conversion using the "Provide a list of files" option does not generate a PDF file. Moving to R7.4 for further investigation.


Status:active» postponed

I suggest combining 2255, 2235, 2227, and 2176 into one bug report and moving them to 7.5. They all involve the OCR process failing, either on a large number of files or on problematic source file(s) that the PDF server cannot handle. We need to investigate a better and more reliable solution for the OCR process. -YY
