Bad PDF datastream; i.e. corrupted

Project: RUcore Jobs & Reports
Component: Job - production
Category: task
Priority: normal
Assigned: ibeard
Status: closed
Description

These objects have a bad/corrupted presentation PDF.

rutgers-lib:3079
rutgers-lib:2832
rutgers-lib:23921
rutgers-lib:34012
rutgers-lib:33701
rutgers-lib:34983
rutgers-lib:32925
rutgers-lib:33645

Comments

#1

I've worked through the RUcore objects marked as having bad PDFs. I was able to recover all but one object, whose ARCH-1 also appears to consist of a corrupted PDF.

The causes of the bad PDFs were all over the map. I’ve created a table that describes what I found and the action taken to correct each object. There are some outstanding issues that someone else may need to step in and correct, as noted in the table. We may also need to consider purging the object with the bad ARCH and presentation datastreams.

Object with bad PDF: rutgers-lib:3079
EXIFTOOL analysis of PDF file: Actually a 16.3MB PostScript file with MIME type “application/postscript”. Creating application is “GNU Ghostscript 652 (pswrite)” with a creation date/time of 2005/04/17 13:26:57, though RUcore shows the creation date as “2013-03-20T10:33:50.520Z”.
Disposition: ARCH-1 datastreams valid. Created a new PDF from TIFF files.

Object with bad PDF: rutgers-lib:2832
EXIFTOOL analysis of PDF file: Actually an HTML file renamed as a PDF. The page simply redirects to “http://www.scc.rutgers.edu”. Creation date is listed as 2013-03-20T10:36:38.144Z.
Disposition: ARCH-1 datastreams valid. Created a new PDF from TIFF files.

Object with bad PDF: rutgers-lib:23921
EXIFTOOL analysis of PDF file: Contains two PDF files for a scholarly article. PDF-2 is a Japanese version with SOAR coversheet, and is valid. PDF-1 is actually a 303-byte HTML file which states “You do not have permission to view this page. You will be redirected back to the login page in a short while.” and redirects to http://mss3.libraries.rutgers.edu/dlr/EDIT/login.php
Disposition: ARCH datastreams valid. Found a Word doc from which a new PDF was created. However, I’m not sure if the cover sheet I used is the correct one. A new coversheet may need to be created.

Object with bad PDF: rutgers-lib:34012
EXIFTOOL analysis of PDF file: Is actually a renamed TIFF file.
Disposition: Converted to a valid PDF file.

Object with bad PDF: rutgers-lib:33701
EXIFTOOL analysis of PDF file: Is actually a renamed TIFF file.
Disposition: Converted to a valid PDF file.

Object with bad PDF: rutgers-lib:34983
EXIFTOOL analysis of PDF file: Corrupted PDF file. exiftool reports “Invalid xref table”.
Disposition: ARCH datastream is a PDF, also corrupted, with the same problem. This PDF appears to have been damaged from the start, or possibly corrupted during the ingest process. Unable to fix.

Object with bad PDF: rutgers-lib:32925
EXIFTOOL analysis of PDF file: Corrupted PDF file. exiftool reports “Invalid xref table”.
Disposition: ARCH-1 datastreams valid. Created a new PDF from TIFF files. Thumbnail needs to be generated.

Object with bad PDF: rutgers-lib:33645
EXIFTOOL analysis of PDF file: Is actually a renamed TIFF file.
Disposition: Converted to a valid PDF file.
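
For anyone who wants to screen other presentation datastreams for the same problems, here is a minimal sketch of the kind of check involved. It is not what was run for this ticket (the analysis above was done with exiftool). It only looks at leading magic bytes and, for real PDFs, whether the last startxref offset points at something that looks like a cross-reference section. The function names sniff_type, xref_looks_valid and classify are hypothetical, and the HTML detection is a rough heuristic.

```python
import re

# Leading byte signatures for the file types that turned up in the table above.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"%!PS", "postscript"),
    (b"II*\x00", "tiff"),  # little-endian TIFF header
    (b"MM\x00*", "tiff"),  # big-endian TIFF header
]


def sniff_type(data: bytes) -> str:
    """Guess the real type of a file from its first bytes (heuristic)."""
    head = data[:1024].lstrip()
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    lowered = head.lower()
    if lowered.startswith(b"<!doctype html") or b"<html" in lowered:
        return "html"
    return "unknown"


def xref_looks_valid(data: bytes) -> bool:
    """Rough check that the last startxref offset lands on an xref section."""
    matches = re.findall(rb"startxref\s+(\d+)", data[-2048:])
    if not matches:
        return False
    offset = int(matches[-1])
    target = data[offset:offset + 32].lstrip()
    # A classic table starts with "xref"; newer PDFs may instead point at
    # an xref stream object, which starts like "12 0 obj".
    if target.startswith(b"xref"):
        return True
    return re.match(rb"\d+\s+\d+\s+obj", target) is not None


def classify(data: bytes) -> str:
    """Summarize what a file claiming to be a PDF appears to be."""
    kind = sniff_type(data)
    if kind != "pdf":
        return f"not a PDF ({kind})"
    return "pdf" if xref_looks_valid(data) else "pdf with suspect xref"
```

To use it, read a datastream exported to disk as bytes (for example `classify(open(path, "rb").read())`, where `path` is wherever the file was saved). A result other than "pdf" only flags the file for a closer look with exiftool; it does not prove corruption, and a "pdf" result does not prove the file renders.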

#2

Component: Job - production server » Job - production

Update on this task: is it complete? Note that I am unable to fix rutgers-lib:34983 because I do not have access to a suitable archival master. All other objects were fixed.

I am also unable to close this ticket.

#3

Status: active » fixed

#4

Status: fixed » closed
