As a follow-up to the DJVU-only resources in NJEDL, staff asked if I could run a report listing the resources that also lack an archival datastream.

Using a snapshot of the production database, I found 799 objects on production that may lack an arch, darch, or rarch datastream. The majority appear to be NJEDL-related. I am attaching the complete report and will also send it along to the NJEDL staff.
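
For reference, the check behind the report can be sketched roughly as below. This is a minimal illustration, not the query that was actually run: the PID-to-datastream mapping, the sample PIDs, and the function names are all hypothetical, and it assumes archival datastream IDs begin with ARCH, DARCH, or RARCH.

```python
# Archival datastream ID prefixes named in the report (arch, darch, rarch).
# Assumption: IDs such as ARCH1 or DARCH2 start with one of these prefixes.
ARCHIVAL_PREFIXES = ("ARCH", "DARCH", "RARCH")


def is_archival(dsid):
    """Return True if a datastream ID looks archival, e.g. ARCH1 or DARCH2."""
    return dsid.upper().startswith(ARCHIVAL_PREFIXES)


def objects_missing_archival(datastreams_by_pid):
    """Return the sorted PIDs that have no archival datastream at all."""
    return sorted(
        pid
        for pid, dsids in datastreams_by_pid.items()
        if not any(is_archival(dsid) for dsid in dsids)
    )


# Hypothetical sample data: PID -> datastream IDs taken from a snapshot.
sample = {
    "example:1001": ["DJVU", "MODS"],
    "example:1002": ["ARCH1", "PDF-1"],
    "example:1003": ["MOV-1"],
}
print(objects_missing_archival(sample))  # ['example:1001', 'example:1003']
```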



From this report, I've identified 459 objects that are in fact text objects with DJVU datastreams and nothing better. I am now generating two files from the DJVUs for each of these objects:

1. A decompressed best-quality archival PDF from the DJVU
2. An OCR'd, compressed presentation PDF from the archival PDF.
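
The two steps above could be scripted along these lines. This is only a sketch under assumptions: the ticket does not say which tools were used, so `ddjvu` (from DjVuLibre) and `ocrmypdf` are stand-ins, the function names are hypothetical, and quality and compression options are deliberately left out because they would need to be chosen from each tool's documentation.

```python
import subprocess


def djvu_to_pdf_command(djvu_path, archival_pdf):
    """Build the command for step 1: DJVU -> archival PDF (assumes ddjvu)."""
    return ["ddjvu", "-format=pdf", djvu_path, archival_pdf]


def ocr_pdf_command(archival_pdf, presentation_pdf):
    """Build the command for step 2: archival PDF -> OCR'd PDF (assumes ocrmypdf)."""
    return ["ocrmypdf", archival_pdf, presentation_pdf]


def convert(djvu_path, archival_pdf, presentation_pdf):
    """Run both steps in order, stopping at the first failure."""
    for command in (
        djvu_to_pdf_command(djvu_path, archival_pdf),
        ocr_pdf_command(archival_pdf, presentation_pdf),
    ):
        subprocess.run(command, check=True)
```

Keeping the command construction separate from execution makes it possible to print and review the commands for all 459 objects before running anything.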

A good portion of the archival PDFs might be too large to add easily through the dlr/EDIT interface. Would it be possible to do a batch addition of ARCH1 and PDF1 datastreams to these objects? The file names of the new PDFs indicate which PID they belong to.
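
To illustrate how a batch process could pair the files with their objects, here is a small sketch. The naming convention is an assumption made for the example (`<namespace>_<id>.pdf`, with the underscore standing in for the PID's colon); the real file names may differ, and the function names are hypothetical.

```python
from pathlib import PurePath


def pid_from_filename(filename):
    """Recover a PID from a file name, assuming '<namespace>_<id>.pdf'."""
    stem = PurePath(filename).stem
    namespace, sep, number = stem.rpartition("_")
    if not sep or not namespace or not number:
        raise ValueError(f"cannot derive a PID from {filename!r}")
    return f"{namespace}:{number}"


def pair_by_pid(master_files, optimized_files):
    """Map each PID to its (master, optimized) file pair; None marks a gap."""
    masters = {pid_from_filename(f): f for f in master_files}
    optimized = {pid_from_filename(f): f for f in optimized_files}
    return {
        pid: (masters.get(pid), optimized.get(pid))
        for pid in sorted(masters.keys() | optimized.keys())
    }
```

A pair containing `None` would flag an object that is missing either its archival or its presentation PDF before the batch add is attempted.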

As an aside: a significant number (300+) of the objects listed in this report are moving image files. It appears that these video objects were somehow ingested without their ARCH datastreams. The majority appear to be part of the Equine Science video collection. We should investigate how this happened and figure out next steps, since to my knowledge these objects should have been ingested through WMS, not through any batch process that would have permitted this to happen.


If you have the PDF files created, we should be able to set up a batch add datastream process without much trouble. We could do it for the video files as well, though size might be an issue there. We could experiment.


I now have PDF replacement files for the 459 DJVU-laden objects. For each object, I have one high-resolution PDF (to be used as ARCH1) and one optimized PDF for presentation. How should we proceed? When ready, I can upload these to a location for adding to datastreams as a bulk operation.

The ARCH-level PDFs total 38 GB collectively. The presentation PDFs total 5.5 GB.
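
Totals like these can be checked, and unusually large files spotted before upload, with a short script such as the sketch below. The function names and the idea of a size threshold are illustrative assumptions, not part of the existing process.

```python
import os


def total_size_bytes(directory):
    """Sum the sizes of all files under a directory, including subfolders."""
    total = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def files_larger_than(directory, threshold_bytes):
    """Return sorted paths of files whose size exceeds the given threshold."""
    oversized = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) > threshold_bytes:
                oversized.append(path)
    return sorted(oversized)
```

For example, `total_size_bytes("PDF-Master") / 1024**3` would give the size of the archival set in GiB, assuming the files sit in a local directory of that name.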


Following the instructions found on rep-test in /mellon/cvsroot, I ran the following two commands:

rep-prod /home/DJVU_replace# more
./ dsid=ARCH1 mime="application/pdf" useserver=prod filelist=PDF-Master.list usefiledir="TESTOBJECTS/DJVU-replacements/PDF-Master" controlgroup="M"

./ dsid=PDF-1 mime="application/pdf" useserver=prod filelist=PDF-Optimized.list usefiledir="TESTOBJECTS/DJVU-replacements/PDF-Optimized" controlgroup

Output is attached.

Please review and, if there are no further issues, mark this as closed.


I tested for Dave and it looks like the script ran OK in both instances, so I'll mark this fixed.


