Fix date metadata for 2000+ resources

Project:RUcore Jobs & Reports
Component:Job - production
Category:task
Priority:normal
Assigned:rmarker
Status:test
Description

After running the DOI script to create DOIs for legacy objects, we identified 2000+ resources that have either 1) dates in an invalid format or 2) no dates. These resources have "reserved" DOIs. Once the date metadata is fixed, the DOI state needs to be changed from "reserved" to "public".

Comments

#1

#2

#3

Assigned to:ananthan» triggs

We've completed the majority of the resources. More than 100 collection objects were fixed. Most of the resources with invalid dates or no dates have been fixed.

The Tombstone collection in NJDH has a few hundred resources, all of which were photographed in 2010. We discussed at the meeting yesterday that Jeffery can write a script to supply dateCreated for these resources. The XML is provided for your convenience. Please check with Rhonda to make sure it is correct.

<mods:originInfo>
<mods:dateCreated encoding="w3cdtf" qualifier="exact">2010</mods:dateCreated>
</mods:originInfo>

#4

Yes, that is the XML that should be added to the MODS datastream for all the objects in the Tombstone Collection. There are 460 objects in this collection.
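
For illustration only, here is a minimal Python sketch of how the dateCreated element quoted in this thread could be added to a MODS record. It is not the fixds script. The function name `add_date_created` is hypothetical, and the sketch assumes the record uses the standard MODS v3 namespace and should be left untouched if it already contains a dateCreated element.

```python
import xml.etree.ElementTree as ET

# Assumption: records use the standard MODS v3 namespace.
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def add_date_created(mods_xml: str, year: str = "2010") -> str:
    """Return MODS XML with an exact w3cdtf dateCreated added if none exists."""
    root = ET.fromstring(mods_xml)
    # Leave the record unchanged if any dateCreated is already present.
    if root.find(f".//{{{MODS_NS}}}dateCreated") is not None:
        return mods_xml
    # Reuse the first top-level originInfo, or create one.
    origin = root.find(f"{{{MODS_NS}}}originInfo")
    if origin is None:
        origin = ET.SubElement(root, f"{{{MODS_NS}}}originInfo")
    date = ET.SubElement(origin, f"{{{MODS_NS}}}dateCreated")
    date.set("encoding", "w3cdtf")
    date.set("qualifier", "exact")
    date.text = year
    return ET.tostring(root, encoding="unicode")
```

Skipping records that already carry a dateCreated is a design choice in this sketch; it avoids writing the same date twice if the script is run again.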

#5

The files are created on rep-test. We just need to set up a time for Dave to run a fixds script to put them in and then run a quick DOI update on the list.

#6

Jeffery,

Please send an email to Dave. I'm not sure if he is subscribed to this project.

#7

Yes I am subscribed.

The mention of a fixds script here is not what gets me to act.

The creation of the list, readme, shell script and any other pertinent
files in /mellon/cvsroot on rep-test is what allows me to schedule
and run the scripts.

Notification that this is ready can be added here, or sent to me in email
if necessary.

#8

Assigned to:triggs» dhoover

Hi Dave,

I'm attaching a readme for the two-part Tombstone update: 1) a fixds script to add in MODS with proper dates to 459 objects, and 2) a run of the rundoidopublic script to make the DOIs for these objects public. The list is short enough that there is no need to split it up.

Thanks,

Jeffery

#9

Assigned to:dhoover» triggs

Following the readme file, I put this:

./fixds-09.pl dsid=MODS mime="text/xml" useserver=prod filelist=tombstonelist.txt usefiledir="TESTOBJECTS/TOMBSTONE" controlgroup="X" dcslave="yes"

in tombstone.sh and tried to run it.

I got:

rep-prod DS_fix/20150526# Modifying object rutgers-lib:37839 which already has an MODS datastream
http://xxxxx:xxxxxx@127.0.0.1:8080/fedora/get/rutgers-lib:37839/MODS
Special condition for MODS datastream...
./tmpbuildsolr.xml:1: parser error : Document is empty

^
./tmpbuildsolr.xml:1: parser error : Start tag expected, '<' not found

^ at ./fixds-09.pl line 165

[1] Exit 22 ./tombstone.sh

Also please do not output the Fedora username/password in the displays.
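
One way to keep credentials out of displayed URLs is to strip the user:password part before printing. The following is a minimal Python sketch of that idea, not a change to fixds itself; the function name `redact_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url(url: str) -> str:
    """Return the URL with any user:password@ part removed, for safe display."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return url
    # Rebuild the network location from host and port only.
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```

The request itself can still use the full URL; only the string passed to the display or log needs to go through the redaction step.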

#10

Component:Job - production server» Job - production

Jeffery,

Please run a report to identify resources that still have "reserved" DOIs. There should be four different reports:
1) Resources with no dates
2) Resources with dates in invalid format
3) Collection objects with no dates
4) Collection objects with dates in invalid format (We may not find any)

The report should include collection name, PID, Title, date and should be in a spreadsheet.
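
As a rough sketch of how the four report categories could be separated, the Python below sorts a date value into "no date", "invalid date", or "ok" for either a resource or a collection object. The function name `classify` and the category labels are hypothetical, and the sketch assumes that only the date-only W3CDTF forms (YYYY, YYYY-MM, YYYY-MM-DD) count as valid.

```python
import re

# Assumption: only date-only W3CDTF forms are accepted (YYYY, YYYY-MM, YYYY-MM-DD).
W3CDTF_DATE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$")


def classify(date_value, is_collection):
    """Return the report category for one object's date value."""
    kind = "collection" if is_collection else "resource"
    # A missing or whitespace-only value counts as "no date".
    if date_value is None or not date_value.strip():
        return f"{kind}: no date"
    if W3CDTF_DATE.match(date_value.strip()):
        return f"{kind}: ok"
    return f"{kind}: invalid date"
```

The regular expression checks the shape of the value only; it does not reject impossible calendar dates such as 2010-02-31.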

Before we run this report, if we haven't done so yet, we should run a script to update the date and DOI state for the 900+ resources in the Tombstone collection. Since it has been a while, I can't remember the status of this task.

Thanks,
Kalaivani

#11

The tombstone project identified only 459 objects. Dave apparently got a somewhat flaky error when he gave the script a first shot on May 26, after which we all got distracted by the release to staging and prod. I'm looking into what might have caused the error message.

#12

Assigned to:triggs» dhoover

Dave,

Thanks to your log I was able to reproduce the bug on rep-test. A variable $modsdata based on the simple datastream file was not being set in the one condition where MODS was being changed with dcslave set to "yes". The XML parser was failing on the empty variable. I fixed this and was able to run the command on rep-test. In the process I noticed that having dcslave set to "yes" in this instance doesn't really do anything useful. We could have avoided all this trouble (though we would have missed the bug) by simply running the command with dcslave="no". I've updated /mellon/cvsroot/mellon/cvsroot/tombstone-readme.txt to run the command as
/path/to/fixds.pl dsid=MODS mime="text/xml" useserver=prod filelist=tombstonelist.txt usefiledir="TESTOBJECTS/TOMBSTONE" controlgroup="X" dcslave="no"
which should go smoothly.
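
The failure mode described here (an XML parser handed an empty variable) can be illustrated with a small guard. This is a Python sketch of the general idea only, not the Perl fix that went into fixds; the function name `parse_mods` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_mods(modsdata):
    """Parse MODS XML, failing with a clear message when the input is empty."""
    # Check for unset or empty input before the parser sees it, so the error
    # names the real cause instead of "Document is empty".
    if modsdata is None or not modsdata.strip():
        raise ValueError("MODS data is empty; was the datastream file read?")
    return ET.fromstring(modsdata)
```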

#13

Ran the new tombstone process using fixds-10.pl

The last couple of records produced this screen output:

modified rutgers-lib:38830-MODS.xml in object...
<?xml version="1.0" encoding="UTF-8"?>
<responses>
HTTP::Request=HASH(0x11a3de8)<response actiontype="add"><status>OK</status><message>Success with add action for rutgers-lib:38830...</message></response>
HTTP::Request=HASH(0x11a3de8)<response actiontype="commit"><status>OK</status><message>Success with commit action for rutgers-lib:38830...</message></response>
</responses>
Modifying object rutgers-lib:38831 which already has an MODS datastream
http://mss3.libraries.rutgers.edu/dlr/EDIT/INT/getfedorarest.php?pid=rutgers-lib:38831&altid=&label=MODS&mimetype=text/xml&checksumtype=&dslocation=http://rep-test.libraries.rutgers.edu/dlr/EDIT/TESTOBJECTS/TOMBSTONE/rutgers-lib:38831-MODS.xml&dsid=MODS&version=yes&view=chds&logmessage=jat+ran+fixds
Success changing the MODS datastream of rutgers-lib:38831. Since versioning is requested, the old version was left in place.

modified rutgers-lib:38831-MODS.xml in object...
<?xml version="1.0" encoding="UTF-8"?>
<responses>
HTTP::Request=HASH(0x11a3de8)<response actiontype="add"><status>OK</status><message>Success with add action for rutgers-lib:38831...</message></response>
HTTP::Request=HASH(0x11a3de8)<response actiontype="commit"><status>OK</status><message>Success with commit action for rutgers-lib:38831...</message></response>
</responses>
Modifying object rutgers-lib:38832 which already has an MODS datastream
http://mss3.libraries.rutgers.edu/dlr/EDIT/INT/getfedorarest.php?pid=rutgers-lib:38832&altid=&label=MODS&mimetype=text/xml&checksumtype=&dslocation=http://rep-test.libraries.rutgers.edu/dlr/EDIT/TESTOBJECTS/TOMBSTONE/rutgers-lib:38832-MODS.xml&dsid=MODS&version=yes&view=chds&logmessage=jat+ran+fixds
Success changing the MODS datastream of rutgers-lib:38832. Since versioning is requested, the old version was left in place.

modified rutgers-lib:38832-MODS.xml in object...
<?xml version="1.0" encoding="UTF-8"?>
<responses>
HTTP::Request=HASH(0x11a3de8)<response actiontype="add"><status>OK</status><message>Success with add action for rutgers-lib:38832...</message></response>
HTTP::Request=HASH(0x11a3de8)<response actiontype="commit"><status>OK</status><message>Success with commit action for rutgers-lib:38832...</message></response>
</responses>
Modifying object rutgers-lib:38833 which already has an MODS datastream

Please check and see if it did what you wanted.

#14

Hi Dave,

Thanks for this too. The messages look OK, suggesting successful runs. I'll check the collection and then we can run the DOI update part.

#15

Assigned to:dhoover» rmarker
Status:active» test

I updated the metadata and changed the DOIs to public for all that were still marked reserved. This should be done now.

#16

Assigned to:rmarker» pkonin

#17

Assigned to:pkonin» triggs
Status:test» active

The DOIs resolve, but a lot of the records have dates listed twice. If you just search for "tombstone" and skip around to some of the results pages, you'll see them.

#18

Within this collection, detect duplicate dates and delete the MODS xml for one of them.
mods:originInfo - mods:dateCreated
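
A minimal Python sketch of that duplicate check follows. It is an illustration under stated assumptions, not the script that was actually run: the function name `drop_duplicate_dates` is hypothetical, records are assumed to use the standard MODS v3 namespace, mods:originInfo is assumed to be a direct child of the record's root element, and two dates count as duplicates only when both their text and their attributes match.

```python
import xml.etree.ElementTree as ET

# Assumption: records use the standard MODS v3 namespace.
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def drop_duplicate_dates(mods_xml):
    """Remove repeated dateCreated elements; return (new_xml, number_removed)."""
    root = ET.fromstring(mods_xml)
    seen = set()
    removed = 0
    # Iterate over copies of the child lists so removal is safe.
    for origin in list(root.findall(f"{{{MODS_NS}}}originInfo")):
        for date in list(origin.findall(f"{{{MODS_NS}}}dateCreated")):
            key = ((date.text or "").strip(), tuple(sorted(date.attrib.items())))
            if key in seen:
                origin.remove(date)
                removed += 1
            else:
                seen.add(key)
        # Drop an originInfo that is left with no child elements.
        if len(origin) == 0:
            root.remove(origin)
    return ET.tostring(root, encoding="unicode"), removed
```

The first occurrence of each date is kept, and the returned count makes it easy to report which objects were changed.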

#19

Assigned to:triggs» pkonin
Status:active» test

I started doing that before even seeing this. It should be all set now with DOIs active and single dates.

#20

Assigned to:pkonin» rmarker

These objects still have no mods:date* elements. Many are collection objects and many have rulib:date in a mods:extension section, but no actual mods:dateCreated/Issued/Other etc.

#21

These 50 objects still have bad mods:dates of various sorts.
