Date sorting using Solr transformer

Project:RUcore Jobs & Reports
Component:Report - production
Category:task
Priority:normal
Assigned:triggs
Status:test
Description

Investigate and prepare a strategy for sorting by date more accurately. The current process does not use any of Solr's built-in functionality.

This came up in the past, and the following built-in transformer was identified as a possible solution.

<a href="http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer" title="http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer">http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer</a>

Also, the minutes of the January 12, 2012 Software Architecture meeting mention a planned investigation. The results of that investigation could prove helpful.

"We briefly addressed the issue of date ordering in presentation results. Basically we need to be able to handle many different date formats. Jeffery will investigate the SOLR module that might help with this issue and will also talk with Kalaivani, Linda, and others to understand what date formats in R6.1 are not being properly sorted. If the SOLR module proves out, we may want to consider a dot release (i.e. R6.1.1)."

<a href="http://rucore.libraries.rutgers.edu/collab/ref/min_sawg_20120112.pdf" title="http://rucore.libraries.rutgers.edu/collab/ref/min_sawg_20120112.pdf">http://rucore.libraries.rutgers.edu/collab/ref/min_sawg_20120112.pdf</a>

Comments

#1

Version:7-x» 7.4

#2

Version:7.4» 7.5

I believe the index will now support a number of different sorting approaches (e.g. date ranges), but there has not been time to build any of it into a public search portal.

#3

Version:7.5» 7.6

Date sorting is more complicated than it might first appear. In particular, we have run into a lot of issues on the Roman Coins project regarding the handling of BCE dates. At one point, it was suggested that metadata standards prohibited the input of text dates. We also don't display start-end dates properly on the landing page (and probably also on search results). I think we also need to clarify the meaning of dateOther, dateIssued, and dateCreated. Intuitively, none of these seem to work for a coin minted in 280 BCE. Finally, I think we also need to clarify what dates are displayed and with what labels on the landing page.
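
To illustrate the BCE problem, here is a minimal sketch of one possible convention, not something RUcore is known to use. ISO 8601 and Solr date fields treat year 0000 as 1 BCE, so 280 BCE is written as -0279. The function names are hypothetical, and whether the RUcore index should store BCE dates this way is an assumption.

```python
def bce_year_to_iso(year_bce: int) -> str:
    """Convert a BCE year (280 means 280 BCE) to an ISO 8601 year string.

    Uses astronomical year numbering: 1 BCE is year 0000, 2 BCE is -0001.
    """
    if year_bce < 1:
        raise ValueError("BCE years start at 1")
    astronomical = 1 - year_bce  # 1 BCE -> 0, 280 BCE -> -279
    if astronomical == 0:
        return "0000"
    return f"-{abs(astronomical):04d}"


def bce_year_to_solr_date(year_bce: int) -> str:
    """Pad the year out to a full UTC timestamp at the start of that year."""
    return f"{bce_year_to_iso(year_bce)}-01-01T00:00:00Z"


print(bce_year_to_solr_date(280))  # -0279-01-01T00:00:00Z
```

A value in this form sorts before any CE date when stored in a date field, which plain text such as "280 BCE" does not.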

So, this task needs to be moved to R7.6 and I will try to put together a document that describes the issues. Ultimately, I believe the MDWG will need to make a decision on these issues.

Ron J

#4

Version:7.6» 7.7
Assigned to:rjantz» triggs

We'll discuss this with Ron.

#5

Ron asked for clarity on the meaning of the various date sub-elements in the originInfo element.
dateIssued: The date that the resource was published, released, or issued.
dateCreated: The date of creation of the resource. If creation date is also the origination date (this is what is used in MARC cataloging, in the publication area), use the dateIssued element, or repeat both dateCreated and dateIssued.
dateCaptured: The date on which the resource was digitized or a subsequent snapshot was taken. NOTE: Currently, we do not use this in any RUcore collections, and it is not an option in the WMS.
copyrightDate: The date on which the resource was copyrighted.
dateOther: A date that does not fall into another category but is important to record. dateOther may be used with the type attribute to designate a specific kind of date which was not deemed of sufficient general use to have its own date element.

Ron also mentioned that the metadata standards prohibit recording dates in textual form. This is true. The Digital Library Federation (DLF)/Aquifer Guidelines recommend recording each date in a structured form rather than a textual form. Use of the encoding attribute is recommended, and the DLF/Aquifer guidelines recommend representing the value using the W3CDTF encoding. The guidelines recommend using ISO 8601 encoding (a more flexible standard) only when a date cannot be expressed using W3CDTF. The WMS allows either standard to be used, but only validates the W3CDTF encoding.
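
As a rough illustration of what a W3CDTF check involves, the sketch below tests whether a string has one of the W3CDTF shapes (YYYY, YYYY-MM, YYYY-MM-DD, or a date-time with a time zone designator). It is not the WMS validation code. The name `is_w3cdtf` is hypothetical, and the check covers only the shape of the string, not calendar validity.

```python
import re

# W3CDTF profile of ISO 8601: a four-digit year, optionally followed by
# month, day, and a time that must carry a time zone designator
# (Z or +hh:mm / -hh:mm). Only the shape is checked, so an impossible
# date such as 2012-02-31 still passes.
W3CDTF_PATTERN = re.compile(
    r"^(?P<year>\d{4})"
    r"(?:-(?P<month>0[1-9]|1[0-2])"
    r"(?:-(?P<day>0[1-9]|[12]\d|3[01])"
    r"(?:T(?P<hour>[01]\d|2[0-3]):(?P<minute>[0-5]\d)"
    r"(?::(?P<second>[0-5]\d)(?:\.\d+)?)?"
    r"(?P<tz>Z|[+-](?:[01]\d|2[0-3]):[0-5]\d))?)?)?$"
)


def is_w3cdtf(value: str) -> bool:
    """Return True if the string has one of the W3CDTF date shapes."""
    return W3CDTF_PATTERN.match(value) is not None


for candidate in ["2012", "2012-01-12", "January 12, 2012", "-0279"]:
    print(candidate, is_w3cdtf(candidate))
```

The last candidate shows why the guidelines leave room for ISO 8601: a negative (BCE) year cannot be expressed in W3CDTF.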

Finally, decisions about what dates are displayed and what labels are used on the landing page are made within the search portals, e.g., the RUcore portal (which applies to a search in RUcore and to the landing page reached from a web search engine), the Roman Coins portal, the NJDH portal, etc.

#6

Jeffery,

What is the status of this?

#7

Status:active» test

I think we now have several acceptable fields for sorting, including the UTC date field sortdate_dt.
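
For anyone testing, a minimal sketch of a search request sorted on sortdate_dt might look like the following. It uses Solr's standard `sort` request parameter. The base URL, the core name `rucore`, and the function name are assumptions for illustration, not the actual test-system configuration.

```python
from urllib.parse import urlencode


def build_sorted_query(base_url: str, query: str, descending: bool = False) -> str:
    """Build a Solr select URL that sorts results on the sortdate_dt field."""
    direction = "desc" if descending else "asc"
    params = {
        "q": query,
        "sort": f"sortdate_dt {direction}",  # standard Solr sort parameter
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and core name, for illustration only.
print(build_sorted_query("http://localhost:8983/solr/rucore", "coins"))
```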

#8

Version:7.7» 8.1
Assigned to:triggs» rjantz

Please test date sorting when searching in the test system.

#9

Project:RUcore SOLR Searching and Indexing» RUcore Jobs & Reports
Version:8.1» <none>
Component:Code» Report - production
Assigned to:rjantz» triggs
