Solr field for author

Project: RUcore SOLR Searching and Indexing

In support of the OA R7.4 search & browse specification, I need a Solr field that is indexed as a non-tokenized string with the author's name in the following format:

{Last name}, {First Name} {Middle name}
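
For illustration, the requested value format can be sketched in Python. This is only a sketch: the function name format_author and its arguments are hypothetical, and it is not the code that produces the index in this project.

```python
def format_author(last, first="", middle=""):
    """Build '{Last name}, {First Name} {Middle name}' as one string value."""
    # Keep only the given-name parts that are actually present.
    given = " ".join(part.strip() for part in (first, middle) if part and part.strip())
    last = last.strip()
    # Omit the comma when there are no given names at all.
    return f"{last}, {given}" if given else last


print(format_author("Triggs", "Jeffery", "Alan"))  # Triggs, Jeffery Alan
```

A missing middle name simply drops out, so format_author("Alexe", "Sorin") gives "Alexe, Sorin".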

I cannot continue the browse development elsewhere until this is in place.



Further clarification from the OA specification:

Authors - retrieves all mods:name where type=personal and role=author


Right now the field name="author" gets "surname given names author", tokenized and without a comma. I don't see how the comma would be of interest to Solr. We could make a string-type copy field, which should work for this.


I plan on using the facet values to supply a list of browse terms. The comma is needed in the value when pulling the facets from this field.


Does it have to be multivalued?


Yes, because there can be multiple authors per resource and they would need to be faceted individually.


OK. I've created a new multivalued string field in schema.xml called *_st (the pre-packaged *_s being single valued), so we'll need to restart the Solr server on devel to use it. In the meantime, I've set the xslt base to create an author_s that you could use right away until the revised schema is read in with the restart. It creates things like this:
<field name="author">Triggs Jeffery Alan</field>
<field name="author_s">Triggs, Jeffery Alan</field>


OK great thanks! Let me know when it has been restarted and the test objects have been re-indexed.


We'll need to have Dave or Ashwin restart Solr on devel. Then I can switch the author_s to author_st. Who should do the asking?


Please facilitate. Thank you.


OK. multivalued string field author_st is working, e.g.:
<field name="author_st">Triggs, Jeffery</field>
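
With author_st populated, the list of browse terms mentioned earlier in this ticket could be requested as facet values. The sketch below only builds the query string from standard Solr facet parameters. The function name browse_facet_params is hypothetical, no host or core is assumed, and facet.sort=index assumes a Solr version that accepts that value.

```python
from urllib.parse import urlencode


def browse_facet_params(field="author_st", limit=-1):
    """Return a query string asking Solr for the values of `field` as facet counts."""
    params = [
        ("q", "*:*"),                 # match all documents
        ("rows", "0"),                # only the facet list is wanted, not the documents
        ("facet", "true"),
        ("facet.field", field),
        ("facet.sort", "index"),      # alphabetical order (assumed to be supported)
        ("facet.limit", str(limit)),  # -1 asks for all values
        ("facet.mincount", "1"),      # skip values that no document carries
    ]
    return urlencode(params)


print(browse_facet_params())
```

Because the field is an untokenized string, each facet value comes back whole, comma included, e.g. "Triggs, Jeffery".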


Great. Was everything re-indexed on the test system?


Only a couple of objects were reindexed so far to test. I'll run portalcron now.


Assigned to: triggs » chadmills
Status: active » test


Works well, thanks.


Assigned to: chadmills » triggs
Status: test » active

Having trouble with this field. If I perform the following query I get 0 results. I expect one result.

author_st:(Alexe, Sorin) AND portalkey:(FACULTY) NOT type:(collection)

When I try a similarly formed query against the titleletter_st field I get results as expected.

titleletter_st:(B) AND portalkey:(FACULTY) NOT type:(collection)

I am not sure what is going on here.


Hmmm. He is definitely there in this field:
<field name="author_st">Alexe, Sorin</field>
and shows up in glob searches:
though not
author_st:Alexe, *
The space does something odd, though *_st is defined as a string:
<dynamicField name="*_st" type="string" indexed="true" stored="true" multiValued="true"/>
I wonder if omitNorms, which is set to true by default for string fields, could have anything to do with it.


There is a lot of discussion about the spaces in queries, even for string fields. I notice through lynx that I get a hit with:
lynx -source "\"Alexe,+Sorin\"&start=0&rows=10"
though not with:
lynx -source ",+Sorin)&start=0&rows=10"

Some people on the Solr forums have suggested the workaround of getting rid of the spaces for fields meant only for facet searching.


Assigned to: triggs » chadmills
Status: active » test

Surrounding the search value in quotes fixed the problem, thanks.

So this...

author_st:(Alexe, Sorin) AND portalkey:(FACULTY) NOT type:(collection)

becomes this...

author_st:("Alexe, Sorin") AND portalkey:(FACULTY) NOT type:(collection)
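
A likely explanation, offered as an assumption rather than something confirmed in this ticket: the standard Lucene/Solr query parser splits unquoted input on whitespace before the field type is consulted, so the unquoted form looks for the separate terms "Alexe," and "Sorin", neither of which equals the single indexed value "Alexe, Sorin". Quoting keeps the value together. A small Python sketch of building such a clause follows. The helper name exact_term is hypothetical.

```python
def exact_term(field, value):
    """Return field:("value") with the value quoted as a single phrase."""
    # Escape backslashes first, then double quotes, so the value cannot end the phrase early.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:("{escaped}")'


query = exact_term("author_st", "Alexe, Sorin") + " AND portalkey:(FACULTY) NOT type:(collection)"
print(query)  # author_st:("Alexe, Sorin") AND portalkey:(FACULTY) NOT type:(collection)
```

The same helper would apply to any facet value taken from author_st before it is sent back as a filter.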


Status: test » fixed

We are able to browse by Author in the SOAR portal. Also, I am able to search for 'ananthan' in name in RUcore without quotes. If further testing needs to be done, please change the status to test with testing instructions.


Status: fixed » closed
