Index for A-Z list

Project:RUcore SOLR Searching and Indexing
Version:7.4
Component:Code
Category:task
Priority:critical
Assigned:chadmills
Status:closed
Description

For the OA R7.4 release they want to offer a 0-9 and A-Z list of resources by

1)Title
2)Author
3)Schools, Departments, and Centers

I think the best course of action would be to create new index fields in which the first letter, or number, of the first non-article word is indexed. A facet on that field would provide the list of values and the number of occurrences of each. A search against that field could provide the results. Articles (the, a, an, etc.) would need to be omitted. Anything beginning with a number, 0-9, could be folded into a single value for this new field; maybe just a 0 would suffice as the value.

Comments

#1

Assigned to:triggs» chadmills
Status:active» test

Is this a title index search? We have a solr string field called sorttitle already that has articles filtered out. It can be searched with wildcards on individual letters of the alphabet like this:
sorttitle:a*
sorttitle:b*
etc.
In dlr/EDIT you can see this with searches like:
sorttitle:c* portalkey:ETD
which finds all ETDs with titles beginning with c.

#2

Assigned to:chadmills» triggs
Status:test» active

I still need to create the schools field so I'm taking this back for now. I thought that was in a separate issue.

#3

Wildcards won't work in this case. I need a faceted list of titles and the letter and/or number they begin with. Your suggestion would require 36 (26 letters and 0-9) sequential wildcard searches to see whether a title exists that begins with each letter or number.

If a new field stored the first character after an article (the, an, etc.), I could just run one faceted query. As originally mentioned, there is no need to separate numbers individually; grouping/folding them under the value '0-9' will be sufficient.
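The field described above can be sketched roughly as follows. This is only an illustration in Python, not the indexer's actual code: the function name `first_letter_key`, the `ARTICLES` set, and the third sample title are assumptions made for the example, and case is deliberately left untouched.

```python
from collections import Counter

# Assumed article list; the thread only says "the, a, an etc."
ARTICLES = {"the", "a", "an"}

def first_letter_key(title):
    """Return the first significant character of a title, or '0-9' for digits."""
    words = title.split()
    # Skip leading articles, but never discard the only remaining word.
    while len(words) > 1 and words[0].lower() in ARTICLES:
        words = words[1:]
    for ch in "".join(words):
        if ch.isdigit():
            return "0-9"   # fold every leading digit into one value
        if ch.isalpha():
            return ch      # case is left untouched in this sketch
    return ""              # no letter or digit found

titles = [
    "The 4-1-9 coalition, the internet, and Nigerian business integration in the United States",
    "ZnO nanotip-based acoustic wave sensors",
    "An example title",   # made-up title for illustration
]

# Counting the keys mimics what a single facet on the new field would return.
facet = Counter(first_letter_key(t) for t in titles)
```

Counting the keys in one pass is the point of the proposal: one faceted query over the stored value replaces the 36 separate wildcard searches.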

#4

Assigned to:triggs» chadmills
Status:active» test

OK. I have several new fields ready for indexing. In addition to sorttitle, there is now a field called "titleletter_st" which is a lowercased version of the first important character in the title. Thus "ABCD" is "a", "Beyond" is "b", etc. If the first character is a number, it is "0", so "401 Things" becomes "0". This can easily be changed to give "4" if you like.

In addition to author_st there is now a field called "authorletter_st" which is a lowercased version of the first letter of the author's last name.

There are also two new school/department fields based on the following xpaths:
//mods:name[@type="corporate"][@authority="RutgersOrg-School"]
and
//mods:name[@type="corporate"][@authority="RutgersOrg-Department"]
as specified in the May 7th version of Jane's and Laura's document. These are "schooldeptletter_st", a lowercased version of the first letter of the department or school, and "schooldeptfacet_st", a facet ready string field of the department or school. I have not seen any objects with these mods elements so have not been able to test but I imagine one can be made up if you don't know of any.
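Since no real objects with these elements were available, a made-up record is one way to try the two XPaths. The sketch below uses Python's standard `xml.etree.ElementTree`; the sample XML, the helper name `corporate_names`, and the choice to read the value from `mods:namePart` are all assumptions (the thread only gives the `mods:name` XPaths).

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Made-up test record; the names are placeholders, not real Rutgers units.
SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:name type="corporate" authority="RutgersOrg-School">
    <mods:namePart>Example School</mods:namePart>
  </mods:name>
  <mods:name type="corporate" authority="RutgersOrg-Department">
    <mods:namePart>Example Department</mods:namePart>
  </mods:name>
  <mods:name type="personal">
    <mods:namePart>Example, Person</mods:namePart>
  </mods:name>
</mods:mods>"""

def corporate_names(root, authority):
    """Collect namePart text for corporate names with the given authority."""
    path = f".//mods:name[@type='corporate'][@authority='{authority}']"
    names = []
    for name in root.findall(path, NS):
        text = name.findtext("mods:namePart", default="", namespaces=NS).strip()
        if text:
            names.append(text)
    return names

root = ET.fromstring(SAMPLE)
schools = corporate_names(root, "RutgersOrg-School")
departments = corporate_names(root, "RutgersOrg-Department")
```

The personal name in the sample is there to confirm that only corporate names with the matching authority are picked up.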

#5

For the titleletter_st field, can any title whose initial character is a number be stored as the value '0-9' in the index field?

#6

Done. Do you want me to reindex everything or would you want to reindex special test objects? I have already reindexed the NB ETD collection. The dlr/EDIT facet search for authorletter_st on the collection gives the following:
Facet Results for this Query

Refine Search? 119 hits for b
Refine Search? 76 hits for s
Refine Search? 58 hits for c
Refine Search? 58 hits for m
Refine Search? 39 hits for l
Refine Search? 38 hits for g
Refine Search? 33 hits for a
Refine Search? 33 hits for h
Refine Search? 32 hits for p
Refine Search? 32 hits for r
Refine Search? 31 hits for d
Refine Search? 30 hits for k
Refine Search? 24 hits for w
Refine Search? 21 hits for t
Refine Search? 15 hits for z
Refine Search? 14 hits for n
Refine Search? 13 hits for f
Refine Search? 13 hits for j
Refine Search? 10 hits for v
Refine Search? 10 hits for y
Refine Search? 7 hits for e
Refine Search? 6 hits for i
Refine Search? 6 hits for x
Refine Search? 4 hits for q
Refine Search? 3 hits for u
Refine Search? 1 hits for o

#7

Faceting on titleletter_st gives:
Facet Results for this Query

Refine Search? 72 hits for p
Refine Search? 69 hits for e
Refine Search? 67 hits for s
Refine Search? 59 hits for d
Refine Search? 54 hits for r
Refine Search? 52 hits for t
Refine Search? 51 hits for c
Refine Search? 43 hits for m
Refine Search? 41 hits for a
Refine Search? 40 hits for i
Refine Search? 31 hits for f
Refine Search? 21 hits for n
Refine Search? 17 hits for h
Refine Search? 16 hits for g
Refine Search? 13 hits for u
Refine Search? 13 hits for w
Refine Search? 12 hits for b
Refine Search? 12 hits for o
Refine Search? 10 hits for l
Refine Search? 8 hits for v
Refine Search? 7 hits for 0
Refine Search? 4 hits for k
Refine Search? 4 hits for q
Refine Search? 2 hits for j
Refine Search? 2 hits for z
The Z search gives:
Title: Zeno, Aristotle, the Racetrack and the Achilles: a historical and philosophical investigation
and
ZnO nanotip-based acoustic wave sensors
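For anyone reading the facet counts programmatically rather than through dlr/EDIT: Solr's JSON response writer returns each facet field, by default, as a flat list that alternates value and count. The sketch below pairs them up; the helper name `facet_pairs` is made up, and the response is a hand-built fragment using the first three counts listed above.

```python
def facet_pairs(flat):
    """Turn a flat [value, count, value, count, ...] list into (value, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))

# Hand-built fragment shaped like a Solr JSON response (counts taken from the list above).
response = {
    "facet_counts": {
        "facet_fields": {
            "titleletter_st": ["p", 72, "e", 69, "s", 67],
        }
    }
}

pairs = facet_pairs(response["facet_counts"]["facet_fields"]["titleletter_st"])
for value, count in pairs:
    print(f"{count} hits for {value}")
```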

#8

Indexing everything would be very helpful, but at the very least I will need the test faculty deposit collections indexed.

#9

OK. I'll run portalcron with the force flag before I go. I reindexed the ETDs and the title facets now list:

Facet Results for this Query

Refine Search? 77 hits for p
Refine Search? 70 hits for e
Refine Search? 68 hits for s
Refine Search? 61 hits for d
Refine Search? 54 hits for r
Refine Search? 54 hits for t
Refine Search? 53 hits for c
Refine Search? 49 hits for m
Refine Search? 45 hits for a
Refine Search? 44 hits for i
Refine Search? 32 hits for f
Refine Search? 22 hits for n
Refine Search? 20 hits for g
Refine Search? 20 hits for h
Refine Search? 15 hits for o
Refine Search? 14 hits for w
Refine Search? 13 hits for u
Refine Search? 12 hits for b
Refine Search? 10 hits for l
Refine Search? 8 hits for v
Refine Search? 6 hits for q
Refine Search? 4 hits for k
Refine Search? 3 hits for 0-9
Refine Search? 2 hits for j
Refine Search? 2 hits for z
The 0-9 set gives:
Title: 100% of anything looks good -- the appeal of one hundred percent and the psychology of vaccination
Title: 3-D morphometry and non-rigid registration for quantitative analysis and clinical assessment in radiology
The 4-1-9 coalition, the internet, and Nigerian business integration in the United States

#10

Assigned to:chadmills» triggs
Status:test» active

Looks good on my end; just one change is needed. Please index the first letter capitalized rather than lowercased.

#11

Assigned to:triggs» chadmills
Status:active» test

OK. A ten second change, though it will take a bit longer to reindex devel. I've got it started at least. You can test in a couple of hours or so.

#12

Assigned to:chadmills» triggs
Status:test» active

Noticed an anomaly with the lower vs. upper case first letter index for Title. Right now, for some letters, two facet values are returned, such as...

A and a

A returns a different set of records than a. For the 'A' facet value, all of the titles begin with a capital A. For the lowercase 'a' value, the search returns records whose titles begin with a lowercase 'a'. I was expecting a single uppercase 'A' value that returns both the upper and lower case records.

Example, upper A:

https://rucore-test.libraries.rutgers.edu/search.dev/results/?q1=%22A%22&q1field=facet:titlebeginswith&key=FACULTY

Example, lower a:

https://rucore-test.libraries.rutgers.edu/search.dev/results/?q1=%22a%22&q1field=facet:titlebeginswith&key=FACULTY

#13

Hmmm. You just reminded me. In Solr the string fields are case sensitive because they are not analyzed.
<!-- The StrField type is not analyzed, but indexed/stored verbatim. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

Should I uppercase the single letters?

#14

Yes I think uppercasing them would be a good normalization path. This would be true for all "first letter" fields requested.

#15

I'll just put back the normalizations I had before - only toupper instead of tolower. That takes only a few seconds but then we'll need another reindex.
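Because a string field is matched verbatim, "A" and "a" count as two separate facet values unless the value is normalized before indexing. A minimal Python sketch of the folding discussed here (the function name `normalize_letter` and the sample values are made up for illustration):

```python
from collections import Counter

def normalize_letter(value):
    """Uppercase a first-letter key; '0-9' is unaffected by upper()."""
    # Note: str.upper() can return more than one character, e.g. 'ß' becomes 'SS'.
    return value.upper()

raw_values = ["A", "a", "b", "B", "0-9"]   # made-up sample of indexed keys

before = Counter(raw_values)                             # 'A' and 'a' counted separately
after = Counter(normalize_letter(v) for v in raw_values) # folded into one value per letter
```

After normalization the sample yields one "A" bucket and one "B" bucket, each with two entries, which is the single-value behavior requested above.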

#16

Assigned to:triggs» chadmills
Status:active» test

The objects are reindexed and it looks like the caps and lowercase are folded now.

#17

Assigned to:chadmills» triggs
Status:test» active

Yes, much better.

'Z' is being returned as a first letter in title facet but when I search I get zero results. Can you confirm the same on your end?

Test URL

https://rucore-test.libraries.rutgers.edu/search.dev/results/?q1=%22Z%22&q1field=facet:titlebeginswith&key=FACULTY

#18

I get the following titleletter_st facets including 37 titles beginning with Z. These are across the whole set of objects. But aren't you already filtering these results out before the facets are calculated?

Refine Search? 1666 hits for D
Refine Search? 1368 hits for C
Refine Search? 1243 hits for N
Refine Search? 1189 hits for S
Refine Search? 1134 hits for P
Refine Search? 863 hits for T
Refine Search? 761 hits for M
Refine Search? 638 hits for A
Refine Search? 580 hits for W
Refine Search? 564 hits for E
Refine Search? 522 hits for R
Refine Search? 502 hits for F
Refine Search? 500 hits for H
Refine Search? 451 hits for B
Refine Search? 440 hits for L
Refine Search? 423 hits for G
Refine Search? 398 hits for J
Refine Search? 354 hits for I
Refine Search? 281 hits for O
Refine Search? 222 hits for K
Refine Search? 173 hits for 0-9
Refine Search? 172 hits for U
Refine Search? 164 hits for V
Refine Search? 81 hits for Y
Refine Search? 37 hits for Z
Refine Search? 33 hits for Q
Refine Search? 2 hits for SS
Refine Search? 2 hits for X
Refine Search? 2 hits for 中
Refine Search? 2 hits for 数
Refine Search? 1 hits for Ή
Refine Search? 1 hits for 当
Refine Search? 1 hits for 漢
Refine Search? 1 hits for 猫
Refine Search? 1 hits for 當
Refine Search? 1 hits for 살

Printing 1 to 10 of 37 matches for query 'titleletter_st:Z'.
1.
Title: Z_Test Ingest with DOI
Identifier: http://hdl.rutgers.edu/1782.1/rep-devel.rucore00000000315.Book.000004632
Identifier: doi:10.5072/FK2959SJ6
Identifier: rutgers-lib:26899
Date: 2009
Date: 2009
2.
Title: Zeno, Aristotle, the Racetrack and the Achilles: a historical and philosophical investigation
Identifier: http://hdl.rutgers.edu/1782.2/rucore10001600001.ETD.17425
Identifier: ETD_1175
Identifier: rutgers-lib:24501
Date: 2008
Date: 2008-10
3.
Title: Zimmerman map template test 2014 Feb 25
Identifier: rutgers-lib:201807
Date: 2014
Date: 2012
Date: 2014
4.
Title: Zimmerman map test 2014 Feb 21 #1 take #2
Identifier: doi:10.5072/FK2TB1KQP
Identifier: rutgers-lib:201768
5.
Title: Zimmerman map test 2014 Feb 21 brief rec. test #1
Identifier: doi:10.5072/FK2X3599B
Identifier: rutgers-lib:201772
Date: 1999
Date: 1999
6.
Title: Zimmerman map test 2014 Feb 21 test #2
Identifier: http://www.hdl.com
Identifier: rutgers-lib:201762
Date: 1999
7.
Title: Zimmerman map test 2014 Feb 21 test #3
Identifier: doi:10.5072/FK2JT037S
Identifier: rutgers-lib:201770
Date: 2015
Date: 2015
8.
Title: Zion Lutheran Church 50th Anniversary Postcard
Identifier: http://hdl.rutgers.edu/1782.3/EHCHSPHOTO.Photograph.3114
Identifier: rutgers-lib:10860
Date: 1909
9.
Title: Zion Lutheran Church Congregation, c. 1910
Identifier: http://hdl.rutgers.edu/1782.3/EHCHSPHOTO.Photograph.7921
Identifier: rutgers-lib:14410
Date: 1890
10.
Title: Zion Lutheran Church, Egg Harbor City, NJ, 1970
Identifier: http://hdl.rutgers.edu/1782.3/EHCHSPHOTO.Photograph.3121
Identifier: rutgers-lib:11439
Date: 1970

#19

Yes, I am filtering them by portal key.

I get this response when searching on portalkey = FACULTY with the facet for firstletter in title.

Array
(
[titleletter_st] => Array
(
[T] => 141
[N] => 40
[M] => 29
[S] => 24
[R] => 22
[C] => 21
[D] => 21
[A] => 19
[P] => 17
[B] => 14
[U] => 12
[E] => 11
[L] => 11
[G] => 10
[K] => 9
[I] => 8
[O] => 7
[H] => 6
[V] => 5
[F] => 4
[J] => 3
[W] => 3
[Y] => 3
[0-9] => 2
[Q] => 1
[X] => 1
[Z] => 1
)
)

Q and X both return one record; Z returns 0 records.
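A portal-restricted facet request of the kind described above might be built as in the sketch below. The parameter names (`q`, `fq`, `rows`, `facet`, `facet.field`, `facet.mincount`, `wt`) are standard Solr request parameters, but the helper name `letter_facet_params` and the choice of a filter query for the portal key are assumptions; how the search code actually applies the restriction is not shown in this thread.

```python
from urllib.parse import urlencode, parse_qs

def letter_facet_params(field, portal_key=None):
    """Build Solr request parameters for a first-letter facet, optionally per portal."""
    params = [
        ("q", "*:*"),             # match everything; the facet does the real work
        ("rows", "0"),            # only counts are needed, not documents
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", "1"),  # hide letters with no hits
        ("wt", "json"),
    ]
    if portal_key:
        # Assumed: restrict by portal with a filter query.
        params.append(("fq", f"portalkey:{portal_key}"))
    return params

query_string = urlencode(letter_facet_params("titleletter_st", "FACULTY"))
decoded = parse_qs(query_string)
```

If the result list applies extra exclusions that the facet request does not, the facet counts and the result counts can disagree, which is the kind of mismatch reported for Z.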

#20

Ah. When I restrict to portalkey:FACULTY I get the same results, but Z does hit this:

1.
Title: Zwaaf Katrina Collection
Identifier: rucore30016300001
Identifier: http://hdl.rutgers.edu/1782.1/rep-devel.rucore21016300001.collection.04909
Identifier: rutgers-lib:26859

#21

Hmm, it's a collection object, and those are excluded by default from the "regular" results. I hate to say it, but I think you need to exclude collection records from the faceting while still indexing them otherwise. If we don't, the counts will always be off, in this case through false positives.

#22

OK. It's done. When a collection object is detected, we skip over the letter facet fields, so:
<field name="collectionname">Technical and Automated Services Collection</field>
<field name="relation">rucore21016300001</field>
<field name="sortdate">sd:99990000--di:|dc:|cd:|do:--</field>
<field name="sortdate_i">99990000</field>
<field name="sorttitle">Zwaaf Katrina Collection</field>

I'll reindex the whole set now.
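The rule described in this comment can be sketched as follows. This is an illustration only: the function name `index_fields`, the `is_collection` flag, and the record layout are assumptions, not the indexer's real code.

```python
# Letter fields named earlier in the thread.
LETTER_FIELDS = ("titleletter_st", "authorletter_st", "schooldeptletter_st")

def index_fields(record, is_collection):
    """Return the fields to index, omitting letter fields for collection objects."""
    fields = {k: v for k, v in record.items() if k not in LETTER_FIELDS}
    if not is_collection:
        # Regular objects keep whichever letter fields they have.
        for name in LETTER_FIELDS:
            if name in record:
                fields[name] = record[name]
    return fields

record = {"sorttitle": "Zwaaf Katrina Collection", "titleletter_st": "Z"}
as_collection = index_fields(record, is_collection=True)
as_regular = index_fields(record, is_collection=False)
```

With the letter fields left out, a collection object stays searchable through its other fields but no longer contributes to the A-Z facet counts.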

#23

Assigned to:triggs» chadmills
Status:active» test

#24

Chad,

Not sure how I can test this. Can you test it, or provide instructions for testing? I will be happy to test it.

#25

Kalaivani,

If you go to the SOAR site, the browse-by options should have A-Z lists for Schools and Authors. I still need to work on Title once the searching is working completely as expected. So you can partially test for now if you want to make sure all three work.

#26

Works as expected.

#27

Status:test» fixed

#28

Status:fixed» closed
