Escaping special characters

Project:RUcore/NJDH/Partner Portal Search
Version:6.1
Component:API - Search API
Category:bug report
Priority:normal
Assigned:ananthan
Status:closed
Description

The following special characters are always escaped before the query is submitted to Solr. The escaping happens in the background.

+
-
&&
||
!
(
)
{
}
[
]
^
~
:
\
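
As a rough illustration of the escaping described above, here is a minimal Python sketch. It is not the Search API's actual code: the function name `escape_special` is hypothetical, and prefixing the two-character operators && and || with a single backslash is an assumption about how they are handled.

```python
# Single-character specials from the list above (hypothetical helper, not the real API).
SPECIAL_CHARS = set('+-!(){}[]^~:\\')


def escape_special(query: str) -> str:
    """Backslash-escape the special characters listed in this issue."""
    out = []
    i = 0
    while i < len(query):
        pair = query[i:i + 2]
        if pair in ("&&", "||"):
            # Assumption: the two-character operators get one leading backslash.
            out.append("\\" + pair)
            i += 2
        elif query[i] in SPECIAL_CHARS:
            out.append("\\" + query[i])
            i += 1
        else:
            out.append(query[i])
            i += 1
    return "".join(out)


print(escape_special("dogs+cats"))
print(escape_special("title:(maps)"))
```

Double quotes are deliberately left out of this sketch because they follow a separate rule.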

Double quotes need special treatment. If a single " appears in a query string, it needs escaping. If two appear, no escaping is needed.

Comments

#1

Assigned to:chadmills» rmarker
Status:active» test

The resolution to the double quote issue: if an even number of double quotes is submitted in the search, I do not escape them. If an odd number is submitted, I escape all of them. Without this, Solr cannot handle the query and errors out.
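
The even/odd rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this comment; `escape_quotes` is a hypothetical name, not a function from the Search API.

```python
def escape_quotes(query: str) -> str:
    """Leave balanced double quotes alone; escape all of them if the count is odd."""
    if query.count('"') % 2 == 0:
        return query  # even count (including zero): phrase search stays intact
    return query.replace('"', '\\"')  # odd count: escape every double quote


print(escape_quotes('"maps text"'))
print(escape_quotes('maps "text'))
```

Note that counting quotes does not check that they are sensibly paired, only that Solr will not see an unterminated phrase.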

Please test phrase searching using double quotes. You can also test wildcard searches using the * sign. Finally, proximity searching can be tested using the ~ operator immediately followed by a numeric value.

Proximity example 9 words apart in a title search returns one result:

http://rep-test.libraries.rutgers.edu/rucore/search/results.php?key=ETD-RU&q1=%22Philosophies+genre%22+~9&q1field=mods%3AtitleInfo

Proximity example 8 words apart in a title search returns no results:

http://rep-test.libraries.rutgers.edu/rucore/search/results.php?key=ETD-RU&q1=%22Philosophies+genre%22+~8&q1field=mods%3AtitleInfo
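
For anyone who wants to generate more test URLs of the same shape, here is a small Python sketch. The parameter names (key, q1, q1field) and the base URL come from the two examples above; the helper name `proximity_url` and its defaults are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "http://rep-test.libraries.rutgers.edu/rucore/search/results.php"


def proximity_url(phrase: str, distance: int,
                  field: str = "mods:titleInfo", key: str = "ETD-RU") -> str:
    """Build a search URL for a quoted phrase followed by ~distance."""
    params = {
        "key": key,
        "q1": f'"{phrase}" ~{distance}',  # same form as the test queries above
        "q1field": field,
    }
    return BASE + "?" + urlencode(params)


url = proximity_url("Philosophies genre", 9)
print(url)
# Decoding the query string recovers the original search terms.
print(parse_qs(urlsplit(url).query)["q1"])
```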

#2

Assigned to:rmarker» chadmills
Status:test» active

Proximity search did not work. Search of
maps ~10 text
within Scholarly Collections portal returned this error:

Warning: SolrClient::query() [solrclient.query]: in /mellon/htdocs/rucore/api/search/lib/class.query.solr.php on line 256

Fatal error: Uncaught exception 'SolrClientException' with message 'Unsuccessful query request : Response Code 400. <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/> <title>Error 400 org.apache.lucene.queryParser.ParseException: Cannot parse '(metadata:maps ~10 text) AND portalkey:(scholarship) NOT type:(collection)': Minimum similarity for a FuzzyQuery has to be between 0.0f and 1.0f !</title> </head> <body><h2>HTTP ERROR 400</h2> <p>Problem accessing /solr/select/. Reason: <pre> org.apache.lucene.queryParser.ParseException: Cannot parse '(metadata:maps ~10 text) AND portalkey:(scholarship) NOT type:(collection)': Minimum similarity for a FuzzyQuery has to be between 0.0f and 1.0f !</pre></p><hr /><i><small>Powered by Jetty://</small></i><br/> <br/> <br/> <br/> <br/> in /mellon/htdocs/rucore/api/search/lib/class.query.solr.php on line 256

#3

Status:active» test

Currently to perform a proximity search the syntax is:

"terms as a phrase" ~[number of words apart]

In the submitted test query that would translate to:

"maps text" ~10

This might not be ideal; it is just what Solr expects out of the box. The reported errors should not have happened. They were caused by the missing double quotes. I have added some checking for double quotes, which will hopefully prevent the error. When a query is submitted without double quotes, the proximity operator is escaped behind the scenes and is therefore ignored by Solr.
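
One way to picture the check is the Python sketch below. It is an assumption-laden illustration, not the code that was deployed: the regular expression, the tolerance for a space before the ~, and the name `escape_stray_tildes` are all hypothetical.

```python
import re

# A quoted phrase, optional whitespace, "~", then digits, e.g. "maps text" ~10
PROXIMITY = re.compile(r'"[^"]+"\s*~\d+')


def escape_stray_tildes(query: str) -> str:
    """Escape every ~ that is not part of a "phrase" ~N proximity expression."""
    out = []
    pos = 0
    for match in PROXIMITY.finditer(query):
        # Text before the proximity expression: escape any ~ found there.
        out.append(query[pos:match.start()].replace("~", "\\~"))
        # The proximity expression itself is passed through untouched.
        out.append(match.group(0))
        pos = match.end()
    out.append(query[pos:].replace("~", "\\~"))
    return "".join(out)


print(escape_stray_tildes('maps ~10 text'))
print(escape_stray_tildes('"maps text" ~10'))
```

The sketch assumes the input has not been escaped already; running it on an already-escaped query would add a second backslash.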

In general, we should discuss the best syntax for our users to perform proximity searches, as I think the out-of-the-box Solr syntax might not fit our users' expectations well.

#4

Assigned to:chadmills» rmarker

#5

All of the escaped characters appear to work properly except ~.

A (normal) search of
dogs cats
yields 3 records. A search of simply
dogs
yields 11 records. A search of
dogscats
yields no records.

A search of two terms with any of these special characters between them with no space
dogs+cats
dogs-cats
dogs!cats
dogs~cats
etc. yields 1 record, consistently.

A search of two terms with any of these special characters between them with a space before, after, or before and after
dogs +cats
dogs + cats
dogs -cats
dogs - cats
dogs !cats
dogs! cats
etc. yields 3 records. This is expected.

A search of one of these special characters at the beginning of the search term
!dogs
^dogs
(dogs
(dogs)
etc. yields 11 records. This is expected.

However, a search of the special character ~ beginning the search term
~dogs
~dogs cats
yields no records, unlike any of the other searches.

#6

Assigned to:rmarker» chadmills

#7

Assigned to:chadmills» rmarker

Behind the scenes, the ~ was being escaped twice in some cases; '~dogs' and '~dogs cats' were among them. It should not be escaped twice any more. Please test using the previous cases.
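
To show what escaping twice does to the query string, here is a short Python sketch. It does not reproduce the actual bug or its fix; both function names are hypothetical, and the lookbehind guard is just one possible way to make the step safe to repeat.

```python
import re


def escape_tilde_naive(query: str) -> str:
    """Escape every ~, even one that already has a backslash in front of it."""
    return query.replace("~", "\\~")


def escape_tilde_once(query: str) -> str:
    """Escape only a ~ that is not already preceded by a backslash."""
    return re.sub(r"(?<!\\)~", r"\\~", query)


once = escape_tilde_naive("~dogs")
twice = escape_tilde_naive(once)
print(once)   # one backslash before the ~
print(twice)  # a second backslash has been added
print(escape_tilde_once(once) == once)  # the guarded version leaves it unchanged
```

With the second backslash in place, the query Solr receives no longer means a plain literal ~ followed by dogs. The guard is simplistic: it does not distinguish an escaped backslash that happens to be followed by an unescaped ~.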

Thanks.

#8

Assigned to:rmarker» ananthan
Status:test» fixed

Searched ~dogs and ~dogs cats, and got the same search results as !dogs and !dogs cats, etc. Verified fixed.

#9

Status:fixed» closed
