Abstract - character translation

Category:bug report

Some characters do not translate correctly in the abstract field. I am not sure if this is due to students copying and pasting from a PDF document. Here is an example:

<mods:abstract>Generic Subversions: De-Formations of Character in the Popular Imagination
by Theresa L. Geller
Dissertation Director:
Richard Dienst
Genres rely on audience expectation&#38;#08212;its implicit &#38;#08220;contract&#38;#08221;&#38;#08212;to do their narrative work, part



Also, if the student copies and pastes the abstract from a PDF, some characters are translated incorrectly. Example:

efficient -> eccient (the second c appears as a cent sign; I could not find this character in the character map on my PC)


Oh, this is so funny! I copy/pasted from the initial post and asked this question:

What should the text in the initial post really look like?

expectation&#08212;its implicit &#08220;contract&#08221;&#08212;to

When I previewed my comment, the CORRECT characters displayed instead of the garbled ones! I'm going to send in this comment anyway.



cm2 -> should be cm with a superscript 2 (cm to the power of 2)
VBR2/RON,SP -> BR should be a subscript after V; 2 should be a superscript on BR or R; ON,SP should be a subscript after R.


Status:active» fixed

Removed much of the twisted logic that was in place. Will need to test and move forward slowly.


Version:2.1.0» 2.2.0
Status:fixed» test


Status:test» active

Some characters are still translated incorrectly when copied and pasted from a PDF document into the abstract field when submitting an ETD.


A PDF and a Word doc are attached for testing the character problems. The Word doc has the problematic characters in the abstract that caused problems when importing into WMS.


Status:active» test

It seems the troublesome characters were ASCII control characters (decimal values below 32). To remedy this, characters identified as control codes are replaced with a space during XML file creation.

A new function, replaceAsciiControlCodes(), was added to 'library/export_functions.php'. It checks the text character by character with ctype_cntrl() to determine whether each character needs to be replaced.
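
For anyone following along, the character-by-character replacement described above might look roughly like the sketch below. This is not the actual code from export_functions.php, which is not shown in this thread; the name replaceAsciiControlCodesSketch() and its signature are assumptions made for illustration. Note that ctype_cntrl() depends on the current locale, and in the default C locale it also treats tab, newline, and DEL (127) as control characters, so those would become spaces too.

```php
<?php
// Hypothetical sketch only: the real replaceAsciiControlCodes() in
// library/export_functions.php is not shown in the thread.
function replaceAsciiControlCodesSketch(string $text): string
{
    $out = '';
    $length = strlen($text);

    // Walk the string one byte at a time.
    for ($i = 0; $i < $length; $i++) {
        $char = $text[$i];

        // ctype_cntrl() is true for control characters in the current locale;
        // each such byte is swapped for a single space.
        $out .= ctype_cntrl($char) ? ' ' : $char;
    }

    return $out;
}
```

Because the loop works on bytes, this assumes a locale in which the bytes of multi-byte UTF-8 characters are not classified as control characters.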

For reference:

http://www.asciitable.com/

http://us2.php.net/manual/en/function.ctype-cntrl.php


Status:test» active

I copied the abstract from a "real" ETD and pasted it into the RUetd abstract field. When I exported this ETD from RUetd for WMS testing, I got the following warning, printed four times:

Warning: htmlentities(): charset `ANSI_X3.4-1968' not supported, assuming iso-8859-1 in /srv/www/htdocs/etd/etd_test/library/export_functions.php on line 269

The abstract metadata is missing from the exported XML.


Status:active» test

Sorry about that. I set the default character set to UTF-8 and tested this with the examples you provided. No errors should be reported now. Please test once again.
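
The earlier warning suggests htmlentities() was taking its charset from the server locale (ANSI_X3.4-1968 is a name for US-ASCII). One way to avoid depending on that default is to pass the charset explicitly on each call. A minimal sketch, assuming the abstract text is already valid UTF-8; encodeAbstractSketch() is a hypothetical helper, not the code at line 269 of export_functions.php, and the actual fix may have been made elsewhere (for example in configuration).

```php
<?php
// Hypothetical helper for illustration; not the actual export code.
function encodeAbstractSketch(string $abstract): string
{
    // Passing 'UTF-8' explicitly means the result no longer depends on
    // the locale or on the configured default charset.
    return htmlentities($abstract, ENT_QUOTES, 'UTF-8');
}
```

One caveat: htmlentities() emits HTML named entities, which a plain XML parser does not recognise unless they are declared, so htmlspecialchars() with the same arguments may be a safer choice if the output goes straight into XML.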


Status:test» fixed

I was able to export the ETDs that previously gave the error message. I will test the WMS import tomorrow.


Status:fixed» closed

The trouble characters are being replaced with a space. The import works OK.
