...

To create the full-text versions of documents that allow for this, DSpace uses a media filter that runs nightly and extracts text (via OCR) from newly deposited documents.

By default, these filters on TDL-hosted repositories extract the first 200,000 characters of text from each document and index them for full-text searching. This default is sufficient for most documents in most repositories, but it may miss text in very long documents (for example, large yearbooks or newspapers). Additionally, DSpace 7.6 contains a bug that affects the full-text indexing of large documents: if the media filter hits the 200,000-character limit, it fails and does not index any of the text for that item.
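
The difference between the intended truncation and the failure described above can be illustrated with a small Python sketch. This is not DSpace code: the function names are hypothetical, and treating "hits the limit" as "the text is at least 200,000 characters long" is an assumption made for illustration only.

```python
# Illustrative sketch only; these functions are hypothetical, not DSpace's API.
MAX_CHARS = 200_000  # limit stated above for TDL-hosted repositories


def extract_intended(text: str, max_chars: int = MAX_CHARS) -> str:
    """Intended behavior: keep only the first max_chars characters."""
    return text[:max_chars]


def extract_with_bug(text: str, max_chars: int = MAX_CHARS) -> str:
    """Behavior described for DSpace 7.6: reaching the limit indexes nothing.

    Assumption: the limit is 'hit' when the text has max_chars or more characters.
    """
    if len(text) >= max_chars:
        return ""  # extraction fails, so no text is indexed for the item
    return text


long_document = "a" * 250_000   # e.g. a large yearbook
short_document = "a" * 1_000    # a typical short document

print(len(extract_intended(long_document)))   # truncated to the limit
print(len(extract_with_bug(long_document)))   # nothing indexed
print(len(extract_with_bug(short_document)))  # unaffected
```

Under these assumptions, a short document is indexed in full either way, while a very long document goes from being partially searchable to not being full-text searchable at all.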

...