Textract


Latest revision as of 13:55, 21 October 2022

Description

textract website  

As undesirable as it might be, extremely useful information is often embedded in Word documents, PowerPoint presentations, PDFs, and similar files. This so-called “dark data” would be valuable for further textual analysis and visualization. While several packages exist for extracting content from each of these formats individually, textract provides a single interface for extracting content from any supported file type, without any irrelevant markup.
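The single interface described above can be sketched in Python as follows. This is a minimal illustration, assuming the textract Python package is importable once the environment module is loaded. `textract.process` is the package's documented entry point; the helper name `extract_text` and the file name in the usage comment are hypothetical.

```python
def extract_text(path, encoding="utf-8"):
    """Return the plain text that textract extracts from `path`."""
    # Imported lazily so the helper can be defined before the module is loaded.
    import textract

    # textract.process picks a parser based on the file extension and
    # returns the extracted text as bytes (UTF-8 encoded by default).
    raw = textract.process(path)
    return raw.decode(encoding)


# Hypothetical usage; "report.docx" is a placeholder file name:
# text = extract_text("report.docx")
# print(text[:200])
```

The same call is used regardless of whether the input is a Word document, a presentation, or a PDF, which is the point of the package.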

Environment Modules

Run module spider textract to find out what environment modules are available for this application.

System Variables

  • HPC_TEXTRACT_DIR - installation directory
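As a small sketch of how the variable above might be used from a script, the following reads `HPC_TEXTRACT_DIR` and returns it as a path. It assumes the variable is only set after the textract module has been loaded; the function name `textract_install_dir` is hypothetical.

```python
import os
from pathlib import Path


def textract_install_dir(environ=None):
    """Return the textract installation directory, or None if it is unset."""
    # Allow a custom mapping to be passed in; default to the real environment.
    env = os.environ if environ is None else environ
    value = env.get("HPC_TEXTRACT_DIR")
    # The variable is expected to be missing when the module is not loaded.
    return Path(value) if value else None
```

A return value of `None` is a hint that the environment module has not been loaded in the current session.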