Difference between revisions of "NLP"


Revision as of 20:07, 20 August 2021

Description

This page describes the collection of natural language processing software on HiPerGator. Research Computing can help with language modeling for conversational AI, measurement, or classification tasks via support requests (https://www.rc.ufl.edu/help/support-requests/) or consultations (https://www.rc.ufl.edu/consultation-purchase-request-consultation/). NVIDIA Megatron (https://github.com/NVIDIA/Megatron-LM) and NeMo (https://github.com/NVIDIA/NeMo) are open-source software built on transformer neural networks, and we suggest them for applications and research. Example notebooks are in /data/ai/examples/text.

Environment Modules for NLP

  • nlp: module load nlp provides a Python environment with PyTorch, torchtext, NLTK, spaCy, SentencePiece for vocabulary training, transformers, sentence-transformers, BERTopic for topic modeling, RAPIDS for data processing and machine learning algorithms, gensim, scikit-learn, and more.


  • nemo: module load nemo provides a Singularity container environment with Python and NVIDIA NeMo. NeMo supports training for NLP tasks, plus speech-to-text and text-to-speech models.


  • pytorch: Use module spider pytorch to list the versions available. Beyond the stock PyTorch versions, we provide the NVIDIA PyTorch Singularity container, which includes the Apex optimizers required for Megatron-LM. Use module load ngc-pytorch to access this container, from which you can run Megatron from source code.


  • spark-nlp: See our Spark help doc to start a Spark cluster. The spark-nlp Python module is available in tensorflow/2.4.1.


  • parlai: A conversational AI framework by Facebook that includes a wide variety of models, ranging from 110M to 9B parameters.
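
After loading one of the modules above, it can be useful to confirm which of the listed packages Python can actually see. The following is a minimal sketch, not an official UFRC tool. The helper name check_packages and both package lists are hypothetical, and the import names (for example sklearn for scikit-learn, and cudf as a stand-in for RAPIDS) are assumptions about how these packages are exposed in the environment.

```python
import importlib.util


def check_packages(names):
    """Return {import_name: bool} telling whether each package can be located."""
    found = {}
    for name in names:
        try:
            # find_spec locates a top-level package without importing it.
            found[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for malformed or half-initialized modules.
            found[name] = False
    return found


# Assumed import names for the packages listed under the nlp module.
NLP_PACKAGES = [
    "torch", "torchtext", "nltk", "spacy", "sentencepiece",
    "transformers", "sentence_transformers", "bertopic",
    "cudf",  # one RAPIDS library, used here as a stand-in for RAPIDS
    "gensim", "sklearn",
]

# Assumed import names for the ngc-pytorch container (PyTorch plus Apex).
NGC_PYTORCH_PACKAGES = ["torch", "apex"]

if __name__ == "__main__":
    for name, ok in check_packages(NLP_PACKAGES).items():
        print(f"{name:24s} {'found' if ok else 'MISSING'}")
```

Run the script after module load nlp to check that environment, or pass NGC_PYTORCH_PACKAGES to check_packages inside the ngc-pytorch container. The check only locates packages and does not import them, so it will not detect a package that is installed but broken.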