Difference between revisions of "NLP"

From UFRC
Line 18: Line 18:
 
{{App_Description|app={{#var:app}}|url={{#var:url}}|name={{#var:app}}}}|}}
 
{{App_Description|app={{#var:app}}|url={{#var:url}}|name={{#var:app}}}}|}}
  
This page describes the collection of natural language processing software on HiPerGator. Research computing can provide help with language modeling for conversational AI, measurement, or classification tasks via [https://www.rc.ufl.edu/help/support-requests/ support requests] or [https://www.rc.ufl.edu/consultation-purchase-request-consultation/ consultations]. NVIDIA [https://github.com/NVIDIA/Megatron-LM Megatron] and [https://github.com/NVIDIA/NeMo NeMo] is open-source software using transformer neural networks that we suggest for applications and research. Example notebooks are in /data/ai/examples/text.
+
This page describes the collection of natural language processing software on HiperGator. NLP is involved in many other fields of AI, such as image recognition. Research computing will help with language modeling for conversational AI, measurement, classification tasks, etc. via [https://www.rc.ufl.edu/help/support-requests/ support requests] or [https://www.rc.ufl.edu/consultation-purchase-request-consultation/ consulting]. NVIDIA [https://github.com/NVIDIA/Megatron-LM Megatron] and [https://github.com/NVIDIA/NeMo NeMo] are open-source software using transformer neural networks that can scale to multiple nodes of GPUs. See the directory /data/ai for more information.
  
 
<!--Modules-->
 
 
==Environment Modules for NLP==
 
*'''nlp:''' module load nlp will provide a Python environment with pytorch, torchtext, nltk, spacy, sentencepiece for vocabulary training, transformers, sentence-transformers, Flair, BERTopic for topic modeling, RAPIDSai for data processing and machine learning algorithms, gensim, scikit-learn, and more.
+
*'''nlp:''' module load nlp will provide a Python environment with pytorch, torchtext, nltk, spacy, transformers, sentence-transformers, Flair, BERTopic for topic modeling, sentencepiece, RAPIDSai for data processing and machine learning algorithms, gensim, scikit-learn, and more.
  
  
Line 28: Line 28:
  
  
*'''pytorch:''' Note, use module spider pytorch to list the version we have available. Beyond stock pytorch versions, we have the Nvidia pytorch singularity container with the Apex optimizers required for Megatron-LM. Use module load ngc-pytorch to access this container, and you can run Megatron from source code.  
+
*'''pytorch:''' Note, use module spider pytorch to list the version we have available. Beyond stock pytorch versions, we have a Nvidia pytorch singularity container with the requirements for Megatron-LM. Use module load ngc-pytorch to access this container, and you can run Megatron from source code.  
  
  
Line 36: Line 36:
 
*'''parlai:''' Conversational AI framework by Facebook, includes a wide variety of models from 110M to 9B parameters.   
 
  
==Models and Examples==
+
==Examples==
Please see our /data/ai folder for examples, reference data, and pretrained language models.
+
Please see our /data/ai folder for examples, reference data, and pretrained language models. Notebooks and batch scripts cover everything from pretraining and inferencing to summarization, information extraction, and topic modeling. Some of the reference data is listed on the reference data page on this help site.
  
 
<!--Configuration-->
 

Revision as of 13:41, 10 June 2022

Description

This page describes the collection of natural language processing (NLP) software on HiPerGator. NLP is involved in many other fields of AI, such as image recognition. Research Computing can help with language modeling for conversational AI, measurement, classification, and similar tasks via support requests or consulting. NVIDIA Megatron and NeMo are open-source software built on transformer neural networks that can scale to multiple nodes of GPUs. See the directory /data/ai for more information.

Environment Modules for NLP

  • nlp: module load nlp will provide a Python environment with pytorch, torchtext, nltk, spacy, transformers, sentence-transformers, Flair, BERTopic for topic modeling, sentencepiece, RAPIDSai for data processing and machine learning algorithms, gensim, scikit-learn, and more.


  • nemo: module load nemo will provide a singularity container environment with Python and Nvidia NeMo. NeMo has NLP task training, plus speech-to-text and text-to-speech models.


  • pytorch: Use module spider pytorch to list the versions we have available. Beyond the stock pytorch versions, we have an Nvidia pytorch singularity container with the requirements for Megatron-LM. Use module load ngc-pytorch to access this container and run Megatron from source code.


  • spark-nlp: See our Spark help doc to start a Spark cluster. The Spark-nlp Python module is available in the tensorflow/2.4.1 module.


  • parlai: A conversational AI framework by Facebook that includes a wide variety of models from 110M to 9B parameters.
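
As a rough illustration of the nlp entry above, the following Python sketch checks which of the listed packages are importable in the current environment, for example after running module load nlp. The import names are assumptions: the page lists package names, and some packages import under a different name (scikit-learn imports as sklearn, sentence-transformers as sentence_transformers). The helper check_packages is hypothetical and not part of any module on HiPerGator.

```python
from importlib.util import find_spec

# Assumed import names for the packages listed under the nlp module.
# RAPIDSai is left out because it ships as several separate packages.
NLP_IMPORT_NAMES = [
    "torch", "torchtext", "nltk", "spacy", "transformers",
    "sentence_transformers", "flair", "bertopic", "sentencepiece",
    "gensim", "sklearn",
]


def check_packages(names):
    """Return {import_name: bool}, True if the top-level package can be found.

    find_spec only locates a top-level package; it does not import it,
    so the check stays fast even for large libraries.
    """
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(NLP_IMPORT_NAMES).items():
        print(f"{name:24s} {'ok' if found else 'MISSING'}")
```

A package reported as MISSING may simply live in a different module or container, such as nemo or ngc-pytorch.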

Examples

Please see our /data/ai folder for examples, reference data, and pretrained language models. Notebooks and batch scripts cover everything from pretraining and inference to summarization, information extraction, and topic modeling. Some of the reference data is listed on the reference data page on this help site.
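
One way to browse the notebooks and batch scripts mentioned above is a small Python sketch like the one below. It is only a sketch under stated assumptions: the file extensions (.ipynb, .sh, .sbatch) are guesses at how the examples are stored, and find_examples is a hypothetical helper, not an existing tool.

```python
from pathlib import Path

# Assumed extensions for notebooks and batch scripts; adjust as needed.
EXAMPLE_SUFFIXES = {".ipynb", ".sh", ".sbatch"}


def find_examples(root, suffixes=EXAMPLE_SUFFIXES):
    """Return a sorted list of files under root whose suffix is in suffixes."""
    root = Path(root)
    if not root.is_dir():
        # Missing or unreadable root: report nothing rather than fail.
        return []
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix in suffixes
    )


if __name__ == "__main__":
    for path in find_examples("/data/ai"):
        print(path)
```

Because /data/ai also holds reference data and pretrained models, a full recursive walk may be slow; pointing find_examples at a subdirectory keeps the listing short.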