NLP

From UFRC

Revision as of 17:22, 25 August 2022

Description

This page describes natural language processing (NLP) software and resources on HiperGator. NLP techniques also play a role in other fields of AI, such as image recognition. Research Computing can help with language modeling for knowledge exploration, measurement, classification, summarization, conversational AI, or other uses via support requests or consulting.

Environment Modules for NLP

  • nlp: module load nlp provides a Python environment with PyTorch, torchtext, NLTK, spaCy, Transformers, sentence-transformers, Flair, BERTopic for topic modeling, SentencePiece, RAPIDS for GPU data processing and machine learning algorithms, gensim, scikit-learn, and more.


  • ngc-pytorch: module load ngc-pytorch provides a Singularity container with a Python environment that includes PyTorch and the NVIDIA Apex optimizers required for Megatron-LM. Research Computing provides pretrained, large-parameter Megatron language models to HiperGator users. See /data/ai/examples/nlp or AI_Examples for more information.


  • Flair NLP: See FlairNLP for more information.


  • nemo: module load nemo provides a Singularity container environment with Python and NVIDIA NeMo. NeMo supports training for NLP tasks, includes speech-to-text and text-to-speech models, and can use your own pretrained Megatron language models.


  • pytorch or tensorflow: Use module spider pytorch or module spider tensorflow to list the available versions. If neither the nlp environment nor these environments have the libraries you require, you may need to create a Conda environment. See Conda and Managing_Python_environments_and_Jupyter_kernels for more details.


  • spark-nlp: See our Spark help doc to start a Spark cluster. The Spark NLP Python module is available in the tensorflow/2.4.1 module.


  • parlai: ParlAI is a conversational AI framework from Facebook that includes a wide variety of models, ranging from 110M to 9B parameters.
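After loading the nlp or ngc-pytorch module, it can help to confirm that the libraries you plan to use are actually importable and that PyTorch sees a GPU. The script below is a minimal sketch, not an official Research Computing tool: the display-name-to-import-name mapping (for example, RAPIDS checked via cuDF, scikit-learn via sklearn, Apex via apex) is an assumption based on the usual PyPI packages, and the helper names check_packages and gpu_summary are hypothetical.

```python
import importlib
import importlib.util

# Display name -> top-level import name. These import names are the usual
# PyPI ones and are an ASSUMPTION, not taken from the module docs; RAPIDS
# is represented here by its cuDF package.
NLP_PACKAGES = {
    "PyTorch": "torch",
    "torchtext": "torchtext",
    "NLTK": "nltk",
    "spaCy": "spacy",
    "Transformers": "transformers",
    "sentence-transformers": "sentence_transformers",
    "Flair": "flair",
    "BERTopic": "bertopic",
    "SentencePiece": "sentencepiece",
    "RAPIDS (cuDF)": "cudf",
    "gensim": "gensim",
    "scikit-learn": "sklearn",
}

# Packages Megatron-LM relies on in the ngc-pytorch container (assumed names).
NGC_PYTORCH_PACKAGES = {"PyTorch": "torch", "NVIDIA Apex": "apex"}


def check_packages(packages):
    """Map each display name to True if its module can be located.

    importlib.util.find_spec only locates the module without importing it,
    so the check stays fast even for large libraries.
    """
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in packages.items()
    }


def gpu_summary():
    """Return (cuda_available, gpu_count), or None if PyTorch is absent."""
    if importlib.util.find_spec("torch") is None:
        return None
    torch = importlib.import_module("torch")
    return torch.cuda.is_available(), torch.cuda.device_count()


if __name__ == "__main__":
    for label, packages in (("nlp", NLP_PACKAGES),
                            ("ngc-pytorch", NGC_PYTORCH_PACKAGES)):
        print(f"== {label} ==")
        for name, found in check_packages(packages).items():
            print(f"  {name:24s} {'found' if found else 'MISSING'}")
    print("GPU (CUDA available, device count):", gpu_summary())
```

Saved under a hypothetical name such as check_env.py, it can be run with python check_env.py in a session where the module is loaded (for ngc-pytorch, from inside the container). A MISSING entry suggests you need a different module or your own Conda environment. Note that an unrelated PyPI package is also named apex, so a "found" result for Apex is only a rough signal.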


Examples and Reference Data

See the /data/ai/ folder, AI_Examples, and AI_Reference_Datasets for helpful resources. Notebooks and batch scripts cover everything from pretraining and inference to summarization, information extraction, and topic modeling. Additional reference data, including benchmarks such as the popular SuperGLUE, is already available in /data/ai/benchmarks/nlp.
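To get an overview of what the example folders contain, a short directory walk is often enough. This is a minimal sketch under stated assumptions: it supposes notebooks use the .ipynb extension and batch scripts use .sh or .slurm (that suffix list and the helper name find_examples are hypothetical), and it defaults to the /data/ai/examples/nlp folder mentioned above.

```python
from pathlib import Path

# ASSUMED file types: Jupyter notebooks (.ipynb) and shell/SLURM batch
# scripts (.sh, .slurm). Adjust to what the folder actually contains.
EXAMPLE_SUFFIXES = (".ipynb", ".sh", ".slurm")


def find_examples(root="/data/ai/examples/nlp", suffixes=EXAMPLE_SUFFIXES):
    """Return matching files under root as sorted, root-relative POSIX paths.

    Returns an empty list if root does not exist (e.g. off HiperGator).
    """
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(
        p.relative_to(base).as_posix()
        for p in base.rglob("*")
        if p.is_file() and p.suffix in suffixes
    )


if __name__ == "__main__":
    for rel_path in find_examples():
        print(rel_path)
```

Passing a different root, such as /data/ai/benchmarks/nlp, together with a suffix tuple that matches the file formats actually stored there, lets the same helper survey the reference data; those formats are not documented on this page.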