Difference between revisions of "Data Science Platform"

From UFRC

Revision as of 18:11, 8 May 2024

Description

The Data Science Platform on HiPerGator includes several software environments and examples. Data science is the practice of using math, programming, analytics, AI, and machine learning to discover valuable insights within large data sets. Research Computing provides infrastructure, tools, and expertise to support data science research and accelerate discoveries. Help with other uses is available through support requests or consulting.

Platform for Data Science

  • SQL: module spider sqldeveloper and module spider sqlite. Oracle SQL Developer is a free integrated development environment that simplifies the development and management of Oracle Database in both traditional and cloud deployments. SQLite is an in-process library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine.
  • R: module spider R. R is a free software environment for statistical computing and graphics. It is favored in data science for its extensive suite of tools and packages for data manipulation, statistical modeling, and visualization, which suits tasks ranging from simple data analysis to complex predictive modeling.
  • Python: module spider python. The Python module is built with widely used libraries such as NumPy, SciPy, pandas, Matplotlib, Scrapy, and BeautifulSoup, which are staples of day-to-day data science work.
  • TensorFlow: module spider tensorflow. TensorFlow is an open-source software library commonly used for implementing artificial neural networks and deep learning. It is widely used in data science for building and training complex machine learning models, and offers scalable, flexible tools for deep learning, numerical computation, and large-scale optimization, with support for both research and production deployments.
  • PyTorch: module spider pytorch. PyTorch is a Python-based scientific computing package that uses the power of graphics processing units (GPUs). It can be used effectively in various data science applications, especially those that involve complex numerical computations or the development of custom machine learning models.
  • Scikit-learn: module spider scikit-learn. Scikit-learn is a set of Python modules for machine learning and data mining that provides simple and efficient tools for predictive data analysis. It is built on NumPy, SciPy, and Matplotlib, and is open source and commercially usable under a BSD license.
  • Rapidsai: module spider rapidsai. RAPIDS accelerates end-to-end data science pipelines by providing a familiar dataframe API. It supports machine learning integration without typical serialization costs and enables multi-node, multi-GPU deployments for faster processing of large datasets.
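
As a rough illustration of what "in-process, serverless, zero-configuration, transactional" means for SQLite in the list above, the sketch below uses Python's standard-library sqlite3 module with an in-memory database. The table and column names (jobs, name, cores) are made up for illustration, and the sketch assumes only a Python interpreter rather than any particular HiPerGator module.

```python
import sqlite3


def demo_transactions():
    """Return the rows left after one committed and one rolled-back transaction."""
    # ":memory:" creates a private in-process database: no server, no config file.
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table used only for this illustration.
        conn.execute("CREATE TABLE jobs (name TEXT PRIMARY KEY, cores INTEGER)")

        # Using the connection as a context manager commits on success...
        with conn:
            conn.execute("INSERT INTO jobs VALUES (?, ?)", ("train", 8))

        # ...and rolls back if the block raises, so the "eval" row is discarded
        # together with the failing duplicate-key insert.
        try:
            with conn:
                conn.execute("INSERT INTO jobs VALUES (?, ?)", ("eval", 2))
                conn.execute("INSERT INTO jobs VALUES (?, ?)", ("train", 16))
        except sqlite3.IntegrityError:
            pass

        return conn.execute("SELECT name, cores FROM jobs ORDER BY name").fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo_transactions())
```

Only the row from the first, committed transaction remains; the second transaction is undone as a unit when the duplicate key is rejected.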

If these environments do not include the libraries you require, you may need to create your own Conda environment. See Conda and Managing_Python_environments_and_Jupyter_kernels for more details.
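
One way to decide whether a custom Conda environment is needed is to check which libraries can be imported in the environment you currently have loaded. The following is a minimal sketch using only the Python standard library. The function name missing_modules and the list of import names are illustrative assumptions, not part of any HiPerGator tooling. Note that import names can differ from package names (for example, scikit-learn is imported as sklearn).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Illustrative list of import names for libraries mentioned on this page.
    wanted = ["numpy", "scipy", "pandas", "matplotlib", "sklearn", "torch"]
    gaps = missing_modules(wanted)
    if gaps:
        print("Not available here:", ", ".join(gaps))
    else:
        print("All requested libraries are importable.")
```

Running the script after loading a module shows which of the requested libraries that environment lacks. Anything it reports as missing is a candidate for a custom Conda environment.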

Examples and Reference Data

Please see the /data/ai/ folder, AI_Examples, and AI_Reference_Datasets for helpful resources. Additional references, such as how to run RAPIDS on HiPerGator, are available in /data/ai/examples/rapids.