Difference between revisions of "Globus"

From UFRC
Line 3: Line 3:
  
 
Globus uses a grid-FTP network which uses the power of multiple servers to simultaneously transfer data.
 
Globus uses a grid-FTP network which uses the power of multiple servers to simultaneously transfer data.
 +
 +
UFRC maintains a managed Globus setup with multiple servers for the highest available bandwidth and filesystem throughput. The following managed endpoints are available at UFRC:
 +
* Globus Version 4: 'ufrc#hpg2'
 +
* Globus Version 5: 'UFRC HiperGator'
  
 
= Getting Started =
 
= Getting Started =
Line 9: Line 13:
  
 
==UFRC Globus Group==
 
==UFRC Globus Group==
;Note<nowiki>:</nowiki> this is not needed if you are transferring to ufrc#hpg2 or any other ''managed'' end-point.  
+
;Note<nowiki>:</nowiki> this is not needed if you are transferring to a collection created under a managed endpoint. To be able to create shared end-points (see below) on your local computer if the transfer will be happening to a '''non-managed endpoint''', e.g. another local computer with a personal end-point, you will need to have Globus Plus User status. To obtain that status please log into the [https://www.globus.org Globus Interface], click on the ''Groups'' menu at the top and select ''Search For Groups''. Search for 'University of Florida Research Computing' and request access to the group. Once approved, you will have Globus Plus User status when running 'Globus Connect Personal' software on your local computer.
To be able to create shared end-points (see below) on your local computer if the transfer will be happening to a '''non-managed endpoint''' e.g. another local computer with a personal end-point you will need to have Globus Plus User status. To obtain that status please log into the [https://www.globus.org Globus Interface], click on ''Groups'' menu at the top and select ''Search For Groups''. Search for 'University of Florida Research Computing' and access request to the group. Once approved, you will have Globus Plus User status when running 'Globus Connect Personal' software on your local computer.
 
  
 
==Globus Endpoints==
 
==Globus Endpoints==
Globus transfers files between two ''endpoints''. An endpoint is one of the two file transfer locations – either the source or the destination – between which files can move. Once a resource (server, cluster, storage system, laptop, or other system) is defined as an endpoint, it will be available to authorized users who can transfer files to or from this endpoint.
+
Globus transfers files between two ''endpoints'' or shared collections created under the endpoints. An endpoint is one of the two file transfer locations – either the source or the destination – between which files can move. Once a resource (server, cluster, storage system, laptop, or other system) is defined as an endpoint, it will be available to authorized users who can transfer files to or from this endpoint.
 
 
Globus endpoints are named using the following format: <globus-online-username>#<endpoint-name>. For example, Research Computing has an account under the username "ufrc" and so its endpoint is named ufrc#hpg2. Likewise, an individual that has a Globus account under the username "jdoe" might have a personal endpoint called jdoe#mylaptop.
 
  
===UF Research Computing Endpoint===
+
===UF Research Computing Endpoints===
ufrc#hpg2 -- The primary Globus endpoint for /home, /ufrc and /orange filesystems.
+
The managed Globus endpoints for /home, /blue, and /orange filesystems are
 +
* Globus Version 4: 'ufrc#hpg2'
 +
* Globus Version 5: 'UFRC HiperGator'
 
   
 
   
A valid UF Research Computing account is required to access this endpoint. If you do not have a Research Computing account, you may [http://www.rc.ufl.edu/help/account-request/ request one here]. Please note that your username to activate a Globus endpoint is the same as the GatorLink username, and the password is the same as your GatorLink password. If you are having problems with your password, please submit a support request through the [http://helpdesk.ufl.edu/ UF Computing Help Desk].
+
A valid UF Research Computing account is required to access these endpoints. If you do not have a Research Computing account, you may [http://www.rc.ufl.edu/help/account-request/ request one here]. Please note that your username to activate a Globus endpoint is the same as the GatorLink username, and the password is the same as your GatorLink password. If you are having problems with your password, please submit a support request through the [http://helpdesk.ufl.edu/ UF Computing Help Desk].
  
===Shared Endpoints===
+
===Shared Collections===
UFRC users have Globus Plus user status, so they are able to create ''shared'' endpoints, which do not require a Research Computing account to connect to. In this mode Globus acts as a secure high-performance equivalent of Dropbox and other similar services. See [https://docs.globus.org/how-to/share-files/ Globus Sharing documentation] to learn how to create shared endpoints in ''just a few seconds''.
+
UFRC users automatically have Globus Plus user status on UFRC managed endpoints, so they are able to create ''shared collections'', which do not require a Research Computing account to connect to, only a Globus user account. In this mode Globus acts as a secure high-performance equivalent of Dropbox and other similar services. See [https://docs.globus.org/how-to/share-files/ Globus Sharing documentation] to learn how to create shared collections.
  
==Transferring Data Between Endpoints==
+
==Transferring Data==
To transfer data between two endpoints, log in to your account at [https://globus.org Globus.org]. In the Quick Links drop down, select Transfer Files. Enter the endpoint information and, if needed, authentication information for each endpoint. The example below, shows settings for a transfer between ufrc#go and the FSU Research Computing endpoint fsurc#lustre
+
To transfer data between two endpoints or shared collections, log in to your Globus account at [https://globus.org Globus.org]. In the Quick Links drop down, select Transfer Files. Enter the endpoint information and, if needed, authentication information for each endpoint. The example below shows settings for a transfer between ufrc#go and the FSU Research Computing endpoint fsurc#lustre.
 
[[File:Globus transfer.png|thumbnail]]
 
[[File:Globus transfer.png|thumbnail]]
  
After authenticating at both endpoints, navigate to the folder and select the files to transfer and click the arrow corresponding to the direction of the transfer. The transfer will be scheduled and you will receive an email when the transfer has completed.
+
After activating both endpoints, navigate to the folder and select the files to transfer and click the arrow corresponding to the direction of the transfer. The transfer will be scheduled and you will receive an email when the transfer has completed.
  
 
==Advanced Transfer Options ==
 
==Advanced Transfer Options ==

Revision as of 19:17, 2 February 2021

Globus is an easy-to-use, high-performance data transfer tool developed by the Computation Institute, the University of Chicago and Argonne National Laboratory. UF Research Computing has deployed Globus as one mechanism to facilitate data transfer to and from HiPerGator.

Globus uses a grid-FTP network which uses the power of multiple servers to simultaneously transfer data.

UFRC maintains a managed Globus setup with multiple servers for the highest available bandwidth and filesystem throughput. The following managed endpoints are available at UFRC:

  • Globus Version 4: 'ufrc#hpg2'
  • Globus Version 5: 'UFRC HiperGator'

Getting Started

Getting a Globus Account

Globus will redirect you to UF GatorLink authentication when you log into Globus.org. If you created your globus.org account before Globus started using the new authentication you'll need to link your old globus @globusid.org account with your @ufl.edu account.

UFRC Globus Group

Note: this is not needed if you are transferring to a collection created under a managed endpoint. To create shared end-points (see below) on your local computer for transfers to a non-managed endpoint, e.g. another local computer with a personal end-point, you will need Globus Plus User status. To obtain that status, log into the Globus Interface, click on the Groups menu at the top and select Search For Groups. Search for 'University of Florida Research Computing' and request access to the group. Once approved, you will have Globus Plus User status when running 'Globus Connect Personal' software on your local computer.

Globus Endpoints

Globus transfers files between two endpoints or shared collections created under the endpoints. An endpoint is one of the two file transfer locations – either the source or the destination – between which files can move. Once a resource (server, cluster, storage system, laptop, or other system) is defined as an endpoint, it will be available to authorized users who can transfer files to or from this endpoint.

UF Research Computing Endpoints

The managed Globus endpoints for /home, /blue, and /orange filesystems are

  • Globus Version 4: 'ufrc#hpg2'
  • Globus Version 5: 'UFRC HiperGator'

A valid UF Research Computing account is required to access these endpoints. If you do not have a Research Computing account, you may request one here. Please note that your username to activate a Globus endpoint is the same as the GatorLink username, and the password is the same as your GatorLink password. If you are having problems with your password, please submit a support request through the UF Computing Help Desk.

Shared Collections

UFRC users automatically have Globus Plus user status on UFRC managed endpoints, so they are able to create shared collections, which do not require a Research Computing account to connect to, only a Globus user account. In this mode Globus acts as a secure high-performance equivalent of Dropbox and other similar services. See Globus Sharing documentation to learn how to create shared collections.

Transferring Data

To transfer data between two endpoints or shared collections, log in to your Globus account at Globus.org. In the Quick Links drop down, select Transfer Files. Enter the endpoint information and, if needed, authentication information for each endpoint. The example below shows settings for a transfer between ufrc#go and the FSU Research Computing endpoint fsurc#lustre.

Globus transfer.png

After activating both endpoints, navigate to the folder, select the files to transfer, and click the arrow corresponding to the direction of the transfer. The transfer will be scheduled and you will receive an email when it has completed.

Advanced Transfer Options

At the bottom of the transfer window, click the More Options link. The drop down will display advanced transfer options. Many of these are useful for file synchronization. There is also an option to encrypt the data transfer, an important option for sensitive data.
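For scripted transfers, the synchronization and encryption options are also available through the Globus command-line client. The following is a minimal sketch that only assembles a `globus transfer` command line and prints it without running it. It assumes the `globus` CLI is installed and logged in; the collection IDs (`SRC-UUID`, `DST-UUID`), the paths, and the helper name `build_transfer_command` are placeholders for illustration, not values from this page.

```python
import shlex

# Sync levels accepted by `globus transfer --sync-level`.
SYNC_LEVELS = {"exists", "size", "mtime", "checksum"}


def build_transfer_command(src, dst, sync_level=None, encrypt=False, label=None):
    """Assemble (but do not run) a `globus transfer` command line.

    src and dst are "COLLECTION_ID:/path/" strings; the IDs used below
    are placeholders, not real collections.
    """
    if sync_level is not None and sync_level not in SYNC_LEVELS:
        raise ValueError(f"unknown sync level: {sync_level}")
    argv = ["globus", "transfer", "--recursive"]
    if sync_level:
        # Only copy files that differ according to the chosen criterion.
        argv += ["--sync-level", sync_level]
    if encrypt:
        # Encrypt the data channel; recommended for sensitive data.
        argv.append("--encrypt-data")
    if label:
        argv += ["--label", label]
    argv += [src, dst]
    return argv


cmd = build_transfer_command(
    "SRC-UUID:/blue/mygroup/data/",   # hypothetical source path
    "DST-UUID:/incoming/",            # hypothetical destination path
    sync_level="checksum",
    encrypt=True,
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps paths with spaces intact if the list is later passed to `subprocess.run`.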

End-point activation time

The default end-point activation time is 24 hrs. If your transfer is expected to take longer and you don't want to click on the link that will be sent to you in an email to re-activate the end-point then use the Advanced transfer options and change the activation time up to the maximum possible value of 168 hrs.
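As a rough illustration of the rule above, the sketch below picks an activation time for an estimated transfer duration, using the 24-hour default and the 168-hour maximum stated in this section. The function names are hypothetical and the estimate of the transfer duration is left to the user.

```python
import math

DEFAULT_ACTIVATION_HOURS = 24   # default activation time stated above
MAX_ACTIVATION_HOURS = 168      # maximum activation time stated above


def activation_hours(expected_transfer_hours):
    """Return the activation time to request, in whole hours.

    Never goes below the default or above the maximum.
    """
    if expected_transfer_hours < 0:
        raise ValueError("expected_transfer_hours must be non-negative")
    needed = math.ceil(expected_transfer_hours)
    return min(max(needed, DEFAULT_ACTIVATION_HOURS), MAX_ACTIVATION_HOURS)


def needs_reactivation(expected_transfer_hours):
    """True if the transfer is expected to outlast even the maximum activation."""
    return expected_transfer_hours > MAX_ACTIVATION_HOURS


for hours in (10, 72.5, 500):
    print(hours, activation_hours(hours), needs_reactivation(hours))
```

A transfer expected to run longer than 168 hours will still need the re-activation link from the email at least once.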

Globus Connect Personal

It is possible to transfer data to and from your own computer using the Globus Connect Personal client. Select Manage Endpoints from the Manage Data drop down, and click Add Globus Connect Personal. Name the endpoint and click Generate Setup Key. Copy that key. Download and start the install process for the client for your operating system. When prompted, paste the Setup Key into the installation box.

To transfer data to and from your computer, the Globus Connect Personal client must be running. Select it as one of the endpoints for your transfer.

Additional information

For additional support information, please see the Globus Support Site, especially the How To Guides and FAQs.

Transfer Scenarios

Let's consider a few hypothetical data transfer scenarios. In cases involving more than one user, it's necessary to know the Globus username of the remote user.

Managed End-Point Transfer between UFRC and a remote site

Note
We'll call a Research Computing user RCUser for brevity. The sender or receiver on the other end will be called Remote_User.

Scenario 1: RCUser has an account at the remote site.

  • RCUser logs into Globus
  • RCUser authenticates against managed endpoints on both ends like ufrc#hpg2 for /ufrc at UFRC.
  • RCUser starts a transfer

Scenario 2: RCUser does not have a remote account. There is a collaborator (Remote_User) on the other end. Remote site has a Globus Subscription.

  • Remote_User logs into Globus, creates a shared Globus end-point (shared directory) on their storage, and gives write access to RCUser

  • RCUser logs into Globus
  • RCUser authenticates against ufrc#hpg2 for /ufrc at UFRC and against the shared end-point on the remote end.
  • RCUser starts a transfer

Scenario 3: RCUser does not have a remote account. There is a collaborator (Remote_User) on the other end. Remote site does not have a Globus Subscription.

  • RCUser logs into Globus, creates a shared Globus end-point (shared directory) within ufrc#hpg2, and gives read access to Remote_User

  • Remote_User logs into Globus
  • Remote_User authenticates against the managed end-point on the remote end and against the shared end-point created by RCUser.
  • Remote_User starts a transfer

Note
To transfer from a remote site to UFRC, reverse the sequence in scenarios 2 and 3.
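The three scenarios above reduce to two questions: does RCUser have an account at the remote site, and does the remote site have a Globus Subscription? The sketch below encodes that decision for a transfer from UFRC to the remote site. The `TransferPlan` type and function name are illustrative only and are not part of any Globus tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransferPlan:
    scenario: int
    share_creator: Optional[str]      # who creates the shared end-point, if any
    share_permission: Optional[str]   # access granted to the other party
    transfer_starter: str             # who starts the transfer


def plan_transfer_from_ufrc(has_remote_account, remote_has_subscription):
    """Pick the scenario for moving data from UFRC to a remote site."""
    if has_remote_account:
        # Scenario 1: RCUser authenticates against managed endpoints on both ends.
        return TransferPlan(1, None, None, "RCUser")
    if remote_has_subscription:
        # Scenario 2: the remote side can share, so RCUser writes into that share.
        return TransferPlan(2, "Remote_User", "write", "RCUser")
    # Scenario 3: RCUser shares from ufrc#hpg2 and Remote_User pulls the data.
    return TransferPlan(3, "RCUser", "read", "Remote_User")


print(plan_transfer_from_ufrc(False, True))
print(plan_transfer_from_ufrc(False, False))
```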

Personal End-Point Transfers

Since UFRC has a Globus Subscription, our users who requested access to the "University of Florida Research Computing" Globus group can create Shared End-Points using Globus Connect Personal running on a local computer instead of within the ufrc#hpg2 managed end-point at UFRC. The following scenarios are useful when transferring data not hosted on HiPerGator.

Scenario 1: RCUser receives data from a Remote_User at an external entity, which does not have a Globus Subscription

  • RCUser runs Globus Connect Personal
  • RCUser creates a shared end-point on the local desktop pointing either to internal or attached (e.g. USB Drive) storage and gives write access to Remote_User.
  • Remote_User logs into Globus and authenticates against either their managed end-point or a local end-point created with Globus Connect Personal, as well as against the shared end-point created by RCUser.
  • Remote_User transfers data to RCUser's shared end-point.

Scenario 2: RCUser receives data from a Remote_User at an external entity, which has a Globus Subscription or Globus Plus User status.

  • Remote_User logs into Globus and authenticates against either their managed end-point or a local end-point created with Globus Connect Personal.
  • Remote_User creates a shared end-point on their end and gives read access to RCUser.
  • RCUser runs Globus Connect Personal and creates a local end-point on the local computer pointing either to internal or attached (e.g. USB Drive) storage.
  • RCUser logs into Globus and authenticates against their local end-point created with Globus Connect Personal and against the remote shared end-point created by Remote_User.
  • RCUser transfers data to RCUser's local end-point.

Reverse the scenarios for transfers from a local end-point to a remote end-point.

Globus Transfer Walkthrough

Let's consider a hypothetical scenario in which two people, Alice and Bob, need to conduct a data transfer from an external system to HiPerGator or vice versa. Alice works at UF and has a HiPerGator account, while Bob is an external collaborator or vendor without a HiPerGator account. Alice is in a group called topresearch. The first thing they need to do is log into https://globus.org and tell each other their Globus usernames. Since Alice is at UF, she can choose 'University of Florida' on the login screen and use her GatorLink credentials to log in. Bob can use his institution's credentials if possible or create a new free globus.org account. Next, each of them follows the steps below.

Alice
  • Log into https://globus.org with your GatorLink credentials.
  • Activate ufrc#hpg2 collection with your GatorLink credentials.
  • Browse to /ufrc/topresearch/share or type it into the ‘Path’ field. Substitute for your own group name, of course.
  • Create a directory for data intake by clicking on ‘New Folder’. For example we'll use an 'incoming' directory.
Bob
  • Log into https://globus.org with your existing globus account credentials, your institutional credentials if your organization has a Globus license, or create a free globus.org account if you don’t have any.
  • If your organization has a Globus server, activate the collection and browse to the data folder as described for Alice. If you don't have a Globus server, you'll have to use Globus Connect Personal on a machine that has a connection to the drive with the data.
  • During the download you create a personal collection (also called an endpoint), which is your personal globus address for your Globus Connect Personal installation. Click here to create a Globus Connect Personal collection.
  • Once you start GCP, go into preferences and add the path to the directory with data to the list of permitted paths, so that Globus has permission to read or write your data in that folder as necessary. Add as many paths as you need.

Since Alice and Bob shared their usernames, they can set up a transfer in either direction at this point. Will the data be transferred by Bob or by Alice, i.e. who will push the button, and who needs to give that person permission to access their data? If Bob is responsible for the transfer, then Alice needs to give him write access to the 'incoming' directory or read access to a directory with data.

  • Transfer-in i.e. write access: Alice has to browse inside the 'incoming' directory in globus transfer application (globus.org), click on the 'Share' button once inside the incoming directory, put in Bob's globus username, and select ‘write’ in permissions. Give the share a name and tell Bob what the name is. At that point everything is set and Bob can connect to that share and start a transfer.
  • Transfer-out i.e. read access: Alice can browse to any directory with data in globus transfer application after activating ufrc#hpg2 collection, click on the 'Share' button once inside that directory, give the new share a name, put in Bob's globus username, and select 'read' in permissions. Bob can now connect to that share and transfer data over to his system.
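
The two sharing cases above differ only in the permission Alice grants. As a sketch of what the 'Share' dialog records, the helper below builds an access rule in the shape used by the Globus Transfer API, where `"rw"` corresponds to write access and `"r"` to read access. The helper name is hypothetical, and the identity ID is a placeholder: the web application looks up the ID behind Bob's Globus username, which this sketch does not do.

```python
import json


def share_rule(identity_id, direction, path="/"):
    """Build an access rule document for a shared collection.

    direction: "in"  -> collaborator writes into the share (transfer-in)
               "out" -> collaborator reads from the share (transfer-out)
    path is relative to the root of the share and must end with "/".
    """
    permissions = {"in": "rw", "out": "r"}[direction]
    if not path.endswith("/"):
        path += "/"
    return {
        "DATA_TYPE": "access",
        "principal_type": "identity",
        "principal": identity_id,   # placeholder for Bob's identity ID
        "path": path,
        "permissions": permissions,
    }


incoming_rule = share_rule("BOB-IDENTITY-ID", "in")
outgoing_rule = share_rule("BOB-IDENTITY-ID", "out")
print(json.dumps(incoming_rule, indent=2))
```

Granting `"r"` for transfer-out keeps Bob from modifying or adding files in Alice's data directory.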