Difference between revisions of "Massmine"

From UFRC
 
[[Category:Software]]
{|<!--CONFIGURATION: REQUIRED-->
|{{#vardefine:app|MassMine}}
|{{#vardefine:url|http://www.massmine.org/}}
<!--CONFIGURATION: OPTIONAL (|1}} means it's ON)-->
|{{#vardefine:conf|}}          <!--CONFIGURATION-->
|{{#vardefine:exe|1}}            <!--ADDITIONAL INFO-->
|{{#vardefine:pbs|}}            <!--PBS SCRIPTS-->
|{{#vardefine:policy|}}        <!--POLICY-->
|{{#vardefine:testing|}}      <!--PROFILING-->
|{{#vardefine:faq|}}            <!--FAQ-->
|{{#vardefine:citation|1}}      <!--CITATION-->
|{{#vardefine:installation|}} <!--INSTALLATION-->
|}

MassMine is a social media mining and archiving application that simplifies the process of collecting and managing large amounts of data across multiple sources. It is designed with the researcher in mind, providing a flexible framework for tackling individualized research needs. MassMine is designed to run both on personal computers and dedicated servers/clusters. MassMine handles credential authorizations, data acquisition & archiving, as well as customized data export and analysis.

<!--Modules ##No modules at this time
==Required Modules==
===Serial===
* {{#var:app}}
===Parallel (OpenMP)===
* intel
* {{#var:app}}
-->
<!-- No system variables at this time
==System Variables==
* HPC_{{#uppercase:{{#var:app}}}}_DIR - installation directory
* HPC_{{#uppercase:{{#var:app}}}}_BIN - executable directory
* HPC_{{#uppercase:{{#var:app}}}}_LIB - library directory
-->

<!--Configuration-->
{{#if: {{#var: conf}}|==Configuration==
 
{{#if: {{#var: conf}}|==Configuration==
 
{{#if: {{#var: exe}}|==Additional Information==
 
{{#if: {{#var: exe}}|==Additional Information==
  
===Installing and Running MassMine at UF Research Computing===
====You will need:====
*UF Research Computing Account: [http://www.rc.ufl.edu/help/account-request/ apply here].
*A [https://twitter.com/ Twitter] account
*An SSH client:
**Mac: Terminal is installed at /Applications/Utilities/Terminal
**Windows: Most users use [http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html PuTTY]
*An SFTP client:
**See our page on [[FileZilla]]

====Connecting to Research Computing====
The best host for running MassMine is daemon1.rc.ufl.edu. There is additional information about this host [[Daemons|here]]. If you have a Mac or Linux machine, open the Terminal application and type:

 ssh username@daemon1.rc.ufl.edu

Replace username with your Research Computing username, and enter your password when prompted. Note that nothing appears on screen while you type your password.

If you have a Windows machine, open PuTTY, enter username@daemon1.rc.ufl.edu in the Host Name box, and click Open. Enter your password when prompted; again, nothing appears while you type it.

====Downloading MassMine====
MassMine is hosted on GitHub and is under active development. At this time we suggest that each user download their own copy of MassMine and update it periodically.

We recommend installing MassMine in your own directory under /scratch/lfs. The following example shows the steps needed to download MassMine. The examples on this page were run as the user magitz; substitute your own username wherever you see magitz.

 [magitz@daemon1 ~]$ cd /scratch/lfs/magitz/
 [magitz@daemon1 magitz]$ git clone https://github.com/n3mo/massmine.git
 Initialized empty Git repository in /scratch/lfs/magitz/massmine/.git/
 remote: Counting objects: 199, done.
 remote: Total 199 (delta 0), reused 0 (delta 0)
 Receiving objects: 100% (199/199), 147.03 KiB, done.
 Resolving deltas: 100% (106/106), done.
 [magitz@daemon1 magitz]$
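Because each user keeps their own git clone, updating it periodically can be done by running git pull inside the massmine directory. A minimal sketch, assuming the clone lives at /scratch/lfs/$USER/massmine as in the example above (pass another path as the first argument otherwise); update_massmine is a hypothetical helper name, not part of MassMine:

```shell
#!/bin/sh
# Hypothetical helper (not part of MassMine): refresh a personal MassMine clone.
# Assumes the clone was created with "git clone" as shown above; the default
# location /scratch/lfs/$USER/massmine can be overridden with the first argument.
update_massmine() {
    mm_dir="${1:-/scratch/lfs/$USER/massmine}"
    if [ ! -d "$mm_dir/.git" ]; then
        echo "No git clone found at $mm_dir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is left unchanged;
    # git pull fetches and merges new commits from the repository it was cloned from.
    (cd "$mm_dir" && git pull)
}
```

For example, after defining the function in your shell, run update_massmine before starting a new collection.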

====First Run Initialization====
The first time MassMine is run, it checks for the R packages it needs and installs any that are missing. Here are the steps, picking up from the download above:

 [magitz@daemon1 magitz]$ cd massmine/
 [magitz@daemon1 massmine]$ module load R
 [magitz@daemon1 massmine]$ ./massmine

Note the "module load R" command. This prepares the environment to run R, the language in which MassMine is written. The module system is not a standard Linux feature, but it greatly simplifies running applications at Research Computing. Learn more about [[Modules|modules here]].

When prompted, type "yes" to install the needed R packages.

====Creating your Twitter app====
To create a Twitter app for MassMine to use, follow these steps:
#Go to https://apps.twitter.com/
#Make sure you are logged in
#Click the Create New App button
#Fill in the form, read and accept the developer agreement, and click Create your Twitter Application

====Configure MassMine====
MassMine is configured with a configuration file. An example file to start from is provided at massmine/examples/mmconfig. Copy the example into the main massmine directory and edit the copy with the nano text editor:

 [magitz@daemon1 massmine]$ cp examples/mmconfig .
 [magitz@daemon1 massmine]$ nano mmconfig

Using the arrow keys to move around, change the following lines to match your Twitter app:

 mm_apps:
   - TwitterAppName
 mm_keys:
   - YourKeyGoesHere
 mm_secrets:
   - YourSecretGoesHere

mm_keys is the Consumer Key (API Key) and mm_secrets is the Consumer Secret (API Secret); both are shown on the Keys and Access Tokens tab of your app's page on the Twitter site. Be sure to keep the "-" and exactly one space between it and the app name, key, or secret.

The additional configuration steps are outlined on the [http://www.massmine.org/docs/twitter.html MassMine configuration page].
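If you prefer not to edit mmconfig interactively, the three placeholders can also be replaced from the command line. This is a minimal sketch, assuming your copy of mmconfig still contains the placeholder values shown above, that you are on a Linux host such as daemon1 with GNU sed (whose -i option edits a file in place), and that your app name, key and secret contain no /, & or \ characters. fill_mmconfig is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper (not part of MassMine): replace the Twitter placeholders
# in a copy of examples/mmconfig. Assumes GNU sed and that the values contain
# no "/", "&" or "\" characters, which sed would treat specially here.
fill_mmconfig() {
    cfg="$1" app="$2" key="$3" secret="$4"
    sed -i \
        -e "s/TwitterAppName/$app/" \
        -e "s/YourKeyGoesHere/$key/" \
        -e "s/YourSecretGoesHere/$secret/" \
        "$cfg"
    # Warn if any placeholder is still present after the substitution.
    if grep -q -e TwitterAppName -e YourKeyGoesHere -e YourSecretGoesHere "$cfg"; then
        echo "Placeholders remain in $cfg; edit it by hand" >&2
        return 1
    fi
}
```

For example, fill_mmconfig mmconfig RC_test YOUR_KEY YOUR_SECRET. Keep in mind that values typed on the command line are saved in your shell history.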

====Authorizing MassMine to use your App====
The last step is to authorize MassMine to use your app. Here are the steps (this assumes the R module is already loaded). The > at the start of a line is the R command prompt, so do not type it.

 [magitz@daemon1 massmine]$ R
 > source("massmine")
 Please choose an account to authenticate:
   <Option> RC_test
 Choose => RC_test
 To enable the connection, please direct your web browser to:
 https://api.twitter.com/oauth/authorize?oauth_token=rK9ttLR...
 When complete, record the PIN given to you and provide it here: 1234567
 MassMine is loaded and ready to go
 > q()

Copy and paste the URL into a web browser and click Authorize app. Type the PIN shown in your browser at the prompt and press Return.

Quit R by typing q(), then type n to not save your workspace.

====Running MassMine====
You are now ready to run MassMine! Simply type:

 [magitz@daemon1 massmine]$ ./massmine mmconfig

====Running MassMine for Long Periods====
If you configure MassMine to use the streaming API with a run time longer than you want to keep your SSH connection open, you can run MassMine in the background. Either start MassMine as above, press Control-Z, and then type "bg", or add the & character to the end of the command:

 [magitz@daemon1 massmine]$ ./massmine mmconfig &

You can then log out of daemon1 and MassMine will continue to run.
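A plain & is usually enough, but a background job can still be sent a hangup signal when the session ends, for example if the SSH connection drops unexpectedly. A common safeguard is the standard nohup utility, which also makes it easy to send MassMine's screen output to a log file. The sketch below is one way to combine these; run_detached and is_running are hypothetical helper names, and massmine.log and massmine.pid are example file names:

```shell
#!/bin/sh
# Hypothetical helpers (not part of MassMine) for long unattended runs.
# run_detached LOGFILE PIDFILE COMMAND [ARGS...]
#   Starts COMMAND in the background under nohup, so it ignores the hangup
#   signal sent when a session ends, writes its output to LOGFILE, and
#   records its process ID in PIDFILE.
run_detached() {
    log="$1"; pidfile="$2"; shift 2
    nohup "$@" > "$log" 2>&1 &
    echo "$!" > "$pidfile"
}

# is_running PIDFILE: succeeds if the recorded process still exists.
is_running() {
    kill -0 "$(cat "$1")" 2>/dev/null
}
```

On daemon1 this could be used as run_detached massmine.log massmine.pid ./massmine mmconfig. After logging back in, kill -0 $(cat massmine.pid) && echo running reports whether MassMine is still going, and tail massmine.log shows its most recent output.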

====Retrieving Your Data====
MassMine saves your data in the file specified in the mmconfig file. You can download this file using an SFTP client such as FileZilla; see the [[FileZilla]] page for more information on using this tool. You can also mount your scratch space on a computer on the UF network using Samba, as outlined [[Samba_Access|here]], or use GatorBox, as outlined [[GatorBox|here]].
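On a Mac or Linux computer, the command-line scp tool (usually installed alongside ssh) can copy the file instead of an SFTP client. A minimal sketch, assuming the output file was written inside your massmine directory under /scratch/lfs and that you copy it from daemon1.rc.ufl.edu; fetch_massmine_output and the file name results.json are hypothetical, so use the file name set in your mmconfig:

```shell
#!/bin/sh
# Hypothetical helper (not part of MassMine): copy a MassMine output file from
# your massmine directory on daemon1 into the current local directory.
# Assumes the file was written inside /scratch/lfs/USERNAME/massmine.
fetch_massmine_output() {
    user="$1"; file="$2"
    scp "${user}@daemon1.rc.ufl.edu:/scratch/lfs/${user}/massmine/${file}" .
}

# Example (run on your own computer, not on daemon1):
#   fetch_massmine_output magitz results.json
```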
|}}
 
|}}

If you publish research that uses {{#var:app}}, you must cite it as follows:

Van Horn, N. M. & Beveridge, A. (2014). MassMine: Your Access To Big Data. http://www.massmine.org

|}}
 
|}}

Revision as of 21:49, 28 August 2015

Description

massmine website  

MassMine is a social media mining and archiving application that simplifies the process of collecting and managing large amounts of data across multiple sources. It is designed with the researcher in mind, providing a flexible framework for tackling individualized research needs. MassMine is designed to run both on personal computers and dedicated servers/clusters. MassMine handles credential authorizations, data acquisition & archiving, as well as customized data export and analysis.

Required Modules

Serial

  • massmine

System Variables

  • HPC_MASSMINE_DIR - installation directory
  • HPC_MASSMINE_BIN - executable directory
  • HPC_MASSMINE_LIB - library directory