Introduction to Cloud Computing for Genomics: Launching your own instances

Launching and logging onto a cloud instance

This document will cover common questions and advanced topics that we won’t get to discuss in the lessons.

Launching cloud VMs on your own

Objectives:

Important Caveat: For many of the above steps, your instructor will have completed these tasks for you. If you are learning on your own, or trying this after the workshop, see the exercises and instructions below.

Choosing a cloud platform

The most important thing about The Cloud is choice - instead of purchasing a physical computer, you can obtain on-demand computing at almost any scale. This power comes with advantages and disadvantages:

Advantages of Cloud Computing

Disadvantages of Cloud Computing

Cloud platform choices

There are several cloud providers to choose from. Some scientific clouds may either be free or allocate resources competitively. Commercial clouds are can be very powerful, but choice can be overwhelming. We will cover as much as we you need to get through the Data Carpentry lessons, but you will ultimately need to learn things not covered here so see the documentation below:

Commercial Clouds

Open Science Clouds

Launching an Cloud Instance (Virtual Machine)

We will provide instructions for working on the Amazon and beta instructions for the Atmosphere cloud. Follow the instructions for your platform of choice.

Please select the platform you wish to use for the exercises:

AWS (Amazon EC2 instances)

In this tutorial, we’ll show you how to launch your own instance.

If you want to go through the the genomics lesson outside of a workshop, or if you just want to use AWS to process your own data, you’ll need to have your own account and launch your own instance.

These instructions will help you launch an instance identical to the one we use in workshops, and and this tutorial is not meant to be exhaustive, but for minor changes:

  • You can launch any publicly available image by changing the AMI that you load in step 4 below.
  • You can choose not to load an AMI at all, and install your own software later by skipping step 4 below.
  • If you want to launch a different set of resources- more processors, more disk space, different operating system, etc; you can choose a different option at step 5.
  • For more detailed instructions, and how-tos for other changes consult EC2 Getting Started

Launching an instance on Amazon Web Services

Prerequisites

* You can use some of Amazon Web Services for free, or see if you qualify for an AWS Grant (See: https://aws.amazon.com/grants/ ) if you are using AWS for education. The free level of service will not be sufficient for working with the amount of data we are using for our lessons.

Create an AWS account

  1. Go to Amazon Web Services https://aws.amazon.com/
  2. Follow the button to sign up for an account - you will need to agree to Amazon’s terms and conditions and provide credit card information.

Sign into AWS and Launch an Instance

  1. Sign into AWS EC2 Dashboard: https://console.aws.amazon.com/ec2/

  2. Click the ‘Launch Instance’ button
  3. In ‘Step 1’ you will be asked to choose an Amazon Machine Image (AMI), on the lefthand side, look for ‘Community AMIs’ and then search for ami-373ab74d“Data Carpentry Genomics release 1.0”; select this image
  4. For ‘Choose and Instance Type’ select t2.medium; then click ‘Review and Launch’

    Tip: You can select larger instances, but these will cost more to run. Also for the ‘Review Instance Launch’ you will get some warning about instance security. We will not go into detail here about security groups. However, you should know that when you launch an instance, you will want to take steps to make sure no one maliciously accesses your instance. For example, never give your security credentials (next step) to another user.

  5. For the ‘Review Instance Launch’ step, click ‘Launch’
  6. You will be asked to Select an existing key pair or create a new key pair. Select ‘Proceed without a key pair’ and select the acknowledgement you are given. Then click ‘Launch Instances’

You instance will now be launched. You should follow the links to ‘Create billing alerts’.

Connect to Amazon Instance

  1. Log into your AWC EC2 Dashboard https://console.aws.amazon.com/ec2/
  2. You should see that you have one instance, to proceed the instance state must be ‘running’ (if you just launched the instance it will take <5 min for the instance to start running)
  3. At the bottom of the dashboard, you should see a Public DNS which will look something like ec2.12.2.45.678.compute-1.amazonaws.com. Copy that address (you may wish make a note of it as you will need this each time you connect.)
  4. Open the terminal application on your Mac and use ‘ssh’ to connect. Your command will be:
$ ssh dcuser@your.amazon.dns
  1. Your computer will be unable to verify the authenticity of the host; type yes to continue connecting
  2. Then enter the password for this computer: ‘data4Carp’

You should now be connected to your personal instance. You can confirm this with the following commands; whoami,pwd,ls, which should yield the following results:

Last login: Thu Jul 30 13:21:08 2015 from 8.sub-70-197-200.myvzw.com
$ whoami
dcuser
$ pwd
/home/dcuser
$ ls
dc_sample_data	Downloads  openrefine-2.6-beta.1  Trimmomatic-0.32
Desktop		FastQC	   R
$

#Very Important Warning - Avoid Unwanted Charges Please remember, for as long as this instance is running, you will be charged for your usage. You can see an estimate of the current charge from your AWS EC2 dashboard by clicking your name (Account name) on the upper right of the dashboard and selecting ‘Billing & Cost Management’. DO NOT FORGET TO TERMINATE YOUR INSTANCE WHEN YOU ARE DONE

Terminating your instance

When you are finished with your instance, you must terminate. Follow the following steps.

  1. Sign into AWS and go to the EC2 Dashboard: https://console.aws.amazon.com/ec2/
  2. Under ‘Resources’ select ‘Running Instances’
  3. Select the instance you wish to terminate, then click ‘Actions’
  4. Under ‘Instance State’ select terminate.

    Warning: This will delete any data on this instance, so you must move any data you wish to save off the instance.

    Tip: You can use iCommands to move data between your computer, a cloud instance, and the iPlant Data Store. iCommands is installed on the Data Carpentry Amazon AMI. You can download and see documentation for iCommands here - there is also some documentation on setting up iCommands in the Atmosphere section below

  5. Select ‘Yes, Terminate’ to terminate the instance.

Instructions for PC

AWS (Amazon EC2 instances)

In this tutorial, we’ll show you how to launch your own instance.

If you want to go through the the genomics lesson outside of a workshop, or if you just want to use AWS to process your own data, you’ll need to have your own account and launch your own instance.

These instructions will help you launch an instance identical to the one we use in workshops, and and this tutorial is not meant to be exhaustive, but for minor changes:

  • You can launch any publicly available image by changing the AMI that you load in step 4 below.
  • You can choose not to load an AMI at all, and install your own software later by skipping step 4 below.
  • If you want to launch a different set of resources- more processors, more disk space, different operating system, etc; you can choose a different option at step 5.
  • For more detailed instructions, and how-tos for other changes consult EC2 Getting Started

Launching an instance on Amazon Web Services

Prerequisites

* You can use some of Amazon Web Services for free, or see if you qualify for an AWS Grant (See: https://aws.amazon.com/grants/ ) if you are using AWS for education. The free level of service will not be sufficient for working with the amount of data we are using for our lessons.

Create an AWS account

  1. Go to Amazon Web Services https://aws.amazon.com/
  2. Follow the button to sign up for an account - you will need to agree to Amazon’s terms and conditions and provide credit card information.

Sign into AWS and Launch an Instance

  1. Sign into AWS EC2 Dashboard: https://console.aws.amazon.com/ec2/

  2. Click the ‘Launch Instance’ button

  3. In ‘Step 1’ you will be asked to choose an Amazon Machine Image (AMI), on the lefthand side, look for ‘Community AMIs’ and then search for ami-373ab74d; select this image

  4. For ‘Choose and Instance Type’ select t2.medium

    Tip: You can select larger instances, but these will cost more to run. Also for the ‘Review Instance Launch’ you will get some warning about instance security. We will not go into detail here about security groups. However, you should know that when you launch an instance, you will want to take steps to make sure no one maliciously accesses your instance. For example, never give your security credentials (next step) to another user.

  5. Click on “6. Configure Security Group” in the top navigation menu. Choose “select an existing security group” under “assign a security group”, and choose “sg-a5dba9ef” which has the name “to use with DC genomics AMI”.

  6. Click “Review and Launch” and the bottom of the screen, and then click ‘Launch’

  7. You will be asked to Select an existing key pair or create a new key pair. Select ‘Proceed without a key pair’ and select the acknowledgement you are given. Then click ‘Launch Instances’.

You instance will now be launched. You should follow the links to ‘Create billing alerts’.

Connecting to your Amazon Instance

  1. Download the PuTTY application at: http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe

  2. Log into your AWC EC2 Dashboard https://console.aws.amazon.com/ec2/

  3. You should see that you have one instance, make sure instance state is ‘running’ (if you just launched the instance it will take <5 min for the instance to start running)

  4. At the bottom of the dashboard, you should see a Public DNS which will look something like ec2.12.2.45.678.compute-1.amazonaws.com. Copy that address (you may wish make a note of it as you will need this each time you connect.)

  5. Start PuTTY. In the section ‘Specify the destination you want to connect to’ for ‘Host Name (or IP address)’ paste in the DNS address and click ‘Open’

  6. When prompted to login as, enter ‘dcuser’; you may be notified that the authenticity of the host cannot be verified - if so, ignore the warning an continue connecting

  7. When prompted for a password enter ‘data4Carp’

You should now be connected to your personal instance. You can confirm this with the following commands; whoami,pwd,ls, which should yield the following results:

Last login: Thu Jul 30 13:21:08 2015 from 8.sub-70-197-200.myvzw.com
$ whoami
dcuser
$ pwd
/home/dcuser
$ ls
dc_sample_data	Downloads  openrefine-2.6-beta.1  Trimmomatic-0.32
Desktop		FastQC	   R
$

#Very Important Warning - Avoid Unwanted Charges Please remember, for as long as this instance is running, you will be charged for your usage. You can see an estimate of the current charge from your AWS EC2 dashboard by clicking your name (Account name) on the upper right of the dashboard and selecting ‘Billing & Cost Management’. DO NOT FORGET TO TERMINATE YOUR INSTANCE WHEN YOU ARE DONE

Terminating your instance

When you are finished with your instance, you must terminate. Follow the following steps.

  1. Sign into AWS and go to the EC2 Dashboard: https://console.aws.amazon.com/ec2/
  2. Under ‘Resources’ select ‘Running Instances’
  3. Select the instance you wish to terminate, then click ‘Actions’
  4. Under ‘Instance State’ select terminate.

    Warning: This will delete any data on this instance, so you must move any data you wish to save off the instance.

    Tip: You can use iCommands to move data between your computer, a cloud instance, and the iPlant Data Store. iCommands is installed on the Data Carpentry Amazon AMI. You can download and see documentation for iCommands here - there is also some documentation on setting up iCommands in the Atmosphere section below

  5. Select ‘Yes, Terminate’ to terminate the instance.

Launching an instance on Atmosphere - These instructions are in beta and not fully tested!

Prerequisites

  • You must have an iPlant account (register at https://user.cyverse.org/ ) - You must also request access to Atmosphere (see documentation CyVerse requires Atmosphere users to have a valid .edu or .org email address )

Sign into Atmosphere and launch an instance

  1. Sign into Atmosphere at: http://atmo.iplantcollaborative.org/
  2. Select ‘Launch New Instance’
  3. Search for the ‘TSW Workshop Williams 1.2’ image; Select this image.
  4. Click Launch and then select the following options in the launch wizard:
    • Instance Name:Enter a name(optional)
    • Base Image Version: 1.0
    • Project: Choose an existing or create a new project.
    • Provider: iPlant Cloud - Tucson
    • Instance Size: small2 Click Launch.

    Tip: You can select a larger instance, but you must have sufficient resources (no exceeded quotas). If Atmosphere is at high capacity, you may not be able to launch an instance if the instance size exceeds what is currently free on the cloud provider

Your instance should be ready in 10-15 minutes. When your instance status is listed as ‘active’ you will be able to connect.

Connect to Atmosphere instance

Instructions for Mac\Linux

  1. If necessary, log into your Atmosphere at: https://atmo.iplantcollaborative.org/application/images
  2. Click on ‘Projects’ and select the project that contains your instance.
  3. Verifying that your instance status is ‘active’, copy the IP address (e.g. 128.123.12.34) to your clipboard.
  4. Open the terminal application and use ‘ssh’ to connect. Your command will be:

    $ ssh iplantusername@your.atmosphere.ipaddress
    
  5. Your computer will be unable to verify the authenticity of the host… type yes to continue connecting
  6. When prompted for a password, enter your iPlant username.

You should now be connected to your personal instance. You can confirm this with the following commands; whoami,pwd, which should yield the following results:

Welcome to Ubuntu 12.04.2 LTS (GNU/Linux 3.2.0-37-virtual x86_64)

New release '14.04.2 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

    _   _                             _
   / \ | |_ _ __ ___   ___  ___ _ __ | |__   ___ _ __ ___
  / _ \| __| '_ ` _ \ / _ \/ __| '_ \| '_ \ / _ \ '__/ _ \
 / ___ \ |_| | | | | | (_) \__ \ |_) | | | |  __/ | |  __/
/_/   \_\__|_| |_| |_|\___/|___/ .__/|_| |_|\___|_|  \___|
                               |_|

iPlant Collaborative

The user manual is located here: http://goo.gl/2pT72
For assistance, contact support@iplantcollaborative.org.  

Last login: Fri Aug 14 10:16:50 2015 from dhcp140-78.cshl.edu
iplantusername@vm65-164:~$ whoami
iplantusername
iplantusername@vm65-164:~$ pwd
/home/iplantusername

Note: In the above example ‘iplantusername’ will be your actual iPlant username.

Instructions for PC

  1. Download the PuTTY application at: http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe
  2. If necessary, log into your Atmosphere at: http://atmo.iplantcollaborative.org/
  3. Click on ‘Projects’ and select the project that contains your instance.
  4. Verifying that your instance status is ‘active’, copy the IP address (e.g. 128.123.12.34) to your clipboard.
  5. Start PuTTY. In the section ‘Specify the destination you want to connect to’ for ‘Host Name (or IP address)’ paste in the DNS address and click ‘Open’
  6. When prompted to login as, enter your iPlant username; you may be notified that the authenticity of the host cannot be verified - if so, ignore the warning an continue connecting
  7. When prompted for a password enter your iPlant password.

You should now be connected to your personal instance. You can confirm this with the following commands; whoami,pwd, which should yield the following results:

Welcome to Ubuntu 12.04.2 LTS (GNU/Linux 3.2.0-37-virtual x86_64)

New release '14.04.2 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

    _   _                             _
   / \ | |_ _ __ ___   ___  ___ _ __ | |__   ___ _ __ ___
  / _ \| __| '_ ` _ \ / _ \/ __| '_ \| '_ \ / _ \ '__/ _ \
 / ___ \ |_| | | | | | (_) \__ \ |_) | | | |  __/ | |  __/
/_/   \_\__|_| |_| |_|\___/|___/ .__/|_| |_|\___|_|  \___|
                               |_|

iPlant Collaborative

The user manual is located here: http://goo.gl/2pT72
For assistance, contact support@iplantcollaborative.org.  

Last login: Fri Aug 14 10:16:50 2015 from dhcp140-78.cshl.edu
iplantusername@vm65-164:~$ whoami
iplantusername
iplantusername@vm65-164:~$ pwd
/home/iplantusername

Note: In the above example ‘iplantusername’ will be your actual CyVerse username.

Bringing sample data into your Atmosphere instance

The sample dataset is NOT included on the Atmosphere instance. The very first time you load the instance, you will need to copy data from the public CyVerse Data Store into your instance using the following instructions.

Setup iCommands

Prerequisites

  • You must be connected to your Atmosphere instance

iCommands will allow you to quickly transfer data into your Atmosphere instance:

  1. Initialize iCommands using the following command

    $ iinit
    
  2. You will then be asked to setup your account and will need to enter the following information

    Prompt Entry
    irodsHost data.iplantcollaborative.org
    port 1247
    zone iplant
    irodsUserName your iplant username
    Current iRODS password your iplant password
  3. Verify that you have connected to your iPlant Data Store; view the contents of your home directory using the following the ils command:

    $ ils
    

    Copy Sample Data to your instance

  4. Enter the following command:

    $ iget -rPVT /iplant/home/shared/iplant_training/data_carpentry_ngs/dc_sampledata_lite .
    
  5. using ls you should be able to verify you have downloaded the dc_sampledata_lite directories and files.

Tip: You can use iCommands to move data between your computer, a cloud instance, and the CyVerse Data Store. iCommands is installed on the Data Carpentry Amazon AMI. You can download and see documentation for iCommands here

Terminating your Atmosphere instance

When you are finished with your instance, you must terminate. Follow the following steps.

  1. If necessary, sign into Atmosphere: http://atmo.iplantcollaborative.org/
  2. Locate your instance (e.g. select ‘Projects’ and then select the project that contains the instance you will terminate.)
  3. Click on the instance name to get to the ‘Actions’ menu. You may then select the ‘Delete’ button to terminate the instance.

    Warning: This will delete any data on this instance, so you must move any data you wish to save off the instance (for example by using iCommands).

  4. After reading the warning and confirming you are ready, click ‘Terminate’ to terminate the instance.