Introductions

An Introduction to Linux and HPC

Dr Simon Hood and Dr Jonathan Boyle

Research Computing Services



 

Research Computing Services, RCS

  • Specialist part of IT Services.

Contact Details
What is Research Computing?
  • Computing to support research! Examples:
    • running complex simulations;
    • performing vast parameter searches.



 

Research Computing Examples

High Throughput Computing (HTC)
  • Large amounts of computing power over a "long" time:
    • Running long jobs!
    • Running the same experiment many (1000s) times, with different inputs.
High Performance Computing (HPC)
  • Large amounts of computing power over a "short" time:
    • many CPUs simultaneously to run complex models quicker;
    • many compute nodes' RAM simultaneously to handle very big jobs.
Data Analysis and Visualization
  • Getting the information out of the vast quantities of data.
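
The "same experiment many times, with different inputs" pattern can be sketched in Python. This is an illustration only: `simulate` is a hypothetical stand-in for a real experiment, and on a real HTC system each input would typically be a separate job rather than a local process.

```python
from concurrent.futures import ProcessPoolExecutor


def simulate(damping):
    """Hypothetical stand-in for one experiment: repeatedly damp a value."""
    x = 1.0
    for _ in range(1000):
        x = x * (1.0 - damping)
    return x


def run_sweep(inputs, workers=1):
    """Run simulate() once per input; use several processes if workers > 1."""
    if workers <= 1:
        return [simulate(value) for value in inputs]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() returns the results in the same order as the inputs.
        return list(pool.map(simulate, inputs))


if __name__ == "__main__":
    inputs = [0.001 * i for i in range(1, 9)]
    for value, result in zip(inputs, run_sweep(inputs, workers=4)):
        print(f"damping={value:.3f} result={result:.6f}")
```

Each run is independent of the others, which is what makes this kind of workload easy to spread over many machines.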



 

How Can RCS Help You? Free Stuff

Free Stuff!

Provision of resources:
  • Horace, Man1, Man2, Mace01, Redqueen. . .
  • Condor pools; NW-Grid, NGS.
Administration of HPC/HTC Clusters:
  • Administer and support University, NW-Grid, NGS and some school and research group HPC clusters.
Support and Training:
  • Documentation — Web and Wiki.
  • Courses!
  • Usage of HPC/HTC (inc. Condor) clusters.
  • Application support.



 

How Can RCS Help You? In-Depth Support

In-depth support and collaborations

Free dedicated short-term help
  • Advice on parallelisation of code, or
  • advanced use of HTC (inc. Condor).
More in-depth help and collaborations
  • Optimising code/models: scoping, estimate, coding — dedicated resources may require funding.
  • Example: one year's dedicated effort extracting maximum performance. Named resource/researchers on RCUK/EU etc. grants.



 

Other Related Courses

Introduction to Condor:
Introduction to OpenMP:
Introduction to MPI:
Introduction to LaTeX:



 

All RCS Courses


  • Research Computing using Fortran 95 (2, 3 Nov)
  • Condor for HTC (10 Nov)
  • Intro to Matlab; Programming in Matlab (11, 12 Nov)
  • OpenMP: Shared Memory Programming (17 Nov)
  • Advanced Matlab; Profiling, Optimisation and Numerical Methods in Matlab (18, 19 Nov)
  • LaTeX for Researchers (24 Nov)
  • Parallel Programming using Message Passing (25 Nov)
  • Image-Based Modelling (1, 2 Dec)



 

Course Organisation

Basic Linux Stuff
  • Necessary so you can get your research done!
  • GUI stuff; command-line power; using Linux and MS Windows simultaneously on the same machine; more. . .
How-Tos
Give context, diverting to the relevant Linux topics as required:
  • How do I run a job on an HPC cluster?
  • How can I get my results faster?
  • How can I work away from my office?
  • How can I back up my stuff?
Flexible and Future
  • Something particular you want covered? Ask. . .



 

Linux?




 

Linux? What?

What is Linux?

Linux and its accomplices are:

  • an operating system;
  • graphical user interfaces (GUIs);
  • applications software;
. . .just like MS Windows and OS-X — except completely free!


More — Linux "newbie" sites:



 

Linux? Why?

Why on Earth should I use Linux?

  • Unix/Linux developed over many years by researchers for researchers — ideal environment for computational research!
  • Almost all HPC facilities have Linux as the OS.
  • Easier to collaborate and share resources with Linux:
    • SSH, X11;
    • multiple simultaneous users — desktop MS Win crippled. . .
  • Completely free (in both senses) — plus masses of free apps.
  • Linux is a very flexible system:
    • choice of distros and. . .
    • . . .of user-interface (GUI): GNOME vs KDE vs XFCE vs LXDE. . .
    • GUI and very powerful commandline;
    • open-source.



 

Basic Linux Stuff

In this module we describe:

  • user-interfaces (GUIs);
  • the commandline — BASH — and scripts;
  • file-handling, permissions, editors;
  • process-management and stdout/in/err;
  • using MS Windows and Linux at the same time;
  • distros;
  • getting Linux-related help at UoM.
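
As a small taste of process-management and the standard streams, the following Python sketch starts a child process and collects its stdout, stderr and exit status separately. The child program here is a made-up one-liner, used only so the example is self-contained.

```python
import subprocess
import sys


def run_and_capture(command):
    """Run a command and return (exit_code, stdout_text, stderr_text)."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode, completed.stdout, completed.stderr


# A child process that writes one line to each output stream, then exits with status 3.
child_code = (
    "import sys; print('result: 42'); "
    "print('warning: something odd', file=sys.stderr); sys.exit(3)"
)
code, out, err = run_and_capture([sys.executable, "-c", child_code])
print("exit status:", code)
print("stdout:", out.strip())
print("stderr:", err.strip())
```

Keeping results (stdout) apart from diagnostics (stderr) is what lets a shell redirect each to a different file.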



 

The Edit, Compile, Link Cycle


  • Important for anyone who handles Fortran or C code.
  • Turning Fortran 77/90/95 or C/Obj-C/C++ code into executable programs.



 

Why should I care?

  • You can't always use a commercial application or someone else's program; you may need to write your own.
  • Understanding this. . .
        error while loading shared libraries: libgfortran.so.1: 
        cannot open shared object file: No such file or directory
    . . .and what to do about it!
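
One way to start investigating an error like the one above is to ask which shared libraries the system can locate. The sketch below uses Python's `ctypes.util.find_library`. It only approximates the search done when a program starts, and it reports whichever version is installed, which may differ from the exact `libgfortran.so.1` a program was linked against. The library names checked are examples only.

```python
import ctypes.util


def check_library(name):
    """Return the library file name that the search finds, or None."""
    # find_library expects the name without the "lib" prefix or ".so" suffix.
    return ctypes.util.find_library(name)


for name in ("gfortran", "m"):
    found = check_library(name)
    if found is None:
        print(f"lib{name}: not found by this search")
    else:
        print(f"lib{name}: found as {found}")
```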


 

How do I run my code?

What:

In this module we look at:

  • Basic, necessary steps — editing and compiling.
  • The hidden step — linking.
  • Things that can and do go wrong!
  • Static vs. dynamic linking.
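
The separate compile and link steps can be sketched as two command lines. This minimal Python example assumes the `gcc` compiler and a hypothetical source file `hello.c`; `gfortran` accepts the same `-c` and `-o` options for Fortran sources. The commands are only run if both the compiler and the file are present.

```python
import os
import shutil
import subprocess


def build_commands(source, compiler="gcc"):
    """Return the compile and link command lines for one source file."""
    stem = source.rsplit(".", 1)[0]
    # Step 1, compile: source file to object file.
    compile_cmd = [compiler, "-c", source, "-o", stem + ".o"]
    # Step 2, link: object file (plus libraries) to executable.
    link_cmd = [compiler, stem + ".o", "-o", stem]
    return compile_cmd, link_cmd


compile_cmd, link_cmd = build_commands("hello.c")
print("compile:", " ".join(compile_cmd))
print("link:   ", " ".join(link_cmd))

# Only run the commands if the compiler and the (hypothetical) source file exist.
if shutil.which("gcc") and os.path.exists("hello.c"):
    subprocess.run(compile_cmd, check=True)
    subprocess.run(link_cmd, check=True)
```

Running `gcc hello.c -o hello` performs both steps in one go, which is why the link step is easy to overlook.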




This does not apply to Matlab, Perl, Python, etc.



 

HPC?

HPC

What is HPC?

  • High-Performance Computing.
  • The application of "supercomputers" to computational problems that are
    • too big for desktops/laptops, or
    • would take too long.
  • A "supercomputer" has many CPUs/cores (hundreds, thousands. . .); a desktop computer has only 1, 2 or 4.




Why should I care about HPC?

I can just run things on my desktop machine, can't I?

Yes
  • The CPUs in a desktop are similar to those in a "supercomputer"
But
  • Do you want to run 100%-CPU jobs on your desktop machine?
  • Would you like to:
    • Run a dozen or more jobs at once?
    • Run a (parallel) job on 64 or 128 CPUs/cores at once?
    • Use 32 GB RAM?
  • What happens if the cleaner unplugs your desktop 90% through a month-long computation?




How do I run a job on an HPC machine?

How?

In this module we describe the nature of HPC clusters, and how to:

  • access an HPC machine remotely and securely (using SSH);
  • copy files to/from a remote HPC machine (using SCP/SFTP);
  • understand the nature and structure of an HPC machine;
  • run programs/jobs and share resources with other users by submitting jobs to a batch/queue system.
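
The steps above can be sketched as the commands a user might type. This Python example only builds and prints them. The user name, host name, directory and program are all hypothetical, and the job script assumes an SGE-style batch system (directives beginning `#$`, submitted with `qsub`); other batch systems use different syntax.

```python
def scp_upload(local_path, user, host, remote_dir):
    """Command to copy a local file to a directory on the remote machine."""
    return ["scp", local_path, f"{user}@{host}:{remote_dir}/"]


def ssh_run(user, host, remote_command):
    """Command to run a single command on the remote machine over SSH."""
    return ["ssh", f"{user}@{host}", remote_command]


def sge_job_script(job_name, program):
    """Text of a minimal SGE-style job script."""
    return "\n".join([
        "#!/bin/bash",
        f"#$ -N {job_name}",  # job name shown in the queue
        "#$ -cwd",            # run from the directory the job was submitted in
        program,
        "",
    ])


user, host = "alice", "hpc.example.org"  # hypothetical account and machine
print(" ".join(scp_upload("job.sh", user, host, "~/run1")))
print(" ".join(ssh_run(user, host, "cd ~/run1 && qsub job.sh")))
print(sge_job_script("test_run", "./my_model input.dat"))
```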




Facilities and Services Available at UoM

HPC Facilities and Services Available at UoM

In this module we outline:

  • Man2 and NW-Grid
  • Mace01
  • Bennu
  • Condor pools
  • Horace
  • National Grid Service (NGS)



 

How can I get my results faster?

How can I get results faster?




Three options for faster results

Optimisation
  • make your program more efficient
High Throughput Computing (HTC)
  • run lots of copies of your program
High Performance Computing (HPC)/Parallelisation
  • run (one instance of) your program on many CPUs at the same time
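
Before making a program more efficient it helps to measure where the time goes. The following is a minimal sketch using Python's built-in `cProfile` module; `run_model` and `slow_sum_of_squares` are invented stand-ins for a real program and its hot spot.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop, standing in for a real hot spot."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_model():
    """Hypothetical 'program': one expensive step and one cheap step."""
    expensive = slow_sum_of_squares(200_000)
    cheap = sum(range(100))
    return expensive + cheap


profiler = cProfile.Profile()
profiler.enable()
result = run_model()
profiler.disable()

# Report the five entries with the largest cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report should show most of the time inside `slow_sum_of_squares`, which is therefore the place worth optimising first.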




Be specific! I want details!

Optimisation, Distribution, Parallelisation — How?

In this module we survey specifics:

  • Optimise your code:
    • compiler options
    • use available numerical libraries
    • optimise memory use
    • profiling
  • High throughput computing:
    • SGE Arrays
    • Condor
  • Parallelisation/HPC:
    • OpenMP
    • MPI
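
As an illustration of the SGE array idea, each task in an array job can pick its own input from the `SGE_TASK_ID` environment variable, which the batch system sets for array jobs (for example, those submitted with `qsub -t 1-4`). This is a sketch only: the list of temperatures is invented, and a missing or non-numeric task id is treated as task 1 so that the script also runs outside the queue.

```python
import os


def pick_parameter(parameters, environ=os.environ):
    """Select this task's parameter using the 1-based SGE_TASK_ID variable."""
    raw = environ.get("SGE_TASK_ID", "1")
    # Treat a missing or non-numeric value as task 1.
    task_id = int(raw) if raw.isdigit() else 1
    if not 1 <= task_id <= len(parameters):
        raise ValueError(f"task id {task_id} outside 1..{len(parameters)}")
    return parameters[task_id - 1]


temperatures = [250.0, 275.0, 300.0, 325.0]  # hypothetical inputs
print("this task uses temperature", pick_parameter(temperatures))
```

The same script is then run once per task, with no change other than the task id.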






 

Remote Working


In this section we look at:

  • connecting to remote machines and file-transfer;
  • using GUIs remotely;
  • answering the questions, "How can I work away from my office?" and "How can I work using my laptop?"




Remote Sessions and Working from Home

Remote Working Tools
These are the tools of the trade:
Working From Home
I want to:
  • work from home;
  • connect to sessions I was running at work, from home;
  • do all my work on my laptop, at home, work and elsewhere.
In this module we outline how to do this (using the tools listed above).



 

Backups!


Accidental Deletion
  • Everyone makes mistakes;
  • very difficult to get files back once deleted.
Catastrophic Hardware/Software Failure
  • It happens! ("It'll never happen to me. . .")
  • RAID helps — but is not a panacea.




More Context and Motivation

Three Catastrophes

Usto-Oran
  • RAID redundancy of disks — but two disks failed!
  • Used as archive, but no backups!

Research Group at UMIST (Some years ago!)
  • All data and source code on one Solaris server.
  • All backed up nightly — but no one checked the backups!

Thesis MS Word File Corruption
  • On a more personal level. . .
  • File corrupted; hard copy only; type it all in again!




How do I back up my stuff?

In this module we describe:

  • Why you should do backups!
  • Different types of backups.
  • Some practical methods of making backups of personal desktop and laptop machines.
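
One practical method can be sketched with Python's standard `tarfile` module: write a timestamped compressed archive of a directory, then re-open it to check that it can actually be read. This is a sketch under simple assumptions, not a complete backup scheme. The function names are invented for illustration, and an archive kept on the same disk as the original gives no protection against hardware failure.

```python
import tarfile
import time
from pathlib import Path


def make_backup(source_dir, backup_dir):
    """Write source_dir to a timestamped .tar.gz in backup_dir; return its path."""
    source = Path(source_dir)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = target_dir / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the directory name, not the full path.
        tar.add(source, arcname=source.name)
    return archive


def verify_backup(archive):
    """Re-open the archive and return the sorted list of member names."""
    with tarfile.open(archive, "r:gz") as tar:
        return sorted(tar.getnames())
```

For example, `make_backup("thesis", "/mnt/usb/backups")` would archive a directory called `thesis`, where both paths are placeholders; calling `verify_backup` on the result is a minimal check that the backup is readable.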



 

Help and Community


  • Where can I get help?
    • Two sources. . .




Why a distinct route?

Very different from (mainstream) IT Services. . .

  • Mainstream IT Services deals with "commodity" services:
    • word-processing, email. . .
    • 1000 customers ~ 1 IT staff
  • Research computing is fundamentally different:
    • postgrads must learn a lot in a short time;
    • much wider and deeper support required;
    • 30 customers ~ 1 IT staff?




So what is there?

In this module we describe:

Formal:
  • School and faculty IS teams;
  • IT Services virtual helpdesk;
  • RCS!
Informal:
  • Wiki
  • Email list



 

Research Computing Support

The End (Almost)



 

Questions?



 

Possible Future Sections and Modules

More on Grid Stuff
Parallel Computation