Module Catalogues, Xi'an Jiaotong-Liverpool University   
 
Module Code: DTS202TC
Module Title: Fundamentals of Parallel Computing
Module Level: Level 2
Module Credits: 5.00
Academic Year: 2021/22
Semester: SEM1
Originating Department: School of AI and Advanced Computing
Pre-requisites: N/A
   
Aims
This module is an introduction to parallel computing on modern supercomputers. It aims to give students an understanding of the processors and architectures of modern parallel computers, and to develop a preliminary ability to design and implement parallel algorithms in MPI.


The module also covers parallel programming, with an emphasis on developing applications for processors with many computation cores. Topics include computational thinking, forms of parallelism, programming models, mapping computations to parallel hardware, efficient data structures, paradigms for efficient parallel algorithms, and application case studies.
Learning outcomes 
A. Identify serial and parallel algorithms

B. Appreciate the basic principles and techniques for devising parallel algorithms

C. Devise and implement parallel algorithms

D. Acquire basic software development skills using MPI


E. Analyze and implement common parallel algorithm patterns in a parallel programming model such as CUDA.

F. Design experiments to analyze the performance bottlenecks in their parallel code.

G. Apply common parallel techniques to improve performance given hardware constraints.

H. Use a parallel debugger to identify and repair code defects, and use a parallel profiler to identify performance bottlenecks in their code.

I. Understand and apply common parallel algorithm patterns.

J. Understand the major types of hardware limitations that limit parallel program performance.

K. Identify a computational problem and solve it through parallel algorithm design and programming.
Method of teaching and learning 
Students will be expected to attend about 2 hours of formal lectures and 1 hour of a tutorial or lab session in a typical week. Lectures will introduce students to the academic content. Tutorials will be used to expand the students' understanding of lecture materials. In addition, students will be expected to devote unsupervised time to private study, which will provide time for reflection on the lecture material and for background reading.
Syllabus 
1. Parallel architecture – interconnection networks, processor arrays, multiprocessors, multicomputers, Flynn’s Taxonomy (3 lectures)

2. Parallel algorithm design – task/channel model, Foster’s design methodology (2 lectures)

3. Parallel algorithm implementation – examples (2 lectures)

4. Introduction to message-passing programming – model, interface, circuit satisfiability, collective communication, benchmarking parallel performance (4 lectures)

5. Introduction to CUDA C and Data Parallel Programming – data parallelism, CUDA C program structure, vector addition kernel, global memory and data transfer, kernel functions and threading, kernel launch (4 lectures)

6. Application Case Study – parallel patterns: convolution, prefix sum, parallel histogram computation, sparse matrix computation, merge sort, or graph search (2 lectures)

7. GPU as part of the PC Architecture – model of host/device interaction, kernel execution control (3 lectures)

8. GPU Data Transfer – memory bandwidth and compute throughput (3 lectures)

9. Joint CUDA-MPI Programming – Message Passing Interface, overlapping computation and communication, Message Passing Interface collective communication, CUDA-aware Message Passing Interface (3 lectures)
Delivery Hours  
                Lectures  Seminars  Tutorials  Lab/Practicals  Fieldwork/Placement  Other (Private study)  Total
Hours/Semester  26        -         13         13              -                    98                     150

Assessment

Sequence  Method                    % of Final Mark
1         Assignment 1 (Groupwork)  10.00
2         Assignment 2 (Groupwork)  10.00
3         Lab Report                20.00
4         Formal Exam               60.00

Module Catalogue generated from SITS CUT-OFF: 6/5/2020 5:37:29 PM