Parallel
computing is a form of computation in which many calculations are carried out
simultaneously, operating on the principle that large problems can often be
divided into smaller ones, which are then solved concurrently ("in
parallel"). There are several different forms of parallel computing:
bit-level, instruction-level, data, and task parallelism. Parallelism has been
employed for many years, mainly in high-performance computing, but interest in
it has grown lately due to the physical constraints preventing frequency
scaling. As power consumption (and consequently heat generation) by computers
has become a concern in recent years, parallel computing has become the
dominant paradigm in computer architecture, mainly in the form of multi-core processors.
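As a concrete illustration of data parallelism on a multi-core processor, the sketch below splits one large summation across several CPU threads. It is a minimal example written for this text using standard C++ threads; the names and structure are illustrative and not taken from any particular library mentioned above.

// Minimal data-parallelism sketch: the same operation is applied
// concurrently to disjoint chunks of one large array.
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const std::size_t n = 1 << 20;
    std::vector<double> data(n, 1.0);

    const unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::vector<double> partial(workers, 0.0);
    std::vector<std::thread> pool;

    // Each thread sums its own chunk and writes only to its own slot
    // of "partial", so no locking is needed.
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            std::size_t begin = w * n / workers;
            std::size_t end   = (w + 1) * n / workers;
            for (std::size_t i = begin; i < end; ++i)
                partial[w] += data[i];
        });
    }
    for (auto& t : pool) t.join();

    double total = 0.0;
    for (double p : partial) total += p;
    std::printf("sum = %f\n", total);
    return 0;
}

Because the workers never touch each other's partial results, the only coordination needed is the final join and combine step, which is what makes this kind of problem easy to divide into smaller ones.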
Distributed
computing is a field of computer science that studies distributed systems. A
distributed system is a software system in which components located on
networked computers communicate and coordinate their actions by passing
messages. The components interact with each other in order to achieve a common
goal. Three significant characteristics of distributed systems are: concurrency
of components, lack of a global clock, and independent failure of components.
Examples of distributed systems range from SOA-based systems to massively
multiplayer online games to peer-to-peer applications.
Parallel
computing is the simultaneous execution of the same task (split up and
specially adapted) on multiple processors in order to obtain faster results.
There are many different kinds of parallel computers (or "parallel
processors"). Flynn's taxonomy classifies parallel (and serial) computers
according to whether all processors execute the same instructions at the same
time (single instruction/multiple data -- SIMD) or each processor executes
different instructions (multiple instruction/multiple data -- MIMD). They are also
distinguished by the mode used to communicate values between processors.
Distributed memory machines communicate by explicit message passing, while
shared memory machines have a global memory address space, through which values
can be read and written by the various processors.
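The contrast between the two communication modes can be sketched with a small message-passing example. The snippet below assumes MPI, a message-passing library that is not mentioned above but is a common way to program distributed-memory machines: rank 0 sends a value to rank 1 explicitly, since the two processes share no memory. On a shared-memory machine the same exchange would simply be a write and a read of a common address.

/* Sketch of explicit message passing in the distributed-memory style
 * described above, using MPI (an assumption: MPI is not named in the text).
 * Run with two processes, e.g. "mpirun -np 2 ./a.out". */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int value = 42;
        /* Rank 0 sends a value; no memory is shared between the ranks. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int value = 0;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}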
A thread of
execution is the smallest sequence of programmed instructions that can be
managed independently by an operating system scheduler; a thread itself is
often described as a light-weight process. The implementation of threads and processes differs
from one operating system to another, but in most cases, a thread is contained
inside a process. Multiple threads can exist within the same process and share
resources such as memory, while different processes do not share these
resources. In particular, the threads of a process share the latter's
instructions (its code) and its context (the values that its variables
reference at any given moment). On a single processor, multithreading is
generally implemented by time-division multiplexing (as in multitasking): the
processor switches between different threads. This context switching generally
happens frequently enough that the user perceives the threads or tasks as
running at the same time. On a multiprocessor or multi-core system, threads can
execute truly in parallel, with every processor or core executing a separate thread
simultaneously.
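A minimal sketch of this sharing, assuming standard C++ threads (std::thread, which the text does not name explicitly): two threads of one process update the same counter, which works only because they see the same address space; a mutex serializes the updates so no increments are lost.

// Two threads of the same process updating one shared counter.
// Because threads share the process's memory, both see the same
// variable; the mutex serializes access to avoid a data race.
#include <cstdio>
#include <mutex>
#include <thread>

int main() {
    long counter = 0;          // shared by all threads of this process
    std::mutex lock;

    auto work = [&] {
        for (int i = 0; i < 100000; ++i) {
            std::lock_guard<std::mutex> guard(lock);
            ++counter;
        }
    };

    std::thread a(work), b(work);
    a.join();
    b.join();

    std::printf("counter = %ld\n", counter);   // 200000: both threads saw the same memory
    return 0;
}

Two separate processes running the same code would each get their own copy of the counter and would have to exchange messages to combine their results, which is exactly the distinction drawn above.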
CUDA (Compute
Unified Device Architecture) is a parallel computing platform and programming
model created by NVIDIA and implemented by the graphics processing units (GPUs)
that they produce.[1] CUDA gives program developers direct access to the
virtual instruction set and memory of the parallel computational elements in
CUDA GPUs. Using CUDA, these GPUs can be used for general-purpose processing
(i.e., not exclusively graphics); this approach is known as GPGPU. Unlike CPUs,
however, GPUs have a parallel throughput architecture that emphasizes executing
many concurrent threads slowly, rather than executing a single thread very
quickly.
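A minimal sketch of this model, assuming the standard CUDA runtime API (cudaMalloc, cudaMemcpy, and a kernel launch): each of the many GPU threads adds a single pair of array elements, so throughput comes from the sheer number of threads rather than from the speed of any one of them. The kernel and variable names are illustrative, not taken from the text.

// Minimal CUDA sketch: many GPU threads each handle one array element.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *ha = (float*)malloc(bytes), *hb = (float*)malloc(bytes), *hc = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    float *da, *db, *dc;
    cudaMalloc((void**)&da, bytes);
    cudaMalloc((void**)&db, bytes);
    cudaMalloc((void**)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    std::printf("c[0] = %f\n", hc[0]);   // expected 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}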