How large is large?
Whether a problem is large enough to require special care
depends on what is being done and on
the computer being used.
As a very rough guide, images smaller than 1000x1000 pixels
usually do not count as large, while those larger than
5000x5000 pixels
usually do (at the time of writing);
for sizes in between it depends very much on the details.
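Converting pixel counts into bytes shows why these thresholds are only rough. The sketch below is illustrative. It assumes the pixel data types and byte sizes listed in the code (2-byte integers, 4- and 8-byte floating point). The helper names are hypothetical and not part of any Starlink package.

```python
# Rough memory footprint of a single 2-D image.
# Assumption: the data types and byte sizes below are illustrative; the
# type actually used depends on your data and software.
BYTES_PER_PIXEL = {
    "int16": 2,    # e.g. raw 16-bit CCD readout
    "float32": 4,  # single precision
    "float64": 8,  # double precision
}


def image_bytes(nx: int, ny: int, dtype: str = "float32") -> int:
    """Bytes needed to hold one nx-by-ny image of the given type."""
    return nx * ny * BYTES_PER_PIXEL[dtype]


def mebibytes(nbytes: int) -> float:
    """Convert bytes to MiB (2**20 bytes)."""
    return nbytes / 2**20


if __name__ == "__main__":
    for side in (1000, 5000):
        for dtype in BYTES_PER_PIXEL:
            n = image_bytes(side, side, dtype)
            print(f"{side}x{side} {dtype:>7}: {mebibytes(n):7.1f} MiB")
```

For example, a 5000x5000 single-precision image occupies 100,000,000 bytes (about 95 MiB). That is 25 times as much as a 1000x1000 image of the same type. A program usually holds several such arrays at once, such as several input frames or a data array together with its variance, so its working set is typically a multiple of this figure.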
The `size' of a data reduction problem is some ill-defined
function of, inter alia:
- number of pixels per frame,
- number of objects,
- number of frames:
the number of bias and flat-field frames to be processed matters
as well as the number of target-object frames,
- overlap of frames:
parts of the reduction process that compare objects or
backgrounds between frames will perform differently according
to how much the frames overlap in coverage.
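The first and third factors in the list above can be combined into a rough estimate of total data volume. The sketch below is a hypothetical helper, not part of any reduction package. It makes two assumptions:

- every frame has the same dimensions and pixel size;
- each processing stage writes a fresh copy of every frame (the `stages` parameter).

The number of objects and the overlap of frames mainly affect processing time rather than data volume, so they are not modelled.

```python
from dataclasses import dataclass


@dataclass
class FrameSet:
    """Hypothetical description of the frames in one reduction job."""
    n_bias: int
    n_flat: int
    n_target: int
    nx: int
    ny: int
    bytes_per_pixel: int = 4  # assumption: single-precision pixels

    def n_frames(self) -> int:
        return self.n_bias + self.n_flat + self.n_target

    def frame_bytes(self) -> int:
        return self.nx * self.ny * self.bytes_per_pixel

    def input_bytes(self) -> int:
        """Volume of all input frames together."""
        return self.n_frames() * self.frame_bytes()

    def disk_bytes(self, stages: int = 3) -> int:
        """Rough disk volume if `stages` copies of every frame are kept,
        counting the raw input as one copy (assumption)."""
        return self.input_bytes() * stages


if __name__ == "__main__":
    job = FrameSet(n_bias=10, n_flat=10, n_target=20, nx=4096, ny=4096)
    gib = 2**30
    print(f"input: {job.input_bytes() / gib:.1f} GiB")
    print(f"disk (3 stages): {job.disk_bytes() / gib:.1f} GiB")
```

The example job has 40 frames of 4096x4096 single-precision pixels. Its inputs alone come to 2.5 GiB, and 7.5 GiB under the three-copy assumption.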
The principal resources that can run short during
data reduction are as follows.
- Memory:
a computer has a fixed amount of real memory (RAM;
Random Access Memory),
and also a part of the disk called swap space
which serves as an overflow if running processes need
more memory than the available RAM.
If real memory plus swap space is too small to hold the
program and its data, the program will fail.
If there is not enough real memory to hold at once the parts
of the program and data that are used simultaneously,
a lot of time will be spent shifting data between
RAM and disk, and the program
(as well as other processes on the same machine)
will run painfully slowly.
Depending on the operating system and the way the machine is
set up, either of these situations can also cause
other processes on the machine to be terminated, or the system to crash.
- Disk space:
if there is insufficient disk space the program will fail,
and other processes writing to the same disk partition may fail too.
- Input/Output:
Input/Output (I/O) time, that is, the time spent
waiting for data to be read from and written to disk,
will inevitably increase with large data sets.
I/O speed is likely to be fairly similar across
low- and mid-range workstations
and servers, except where the disks are being used
heavily by other processes at the same time;
on a busy server this may be the norm.
- CPU time:
algorithms which are efficient in CPU (Central
Processing Unit) time for small
problems may become inefficient for large ones.
Speed of execution varies considerably between machines.
Some guide is given by the nominal processor speed (quoted in MHz,
or as a floating-point rate in megaflops),
but when processing large data sets on a modern workstation or server,
the CPU time spent will normally be limited by memory bandwidth.
Bandwidth is not usually quoted as prominently as processor speed,
but is typically higher on heavy-duty servers than on smaller
workstations.
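The memory and disk-space failure modes described above can be checked before a long job is started. The following is a minimal sketch under these assumptions:

- the byte counts are estimates you supply;
- `physical_ram_bytes` uses POSIX `os.sysconf`, which is not available on every platform, and it reports total rather than free RAM;
- the swap size must be supplied by hand, because Python's standard library does not report it;
- the 10% disk margin is arbitrary.

All function names are hypothetical.

```python
import os
import shutil
from typing import Optional


def physical_ram_bytes() -> Optional[int]:
    """Total physical RAM from POSIX sysconf, or None if unavailable
    (e.g. on Windows). Note: this is total RAM, not the free amount."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def memory_outlook(total_bytes: int, active_bytes: int,
                   ram: int, swap: int) -> str:
    """Classify a job by the two memory failure modes.

    total_bytes  -- everything the program needs to hold
    active_bytes -- the part that is used simultaneously
    ram, swap    -- amounts available to this job (other processes
                    on the machine share them)
    """
    if total_bytes > ram + swap:
        return "fail"    # cannot run at all
    if active_bytes > ram:
        return "thrash"  # runs, but keeps moving data to and from swap
    return "ok"


def disk_ok(path: str, bytes_needed: int, margin: float = 0.1) -> bool:
    """True if the partition holding `path` has room for bytes_needed
    plus a safety margin (assumption: 10% by default)."""
    free = shutil.disk_usage(path).free
    return bytes_needed * (1 + margin) <= free


if __name__ == "__main__":
    ram = physical_ram_bytes()
    print("physical RAM:", ram if ram is not None else "unknown")
    gib = 2**30
    # Hypothetical job: 6 GiB in total, 3 GiB in use at once,
    # on a machine offering 4 GiB of RAM and 8 GiB of swap.
    print(memory_outlook(6 * gib, 3 * gib, ram=4 * gib, swap=8 * gib))
    print("disk ok:", disk_ok(".", 2 * gib))
```

On a shared machine it would be more realistic to compare against the RAM actually free rather than the total. Obtaining that figure is platform-specific, so the sketch leaves it to the caller.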
The statistic that will normally concern you is
elapsed, or `wall clock', time: the number of minutes
or hours between starting a job and its results becoming available.
For a large data reduction job most of this time will typically be
spent in I/O, which may or may not include moving data between
real memory and swap space.
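Python's standard timers make the difference between elapsed time and CPU time easy to see. `time.perf_counter` is a wall-clock timer, and `time.process_time` measures the CPU time of the current process. In the sketch below, `time.sleep` stands in for waiting on disk I/O: it adds to elapsed time while using almost no CPU. The I/O estimate needs a throughput figure that you must supply. The 50 MB/s in the example is an assumption, not a measured value. The function names are hypothetical.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds, cpu_seconds)."""
    wall0 = time.perf_counter()   # wall-clock timer
    cpu0 = time.process_time()    # CPU time used by this process
    result = func(*args, **kwargs)
    return (result,
            time.perf_counter() - wall0,
            time.process_time() - cpu0)


def wait_for_io(seconds: float) -> None:
    """Stand-in for waiting on disk: elapsed time passes, little CPU is used."""
    time.sleep(seconds)


def io_seconds(nbytes: int, bytes_per_second: float) -> float:
    """Idealised time to read (or write) nbytes at a steady throughput,
    ignoring seeks and contention from other processes."""
    return nbytes / bytes_per_second


if __name__ == "__main__":
    _, wall, cpu = timed(wait_for_io, 0.5)
    print(f"elapsed {wall:.2f} s, CPU {cpu:.2f} s")
    # Assumption: 50 MB/s sustained throughput; 100 MB of data.
    print(f"estimated I/O: {io_seconds(100_000_000, 50e6):.1f} s")
```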
In a multi-user environment, however, it is also important to consider
how your use of the machine affects the elapsed times
of other people's jobs, or other jobs of your own.
As a general rule, then,
if your data reduction runs fast enough that it does not inconvenience
you or other people, you do not have a `large' problem.
Otherwise, the rest of this recipe may provide some useful tips.
The 2-D CCD Data Reduction Cookbook
Starlink Cookbook 5
A.C. Davenhall, G.J. Privett & M.B. Taylor
16th August 2001
E-mail: starlink@jiscmail.ac.uk
Copyright © 2001 Council for the
Central Laboratory of the Research Councils