

How large is large?

What is and is not a problem large enough to require special care will depend on what is being done and on the computer being used. As a very rough indication, images smaller than 1000x1000 in most cases do not count as large, and ones larger than 5000x5000 in most cases (at the time of writing) do; for cases in-between it depends very much on the details.
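
As a rough illustration of what these dimensions mean in bytes, the size of a single image array can be estimated from its dimensions. The sketch below is not taken from the cookbook: the helper names `image_bytes` and `mebibytes` are hypothetical, and the default of 4 bytes per pixel (a single-precision floating point value) is an assumption. Real images may use other data types and may carry additional arrays of the same shape, which the `n_arrays` parameter is meant to approximate.

```python
def image_bytes(nx, ny, bytes_per_pixel=4, n_arrays=1):
    """Approximate size in bytes of an nx-by-ny image.

    bytes_per_pixel=4 is an assumption (single-precision floats);
    n_arrays counts any extra arrays of the same shape stored with the data.
    """
    return nx * ny * bytes_per_pixel * n_arrays


def mebibytes(nbytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return nbytes / (1024 * 1024)


if __name__ == "__main__":
    for side in (1000, 5000):
        size = image_bytes(side, side)
        print(f"{side}x{side}: {mebibytes(size):.1f} MiB")
    # prints:
    # 1000x1000: 3.8 MiB
    # 5000x5000: 95.4 MiB
```

Under these assumptions the larger image is 25 times the size of the smaller one, and a reduction step that holds several such arrays at once multiplies the figure accordingly.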

The `size' of a data reduction problem is some ill-defined function of, inter alia:

The principal resources which can fall into short supply during a data reduction process are as follows.

Memory:
a computer has a fixed amount of real memory (RAM; Random Access Memory), and also a part of the disk called swap space which serves as an overflow if running processes need more memory than the available RAM. If real memory and swap space together are insufficient to run the program, it will fail. If there is insufficient real memory to hold at once the parts of the program and data which are used simultaneously, a lot of time will be spent shifting data between RAM and disk, and the program (as well as other processes on the same machine) will run painfully slowly. Depending on the operating system and the way the machine is set up, either of these eventualities can lead to the termination of other processes on the machine, or to system crashes.

Disk space:
if there is insufficient disk space the program will fail. If other processes are writing to the same disk partition they can fail too.

Input/Output:
Input/Output (I/O) time, that is the time spent waiting for data to be read from and written to disk, will inevitably increase with large data sets. I/O speed is likely to be fairly similar between different low- or mid-range workstations and servers, except in the case where a resource is being used heavily by other processes at the same time; on a busy server this may be the norm.

CPU time:
algorithms which are efficient with CPU (Central Processing Unit) time for small problems may become inefficient for large ones. Speed of execution varies quite a lot between different machines. Some guide is given by the nominal processor speed (in MHz or megaflops), but when processing large data sets on a modern workstation or server, the CPU time spent will normally be limited by memory bandwidth. Bandwidth is not usually quoted as prominently as processor speed, but is typically better on heavy duty servers than on smaller workstations.
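
Two of the resources listed above, disk space and real memory, can be checked before a job is started. The following is a minimal sketch, not a procedure from the cookbook. The function names and the `ram_fraction` threshold are hypothetical. It uses `shutil.disk_usage` from the Python standard library for free disk space, and `os.sysconf` for the amount of physical RAM; the latter is only expected to work on some Unix-like systems, so the function returns `None` where the value cannot be determined. Swap space is not examined.

```python
import os
import shutil


def free_disk_bytes(path="."):
    """Free space in bytes on the partition holding `path`."""
    return shutil.disk_usage(path).free


def physical_ram_bytes():
    """Total physical RAM in bytes, or None if it cannot be determined.

    Relies on os.sysconf, which is unavailable or incomplete on some
    platforms; treat a None result as 'unknown', not as 'zero'.
    """
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def check_resources(needed_bytes, path=".", ram_fraction=0.5):
    """Compare an estimated requirement with the available resources.

    ram_fraction is an arbitrary illustrative threshold: the job is
    flagged if it needs more than this fraction of physical RAM.
    """
    ram = physical_ram_bytes()
    return {
        "disk_ok": needed_bytes <= free_disk_bytes(path),
        "ram_ok": None if ram is None else needed_bytes <= ram_fraction * ram,
    }


if __name__ == "__main__":
    # Hypothetical requirement: one 5000x5000 image at 4 bytes per pixel.
    print(check_resources(5000 * 5000 * 4))
```

A check of this kind says nothing about what other processes on the machine are using at the same moment, so on a busy server it gives only an upper bound on what is available.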

Normally the statistic which will actually concern you is elapsed, or `wall clock' time, that is the number of minutes or hours between starting a job off and the results being available. For a large data reduction job most of this time will typically be spent in I/O, which may or may not include moving data between real memory and swap space. In a multi-user environment, however, it is important to consider how your use of the machine is affecting the elapsed times of other people's jobs, or other jobs of your own. As a general rule, then, if your data reduction runs fast enough that it does not inconvenience you or other people, you do not have a `large' problem. Otherwise, the rest of this recipe may provide some useful tips.
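
The difference between elapsed time and CPU time can be measured directly. The sketch below is an illustration only and is not part of the cookbook: the helper `timed` is hypothetical, and `time.sleep` stands in for a step that spends its time waiting (as a job waiting on I/O would) rather than computing. It uses `time.perf_counter` for elapsed time and `time.process_time` for the CPU time used by the current process.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds, cpu_seconds).

    elapsed_seconds is wall-clock time; cpu_seconds is the CPU time used
    by this process only, and excludes time spent sleeping or waiting.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


if __name__ == "__main__":
    # A stand-in for a waiting-dominated step: elapsed time is large,
    # CPU time is close to zero.
    _, wall, cpu = timed(time.sleep, 0.2)
    print(f"waiting step:   elapsed {wall:.3f} s, CPU {cpu:.3f} s")

    # A compute-dominated step: elapsed and CPU times are similar
    # on an otherwise idle machine.
    _, wall, cpu = timed(sum, range(2_000_000))
    print(f"computing step: elapsed {wall:.3f} s, CPU {cpu:.3f} s")
```

A job whose elapsed time greatly exceeds its CPU time is spending most of its time waiting, which for a large data reduction job usually points to I/O or swapping rather than to the algorithm itself.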



The 2-D CCD Data Reduction Cookbook
Starlink Cookbook 5
A.C. Davenhall, G.J. Privett & M.B. Taylor
16th August 2001
E-mail: starlink@jiscmail.ac.uk

Copyright © 2001 Council for the Central Laboratory of the Research Councils