A) Use the fastest library routines. The nodes have fast dense linear algebra routines. For example, if your code solves systems of linear equations, linking with the vendor-supplied routines may give a large increase in speed. Link with -lacml rather than non-optimized libraries.

B) Change to a more efficient algorithm. This is the best option, since you get your answers sooner. AIT's HPC group can help you with numerical aspects and some algorithm choices, but you would need to supply the modeling knowledge.

C) Go parallel. The code can be recoded using either of the following:
   MPI: The program can be rewritten to use MPI. This often takes a long time but usually gives the best performance.
   OpenMP: The program can be modified with OpenMP directives to run portions of the program in parallel, and compiled to use all 16 processors in a single node. The speedup is limited to 16x, though, and if the work is not done well it can even slow the program down.
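To make the 16x ceiling for OpenMP concrete, here is a minimal sketch based on Amdahl's law, which estimates the ideal speedup when only part of a program's runtime is parallelized. The function name `amdahl_speedup` and the sample fractions are illustrative assumptions, not measurements from any code on this cluster, and the formula ignores threading overhead, which is why a poorly parallelized program can do worse than these numbers suggest.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup when `parallel_fraction` of the runtime runs on `workers` processors.

    Amdahl's law: speedup = 1 / (serial_fraction + parallel_fraction / workers).
    Overheads (thread startup, synchronization, memory contention) are ignored.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical parallel fractions; measure your own code to get a real value.
    for fraction in (0.50, 0.90, 0.99, 1.00):
        print(f"{fraction:.0%} parallel on 16 processors: "
              f"{amdahl_speedup(fraction, 16):.1f}x")
```

Under these assumptions, a program that is 90% parallel gains about 6.4x on 16 processors, and only a fully parallel program reaches 16x.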
Major production codes are checkpointed. In checkpointing, you periodically save the state of the program in a restart file. Whenever you run your program, it reads the restart file and picks up from the last checkpoint. The advantage is that there is no limit to the total amount of time you can use. Barring disk crashes or total loss of the machine, your total runtime is unlimited: you just keep submitting the same job and start from where you left off. There is overhead associated with each checkpoint, and any work done after the last checkpoint is lost whenever the job is stopped. You may want to do this whatever else you do, since as soon as a research code gets faster, the next step is to run a larger problem, and you are back up against the queue time restriction.
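The restart-file pattern described above can be sketched as follows. This is a minimal illustration, not a production scheme: the file name `restart.json`, the state layout, and the functions `load_state`, `save_state`, and `run` are all hypothetical, and the "work" is a toy running sum standing in for a real computation.

```python
import json
import os

CHECKPOINT_FILE = "restart.json"  # hypothetical restart-file name


def load_state(path=CHECKPOINT_FILE):
    """Return the last saved state, or a fresh state if no restart file exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0.0}


def save_state(state, path=CHECKPOINT_FILE):
    """Write to a temporary file, then rename it over the old restart file,
    so a job killed in the middle of a write does not corrupt the checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run(total_steps, checkpoint_every, path=CHECKPOINT_FILE, stop_after=None):
    """Advance a toy computation, saving a checkpoint every `checkpoint_every` steps.

    `stop_after` simulates the queue stopping the job after that many steps
    in the current run (an assumption made for demonstration only).
    """
    state = load_state(path)  # pick up from the last checkpoint, if any
    done_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return state  # job stopped: work since the last checkpoint is lost
        state["total"] += state["step"]  # stand-in for one unit of real work
        state["step"] += 1
        done_this_run += 1
        if state["step"] % checkpoint_every == 0:
            save_state(state, path)
    save_state(state, path)  # final state, so a rerun has nothing left to do
    return state
```

Calling `run(1000, 100)` repeatedly, once per submitted job, would continue from the most recent checkpoint each time. The checkpoint interval is the trade-off the text mentions: a shorter interval costs more overhead, while a longer one loses more work when the job is stopped.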