Fortran - Errors and error handling - Part 7 - Fatal errors
Matthias Noback
We’ve encountered several ways of designing functions so that they can fail without stopping the program and without becoming risky or awkward to use. We introduced the error_t type, which is very flexible. It can be used to provide some information to the caller, helping them understand what went wrong and how it can be fixed. By allowing errors to be wrapped inside others, we can create chains of errors that describe the problem at various abstraction levels. It gives back control to the user: how do they want to deal with an error? Would they like to try something else? Or, in the end, should we just stop trying and quit the program?
It is generally advisable to not just stop a program as soon as you notice something is wrong. There is often some work that needs to be done to clean things up. But when the time comes, you need to stop the program in the best way possible… What does that mean?
- The program has to return a non-zero exit code. This helps other programs that run our program (like a terminal, shell, script, etc.) detect the failure.
- We should show a detailed error message, providing as many helpful clues as possible, so the user can fix the problem and run the program again.
- We should include some tracing information for a programmer who wants to find out where in the code the fatal error occurred.
Non-zero exit codes
We can quit a Fortran program using a stop statement:
stop 'Error'
! Output: Error
! Exit code: 0
This is confusing: we stop because of an error, but the exit code is 0, indicating success. We can provide a number higher than 0 instead of a string:
stop 1
! Output: 1
! Exit code: 1
The exit code is okay, but the output is just the exit code itself. It’s better to print a message, then exit with the provided code. You can’t give both an exit code and a message to stop. We can add a quiet specifier, though, to prevent the exit code from being added to the output:
stop 1, quiet = .true.
! Output: [nothing]
! Exit code: 1
Providing a specific exit code may not be portable. The best value or range of values may vary between operating systems, so it is a good idea to let the compiler pick a value for “failure”. This can be accomplished with the error stop statement:
error stop 'Error'
! Output: Error
! Exit code: 128
If needed, we can still pick a different exit code:
error stop 129
! Output: 129
! Exit code: 129
As with stop, this shows the exit code in the output… Luckily, we can silence the output again:
error stop 129, quiet=.true.
! Output: [nothing]
! Exit code: 129
A good suggestion would be to use error stop for fatal errors, and stop for situations where the program did what was expected of it but shouldn’t wait until the main program block finishes.
Printing an error
Both stop and error stop can print messages, but when we pass a string we no longer have control over the exit code. So it’s better to separate the responsibility of printing a useful error message on screen from that of exiting with a specific exit code. These can be two separate steps in a subroutine fatal_error:
subroutine fatal_error(error, exit_code)
    class(error_t), intent(in) :: error
    integer, intent(in), optional :: exit_code

    print *, error%get_message()
    if (present(exit_code)) then
        error stop exit_code, quiet = .true.
    else
        error stop
    end if
end subroutine fatal_error
We accept an instance of our own error_t type and print the result of calling get_message(). We created that function earlier to return the error’s own message, with any previous error message concatenated to it. Note that exit_code is an optional argument:
! Providing a specific exit code
call fatal_error(error_t('Something went wrong'), 129)
! Using the default exit code
call fatal_error(error_t('Something went wrong'))
When printing errors, it’s best practice to use a different output “channel”. Instead of stdout, which is used by print, we should explicitly send our output to stderr. This has to be done with a write statement, passing the error_unit from the intrinsic module iso_fortran_env as the write target:
subroutine fatal_error(error, exit_code)
+ use, intrinsic :: iso_fortran_env, only: error_unit
class(error_t), intent(in) :: error
integer, intent(in), optional :: exit_code
- print *, error%get_message()
+ write (error_unit, fmt=*) error%get_message()
if (present(exit_code)) then
error stop exit_code, quiet = .true.
else
error stop
end if
end subroutine fatal_error
Note that we’re importing error_unit inside the subroutine itself instead of at the top of the module definition. This works fine, although I’m not sure if it should be promoted to become a best practice. One advantage is that if we ever (re)move this procedure, we don’t have to clean up the use statement(s) at the top of the module. In a sense, adding imports to a subroutine is the most cohesive thing to do. The downside is that the same use statement often ends up duplicated in several procedures.
Tracing the origin of an error
Languages that have exceptions built-in also have a mechanism to capture a stack trace of the location where the exception was produced. The stack trace gets passed as data alongside the exception, and can be rendered to the user. This kind of mechanism isn’t available in Fortran. We have to write something ourselves.
There are several things we can do. For instance, we can at least record the source location where we created an error_t instance. Any time we want to do that, we have to pass this information explicitly. Let’s start by defining a location_t type that holds a file name and a line number:
type :: location_t
    character(len=:), allocatable :: file
    integer :: line
end type location_t
We then allow the location to be stored on an error_t instance:
type :: error_t
character(len=:), allocatable :: message
class(error_t), allocatable :: previous
+ class(location_t), allocatable :: location
contains
procedure :: get_message => error_get_message
end type error_t
To show the location as part of the message we can update the type-bound procedure get_message to concatenate the location, but only if it’s allocated:
pure recursive function error_get_message(self) result(res)
class(error_t), intent(in) :: self
character(len=:), allocatable :: res
character(len=32) :: temp
+ res = self%message
+ if (allocated(self%location)) then
+ write (temp, *) self%location%line
+ res = res//' in '//self%location%file//' on line '//trim(adjustl(temp))
+ end if
if (allocated(self%previous)) then
res = res//' Previous error: '//self%previous%get_message()
end if
end function error_get_message
Note that we have to do some complicated work before we can concatenate an integer to a string. We’ll find an easier way to do it in another post.
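One way to hide this conversion is to wrap the internal write in a small function. The sketch below uses a hypothetical helper named int_to_string in a hypothetical module int_to_string_sketch; neither is part of the code from this series. It relies on the i0 edit descriptor, which writes the integer with the minimum width, so no adjustl is needed:

```fortran
module int_to_string_sketch
    implicit none
    private
    public :: int_to_string

contains

    ! Hypothetical helper (not from the article): returns the decimal
    ! representation of an integer without surrounding blanks.
    pure function int_to_string(value) result(res)
        integer, intent(in) :: value
        character(len=:), allocatable :: res
        character(len=32) :: buffer

        ! i0 writes the number left-aligned, using only as many
        ! characters as needed; the rest of the buffer is blank-padded.
        write (buffer, '(i0)') value
        res = trim(buffer)
    end function int_to_string

end module int_to_string_sketch
```

With a helper like this, the concatenation in get_message could be written as res//' on line '//int_to_string(self%location%line), without the temporary variable.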
When creating the error we can now also pass the file and line number:
call fatal_error(error_t('Something went wrong', &
location=location_t('file.f90', 10)))
Note that we have to use a named argument (location=) because the default structure constructor for error_t expects the second argument to be of type error_t (to match the previous data component).
This shows the following message on screen:
Something went wrong in file.f90 on line 10
Of course, hard-coding the file name and line number is asking for trouble. It would be a nightmare to ensure these values stay up-to-date. Instead, we can use a pre-processor like cpp to do the work for us. It has macros for the file name and line number. With FPM, you can easily enable the pre-processor by adding this to fpm.toml:
[preprocess]
[preprocess.cpp]
Now we can write:
call fatal_error(error_t('Something went wrong', &
location=location_t(__FILE__, __LINE__)))
One thing that is not very user-friendly and leads to code duplication is the use of a derived type and a named argument. The quick solution for that is to define an interface for error_t, effectively overriding the structure constructor. If a user passes a string (the message), another string (the file), and an integer (the line), then the function will do the rest:
interface error_t
    module procedure :: create_error_with_message_and_location
end interface

contains

pure function create_error_with_message_and_location(message, file, line) result(res)
    character(len=*), intent(in) :: message
    character(len=*), intent(in) :: file
    integer, intent(in) :: line
    type(error_t) :: res

    res = error_t(message, location=location_t(file, line))
end function create_error_with_message_and_location
Now we can use it like this, and the result will be the same as before.
call fatal_error(error_t('Something went wrong', __FILE__, __LINE__))
We can create any number of variants we need for the error_t interface, e.g. with or without a previous error, etc.
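As a rough sketch of what such variants could look like, the module below adds a second constructor function that also takes a previous error. The module name error_variants_sketch and both function names are made up for this illustration. The functions fill the components one by one instead of calling the structure constructor, and they are left impure as a cautious choice, since some compilers may reject a pure function whose result has polymorphic allocatable components:

```fortran
module error_variants_sketch
    implicit none
    private
    public :: error_t, location_t

    type :: location_t
        character(len=:), allocatable :: file
        integer :: line
    end type location_t

    type :: error_t
        character(len=:), allocatable :: message
        class(error_t), allocatable :: previous
        class(location_t), allocatable :: location
    contains
        procedure :: get_message => error_get_message
    end type error_t

    ! Both variants share the generic name error_t; the compiler picks
    ! one based on the number of arguments.
    interface error_t
        module procedure :: create_error_with_location
        module procedure :: create_error_with_location_and_previous
    end interface

contains

    ! Variant 1 (hypothetical name): message plus source location
    function create_error_with_location(message, file, line) result(res)
        character(len=*), intent(in) :: message
        character(len=*), intent(in) :: file
        integer, intent(in) :: line
        type(error_t) :: res

        res%message = message
        allocate (res%location, source=location_t(file, line))
    end function create_error_with_location

    ! Variant 2 (hypothetical name): same, but wrapping a previous error
    function create_error_with_location_and_previous(message, file, line, previous) result(res)
        character(len=*), intent(in) :: message
        character(len=*), intent(in) :: file
        integer, intent(in) :: line
        class(error_t), intent(in) :: previous
        type(error_t) :: res

        res%message = message
        allocate (res%location, source=location_t(file, line))
        ! Store a copy of the previous error inside the new one
        allocate (res%previous, source=previous)
    end function create_error_with_location_and_previous

    ! Same message format as in the article: own message, optional
    ! location, then the message of the previous error (if any).
    pure recursive function error_get_message(self) result(res)
        class(error_t), intent(in) :: self
        character(len=:), allocatable :: res
        character(len=32) :: temp

        res = self%message
        if (allocated(self%location)) then
            write (temp, '(i0)') self%location%line
            res = res//' in '//self%location%file//' on line '//trim(temp)
        end if
        if (allocated(self%previous)) then
            res = res//' Previous error: '//self%previous%get_message()
        end if
    end function error_get_message

end module error_variants_sketch
```

With this in place, a call like error_t('Could not load configuration', __FILE__, __LINE__, inner) wraps an existing error inner, and get_message() renders both messages with their locations.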
Stack traces
Having a file and line number in the error itself is nice. But most likely we also want to find out what happened before the error occurred. In other words, we want the stack trace: which procedure calls have led to this problem. When actually trying to figure out what went wrong, we most likely need an interactive debugging session anyway, but the stack trace helps us find out where to start.
Unfortunately, there’s no built-in way to get a stack trace either. Compilers provide their own ways of doing this. For example, the IFX compiler we’re using has a subroutine tracebackqq, provided by its own ifcore module. Before we can use it, the compiler should be able to find it (this will be the case when you use oneAPI’s setvars script). Because we’re using FPM we need to define ifcore as an external module, so it doesn’t try to find the ifcore module in the src folder. We do this in fpm.toml:
[build]
external-modules = ["ifcore"]
Now we can add the traceback to our fatal_error subroutine:
subroutine fatal_error(error, exit_code)
+ use ifcore, only: tracebackqq
use, intrinsic :: iso_fortran_env, only: error_unit
class(error_t), intent(in) :: error
integer, intent(in), optional :: exit_code
write (error_unit, fmt=*) error%get_message()
+ call tracebackqq()
! ...
end subroutine fatal_error
The output will look like this:
Something went wrong in errors_part_7.f90 on line 21
Image              PC                Routine            Line        Source
errors_part_7      0000000000405C18  fatal_error                38  errors_part_7_module.f90
errors_part_7      00000000004055D7  demo_fatal_error           21  errors_part_7.f90
errors_part_7      00000000004052FE  main                       11  errors_part_7.f90
errors_part_7      00000000004052CD  Unknown               Unknown  Unknown
libc.so.6          0000780C403821CA  Unknown               Unknown  Unknown
libc.so.6          0000780C4038228B  __libc_start_main     Unknown  Unknown
errors_part_7      00000000004051E5  Unknown               Unknown  Unknown
It includes the names of the subroutines, the files, and the line numbers. That is, in a debug build. A release build will likely not have this information, although you can turn it on with compiler options if you want (for the Intel compilers, -traceback).
Checking the exit code of the program, it turns out to be 0. That’s because tracebackqq() by default also stops the program, so our own error stop is never reached. If we don’t want that, we have to pass -1 as the argument for user_exit_code:
-call tracebackqq()
+call tracebackqq(user_exit_code=-1)
Note that using subroutines like tracebackqq() is compiler-dependent. If you have to support multiple compilers, you may also have to provide alternatives or disable some code. In such cases you can use pre-processor macros like this (the use ifcore statement needs the same guard):
#ifdef __INTEL_COMPILER
call tracebackqq(user_exit_code=-1)
#endif
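To make the idea of providing alternatives a bit more concrete, here is a minimal sketch of a wrapper that picks a traceback routine per compiler. The module name stack_trace_sketch and the subroutine name print_stack_trace are made up for this example. For gfortran it assumes the backtrace subroutine, which is a GNU extension and therefore only available when GNU extensions are enabled (the default). The file has to be run through the pre-processor, and the directives must start in the first column:

```fortran
module stack_trace_sketch
#ifdef __INTEL_COMPILER
    ! Intel-specific module; the use statement needs the same guard as the call
    use ifcore, only: tracebackqq
#endif
    use, intrinsic :: iso_fortran_env, only: error_unit
    implicit none
    private
    public :: print_stack_trace

contains

    ! Hypothetical wrapper: prints a stack trace if the compiler offers
    ! a way to do so, and returns to the caller afterwards.
    subroutine print_stack_trace()
#if defined(__INTEL_COMPILER)
        ! -1 tells tracebackqq to return instead of stopping the program
        call tracebackqq(user_exit_code=-1)
#elif defined(__GFORTRAN__)
        ! GNU extension; execution continues after the trace is shown
        call backtrace()
#else
        write (error_unit, '(a)') 'No stack trace available for this compiler'
#endif
    end subroutine print_stack_trace

end module stack_trace_sketch
```

fatal_error could then call print_stack_trace() right after writing the message, and would no longer need to know which compiler is in use.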
There’s a nice fortran-error-handling library that implements many of the ideas offered in this article series. It offers standardized, portable solutions, e.g. for the backtrace/stack trace problem. The library can be installed with FPM.
This post concludes the series on error handling.