Fortran - Functional Programming Concepts - Reduce
Matthias Noback
We’ve implemented filter and map operations. The results of both these operations are new arrays. Yet another common operation on arrays is the so-called reduction operation; we “reduce” multiple values to a single value. The most obvious example is calculating the sum of an array of integers: the input is an array (or list) and the output is a single integer
:
pure function sum_all(numbers) result(res)
integer, dimension(:), intent(in) :: numbers
integer :: res
integer :: i
res = 0
do i = 1, size(numbers)
res = res + numbers(i)
end do
end function sum_all
Just like with filter
and map
we can imagine a generalization of this process, which involves looping over the array elements and building up a single return value along the way. We start with the existing sum_all
function. The first step is to let the user decide what the return value should be, which is the logic that happens inside the do
loop. We extract a function for this:
pure function sum(number, previous_res) result(res)
integer, intent(in) :: number
integer, intent(in) :: previous_res
integer :: res
res = previous_res + number
end function sum
This is the equivalent of res = res + numbers(i)
.
We then invoke this sum
function inside the do
loop, passing the previous result and the current number:
do i = 1, size(numbers)
- res = res + numbers(i)
+ res = sum(numbers(i), res)
end do
The reduction logic is still hard-coded now. To further generalize, we should allow a function like sum
to be passed as a procedure argument. To accomplish that, we need to extract an interface from the sum
function we already have. Finally, we should also rename the sum_all
function to something more generic, like reduce_integers
, because it can now be used for other operations than summing the values:
pure function reduce_integers(numbers, reduction_func) result(res)
integer, dimension(:), intent(in) :: numbers
integer :: res
interface
pure function reduction_func(number, previous_res) result(new_res)
implicit none(type, external)
integer, intent(in) :: number
integer, intent(in) :: previous_res
integer :: new_res
end function reduction_func
end interface
integer :: i
res = 0
do i = 1, size(numbers)
res = reduction_func(numbers(i), res)
end do
end function reduce_integers
This function can now be used as follows:
reduce_integers([5, 6, 7, 8], sum)
The argument previous_res
indicates that it’s the value that was returned by the previous call to reduction_func
, or 0
if it was the first time the function gets called. This value is often called carry
. Always setting it to 0
is limiting the flexibility of this function. It should be possible for users to choose their own initial value, which is also the value that will be returned when the array is empty. So let’s promote it to be a function argument:
-pure function reduce_integers(numbers, reduction_func) result(res)
+pure function reduce_integers(numbers, reduction_func, carry) result(res)
integer, dimension(:), intent(in) :: numbers
+ integer, intent(in) :: carry
! ...
- res = 0
+ res = carry
! ...
end function reduce_integers
It can be used like this:
reduce_integers([5, 6, 7, 8], sum, 0)
Intrinsic function reduce
It’s good to know that we can hide a do
loop that builds up a single value inside a generic reduction function. But we don’t have to write our own reduction function because it already exists as an intrinsic function, adequately named reduce
. It works exactly as we’d imagine:
integer, dimension(:), allocatable :: integers
integers = [5, 6, 7, 8]
reduce(integers, sum, identity=0)
Note: for some reason, reduce
didn’t work when I passed [5, 6, 7, 8]
directly. I had to put the integers in a variable first. Also, it’s surprising that identity
is an optional argument, because it represents the initial carry. If the array is empty, reduce
will fail, but with a runtime error:
forrtl: severe (408): fort: (36): If the array passed to the
REDUCE intrinsic is empty the IDENTITY argument must be specified
This should have been a compile error: it’s a mistake not to pass the initial carry. How else could the function decide what to pass as the carry
to the first reduction_func
call? If we leave out the identity
argument, the previous example still works, which means the initial carry will be 0
. As might be clear from this discussion: it’s a good rule to still provide an explicit identity
value.
Although the intrinsic function reduce
has these issues, we may assume it’s still better to use it than to write our own reduction function: intrinsic functions are often optimized for performance and memory use. However, the programming interface is often not exactly what we want or need. This was also the case with pack
, and there we solved that by putting it behind our own filter
function “programming interface”. Here we can do the same. To support the use of automatic arrays (e.g. [5, 6, 7, 8]
) and to make the initial carry a required argument, we will keep our reduce_integers
function, but use the intrinsic reduce
function inside.
pure function reduce_integers(numbers, reduction_func, carry) result(res)
! ...
- integer :: i
-
- res = 0
-
- do i = 1, size(numbers)
- res = reduction_func(numbers(i), res)
- end do
+ res = reduce(numbers, reduction_func, identity=carry)
end function reduce_integers
A next step would be to bind this function to the integer_list_t
(which I leave as an exercise for the reader).
Recursion
But let’s consider one other implementation option. I’d say it’s not essential to functional programming, but it’s often associated with it, and one might expect the word to be mentioned, especially in the context of a reduction function: recursion. It may be nice to look at an example of how recursion could be used here.
Consider the sum_all
function we started with. An alternative, recursive solution comes from the realization that the sum of 3 integers is:
- 0 + the first integer + the sum of the remaining integers; which is:
- 0 + the first integer + the second integer + the sum of the remaining integers
- And so on…
For every step in the process we keep the subtotal and pass it to the next call to sum_all
, which we make from inside sum_all
itself. It looks like this:
recursive pure function recursive_sum(integers, carry) result(res)
integer, dimension(:), intent(in) :: integers
integer, intent(in) :: carry
integer :: res
if (size(integers) == 0) then
res = carry
return
end if
res = recursive_sum(integers(2:), carry + integers(1))
end function recursive_sum
Note: Fortran requires the function to be explicitly marked as recursive
. If you don’t add it, the compiler won’t allow you to call it from within itself.
Say we call this function like this:
recursive_sum([5, 6, 7, 8], 0)
What will be executed is:
recursive_sum([5, 6, 7, 8], 0)
recursive_sum([6, 7, 8], 5)
recursive_sum([7, 8], 11)
recursive_sum([8], 18)
recursive_sum([], 26)
When an empty array is passed, the termination case is there to return the current value of the carry
:
if (size(integers) == 0) then
res = carry
return
end if
If the array is not empty, there is at least one element so we can calculate carry + integers(1)
. Then we recursively call recursive_sum
, passing the result of this as the new carry
, and the remaining integers in the array. We get those remaining integers by slicing the array with the range 2:
, which means “the second element to the last element, inclusive”. Conveniently, if there is only one element in the array, this returns an empty array.
There is nothing intrinsically wrong with doing recursion like this. I know some people don’t like it, and some find it hard to understand. Some programming problems require it, but in other situations using recursion may be confusing. A recursive solution is slower though, because there’s the overhead of doing a lot more function calls, depending on how large your array is. Also, there’s a limit to how many times you can do that, since the call stack can only be so large. On my machine I got this error trying to use recursive_sum
on an array of 100000 integers:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Stack trace buffer overflow; further frames not shown.
The IFX compiler seems to have support for recursive tail-call optimization, which would mitigate this problem, but I can’t get it to work. As the IFX documentation states: “In tail recursion, a recursive call is converted to a GOTO statement that returns to the beginning of the function.” If the compiler doesn’t do it for us, we can replicate this behavior ourselves:
function pseudo_recursive_sum_with_goto(integers, initial_carry) result(res)
integer, dimension(:), intent(in), target :: integers
integer, intent(in) :: initial_carry
integer :: carry
integer :: res
integer, dimension(:), pointer :: temp
temp => integers
carry = initial_carry
temp = integers
100 if (size(temp) == 0) then
res = carry
return
end if
carry = carry + temp(1)
temp => temp(2:)
goto 100
end function pseudo_recursive_sum_with_goto
Note: the goto
label 100
is not significant. Also note, I’m preventing local copies of data by using array pointers for the slicing.
I don’t think this is any better than the do
loop we had. I think the best solution is to use the intrinsic function reduce
inside our own reduce_integers
function, which I consider to be a golden mean. It’s better than using the intrinsic function as it is, and also better than using the plain do
loop we started with. The benefit are:
- We have a nice, generic programming interface for reductions.
- We can easily unit-test the reduction (if it happens in a
do
loop in our program somewhere, we can’t separately test it). - It invokes the functional paradigm in our code reader’s mind, which may get them in the right mindset when transforming array/list data.
We’re done with the array transformation functions now. The next topic to discuss is how to deal with optional values and errors.