Fortran: Module Design
Matthias Noback
Fortran projects are famous for their large modules. Actually, we may also find very large files in legacy projects written in other languages, like Java, PHP, etc. These projects sometimes have files with thousands of lines of code. Fortran projects suffer more from large files I’d say, because:
- IDE support is not great. It’s often still hard to navigate to definitions or usages. There isn’t any support for automated refactorings, like extract function, move function, add parameter, etc. You often can’t get a clear overview of the structure/outline of a file.
- There are no strict indentation rules, making it easy to create a complete mess. Not everyone uses code formatting tools. Code formatting tools that exist and are used, aren’t always very good or easily configurable.
- Fortran code can be hard to read, partially because of a lack of curly braces. It’s hard to visually recognize blocks of code. Another reason is that commonly there’s no distinction in letter casing between type names, function names, variable names, etc. Other languages may use PascalCase for types, CAPS for constants, camelCase for methods, snake_case for functions, etc. In Fortran code, it’s all snake_case.
If there’s anything we can do to make Fortran code easier to deal with, it’s to split the code into smaller parts. We need to have many more modules, each with distinct topics and concerns, so we can get a quick overview of the application’s domain and capabilities. Additionally, this will help us find the right place when we want to make changes to the code.
The logging library we’ve built in the last few posts isn’t that big. You could say the topic is “logging”, so we only need a `logging` module. However, we have already defined several derived types that deal with different aspects of logging. They can be seen as separate things (e.g. timestamp decoration, file versus stdout logging, etc.) and would be better off in their own modules. Also, since we’ve defined several concrete derived types, we have type-bound procedures for each of these types in the `contains` section of the `logging` module. A type-bound procedure is always several lines away from the derived type definition itself, so it isn’t very nice if we have to scroll a lot to find the procedure implementations. It would be nice if each derived type has its own module, so the `contains` section won’t grow too big.
Each derived type gets its own module
The first thing we can do to keep things manageable is to put each derived type in its own module. This way we get the following modules:
- `logging_base`: contains `logger_t`, the interface for `log` procedures, and the `LOG_*` level constants.
- `logging_file`: contains `file_logger_t`, its `log` procedure, its custom constructor `create_file_logger`, and the `file_logger_t` interface which points to that constructor.
- `logging_stdout`: contains `stdout_logger_t` and its `log` procedure.
- `logging_timestamp`: contains `timestamp_logger_t`, its `log` procedure, and the `current_time` helper function.
- `logging_aggregation`: contains `logger_reference_t`, `multi_logger_t` and its `log` procedure.
- `logging_threshold`: contains `threshold_logger_t`, its `log` procedure, and the `set_minimum_log_level` subroutine.
FPM naming convention
We follow the FPM naming convention so FPM can find the modules that are imported with a `use` statement and will (re)compile them automatically when needed.
A nice effect of this is that related helper procedures like `set_minimum_log_level` and `current_time` have a logical place now. They are not hidden somewhere in the large `logging` module.
It’s important to make the right things `private`. For instance, `current_time` remains `private` to the `logging_timestamp` module. But `set_minimum_log_level` needs to be a `public` procedure, because the main application uses it to configure the desired log level.
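As a minimal sketch of this public/private split, the module below exposes one procedure and keeps its helper hidden. The names `timestamp_sketch` and `decorate` are hypothetical and only serve the illustration; the real `logging_timestamp` module from this series may look different.

```fortran
module timestamp_sketch
    implicit none(type, external)

    ! Everything is private unless listed in a `public` statement
    private
    public :: decorate

contains

    ! Public: the only entry point other program units can import
    function decorate(message) result(decorated)
        character(len=*), intent(in) :: message
        character(len=:), allocatable :: decorated

        decorated = '[' // current_time() // '] ' // message
    end function decorate

    ! Private helper: only callable from inside this module
    function current_time() result(time)
        character(len=8) :: time
        character(len=10) :: raw

        ! The intrinsic returns the time as hhmmss.sss
        call date_and_time(time=raw)
        time = raw(1:2) // ':' // raw(3:4) // ':' // raw(5:6)
    end function current_time

end module timestamp_sketch

program main
    use timestamp_sketch, only: decorate
    implicit none(type, external)

    print '(a)', decorate('A message')
end program main
```

Adding `current_time` to the `only:` clause in `main` should be rejected by the compiler, because the module never makes it `public`.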
A façade for users
Splitting things into maintainable parts is nice, but we don’t want to force long import lists on users. They should still be able to import one module, that hides all its parts and complexities behind a nice programming interface. We may call such a module a façade, using the terminology from “Design Patterns: Elements of Reusable Object-Oriented Software”. This façade module will offer the `get_logger()` factory (and the old `log` subroutine):
module logging_facade
    use logging_file, only: file_logger_t
    use logging_base, only: logger_t, LOG_WARN, LOG_DEBUG
    use logging_aggregation, only: multi_logger_t, logger_reference_t
    use logging_stdout, only: stdout_logger_t
    use logging_timestamp, only: timestamp_logger_t
    use logging_threshold, only: threshold_logger_t
    implicit none(type, external)
    private
    public :: get_logger
    public :: log
contains
    function get_logger() result(logger)
        ! ...
    end function get_logger
    subroutine log(message, level)
        ! ...
    end subroutine log
end module logging_facade
A user can import `logging_facade` and use `get_logger()` like they did before. However, they also need the `logger_t` derived type and the log level constants, which are housed in `logging_base`. Should we force the user to import them from that module? It would be nicer if we can keep the `logging_base` module behind the `logging_facade` as well.
Importing from other modules
Fortran offers a useful technique for cases like this: a module can itself export something it has imported from another module, as if it were its own. In fact, if a module’s default accessibility is `public` (which is also the default if you don’t specify it), that module exposes everything it imports (which is often not preferable):
module bar
    ! implicitly, every element of this module is `public`
    integer :: baz
end module bar
module foo
    use bar, only: baz
    ! implicitly, every element of this module is `public`
end module foo
program main
    use foo, only: baz
    ! baz is actually owned by `bar`, but imported from `foo`
    print *, baz
end program main
Of course, this leads to very confusing situations. It becomes hard to recognize what the actual module dependencies are. It would be better if the `main` program imported `baz` from `bar`. Legacy Fortran code may be in an even worse shape, when it doesn’t add `only:` after every `use` statement. In that case, it’s not clear at all what’s being imported, and if that’s still needed, because everything that is exported by the module will be imported.
Knowing the issues that we could run into with module imports, we should follow these rules to prevent them:
- For every `use` statement, specify the external elements you actually need with an `only:` clause.
- If you no longer need an imported element, remove it from the `only:` clause. Compilers may help you recognize such unused elements.
- Always use `private` as the default accessibility of module elements. This disables importing elements that aren’t designed to be used directly. It also prevents “leaking” module elements when there are still `use` statements without an `only:` clause.
- Import directly from the module that defines the element.
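Applied to the earlier `bar`/`foo` example, these rules could look like the sketch below. The initial value of `baz` and the `double_baz` function are assumptions added only so the program has something to print.

```fortran
module bar
    implicit none(type, external)

    private
    public :: baz

    ! Assumed initial value, for illustration only
    integer :: baz = 42
end module bar

module foo
    use bar, only: baz
    implicit none(type, external)

    ! `private` by default, so the imported `baz` is not re-exported
    private
    public :: double_baz

contains

    ! Hypothetical procedure that uses the imported element
    function double_baz() result(doubled)
        integer :: doubled

        doubled = 2 * baz
    end function double_baz

end module foo

program main
    ! Each element is imported from the module that defines it
    use bar, only: baz
    use foo, only: double_baz
    implicit none(type, external)

    print *, baz, double_baz()
end program main
```

With this setup, `use foo, only: baz` should no longer compile, which makes the real dependency of `main` on `bar` visible in its `use` statements.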
Following these rules, we would require users to import things from several modules before they can log something:
program main
    use logging_threshold, only: set_minimum_log_level
    use logging_facade, only: get_logger, log
    use logging_base, only: logger_t, LOG_DEBUG, LOG_WARN
    implicit none(type, external)
    class(logger_t), allocatable :: logger
    call set_minimum_log_level(LOG_DEBUG)
    logger = get_logger()
    call logger%log('A debug message', LOG_DEBUG)
    call log('Old-school logging', LOG_WARN)
end program main
In the end, I think we should allow an exception to the rule that we always import an element from the module that defines it. When we are creating a façade, where we are explicitly looking to hide internal details and complexity behind a single module, it’s okay to expose some of the underlying modules’ elements as if they are owned by the façade. We are aware of the potential issues, but trade them against the benefit of only needing to import things from `logging_facade`.
To make it work, we import `logger_t` and the log level constants in `logging_facade`, then make them `public` so they can be imported from `logging_facade`:
module logging_facade
    use logging_base, only: logger_t, &
        LOG_DEBUG, &
        LOG_INFO, &
        LOG_WARN, &
        LOG_ERROR, &
        LOG_FATAL
    ! ...
    private
    public :: get_logger
    public :: log
    ! Imported from `logging_base`:
    public :: logger_t
    public :: LOG_DEBUG
    public :: LOG_INFO
    public :: LOG_WARN
    public :: LOG_ERROR
    public :: LOG_FATAL
    ! ...
end module logging_facade
It’s very clear where these elements come from. If they are one day removed or renamed, the compiler will give us an error.
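The re-export technique can be tried in isolation with a small, self-contained sketch. The module names `levels_base_sketch` and `levels_facade_sketch`, the `level_name` function, and the integer values of the constants are all assumptions made for this example, not part of the logging library.

```fortran
module levels_base_sketch
    implicit none(type, external)

    private
    public :: LOG_DEBUG, LOG_WARN

    ! Assumed values, for illustration only
    integer, parameter :: LOG_DEBUG = 1
    integer, parameter :: LOG_WARN = 3
end module levels_base_sketch

module levels_facade_sketch
    use levels_base_sketch, only: LOG_DEBUG, LOG_WARN
    implicit none(type, external)

    private
    public :: level_name
    ! Imported from `levels_base_sketch`, re-exported as if owned here:
    public :: LOG_DEBUG, LOG_WARN

contains

    ! Hypothetical helper that maps a level constant to a label
    function level_name(level) result(name)
        integer, intent(in) :: level
        character(len=:), allocatable :: name

        select case (level)
        case (LOG_DEBUG)
            name = 'DEBUG'
        case (LOG_WARN)
            name = 'WARN'
        case default
            name = 'UNKNOWN'
        end select
    end function level_name

end module levels_facade_sketch

program main
    ! Only the façade is imported; the base module stays hidden
    use levels_facade_sketch, only: level_name, LOG_WARN
    implicit none(type, external)

    print '(a)', level_name(LOG_WARN)
end program main
```

Removing `LOG_WARN` from the base module would make the façade fail to compile at its `use` statement, so the re-export cannot silently go stale.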
The main program can import everything from `logging_facade` now, except `set_minimum_log_level`:
program main
    use logging_threshold, only: set_minimum_log_level
    use logging_facade, only: get_logger, log, logger_t, &
        LOG_DEBUG, LOG_WARN
    ! ...
end program main
It’s an option to let `logging_facade` expose this subroutine as its own too. However, it only needs to be called during the application’s initialization. Normal users won’t call this subroutine when they want to log something, so it’s better to keep the façade clean, and let it export only things that every user needs.
Module dependencies
Using FPM and its brand new `fpm-modules` plugin we can generate a nice module dependency diagram now. It looks like this:
In the end all the arrows point to `logging_base`. This is great. It’s an abstract module that will rarely change, so it’s safe to depend on it. Depending on relatively abstract modules also reduces the chance that you have to recompile your whole project.
Compilation cascades
When a module changes, and it’s used by many other modules, this causes a huge compilation cascade. If we make a change in any module that’s behind `logging_facade`, everything that depends on it will have to be recompiled. That’s because `logging_facade` is used by every module that logs something. So any change behind our nice and clean façade triggers a recompilation of the entire project… How can we fix this?
Submodules
The answer is by using Fortran’s concept of a submodule. This allows us to separate the abstract outline of a module from its concrete implementations. We add interface definitions for `get_logger()` and `log()` to the `logging_facade` module:
module logging_facade
    use logging_base, only: logger_t, &
        LOG_DEBUG, &
        LOG_INFO, &
        LOG_WARN, &
        LOG_ERROR, &
        LOG_FATAL
    implicit none(type, external)
    private
    public :: get_logger
    public :: log
    ! Imported from `logging_base`:
    public :: logger_t
    public :: LOG_DEBUG
    public :: LOG_INFO
    public :: LOG_WARN
    public :: LOG_ERROR
    public :: LOG_FATAL
    interface
        ! Note the "module" keyword:
        module function get_logger() result(logger)
            import logger_t
            implicit none(type, external)
            class(logger_t), allocatable :: logger
        end function get_logger
        module subroutine log(message, level)
            implicit none(type, external)
            character(len=*), intent(in) :: message
            integer, intent(in) :: level
        end subroutine log
    end interface
end module logging_facade
Next, we create a submodule for `logging_facade`, called `logging_facade_implementation`. In this submodule we put the actual implementations of `get_logger()` and `log()`:
submodule(logging_facade) logging_facade_implementation
    use logging_aggregation, only: multi_logger_t, logger_reference_t
    use logging_base, only: logger_t, &
        LOG_DEBUG, &
        LOG_WARN
    use logging_file, only: file_logger_t
    use logging_stdout, only: stdout_logger_t
    use logging_threshold, only: threshold_logger_t
    use logging_timestamp, only: timestamp_logger_t
    implicit none(type, external)
contains
    ! Note the "module" keyword
    module function get_logger() result(logger)
        ! ...
    end function get_logger
    module subroutine log(message, level)
        ! ...
    end subroutine log
end submodule logging_facade_implementation
The `logging_facade` itself no longer imports elements from all those specialized `logging_*` modules. Instead, these import statements have been moved to `logging_facade_implementation`, keeping the module itself really clean.
The module dependencies visualization tool doesn’t show submodule dependencies yet, but here’s the diagram as a future version of the tool may produce it:
A submodule is like Dependency Inversion for modules: the arrow goes up now, from `logging_facade_implementation` to `logging_facade`. The submodule has all the dependencies on the underlying modules which contain all the implementation details. Yet, the `logging_facade` module has almost no dependencies (only to `logging_base`). This makes it quite abstract and safe to depend on. It will also prevent compilation cascades. We can change anything in the submodule or its dependencies; it won’t trigger a recompilation of all the code that uses `logging_facade`.
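To experiment with the module/submodule split outside the logging library, a complete single-file sketch may help. The `greeter` module, its `greet` function and the submodule name are hypothetical; in a real project the module and the submodule would normally live in separate files so the build tool can recompile them independently.

```fortran
module greeter
    implicit none(type, external)

    private
    public :: greet

    ! Only the outline of the procedure lives in the module
    interface
        module function greet(name) result(greeting)
            implicit none(type, external)
            character(len=*), intent(in) :: name
            character(len=:), allocatable :: greeting
        end function greet
    end interface
end module greeter

submodule(greeter) greeter_implementation
    implicit none(type, external)

contains

    ! The body lives in the submodule; note the "module" keyword
    module function greet(name) result(greeting)
        character(len=*), intent(in) :: name
        character(len=:), allocatable :: greeting

        greeting = 'Hello, ' // name // '!'
    end function greet

end submodule greeter_implementation

program main
    ! Users depend on the module only, never on the submodule
    use greeter, only: greet
    implicit none(type, external)

    print '(a)', greet('Fortran')
end program main
```

The program should print `Hello, Fortran!`. Changing only the body of `greet` leaves the interface in `greeter` untouched, which is what allows a build tool to skip recompiling the program units that use the module.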
The logging library is in a much better shape, but there are some remaining design issues. For instance: at the moment every time you call `get_logger`, you will receive a copy of the logger. That may not be handy or efficient. Also, users have to pass integer constants (parameters) as log levels. That really calls for an enumeration type. We’ll look at that first.