Last update: Tue Nov 19 18:03:58 2002
Comments, and reports of errata or bugs, are welcome via e-mail to the author, Nelson H. F. Beebe <[email protected]>. In your report, please supply the full document URL, and the title and Last update time stamp recorded near the top of the document.
This document discusses issues in calling Fortran code from C and C++ programs. However, before you start, please consider these possibly simpler alternatives:
Converting the Fortran code to C with the f2c translator, or the commercial Cobalt Blue translator, and then using the NAG C Library so that your code avoids language mixing entirely.
If your Fortran code will never be modified again, and contains no I/O statements (for example, the EISPACK, LINPACK, and LAPACK libraries are I/O-free), then you may be satisfied with the translation to C. However, if you anticipate having to modify that code, you should look at the translated code carefully to decide whether or not you are introducing a future maintenance nightmare: machine-translated code is never as clean as good hand-written code could be, and if there were I/O statements, their translation requires additional support code that must be carried around.
If you are unfamiliar with interlanguage calling issues, then you should definitely start by reading the NAG tutorial on the subject.
What follows is a summary of the major points that need to be considered by C and C++ programmers who wish to call routines written in Fortran.
Reader comments are invited, and may be communicated via e-mail to the author.
Since there are several ISO Standards for Fortran, C, and C++ that have been published between 1966 and 1998, you might wonder why these Standards have not specified precise rules for interlanguage calling. Several such requests have certainly come before their respective ISO Committees, but nothing has been done, in fear of putting an unfair Standards-compliance burden on some vendors, of hindering future language development, and of opening up a Pandora's box of interlanguage compliance issues: if those three languages should interoperate, what about the several other programming languages covered by ISO Standards? It appears unlikely that these issues will ever be resolved, and so interlanguage calling is likely to remain an issue requiring benevolent vendor support, and tutorials like this one.
The choice of language for your main program influences how
you must compile and link your program: you will probably
have to provide additional library names on the compiler
command line so that it can find the runtime routines needed
by the other language. These names are strongly
compiler-dependent, and their locations are frequently
nonstandard too. The easiest way to find out what they are
is to compile and link a small test program in the
subordinate language, supplying a compiler option that asks
for verbose output of the program stages. This flag is frequently called -v, or sometimes, -#. On UNIX, you then look for -L and -l options in that verbose output.
Fortran arrays are stored in column order, with the first subscript varying most rapidly. C and C++ arrays are stored in row order, with the last subscript varying most rapidly. The C/C++ program may therefore need to transpose multidimensional arrays before and after the call to a Fortran routine.
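A minimal sketch of the transposition just described, in C. The function name to_column_major is hypothetical, and the matrix is assumed to be a dense array of double with no padding between rows.

```c
#include <stddef.h>

/* Copy a C row-major matrix with nrow rows and ncol columns into a flat
   buffer laid out in Fortran column-major order, so that a Fortran
   routine can treat the buffer as A(nrow,ncol). */
void to_column_major(const double *c_matrix, double *f_matrix,
                     size_t nrow, size_t ncol)
{
    size_t i, j;

    for (i = 0; i < nrow; ++i)
        for (j = 0; j < ncol; ++j)
            f_matrix[j * nrow + i] = c_matrix[i * ncol + j];
}
```

Swapping the roles of the two index expressions gives the reverse copy needed after the Fortran routine returns.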
Fortran arrays are normally indexed from 1, although other values are possible in Fortran 77 or later if the array has dimensions of the form of ranges, start:end. C/C++ arrays are always indexed from 0. Thus, for a vector v(1:n), a Fortran loop will begin DO 10 j = 1,n, while the corresponding C/C++ loop will begin for (k = 0; k < n; ++k). Fortran v(j) corresponds to C/C++ v[k], with k = j - 1.
Fortran is distinctly superior to C in its handling of multidimensional arrays, because Fortran routines can be passed dynamic array dimensions, while C functions cannot. The C programmer is either forced to live with static compile-time dimensions, or to put ugly, and highly-error-prone, multidimensional array subscripting code inline, or inside macros, or inside element access functions. C++ programmers would probably define an array class with suitable element access functions.
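One possible shape for the element-access macros mentioned above is sketched here. FELEM and trace_colmajor are hypothetical names; the macro assumes column-major storage, 1-based subscripts, and a leading dimension lda supplied at run time.

```c
#include <stddef.h>

/* Element A(i,j) of a Fortran-style array with leading dimension lda,
   using 1-based subscripts and column-major storage. */
#define FELEM(a, lda, i, j) \
    ((a)[((size_t)(j) - 1) * (size_t)(lda) + ((size_t)(i) - 1)])

/* Sum the diagonal of an n-by-n matrix stored Fortran-style. */
double trace_colmajor(const double *a, int lda, int n)
{
    double sum = 0.0;
    int k;

    for (k = 1; k <= n; ++k)
        sum += FELEM(a, lda, k, k);
    return sum;
}
```

The macro keeps the subscript arithmetic in one place, which is the main defense against the error-prone inline indexing that the paragraph warns about.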
If C/C++ functions call Fortran routines, dynamic dimensioning is not an issue, but it may be a significant problem for C/C++ functions called from Fortran.
Starting with Stuart Feldman's original AT&T UNIX System
V f77
compiler (the first Fortran 77 compiler
ever written on any operating system), Fortran
compilers from most UNIX vendors transform Fortran external
names by appending an underscore. The motivation for this
was admirable: because Fortran and C/C++ have
different argument passing conventions, different
names for the same thing are required. Thus, both
C/C++ and Fortran programmers could use library calls like fgetc(handle): the runtime library for C/C++ would have a routine fgetc, and the Fortran runtime library (which is written in C) would have one named fgetc_. Mixed-language programs could be
created without confusion between the two versions of that
library routine.
Regrettably, vendor-provided Fortran compilers on
Hewlett-Packard HP-UX and IBM AIX do not follow this
practice: they use the same external names for
both Fortran and C/C++, with no trailing underscore. Not
only does this complicate mixed-language programming, it is
also a nuisance for software portability of mixed-language
programs. To make matters even more complex, the GNU g77 compiler on these systems does supply the trailing underscore, unless it is suppressed with the -fno-underscoring compiler option.
One historical UNIX vendor, Ardent, later renamed Stardent
after its merger with Stellar in the early 1990s, used yet
another variant: Fortran fgetc() was mapped to the uppercase external name FGETC.
These variants are best handled by concealing the name mappings in C/C++ header files. Here is an example from a real program: the named routines are all written in Fortran, and this extract is from the C/C++ header file that defines the interface to them.
    #if defined(ardent)
    /* Stardent (now defunct) uppercased Fortran names */
    #define gjqf    GJQF
    #define gjqfd   GJQFD
    #define glqf    GLQF
    #define glqfd   GLQFD
    #define deps    DEPS
    #define dgamma  DGAMMA
    #define dpsi    DPSI
    #define dpsum   DPSUM
    #elif defined(_AIX) || defined(__hpux)
    /* IBM RS/6000 AIX and HP HP-UX use identical names in C and Fortran */
    #else
    /* Everyone else adds a trailing underscore to Fortran names */
    #define gjqf    gjqf_
    #define gjqfd   gjqfd_
    #define glqf    glqf_
    #define glqfd   glqfd_
    #define deps    deps_
    #define dgamma  dgamma_
    #define dpsi    dpsi_
    #define dpsum   dpsum_
    #endif
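An alternative to one #define per routine is a single token-pasting macro, sketched below. FORTRAN_NAME, addone, and next_value are hypothetical names introduced only for illustration, and addone is written in C here as a stand-in so that the sketch compiles without a Fortran compiler.

```c
/* FORTRAN_NAME is a hypothetical helper macro, not part of any Standard. */
#if defined(ardent)
#define FORTRAN_NAME(lower, UPPER) UPPER
#elif defined(_AIX) || defined(__hpux)
#define FORTRAN_NAME(lower, UPPER) lower
#else
#define FORTRAN_NAME(lower, UPPER) lower##_
#endif

/* Stand-in written in C; in real use this would be a Fortran routine
   declared as: subroutine addone(n). */
void FORTRAN_NAME(addone, ADDONE)(int *n)
{
    *n += 1;
}

/* C callers go through the macro and never spell the decorated name. */
int next_value(int n)
{
    FORTRAN_NAME(addone, ADDONE)(&n);
    return n;
}
```

The per-name #define style shown earlier keeps call sites cleaner; the macro style avoids maintaining three parallel lists of names.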
Fortran (at least before the Fortran 90 Standard) has only two main data structures: scalars, and arrays. C and C++ have those too.
Fortran offers two other statements that provide control
over storage order: common blocks, for global data, and equivalence, for local data.
While common blocks are essential for
information hiding in library routines, modern Fortran code
seldom uses them for communication at the user level, and
C/C++ code should be able to remain ignorant of how they are
implemented. Consult your Fortran compiler documentation
for details. On many systems, you can make the
correspondence like this:
Fortran:
          double precision a, b, c
          common /cshare/ a, b, c
C/C++ (drop the trailing underscore on cshare_ on HP and IBM systems):
    struct { double a; double b; double c; } cshare_;
equivalence
statements were widely used in
Fortran code to reduce memory requirements before 32-bit
(and larger) address spaces became available, because memory
storage was expensive (about US$1/byte in 1965, and a
million times cheaper in 2000), and therefore a scarce
resource. Modern Fortran code should rarely use the
equivalence statement, and then only to get at the
details of storage bits, much like the C/C++ union
structure, which is how you would map that Fortran
statement for interlanguage use.
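As a rough illustration of that mapping, the sketch below overlays a float and an unsigned integer in a C union, much as a Fortran equivalence of a real and an integer would. The names float_bits and float_to_bits are hypothetical, and the code assumes that float and unsigned int are both 32 bits wide.

```c
/* Overlay two objects on the same storage, in the manner of a Fortran
   equivalence statement.  Assumes sizeof(float) == sizeof(unsigned int). */
union float_bits {
    float value;
    unsigned int bits;
};

/* Return the raw storage bits of a float. */
unsigned int float_to_bits(float x)
{
    union float_bits u;

    u.value = x;
    return u.bits;
}
```

The bit pattern returned depends on the floating-point format of the machine, so any code built on this is as unportable as the Fortran original.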
We discuss below specific data types, and show their typical
correspondence between Fortran and C/C++. However, you
should never hard code assumptions about this
data-type correspondence. Instead, you should always
introduce new data types using the C/C++ typedef statement, or perhaps the preprocessor #define directive, to provide synonyms, and then use suitable type casts using your new type names when calling Fortran routines. Here is a short example of both approaches:
    typedef double fortran_double_precision;
    typedef int fortran_integer;

    #if !defined(fortran_double_precision)
    #define fortran_double_precision double
    #endif
    #if !defined(fortran_integer)
    #define fortran_integer int
    #endif
Although C++ strongly deprecates use of the C preprocessor, #define offers one advantage over typedef: you can test whether a name is defined,
allowing a user to override a definition, either by a prior
definition in the code, or by a compile-time definition
passed on the compiler command line.
In C and C++, a typedef
only introduces a
synonym, not a new type, so neither scheme is superior from
the point of view of catching type errors at compile time.
Fortran has real floating-point data types real and double precision, which correspond exactly to C/C++ float and double. Prior
to the 1989 C Standard, float
scalar arguments
were always promoted to double
by the compiler,
but for Standard-conforming code with proper function
prototypes, this is no longer the case, and float
now receives no special handling compared to other
scalar data types.
Fortran complex
is equivalent to an array, or
C/C++ structure, of two floating-point values, the real
part, followed by the imaginary part. In C/C++, you
probably want to access it via a structure type declared
like this:
    typedef struct { float re; float im; } fortran_complex;
Objects of this type can be assigned, passed as function arguments, and returned as function values. Had you used a two-element array instead, you would have lost assignment and function return of these objects.
Prior to Fortran 90, the language stupidly lacked a double
precision complex data type, but almost all compiler vendors
provided it. Most followed IBM in allowing it to be
declared as complex*16, and most also permitted it to be called double complex. You can best represent this in C/C++ as
    typedef struct { double re; double im; } fortran_double_complex;
The NAG tutorial notes that several C/C++ compilers are
unable to handle Fortran functions returning double
complex
values. If you have such functions, you
should provide a Fortran subroutine
wrapper for
them that provides the function result in an argument.
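From the C side, calling such a wrapper might look like the sketch below. The routine zmulsb_ is a hypothetical name, written here in C as a stand-in for the Fortran subroutine wrapper so that the sketch is self-contained; the trailing underscore assumes the common UNIX naming convention.

```c
typedef struct { double re; double im; } fortran_double_complex;

/* Stand-in for a hypothetical Fortran wrapper
       subroutine zmulsb(result, a, b)
   that calls a double complex function and stores the function value
   in its first argument.  All arguments arrive by address. */
void zmulsb_(fortran_double_complex *result,
             const fortran_double_complex *a,
             const fortran_double_complex *b)
{
    result->re = a->re * b->re - a->im * b->im;
    result->im = a->re * b->im + a->im * b->re;
}

/* C convenience function that hides the extra result argument again. */
fortran_double_complex zmul(fortran_double_complex a,
                            fortran_double_complex b)
{
    fortran_double_complex result;

    zmulsb_(&result, &a, &b);
    return result;
}
```

Only zmul needs to know about the wrapper; the rest of the C code can keep using ordinary function-call syntax.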
Fortran has only one integer data type, integer. On all current UNIX architectures, this corresponds to the C/C++ data type int. However, on some personal computer operating systems based on the Intel x86 architecture, it may correspond to C/C++ data type long. Further confusing the matter is that some Fortran compilers on those systems may map Fortran integer to a C/C++ int of size 16 bits.
The only way to tell for sure on such systems is to read
your Fortran and C/C++ compiler documentation carefully, or
resort to compilation experiments with small test programs.
The Fortran logical
data type is a definite
barrier to interlanguage calling.
The reason is that the Fortran language requires that data of types integer, logical, and real occupy one storage `location' (in 1956,
when Fortran was first defined, all computers were
word-addressed; byte addressing did not appear until IBM's
System/360 in 1964). Data of types complex
and
double precision
each occupy two
successive storage locations, exactly twice as much as
the other three types.
Since a logical value holds only two distinct values, .true. and .false., a single bit of storage is sufficient, yet the Fortran Standards mandate that such a value occupies an entire word of storage. So, which bit in that word should be used? Some compilers use the sign bit, others use the least-significant bit (corresponding to odd/even), and still others use zero/nonzero. Because C and C++ also use zero/nonzero, you can expect Fortran compilers on UNIX systems to uniformly follow that practice. Even then, there are differences: GNU, HP, IBM, NAG, SGI, and Sun Fortran 77, 90, 95, and HPF compilers use 1 for .true., and 0 for .false., while Compaq/DEC and PGI compilers use -1 for .true. and 0 for .false. Thus,
on Compaq/DEC OSF/1 (now called Tru64) and GNU/Linux
systems, both forms are found, depending on which compiler
you use.
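One way to isolate these differences is sketched below. The names fortran_logical, FORTRAN_TRUE, to_fortran_logical, and from_fortran_logical are hypothetical, and the code assumes a compiler that uses the zero/nonzero convention and stores a logical in a word the size of a C int.

```c
/* Assumed to match the Fortran logical type in size. */
typedef int fortran_logical;

/* Value stored for .true.: 1 for most UNIX compilers, -1 for the
   Compaq/DEC and PGI compilers.  Can be overridden at compile time. */
#if !defined(FORTRAN_TRUE)
#define FORTRAN_TRUE 1
#endif
#define FORTRAN_FALSE 0

/* Convert a C truth value to the form the Fortran compiler expects. */
fortran_logical to_fortran_logical(int c_flag)
{
    return c_flag ? FORTRAN_TRUE : FORTRAN_FALSE;
}

/* Convert a Fortran logical to a C truth value (0 or 1); both 1 and
   -1 are accepted as true. */
int from_fortran_logical(fortran_logical f_flag)
{
    return f_flag != FORTRAN_FALSE;
}
```

Compilers that test the sign bit or the least-significant bit would need different conversion functions, which is a further reason to keep them in one place.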
Fortran 77 (and later) character
data pose the
biggest barrier to interlanguage calling, because they are
handled so differently by various compilers. The original
AT&T UNIX f77
compiler had to deal with
legacy Fortran code containing Hollerith data. In order to
make call foo(5Hhello) work exactly like call foo('hello'), it passed Fortran character data by the address of the first byte.
Unfortunately, the Fortran 77 Standard made character
data unlike all other Fortran data types, in that it
magically carries around its length.
Feldman's compiler handled this by passing additional
arguments at the end of the argument list, one for each
character string, passing them by value. Thus, Fortran call bar('one', 'two', 'three') would be handled in C and C++ by void bar_(char *a, char *b, char* c, int lena, int lenb, int lenc). This was a
perfectly sensible solution, in that it handled both
Hollerith and character
data, and communicated
the needed string lengths between the Fortran and C/C++
routines.
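A sketch of a C routine written to be called from Fortran under this AT&T-style convention follows. The routine name nchars_ and its helper are hypothetical; the point is that the strings carry no terminating NUL, so the hidden length arguments are the only indication of where each one ends.

```c
/* Count the nonblank characters in a Fortran string of known length. */
static int nonblank(const char *s, int len)
{
    int n = 0;
    int k;

    for (k = 0; k < len; ++k)
        if (s[k] != ' ')
            ++n;
    return n;
}

/* Hypothetical routine callable from Fortran as the integer function
       n = nchars(a, b, c)
   with three character arguments.  The string addresses come first;
   the lengths follow, by value, at the end of the argument list. */
int nchars_(char *a, char *b, char *c, int lena, int lenb, int lenc)
{
    return nonblank(a, lena) + nonblank(b, lenb) + nonblank(c, lenc);
}
```

Calling strlen() on any of the three pointers would be a bug, because the bytes following each string are not under the routine's control.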
Unfortunately, IBM's AIX/370 mainframe compilers did not
follow this sensible practice. Instead, they interspersed
the length arguments with the normal arguments, following
each character
argument, and passing the
address, rather than the value, so the C/C++ routine in the previous paragraph must be rewritten as void bar_(char *a, int *lena, char *b, int *lenb, char* c, int *lenc).
Hewlett-Packard also did this up to HP-UX
version 8, but with version 9 and later (10.20 is
current), changed to the AT&T style for
character
arguments. IBM's RS/6000 AIX C and
C++ compilers also use the AT&T style for
character
arguments.
Still other Fortran compilers have used a different scheme.
For each character
argument, they pass a
pointer to a structure that contains a pointer to the
string, and a maximum length. The details of this scheme
vary between compilers, so once again, you must consult your
Fortran compiler documentation for details.
Finally, it should be remembered that Fortran
character
data are of fixed length: they
are blank padded on the right when assigned a shorter value,
and silently truncated on the right when assigned a longer
value. C/C++ char strings are of varying length, up to some compile-time or run-time maximum; a trailing NUL character ('\0') terminates the string. Thus, C/C++ char* strings always contain at least one more character than their length (as returned by strlen()). C/C++ strings include the empty string, "", but Fortran 77 does not allow one.
This is somewhat akin to defining an integer arithmetic
system without a zero! Fortran programmers are therefore
forced to simulate empty strings by blank ones.
The best way to handle passing character strings to Fortran from C/C++ is to define a new Fortran string data type in C/C++, and create a set of primitives to handle the blank padding.
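Two such primitives might look like the sketch below. The function names c_to_fortran_string and fortran_to_c_string are hypothetical, and how the length is then passed to the Fortran routine still depends on the compiler conventions discussed above.

```c
#include <string.h>

/* Copy a NUL-terminated C string into a fixed-length Fortran character
   variable of flen bytes: truncate on the right if it is too long,
   blank pad on the right otherwise.  No NUL is stored. */
void c_to_fortran_string(char *fstr, size_t flen, const char *cstr)
{
    size_t n = strlen(cstr);

    if (n > flen)
        n = flen;
    memcpy(fstr, cstr, n);
    memset(fstr + n, ' ', flen - n);
}

/* Copy a Fortran character variable of flen bytes into a C buffer of
   at least flen + 1 bytes, dropping trailing blanks and adding the
   terminating NUL. */
void fortran_to_c_string(char *cstr, const char *fstr, size_t flen)
{
    while (flen > 0 && fstr[flen - 1] == ' ')
        --flen;
    memcpy(cstr, fstr, flen);
    cstr[flen] = '\0';
}
```

Note that the round trip is lossy by design: trailing blanks that were significant in the C string cannot be distinguished from padding on the way back.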
Nonstandard data types with a byte-length modifier frequently appear in carelessly-written Fortran code: avoid them like the plague. Write the Standard double precision instead of real*8. If you unavoidably have integer*1 or byte, these may map to C/C++ signed char and unsigned char. Fortran integer*2 may map to C/C++ short int. Fortran integer*8 may map to C/C++ long long int.
In the other direction, there are no Fortran equivalents of the unsigned types of C and C++, or of integer bitfields inside struct and union.
Fortran views files as data streams containing Fortran
records, where a record is either a text line for
formatted files, or whatever is written by a single
write
statement for binary unformatted files. The
record is an identifiable object, so that a
read
statement with an empty I/O list will skip one
record, and a backspace
statement can
successfully move backwards over it. The analogy with
magnetic tapes, card readers, and line printers is very
strong.
C and C++ view files as data streams containing bytes, the smallest amount of storage capable of holding one item of type char. All I/O is done at the byte
level, although for text files, higher-level primitives can
give the illusion of block- and line-structuring. This
model is notably more powerful than Fortran's, because it
imposes no structure on files. In Standard Fortran, you
simply cannot write an arbitrary stream of bytes to a file:
there will always be additional material surrounding what
you wrote that is compiler-dependent, beyond your control,
and invisible to you in Fortran.
Fortran, C, and C++ all refer to files through a small object, called a file handle, or file descriptor, or in Fortran terminology, unit number. In Fortran, that value is a small integer that must be set by the user. Consequently, its choice can lead to a loss of portability if integer handle values acceptable on one system are found to be out-of-range on another.
The architect of C therefore chose to have the handle
returned by the open-file system call. Later, it was found
convenient to store more information about the file in a FILE object, invariably defined as a C struct, one of whose elements is the integer file handle. That form was adopted in Standard C, and the older one was not. However, UNIX systems at least, still have low-level system calls that require the integer file handle, so a macro, or function, fileno(), is provided to extract it from the FILE structure.
Since the Fortran library is implemented in C or C++ on UNIX systems, there has to be a correspondence, somewhere, between a Fortran unit number and a C/C++ integer
file handle. Unfortunately, there is no consistent way to
find this across platforms. For example, Compaq/DEC and Sun
provide getfd()
to map a Fortran unit number to
the file handle, but HP and IBM hide the relation entirely.
These considerations strongly suggest that you should restrict I/O activity to a single language.
In general, the runtime library for the language in which a file was first opened has control over its I/O buffers, and maintains additional state information about the file. You should not reference the file in the other language.
However, in UNIX and IBM PC DOS, all processes start with at least three standard files already open and ready to use. In UNIX, these are called stdin, stdout, and stderr, and their respective file handles are guaranteed to be 0, 1, and 2. It is quite possible that you will need to refer to these standard files from both languages, even though it is always best to restrict their use to just one language. To avoid confusion from I/O buffering in each runtime library, it is best to force those buffers to be emptied before beginning I/O in either language. Some, but not all, Fortran vendors provide a flush() routine, and C/C++ always have fflush() available. Because the Fortran side has no universally available flush routine, you will never achieve portable behavior if you mix I/O in this way.
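On the C side, at least, emptying the buffers is straightforward. The sketch below wraps the Standard C call fflush(NULL), which flushes every open output stream; the function name flush_c_output is hypothetical, and nothing here affects buffers held by the Fortran runtime library.

```c
#include <stdio.h>

/* Flush all open C output streams before control passes to a routine
   that does its own I/O in Fortran.  Returns 0 on success, or EOF if
   a write error occurred. */
int flush_c_output(void)
{
    return fflush(NULL);
}
```

A C caller would invoke flush_c_output() immediately before each call to a Fortran routine that writes to the standard files, and the Fortran code would need the vendor's equivalent before returning.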
Fortran views formatted (text) files as a series of records, each of which corresponds to a single line. C and C++ view them as byte streams. While all three languages can produce such files, you may have trouble communicating those files between Fortran and C/C++ programs, for at least these reasons:
Fortran formatted output may contain D exponents or omitted exponent letters, and neither of these is recognized by C/C++ input routines. You can avoid this second problem by using E-style format items with explicit exponent lengths: that is, use e25.15e3 instead of d25.15. On Cray systems, which have a wider exponent range, you should increase the exponent width from 3 to 4.
namelist I/O have no counterparts in C/C++, and files with such contents will be difficult to impossible to deal with easily in C/C++ programs.
Because of the record structure discussed earlier, Fortran unformatted (binary) files must contain additional material prefixing and suffixing the record.
This material is compiler-specific, and you cannot even
expect to read binary files on the same system when two
different Fortran compilers have been used for the reading
and writing programs.
It will be very difficult to deal with such files in C/C++ programs, and you are likely to have difficulty in even finding vendor documentation of what unformatted Fortran files look like.
When runtime exceptions, notably floating-point ones, occur, which language handles them? In general, they are handled by the language in which the main program was written. You can confuse this issue, however, by using system-specific calls to supply your own error handlers.
Historically, most Fortran runtime libraries provided fixups for numerical exceptions, flushing underflows to zero, and setting overflows to the largest floating-point number, or Infinity if supported (as in IEEE 754 arithmetic). The practice in C and C++ implementations has been to call an error handler which prints a message and terminates. You may have to compile with special options, or call nonstandard library routines, to control this behavior.
Fortran passes all arguments by reference (by address), while C and C++ pass scalars and structures by value, and arrays by reference. Thus, scalar arguments to Fortran routines will require an ampersand prefix to pass their address instead of their value.
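The ampersand rule can be seen in the sketch below. The routine scale_ is a hypothetical name, written in C as a stand-in for a Fortran subroutine so that the sketch is self-contained; the trailing underscore assumes the common UNIX naming convention, and Fortran integer is assumed to match C int.

```c
/* Stand-in for a hypothetical Fortran routine
       subroutine scale(n, alpha, x)
       integer n
       double precision alpha, x(n)
   Every argument, scalar or array, arrives as an address. */
void scale_(int *n, double *alpha, double *x)
{
    int k;

    for (k = 0; k < *n; ++k)
        x[k] *= *alpha;
}

/* C caller: the scalars need an ampersand, the array does not. */
void double_vector(double *x, int n)
{
    double alpha = 2.0;

    scale_(&n, &alpha, x);
}
```

Constants cannot be passed directly: a literal such as 2.0 has no address, so it must first be stored in a variable, as alpha is here.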
Fortran character
data require special
treatment, as discussed in an
earlier section.
Historically, compiler writers have used at least three different mechanisms for argument passing:
On the IBM System/360 mainframes introduced in 1964, stack instructions were absent, and compilers generally constructed a vector of addresses, with the high-order bit set in the last address to mark the end of the list. Since addresses were limited to 24 bits, and words were 32 bits, the high-order byte of each word holding an address was `wasted', and software architects therefore made use of it, for type flags, and end-of-list markers.
In 1981, when the S/360 architecture was extended to support larger address spaces, and renamed S/370-XA (for eXtended Architecture), that flag bit was so ingrained in existing software that the IBM architects could only extend addressing to 31 bits. [There was also an important loop instruction that assumed signed arithmetic on addresses, again limiting them to 31 bits.]
It was not until 1988 that the Enterprise Systems Architecture, ESA/370, got around that problem, extending addressing to 44 bits, but even then, that argument list flag bit still interferes, and extended addressing requires a complicated remapping of 2GB (31-bit address) memory segments with hidden base registers.
You can read more about this topic in a separate (lengthy) document, The Impact of Memory and Architecture on Computer Performance.
On stack architectures, on which all current personal computer and UNIX workstation systems run, argument lists are created on the stack by pushing one argument after another. Some compilers push from first to last, and others, last to first.
On all UNIX systems, the argument order is first to last, and interlanguage calling is relatively feasible.
On personal computer operating systems on the Intel x86 architecture, however, each compiler and assembler is free to choose its own argument-passing scheme, with the result that it is usually impossible to mix object code compiled with different compilers, even for the same language!
printf() and scanf() family. The 1989 C Standard addressed this by introducing the <stdarg.h> header file, with va_start(), va_arg(), and va_end() access macros to hide the nasty details of when arguments move from registers to a vector in memory.
Since Fortran has never standardly supported routines
with a variable number of arguments, this aspect should
rarely be of concern in the C/C++-to-Fortran interface.
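For readers who have not used these macros, a minimal sketch of a variable-argument function in Standard C follows. The function name sum_doubles is hypothetical, and the caller is assumed to pass exactly count arguments of type double after the count.

```c
#include <stdarg.h>

/* Return the sum of count values of type double that follow the count
   in the argument list. */
double sum_doubles(int count, ...)
{
    va_list ap;
    double sum = 0.0;
    int k;

    va_start(ap, count);
    for (k = 0; k < count; ++k)
        sum += va_arg(ap, double);
    va_end(ap);
    return sum;
}
```

A function like this cannot be called from Fortran in any portable way, which is one more reason to keep variable argument lists out of the interlanguage interface.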
Like virtually all modern languages defined after the mid
1960s, C and C++ fully support recursion. Fortran 77 does
not. Fortran 90 and 95 permit it only if the functions and
subroutines involved are declared with an initial
recursive
option. This is appallingly-bad language
design, and you are advised to avoid recursive use of
Fortran code, unless you know that you will always have
compilers for Fortran 90 or later available to compile
your code, and you make careful use of the
recursive
option.