- Read complex array from Fortran binary file with Python (06/14/2013)
-
Today, I faced a tiny challenge when using Python to read complex arrays from a binary file that is generated by Fortran. Why do I need to use another language to read a binary file generated by Fortran? Why do I not use ASCII instead? The reasons are:
(1) many of computational codes (such as PWSCF, VASP, Gaussian) that I use are written in Fortran
(2) Python is my favorite language for post-processing tasks
(3) unformatted binary files are chosen for saving disk space.
I did google for help and found few useful posts, including the FortranFile class by Neil Martinsen-Burrell. None of these seemed to co-operate with me. I decided not to spend times to dig solutions out of the W3 but to find a workaround. My solution consists of 4 steps:
(1) breaking the complex array into two equal-length real arrays
(2) writing the two arrays in a direct access binary file
(3) using numpy.fromfile to read the arrays
(4) reconstructing the original complex array.
Let's first see how to use Python read a real array in a binary file written by Fortran. I noted that numpy.fromfile could read correctly (only?) from a direct access file. Thus, the Fortran code should look like:
PROGRAM bin_real
IMPLICIT NONE
INTEGER , PARAMETER :: DP = 8
REAL ( DP ) , ALLOCATABLE :: x ( : )
INTEGER :: n, i, reclength
! Generate real data for the array x
n = 100
ALLOCATE( x ( n ) )
DO i = 1, n
x ( i ) = i**2 / 9._DP ! Who cares?
END DO
INQUIRE ( IOLENGTH = reclength ) x
OPEN ( 100 , FILE = 'DATA' , STATUS = 'UNKNOWN' , &
FORM = 'UNFORMATTED' , ACCESS = 'DIRECT' , RECL = reclength )
WRITE ( 100 , REC = 1 ) x
CLOSE ( 100 )
END PROGRAM bin_real
The Python code that reads correctly the DATA file created above looks like:
import numpy as np
f = open ( 'DATA' , 'rb' )
x = np.fromfile ( f , dtype = np.float64 , count = -1 )
f.close()
Note that the DP parameter and dtype must be compatible.
The Fortran code below will write a complex array to DATA binary file as two real arrays:
PROGRAM bin_cmplx
IMPLICIT NONE
INTEGER , PARAMETER :: DP = 8
COMPLEX ( DP ) , ALLOCATABLE :: wfc ( : )
REAL ( DP ) , ALLOCATABLE :: wfc_real ( : ) , wfc_imag ( : )
INTEGER :: n, i, reclength
n = 100000000
ALLOCATE( wfc ( n ) , wfc_real ( n ) , wfc_imag ( n ) )
! Generate data for the array wfc
DO i = 1, n
wfc ( i ) = cmplx ( 2*i + 0._DP , 2*i+1._DP ) ! Who cares?
END DO
! Break complex array wfc into two real arrays
wfc_real ( : ) = real ( wfc ( : ) )
wfc_imag ( : ) = imag ( wfc ( : ) )
INQUIRE ( IOLENGTH = reclength ) wfc_real
OPEN ( 100 , FILE = 'DATA' , STATUS = 'UNKNOWN' , &
FORM = 'UNFORMATTED' , ACCESS = 'DIRECT' , RECL = reclength )
WRITE ( 100 , REC = 1 ) wfc_real
WRITE ( 100 , REC = 2 ) wfc_imag
CLOSE ( 100 )
END PROGRAM bin_cmplx
And the Python code:
import numpy as np
from datetime import datetime
f = open ( 'DATA' , 'rb' )
x = np.fromfile ( f , dtype = np.float64 , count = -1 )
f.close()
n = len ( x ) / 2
""" break x into two arrays """
""" or reshape x to ( 2 , n ) """
x = x.reshape ( 2 , n )
""" Reconstruct the original complex array """
wfc = np.zeros ( [ n ] , dtype = np.complex )
wfc.real = x [ 0 ]
wfc.imag = x [ 1 ]
will read the above two real arrays and reconstruct the original complex array.
Final remarks:
(1) Since numpy.fromfile can read all records at once (with count = -1), I wrote the two arrays as two records for convenience.
(2) For the same reason, each binary file contains data of only one complex array should it be large.
(3) On a poor AMD node, it took 0.07 and 9.35 seconds for handling a complex array of, respectively, 1 and 100 million elements.
(4) I guess that there are many other ways to do the same job but I did not invest my whole day for finding solutions. The above method, however, works very consistently. Should you have/know any better method and want to share, do not hesitate to email me.
|