• Read complex array from Fortran binary file with Python (06/14/2013)
    • Today, I faced a tiny challenge when using Python to read complex arrays from a binary file that is generated by Fortran. Why do I need to use another language to read a binary file generated by Fortran? Why do I not use ASCII instead? The reasons are:

      (1) many of computational codes (such as PWSCF, VASP, Gaussian) that I use are written in Fortran
      (2) Python is my favorite language for post-processing tasks
      (3) unformatted binary files are chosen for saving disk space.

      I did google for help and found few useful posts, including the FortranFile class by Neil Martinsen-Burrell. None of these seemed to co-operate with me. I decided not to spend times to dig solutions out of the W3 but to find a workaround. My solution consists of 4 steps:

      (1) breaking the complex array into two equal-length real arrays
      (2) writing the two arrays in a direct access binary file
      (3) using numpy.fromfile to read the arrays
      (4) reconstructing the original complex array.

      Let's first see how to use Python read a real array in a binary file written by Fortran. I noted that numpy.fromfile could read correctly (only?) from a direct access file. Thus, the Fortran code should look like:

      PROGRAM bin_real 
      
      IMPLICIT NONE
      INTEGER , PARAMETER :: DP = 8
      REAL ( DP ) , ALLOCATABLE :: x ( : )
      INTEGER :: n, i, reclength
      
      ! Generate real data for the array x
      n = 100
      ALLOCATE( x ( n ) )
      DO i = 1, n
          x ( i ) = i**2 / 9._DP  ! Who cares?
      END DO
      
      INQUIRE ( IOLENGTH = reclength ) x
      OPEN ( 100 , FILE = 'DATA' , STATUS = 'UNKNOWN' , &
             FORM = 'UNFORMATTED' , ACCESS = 'DIRECT' , RECL = reclength )
      WRITE ( 100 , REC = 1 ) x
      CLOSE ( 100 )
      
      END PROGRAM bin_real
      

      The Python code that reads correctly the DATA file created above looks like:

      import numpy as np
      
      f = open ( 'DATA' , 'rb' )
      x = np.fromfile ( f , dtype = np.float64 , count = -1 )
      f.close()
      

      Note that the DP parameter and dtype must be compatible.

      The Fortran code below will write a complex array to DATA binary file as two real arrays:

      PROGRAM bin_cmplx 
      
      IMPLICIT NONE
      INTEGER , PARAMETER :: DP = 8
      COMPLEX ( DP ) , ALLOCATABLE :: wfc ( : )
      REAL ( DP )   , ALLOCATABLE :: wfc_real ( : ) , wfc_imag ( : )
      INTEGER :: n, i, reclength
      
      n = 100000000
      ALLOCATE( wfc ( n ) , wfc_real ( n ) , wfc_imag ( n ) )
      
      ! Generate data for the array wfc
      DO i = 1, n
          wfc ( i ) = cmplx ( 2*i + 0._DP , 2*i+1._DP )  ! Who cares?
      END DO
      
      ! Break complex array wfc into two real arrays
      wfc_real ( : ) = real ( wfc ( : ) )
      wfc_imag ( : ) = imag ( wfc ( : ) )
      
      INQUIRE ( IOLENGTH = reclength ) wfc_real
      OPEN ( 100 , FILE = 'DATA' , STATUS = 'UNKNOWN' , &
             FORM = 'UNFORMATTED' , ACCESS = 'DIRECT' , RECL = reclength )
      WRITE ( 100 , REC = 1 ) wfc_real 
      WRITE ( 100 , REC = 2 ) wfc_imag 
      CLOSE ( 100 )
      
      END PROGRAM bin_cmplx
      

      And the Python code:

      import numpy as np
      from datetime import datetime
      
      f = open ( 'DATA' , 'rb' )
      x = np.fromfile ( f , dtype = np.float64 , count = -1 )
      f.close()
      
      n = len ( x ) / 2
      
      """ break x into two arrays    """
      """ or reshape x to ( 2 , n )  """
      x = x.reshape ( 2 , n )
      
      """ Reconstruct the original complex array """
      wfc = np.zeros ( [ n ] , dtype = np.complex )
      wfc.real = x [ 0 ]
      wfc.imag = x [ 1 ]
      

      will read the above two real arrays and reconstruct the original complex array.

      Final remarks:

      (1) Since numpy.fromfile can read all records at once (with count = -1), I wrote the two arrays as two records for convenience.
      (2) For the same reason, each binary file contains data of only one complex array should it be large.
      (3) On a poor AMD node, it took 0.07 and 9.35 seconds for handling a complex array of, respectively, 1 and 100 million elements.
      (4) I guess that there are many other ways to do the same job but I did not invest my whole day for finding solutions. The above method, however, works very consistently. Should you have/know any better method and want to share, do not hesitate to email me.
Department of Physics, University of Central Florida
4111 Libra Drive - Physical Sciences Bldg. 430 - Orlando, FL 32816-2385
Fax: 407-823-5112 Email: duy.le[at]ucf.edu