Audit the compiler flags used to build on gadi with the Intel compiler #12

Open
harshula opened this issue Feb 14, 2023 · 26 comments
@harshula
Contributor

harshula commented Feb 14, 2023

Document compiler flags of the following components:

  • oasis3-mct
  • libaccessom2
  • cice5
  • mom5
@harshula
Contributor Author

harshula commented Feb 14, 2023

oasis3-mct
util/make_dir/make.nci:

#
# Include file for OASIS3 Makefile for a Linux system using 
# Portland Group Fortran Compiler and MPICH
#
###############################################################################
#
# CHAN	: communication technique used in OASIS3 (MPI1/MPI2/NONE)
CHAN            = MPI1
#
# Paths for libraries, object files and binaries
#
# COUPLE	: path for oasis3-mct main directory
COUPLE=$(OASIS_HOME)
# ARCHDIR       : directory created when compiling
ARCHDIR= $(COUPLE)/compile_oa3-mct

#
# NETCDF library
NETCDF_INCLUDE = $(NETCDF_ROOT)/include
NETCDF_LIBRARY = -L$(NETCDF_ROOT)/lib -lnetcdf -lnetcdff
#
# Compiling and other commands
MAKE        = /usr/bin/make
F90         = mpifort
CC          = mpicc
F           = $(F90)
f90         = $(F90)
f           = $(F90)

LD          = $(F90)
AR          = ar
ARFLAGS     = -rvD

# -g is necessary in F90FLAGS and LDFLAGS for pgf90 versions lower than 6.1
# For compiling in double precision, put -r8
# For compiling in single precision, remove -r8 and add -Duse_realtype_single
NCI_INTEL_FLAGS = -r8 -i4 -traceback -fpe0 -convert big_endian -fno-alias -ip -check noarg_temp_created
NCI_REPRO_FLAGS = -fp-model precise -fp-model source -align all
ifeq ($(DEBUG), yes)
    NCI_DEBUG_FLAGS = -g3 -O0 -fpe0 -no-vec -debug all -check all -no-vec
    F90FLAGS_1      = $(NCI_INTEL_FLAGS) $(NCI_REPRO_FLAGS) $(NCI_DEBUG_FLAGS)
    CPPDEF          = -Duse_netCDF -Duse_comm_$(CHAN) -DTREAT_OVERLAY -DDEBUG -D__VERBOSE
    MCT_FCFLAGS     = $(NCI_REPRO_FLAGS) $(NCI_DEBUG_FLAGS) -ip
else
    NCI_OPTIM_FLAGS = -g3 -O2 -axCORE-AVX2 -debug all -check none -qopt-report=5 -qopt-report-annotate
    F90FLAGS_1      = $(NCI_INTEL_FLAGS) $(NCI_REPRO_FLAGS) $(NCI_OPTIM_FLAGS)
    CPPDEF          = -Duse_netCDF -Duse_comm_$(CHAN) -DTREAT_OVERLAY
    MCT_FCFLAGS     = $(NCI_REPRO_FLAGS) $(NCI_OPTIM_FLAGS) -ip
endif
f90FLAGS_1  = $(F90FLAGS_1)
FFLAGS_1    = $(F90FLAGS_1)
fFLAGS_1    = $(F90FLAGS_1)
CCFLAGS_1   = 
LDFLAGS     = 

#
###################
#
# Additional definitions that should not be changed
#
FLIBS		= $(NETCDF_LIBRARY)
# BINDIR        : directory for executables
BINDIR          = $(ARCHDIR)/bin
# LIBBUILD      : contains a directory for each library
LIBBUILD        = $(ARCHDIR)/build/lib
# INCPSMILE     : includes all *o and *mod for each library
INCPSMILE       = -I$(LIBBUILD)/psmile.$(CHAN) -I$(LIBBUILD)/pio  -I$(LIBBUILD)/mct 

F90FLAGS  = $(F90FLAGS_1) $(INCPSMILE) $(CPPDEF) -I$(NETCDF_INCLUDE)
f90FLAGS  = $(f90FLAGS_1) $(INCPSMILE) $(CPPDEF) -I$(NETCDF_INCLUDE)
FFLAGS    = $(FFLAGS_1) $(INCPSMILE) $(CPPDEF) -I$(NETCDF_INCLUDE)
fFLAGS    = $(fFLAGS_1) $(INCPSMILE) $(CPPDEF) -I$(NETCDF_INCLUDE)
CCFLAGS   = $(CCFLAGS_1) $(INCPSMILE) $(CPPDEF) -I$(NETCDF_INCLUDE)
#
#############################################################################

@harshula
Contributor Author

libaccessom2
CMakeLists.txt:

# compiler flags for gfortran
if(CMAKE_Fortran_COMPILER_ID MATCHES GNU)
  set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -std=f2008 -Wall -fdefault-real-8 -ffpe-trap=invalid,zero,overflow")
  set(CMAKE_Fortran_FLAGS_DEBUG "-O0 -g -pg -fbounds-check -fbacktrace")
  set(CMAKE_Fortran_FLAGS_RELEASE "-O3")
endif()

# compiler flags for ifort
if(CMAKE_Fortran_COMPILER_ID MATCHES Intel)
  set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -r8 -fpe0 -fp-model precise -fp-model source -align all -traceback")
  set(CMAKE_Fortran_FLAGS_DEBUG "-g3 -O0 -check all")
  set(CMAKE_Fortran_FLAGS_RELEASE "-g3 -O2 -axCORE-AVX2 -debug all -check none -qopt-report=5 -qopt-report-annotate")
endif()

@harshula
Contributor Author

cice5
bld/Macros.nci

#==============================================================================
# Makefile macros for gadi.nci.org.au
#==============================================================================

INCLDIR    := -I.
SLIBS      :=
ULIBS      :=
CPP        := cpp
FC         := mpifort

CPPFLAGS   := -P -traditional
CPPDEFS    := -DLINUX -DPAROPT
CFLAGS     := -c -O2
FIXEDFLAGS := -132
FREEFLAGS  :=

NCI_INTEL_FLAGS := -r8 -i4 -traceback -w -fpe0 -ftz -convert big_endian -assume byterecl -check noarg_temp_created
NCI_REPRO_FLAGS := -fp-model precise -fp-model source -align all
ifeq ($(DEBUG), 1)
    NCI_DEBUG_FLAGS := -g3 -O0 -debug all -check all -no-vec -assume nobuffered_io
    FFLAGS          := $(NCI_INTEL_FLAGS) $(NCI_REPRO_FLAGS) $(NCI_DEBUG_FLAGS)
    CPPDEFS         := $(CPPDEFS) -DDEBUG=$(DEBUG)
else
    NCI_OPTIM_FLAGS := -g3 -O2 -axCORE-AVX2 -debug all -check none -qopt-report=5 -qopt-report-annotate -assume buffered_io
    FFLAGS          := $(NCI_INTEL_FLAGS) $(NCI_REPRO_FLAGS) $(NCI_OPTIM_FLAGS)
endif

MOD_SUFFIX := mod
LD         := $(FC)
LDFLAGS    := $(FFLAGS) -v

CPPDEFS :=  $(CPPDEFS) -DNXGLOB=$(NXGLOB) -DNYGLOB=$(NYGLOB) \
            -DNUMIN=$(NUMIN) -DNUMAX=$(NUMAX) \
            -DTRAGE=$(TRAGE) -DTRFY=$(TRFY) -DTRLVL=$(TRLVL) \
            -DTRPND=$(TRPND) -DNTRAERO=$(NTRAERO) -DTRBRI=$(TRBRI) \
            -DNBGCLYR=$(NBGCLYR) -DTRBGCS=$(TRBGCS) \
            -DNICECAT=$(NICECAT) -DNICELYR=$(NICELYR) \
            -DNSNWLYR=$(NSNWLYR) \
            -DBLCKX=$(BLCKX) -DBLCKY=$(BLCKY) -DMXBLCKS=$(MXBLCKS)

ifeq ($(COMMDIR), mpi)
   SLIBS   :=  $(SLIBS) -lmpi
endif

ifeq ($(DITTO), yes)
   CPPDEFS :=  $(CPPDEFS) -DREPRODUCIBLE
endif
ifeq ($(BARRIERS), yes)
   CPPDEFS :=  $(CPPDEFS) -Dgather_scatter_barrier
endif

ifeq ($(IO_TYPE), netcdf)
   CPPDEFS :=  $(CPPDEFS) -Dncdf
   INCLDIR := $(INCLDIR) -I$(NETCDF_ROOT)/include
   SLIBS   := $(SLIBS) -L$(NETCDF_ROOT)/lib -lnetcdf -lnetcdff
endif

ifeq ($(IO_TYPE), pio)
   CPPDEFS :=  $(CPPDEFS) -Dncdf -DPIO
   INCLDIR := $(INCLDIR) -I$(NETCDF_ROOT)/include
   SLIBS   := $(SLIBS) -L$(NETCDF_ROOT)/lib -lnetcdf -lnetcdff
   SLIBS   := $(SLIBS) -L$(SRCDIR)/ParallelIO/build/lib/ -lpiof -lpioc -Wl,-rpath=$(SRCDIR)/ParallelIO/build/lib/
endif

ifeq ($(USE_ESMF), yes)
   CPPDEFS :=  $(CPPDEFS) -Duse_esmf
   INCLDIR :=  $(INCLDIR) -I ???
   SLIBS   :=  $(SLIBS) -L ??? -lesmf -lcprts -lrt -ldl
endif

ifeq ($(AusCOM), yes)
   CPPDEFS := $(CPPDEFS) -DAusCOM -Dcoupled
   INCLDIR := $(INCLDIR) $(CPL_INCS) $(LIBAUSCOM_INCS)
   SLIBS   := $(SLIBS) $(CPLLIBS)
endif

ifeq ($(UNIT_TESTING), yes)
   CPPDEFS := $(CPPDEFS) -DUNIT_TESTING
endif
ifeq ($(ACCESS), yes)
   CPPDEFS := $(CPPDEFS) -DACCESS
endif
# standalone CICE with AusCOM mods
ifeq ($(ACCICE), yes)
   CPPDEFS := $(CPPDEFS) -DACCICE
endif
# no MOM just CICE+UM
ifeq ($(NOMOM), yes)
   CPPDEFS := $(CPPDEFS) -DNOMOM
endif
ifeq ($(OASIS3_MCT), yes)
   CPPDEFS := $(CPPDEFS) -DOASIS3_MCT
endif

@harshula
Contributor Author

mom5
bin/mkmf.template.nci

# Template for the NCI (nf.nci.org.au) machines. Uses intel compiler and OpenMPI.
# typical use with mkmf
# mkmf -t template.ifc -c"-Duse_libMPI -Duse_netCDF" path_names /usr/local/include
############
# commands #
############
ifeq ($(VTRACE), yes)
    FC := mpifort-vt
    LD := mpifort-vt
else
    FC := mpifort
    LD := mpifort
endif

CC := mpicc

#########
# flags #
#########
VERBOSE :=
OPT := on

MAKEFLAGS += -j

INCLUDE   := -I$(NETCDF_ROOT)/include

ifneq ($(LIBACCESSOM2_ROOT),)
INCLUDE  += -I$(LIBACCESSOM2_ROOT)/oasis3-mct/Linux/build/lib/psmile.MPI1 \
            -I$(LIBACCESSOM2_ROOT)/oasis3-mct/Linux/build/lib/mct \
            -I$(LIBACCESSOM2_ROOT)/build/include
endif

ifneq ($(OASIS_ROOT),)
INCLUDE  += -I$(OASIS_ROOT)/Linux/build/lib/psmile.MPI1 \
            -I$(OASIS_ROOT)/Linux/build/lib/pio \
            -I$(OASIS_ROOT)/Linux/build/lib/mct
endif

FPPFLAGS := -fpp -Wp,-w $(INCLUDE)
FFLAGS := -fno-alias -safe-cray-ptr -fpe0 -ftz -assume byterecl -i4 -r8 -traceback -nowarn -check noarg_temp_created -assume nobuffered_io -convert big_endian -grecord-gcc-switches -align all
FFLAGS_OPT := -g3 -O2 -xCORE-AVX2 -debug all -check none
FFLAGS_REPORT := -qopt-report=5 -qopt-report-annotate
FFLAGS_DEBUG := -g3 -O0 -debug all -check -check noarg_temp_created -check nopointer -warn -warn noerrors -ftrapuv
FFLAGS_REPRO := -fp-model precise -fp-model source -align all
FFLAGS_VERBOSE := -v -V -what

CFLAGS := -D__IFC $(INCLUDE)
CFLAGS_OPT := -O2 -debug minimal -xCORE-AVX2
CFLAGS_REPORT := -qopt-report=5 -qopt-report-annotate
CFLAGS_DEBUG := -O0 -g -ftrapuv -traceback
CFLAGS_REPRO := -fp-model precise -fp-model source

LDFLAGS :=
LDFLAGS_VERBOSE := -Wl,-V,--verbose,-cref,-M

ifneq ($(REPRO),)
CFLAGS += $(CFLAGS_REPRO)
FFLAGS += $(FFLAGS_REPRO)
endif

ifneq ($(DEBUG),)
CFLAGS += $(CFLAGS_DEBUG)
FFLAGS += $(FFLAGS_DEBUG)
else
CFLAGS += $(CFLAGS_OPT)
FFLAGS += $(FFLAGS_OPT)
endif

ifneq ($(VERBOSE),)
CFLAGS += $(CFLAGS_VERBOSE)
FFLAGS += $(FFLAGS_VERBOSE)
LDFLAGS += $(LDFLAGS_VERBOSE)
endif

ifneq ($(REPORT),)
CFLAGS += $(CFLAGS_REPORT)
FFLAGS += $(FFLAGS_REPORT)
endif

LIBS := -L$(NETCDF_ROOT)/lib -lnetcdf -lnetcdff \

ifneq ($(OASIS_ROOT),)
LIBS += -L$(OASIS_ROOT)/Linux/lib -lpsmile.MPI1 -lmct -lmpeu -lscrip
endif

ifneq ($(LIBACCESSOM2_ROOT),)
LIBS += -L$(LIBACCESSOM2_ROOT)/build/lib -laccessom2
endif

LDFLAGS += $(LIBS)

#---------------------------------------------------------------------------
# you should never need to change any lines below.

# see the MIPSPro F90 manual for more details on some of the file extensions
# discussed here.
# this makefile template recognizes fortran sourcefiles with extensions
# .f, .f90, .F, .F90. Given a sourcefile <file>.<ext>, where <ext> is one of
# the above, this provides a number of default actions:

# make <file>.opt	create an optimization report
# make <file>.o		create an object file
# make <file>.s		create an assembly listing
# make <file>.x		create an executable file, assuming standalone
#			source
# make <file>.i		create a preprocessed file (for .F)
# make <file>.i90	create a preprocessed file (for .F90)

# The macro TMPFILES is provided to slate files like the above for removal.

RM := rm -f
SHELL := /bin/csh -f
TMPFILES := .*.m *.B *.L *.i *.i90 *.l *.s *.mod *.opt

.SUFFIXES: .F .F90 .H .L .T .f .f90 .h .i .i90 .l .o .s .opt .x

.f.L:
	$(FC) $(FFLAGS) -c -listing $*.f
.f.opt:
	$(FC) $(FFLAGS) -c -opt_report_level max -opt_report_phase all -opt_report_file $*.opt $*.f
.f.l:
	$(FC) $(FFLAGS) -c $(LIST) $*.f
.f.T:
	$(FC) $(FFLAGS) -c -cif $*.f
.f.o:
	$(FC) $(FFLAGS) -c $*.f
.f.s:
	$(FC) $(FFLAGS) -S $*.f
.f.x:
	$(FC) $(FFLAGS) -o $*.x $*.f *.o $(LDFLAGS)
.f90.L:
	$(FC) $(FFLAGS) -c -listing $*.f90
.f90.opt:
	$(FC) $(FFLAGS) -c -opt_report_level max -opt_report_phase all -opt_report_file $*.opt $*.f90
.f90.l:
	$(FC) $(FFLAGS) -c $(LIST) $*.f90
.f90.T:
	$(FC) $(FFLAGS) -c -cif $*.f90
.f90.o:
	$(FC) $(FFLAGS) -c $*.f90
.f90.s:
	$(FC) $(FFLAGS) -c -S $*.f90
.f90.x:
	$(FC) $(FFLAGS) -o $*.x $*.f90 *.o $(LDFLAGS)
.F.L:
	$(FC) $(CPPDEFS) $(FPPFLAGS) $(FFLAGS) -c -listing $*.F
.F.opt:
	$(FC) $(CPPDEFS) $(FPPFLAGS) $(FFLAGS) -c -opt_report_level max -opt_report_phase all -opt_report_file $*.opt $*.F
.F.l:
	$(FC) $(CPPDEFS) $(FPPFLAGS) $(FFLAGS) -c $(LIST) $*.F
.F.T:
	$(FC) $(CPPDEFS) $(FPPFLAGS) $(FFLAGS) -c -cif $*.F
.F.f:
	$(FC) $(CPPDEFS) $(FPPFLAGS) -EP $*.F > $*.f
.F.i:
	$(FC) $(CPPDEFS) $(FPPFLAGS) -P $*.F
.F.o:
	$(FC) $(CPPDEFS) $(FPPFLAGS) $(FFLAGS) -c $*.F
.F.s:
	$(FC) $(CPPDEFS) $(FPPFLAGS) $(FFLAGS) -c -S $*.F
.F.x:
	$(FC) $(CPPDEFS) $(FPPFLAGS) $(FFLAGS) -o $*.x $*.F *.o $(LDFLAGS)
.F90.L:
	$(FC) $(CPPDEFS) $(FPPFLAGS) $(FFLAGS) -c -listing $*.F90
.F90.opt:
	$(FC) $(CPPDEFS) $(FPPFLAGS) $(FFLAGS) -c -opt_report_level max -opt_report_phase all -opt_report_file $*.opt $*.F90
.F90.l:
	$(FC) $(CPPDEFS) $(FPPFLAGS) $(FFLAGS) -c $(LIST) $*.F90
.F90.T:
	$(FC) $(CPPDEFS) $(FPPFLAGS) $(FFLAGS) -c -cif $*.F90
.F90.f90:
	$(FC) $(CPPDEFS) $(FPPFLAGS) -EP $*.F90 > $*.f90
.F90.i90:
	$(FC) $(CPPDEFS) $(FPPFLAGS) -P $*.F90
.F90.o:
	$(FC) $(CPPDEFS) $(FPPFLAGS) $(FFLAGS) -c $*.F90
.F90.s:
	$(FC) $(CPPDEFS) $(FPPFLAGS) $(FFLAGS) -c -S $*.F90
.F90.x:
	$(FC) $(CPPDEFS) $(FPPFLAGS) $(FFLAGS) -o $*.x $*.F90 *.o $(LDFLAGS)

@harshula changed the title from "Document Compiler Flags" to "Audit the compiler flags used to build on gadi with the Intel compiler" on Feb 16, 2023
@access-hive-bot

This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:

https://forum.access-hive.org.au/t/audit-the-compiler-flags-used-to-build-access-om2-on-gadi-with-the-intel-compiler/437/1

@aidanheerdegen
Member

The -opt_report_level flags should never be kept in the production build. They produce a huge amount of profiling information that isn't relevant unless you're interested in profiling the code.

@aidanheerdegen
Member

Note that I am not volunteering to do this, but it would be good to extract out the exact flags used in each model and chuck it in a table or similar to be able to compare them across models.
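A minimal sketch of such a comparison table, assuming the non-debug Intel Fortran flags are concatenated by hand from the four files quoted above (mom5 with REPRO and REPORT unset). The helper names `split_flags` and `comparison_rows` are hypothetical and not part of any of the repositories.

```python
def split_flags(flag_string):
    """Split a flag string, keeping each option together with its argument."""
    flags = []
    for token in flag_string.split():
        if token.startswith("-") or not flags:
            flags.append(token)          # a new option
        else:
            flags[-1] += " " + token     # an argument such as 'precise' or 'all'
    return flags


# Flag sets copied from the quoted build files; the grouping is done by hand.
REPRO = "-fp-model precise -fp-model source -align all"
OPTIM = "-g3 -O2 -axCORE-AVX2 -debug all -check none -qopt-report=5 -qopt-report-annotate"
FLAGS = {
    "oasis3-mct": "-r8 -i4 -traceback -fpe0 -convert big_endian -fno-alias -ip "
                  "-check noarg_temp_created " + REPRO + " " + OPTIM,
    "libaccessom2": "-r8 -fpe0 " + REPRO + " -traceback " + OPTIM,
    "cice5": "-r8 -i4 -traceback -w -fpe0 -ftz -convert big_endian -assume byterecl "
             "-check noarg_temp_created " + REPRO + " " + OPTIM + " -assume buffered_io",
    "mom5": "-fno-alias -safe-cray-ptr -fpe0 -ftz -assume byterecl -i4 -r8 -traceback "
            "-nowarn -check noarg_temp_created -assume nobuffered_io -convert big_endian "
            "-grecord-gcc-switches -align all -g3 -O2 -xCORE-AVX2 -debug all -check none",
}


def comparison_rows(flags_by_model):
    """Return (models, rows); each row is (flag, [is the flag used by each model?])."""
    parsed = {model: set(split_flags(s)) for model, s in flags_by_model.items()}
    models = list(flags_by_model)
    all_flags = sorted(set().union(*parsed.values()))
    return models, [(flag, [flag in parsed[m] for m in models]) for flag in all_flags]


models, rows = comparison_rows(FLAGS)
print("flag".ljust(28) + "  ".join(models))
for flag, present in rows:
    marks = "  ".join(("x" if p else "-").ljust(len(m)) for p, m in zip(present, models))
    print(flag.ljust(28) + marks)
```

Grouping an option with its argument keeps pairs such as `-check none` and `-check noarg_temp_created` distinct in the table, which a plain whitespace split would not.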

@aekiss
Contributor

aekiss commented Feb 20, 2023

It might be easiest to build access-om2 and copy the flags from what it echoes to the terminal.
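A rough sketch of how an echoed compile command could be reduced to its flags. The example command and the file name `ice_example.F90` are made up for illustration; real build output may need extra handling (for example `-I` followed by a space).

```python
import shlex

SOURCE_SUFFIXES = (".f", ".f90", ".F", ".F90", ".c", ".o")


def flags_from_command(command):
    """Extract compiler options from one echoed compile command."""
    tokens = shlex.split(command)[1:]      # drop the compiler wrapper itself
    flags = []
    skip_next = False
    for token in tokens:
        if skip_next:                      # the file name that follows -o
            skip_next = False
        elif token == "-o":
            skip_next = True
        elif token == "-c" or token.startswith(("-I", "-D")):
            continue                       # include paths and defines are not audited here
        elif token.startswith("-"):
            flags.append(token)
        elif token.endswith(SOURCE_SUFFIXES):
            continue                       # source or object file
        elif flags:
            flags[-1] += " " + token       # argument of the previous option
    return flags


# Hypothetical echoed line, not copied from a real build log.
line = ("mpifort -I. -DLINUX -r8 -i4 -fp-model precise -O2 -axCORE-AVX2 "
        "-c ice_example.F90 -o ice_example.o")
print(flags_from_command(line))
```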

@harshula
Contributor Author

harshula commented Mar 2, 2023

Notes

https://www.intel.com/content/www/us/en/develop/documentation/fortran-compiler-oneapi-dev-guide-and-reference/top/compiler-reference/compiler-options/code-generation-options/march.html

march
...
Default
pentium4
If no architecture option is specified, value pentium4 is used by the compiler to generate code.
...
Description
This option tells the compiler to generate code for processors that support certain features. If you specify both the -ax and -march options, the compiler will not generate Intel-specific instructions.
Options -x and -march are mutually exclusive. If both are specified, the compiler uses the last one specified and generates a warning.
Specifying -march=pentium4 sets -mtune=pentium4.
For compatibility, a number of historical processor values are also supported, but the generated code will not differ from the default.

@harshula
Contributor Author

harshula commented Mar 3, 2023

Notes
-no-vec does not appear to override vectorization via -march. Worth further investigation.

@harshula
Contributor Author

harshula commented Mar 3, 2023

Notes
Using the argument target=x86_64 when building a Spack package sets SPACK_TARGET_ARGS='-march=pentium4 -mtune=generic', which appears to override -axCORE-AVX2. Worth further investigation. Setting no target on Gadi results in SPACK_TARGET_ARGS='-march=cascadelake -mtune=cascadelake'.
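One way to spot this kind of interaction early is a small check over the combined flag string. The sketch below encodes only the two rules quoted from the Intel -march documentation in the earlier note; the function name `arch_warnings` is hypothetical, and the order in which Spack places its target arguments on the real command line is an assumption.

```python
def arch_warnings(flag_string):
    """Report -march combinations that the quoted Intel documentation warns about."""
    tokens = flag_string.split()
    march = [t for t in tokens if t.startswith("-march=")]
    x_opts = [t for t in tokens if t.startswith("-x")]
    ax_opts = [t for t in tokens if t.startswith("-ax")]
    warnings = []
    if march and ax_opts:
        # "If you specify both the -ax and -march options, the compiler
        #  will not generate Intel-specific instructions."
        warnings.append(
            f"{ax_opts[-1]} with {march[-1]}: no Intel-specific instructions will be generated"
        )
    if march and x_opts:
        # "Options -x and -march are mutually exclusive ... the compiler uses the last one."
        last = [t for t in tokens if t in march or t in x_opts][-1]
        warnings.append(f"{x_opts[-1]} and {march[-1]} are mutually exclusive: {last} wins")
    return warnings


# Assumed ordering: Spack's target arguments followed by the package's own flags.
print(arch_warnings("-march=pentium4 -mtune=generic -g3 -O2 -axCORE-AVX2"))
```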

@harshula
Contributor Author

harshula commented Mar 6, 2023

Notes
Instruction Set support on various architectures as understood by Spack:
https://github.com/spack/spack/blob/develop/lib/spack/external/archspec/json/cpu/microarchitectures.json

@aidanheerdegen
Member

-qopt-report=5 and -qopt-report-annotate should not be used by default. I see that the mom5 model uses a separate compile flag option (REPORT) to enable this, which seems a reasonable approach to adopt for all the models. This would require a variant to pass this flag to the build through spack.
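A minimal sketch of the opt-in behaviour, mirroring what mom5's REPORT switch does. It is plain Python rather than actual Spack package code; the `report` argument stands in for whatever variant the packages end up defining, and the function name is hypothetical.

```python
REPORT_FLAGS = ["-qopt-report=5", "-qopt-report-annotate"]


def optim_flags(report=False):
    """Return the optimised flag string, adding report flags only on request."""
    flags = ["-g3", "-O2", "-axCORE-AVX2", "-debug", "all", "-check", "none"]
    if report:
        flags += REPORT_FLAGS
    return " ".join(flags)


print(optim_flags())             # default build: no optimization report
print(optim_flags(report=True))  # report explicitly requested
```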

@harshula
Contributor Author

Hi @penguian & @micaeljtoliveira, your feedback would be appreciated.

@micaeljtoliveira
Member

Hi, I'm afraid I don't have much else to add. I would add backtrace by default to all builds, as it has no performance penalty and it can be useful for tracking down why some calculation failed in production. You could also consider enabling any checks that the compiler performs at compile time. The only caveat is that they make compilation slower.

@harshula self-assigned this on Apr 4, 2023
@harshula
Contributor Author

harshula commented Apr 5, 2023

> -qopt-report=5 and -qopt-report-annotate should not be used by default.

In oasis3-mct, it was added via commit ACCESS-NRI/oasis3-mct@601de2c
In libaccessom2, it was added via commit ACCESS-NRI/libaccessom2@e85298d
In cice5, it was added via commit ACCESS-NRI/cice5@c4cca23
In mom5, it was added via commit ACCESS-NRI/MOM5@11cebea

@harshula
Contributor Author

harshula commented May 2, 2023

Notes

  1. I've created a new variant named "deterministic" for all 4 SPD files.
  2. I've removed "-traceback" from NCI_INTEL_FLAGS and added it explicitly to NCI_OPTIM_FLAGS and NCI_DEBUG_FLAGS. Now, NCI_INTEL_FLAGS is a common set of flags for all variants.

@harshula
Contributor Author

> -qopt-report=5 and -qopt-report-annotate should not be used by default.

Just spoke with @penguian, these flags were not intended to be enabled by default.

@harshula
Contributor Author

harshula commented Aug 7, 2023

@aidanheerdegen asked the question, "Is -O2 non-deterministic?"
@micaeljtoliveira answered, "Yes, it is. You can still use -O2, but then you need to specify the floating point model to make it deterministic. See the Intel compiler manual. You might also want to set -qopt-dynamic-align (see here)."

Currently, I default to -O0 for deterministic builds. In the future we should try to use -O2 with additional flags for deterministic builds.
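The choice can be sketched as a small selection function. The two base strings are the oasis3-mct values from the Spack package; treating the `-g0 -O0` string as the "deterministic" branch is an assumption, and the `keep_o2` path is only an untested candidate for the future option (it has not been shown here to give deterministic results).

```python
DEFAULT_OPTIM = "-g3 -O2 -axCORE-AVX2 -debug all -check none -traceback"
DETERMINISTIC_OPTIM = "-g0 -O0 -axCORE-AVX2 -debug none -check none"
FP_MODEL = "-fp-model precise -fp-model source"


def optim_flags(deterministic=False, keep_o2=False):
    """Pick optimisation flags for a normal or a deterministic build."""
    if not deterministic:
        return DEFAULT_OPTIM
    if keep_o2:
        # Untested candidate: keep -O2 and pin the floating-point model explicitly.
        return DEFAULT_OPTIM + " " + FP_MODEL
    return DETERMINISTIC_OPTIM   # current behaviour: fall back to -O0


print(optim_flags(deterministic=True))
print(optim_flags(deterministic=True, keep_o2=True))
```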

@harshula
Contributor Author

@penguian and I were chatting about -xHost, -xCORE-AVX2 and -axCORE-AVX2. It is likely we'll have to change some of the compiler flags below:

$ rg -i avx2 *
packages/mom5/package.py
59:        FFLAGS_OPT = "-g3 -O2 -xCORE-AVX2 -debug all -check none -traceback"
60:        CFLAGS_OPT = "-O2 -debug minimal -xCORE-AVX2"
62:            FFLAGS_OPT = "-g0 -O0 -xCORE-AVX2 -debug none -check none"
63:            CFLAGS_OPT = "-O0 -debug none -xCORE-AVX2"

packages/cice5/package.py
110:        NCI_OPTIM_FLAGS = "-g3 -O2 -axCORE-AVX2 -debug all -check none -traceback -assume buffered_io"
113:            NCI_OPTIM_FLAGS = "-g0 -O0 -axCORE-AVX2 -debug none -check none -assume buffered_io"

packages/oasis3-mct/package.py
218:        NCI_OPTIM_FLAGS = "-g3 -O2 -axCORE-AVX2 -debug all -check none -traceback"
221:            NCI_OPTIM_FLAGS = "-g0 -O0 -axCORE-AVX2 -debug none -check none"
$ rg -i xhost *
packages/cice4/package.py
93:    FFLAGS     := -r8 -i4 -O0 -g -align all -w -ftz -convert big_endian -assume byterecl -no-vec -xHost -fp-model precise
95:    FFLAGS     := -r8 -i4 -O2 -align all -w -ftz -convert big_endian -assume byterecl -no-vec -xHost -fp-model precise

@harshula
Contributor Author

Hi @manodeep, do you want to start with this issue?

@manodeep

Thanks @harshula! Is the Fortran compiler ifort or the newer (LLVM-based) ifx?

Regarding the -xHost, -xCORE-AVX2 and -axCORE-AVX2 options above:

  • -xHost is effectively -march=native for the CPU on which the code is compiled; if the runtime CPU is lower-spec, a SIGILL (illegal instruction) may be raised at runtime.
  • -xCORE-AVX2 is similar but targets the AVX2 instruction set, so it can also raise a SIGILL if the runtime CPU does not support AVX2. Since all the Gadi CPUs support AVX2, this should not happen.
  • -axCORE-AVX2 creates a "fat binary" with runtime dispatch based on the detected runtime CPU. Each (some?) function has multiple generated copies targeting different instruction sets, and runtime CPU feature detection dispatches execution to the most capable suitable version. This option is most useful in conjunction with multiple instruction sets, e.g., -axCORE-AVX2,skylake-avx512,cascadelake,sapphirerapids etc (unverified syntax, archnames are here). Note that this definitely bloats the library size but represents a good balance between performance and portability, since the same binary can run "optimally" on multiple runtime hardware. The -ax* options get overridden by any -x* option that might be present during compilation.

@harshula
Contributor Author

Hi @manodeep , ifort.
