MPI - Parallel Run

I am working with a weather forecast model and use a local Linux machine for testing (before I submit it to a cluster). I installed MPICH (an MPI library) and compiled the model with mpif90/mpicc (Fortran and C). The binary was created. I ran the model with one processor using mpirun. But when I try to run with 2 or more (my laptop has 4) I get a segmentation fault. On the Xubuntu distro the process runs correctly. Any idea?

The first thing that comes to mind is to talk to your local research computing personnel as dealing with this stuff is their day job.

The second thing is that you'll need to provide far more detail about what you're doing for anyone to have any idea what's happening, e.g. provide a "minimal working example" with replication steps.

Hi Jonathon

I need to solve it myself, with no computing personnel. The complete code is on GitHub with install manuals: GitHub - luiz-flavio-rodrigues/brams

Someone may get it and try to do the same as me: install the prerequisites and then install (compile) the model. The compile works, and a run with 1 processor works. The problem is that when I try to run in parallel with more than one processor, the model crashes.

It seems to me that it is a memory management issue in the Garuda distro.

Thanks.

Why?

I'll have a look because I'm interested in this sort of thing.

Because I have no support.

Are you using GCC 11 or 8? The documentation recommends GCC 8.4, and 8.5 is in the AUR: AUR (en) - gcc8

Interestingly, the configure step fails for me,

config.status: creating Makefile
config.status: creating Make_utils
config.status: error: cannot find input file: `../src/jules/LIB/Makefile.in'

and I can't see where that file should be or what might create it.

Why do you think it's a memory management issue, and why is this a Garuda-specific thing?

Does your supervisor know anyone who might be able to provide help locally?

No, I don't. The "Jules Makefile" error you got can be set aside for now. Jules isn't used with this Makefile. You can just run make.

How can someone follow the build instructions if they don't actually work? It seems like this should be filed as a bug with the project. :thinking:

If you skip the configure step then you're using the Makefile produced by the author's system with their software and compilers. It's quite possible issues will be created due to different or incorrect libraries being used. For example, when configuring the first dependency mpich:

configure: error: The Fortran compiler gfortran does not accept programs that call the same routine with arguments of different types without the option -fallow-argument-mismatch.  Rerun configure with FFLAGS=-fallow-argument-mismatch

gfortran since version 10 needs `-fallow-argument-mismatch` in FFLAGS.

Yes, that's what the error message says, and reinforces the point that you're building against a different set of software and libraries than the authors.
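
For anyone building with several compiler versions, here is a rough sketch of picking the flag from the gfortran major version. It assumes the rule stated in this thread (the flag is needed from version 10 on and is not accepted by 8.4). `fflags_for_gfortran` is a made-up helper name, not something from BRAMS or MPICH.

```shell
#!/bin/sh
# Sketch only: choose FFLAGS from the gfortran major version.
# fflags_for_gfortran is a hypothetical helper, not part of BRAMS or MPICH.
fflags_for_gfortran() {
    # $1 is a version string such as "8.4.0", "11.1.0", or just "11"
    major=${1%%.*}          # keep everything before the first dot
    if [ "$major" -ge 10 ]; then
        echo "-fallow-argument-mismatch"
    else
        echo ""             # older gfortran does not know this option
    fi
}

# Example use (assumes gfortran is on PATH and ./configure is in this directory):
#   FFLAGS="$(fflags_for_gfortran "$(gfortran -dumpversion)")" ./configure
```

The configure call is left as a comment because it only makes sense inside the source tree being built.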

Later I will try to find out why the system crashes with more processors. The same set of libs and compilers works in Xubuntu on the same machine.

Which compiler versions does your edition of Xubuntu come with? Which version of Xubuntu are you using?
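
If it helps, something like the following could be run on both systems and the output pasted here for comparison. It is only a sketch: `report_tool` is a hypothetical helper, and the `-show` option is the MPICH compiler-wrapper option for printing the underlying compile command, so other MPI implementations may need a different flag.

```shell
#!/bin/sh
# Sketch only: print the distro and the first line of each tool's version output.
# report_tool is a hypothetical helper written for this post.
report_tool() {
    # $1 = command name; remaining arguments make it print version/build info
    tool=$1
    shift
    if command -v "$tool" >/dev/null 2>&1; then
        printf '%s: %s\n' "$tool" "$("$tool" "$@" 2>&1 | head -n 1)"
    else
        printf '%s: not found\n' "$tool"
    fi
}

# Distro name, if the standard os-release file is present
if [ -r /etc/os-release ]; then
    ( . /etc/os-release; printf 'distro: %s\n' "$PRETTY_NAME" )
fi

report_tool gcc --version
report_tool gfortran --version
report_tool mpicc -show     # MPICH wrapper: shows the real compiler command
report_tool mpif90 -show
```

Comparing the two outputs side by side should show whether the Xubuntu and Garuda builds really use the same compilers.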

Meh. Prerequisites don't compile, failure at the grib2 libs stage,

make[1]: Entering directory '/build/brams/dist/grib2/ftn_api'
gfortran -c -O3 -c wgrib2lowapi.f90
gfortran -c -O3 -c wgrib2lowapi.f90
gfortran -c -O3 -c wgrib2api.f90
during RTL pass: expand
wgrib2api.f90:881:16:

  881 |             if (max0(nnx,one)*max0(nny,one) .ne. ndata) then
      |                ^
internal compiler error: in expand_fix, at optabs.c:5532
0x16db558 internal_error(char const*, ...)
	???:0
0x66b7a1 fancy_abort(char const*, int, char const*)
	???:0
0x9588c7 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
	???:0

I suspect the best approach is to use a distro that's similar to that used by the authors, whether that's installed, in a VM, or in a container.

The model can run without grib2. Don't use it. I used to compile with gfortran 8.4 and gcc 8.4. In that case you can't use -fallow-argument-mismatch. But my last build was made with version 11. It works too.

grib2 is only needed if you will use input data from the GFS model. But I can send you data pre-built with another version.