You have implemented only a tiny part of the suggestions, and unfortunately neither completely nor correctly.
The process with rank 0, for example, is usually responsible for distributing the work. You are now trying to distribute N data items to size processes as follows:
if (!rank)
{
    ...
    if (rank < remainder) {
        pSize++;
    }
It already starts with the fact that writing if (!rank) is poor style. And the logic here goes wrong again: what value does pSize have on each process, and how much data actually gets distributed, if N=20 and size=8?
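(To spell it out, assuming the increment really sits inside the if (!rank) block as quoted: N / size = 2 and remainder = 4, so ranks 0-3 should each get 3 elements and ranks 4-7 should get 2. As written, only rank 0 ever sees pSize = 3; the others keep 2, so only 3 + 7*2 = 17 of the 20 elements are accounted for.)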
I had suggested calculating pSize per process ...
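A minimal sketch of what I mean (every rank executes this, outside any if (rank == 0) block; rank, size and N as in your code):

    int pSize = N / size;    // base share for every process
    if (rank < N % size)     // the first N % size ranks absorb the remainder
        pSize++;             // ... by taking one extra element each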
The process with rank 0 should also finish its share at the same time as all the others; in your code, however, it first waits for everyone else and only then starts working itself. That is simply bad:
MPI_Waitall(3, request, MPI_STATUSES_IGNORE);
for (int i = (size - 1) * pSize; i < N; i++) {
    double temp = A[i] + C[i];
    localMin = min(localMin, temp);
}
I had suggested a (typical) MPI flow here:
https://www.codeproject.com/Answers/5370748/I-dont-understand-how-to-fix-the-MPI-code-for-the#answer2
It avoids code duplication, keeps everything parallel, and would make the explicit Wait unnecessary altogether. It would be good to follow that route.
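For illustration, one common way to get such a flow is to use the collectives MPI_Scatterv and MPI_Reduce; this is my sketch of the idea, not necessarily literally what the linked answer does. The collectives handle the uneven slices and the global minimum without any hand-written Send/Recv/Wait (rank, size, N, A and C as in the draft further down; A and C need to exist only on rank 0 and can be nullptr elsewhere; additionally needs <vector>):

    std::vector<int> counts(size), displs(size);
    for (int r = 0, off = 0; r < size; r++) {
        counts[r] = N / size + (r < N % size ? 1 : 0); // uneven slice sizes
        displs[r] = off;                               // start of slice r in A and C
        off += counts[r];
    }
    std::vector<double> locA(counts[rank]), locC(counts[rank]);
    MPI_Scatterv(A, counts.data(), displs.data(), MPI_DOUBLE,
                 locA.data(), counts[rank], MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Scatterv(C, counts.data(), displs.data(), MPI_DOUBLE,
                 locC.data(), counts[rank], MPI_DOUBLE, 0, MPI_COMM_WORLD);
    double localMin = DBL_MAX, globalMin;
    for (int i = 0; i < counts[rank]; i++)
        localMin = std::min(localMin, locA[i] + locC[i]);
    // MPI_MIN delivers the global minimum to rank 0; no Wait needed
    MPI_Reduce(&localMin, &globalMin, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);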
So I can only refer you to my suggestion again. Your program has so many quirks that your teacher obviously cannot be satisfied with it.
// edit:
There are several problems with your design. Since more help seems to be needed here, I'll provide a slightly longer draft. I have intentionally left out some details so that it does not become a copy & paste solution. The design distributes a possible remainder across the first processes, lets process 0 compute as well, sends asynchronously and receives blocking. I tested the following framework with mpic++ under Ubuntu.
#include <mpi.h>
#include <algorithm>
#include <cfloat>
#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;

#define N 13
int main(int argc, char* argv[])
{
    int rank, size;
    double *A, *C;
    double localMin = DBL_MAX;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    srand((unsigned)time(0));
    if (rank == 0) {
        // only rank 0 holds the full arrays
        A = new double[N];
        C = new double[N];
        for (int i = 0; i < N; i++) {
            A[i] = (rand() % 20) / 2.;
            C[i] = (rand() % 20) / 2.;
            cout << i << ". sum:" << A[i] + C[i] << endl;
        }
    }
    // every rank computes its own share; the first (N % size) ranks
    // take one extra element to absorb the remainder
    int pSize = N / size;
    int remainder = N % size;
    if (rank < remainder) {
        pSize++;
    }
    cout << "Proc" << rank << ": pSize=" << pSize << endl;
    // one request per worker for each array (VLAs are a g++/mpic++ extension)
    MPI_Request requestA[size - 1], requestC[size - 1];
    if (rank == 0) {
        // send every other rank its slice asynchronously, so rank 0
        // can start on its own slice right away
        int offset = pSize;
        for (int i = 1; i < size; i++) {
            int send_count = (remainder == 0 || i < remainder) ? pSize : pSize - 1;
            MPI_Isend(A + offset, ...);
            MPI_Isend(C + offset, ...);
            offset += send_count;
        }
    }
    else {
        // the workers only allocate their slice and receive it blocking
        A = new double[pSize];
        C = new double[pSize];
        MPI_Recv(A, pSize, ...);
        MPI_Recv(C, pSize, ...);
    }
    // every rank, including 0, processes its own slice
    for (int i = 0; i < pSize; i++) {
        double temp = A[i] + C[i];
        localMin = min(localMin, temp);
    }
    if (rank == 0) {
        // make sure the asynchronous sends have completed before the buffers are freed
        MPI_Waitall(size - 1, requestA, MPI_STATUSES_IGNORE);
        MPI_Waitall(size - 1, requestC, MPI_STATUSES_IGNORE);
        double globalMin = localMin;
        for (int i = 1; i < size; i++) {
            double receivedMin;
            MPI_Recv(&receivedMin, ...);
            globalMin = min(globalMin, receivedMin);
        }
        cout << "Minimum min(A+C) = " << globalMin << endl;
    }
    else {
        MPI_Send(&localMin, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }
    delete[] A;
    delete[] C;
    MPI_Finalize();
}
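For reference, I built and started it like this (the file name is of course arbitrary):

    mpic++ -o minsum main.cpp
    mpiexec -n 5 ./minsum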
With N=13 and 5 processes I get the output below (the lines of the individual processes interleave nondeterministically):
Proc2: pSize=3
0. sum:4
1. sum:9
2. sum:15.5
3. sum:13
4. sum:2.5
5. sum:8
6. sum:5
7. sum:13
Proc1: pSize=3
8. sum:12.5
9. sum:10
10. sum:10.5
11. sum:2.5
12. sum:17.5
Proc0: pSize=3
Proc4: pSize=2
Proc3: pSize=2
Minimum min(A+C) = 2.5