You have implemented only a tiny part of the suggestions, and unfortunately neither completely nor correctly.
The process with rank 0, for example, is usually responsible for distributing the work. You are now trying to distribute N data items to size processes as follows:
if (!rank)
{
    ...
    if (rank < remainder) {
        pSize++;
    }
It already starts with the fact that writing if (!rank) is poor style. And the logic here goes wrong again: what value does pSize have on each process, and how much data actually gets distributed, if N=20 and size=8?
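(To spell it out, assuming the increment really sits inside the if (!rank) block as quoted: N / size = 2 and remainder = 4, so ranks 0-3 should each get 3 elements and ranks 4-7 should get 2. As written, only rank 0 ever sees pSize = 3; the others keep 2, so only 3 + 7*2 = 17 of the 20 elements are accounted for.)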
I had suggested calculating pSize per process ...
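A minimal sketch of what I mean (every rank executes this, outside any if (rank == 0) block; rank, size and N as in your code):

    int pSize = N / size;    // base share for every process
    if (rank < N % size)     // the first N % size ranks absorb the remainder
        pSize++;             // ... by taking one extra element each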
The process with rank 0 should also finish its share at the same time as all the others; in your code, however, it first waits for everyone else and only then starts working itself. That is simply bad:
MPI_Waitall(3, request, MPI_STATUSES_IGNORE);
for (int i = (size - 1) * pSize; i < N; i++) {
    double temp = A[i] + C[i];
    localMin = min(localMin, temp);
}
I had suggested a (typical) MPI flow here:
https://www.codeproject.com/Answers/5370748/I-dont-understand-how-to-fix-the-MPI-code-for-the#answer2
It avoids code duplication, keeps everything parallel, and would make the explicit Wait unnecessary altogether. It would be good to follow that route.
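For illustration, one common way to get such a flow is to use the collectives MPI_Scatterv and MPI_Reduce; this is my sketch of the idea, not necessarily literally what the linked answer does. The collectives handle the uneven slices and the global minimum without any hand-written Send/Recv/Wait (rank, size, N, A and C as in the draft further down; A and C need to exist only on rank 0 and can be nullptr elsewhere; additionally needs <vector>):

    std::vector<int> counts(size), displs(size);
    for (int r = 0, off = 0; r < size; r++) {
        counts[r] = N / size + (r < N % size ? 1 : 0); // uneven slice sizes
        displs[r] = off;                               // start of slice r in A and C
        off += counts[r];
    }
    std::vector<double> locA(counts[rank]), locC(counts[rank]);
    MPI_Scatterv(A, counts.data(), displs.data(), MPI_DOUBLE,
                 locA.data(), counts[rank], MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Scatterv(C, counts.data(), displs.data(), MPI_DOUBLE,
                 locC.data(), counts[rank], MPI_DOUBLE, 0, MPI_COMM_WORLD);
    double localMin = DBL_MAX, globalMin;
    for (int i = 0; i < counts[rank]; i++)
        localMin = std::min(localMin, locA[i] + locC[i]);
    // MPI_MIN delivers the global minimum to rank 0; no Wait needed
    MPI_Reduce(&localMin, &globalMin, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);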
So I can only refer you to my suggestion again. Your program has so many quirks that your teacher obviously cannot be satisfied with it.
// edit:
There are several problems with your design. Since more help seems to be needed here, I'll provide a slightly longer draft. I have intentionally left out some details so that it does not become a copy & paste solution. The design distributes a possible remainder across the first processes, lets process 0 compute as well, sends asynchronously and receives blocking. I tested the following framework with mpic++ under Ubuntu.
#include <mpi.h>
#include <algorithm>
#include <cfloat>
#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;

#define N 13
int main(int argc, char* argv[])
{
    int rank, size;
    double *A, *C;
    double localMin = DBL_MAX;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    srand((unsigned)time(0));
    if (rank == 0) {
        // only rank 0 holds the full arrays
        A = new double[N];
        C = new double[N];
        for (int i = 0; i < N; i++) {
            A[i] = (rand() % 20) / 2.;
            C[i] = (rand() % 20) / 2.;
            cout << i << ". sum:" << A[i] + C[i] << endl;
        }
    }
    // every rank computes its own share; the first (N % size) ranks
    // take one extra element to absorb the remainder
    int pSize = N / size;
    int remainder = N % size;
    if (rank < remainder) {
        pSize++;
    }
    cout << "Proc" << rank << ": pSize=" << pSize << endl;
    // one request per worker for each array (VLAs are a g++/mpic++ extension)
    MPI_Request requestA[size - 1], requestC[size - 1];
    if (rank == 0) {
        // send every other rank its slice asynchronously, so rank 0
        // can start on its own slice right away
        int offset = pSize;
        for (int i = 1; i < size; i++) {
            int send_count = (remainder == 0 || i < remainder) ? pSize : pSize - 1;
            MPI_Isend(A + offset, ...);
            MPI_Isend(C + offset, ...);
            offset += send_count;
        }
    }
    else {
        // the workers only allocate their slice and receive it blocking
        A = new double[pSize];
        C = new double[pSize];
        MPI_Recv(A, pSize, ...);
        MPI_Recv(C, pSize, ...);
    }
    // every rank, including 0, processes its own slice
    for (int i = 0; i < pSize; i++) {
        double temp = A[i] + C[i];
        localMin = min(localMin, temp);
    }
    if (rank == 0) {
        // make sure the asynchronous sends have completed before the buffers are freed
        MPI_Waitall(size - 1, requestA, MPI_STATUSES_IGNORE);
        MPI_Waitall(size - 1, requestC, MPI_STATUSES_IGNORE);
        double globalMin = localMin;
        for (int i = 1; i < size; i++) {
            double receivedMin;
            MPI_Recv(&receivedMin, ...);
            globalMin = min(globalMin, receivedMin);
        }
        cout << "Minimum min(A+C) = " << globalMin << endl;
    }
    else {
        MPI_Send(&localMin, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }
    delete[] A;
    delete[] C;
    MPI_Finalize();
}
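For reference, I built and started it like this (the file name is of course arbitrary):

    mpic++ -o minsum main.cpp
    mpiexec -n 5 ./minsum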
With N=13 and 5 processes I get the output below (the lines of the individual processes interleave nondeterministically):
Proc2: pSize=3
0. sum:4
1. sum:9
2. sum:15.5
3. sum:13
4. sum:2.5
5. sum:8
6. sum:5
7. sum:13
Proc1: pSize=3
8. sum:12.5
9. sum:10
10. sum:10.5
11. sum:2.5
12. sum:17.5
Proc0: pSize=3
Proc4: pSize=2
Proc3: pSize=2
Minimum min(A+C) = 2.5