|
Well, in terms of the DHCP thing, they could simply be relying on a few facts: DHCP addresses are leased and expire after some number of days, and most computers on high-speed internet connections are online most, if not all, of the time, especially for users of products like Kazaa. As a result, when the lease expires (if it's configured to expire at all, which in rare cases it isn't), it's immediately leased right back to you if you're online at the time, or as soon as you turn your computer back on. Since everyone else in that subnet is most likely a high-speed internet user on the same ISP you are, the same applies to them, so when their leases expire, they typically get the same address they had before.
I had the same DHCP leased address for six months, until I went camping for a week. When I came back, someone else had gotten my lease and I received a new IP address.
But anyway, after thinking about it a little more, I don't think that's how they're doing it. They COULD be doing it that way, but it would be extremely stupid.
What they're probably doing is reading/writing and FTP'ing a file with all of the necessary user and computer information in it.
Your application can connect to an FTP server where the file is stored, copy it down to the local workstation, read and write what it needs to, and send it back up.
For example, if they have a "friend" user saved to their list, it would look up that user in the file and see what their current status is, and it would also update your own information, then upload it back up to the server. Meanwhile, your computer is also making connections to the users within or near your zip code (obtained from the file) AND to the "friends" that you've added to your users list to let them know that you're online now or whatever.
This way, every time someone connects, disconnects, changes their status, etc. the file will be written to, read, and manipulated for the eyes of all of your other running and connected applications to see.
There may be an even better way to do it, but it's a good start.
You can email me at dearnest@tampabay.rr.com and I'll give you more specifics on this approach.
Cheers,
Don
|
|
|
|
|
Come to think of it, there are probably two files: one with a list of information that you've saved to your local drive, and one that gets dynamically updated the way I just described.
Your application could look in the locally saved file first, then download the other file and compare the two. It could then copy new users (not already in your local file) that are close to you out of the dynamically updated file and into your local file, then copy it back up. Also, when you save a user in your list, it'll copy their info into your local file.
See where I'm going?
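The compare-and-merge step described above could look roughly like this. It is only a sketch, assuming both files have already been parsed into maps keyed by user name; `UserInfo`, its fields, and `MergeNearbyUsers` are hypothetical names, and the FTP transfer and file parsing are left out.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical record for one user; the field names are assumptions.
struct UserInfo {
    std::string zipCode;
    std::string status;
};

using Roster = std::map<std::string, UserInfo>;

// Copies users from the shared (server) roster into the local one when they
// are not already saved locally and share the caller's zip code.
// Returns the number of users added.
inline int MergeNearbyUsers(Roster& local, const Roster& shared,
                            const std::string& myZip)
{
    assert(!myZip.empty());  // the caller must know its own zip code
    int added = 0;
    for (const auto& entry : shared) {
        const bool isNew = local.find(entry.first) == local.end();
        if (isNew && entry.second.zipCode == myZip) {
            local.insert(entry);
            ++added;
        }
    }
    return added;
}
```

Entries that already exist locally are left untouched here; refreshing their status from the shared file would be a separate step.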
|
|
|
|
|
See my posts below about win2k and vc6. Hmm, same error, same solution. Makes ya think, huh? Finding the problem wasn't blocked by some magical, mystical .net gremlin.
The original code worked because of timing. This is obvious from the fact that inserting Sleep(1000) fixed it. The code that fixed the problem protects against events that only cause an error if the events occur too often. With the multitasking nature of current operating systems, sometimes the system is doing something else, which adds some time, and the small window for the error to occur is masked. Each of the running programs has its own ‘frequency’ even though they are physically on the same clock. These frequencies interact in complex ways, creating and masking the window. Add in the network traffic delays and you end up with a complex mess in the time domain.
Possibly .net changed some timing parameters that made the problem more obvious, but it was there all along. This is a perfect example of observer error and the herd instinct. Someone screams ".net" and most people ignore the evidence and join the herd. It is easier to wait for an update from Microsoft than to figure out the real problem.
I don’t really like this solution. It seems to me inelegant. The real problem, I suspect, lies deeper in the core of Microsoft’s implementation.
I give you now my 3 rules for effective debugging:
3- Never believe the user.
2- Never believe the programmer.
1- Never believe your own eyes.
|
|
|
|
|
Well, I can't speak for everyone here but I do appreciate your opinion and agree with what you're saying. Unfortunately, opinions don't pay the bills. Some of us don't have the luxury of having a lot of free time on our hands, while facing that pesky little thing called a "deadline".
My guess is that if everyone here was familiar enough with winsocks, callbacks, blocking, etc., we'd be writing our own winsock classes and wouldn't need the assistance of something like the NDK.
In conclusion, my "herd" may be following me down a less scenic path, but as long as we reach our destination successfully and take notes along the way, we'll know better next time. That's why Bill invented upgrades. My only options were to delay the project or get it on the shelf, a no brainer when you're self-employed.
|
|
|
|
|
I've been developing an application using the NDK as the communication backbone for the past six months or so. I've been scripting for several years and I've done some database development back in the day, but this is my first experience with C++, so please bear with me.
When I first started this project, all I had available to me was a copy of C++ 5 and it sounded like a job for winsocks. So, I bought some books, started investigating, found this website, and realized that much of the work had already been done and they were giving it away! All I had to do was take little bits and pieces from here and there, put them together, and voila! Piece of cake! Right.
After several long months of solitude, sleep deprivation, no social life, nearly going blind, the loss of a few friends, nearly losing my girlfriend, and nearly losing my mind, it was finally working... technically. The clients were sending and receiving, staying connected for days, taking up very little resources, the timing was excellent, and it was even tested over a sat link between Florida and Doha, Qatar, and it performed wonderfully. I only had some cosmetic issues (screen resolution stuff), and some mfc dll problems, which I later realized were due to the fact that I was compiling with "shared" dll's instead of "static".
So, I thought that upgrading and compiling with C++ 6 would solve some or maybe all of my issues, but the copy that I "acquired" turned out to be an Introductory Edition and could not be used to freely distribute my application.
That's when my client offered to fork over a licensed copy of .NET 2002. I'm happy to say that all of my cosmetic issues are now resolved, the installation procedure is sound, and I just sent out the first beta to a major client Monday morning.
Then, they called me Monday afternoon and said that the clients were losing their connections. I argued with him. Surely, he'd done something wrong. There couldn't have been anything wrong with the code. It was working great before and .NET is only supposed to make the world a better place!
Unfortunately, later personal tests confirmed that my clients were now losing their connections at no particular interval, in no particular order, and in no otherwise consistent fashion.
So here I am. I see that others are having the same issue and I don't know whether I should be happy or miserable about that, but I know that I just took a huge step backwards and I'm past my deadline, I haven't slept in three days and I've scratched my head to the point of drawing blood, and I'm in desperate need of outside assistance!
Here's the only difference that I can think of, maybe a clue. When I attempted to compile the first time with .NET, it complained about a line in NDKUserManager.cpp:
CNDKUser& user = m_users.GetNext(pos);
The error was:
error C2440: 'initializing': cannot convert from 'const CNDKUser' to 'CNDKUser &'
I changed it to the following and the compile error went away:
CNDKUser user = m_users.GetNext(pos);
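For what it's worth, the C2440 message suggests that `GetNext` is returning a const element at that call site. If so, binding to a `const` reference would also compile, and it avoids copying the element. The sketch below shows the difference using `std::list` as a stand-in for the MFC list; the `User` type and both function names are hypothetical, not NDK code.

```cpp
#include <cassert>
#include <list>
#include <string>

struct User {  // hypothetical stand-in for CNDKUser
    long id;
    std::string name;
};

// Read-only lookup through a const list: the element is bound to a const
// reference, so no copy is made and the pointer refers to the stored element.
inline const User* FindUser(const std::list<User>& users, long id)
{
    for (const User& user : users) {
        if (user.id == id)
            return &user;
    }
    return nullptr;
}

// Same loop with a by-value copy: changes to `user` never reach the list.
inline void RenameCopies(const std::list<User>& users)
{
    for (User user : users)      // copy of each element
        user.name = "changed";   // modifies the copy only
}
```

The by-value version is harmless as long as the loop only reads the user; if the surrounding code expected to modify the stored element, the copy would silently change behavior.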
Thanks for any help in advance,
Don
|
|
|
|
|
Hi,
You're right about changing the code "CNDKUser& user = m_users.GetNext(pos);"
to "CNDKUser user = m_users.GetNext(pos);" in order to remove the compiler error BUT the problem is not there...
I can guarantee that the problem is not related to the NDK itself because it's only an architecture over the CSocket and CArchive. So the problem comes from CSocket and/or CArchive.
Someone else found an article from Microsoft with a possible problem
http://support.microsoft.com/default.aspx?scid=kb;en-us;185728
"In Windows Sockets, you should not make multiple recv calls within an FD_READ notification unless you are willing to disable FD_READ notifications prior to calling recv. However, CSocket and CAsyncSocket make no provision for doing so. Therefore, you should make only one Receive call per OnReceive function. Under high data transmission rate, if you make more than one Receive call in the OnReceive function, the application might lose FD_READ, have fake FD_READ, or have no FD_READ (hanging)." "You can use CSocket with CArchive and CSocketFile to directly receive and send MFC CObject-derived objects. However, under high data transmission rates, you should not use CSocket with CArchive and CSocketFile within the OnReceive function because they might internally generate multiple Receive calls."
I suggest you visit the MSDN newsgroups.
If you find something, PLEASE let us know.
Thank you
Sébastien Lachance
|
|
|
|
|
Hi Sebastien,
You'd already replied by the time I wrote Part 2 of my dilemma! Thanks for getting back to me so quickly, and thanks for the NDK.
So, is this a problem with the included CSocket and CArchive files that came with .NET then? Is it safe to say that this problem was introduced in .NET? Could I simply replace these files with the 5.0 files? If not, could I go back and recompile the code in 5.0? Would that make these problems go away?
If this is in fact a Microsoft issue, I don't really have time to wait for them to resolve it, so I'd rather get it working for now and upgrade it later. Again, I'm fairly new to C++. I noticed that others are having this issue outside of the NDK but there doesn't seem to be any resolution.
I'll keep digging, but do you think stepping back to 5.0 will solve the problem?
Thanks again,
Don
|
|
|
|
|
I cannot say that the problem comes entirely from .NET, but I have been receiving many emails about it since then.
I don't have 5.0 anymore.
If you have it, you could test it easily. You only have to create a workspace then insert .cpp/.h files.
|
|
|
|
|
I recompiled with VC++ 5.0 and everything is working perfectly again. I even took out all of the client heartbeats and they've been running and staying connected for going on several hours. I'll check back in a few days and send an update, but it looks like this solved my problem and may solve others.
I still have the PingAllUsers on a timer in the server, just in case a client crashes, and that's working great as well.
Maybe the 5.0 winsock files could be copied into the include directories of .NET and after recompiling, similar problems would go away, but as long as it's working this way, I'm happy with the known good. Lost connections after days I can deal with, but not minutes!
I guess the moral of this story is to be cautious of the "latest and greatest"!
Thanks again for your help!
|
|
|
|
|
Good! I'm waiting for your confirmation.
Sébastien
|
|
|
|
|
Still cooking with gas!
Since I recompiled everything with 5.0, I haven't had one single problem. The clients have been connected non-stop for the last few days without failure and no messages have been lost.
In fact, the application will be beta tested on three different WANs, including satellite links between Florida and the Middle East, ranging from 500 to 1500 clients, starting next week. I'm just polishing up the documentation.
Has anyone performed any stress tests? I'm curious to know if there are any limitations on the number of clients that a server can handle and what might happen when that threshold is reached. I'm sure it wouldn't have anything to do with the NDK itself and that the limitations would most likely be those of the hardware or OS, but if anyone has tested a similar client/server application with this code on a large network, I'd love to hear the results.
This way, at least I'll know there's a problem BEFORE the beta testers break it and start complaining!
Cheers,
Don
Great class by the way. Thank you very much! You trimmed off of my project what would've probably turned into months of frustration!
|
|
|
|
|
Check just below... http://www.codeproject.com/internet/ndk.asp?select=506839&df=100&forumid=1156&fr=26#xx522006xx
Just disable the read notifications, and re-enable it after.
Djof
"Je me souviens"
|
|
|
|
|
Well done!!
Can you confirm it really works?
Thanx
Maurizio
|
|
|
|
|
I can't confirm it works with the NDK, because I'm not using it, but I'm pretty sure the problem I had is the same one the NDK suffers from.
So, yes, it fixed my problem. The connection is flawless now. Everyone that had previously tested my client reported the problem at some point, and now, everyone said it worked perfectly.
Djof
"Je me souviens"
|
|
|
|
|
My only doubt is that I cannot see the link between the connection freezing and multiple OnReceive calls.
|
|
|
|
|
Just see what I posted below.
Quote from Microsoft Knowledge Base Article - 185728:
----------
In Windows Sockets, you should not make multiple recv calls within an FD_READ notification unless you are willing to disable FD_READ notifications prior to calling recv. However, CSocket and CAsyncSocket make no provision for doing so. Therefore, you should make only one Receive call per OnReceive function. Under high data transmission rate, if you make more than one Receive call in the OnReceive function, the application might lose FD_READ, have fake FD_READ, or have no FD_READ (hanging).
You can use CSocket with CArchive and CSocketFile to directly receive and send MFC CObject-derived objects. However, under high data transmission rates, you should not use CSocket with CArchive and CSocketFile within the OnReceive function because they might internally generate multiple Receive calls.
----------
Djof
"Je me souviens"
|
|
|
|
|
Although I have gone and written my own similar classes using CAsyncSocket, this (the multiple-receives problem) is what was wrong with my classes as well. I was missing messages just as with the NDK. All I did was make only ONE Receive() call in my OnReceive() function (which is called automatically when there is data to receive). Now, with the NDK the solution would be different, as it's using CSocket (a blocking socket). So you would disable FD_READ notifications instead (I haven't tried it, but it should work).
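A rough sketch of the one-read-per-notification idea: each OnReceive makes a single Receive() call, hands whatever bytes arrived to a buffer, and then pulls any complete messages out of that buffer. The 4-byte little-endian length prefix and the `MessageFramer` class are assumptions for illustration only; this is not the NDK wire format, and the socket call itself is left out so the logic can be exercised without Winsock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Accumulates raw bytes from successive single reads and splits them into
// length-prefixed messages (4-byte little-endian length, then payload).
class MessageFramer {
public:
    // Call once per read notification with the bytes that one read returned.
    void Append(const char* data, std::size_t size)
    {
        assert(data != nullptr || size == 0);
        buffer_.insert(buffer_.end(), data, data + size);
    }

    // Removes and returns every complete message currently buffered.
    // A partial message stays in the buffer until the rest arrives.
    std::vector<std::string> TakeMessages()
    {
        std::vector<std::string> messages;
        std::size_t offset = 0;
        while (buffer_.size() - offset >= kHeaderSize) {
            const std::uint32_t length = ReadLength(offset);
            if (buffer_.size() - offset - kHeaderSize < length)
                break;  // payload not complete yet
            const char* payload = buffer_.data() + offset + kHeaderSize;
            messages.emplace_back(payload, length);
            offset += kHeaderSize + length;
        }
        buffer_.erase(buffer_.begin(), buffer_.begin() + offset);
        return messages;
    }

private:
    static constexpr std::size_t kHeaderSize = 4;

    // Decodes the little-endian length stored at the given buffer offset.
    std::uint32_t ReadLength(std::size_t offset) const
    {
        std::uint32_t length = 0;
        for (std::size_t i = 0; i < kHeaderSize; ++i) {
            const unsigned char byte =
                static_cast<unsigned char>(buffer_[offset + i]);
            length |= static_cast<std::uint32_t>(byte) << (8 * i);
        }
        return length;
    }

    std::vector<char> buffer_;
};
```

With this shape, a message that arrives split across two notifications is simply completed on the second one, so there is never a reason to loop on Receive() inside a single OnReceive.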
Thanks again Sébastien, your NDK really helped me design my own classes!
-jeff
|
|
|
|
|
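Also, a randomly generated User ID wasn't practical for my application, so I used the following to change it to the hostname, which may or may not cause conflicts:
TCHAR szComputerName[MAX_COMPUTERNAME_LENGTH + 1] = _T(""); // writable buffer; a string literal must not be written to
DWORD cchBuff = MAX_COMPUTERNAME_LENGTH + 1; // in: buffer size in characters, out: length of the name
if (::GetComputerName(szComputerName, &cchBuff))
    m_strNickname = szComputerName;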
I set a timer event in the server to ping all users every 60 seconds. Some of the clients will lose their connection within the first two or three minutes. Even though the UserId's are still listed in the list box, they're no longer sending replies to the server's ping or receiving data. It doesn't appear that a ping from a client to the server produces any results.
I also set a timer event in the client to test for IsConnected(), and even though the clients are not receiving data and their UserId is still displayed in the server's list box, IsConnected() apparently still returns TRUE, which makes it difficult to attempt a method of reconnecting.
I set up a "HeartBeat" message that sets a variable, sends the message, and then tests the variable again after a sleep(). The server responds by resetting and sending the variable back to the client. If the server responds, nothing happens. Otherwise, the client will attempt to reconnect. This doesn't seem to be working for some reason, and seems a bit redundant anyway since this is TCP/IP and there should be a natural keepalive method already in place?
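On the keepalive question: TCP does have a keepalive option (SO_KEEPALIVE), but it is off unless the application enables it, and the usual default idle time is around two hours, so an application-level heartbeat is still the common way to notice a dead peer quickly. Below is a minimal sketch of the bookkeeping only, with the timer and socket calls left out. `HeartbeatMonitor` and its members are hypothetical names, and the time is passed in by the caller so the check does not depend on Sleep().

```cpp
#include <cassert>
#include <chrono>

// Tracks when the last heartbeat reply arrived and decides when the
// connection should be treated as dead.
class HeartbeatMonitor {
public:
    using Clock = std::chrono::steady_clock;

    HeartbeatMonitor(Clock::duration timeout, Clock::time_point now)
        : timeout_(timeout), lastReply_(now)
    {
        assert(timeout > Clock::duration::zero());
    }

    // Call from the message handler when the server's reply arrives.
    void OnReply(Clock::time_point now) { lastReply_ = now; }

    // Call from the periodic timer; true means "try to reconnect".
    bool IsExpired(Clock::time_point now) const
    {
        return now - lastReply_ > timeout_;
    }

private:
    Clock::duration timeout_;
    Clock::time_point lastReply_;
};
```

One possible reason the Sleep()-based check misbehaves is that Sleep() blocks the very thread that would otherwise process the server's reply, so the variable cannot be reset during the wait. Checking elapsed time from a timer handler, as sketched here, avoids that.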
|
|
|
|
|
The client stops receiving packets.
I'm sending a huge file (~500 MB) in 100 KB packets, and the client frequently stops receiving.
The server blocks in the Winsock send and gets error WSAEWOULDBLOCK, so the buffer is full and the client isn't accepting any data.
Maybe analyzing the transfer with a packet sniffer will show the real problem; maybe the packets don't reach the client.
|
|
|
|
|
I have the same problem where the client stops receiving messages, but I am using win2k-sp3 and vc6.
Some things I noticed:
I think it started happening after I started sending larger packets. (>300 but < 600 bytes)
It fails less often on win xp pro.
It doesn't happen all the time. 75% of the time it fails. In other words, repeating the exact same step with the exact same data doesn't always fail. (bleh)
Closing and reopening connection and then rejoining the server seems to fix the problem. (bleh,bleh)
Some Questions:
Could it have anything to do with 'reusing' the message instead of instantiating a new CNDKMessage to reply? I noticed I have done both.
I have been leaving the connection open so either the server or the client can start a transaction. Is this not recommended?
I have a timed (1 sec interval) server status message going out to the clients. Can you see any problem with this?
Thanks in advance.
|
|
|
|
|
More info:
Just before the client made the offending request, the server was sending two small packets. If I removed the first packet, the problem disappeared.
The sequence was this.
A-Client: Sends ~400 byte packet. Server places data in database (MySQL) and generates a transaction ID.
B-Server: Sends CString that was popped up for the user to see that the previous transaction was completed. Client used AfxMessageBox to display this CString. BTW: It doesn't matter if I call AfxMessageBox or not.
C-Server: Sends Ack code for 400 byte packet. This Ack packet contains the id for the transaction.
D-Client: Sends request for the server to lookup the transaction using the ID returned in the Ack packet. (Yes the client is requesting the info it just sent. I did this as a check and to reuse existing code)
F-Server: Sends requested info. <- This is the packet that doesn't arrive, although debugging shows it was sent.
If I remove B, F arrives. I consider this very ugly, especially since it only failed 75% of the time. Makes you wonder what else is out there ready to bite you in the butt.
Offending Server code:
CNDKMessage message(ChatText);
CString strText;
strText.Format("Transaction %011d added",id);
message.SetAt(0, strNickname);
message.SetAt(1, strText);
SendMessageToUser(lUserId, message); // <- REMOVING THIS LINE FIXED THE PROBLEM
message.SetAt(0, ChatNewTransaction);
message.SetAt(1, id);
message.SetId(ChatAck);
SendMessageToUser(lUserId, message);
Question:
Why did this fix the problem? Seems like it should have been a legal operation.
FYI: Other than this one problem the NDK has been outstanding. It is easy to use and understand. Thank you for sharing!
|
|
|
|
|
Hi,
Could you try the following code:
CNDKMessage message(ChatText);
CString strText;
strText.Format("Transaction %011d added",id);
message.Add(strNickname);
message.Add(strText);
SendMessageToUser(lUserId, message); // <- REMOVING THIS LINE FIXED THE PROBLEM
CNDKMessage message2(ChatAck);
message.Add(ChatNewTransaction);
message.Add(id);
SendMessageToUser(lUserId, message2);
|
|
|
|
|
It did not work. Same problem.
Typos:
message2.Add(ChatNewTransaction);
message2.Add(id);
Thanks for the quick reply. This is a weird one.
|
|
|
|
|
I hope that this will give you a further clue as to what the problem is. I added a Sleep(1000) between the 2 packets and it fixed the problem.
CNDKMessage message(ChatText);
CString strText;
strText.Format("Transaction %011d added",id);
message.SetAt(0, strNickname);
message.SetAt(1, strText);
SendMessageToUser(lUserId, message);
Sleep(1000); // <- ADDING THIS LINE FIXED THE PROBLEM
message.SetAt(0, ChatNewTransaction);
message.SetAt(1, id);
message.SetId(ChatAck);
SendMessageToUser(lUserId, message);
|
|
|
|
|
I made the changes suggested by djof and happily they have fixed the problem in win 2000 using vc6. I was able to get rid of all the unhappy 'Sleep(1000)' statements in my code except for one. I made the following changes to the client side:
void CNDKClientSocket::OnReceive(int nErrorCode)
{
    VERIFY(AsyncSelect(/*FD_READ | */FD_WRITE | FD_OOB | FD_ACCEPT | FD_CONNECT | FD_CLOSE)); // CHANGE 1
    CSocket::OnReceive(nErrorCode);
    ASSERT(m_pClient != NULL);
    if (m_pClient != NULL)
        m_pClient->ProcessPendingRead(nErrorCode);
    VERIFY(AsyncSelect()); // CHANGE 2
}
Similarly on the server side:
void CNDKServerSocket::OnReceive(int nErrorCode)
{
    VERIFY(AsyncSelect(/*FD_READ | */FD_WRITE | FD_OOB | FD_ACCEPT | FD_CONNECT | FD_CLOSE)); // CHANGE 3
    CSocket::OnReceive(nErrorCode);
    ASSERT(m_pServer != NULL);
    if (m_pServer != NULL)
        m_pServer->ProcessPendingRead(this, nErrorCode);
    VERIFY(AsyncSelect()); // CHANGE 4
}
As I said this HAS fixed the problem with the client stalling. Everything works smoothly until the client closes. This is the client side code that handles the WM_CLOSE message:
void CMyClient::OnClose()
{
    KillTimer(ID_HEARTBEAT);
    Sleep(1000); // Bleh
    g_LogMgr.CloseLogFile();
    CloseConnection();
}
If I remove the Sleep(1000) here, the SERVER asserts in the line labeled 'CHANGE 4' above.
Since the program is closing this is not a big problem. I guess it has something to do with the order stuff is being closed. The important thing is the nasty client stalls are fixed. Hooray!
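One way to keep the disable/re-enable pair together, and to skip the re-enable when the socket was closed while the message was being processed, is a small scope guard. This is only a sketch: `ReadNotificationGuard` is a hypothetical helper, `FakeSocket` and the `k...` constants are stand-ins for a CAsyncSocket-derived class and the FD_* event bits so the logic can run without MFC, and the idea that a closed socket is what trips the assert at 'CHANGE 4' is an assumption, not something confirmed in this thread.

```cpp
#include <cassert>
#include <vector>

// Local stand-ins for the Winsock FD_* event bits.
enum : long {
    kRead = 0x01, kWrite = 0x02, kOob = 0x04,
    kAccept = 0x08, kConnect = 0x10, kClose = 0x20,
    kAllEvents = kRead | kWrite | kOob | kAccept | kConnect | kClose
};

// Minimal fake of the socket members the guard needs.
struct FakeSocket {
    bool open = true;
    std::vector<long> selectCalls;  // records every event mask requested

    bool IsOpen() const { return open; }
    bool AsyncSelect(long events) { selectCalls.push_back(events); return true; }
    void Close() { open = false; }
};

// Turns read notifications off for the lifetime of the guard and turns them
// back on at scope exit, but only if the socket is still open.
template <typename Socket>
class ReadNotificationGuard {
public:
    explicit ReadNotificationGuard(Socket& socket) : socket_(socket)
    {
        assert(socket_.IsOpen());
        socket_.AsyncSelect(kAllEvents & ~kRead);
    }
    ~ReadNotificationGuard()
    {
        if (socket_.IsOpen())
            socket_.AsyncSelect(kAllEvents);
    }
    ReadNotificationGuard(const ReadNotificationGuard&) = delete;
    ReadNotificationGuard& operator=(const ReadNotificationGuard&) = delete;

private:
    Socket& socket_;
};
```

In the real classes the open check would presumably be a test of the socket handle (for example, comparing m_hSocket against INVALID_SOCKET). Note that a guard like this does not help if the socket object itself is deleted during ProcessPendingRead, since the destructor would then touch freed memory.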
|
|
|
|
|