As previous posts suggested, if you need fewer than 64 threads you can implement this with the normal threading model, but once the number of threads goes above 64 that becomes very inefficient.
WaitForMultipleObjects can only wait on a maximum of 64 objects (MAXIMUM_WAIT_OBJECTS), and you will waste a lot of CPU cycles on context switches. On Windows platforms, I/O completion ports are the way to go in this scenario.
First - go back and read about I/O completion ports (check MSDN).
Second - unlike the normal scenario, where the OS inserts work items (they are called I/O completion packets) into the completion port, you need to insert them manually using the provided functions (again, check MSDN).
EDIT - here are the functions you need to look at:
http://msdn.microsoft.com/en-us/library/aa364986%28v=vs.85%29.aspx - get an item from the completion port
http://msdn.microsoft.com/en-us/library/aa365458%28v=vs.85%29.aspx - insert an item into the I/O completion port
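
To make the idea concrete, here is a minimal sketch of a small worker pool fed through a completion port, using GetQueuedCompletionStatus to take items off the port and PostQueuedCompletionStatus to put them on. The names run_pool_demo, worker, SHUTDOWN_KEY and g_processed are made up for this example, and the "work" is just incrementing a counter. Error handling is kept to a minimum, so treat it as a starting point rather than production code. The non-Windows branch is only a stub so the file still compiles elsewhere.

```c
#ifdef _WIN32
#include <windows.h>

/* Hypothetical sentinel: a packet with this key tells a worker to exit. */
#define SHUTDOWN_KEY ((ULONG_PTR)-1)

static volatile LONG g_processed = 0;   /* work items handled so far */

/* Worker thread: block on the port, handle one packet at a time. */
static DWORD WINAPI worker(LPVOID param)
{
    HANDLE port = (HANDLE)param;
    DWORD bytes;
    ULONG_PTR key;
    LPOVERLAPPED ov;

    for (;;) {
        if (!GetQueuedCompletionStatus(port, &bytes, &key, &ov, INFINITE))
            break;                       /* dequeue failed, give up */
        if (key == SHUTDOWN_KEY)
            break;                       /* asked to stop */
        /* Real code would do the work described by key/bytes/ov here. */
        InterlockedIncrement(&g_processed);
    }
    return 0;
}

/* Starts 'workers' threads, posts 'items' packets by hand, then shuts
   the pool down. Returns the number of items handled, or -1 on error. */
int run_pool_demo(int workers, int items)
{
    HANDLE threads[MAXIMUM_WAIT_OBJECTS];
    HANDLE port;
    int i, started = 0;

    if (workers < 1 || workers > MAXIMUM_WAIT_OBJECTS)
        return -1;

    /* No file handle is associated: the port is used purely as a queue. */
    port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);
    if (port == NULL)
        return -1;

    g_processed = 0;
    for (i = 0; i < workers; i++) {
        HANDLE t = CreateThread(NULL, 0, worker, port, 0, NULL);
        if (t == NULL)
            break;
        threads[started++] = t;
    }
    if (started == 0) {
        CloseHandle(port);
        return -1;
    }

    /* Manually insert the work items; the key carries the item id. */
    for (i = 0; i < items; i++)
        PostQueuedCompletionStatus(port, 0, (ULONG_PTR)i, NULL);

    /* Packets are dequeued in FIFO order, so these sentinels are only
       seen after every work item has been taken off the port. */
    for (i = 0; i < started; i++)
        PostQueuedCompletionStatus(port, 0, SHUTDOWN_KEY, NULL);

    /* The pool is small, so the 64-handle wait limit is not a problem. */
    WaitForMultipleObjects((DWORD)started, threads, TRUE, INFINITE);
    for (i = 0; i < started; i++)
        CloseHandle(threads[i]);
    CloseHandle(port);

    return (int)g_processed;
}
#else
/* Stub for non-Windows builds: completion ports are a Windows API. */
int run_pool_demo(int workers, int items)
{
    (void)workers;
    (void)items;
    return -1;
}
#endif
```

On Windows, calling run_pool_demo(4, 100) from main should hand 100 items to 4 threads. The point is that the number of work items is no longer tied to the number of threads, so you never need anywhere near 64 of them.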