In principle you can, but you will push the limits of the available RAM on some computers. You need less memory than you might think if you compose your hash set wisely.
Instead of keeping the content of a file record in the values of the hash set, keep only its position in the file (or maybe the position and the size of the record). I don't know what your key type would be or how much memory the keys will take.
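To illustrate the idea of storing positions instead of content, here is a minimal sketch. The file layout (a 4-byte length followed by a UTF-8 payload), the rule that the key is the text before the first ';', and the names RecordLocation and RecordIndex are all my assumptions for the example, not something taken from your question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical locator: where a record lives in the data file (12 bytes per entry).
public struct RecordLocation
{
    public readonly long Offset;
    public readonly int Length;
    public RecordLocation(long offset, int length) { Offset = offset; Length = length; }
}

public static class RecordIndex
{
    // Assumed layout: each record is [int32 byte count][UTF-8 payload],
    // and the key is the payload text before the first ';'.
    public static Dictionary<string, RecordLocation> Build(string path)
    {
        var index = new Dictionary<string, RecordLocation>();
        using (var stream = File.OpenRead(path))
        using (var reader = new BinaryReader(stream))
        {
            while (stream.Position < stream.Length)
            {
                int length = reader.ReadInt32();
                long offset = stream.Position;          // payload starts here
                string record = Encoding.UTF8.GetString(reader.ReadBytes(length));
                string key = record.Split(';')[0];
                index[key] = new RecordLocation(offset, length);  // content is not kept
            }
        }
        return index;
    }

    // Fetches one record from disk on demand.
    public static string Read(string path, RecordLocation location)
    {
        using (var stream = File.OpenRead(path))
        {
            stream.Seek(location.Offset, SeekOrigin.Begin);
            var buffer = new byte[location.Length];
            int read = 0;
            while (read < buffer.Length)
            {
                int n = stream.Read(buffer, read, buffer.Length - read);
                if (n == 0) throw new EndOfStreamException();
                read += n;
            }
            return Encoding.UTF8.GetString(buffer);
        }
    }
}
```

With this shape, the memory cost per record is the key plus a small fixed-size locator, regardless of how big the record itself is. Each lookup costs one extra seek and read.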
By the way, you probably meant the type System.Collections.Hashtable. For any new development, you should never use this type, or any other non-specialized non-generic collection type. They were rendered obsolete as early as .NET version 2.0, when generics were introduced. They weren't formally marked with the [Obsolete] attribute only because there is nothing wrong with keeping them in well-working legacy code. Non-generic types require type casts and are hence potentially more dangerous than the generic classes you really need to use. You should pick one of these three:
http://msdn.microsoft.com/en-us/library/xfhwa508.aspx,
http://msdn.microsoft.com/en-us/library/ms132259.aspx,
http://msdn.microsoft.com/en-us/library/ms132319.aspx.
The major difference between all those key-indexed containers is the trade-off between computational complexity (in practice, the time of operations) and memory overhead. As your situation is likely to be most sensitive to memory overhead, you will need to study this problem to make the right choice.
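A small sketch of how cheap it is to switch between such containers, assuming the three types meant are Dictionary<TKey, TValue>, SortedDictionary<TKey, TValue> and SortedList<TKey, TValue> (that is my reading, so treat it as an assumption). The class ContainerChoice and its methods are hypothetical names. All three implement IDictionary<TKey, TValue>, so only the constructor call changes.

```csharp
using System;
using System.Collections.Generic;

public static class ContainerChoice
{
    // Fills any IDictionary<string, long> the same way (key -> file position).
    public static IDictionary<string, long> Fill(IDictionary<string, long> map)
    {
        map["delta"] = 300;
        map["alpha"] = 0;
        map["charlie"] = 200;
        return map;
    }

    // Lists the keys in the container's own enumeration order.
    public static string KeysInOrder(IDictionary<string, long> map)
    {
        return string.Join(", ", map.Keys);
    }

    public static void Demo()
    {
        var candidates = new[]
        {
            Fill(new Dictionary<string, long>()),                            // hash-based
            Fill(new SortedDictionary<string, long>(StringComparer.Ordinal)), // tree-based
            Fill(new SortedList<string, long>(StringComparer.Ordinal))        // array-based
        };
        foreach (var map in candidates)
            Console.WriteLine("{0}: {1}", map.GetType().Name, KeysInOrder(map));
    }
}
```

Roughly, per the documentation: Dictionary gives close to O(1) lookups, SortedDictionary gives O(log n) lookups and inserts, and SortedList gives O(log n) lookups with O(n) inserts for unsorted data but uses less memory than SortedDictionary. Measuring with your real key type and record count is the only reliable way to choose.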
I don't know if you can use the class System.Collections.Generic.HashSet<T> for your purpose.
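For what it is worth, HashSet<T> stores only the elements themselves, with no value attached to each one, so it fits only if all you need is a membership test. A minimal sketch under that assumption (SeenKeys and FindDuplicates are hypothetical names):

```csharp
using System.Collections.Generic;

public static class SeenKeys
{
    // Returns the keys that occur more than once. HashSet<T>.Add returns
    // false when the element is already present, so no separate lookup is needed.
    public static List<string> FindDuplicates(IEnumerable<string> keys)
    {
        var seen = new HashSet<string>();
        var duplicates = new List<string>();
        foreach (string key in keys)
        {
            if (!seen.Add(key))
                duplicates.Add(key);
        }
        return duplicates;
    }
}
```

If you need to get from a key to a record, this class alone is not enough and one of the dictionary types is the better fit.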
Now, the remaining question is: what if you still need to keep more data than you can hold in your RAM? Well, I would certainly solve such a problem, but it would need more work. The idea is simple: you can learn how associative containers work and implement one using disk storage as the main storage. Please see:
http://en.wikipedia.org/wiki/Hash_table.
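To give a feel for what a disk-based associative container could look like, here is a deliberately simplified sketch: a fixed-capacity table of long keys and long values kept entirely in a file, using open addressing with linear probing. The class name DiskHashTable, the slot layout, and the lack of deletion and resizing are all my simplifying assumptions. A real implementation would need much more care.

```csharp
using System;
using System.IO;

// Hypothetical fixed-capacity hash table stored in a file: long -> long,
// open addressing with linear probing, no deletion and no resizing.
public sealed class DiskHashTable : IDisposable
{
    private const int SlotSize = 1 + 8 + 8;   // used flag + key + value
    private readonly FileStream stream;
    private readonly BinaryReader reader;
    private readonly BinaryWriter writer;
    private readonly long capacity;

    public DiskHashTable(string path, long capacity)
    {
        this.capacity = capacity;
        stream = new FileStream(path, FileMode.Create, FileAccess.ReadWrite);
        var emptySlot = new byte[SlotSize];   // all zeros: flag 0 means "empty"
        for (long i = 0; i < capacity; i++)
            stream.Write(emptySlot, 0, emptySlot.Length);
        reader = new BinaryReader(stream);
        writer = new BinaryWriter(stream);
    }

    private long SlotFor(long key)
    {
        // Mask off the sign bit so the remainder is never negative.
        return (key.GetHashCode() & 0x7FFFFFFF) % capacity;
    }

    // Inserts the pair, or overwrites the value if the key already exists.
    public void Put(long key, long value)
    {
        long slot = SlotFor(key);
        for (long probes = 0; probes < capacity; probes++)
        {
            stream.Seek(slot * SlotSize, SeekOrigin.Begin);
            bool used = reader.ReadByte() != 0;
            if (!used || reader.ReadInt64() == key)
            {
                stream.Seek(slot * SlotSize, SeekOrigin.Begin);
                writer.Write((byte)1);
                writer.Write(key);
                writer.Write(value);
                return;
            }
            slot = (slot + 1) % capacity;     // collision: try the next slot
        }
        throw new InvalidOperationException("Table is full.");
    }

    public bool TryGet(long key, out long value)
    {
        long slot = SlotFor(key);
        for (long probes = 0; probes < capacity; probes++)
        {
            stream.Seek(slot * SlotSize, SeekOrigin.Begin);
            if (reader.ReadByte() == 0) break;   // empty slot: key is absent
            if (reader.ReadInt64() == key)
            {
                value = reader.ReadInt64();
                return true;
            }
            slot = (slot + 1) % capacity;
        }
        value = 0;
        return false;
    }

    public void Dispose()
    {
        writer.Flush();
        stream.Dispose();
    }
}
```

Every operation here costs at least one disk seek, so this trades speed for memory. Before going this way, it is worth checking whether an existing embedded database would do the job.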
However, I would stop here. First and foremost, I'm not quite sure that your whole approach is reasonable. To me, all solutions that involve huge memory consumption are suspicious. If I knew your exact goals, I would probably try to review the whole architecture.
—SA