|
It appears that I may have stumbled upon the best possible collection for my purpose, as there is no unique field in the data, and avoiding duplicates is paramount. How often is that likely to happen?
My input dataset contains about ten lines of garbage at the beginning, then 5 to 9 lines of CSV data to be processed. If I had a full year's worth of data to process in one go, that would still only be about 35,000 records. If I tried, instead, to capture and eliminate duplicates as I INSERTed the records into SQL Server, rather than eliminating them when I enqueue them in the buffer before insertion, do you think that it would make a noticeable difference? I know that's a technique I should learn, for future reference, but is it really useful for this app? It seems to me that there would be a lot of overhead, making connections and retrieving error messages, in order to cull out duplicates. That just seems wasteful of a scarce resource. Besides, SQL transactions would have to be carried by a network, which is always subject to collisions and dropouts. That could badly affect reliability, though hopefully such events would be very rare.
Thanks, as always, for your valuable guidance!
Will Rogers never met me.
|
|
|
|
|
Roger Wright wrote: do you think that it would make a noticeable difference?
Hmm. As your number of new records isn't particularly high, sending all of them to the DB wouldn't be an obstacle, even if there are several duplicates. And you have to take precautions against duplicates in the DB anyway. So I'd go for one of the auxiliary-table techniques PIEBALD hinted at.
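For comparison, the in-memory approach Roger describes (dropping duplicates before they are enqueued) might look like the sketch below. The thread does not say which collection he actually chose; `HashSet<string>`, the class name `DedupBuffer`, and the use of the whole CSV line as the duplicate key are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DedupBuffer
{
    // Enqueues each line the first time it is seen; exact repeats are skipped.
    // Assumption: two records are duplicates when their CSV lines are identical.
    public static Queue<string> EnqueueUnique(IEnumerable<string> csvLines)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var buffer = new Queue<string>();
        foreach (string line in csvLines)
        {
            // HashSet<T>.Add returns false when the value is already present.
            if (seen.Add(line))
            {
                buffer.Enqueue(line);
            }
        }
        return buffer;
    }
}
```

This only removes duplicates within the current batch; rows that already exist in the database still need a check on the database side, as noted above.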
|
|
|
|
|
Luc's answer is perfect. You might want to have a look at this. Algorithm complexity analysis becomes crucial sometimes.
|
|
|
|
|
Interesting stuff, but far beyond my present ability. Thirty years ago I would have lapped it up, since it was my job to write super efficient code. Maybe one day I will again - I bookmarked it.
Will Rogers never met me.
|
|
|
|
|
I don't do it that way. I prefer not to load a collection with data that I don't need for very long.
I recommend loading (with bcp) the data into tables that are designed specifically to hold the "raw" data -- these would have all text fields and allow duplicates. Then you can use SQL to move any non-duplicate rows to where they need to be -- this can be done with a trigger, but you need to enable triggers in bcp. Then clean up any left-over data.
Another option is, after the data is in the "raw" table, to use a DataReader to read the data, copying and deleting the rows one by one and ignoring any duplicate exceptions you may receive. Unfortunately, .NET doesn't make it easy to distinguish different types of database error.
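One possible way to tell duplicate-key failures apart from other errors, assuming SQL Server and the `SqlException` type from ADO.NET, is to inspect the exception's `Number` property. SQL Server reports error 2627 for a unique or primary key constraint violation and 2601 for a duplicate row in a unique index. The helper name `SqlErrorNumbers` below is hypothetical, and the sketch works on the plain error number so that it compiles without a database reference.

```csharp
using System;

public static class SqlErrorNumbers
{
    // SQL Server error numbers raised when an INSERT hits an existing key.
    public const int UniqueConstraintViolation = 2627;
    public const int DuplicateKeyInUniqueIndex = 2601;

    // True when the error number indicates a duplicate-key failure.
    public static bool IsDuplicateKeyError(int sqlErrorNumber)
    {
        return sqlErrorNumber == UniqueConstraintViolation
            || sqlErrorNumber == DuplicateKeyInUniqueIndex;
    }
}

// Intended use inside the copy loop (not compiled here, needs a live connection):
//   catch (SqlException ex)
//   {
//       if (!SqlErrorNumbers.IsDuplicateKeyError(ex.Number)) throw;
//       // duplicate row: skip it and carry on
//   }
```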
|
|
|
|
|
That sounds like way more complexity than this trivial exercise deserves. Besides, I have no idea what bcp is.
Will Rogers never met me.
|
|
|
|
|
bcp is the command-line Bulk CoPy utility that comes with SQL Server. You write a format file to tell bcp what to do with the file contents, and maybe some SQL, but otherwise you don't need to write any code. It can be confusing and takes a little getting used to, but I can probably send you some examples.
|
|
|
|
|
One way of improving performance when dealing with collections is to set the capacity of the list when you initialise it. If you know the size of the list when instantiating the class, use the overloaded constructor that takes (int capacity).
The reason for doing this is that, under the hood, lists are just arrays. I can't remember what the default capacity is, but let's say the underlying array has a capacity of 100,000: when you add the 100,001st item, the list will re-dimension to 200,000. Basically, it doubles in size each time it re-dimensions. Each re-dimension allocates a new array and copies every existing element into it, which is expensive, so if you know the capacity in advance you can avoid that cost entirely.
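The effect can be observed through `List<T>.Capacity`. The sketch below counts how often the backing array is replaced while items are added; the class and method names are made up for illustration, and the 35,000 figure is simply the record count mentioned earlier in the thread.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Counts how many times the backing array is reallocated
    // while itemCount items are added to the given list.
    public static int CountReallocations(List<int> list, int itemCount)
    {
        int reallocations = 0;
        int lastCapacity = list.Capacity;
        for (int i = 0; i < itemCount; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                // Capacity changed, so a new array was allocated and filled.
                reallocations++;
                lastCapacity = list.Capacity;
            }
        }
        return reallocations;
    }
}
```

Calling `CapacityDemo.CountReallocations(new List<int>(35000), 35000)` should report no reallocations, while the same call with `new List<int>()` reports several. For a few tens of thousands of items the saving is small, but it costs nothing when the size is known.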
|
|
|
|
|
That's a great idea! I can count the lines in each of the selected files, delete about 10 from each for the useless header info, then use the total when I instantiate the queue. Cool!
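That plan could be sketched roughly as follows. The names `QueueSizing`, `EstimateCapacity`, and `CreateBuffer` are hypothetical, and the fixed header length of 10 lines is taken from the "about 10" in the post, so the result is an estimate rather than an exact count.

```csharp
using System;
using System.Collections.Generic;

public static class QueueSizing
{
    // Sums the data lines across files, subtracting the header lines from each.
    // A file shorter than its header contributes nothing.
    public static int EstimateCapacity(IEnumerable<int> lineCountsPerFile, int headerLines)
    {
        int total = 0;
        foreach (int count in lineCountsPerFile)
        {
            total += Math.Max(0, count - headerLines);
        }
        return total;
    }

    // Creates a queue whose backing array is already large enough for the estimate.
    public static Queue<string> CreateBuffer(IEnumerable<int> lineCountsPerFile, int headerLines)
    {
        return new Queue<string>(EstimateCapacity(lineCountsPerFile, headerLines));
    }
}
```

An estimate that is slightly off is harmless: the queue still grows on demand if more records arrive than expected.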
Will Rogers never met me.
|
|
|
|
|
Hello, I am adjusting the contrast of an image, and when I run my program I come across this error: "Error 1 The name 'Clamp' does not exist in the current context".
The code is as follows:
public static Bitmap AdjustContrast(Bitmap Image, float Value)
{
    Value = (100.0f + Value) / 100.0f;
    Value *= Value;
    System.Drawing.Bitmap TempBitmap = Image;
    System.Drawing.Bitmap NewBitmap = new System.Drawing.Bitmap(TempBitmap.Width, TempBitmap.Height);
    System.Drawing.Graphics NewGraphics = System.Drawing.Graphics.FromImage(NewBitmap);
    NewGraphics.DrawImage(TempBitmap, new System.Drawing.Rectangle(0, 0, TempBitmap.Width, TempBitmap.Height), new System.Drawing.Rectangle(0, 0, TempBitmap.Width, TempBitmap.Height), System.Drawing.GraphicsUnit.Pixel);
    NewGraphics.Dispose();

    for (int x = 0; x < NewBitmap.Width; ++x)
    {
        for (int y = 0; y < NewBitmap.Height; ++y)
        {
            Color Pixel = NewBitmap.GetPixel(x, y);
            float Red = Pixel.R / 255.0f;
            float Green = Pixel.G / 255.0f;
            float Blue = Pixel.B / 255.0f;
            Red = (((Red - 0.5f) * Value) + 0.5f) * 255.0f;
            Green = (((Green - 0.5f) * Value) + 0.5f) * 255.0f;
            Blue = (((Blue - 0.5f) * Value) + 0.5f) * 255.0f;

            NewBitmap.SetPixel(x, y, Color.FromArgb(Clamp((int)Red, 255, 0), Clamp((int)Green, 255, 0), Clamp((int)Blue, 255, 0)));
        }
    }

    return NewBitmap;
}
May I know what is wrong?
|
|
|
|
|
Clamp is defined in System.Windows.Media; is that reference included in your class?
Will Rogers never met me.
|
|
|
|
|
Thank you for the quick response. How do I add the reference? I added "using System.Windows.Media;" in my form.cs. However, I get this error: "The type or namespace name 'Media' does not exist in the namespace 'System.Windows' (are you missing an assembly reference?)"
|
|
|
|
|
I couldn't find it, either!
You can easily write your own, something like:
int Clamp(int value, int min, int max)
{
return (value < min) ? min : ((value > max) ? max : value);
}
CQ de W5ALT
Walt Fair, Jr., P. E.
Comport Computing
Specializing in Technical Engineering Software
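One thing to watch: the helper above takes its arguments as (value, min, max), while the calls in the original post pass (value, 255, 0). With that helper the calls would need to become Clamp((int)Red, 0, 255), otherwise the bounds are swapped. A minimal sketch of the per-channel arithmetic with the arguments in matching order is shown below; the class name `ContrastMath` and the method `AdjustChannel` are made up for illustration and are not part of the original code. Newer runtimes (.NET Core 2.0 and later) also provide a built-in `Math.Clamp(value, min, max)`.

```csharp
using System;

public static class ContrastMath
{
    // Restricts value to the inclusive range [min, max].
    public static int Clamp(int value, int min, int max)
    {
        return (value < min) ? min : ((value > max) ? max : value);
    }

    // Applies the contrast formula from the original post to one colour channel.
    // contrast is the same -100..100 style setting passed to AdjustContrast.
    public static int AdjustChannel(byte channel, float contrast)
    {
        float factor = (100.0f + contrast) / 100.0f;
        factor *= factor;
        float scaled = (((channel / 255.0f) - 0.5f) * factor + 0.5f) * 255.0f;
        // The raw result can fall outside 0..255, which Color.FromArgb rejects.
        return Clamp((int)scaled, 0, 255);
    }
}
```

The clamp matters because `Color.FromArgb` throws an `ArgumentException` for any component outside 0 to 255, and a strong contrast setting pushes bright and dark pixels well beyond that range.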
|
|
|
|
|
|
Is there another way to write
NewBitmap.SetPixel(x, y, Color.FromArgb(Clamp((int)Red, 255, 0), Clamp((int)Green, 255, 0), Clamp((int)Blue, 255, 0)));
without using Clamp?
|
|
|
|
|
|
|
This will do the trick:
private static int Clamp(int Value, int Max, int Min)
{
    Value = Value > Max ? Max : Value;
    Value = Value < Min ? Min : Value;
    return Value;
}
|
|
|
|
|
|
Take a look inside the source code for the article you referenced. I haven't tried it personally, but it appears that the author provided the code to create and compile the control.
CQ de W5ALT
Walt Fair, Jr., P. E.
Comport Computing
Specializing in Technical Engineering Software
|
|
|
|
|
Well, I am working on my first client/server application. I started off working with synchronous sockets, and then switched to asynchronous sockets. The way it looks like it is going to work for now (until I get a better understanding) is: the client connects to the server, the client sends data to the server, the server gets the data, the server looks for data pertaining to that specific client, the server returns data to the client, and the connections close and end.
I see a problem, though. What if the client connects to the server and sends data, the server gets it, and then the server loses its connection to the internet?
This would mean that my client is stuck on the BeginReceive part. I can't find a timeout. I haven't tried to test this yet, but I was wondering how to handle this situation. Would synchronous sockets be better than asynchronous?
|
|
|
|
|
Hi Jacob,
this is how I see it:
1. There are no synchronous or asynchronous sockets; all sockets are the same. However, you can operate them in a sync or async way, and you can choose your way separately for clients and servers.
2. You can always mimic an async operation by launching a separate thread that works synchronously. The disadvantage is cost (one more thread, with its state and stack); the advantage is comfort, as you have less of a problem remembering your state.
3. The .NET Socket class supports ReceiveTimeout in sync mode only; when using the async methods, if you want some kind of timeout, you have to implement it yourself. And even then, it will not pre-empt an outstanding async Receive; all it will do is tell your app sooner that the data isn't coming (in time).
4. Assuming your client is using only one or a few sockets at any point in time, I don't see many objections to using the thread and sync mode there. On the server side, the potential number of clients may force you to work in async mode.
Hope this helps.
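A minimal sketch of the synchronous client-side timeout from point 3 could look like this. It assumes a blocking `Socket.Receive` call; the class and method names are hypothetical, and the loopback server that accepts a connection but never replies exists only to simulate the "server went away" case from the question.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class ReceiveTimeoutDemo
{
    // Returns the number of bytes read, or -1 if nothing arrived within timeoutMs.
    public static int ReceiveWithTimeout(Socket socket, byte[] buffer, int timeoutMs)
    {
        // ReceiveTimeout only affects synchronous Receive calls.
        socket.ReceiveTimeout = timeoutMs;
        try
        {
            return socket.Receive(buffer);
        }
        catch (SocketException ex)
        {
            if (ex.SocketErrorCode == SocketError.TimedOut)
            {
                return -1; // the server stayed silent for too long
            }
            throw; // any other socket error is a real failure
        }
    }

    // Connects to a local listener that accepts but never sends anything.
    public static int RunSilentServerDemo(int timeoutMs)
    {
        var listener = new TcpListener(IPAddress.Loopback, 0); // port 0: let the OS pick
        listener.Start();
        try
        {
            int port = ((IPEndPoint)listener.LocalEndpoint).Port;
            using (var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
            {
                client.Connect(IPAddress.Loopback, port);
                using (Socket serverSide = listener.AcceptSocket())
                {
                    // The server side never writes, so the client must time out.
                    return ReceiveWithTimeout(client, new byte[256], timeoutMs);
                }
            }
        }
        finally
        {
            listener.Stop();
        }
    }
}
```

After a timeout the client can decide whether to retry or close the socket and reconnect, instead of waiting indefinitely.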
|
|
|
|
|
Ok thanks!
My server could possibly be accepting 50-100 connections, if you have agents out there checking in every 2-5 minutes. So I should implement the asynchronous method on the server end, and use synchronous on the client? That way I can specify a timeout and won't get into a situation like the one I mentioned above. Do you see any real objections to doing something like that?
|
|
|
|
|
Yes, that is what I would do in a first iteration, as it keeps the clients simple, and optimizes the server.
|
|
|
|
|
Awesome! Thanks for the replies. I will use async on server side and sync on client side.
|
|
|
|
|