Re: How can I optimize this code - C# Discussion Boards

Luc Pattyn 15-May-09 3:16

Hi,

IMO performance can be improved dramatically; however, the right tactics depend on some extra info:
- Could you show, say, 10 lines of each of those files?
- Is the order of the output lines relevant?

:)

Luc Pattyn [Forum Guidelines] [My Articles]

The quality and detail of your question reflects on the effectiveness of the help you are likely to get.
Show formatted code inside PRE tags, and give clear symptoms when describing a problem.

Nagy Vilmos 15-May-09 3:57

[aside] Thinking sorted data, are we? [/aside]

Panic, Chaos, Destruction.
My work here is done.

Luc Pattyn 15-May-09 4:02

Yep.
For performance, always sort; the sort order is not that important, as long as you sort.
If need be, sort an existing collection; better yet, sort while collecting.

:)

Nagy Vilmos 15-May-09 4:09

A while back I had to fix a wee oops due to sorting:

The original logic was: as items are added, repaint the list, even if the items are not visible.
Then sparky came along and added a sort. A re-sort of EVERY row as EACH row was added.
Then users came along and added a shed load of rows.

Can you guess why it took 10 minutes to get 500 items into the list...
:laugh:

Luc Pattyn 15-May-09 4:23

you violated performance rule #1: don't let the users touch anything.

:)

Nagy Vilmos 15-May-09 4:27

I've just signed off another item on my list without doing a thing! A bug raised [low priority] 18 months ago has bubbled up for the next phase. I investigated, and a fix elsewhere has already resolved it.

Problem is, I've got 2 days on the plan to fill...

[beer]

MumbleB 15-May-09 23:28

Hi Luc. I could post a few lines for you, but they are pretty much what I posted in my original post. What I think I want to achieve is: read the first ID in FILE1, take that and search for it in FILE2; if found, replace the ID in FILE2 with the ID alongside the one in FILE1. So, FILE1 has the following data in it:

C0000000001   C0001000010
C0000000002   C0001000011
C0000000003   C0001000012

FILE2 contains the following data:

C0000000001N1            SAMPLE NAME 1
C0000000002N5            SAMPLE NAME 2
C0000000123D2            SAMPLE NAME 3
C0000000003N4            SAMPLE NAME 4

Now, FILE1 has approximately 2000 lines of data and FILE2 has approximately 80000 lines of data in it. So in short, take C0000000001 from FILE1 and search through 80000 lines of data and for each instance found replace it with the Value C0001000010 + the remainder of the data in FILE2 on the line and write that to a new file.

Hope this makes sense.

Excellence is doing ordinary things extraordinarily well.

Luc Pattyn 16-May-09 1:08

Hi,

0.
several things aren't clear yet:
- You did not answer my last question: is the order of the output relevant to you?
- is the first input file ordered at all?
- why is there a string newhid = line.Substring(14, 11); when the replacement string seems to be located at column 12?

Anyway, here are some thoughts:

1.
the job at hand consists of one or maybe two parts: sorting lines and replacing the first part of lines. both can be tackled independently.

2.
the way you have it, the second file gets read (maybe half of it on average) for each line present in the first file; that makes this a quadratic algorithm, so execution time depends on filesize1 * filesize2, which is bad.

3.
you must try to overcome this; the one way I see is by:
- keeping one file in memory
- stepping through the other file only once

4.
my preferred way, not sure it fits your needs (see 0), would be to:
- read the first file into a Dictionary<string, string> which would map oldhid -> newhid
- read the second file line by line, process every line in turn using the dictionary, and write the result.
That would be a linear operation, dramatically faster than what you have; however it would basically keep the data in the order it is in the second file.

5.
point 4 does not work in the same order as your current code. If order is relevant, rather than trying to keep it by quadratic code, sort the file; either the second file before you process it, or the result file, depending on your exact needs.

6.
To sort a file, either use an existing utility, or write some code; easiest again is by using File.ReadAllLines() and Array.Sort(); for 80,000 lines of say 120 chars this amounts to roughly ten million chars, so no memory problem. And a decent sort algorithm will behave much better (run much faster) than the repeated linear search your code is doing right now.

7.
For special sort orders, there is an Array.Sort() overload that takes an IComparer, so all is possible.
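
Points 6 and 7 could be sketched roughly as follows. This is only an illustration under assumptions: the class names `IdPrefixComparer` and `LineSorter` are made up for the example, and the 11-character ID width is taken from the sample data posted earlier in the thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: orders lines by their leading ID column only.
public sealed class IdPrefixComparer : IComparer<string>
{
    private const int IdLength = 11; // assumed from the sample lines in this thread

    public int Compare(string x, string y)
    {
        // Ordinal comparison is fast and culture-independent.
        return string.CompareOrdinal(Key(x), Key(y));
    }

    // Returns the ID prefix, or the whole line if it is shorter than an ID.
    private static string Key(string line)
    {
        if (line == null) return string.Empty;
        return line.Length >= IdLength ? line.Substring(0, IdLength) : line;
    }
}

public static class LineSorter
{
    // Returns a sorted copy so the caller's array is left untouched.
    public static string[] SortById(string[] lines)
    {
        string[] copy = (string[])lines.Clone();
        Array.Sort(copy, new IdPrefixComparer());
        return copy;
    }
}
```

With real files this would be used along the lines of `File.WriteAllLines(outputPath, LineSorter.SortById(File.ReadAllLines(inputPath)))`. Note that Array.Sort is not a stable sort, so lines sharing the same ID may come out in a different relative order.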

:)

MumbleB 16-May-09 5:55

0. The order of the file is not important at all.

Luc Pattyn wrote:
why is there a string newhid = line.Substring(14, 11); when the replacement string seems to be located at column 12?

This is in FILE1 and is located at pos:14 size:11.
I think that reading the first file into a dictionary could maybe work. I'll give that a go and see how it works. I have tried reading file1 into memory and then reading file2 line by line, and that took ages as well. So I resorted to reading both files into memory after suggestions received, and this seems to be taking forever as well. What I want to achieve is the same effect as a Microsoft Access simple query. I don't need to have the output file in any kind of sort order, so sorting would probably not make much of a difference, unless I sort both files?? Could help, I reckon.

Thanks guys for the advice but I think there should be a much simpler way to get this done but getting it right is a bit of a ^&*^*#^*...........LOL........When I get it all hanging and running fast enough I will post the code or if you guys come up with some more ideas please post!

Thanks again.

Luc Pattyn 16-May-09 6:54

Kwagga wrote:
This is in FILE1 and is located at pos:14 size:11

Not according to the samples you provided earlier. I can see only one space between oldhid and newhid.

Kwagga wrote:
I don't need to have the output file in any kind of sort order

OK, so my current estimate is the job is worth no more than 3 seconds on a modern PC provided it gets coded right, without showing it all on some GUI. Strategy:
1. read file 1, probably line by line, into a dictionary
2. read and process file 2 line by line
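
A minimal sketch of that two-step strategy, under stated assumptions: the class and method names (`HidRemapper`, `BuildMap`, `RemapLine`, `Run`) are invented for illustration, IDs are assumed to be 11 characters wide as in the samples, and the start column of the new ID is left as a parameter because the thread disagrees on whether it is 12 or 14. Lines of file 2 with no matching ID are copied unchanged here, which is also an assumption; skip them instead if only matches should be written.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class HidRemapper
{
    private const int IdLength = 11; // assumed ID width, taken from the sample data

    // Step 1: build the oldhid -> newhid map from the lines of file 1.
    public static Dictionary<string, string> BuildMap(IEnumerable<string> mappingLines, int newIdStart)
    {
        var map = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (string line in mappingLines)
        {
            if (line.Length < newIdStart + IdLength) continue; // skip blank or short lines
            map[line.Substring(0, IdLength)] = line.Substring(newIdStart, IdLength);
        }
        return map;
    }

    // Step 2: one dictionary lookup per line of file 2.
    public static string RemapLine(string line, Dictionary<string, string> map)
    {
        if (line.Length < IdLength) return line;
        string newId;
        return map.TryGetValue(line.Substring(0, IdLength), out newId)
            ? newId + line.Substring(IdLength) // new ID plus the untouched rest of the line
            : line;                            // assumption: unmatched lines pass through
    }

    // Streams file 2 once; only the map built from file 1 is held in memory.
    public static void Run(string mappingPath, string dataPath, string outputPath, int newIdStart)
    {
        Dictionary<string, string> map = BuildMap(File.ReadLines(mappingPath), newIdStart);
        using (var writer = new StreamWriter(outputPath))
        {
            foreach (string line in File.ReadLines(dataPath))
            {
                writer.WriteLine(RemapLine(line, map));
            }
        }
    }
}
```

Each line of file 2 costs one hash lookup, so the total work grows with lines1 + lines2 rather than lines1 * lines2. `File.ReadLines` needs .NET 4 or later; on older frameworks a `StreamReader.ReadLine` loop does the same job.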

:)

MumbleB 17-May-09 22:32

OK, there are 3 spaces there.
I wish it would take all of 3 seconds, but what seems to get the better of me is this: file1 is all of 289KB and file2 is all of 247KB, and this is one of the smaller files that I want to process. When this runs it generates a file that is 800MB, and I eventually have to abort it as it seems to get into an infinite loop. file2 could contain multiple instances of the search ID in file1. I have searched high and low and far and wide for any articles on this topic but can't find anything relating to what I want to do. There are hundreds of articles on finding and replacing single strings in files, but nothing that reads the strings from an input file. Nothing that even suggests the same logic that I want to implement.

As mentioned before, I want to implement something similar to what is done in a Microsoft Access database. I may just have to write this to a DB into two tables and then do the matching in there with a SQL statement and export the new data!! :((

Thought this could be possible other than going the Database route.

Luc Pattyn 18-May-09 0:37

It does not make any sense at all:

1.
you showed the following content of file1:

C0000000001 C0001000010
C0000000002 C0001000011
C0000000003 C0001000012

which is some 25 characters per line, times 2000 lines, equals 50KB (or 100KB when all in wide characters), not 289KB. With 289KB the average line length would be 140 bytes. Check its content!

2.
file2 holding 247KB for 80,000 lines results in some 3 characters per line, which does not fit the sample lines you showed either.

3.
the result file growing beyond 800MB is utter nonsense. For it to grow that large you would have to:
- either have 80,000 lines of 10,000 characters each (where are these coming from?)
- or have many more lines. How would you explain that?
Did you bother checking what is inside that file? is anything in there correct?

Now start looking at what you have, and stop making up stories. :~

MumbleB 18-May-09 7:11

The way I had the code written was that for each instance of an ID in file1 found in file2 it wrote the whole of file2, as I had file2 completely read into memory, and out of my own stupidity I had it coded that way. Now I have amended this code to the below, which seems to work just fine, but the performance is still "pish" as it iterates 2000 times through 80000 records to find a match. There are instances where an ID in FILE1 could exist more than once in FILE2. So, on a file of 80000 records it does 160 000 000 reads on file2, which could mean a major hit on performance.

Luc Pattyn wrote:
Now start looking at what you have, and stop making up stories

I swear I'm not making up stories. :laugh:

This is what I have refined the code to now, but I am sure it CAN be sped up much more.
My actual files have these numbers of records in them:
FILE1 = 10948 Lines of 25 characters per line. (289KB)
FILE2 = 360 Lines of 650 characters per line. (247KB)
FILE3 = 79341 Lines of 650 characters per line. (54,392KB)
FILE4 = 76283 Lines of 700 characters per line. (50,508KB)

Processing FILE1 against FILE2 takes all of 34.61 seconds to complete, which is fine, and the resultant file is 100% spot on.

Now, processing FILE1 with 10948 lines of data against another file, say FILE3 with 79341 lines of 650 characters per line, takes forever.

The code implemented is as follows:

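// Requires: using System.IO;
string outPath = @"c:\Documents and Settings\Mumbleb\Desktop\Ess\";
string fileNot = textBox1.Text;   // data file whose IDs get replaced
string fileHid = textBox2.Text;   // mapping file: old ID, three spaces, new ID
string[] LineHID = File.ReadAllLines(fileHid);
string[] LineNot = File.ReadAllLines(fileNot);
// 'using' closes the writer even if an exception is thrown
using (StreamWriter sw = new StreamWriter(outPath + "ER_AU_TBSZ_SEQMHN.TXT"))
{
    foreach (string hid in LineHID)
    {
        string OldHid = hid.Substring(0, 11);   // old ID at position 0
        string NewHid = hid.Substring(14, 11);  // new ID at position 14
        // Every data line is scanned for every mapping line: this is the quadratic part
        foreach (string content in LineNot)
        {
            string matchid = content.Substring(0, 11);
            if (OldHid == matchid)
            {
                // Take 605 characters after the ID and trim surrounding whitespace
                string noting = content.Substring(11, 605).Trim();
                // Only matching lines are written (the unused Regex.Replace call was dropped)
                sw.WriteLine(NewHid + noting);
            }
        }
    }
}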

I am about to try and process the bigger files on my Dualcore Laptop and see what the performance is like. I will post if I have any timings on the performance.

Luc Pattyn 18-May-09 7:30

Your new data does not fit your earlier statements.

Kwagga wrote:
I swear I'm not making up stories

You're just stupid then?

Kwagga wrote:
it takes forever.

No, with decent code it does not. I told you all there is to it. The subject is closed. :mad:

MumbleB 18-May-09 20:17

Luc Pattyn wrote:
You're just stupid then?

Yeah, I reckon I probably am just stupid but could be slightly smarter than you!!!!........... :laugh:

Luc Pattyn wrote:
No, with decent code it does not.

If you knew the decent code I am sure you would have suggested it, but you are just as clueless as I am. I will resolve this and share my findings.

Harsh statements like the above are appreciated, but if a matter is not resolved then I reckon they are uncalled for.

Thanks for the help mate.


Web setup project fails in Windows vista and IIS7

Henry Minute 15-May-09 4:18

One way this might be speeded up without sorting etc., although that would undoubtedly help a great deal, is to do things backwards. At least relative to the way you are currently doing things.

Currently you load the first file into the array, then you iterate the array and test, possibly every line of the second file, for a match to each member.

Why not load the first file, as now, then read file2 line by line and search the array for a match? Personally I'd change the array for a List<string>, or even a SortedList<string, string>, to make searching easier.
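
A rough sketch of the SortedList variant, with assumptions called out: the names `SortedIdLookup`, `Load` and `TryFind` are made up for this example, the two IDs in each file1 line are assumed to be separated by spaces or tabs, and file2 lines are assumed to start with an 11-character ID as in the posted samples.

```csharp
using System;
using System.Collections.Generic;

public static class SortedIdLookup
{
    private const int IdLength = 11; // assumed from the sample data

    // Loads file1 lines ("oldId   newId") into a list kept sorted by old ID.
    public static SortedList<string, string> Load(IEnumerable<string> file1Lines)
    {
        var ids = new SortedList<string, string>(StringComparer.Ordinal);
        foreach (string line in file1Lines)
        {
            // Splitting on whitespace copes with one space or three between the IDs.
            string[] parts = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length >= 2)
            {
                ids[parts[0]] = parts[1]; // indexer overwrites duplicates instead of throwing
            }
        }
        return ids;
    }

    // Looks up the ID at the start of a file2 line; TryGetValue does a binary search.
    public static bool TryFind(SortedList<string, string> ids, string file2Line, out string newId)
    {
        newId = null;
        if (file2Line.Length < IdLength) return false;
        return ids.TryGetValue(file2Line.Substring(0, IdLength), out newId);
    }
}
```

Each lookup is a binary search, so a pass over file2 costs roughly lines2 * log(lines1) comparisons. Inserting into a SortedList from unsorted input is comparatively slow, which should not matter for a few thousand entries; a Dictionary would avoid that cost when sorted order is not needed.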

Henry Minute

Do not read medical books! You could die of a misprint. - Mark Twain
Girl: (staring) "Why do you need an icy cucumber?"
“I want to report a fraud. The government is lying to us all.”

Muhammad Sohaib Yousaf 15-May-09 1:44

I am trying to install a VS2005 web setup project on Windows Vista with IIS7, but it gives me the error "setup has been interrupted".
Is there a known problem? I am stuck on this; any help would be appreciated.
Thanks, Sohaib.

Accepting user input

Rajdeep.NET is BACK 15-May-09 1:15

Hi friends,

I have a WinForm consisting of a TextBox and a button. Whenever the user enters any text in the text box and clicks the button, I want the entered text to be saved in a precreated text file. What code should I implement in the Button_Click event handler in order to do so?

Help appreciated,
Rajdeep.NET :-D

Pete O'Hanlon 15-May-09 1:21

Rajdeep.NET wrote:
What code should I implement in the Button_Click event handler in order to do so?

Code to write the file out. Now, buy a book and find out what that code is.

I can't believe that you've posted questions about how to sell your software when this limping pile of scrodspittle is the level of competence you are currently at. Do yourself a favour - buy some books; then read them.

"WPF has many lovers. It's a veritable porn star!" - Josh Smith

As Braveheart once said, "You can take our freedom but you'll never take our Hobnobs!" - Martin Hughes.

My blog | My articles | MoXAML PowerToys | Onyx

Rajdeep.NET is BACK 15-May-09 2:19

Pete O'Hanlon wrote:
Code to write the file out. Now, buy a book and find out what that code is.

Hi Pete,

Why do you think like that? I searched for the above matter in Google and MSDN. I indeed found a million results, but all based upon console application. I am not dumb. I remember what I said, "You should have searched google before posting here", and I did, and it proved futile. Now that's the reason for asking you guys.

Hope you understand me,
Rajdeep.NET :sigh:

Tom Deketelaere 15-May-09 2:28

Rajdeep.NET wrote:
Why do you think like that?

Because writing to a text file is the most basic programming you can do. Almost all books / classes cover this in the first chapter / lesson.

Rajdeep.NET wrote:
all based upon console application

The code for writing to a text file remains the same for console applications or for winform applications.
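
To illustrate that point, here is a minimal sketch; the helper name `TextSaver.AppendLine`, the control names and the file path are all hypothetical. The file-writing call is plain System.IO and does not care whether it is invoked from a console program or from a WinForms event handler.

```csharp
using System;
using System.IO;

public static class TextSaver
{
    // Appends one line of text to the file at 'path'.
    // File.AppendAllText creates the file if it does not exist yet.
    public static void AppendLine(string path, string text)
    {
        File.AppendAllText(path, text + Environment.NewLine);
    }
}

// In a WinForms form the handler would simply call the helper
// (button1, textBox1 and the path are hypothetical names):
//
// private void button1_Click(object sender, EventArgs e)
// {
//     TextSaver.AppendLine(@"C:\temp\notes.txt", textBox1.Text);
// }
```

A console program would call the very same `TextSaver.AppendLine(path, Console.ReadLine())`; only the source of the text differs.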

Rajdeep.NET is BACK 15-May-09 2:37

Tom Deketelaere wrote:
The code for writing to a text file remains the same for console applications or for winform applications

Is that so? I wonder!!! :OMG:

Can you post me the code for doing this in a Win application? Don't worry about Console, I'll make that out :O

Tom Deketelaere 15-May-09 2:47

Rajdeep.NET wrote:
Is that so?

YES

Rajdeep.NET wrote:
Can you post me the code for doing this in a Win application? Don't worry about Console, I'll make that out

NO

How about you actually trying the code you found, and do some work of your own.

Rajdeep.NET is BACK 15-May-09 2:58