Re: Faster way of filling a Dataset - C# Discussion Boards

Re: To Throw or Not To Throw

22-May-13 6:55

V. wrote:
Don't shoot me for giving an honest reply.

(I ~~think~~ hope you didn't mean to)

I never mean to, but interpersonal communication isn't one of my strong points. Then again, the worst that could happen is a few downvotes, and I might learn something :thumbsup:

V. wrote:
And thus, no problem?

Nah; I'm just saying it's not a heavy operation. Still, you made me curious. Can you give me an example of the problem you're trying to avoid?

V. wrote:
Since you're Dutch: that is really very "kort door de bocht".

"Kort door de bocht" isn't an argument; taking the long way round is never a wise idea. For the English reader: "kort door de bocht" means taking an (unofficial) shortcut, and in coding, KISS is preferred.

V. wrote:
(sorry, I didn't know the English expression)

A "Jump to Conclusions Mat" (as in jumping to conclusions). Seen the movie "Office Space"?

V. wrote:
You're comparing integers and exceptions which is like comparing apples with pears.

No, I'm taking one particular use we have for exceptions and wondering what the alternatives are, and whether they're preferable or not.

I have seen "True, False, EFileNotFound" return values in code too often. It's easier to create a new exception, throw it, and handle it. Yes, I'd say that even goes for validating a business object (BO). Why? Because the object is then guaranteed to be in a "correct" state: no property can hold an illegal value, since trying to set one throws an exception.
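A minimal C# sketch of that validation style, using a hypothetical `Order` business object: the `Quantity` setter throws, so no instance can ever hold an illegal value. The equally hypothetical `TrySetQuantity` shows the Boolean-return style for comparison.

```csharp
using System;

// Hypothetical business object: invalid values are rejected in the setter,
// so every Order instance is always in a valid state.
public class Order
{
    private int _quantity = 1;

    public int Quantity
    {
        get { return _quantity; }
        set
        {
            if (value <= 0)
                throw new ArgumentOutOfRangeException(nameof(value), value, "Quantity must be positive.");
            _quantity = value;
        }
    }

    // Boolean-return alternative: the caller must remember to check the result,
    // otherwise the failure goes unnoticed and the old value silently stays.
    public bool TrySetQuantity(int value)
    {
        if (value <= 0)
            return false;
        _quantity = value;
        return true;
    }
}
```

With the exception, a caller that forgets to check cannot leave the object in a bad state; with the Boolean version, ignoring the return value is easy and silent.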

My old PC did around 10,000 exceptions per second (throwing and handling). The one at work does around 33,000 per second, and the new PC (new meaning "from the store a week ago") does 48,000 exceptions per second.

...and only 192 per second in debug mode.
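A rough way to take this kind of measurement on your own machine; a sketch only, the class and method names are hypothetical, and the numbers it reports will differ from the ones quoted above.

```csharp
using System;
using System.Diagnostics;

public static class ExceptionBenchmark
{
    // Throws and catches `iterations` exceptions and returns the rate per second.
    // Results depend heavily on hardware, runtime version, build configuration and
    // whether a debugger is attached, so treat them as rough indications only.
    public static double ThrowsPerSecond(int iterations)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            try
            {
                throw new InvalidOperationException("benchmark");
            }
            catch (InvalidOperationException)
            {
                // Swallowed on purpose: only the throw/catch cost is being measured.
            }
        }
        stopwatch.Stop();
        return iterations / stopwatch.Elapsed.TotalSeconds;
    }
}
```

For example, `ExceptionBenchmark.ThrowsPerSecond(100000)` from a Release build started without the debugger (Ctrl+F5 in Visual Studio). Running under the debugger is typically much slower, which likely explains the debug-mode figure.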

Bastard Programmer from Hell :suss:

If you can't read my code, try converting it here[^]

jschell

22-May-13 8:25

V. wrote:
It's a heavy operation

Not on modern systems. In the 90s that was true for C++ and probably Java, but I have seen no evidence since then that one needs to be more concerned about it than about any other obvious performance-related issue.

Re: To Throw or Not To Throw

jschell

22-May-13 8:47

Kevin Marois wrote:
Examples might be a purchase amount exceeding a credit limit.

That specific example means the database is doing business logic, so of course it must be capable of communicating a business failure. I suspect an exception is a convenient and perhaps even the best way to do this.

Naturally, if there are a lot of business rules in the database, there will be a lot of exceptions. However, I question the need for a different exception type for each one. I suspect the specific failure could be encoded in a property of the exception; that seems more complex, but since there are already so many rules, the overall complexity is the same.
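A sketch of that idea, with hypothetical names throughout: a single `BusinessRuleException` type carrying a `Rule` property, instead of one exception class per rule. For illustration the check is done in C#; in the scenario above the failure would come from the database and would need to be mapped onto this type, which is not shown.

```csharp
using System;

// Hypothetical set of business rules that can fail.
public enum BusinessRule
{
    CreditLimitExceeded,
    AccountClosed
}

// One exception type for all business-rule failures; which rule failed is a property,
// not a separate exception subclass per rule.
public class BusinessRuleException : Exception
{
    public BusinessRule Rule { get; }

    public BusinessRuleException(BusinessRule rule, string message) : base(message)
    {
        Rule = rule;
    }
}

public static class CreditCheck
{
    // Throws when a purchase exceeds the customer's credit limit.
    public static void EnsureWithinLimit(decimal purchaseAmount, decimal creditLimit)
    {
        if (purchaseAmount > creditLimit)
            throw new BusinessRuleException(BusinessRule.CreditLimitExceeded,
                $"Purchase of {purchaseAmount} exceeds credit limit of {creditLimit}.");
    }
}
```

Callers interested in one rule can filter on it, e.g. `catch (BusinessRuleException ex) when (ex.Rule == BusinessRule.CreditLimitExceeded)` (exception filters need C# 6 or later).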

Faster way of filling a Dataset

MumbleB

21-May-13 8:17

Hi guys. I suppose this question has been asked numerous times. I have a PostgreSQL database table with about 20 columns and about 1.4 million records. I select data between two given dates, and it takes about 4 minutes to fill the DataSet. I would like to speed this up. I am doing no updates, just using the data to compile a report. Is there a faster way to do this? Below is the code I have. My results are OK, but filling the DataSet is a problem in that it takes roughly 4 to 5 minutes.

            //Half-open range: from the start of the first day up to (not including) the day after the last day
            string sql = @"SELECT * FROM datbase
                           WHERE bus_received_time_stamp >= @startDate
                             AND bus_received_time_stamp < @endDate";

            //Making use of a PostgreSQL connection helper (Npgsql)
            NpgsqlConnection conn = new NpgsqlConnection(conns);

            //Use DateTime values from the DateTimePickers (not formatted text); the end bound
            //is the start of the day after toDate, so the whole last day is included
            DateTime fromDate = dtStartDate.Value.Date;
            DateTime toDate = dtEndDate.Value.Date.AddDays(1);

            //Instantiate an NpgsqlCommand, bind the parameters and fill a DataSet
            NpgsqlCommand cmd = new NpgsqlCommand();
            cmd.Connection = conn;
            cmd.CommandText = sql;
            cmd.Parameters.AddWithValue("@startDate", fromDate);
            cmd.Parameters.AddWithValue("@endDate", toDate);

            conversionRate = txtConversionRate.Text;
            double conRate = Convert.ToDouble(conversionRate);

            try
            {
                //Open the DB Connection
                conn.Open();
                setText(this, "Connection Established");
                NpgsqlDataAdapter da = new NpgsqlDataAdapter(cmd);
                setText(this, "Collecting Data for Processing!");
                da.Fill(dts);
                //Time between these two setText calls is a good 4 to 5 minutes
                setText(this, "Define Data To be Used");
                //... matching and report processing continue here ...
            }
            finally
            {
                conn.Close();
            }

From here the processing takes a few minutes, as there is quite a bit of matching to do. However, the main issue is that filling the DataSet through the DataAdapter takes too much time. I would like to cut this down to a few seconds, or maybe a minute. I have added an index on the date column to try to speed things up, but I'm not sure whether this is correct.

Any ideas??

Excellence is doing ordinary things extraordinarily well.

Jasmine2501

21-May-13 9:29

Sounds like you are retrieving a huge dataset and doing calculations in your client code? If these statistics or calculations can be done on the database server, it will likely increase performance (database servers are really good at aggregating stats), and it will decrease network load, which I suspect is your problem in the first place. Getting 1.4 million rows from the database will always take a long time, because all that data has to be transferred over the network. If you had only one Boolean column in your table, at one byte per value 1.4 million rows would already be about 1.3MB of data; with 20 columns, your "fill the dataset" step could easily be a download of roughly 27MB or more, and that's never gonna be really fast.

To recap: solutions involve reducing the amount of data you need to transfer. Ideally, transfer exactly what the user needs to see and nothing more. Even if your query is super fast, transferring data over the network isn't. Your index on the date column is about all you can do to speed up the query, since it's your only selection criterion, but again, finding the data probably isn't the issue.
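The back-of-the-envelope numbers can be reproduced with a trivial helper. A sketch with hypothetical names, assuming a fixed average size per value and ignoring protocol overhead, NULLs and per-row headers (so real transfers are larger):

```csharp
public static class TransferEstimate
{
    // Rough payload size: rows x columns x average bytes per value.
    public static long EstimateBytes(long rows, int columns, int averageBytesPerValue)
    {
        return rows * columns * averageBytesPerValue;
    }

    // Converts bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
    public static double ToMebibytes(long bytes)
    {
        return bytes / (1024.0 * 1024.0);
    }
}
```

With one byte per value, 1.4 million rows of a single column is 1,400,000 bytes (about 1.3MB); 20 columns give 28,000,000 bytes (about 27MB); at 10 bytes per value it becomes 280,000,000 bytes (about 267MB). Since the size scales linearly with the column count, selecting only the columns the report needs shrinks it proportionally.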

SledgeHammer01

21-May-13 11:30

Filling the dataset is always going to be the bottleneck in your code.

That being said...

A few "band-aid" fixes:

1) Your query is pretty basic, but you *might* shave off a bit of time by using a stored procedure instead.
2) You *might* shave off a bit of time by indexing the bus_received_time_stamp column
3) Do you have an enterprise grade SQL server? Or are you running it on some re-purposed podunk PC?
4) Are you connecting to the server via gigabit ethernet?

With those band-aid fixes out of the way...:

1) The real cause of your problem is that you are asking for a LOT of data... 1.4M rows x 20 columns = 28M "cells"... if each cell is only 1 byte, that's 28M bytes = about 26MB of data you are asking for. More likely your average cell is 10 or even 100 bytes, in which case you are asking for roughly 260MB to 2.6GB of data.
2) It's ***highly*** doubtful you need all that data. What are you doing with it? Displaying it to a user? That's great... but what's a person going to do with 1.4M rows of data? Nothing... a human can't process that amount of data.
3) Are you doing some kind of calculation on the data? If so, you can likely do that on the server and retrieve only the result... or you can set it to run as a job at midnight or so and have the result cached...
4) You could also page the data, if the user really needs all of it, which again I find **highly** doubtful.
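Point 4 (paging) can be sketched as keyset paging: each round trip asks for the next `pageSize` rows after the last key seen, e.g. with a query shaped like `SELECT ... WHERE id > @lastId ORDER BY id LIMIT @pageSize`. This assumes a hypothetical unique, indexed `id` column; the page fetcher is passed in as a delegate so the loop can be shown without a database, and all names are illustrative.

```csharp
using System;
using System.Collections.Generic;

public static class KeysetPager
{
    // Streams all rows by repeatedly fetching the next page after the last key seen.
    // fetchPage(lastKey, pageSize) must return rows ordered by key, all with a key
    // greater than lastKey (or starting from the beginning when lastKey is null).
    public static IEnumerable<T> ReadAll<T>(
        Func<long?, int, IReadOnlyList<T>> fetchPage,
        Func<T, long> keyOf,
        int pageSize)
    {
        if (pageSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(pageSize));

        long? lastKey = null;
        while (true)
        {
            IReadOnlyList<T> page = fetchPage(lastKey, pageSize);
            foreach (T row in page)
                yield return row;

            // A short page means there is nothing left to fetch.
            if (page.Count < pageSize)
                yield break;

            lastKey = keyOf(page[page.Count - 1]);
        }
    }
}
```

Compared with `LIMIT ... OFFSET ...`, filtering on the last key lets an index on the key column locate each page directly instead of stepping over all the earlier rows.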

You really need to define your problem better.

Eddy Vluggen

21-May-13 22:32

SledgeHammer01 wrote:
2) It's ***highly*** doubtful you need all that data. What are you doing with it? Displaying it to a user? That's great... but what's a person going to do with 1.4M rows of data? Nothing... a human can't process that amount of data.

The post said it's for a report; but good point :)

SledgeHammer01

22-May-13 4:48

If someone plopped a 1.4M-row report on your desk, what are you gonna do with it? I'd toss it in the recycle bin without even looking at it. If he needs all that data to generate a usable report, that's another story. In that case, he should probably do what another poster suggested: run the stats on the server instead of downloading all the data and doing it on the client.

Eddy Vluggen

22-May-13 5:01

SledgeHammer01 wrote:

If someone plopped a 1.4M-row report on your desk, what are you gonna do with it? I'd toss it in the recycle bin without even looking at it.

I'd do that with any report; then again, sometimes some bureaucrat needs a paper-trail from here to the moon, just to be able to say to his boss that the evidence has been archived.

Yes, I'd normally bring up the "filter, don't load everything" advice, but that's more appropriate for grids and displaying data than for reporting. For reporting, I'd say that time matters little.

SledgeHammer01 wrote:
If he needs all that data to generate a usable report, that's another story.

He'll be the one to answer that question, and eventually, you'd still end up writing that report. Keyword here is "needs".

SledgeHammer01 wrote:
In that case, he should probably do what another poster suggested and run the stats on the server instead of downloading all the data and doing it on the client.

It would still need databinding, whether you do it on the client or on the server. I'd suggest doing so on the client; I'd hate to see fifty people abusing the server while they have an idle desktop.

SledgeHammer01

22-May-13 6:54

It all depends on his requirements. I'm kind of skeptical he needs to generate *real-time* reports for such massive amounts of data. If he doesn't, I'd have a scheduled job that runs on the server at 12:01am, generates the report for the previous day, and caches it somewhere.

If he needs to generate real-time reports for an arbitrary date range, yeah... there isn't really a way around that. It's going to take 4 to 5 minutes.

It would still be a clever design to have a service that runs at 12:01am, pulls the data for the previous day and caches it somewhere. Perhaps in compressed form :), since you can't compress data over the SQL protocol (well, you can with third-party software, I guess).

For example, his client would then pull 05202013.zip, 05212013.zip and 05222013.zip and concatenate them to cover those 3 days. You'd probably shrink the data from 2.6GB to less than a gig.

You'd have to test it out, of course :), since the time you save pulling the zips might be offset by the time it takes to unzip, concatenate and load them into datasets.

Then again, depending on his application, the nightly service might be able to do some preprocessing steps as well.
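A minimal sketch of the nightly-cache idea, with hypothetical names: the nightly job writes one day's rows as text lines to a compressed file named after the MMddyyyy pattern above, and the client concatenates the files for the requested range. It uses `GZipStream` from System.IO.Compression rather than real .zip archives, and it leaves out how rows are turned into lines.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class DailyCache
{
    // One file per day, e.g. 05202013.csv.gz for 20 May 2013.
    public static string PathFor(string dir, DateTime day) =>
        Path.Combine(dir, day.ToString("MMddyyyy", CultureInfo.InvariantCulture) + ".csv.gz");

    // Nightly job: write one day's rows (already formatted as lines) in compressed form.
    public static void WriteDay(string dir, DateTime day, IEnumerable<string> lines)
    {
        using (var file = File.Create(PathFor(dir, day)))
        using (var gzip = new GZipStream(file, CompressionLevel.Optimal))
        using (var writer = new StreamWriter(gzip, Encoding.UTF8))
        {
            foreach (string line in lines)
                writer.WriteLine(line);
        }
    }

    // Client: concatenate the cached days in [from, to], skipping days without a file.
    public static IEnumerable<string> ReadRange(string dir, DateTime from, DateTime to)
    {
        for (DateTime day = from.Date; day <= to.Date; day = day.AddDays(1))
        {
            string path = PathFor(dir, day);
            if (!File.Exists(path))
                continue;

            using (var file = File.OpenRead(path))
            using (var gzip = new GZipStream(file, CompressionMode.Decompress))
            using (var reader = new StreamReader(gzip, Encoding.UTF8))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                    yield return line;
            }
        }
    }
}
```

Whether this beats a direct query depends, as noted above, on how the decompress-and-load time compares with the transfer time saved.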

Eddy Vluggen

22-May-13 7:15

Good answer :thumbsup:

MumbleB

31-May-13 6:46

Thanks Sledge. I did all of the above before returning to the thread, after doing a whole lot of background work on Postgres databases. The stored procedure definitely helped, and indexing did as well. I am also looking at using a refcursor. But the proof of the pudding will be on Sunday, when I have to run the billing for the month of May. It's just over 2 million records now, and it takes around 2 to 4 minutes to complete everything, including the calculations and creating the PDF output reports.

Excellence is doing ordinary things extraordinarily well.

Pete O'Hanlon

21-May-13 23:06

As has been said, that's a lot of data. Do you really need all that data though? If this is for a report, do you really need all 20 columns? If you reduce what you're retrieving, this should help speed things up.

I was brought up to respect my elders. I don't respect many people nowadays.

CodeStash - Online Snippet Management | My blog | MoXAML PowerToys | Mole 2010 - debugging made easier

MumbleB

31-May-13 6:40

Thanks Pete and all the other guys. I need all the data to generate a summary billing report, so client-side calculations etc. are a must. What I found helpful was creating a stored procedure on the DB; it is a local DB, not a network DB, since this is the concept phase before the business would throw money at a server. I did a heck of a lot of reading up on PostgreSQL, as this is my DB. For now I am at 2 million records and it runs fine: it takes about 2 minutes to collect the data and do all the relevant calculations. Indexing the date column helped performance as well, as I use it to create the report for a specific date range. Just for interest's sake, 2 million records is only one month's worth of data, which I extract from an Oracle database that carries far more data than I need for the billing. So, in short, it works for now. I'll see how the performance degrades as the local DB grows.

Excellence is doing ordinary things extraordinarily well.

Validate textbox (characters)

Member 9912091

21-May-13 7:55

Hi. I would like to validate a textbox so that it contains only words/sentences, not symbols such as < > ! @ etc.

I checked the RegularExpressionValidator, but didn't find a validation expression that rejects symbols.

Is there a validation expression for this? Thanks

Re: Validate textbox (characters)

Richard MacCutchan

21-May-13 21:10

You just need a Regex that matches only alphabetic characters, plus any acceptable punctuation such as period, comma, space, etc.
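A sketch of the kind of pattern described, allowing only ASCII letters, space, period and comma; the class name is hypothetical, and the character class should be widened for any other punctuation you want to accept.

```csharp
using System.Text.RegularExpressions;

public static class TextValidation
{
    // Letters A-Z/a-z plus space, period and comma; anything else (< > ! @ ...) fails.
    // In .NET, $ also matches before a final newline; use \z instead if that matters.
    private static readonly Regex WordsOnly = new Regex(@"^[A-Za-z .,]+$");

    public static bool IsWordsOnly(string input) =>
        input != null && WordsOnly.IsMatch(input);
}
```

The same pattern string, `^[A-Za-z .,]+$`, could go in the RegularExpressionValidator's `ValidationExpression` property. Note that letters outside A-Z, such as é, are rejected by this pattern.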

Use the best guess

Trick to Access derived class method from base

dinesh.17krishnan

21-May-13 1:42

How can a base class access methods of a derived class that are not in the base class? I suppose there is a trick involving generics to achieve this. Can somebody explain the trick, or explain why it's not possible?

OriginalGriff

21-May-13 1:57

No. A Base class cannot (under normal circumstances) access methods of a derived class unless they are implemented in the base class and overridden. Think about it:

using System;

public class Base
    {
    public virtual void Method()
        {
        Console.WriteLine("Base");
        }
    }
public class DerivedA : Base
    {
    public override void Method()
        {
        Console.WriteLine("A");
        }
    }
public class DerivedB : Base
    {
    public override void Method()
        {
        Console.WriteLine("B");
        }
    public void OtherMethod()
        {
        Console.WriteLine("Other method");
        }
    }
public class DerivedC : Base
    {
    public void OtherMethod()
        {
        Console.WriteLine("Other method");
        }
    }

Base can call Method on any instance, because there will always be a Method, even if it is the base class implementation.
But it can't call OtherMethod, because nothing guarantees that it exists in derived classes, and it doesn't exist in DerivedA.

The universe is composed of electrons, neutrons, protons and......morons. (ThePhantomUpvoter)

Keith Barrow

21-May-13 3:59

It might be possible to do this via reflection; I don't know, I haven't tried.
But I've had a good reason for not trying. By doing this you create a dependency from the base class on the derived class, which means the method will most likely go wrong if you call the base class's method from a different subtype. More likely, your hierarchy is wrong: either you need to declare the necessary method in the base type anyway, or you need a third type (probably between the existing two).

If you give us the reason why you want to do this, then we might be able to provide further help.

“Education is not the piling on of learning, information, data, facts, skills, or abilities - that's training or instruction - but is rather making visible what is hidden as a seed”
“One of the greatest problems of our time is that many are schooled but few are educated”

Sir Thomas More (1478 – 1535)

Keld Ølykke

21-May-13 5:36

Hi,

When your base class requires a method implemented by a derived class, you can make the method abstract. This means every derived class is responsible for implementing that method, and the base class may call it.

Not sure what you mean by "I suppose there is a trick related to Implementation of Generics to achieve this.".

Kind Regards,

Keld Ølykke

Dave Kreskowiak

21-May-13 6:05

The "trick" is to go do some research and teach yourself how Object Oriented Programming really works.

There is no special "trick" and Generics have nothing to do with this at all.

A guide to posting questions on CodeProject[^]

Dave Kreskowiak

Bernhard Hiller

21-May-13 20:48

Public methods? Or private/internal/protected?
Similar to Keld's suggestion you can do:

public abstract class MyBase
{
    public virtual void DoSomeThing()
    {
        DoStep1();
        DoStep2();
    }
    protected abstract void DoStep1();
    protected abstract void DoStep2();
}

public class Derived : MyBase
{
    protected override void DoStep1()
    {
        //some code here
    }
    protected override void DoStep2()
    {
        //some code here
        SomeOtherMethod();
    }
    private void SomeOtherMethod()
    {
        //some code here
    }

}

When you call the DoSomeThing() method of the "base" class, the DoStep1(), DoStep2(), and SomeOtherMethod() functions of Derived are called.
Well, actually, you do not call DoSomeThing() of MyBase, but DoSomeThing() of Derived, which was inherited from MyBase.
See also: Template Method[^]

dinesh.17krishnan

21-May-13 22:12

Just found the answer.

C++: Prefer Curiously Recurring Template Pattern (CRTP) to Template Pattern[^]

http://en.wikipedia.org/wiki/Curiously_recurring_template_pattern

You cast the object to the derived type so you can access the derived members.
Sorry, my question might have been a little confusing.
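For reference, the nearest C# cousin of CRTP is a base class that takes its own derived type as a generic argument. A sketch with hypothetical builder classes: the base method casts `this` to `TSelf` and returns it, so code calling it keeps access to members that exist only on the derived class. C# generics are not C++ templates, though: inside the base class you can only call members the constraint guarantees, so the base still cannot call derived-only methods directly, and this is not the performance technique the linked C++ article describes.

```csharp
// Base class generic over its own derived type.
public abstract class VehicleBuilder<TSelf> where TSelf : VehicleBuilder<TSelf>
{
    public string Name { get; private set; }

    // Returns the derived type, so callers can keep chaining derived-only members.
    public TSelf WithName(string name)
    {
        Name = name;
        return (TSelf)this; // only valid if TSelf really is the runtime type
    }
}

public sealed class CarBuilder : VehicleBuilder<CarBuilder>
{
    public int Wheels { get; private set; }

    public CarBuilder WithWheels(int wheels)
    {
        Wheels = wheels;
        return this;
    }
}
```

Usage: `new CarBuilder().WithName("Sedan").WithWheels(4)` compiles because `WithName` returns `CarBuilder` rather than the base type.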

Keld Ølykke

22-May-13 19:35

Please note that your link goes to a performance optimization of the template pattern.

Its purpose is to optimize the performance of pure virtual functions (aka abstract methods) in classes.

My C++ is a bit rusty, so I am not sure why moving the pure virtual function from the .cpp to the .h also changed its visibility from protected to public. If this is required for the performance optimization to work, then this technique trades encapsulation (do you want to allow outside code to call Process?) for better performance.

Any C++ person here to confirm this point?

Kind Regards,

Keld Ølykke