|
Hi guys. I suppose this question has been asked numerous times. I have a PostgreSQL database table with about 20 columns and about 1.4 million records. I select data between two given dates, and it takes about 4 minutes to fill the dataset. I would like to speed this up. I am doing no updates, just using the data to compile a report. Is there a faster way to do this? The code below is what I have. My results are OK, but filling the dataset is a problem in that it takes roughly 4 to 5 minutes.
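// Half-open range: from the start of the first day up to, but not including, the day after the last.
string sql = @"SELECT * FROM datbase WHERE bus_received_time_stamp >= @startDate AND bus_received_time_stamp < @endDate";
NpgsqlConnection conn = new NpgsqlConnection(conns);
// Pass DateTime values so the driver sends timestamps rather than strings.
DateTime fromDate = DateTime.Parse(dtStartDate.Text).Date;
DateTime toDate = DateTime.Parse(dtEndDate.Text).Date.AddDays(1);
NpgsqlCommand cmd = new NpgsqlCommand();
cmd.Connection = conn;
cmd.CommandText = sql;
cmd.Parameters.AddWithValue("@startDate", fromDate);
cmd.Parameters.AddWithValue("@endDate", toDate);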
#endregion
conversionRate = txtConversionRate.Text;
double conRate = Convert.ToDouble(conversionRate);
try
{
conn.Open();
setText(this, "Connection Established");
NpgsqlDataAdapter da = new NpgsqlDataAdapter(cmd);
setText(this, "Collecting Data for Processing!");
da.Fill(dts);
setText(this, "Define Data To be Used");
From here the processing takes a few minutes, as there is quite a bit of matching to do. However, the main issue is that filling the DataSet through the DataAdapter takes too much time. I would like to cut this down to a few seconds, or maybe a minute. I have added an index on the date column in the DB to try to speed things up, but I'm not sure if this is correct.
Any ideas??
Excellence is doing ordinary things extraordinarily well.
|
|
|
|
|
Sounds like you are retrieving a huge dataset and doing calculations in your client code? If these statistics or calculations can be done on the SQL server, it will likely increase performance (because SQL servers are really good at aggregating stats) and it will decrease network load, which I suspect is your problem in the first place. Getting 1.4 million rows from the database will always take a long time because it has to transfer all that data over the network. If you only had one boolean column in your table, 1.4 million rows would be about 1.3MB of data; with 20 columns, your "fill the dataset" step is at least a 27MB download, and probably far more once real column widths are counted, and that's never going to be really fast.
To recap: solutions involve reducing the amount of data you need to transfer. Ideally, only transfer exactly what needs to be seen by the user and nothing more. Even if your query is super fast, transferring data over the network isn't. Your index on the date column is about all you can do to speed up the query, since it's your only selection criterion, but again, finding the data probably isn't the issue.
|
|
|
|
|
Filling the dataset is always going to be the bottleneck in your code.
That being said...
A few "band-aid" fixes:
1) Your query is pretty basic, but you *might* shave off a bit of time by using a stored procedure instead.
2) You *might* shave off a bit of time by indexing the bus_received_time_stamp column
3) Do you have an enterprise grade SQL server? Or are you running it on some re-purposed podunk PC?
4) Are you connecting to the server via gigabit ethernet?
With those band-aid fixes out of the way...:
1) The real cause of your problem is that you are asking for a LOT of data... 1.4M rows x 20 columns = 28M "cells"... if each cell is only 1 byte, that's 28M bytes, or about 26.7MB of data you are asking for. More likely your average cell size is 10 bytes or even 100 bytes. Now you are asking for roughly 267MB to 2.6GB of data.
2) It's ***highly*** doubtful you need all that data. What are you doing with it? Displaying it to a user? That's great... but what's a person going to do with 1.4M rows of data? Nothing... a human can't process that amount of data.
3) Are you doing some kind of calculation on the data? If so, you can likely do that on the server and then only retrieve the result... or you can set it to run as a job at midnight or something and have it cached...
4) You could also page the data if the user really needs all that data which again I find **highly** doubtful.
You really need to define your problem better.
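To make the arithmetic above concrete, here is a minimal sketch that estimates the raw payload for a few average cell sizes. The class and method names are made up for illustration, and the figures ignore protocol and DataSet overhead, so treat the result as a lower bound rather than a measurement.

```csharp
using System;

public static class PayloadEstimate
{
    // Raw payload in bytes: rows x columns x average bytes per cell.
    public static long Bytes(long rows, int columns, int bytesPerCell)
    {
        return rows * columns * bytesPerCell;
    }

    // Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
    public static double ToMiB(long bytes)
    {
        return bytes / (1024.0 * 1024.0);
    }

    public static void Main()
    {
        // 1.4 million rows and 20 columns, as in the original question.
        foreach (int cellSize in new[] { 1, 10, 100 })
        {
            long bytes = Bytes(1400000, 20, cellSize);
            Console.WriteLine("{0,3} bytes/cell: {1:N0} bytes = {2:F1} MiB",
                cellSize, bytes, ToMiB(bytes));
        }
    }
}
```

Selecting only the columns the report uses shrinks the `columns` factor directly, which is why trimming `SELECT *` tends to be the cheapest first step.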
|
|
|
|
|
SledgeHammer01 wrote:
2) It's ***highly*** doubtful you need all that data. What are you doing with it? Displaying it to a user? That's great... but what's a person going to do with 1.4M rows of data? Nothing... a human can't process that amount of data.
Post said it's for a report; but good point
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
|
|
|
|
|
If someone plopped a report with 1.4M rows on it on your desk, what are you gonna do with it? I'd toss it in the recycle bin without even looking at it. If he needs all that data to generate a usable report, that's another story. In that case, he should probably do what another poster suggested and run the stats on the server instead of downloading all the data and doing it on the client.
|
|
|
|
|
SledgeHammer01 wrote:
If someone plopped a report with 1.4M rows on it on your desk, what are you gonna do with it? I'd toss it in the recycle bin without even looking at it.
I'd do that with any report; then again, sometimes some bureaucrat needs a paper-trail from here to the moon, just to be able to say to his boss that the evidence has been archived.
Yes, I'll come up with the "filter, don't load all"-text, but that's more appropriate for grids/displaying than for reporting. For reporting, I'd state that time matters little.
SledgeHammer01 wrote:
If he needs all that data to generate a usable report, that's another story.
He'll be the one to answer that question, and eventually, you'd still end up writing that report. Keyword here is "needs".
SledgeHammer01 wrote:
In that case, he should probably do what another poster suggested and run the stats on the server instead of downloading all the data and doing it on the client.
It would still need databinding, whether you do it on the client or on the server. I'd suggest doing so on the client; I'd hate to see fifty people abusing the server while they have an idle desktop.
|
|
|
|
|
All depends on his requirements. I'm kind of skeptical he needs to generate *real-time* reports for such massive amounts of data. In that case, I'd have a scheduled job that runs on the server at 12:01am and generates the report for the previous day and caches it somewhere.
If he needs to generate real-time reports for an arbitrary date range, yeah... there isn't really a way around that. It's going to take 4 to 5 minutes.
It would still be a clever design to have a service that runs at 12:01am and pulls the data for the previous day and caches it somewhere. Perhaps in compressed form since you can't compress data over the SQL protocol (well, you can with 3rd party software I guess).
Then his client or whatever knows to pull 05202013.zip, 05212013.zip, 05222013.zip and concatenate them for those 3 days, for example. You'd probably shrink the data down from 2.6GB to less than a gig.
You'd have to test it out of course ... the time you save on pulling the zips might be balanced out by the time it takes to unzip + concat + load into datasets.
Then again, depending on his application, the nightly service might be able to do some preprocessing steps as well.
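The per-day cache idea could be sketched roughly as follows. This is only an illustration under assumptions: it uses gzip from `System.IO.Compression` rather than .zip archives, keeps everything in memory instead of on disk, and the `DailyCache` class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class DailyCache
{
    // Compress one day's export (plain text here) into a gzip byte array.
    public static byte[] Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the stream has been closed.
            return output.ToArray();
        }
    }

    // Expand one cached day back into text.
    public static string Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Concatenate the cached days that make up the requested date range.
    public static string LoadRange(IEnumerable<byte[]> days)
    {
        var sb = new StringBuilder();
        foreach (byte[] day in days)
        {
            sb.Append(Decompress(day));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Stand-ins for three nightly exports.
        byte[] day1 = Compress("2013-05-20,row1\n");
        byte[] day2 = Compress("2013-05-21,row2\n");
        byte[] day3 = Compress("2013-05-22,row3\n");
        Console.Write(LoadRange(new[] { day1, day2, day3 }));
    }
}
```

As the post says, whether this wins depends on measuring: the time saved on transfer has to outweigh decompression and loading.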
|
|
|
|
|
Good answer
|
|
|
|
|
Thanks Sledge. I did all the above before returning to the thread, after doing a whole lot of background work on Postgres databases. The stored procedure definitely helped, and so did indexing. I am in the process of looking at using a refcursor as well. But the proof will be in the pudding on Sunday, when I have to run the billing for the month of May. There are just over 2 million records now, and it takes around 2 to 4 minutes to complete everything, calculations and PDF output reports included.
|
|
|
|
|
As has been said, that's a lot of data. Do you really need all that data though? If this is for a report, do you really need all 20 columns? If you reduce what you're retrieving, this should help speed things up.
|
|
|
|
|
Thanks Pete and all the other guys. I need all the data to generate a summary billing report, so client-side calculations are a must. What I have found helpful was creating a stored procedure on the DB. It is a local DB, not a network DB; this is the concept phase, before business would throw money at a server. I did a heck of a lot of reading up on Postgres, as this is my DB. For now I am sitting at 2 million records and it runs fine. It takes about 2 minutes to collect the data and do all relevant calculations. Indexing the date column helped with performance as well, since I use it to create the report for a specific date range. Just for interest's sake, 2 million records is only one month's worth of data, which I am extracting from an Oracle database that carries far more data than I need for the billing. So, in short, it works for now. I will see how the performance degrades as the local DB grows.
|
|
|
|
|
Hi. I would like to validate a textbox so that it only contains text (words/sentences) and no symbols such as < > ! @ etc.
I checked the Regular Expression Validator, but didn't find a validation expression that checks for symbols.
Is there a validation expression for this? Thanks
|
|
|
|
|
You just need a Regex that matches on alphabetic characters, plus any acceptable punctuation such as period, comma, space etc.
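As a rough sketch of that suggestion, the pattern below allows ASCII letters plus space, period, comma, apostrophe and hyphen. The exact set of allowed characters is an assumption, so adjust the character class to your own rules; the `TextValidator` class name is made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TextValidator
{
    // Whitelist: letters, space, period, comma, apostrophe, hyphen.
    // \z anchors at the very end of the input, so a trailing newline is rejected too.
    private static readonly Regex Allowed = new Regex(@"^[A-Za-z .,'\-]+\z");

    public static bool IsValid(string input)
    {
        return input != null && Allowed.IsMatch(input);
    }

    public static void Main()
    {
        foreach (string sample in new[] { "Hello, world.", "<script>", "user@example" })
        {
            Console.WriteLine("{0} -> {1}", sample, IsValid(sample));
        }
    }
}
```

With a RegularExpressionValidator, the character-class part (`[A-Za-z .,'\-]+`) is what would go into the ValidationExpression property, since as far as I know the validator already requires the whole input to match. Whitelisting the allowed characters is usually easier to get right than listing every symbol to reject.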
Use the best guess
|
|
|
|
|
How can a base class access methods of a derived class that are not in the base class? I suppose there is a trick related to generics to achieve this. Can somebody explain the trick, or explain why it's not possible?
|
|
|
|
|
No. A Base class cannot (under normal circumstances) access methods of a derived class unless they are implemented in the base class and overridden. Think about it:
public class Base
{
    public virtual void Method()
    {
        Console.WriteLine("Base");
    }
}
public class DerivedA : Base
{
    public override void Method()
    {
        Console.WriteLine("A");
    }
}
public class DerivedB : Base
{
    public override void Method()
    {
        Console.WriteLine("B");
    }
    public void OtherMethod()
    {
        Console.WriteLine("Other method");
    }
}
public class DerivedC : Base
{
    public void OtherMethod()
    {
        Console.WriteLine("Other method");
    }
}
Base can access Method in any class instance, because there will always be a Method, even if it is the base class implementation.
But it can't access OtherMethod, because it is not defined to exist in derived classes - and doesn't in DerivedA.
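Seen from calling code, the same point could be sketched like this. The classes are redeclared in compact form so the snippet stands alone, and the methods return strings instead of writing to the console purely so the result is easy to inspect; `Demo.Describe` is a hypothetical helper, not part of the code above.

```csharp
using System;

public class Base
{
    public virtual string Method() { return "Base"; }
}

public class DerivedA : Base
{
    public override string Method() { return "A"; }
}

public class DerivedB : Base
{
    public override string Method() { return "B"; }
    public string OtherMethod() { return "Other method"; }
}

public static class Demo
{
    // Method() is available on every Base reference.
    // OtherMethod() is only reachable after checking the runtime type.
    public static string Describe(Base item)
    {
        DerivedB b = item as DerivedB;
        return b != null
            ? item.Method() + " + " + b.OtherMethod()
            : item.Method();
    }

    public static void Main()
    {
        foreach (Base item in new Base[] { new Base(), new DerivedA(), new DerivedB() })
        {
            Console.WriteLine(Describe(item));
        }
    }
}
```

Needing a type check like this inside the base class itself is usually a sign that the method belongs in the base type as a virtual or abstract member.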
The universe is composed of electrons, neutrons, protons and......morons. (ThePhantomUpvoter)
|
|
|
|
|
It might be possible to do this via reflection; I don't know, I haven't tried.
But I've had a good reason for not trying. By doing this you are creating a dependency from the base class to the derived class meaning the method will most likely go wrong if you call the base class's method from a different subtype. The likelihood is that your hierarchy is wrong:- either you need to declare the necessary method in the base type anyway, or you require a third type (probably between the existing two).
If you give us the reason why you want to do this, then we might be able to provide further help.
“Education is not the piling on of learning, information, data, facts, skills, or abilities - that's training or instruction - but is rather making visible what is hidden as a seed” “One of the greatest problems of our time is that many are schooled but few are educated”
Sir Thomas More (1478 – 1535)
|
|
|
|
|
Hi,
When your base class requires an implemented method from a derived class, you can make the method abstract. This means all derived classes have the responsibility of implementing that method, and the base class may call it.
Not sure what you mean by "I suppose there is a trick related to Implementation of Generics to achieve this.".
Kind Regards,
Keld Ølykke
|
|
|
|
|
The "trick" is to go do some research and teach yourself how Object Oriented Programming really works.
There is no special "trick" and Generics have nothing to do with this at all.
|
|
|
|
|
Public methods? Or private/internal/protected?
Similar to Keld's suggestion you can do:
public abstract class MyBase
{
    public virtual void DoSomeThing()
    {
        DoStep1();
        DoStep2();
    }
    protected abstract void DoStep1();
    protected abstract void DoStep2();
}
public class Derived : MyBase
{
    // An override must repeat the access modifier of the abstract member.
    protected override void DoStep1()
    {
    }
    protected override void DoStep2()
    {
        SomeOtherMethod();
    }
    private void SomeOtherMethod()
    {
    }
}
When you call the DoSomeThing() method of the "base" class, the DoStep1() , DoStep2() , and SomeOtherMethod() functions of Derived are called.
Well, actually, you do not call DoSomeThing() of MyBase , but DoSomeThing() of Derived , which was inherited from MyBase .
See also: Template Method
|
|
|
|
|
|
Please note that your link goes to a performance optimization of the template pattern.
It is to performance optimize pure virtual functions (aka abstract methods) in classes.
My C++ is a bit rusty, so I am not sure why the move of the pure virtual function from .cpp to .h also made it change visibility from protected to public. If this is required for the performance optimization to work then this technique trades encapsulation (do you want to allow externals to call Process?) for better performance.
Any C++ person here to confirm this point?
Kind Regards,
Keld Ølykke
|
|
|
|
|
What you probably want to do is define an empty virtual method in the base class, thereby allowing derived classes to override it. The base class simply calls this method. At run time, the derived class's method (if it is defined) will be called.
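A minimal sketch of that empty-virtual-method approach, with made-up class names (`Worker`, `LoggingWorker`) and a list that records the call order so the effect is visible:

```csharp
using System;
using System.Collections.Generic;

public class Worker
{
    protected readonly List<string> Log = new List<string>();

    public IList<string> Run()
    {
        Log.Add("start");
        OnRun();          // hook: does nothing unless a derived class overrides it
        Log.Add("end");
        return Log;
    }

    // Empty virtual method; overriding it is optional.
    protected virtual void OnRun()
    {
    }
}

public class LoggingWorker : Worker
{
    protected override void OnRun()
    {
        Log.Add("derived work");
    }
}

public static class HookDemo
{
    public static void Main()
    {
        Console.WriteLine(string.Join(",", new Worker().Run()));
        Console.WriteLine(string.Join(",", new LoggingWorker().Run()));
    }
}
```

Unlike an abstract method, the empty virtual method does not force every derived class to supply an implementation.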
/ravi
|
|
|
|
|
Hi all, it's my first time here. I am writing an app that will receive voice from a Huawei E1752 modem and save it as a wav file, which is later accessed by the speech SDK for processing. I have searched Google (my best friend) and past articles on this site, and I am able to save something to the wav file, but it's not really the voice I expect. When the speech SDK accesses the file, it raises an error saying the audio file is not a recognizable format. How can I save the audio from the modem in a format that can be recognized by the speech SDK? This is what I have been able to write so far.
thanks.
namespace joaninne
{
public partial class Form1 : Form
{
static SerialPort _SerialPort1;
byte[] buffer;
FileStream file;
public Form1()
{
InitializeComponent();
file = File.Open(@"D:\speechtestfiles.wav", FileMode.Create);
_SerialPort1 = new SerialPort("COM31", 9600, Parity.None, 8, StopBits.One);
_SerialPort1.DtrEnable = true;
_SerialPort1.RtsEnable = true;
_SerialPort1.ReadTimeout = SerialPort.InfiniteTimeout;
_SerialPort1.Open();
_SerialPort1.Write("ATS0=1\r");
buffer = new byte[100 * 1024];
_SerialPort1.DataReceived += new SerialDataReceivedEventHandler(sp_DataReceived);
Thread.Sleep(1000);
_SerialPort1.Close();
file.Close();
file.Dispose();
using (SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine(new CultureInfo("en-US")))
{
Choices appliances = new Choices(new string[] { "fan", "lights" });
Choices Commands = new Choices(new string[] { "on", "off" });
GrammarBuilder gb = new GrammarBuilder();
gb.Append("Please turn the");
gb.Append(appliances);
gb.Append(Commands);
Grammar g = new Grammar(gb);
recognizer.LoadGrammarAsync(g);
recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);
recognizer.SpeechRecognitionRejected +=
new EventHandler<SpeechRecognitionRejectedEventArgs>(recognizer_SpeechRecognitionRejected);
var isReady = false;
while (!isReady)
{
isReady = IsFileReady(@"D:\speechtestfiles.wav");
}
recognizer.SetInputToWaveFile(@"D:\speechtestfiles.wav");
recognizer.RecognizeAsync(RecognizeMode.Multiple);
}
}
private void sp_DataReceived(object Sender, SerialDataReceivedEventArgs e)
{
int x = _SerialPort1.BytesToRead;
_SerialPort1.Read(buffer, 0, x);
file.Write(buffer, 0, x);
}
public static bool IsFileReady(String file)
{
try
{
using (FileStream inputStream=File.Open(file,FileMode.Open,FileAccess.Read,FileShare.None))
{
if (inputStream.Length>0)
{
return true;
}
else
{
return false;
}
}
}
catch (Exception)
{
return false;
}
}
public static void recognizer_SpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
foreach (RecognizedPhrase phrase in e.Result.Alternates)
{
Console.WriteLine(" Rejected phrase: " + phrase.Text);
}
}
public static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
string _detected = (e.Result.Text);
string _recognised1 = " Please turn the lights on";
if (String.Compare(_recognised1, _detected) == 0)
{
PortAccess.Output(888, 1);
}
else
{
}
}
}
}
|
|
|
|
|
samweps wrote:
FileStream file;
Writing a sequence of bytes to a FileStream does not produce a .wav file. You need to research how to create the sound file in the correct format.
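For illustration, a canonical PCM .wav file is the raw sample data preceded by a 44-byte RIFF header. The sketch below builds one in memory. It assumes the bytes coming from the modem are uncompressed PCM, and the 8 kHz, 16-bit, mono values in the demo are assumptions that must be checked against what the modem actually sends; the `WavWriter` class is hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavWriter
{
    // Wrap raw PCM samples in a 44-byte RIFF/WAVE header.
    public static byte[] BuildPcmWav(byte[] pcm, int sampleRate, short channels, short bitsPerSample)
    {
        short blockAlign = (short)(channels * bitsPerSample / 8);
        int byteRate = sampleRate * blockAlign;

        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms))   // BinaryWriter writes little-endian
        {
            w.Write(Encoding.ASCII.GetBytes("RIFF"));
            w.Write(36 + pcm.Length);          // file size minus the first 8 bytes
            w.Write(Encoding.ASCII.GetBytes("WAVE"));
            w.Write(Encoding.ASCII.GetBytes("fmt "));
            w.Write(16);                       // size of the fmt chunk for PCM
            w.Write((short)1);                 // audio format 1 = uncompressed PCM
            w.Write(channels);
            w.Write(sampleRate);
            w.Write(byteRate);
            w.Write(blockAlign);
            w.Write(bitsPerSample);
            w.Write(Encoding.ASCII.GetBytes("data"));
            w.Write(pcm.Length);               // number of sample bytes that follow
            w.Write(pcm);
            w.Flush();
            return ms.ToArray();
        }
    }

    public static void Main()
    {
        byte[] silence = new byte[16000];      // one second of 8 kHz, 16-bit mono silence
        byte[] wav = BuildPcmWav(silence, 8000, 1, 16);
        Console.WriteLine("WAV size: {0} bytes", wav.Length);
    }
}
```

Because the header stores the data length, the simplest approach is to buffer the received bytes until the call ends and write the header and data in one go.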
|
|
|
|
|
I'm doing some long-term tests on my driver and have experienced an error relatively quickly; after only 23 hours of operation, my log shows the following error:
System.Net.Sockets.SocketException (0x80004005): An existing connection was forcibly closed by the remote host
Now, does this mean that my instrument for some reason dropped the connection or could it perhaps mean that 'the network' has misbehaved ?
I guess I should add a method to try to re-establish the connection when this kind of exception is thrown, but any thoughts anybody has would be welcome. Others must have experienced this kind of thing too. I will run Wireshark on the connection in a moment to see if I can glean anything more, because I can ping the instrument and get timely replies.
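A reconnect-and-retry wrapper of the kind described could look roughly like this. It is a sketch only: `Reconnect.WithRetry` and its parameters are hypothetical, and the driver's real read and reconnect logic would be passed in as the two delegates.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;

public static class Reconnect
{
    // Runs 'operation'. On a SocketException it waits, calls 'reconnect' and tries
    // again, rethrowing the exception once maxAttempts has been reached.
    public static T WithRetry<T>(Func<T> operation, Action reconnect, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (SocketException)
            {
                if (attempt >= maxAttempts)
                {
                    throw;
                }
                Thread.Sleep(delay);
                reconnect();
            }
        }
    }

    public static void Main()
    {
        int calls = 0;
        // Simulated instrument read that fails twice before succeeding.
        string reply = WithRetry(
            () =>
            {
                calls++;
                if (calls < 3)
                {
                    throw new SocketException((int)SocketError.ConnectionReset);
                }
                return "OK";
            },
            () => Console.WriteLine("reconnecting..."),
            5,
            TimeSpan.FromMilliseconds(100));
        Console.WriteLine("{0} after {1} calls", reply, calls);
    }
}
```

Logging each caught exception before retrying keeps the long-term test log useful for finding the underlying cause.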
UPDATE______________________________________________________
OK, so this problem was the result of the laptop having gone to sleep. I have disabled any chance of it getting its head down in the future! Thanks for your thoughts though.
modified 21-May-13 11:45am.
|
|
|
|
|