Speed of search comes through complexity of code. The more indexes you build, the faster it will be, but the more memory it will use.
Christian Graus
Driven to the arms of OSX by Vista.
"Iam doing the browsing center project in vb.net using c# coding" - this is why I don't answer questions much anymore. Oh, and Microsoft doesn't want me to.
Don't load it into memory. Use a file stream and appropriate indexes, and it will stay fast as the file grows close to, and beyond, the amount of RAM available to the application, especially on embedded devices.
Need software developed? Offering C# development all over the United States, ERL GLOBAL, Inc is the only call you will have to make.
Happiness in intelligent people is the rarest thing I know. -- Ernest Hemingway
Most of this sig is for Google, not ego.
One possibility is to use SQL Server Compact Edition and, over a constantly open connection, query potential words from the database. This would ease the index building.
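For illustration, a minimal sketch of the kind of lookahead query I mean; the Words table, the Word column, and the connection and typedText variables are my assumptions, not a given:

// Requires System.Data.SqlServerCe. 'connection' is the constantly open
// SqlCeConnection; 'typedText' is what the user has typed so far.
using (SqlCeCommand cmd = new SqlCeCommand(
    "SELECT Word FROM Words WHERE Word LIKE @prefix ORDER BY Word",
    connection))
{
    cmd.Parameters.AddWithValue("@prefix", typedText + "%");
    using (SqlCeDataReader reader = cmd.ExecuteReader())
    {
        for (int i = 0; i < 10 && reader.Read(); i++)   // first 10 suggestions
            Console.WriteLine(reader.GetString(0));
    }
}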
The need to optimize rises from a bad design.
My articles[ ^]
Here's a suggestion.
Assuming the file is sorted so the words are in alphabetical order, you can treat it as an array of words and use the Seek method to do a binary search.
There are a few caveats: e.g. I think you need to use a BufferedStream, and it might mean padding words with trailing spaces so you can calculate each record's offset.
Just a thought - perhaps not completely practical.
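To make it concrete anyway, here is a rough sketch. It assumes the file is ASCII, sorted, and padded so every word occupies a fixed 32-byte record; the record length and file layout are my assumptions, not a given:

using System;
using System.IO;
using System.Text;

class SortedWordFile
{
    const int RecordLength = 32;   // assumed fixed width, padding included

    // Binary search over fixed-width records in a sorted file.
    static bool Contains(string path, string word)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            byte[] buffer = new byte[RecordLength];
            long lo = 0, hi = (fs.Length / RecordLength) - 1;

            while (lo <= hi)
            {
                long mid = lo + (hi - lo) / 2;
                fs.Seek(mid * RecordLength, SeekOrigin.Begin);
                fs.Read(buffer, 0, RecordLength);
                string candidate = Encoding.ASCII.GetString(buffer).TrimEnd();

                int cmp = string.CompareOrdinal(candidate, word);
                if (cmp == 0) return true;
                if (cmp < 0) lo = mid + 1;
                else hi = mid - 1;
            }
            return false;
        }
    }
}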
One way to do this would be to split the words up into smaller chunks, and then have *pointers* to keep them together. Consider this small file:
Adrian
Andrea
Andrew
Anthony
Brian
Charles
William
Winston
This could be tokenised like this:
Ad ri an
An dr ea
      ew
   th on y
Br ia n
Ch ar le s
Wi ll ia m
   ns to n
As you can see, the list of choices narrows quite dramatically, the further on you get, and the information becomes quite easy to traverse. In this example, the user types in A and gets a choice of 4 entries. As soon as they press n, it breaks down to 3. Pressing d narrows it down to 2, and they keep going until they get to the end (or choose one out of your selection).
The downside to this approach is that the actual splitting of the words is the time-consuming part of the process, but if your solution allows you to pre-parse them into smaller units up front, the results can be quite dramatic.
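As a quick sketch of the narrowing effect, using the sample list above (this just filters by prefix rather than walking a pre-parsed chunk structure):

using System;
using System.Collections.Generic;

class Narrowing
{
    static void Main()
    {
        string[] words = { "Adrian", "Andrea", "Andrew", "Anthony",
                           "Brian", "Charles", "William", "Winston" };

        // Simulate the user typing "And" one keystroke at a time.
        List<string> candidates = new List<string>(words);
        string typed = "";
        foreach (char key in "And")
        {
            typed += key;
            candidates = candidates.FindAll(
                w => w.StartsWith(typed, StringComparison.OrdinalIgnoreCase));
            Console.WriteLine("'{0}' -> {1} candidates", typed, candidates.Count);
        }
        // Prints: 'A' -> 4, 'An' -> 3, 'And' -> 2, matching the walkthrough.
    }
}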
That's a great answer, dude!
Read it and slap it in a tree structure so you can traverse quickly through the possibilities.
Try SQLite[^], a file-based SQL database engine, then use normal SQL queries to fetch the required data. It should be much faster.
A database has a lot of overhead, which you can avoid with your own data structure. I suggest a tree structure where each level takes you one letter farther in the word:
The root will have 26 children, for the 26 possible first letters. Each of these children will have up to 26 children of its own (grandchildren of the root) for the up to 26 possible second letters, and so on.
1. It saves space because all words sharing a common prefix will use the same path from the root, giving you some compression.
2. It's faster than a database because you don't have to do any time-consuming queries; at each node you have a list of all the possible next characters.
3. When building this tree from your word list, you can increment a counter for each letter added at the current node. This will give you the frequencies for each continuation letter. You can then use these frequencies to predict the most likely continuation.
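A minimal sketch of such a tree; the class and member names are mine, and persistence and error handling are left out:

using System.Collections.Generic;

class TrieNode
{
    public Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
    public Dictionary<char, int> ContinuationCount = new Dictionary<char, int>();
    public bool IsWord;
}

class Trie
{
    private readonly TrieNode root = new TrieNode();

    public void Add(string word)
    {
        TrieNode node = root;
        foreach (char c in word)
        {
            if (!node.Children.ContainsKey(c))
                node.Children[c] = new TrieNode();

            int count;
            node.ContinuationCount.TryGetValue(c, out count);
            node.ContinuationCount[c] = count + 1;   // frequency of this continuation

            node = node.Children[c];
        }
        node.IsWord = true;
    }

    // Returns the node reached by walking 'prefix', or null if nothing matches.
    public TrieNode Find(string prefix)
    {
        TrieNode node = root;
        foreach (char c in prefix)
            if (!node.Children.TryGetValue(c, out node))
                return null;
        return node;
    }
}

The node returned by Find carries the ContinuationCount map, so the most likely continuation (point 3 above) is simply the entry with the highest count.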
Here is the first part of an XML file I am trying to parse.
<?xml version="1.0" encoding="utf-16" ?>
<GPO xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.microsoft.com/GroupPolicy/Settings">
<Identifier>
<Identifier xmlns="http://www.microsoft.com/GroupPolicy/Types">{89AEAFFE-E1F8-4786--6BE2D991625E}</Identifier>
<Domain xmlns="http://www.microsoft.com/GroupPolicy/Types">FQDN of Domain</Domain>
</Identifier>
<Name>Citrix_policy</Name>
<CreatedTime>2008-03-06T23:08:05</CreatedTime>
<ModifiedTime>2008-08-27T20:49:49</ModifiedTime>
<ReadTime>2008-10-24T01:29:41.40625Z</ReadTime>
.....
</GPO>
I am trying to use XPathNavigator, but the following code returns the whole XML document:
XPathDocument document = new XPathDocument(stream);
XPathNavigator nav = document.CreateNavigator();
XPathNodeIterator node = nav.Select("/GPO/Identifier/Identifier");
string test = node.Current.InnerXml.ToString();
MessageBox.Show(test);
What am I doing wrong?
Try calling MoveNext on the XPathNodeIterator after calling Select; the iterator isn't positioned on a node until then. Also, if you have simple queries, you can use SelectNodes directly on an XmlDocument.
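A sketch of the corrected pattern follows. Note also that because the GPO document declares default namespaces, the plain path /GPO/Identifier/Identifier matches nothing; you have to map prefixes with an XmlNamespaceManager (the prefix names gp and t are my choice):

using System.Xml;
using System.Xml.XPath;

XPathDocument document = new XPathDocument(stream);
XPathNavigator nav = document.CreateNavigator();

// Map the default namespaces declared in the GPO file to prefixes.
XmlNamespaceManager ns = new XmlNamespaceManager(nav.NameTable);
ns.AddNamespace("gp", "http://www.microsoft.com/GroupPolicy/Settings");
ns.AddNamespace("t", "http://www.microsoft.com/GroupPolicy/Types");

XPathNodeIterator it = nav.Select("/gp:GPO/gp:Identifier/t:Identifier", ns);
while (it.MoveNext())                  // Current is only valid after MoveNext
{
    MessageBox.Show(it.Current.Value);
}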
The need to optimize rises from a bad design.
My articles[ ^]
Thanks for pointing me in the right direction; I have figured out how to do a simple query with XPath. Now the next step...
I have a large XML doc that I am parsing. It has two big sections, each with its own node in the doc. What will tell me when I have reached the end of a node, so I can kill my loop?
Thanks
If you're using XPathNodeIterator.MoveNext, it will return false when no more nodes are found.
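For example, a minimal sketch, assuming a layout where each big section is its own element (the element names here are hypothetical):

// Scoping the Select to one section means MoveNext returns false
// exactly when that section's nodes run out.
XPathNodeIterator sectionNodes = nav.Select("/root/info/*");
while (sectionNodes.MoveNext())
{
    Console.WriteLine(sectionNodes.Current.Name);   // nodes inside <info> only
}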
The need to optimize rises from a bad design.
My articles[ ^]
A while loop will read through the whole doc:
while (nodesText.MoveNext())
{
    // code here
}
Here is an example of the XML layout:
<root>
<info>
...
<settings>
...
<adv settings="">
...
<br mode="hold" />
I need to be able to tell when I reach the end of each node (info, settings, adv settings, ...) so I can exit my loop. Everything I have tried so far either reads the whole doc and returns the text values, or gets static path text. I can use XPath to start each loop, but I'm not sure how to get the loop to stop where I want it to stop.
Hope this makes sense...
Can you post an example XML doc and your code? I'll try to reproduce the problem.
The need to optimize rises from a bad design.
My articles[ ^]
I will send it to you once I have cleaned the XML file of all confidential data.
Thanks for your help.
No problem
However, I must go and get some sleep, so I'll check on this thread again tomorrow.
If possible, use just a small portion of the XML doc. Basically, I believe we need just one node (at the correct level) with a few similar descendants to track down the problem.
The need to optimize rises from a bad design.
My articles[ ^]
Hi,
I'm working on a solution to boost Log4Net performance in our ASP.Net web applications. I have found many posts about using Log4Net's "AsyncAppender". I made a sample web application to test this solution, but my test results show that the log methods take twice as long to execute when I use the AsyncAppender.
Log4Net.config:
<appender name="AsyncAppender"
          type="SampleWebApplication.Appender.AsyncAppender, SampleWebApplication">
  <appender-ref ref="RollingLogFileAppender"/>
  <!-- <appender-ref ref="AsyncAppender"/> -->
</appender>

<root>
  <level value="ALL"/>
  <appender-ref ref="RollingLogFileAppender"/>
</root>
C# Code:
Stopwatch objTimer = new Stopwatch();
objTimer.Reset();
objTimer.Start();

for (int n = 0; n <= 10000; n++)
{
    log.Debug("This is a debug message");
}

objTimer.Stop();
Console.WriteLine("Log Ticks :{0}", objTimer.ElapsedTicks);
Results:
Using "RollingLogFileAppender":
29,187,893,440 ticks --> 9,122 ms
28,473,901,664 ticks --> 8,898 ms
28,302,560,368 ticks --> 8,845 ms
28,439,245,696 ticks --> 8,888 ms
Using "AsyncAppender":
56,301,661,280 ticks --> 17,595 ms
55,775,842,640 ticks --> 17,431 ms
56,351,447,984 ticks --> 17,611 ms
I'm using VS.Net 2008 and .Net 3.5 for this application.
Do you have any idea about this problem? Is there any other solution to call log methods asynchronously?
Regards,
Farzad Badili
10,000 log messages in roughly 10,000 milliseconds: that's about a thousand log messages per second. I would suggest that if this is causing you performance problems, you are simply writing too many log entries.
Try looking at where and when you are writing to the log, and decide whether it's actually beneficial to log data at that point. Surely you're just going to generate so much data that you'll never actually be able to analyse it all.
Simon
Personally I think you are completely right, but at this moment my mission is boosting the log performance! This is what I'm paid for!
Ahh. I see.
If you are a contractor, then yes, you don't have much choice, just do as asked. It's probably easier.
If, on the other hand, you are a junior programmer just doing what your manager tells you, do consider raising your concerns. Do it very carefully: a lot of managers don't like being told they are wrong. Try to phrase it as questions, and don't, whatever you do, criticise any existing code; some people can be very defensive about the code they write. Ultimately though, if the manager is any good, you will be recognised as being more aware of the big picture.
Good luck.
Simon
That depends on the implementation of Log4Net. Async operations run on a background thread, and creating new threads is an expensive operation; the performance hit is probably there. If Log4Net is creating its own threads, then you've found the problem. If it's already using the managed thread pool, that runs a bit quicker, since the threads are already created and just waiting around for some code to run.
Also, you're not going to get better performance than just calling the blocking method of the library: all async operations have the overhead of setting up the background worker, which the equivalent blocking call doesn't have to go through.
I found a very fast solution (20 ms, more than 400 times faster), but I'm not sure whether it is a good one or not.
This is my sample code:
public class AsyncLogger
{
    private static readonly log4net.ILog log =
        log4net.LogManager.GetLogger(System.Reflection.MethodBase.GetCurrentMethod().DeclaringType);

    public void Log(string message)
    {
        ThreadPool.QueueUserWorkItem(new WaitCallback(AsyncLog), message);
    }

    private void AsyncLog(object message)
    {
        log.Debug(message.ToString());
    }
}

objTimer.Reset();
objTimer.Start();

AsyncLogger objLogger = new AsyncLogger();

for (int n = 0; n <= 10000; n++)
{
    objLogger.Log("This is a debug message");
}

objTimer.Stop();
Console.WriteLine("Log Ticks :{0}", objTimer.ElapsedTicks);
What do you think of this solution?
What you are doing here is potentially a bit risky.
What happens when you call ThreadPool.QueueUserWorkItem is that the method is queued on the thread pool and the main thread continues immediately. Then, when a thread in the pool becomes free, it does the requested work.
By default there are no threads in the pool, and every time work is requested one is created, up until a max (which I think is about 25 by default).
What you are doing is adding 10000 requests to the pool very quickly. So 25 threads will be created. You are then left with 9975 jobs left in the thread pool queue. These will wait until the first 25 begin to finish, then the threads will start processing them.
There are three things to be aware of with this approach.
1) When you reach the objTimer.Stop(); line, your thread pool is still processing the tasks. If you killed your app at this point without letting the thread pool finish, not all the log entries would be written.
2) You cannot guarantee that the tasks will be performed in the order you added them to the thread pool. If the order of the log entries is important, this solution is no good. (Although it's a queue, all that means is that the jobs get started in the correct order; a job could be pre-empted and switched out before it has written its log entry, so a different entry could get written first.)
3) You should note that the logging still takes the same amount of time; it's just being done away from the main thread, so you aren't timing it. This means that if your main thread adds log requests to the thread pool faster than the pool can process them, the queue just gets longer and longer and the pool can never keep up. Eventually it will fall so far behind that log entries are processed ages after they were originally added, and the queue could even fill up memory if it grows too large. (Also, the same amount of CPU time is used. On a multi-CPU machine the work may be done on a different CPU, but if your main app is already maxing out all the CPUs, this isn't going to improve anything; all that will happen is that the main app's threads will have to run slower to let the thread pool work.)
Ok. So that's the bad news. However, I have an idea that might help you.
Assuming you are happy with processing on a background thread as a means of improving 'performance', you could use a producer/consumer queue.
The idea is that you have a queue of tasks: you add new tasks to the back, and a single thread processes tasks from the front. It has to be just one thread to prevent the tasks' order getting swapped (similar to using the thread pool, but restricted to one thread). Then you also add a size restriction: when the queue reaches its maximum size, any attempt to add to it blocks until the queue drains a bit. You could set the restriction to a few thousand, so your app would normally add tasks to the queue really quickly, but if the queue ever got full it would slow up to let the queue clear a little.
This will only really help if your app makes a lot of log requests in a short period of time, then stops for a bit. That way the log requests won't slow it up during the busy period, and they can be processed in the background while the main app idles.
(This is fairly tricky to implement correctly, so do some research and check out existing producer/consumer queue implementations; there are plenty of examples on Google. The tricky bits are locking the queue correctly during enqueue and dequeue operations, and making the consumer thread wait correctly when the queue is empty.)
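As a starting point for that research, here is a minimal sketch of such a bounded queue (this is pre-.NET 4, so no BlockingCollection; the names, the capacity, and the Console.WriteLine stand-in for log.Debug are all my choices):

using System;
using System.Collections.Generic;
using System.Threading;

class BoundedLogQueue
{
    private readonly Queue<string> queue = new Queue<string>();
    private readonly int capacity;
    private bool shutdown;

    public BoundedLogQueue(int capacity)
    {
        this.capacity = capacity;
        // A single foreground consumer thread: the process won't exit
        // until Shutdown() is called and the queue is drained.
        new Thread(Consume).Start();
    }

    public void Enqueue(string message)
    {
        lock (queue)
        {
            while (queue.Count >= capacity)   // full: block the producer
                Monitor.Wait(queue);
            queue.Enqueue(message);
            Monitor.PulseAll(queue);          // wake the consumer
        }
    }

    public void Shutdown()
    {
        lock (queue)
        {
            shutdown = true;
            Monitor.PulseAll(queue);
        }
    }

    private void Consume()
    {
        while (true)
        {
            string message;
            lock (queue)
            {
                while (queue.Count == 0 && !shutdown)
                    Monitor.Wait(queue);      // empty: consumer sleeps
                if (queue.Count == 0)
                    return;                   // shut down and fully drained
                message = queue.Dequeue();
                Monitor.PulseAll(queue);      // wake any blocked producer
            }
            Console.WriteLine(message);       // stand-in for log.Debug(message)
        }
    }
}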
[Edit: Good luck]
Simon
Dear Simon,
You have answered my question on log4net very clearly. Thanks for your detailed explanation.
I have reached the same conclusions you mentioned. Using threads is always risky and complicated. Personally I always try to avoid using multi-threaded solutions as much as possible!
Best Regards,
F. Badili