|
Hi,
What do you mean by overkill?
Does this HDFS have limitations, and does it not locate processing power near the data?
I have no personal experience with this product.
As summarized:
Hadoop is an Apache Software Foundation distributed file system and data management project with goals for storing and managing large amounts of data. Hadoop uses a storage system called HDFS to connect commodity personal computers, known as nodes, contained within clusters over which data blocks are distributed. You can access and store the data blocks as one seamless file system using the MapReduce processing model.
HDFS shares many common features with other distributed file systems while supporting some important differences. One significant difference is HDFS's write-once-read-many model that relaxes concurrency control requirements, simplifies data coherency, and enables high-throughput access.
In order to provide an optimized data-access model, HDFS is designed to locate processing logic near the data rather than locating data near the application space.
It sounds promising.
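For intuition, the MapReduce idea can be sketched in a few lines of C# LINQ. This is only a conceptual illustration of the map/shuffle/reduce stages, not Hadoop's actual API:

using System;
using System.Linq;

class WordCount
{
    static void Main()
    {
        // Stand-in for data blocks spread across HDFS nodes.
        string[] lines = { "to be or not to be", "to do is to be" };

        var counts = lines
            .SelectMany(line => line.Split(' '))   // map: emit each word
            .GroupBy(word => word)                 // shuffle: group by key
            .Select(g => new { Word = g.Key, Count = g.Count() }); // reduce

        foreach (var c in counts)
            Console.WriteLine("{0}: {1}", c.Word, c.Count);
    }
}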
|
|
|
|
|
Mercurius84 wrote: What do you mean by overkill...Hadoop is an Apache Software
Foundation distributed file system and data management project with goals for
storing and managing large amounts of data.
Your stated requirements do not meet the definition of "large amounts of data".
Let me give you some examples of large data:
- 2000 transactions a second sustained, with an expected lifetime of 7 years and a real-time need of 6 to 18 months of immediate availability. Each transaction is 1 KB in size.
- Each originator will produce several 100 MB downloads several times a month. Sizing must allow for up to 10,000 originators with a lifetime of 5 years.
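For scale: the first example alone works out to 2000 transactions/s x 1 KB, which is about 2 MB/s sustained, roughly 60 TB per year, or over 400 TB across the 7-year lifetime.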
|
|
|
|
|
|
|
Hi Merc,
It just hit me when reading this post that your problem is similar to the problem of splitting a large file into chunks, for example when you have a big archive on disk and want to store it on removable media such as a floppy, CD-ROM, or DVD.
In the old days we used ARJ to split a compressed file into a number of volumes (.arj, .a01, .a02, etc.). Each volume had a fixed maximum size, e.g. 1.44 MB for a floppy.
A limitation was that you could not add/remove stuff from .a02 without breaking the big file, so if you need to do this, splitting big files into compressed volumes might not be your solution.
If you want to play with this approach you can use RAR. According to http://acritum.com/software/manuals/winrar/html/helparcvolumes.htm this feature is called multivolume archives.
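If you just want to play with the chunking idea itself, here is a minimal C# sketch that splits a file into fixed-size volumes (the file names and the volume size are just placeholders):

using System;
using System.IO;

class FileSplitter
{
    const int VolumeSize = 1440 * 1024; // e.g. 1.44 MB per volume, like a floppy

    static void Main()
    {
        byte[] buffer = new byte[VolumeSize];
        int volume = 0, read;
        using (FileStream input = File.OpenRead("bigarchive.bin"))
        {
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Each chunk goes to its own numbered volume file.
                string name = string.Format("bigarchive.{0:000}", volume++);
                using (FileStream output = File.Create(name))
                {
                    output.Write(buffer, 0, read);
                }
            }
        }
    }
}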
Kind Regards,
Keld Ølykke
|
|
|
|
|
Many thanks.
But this process only works once the archive has been finalized.
I do not think it would be possible to append more files after the files have been compressed and segmented.
I also have a requirement that additional files can be added after the compression and segmentation above.
|
|
|
|
|
Normally, I don't think further modification of a multi-volume archive is possible, but I don't have the insight to say for sure. There might be an archive tool out there that can do it.
Kind Regards,
Keld Ølykke
|
|
|
|
|
Mercurius84 wrote: I just want to compress the files and package them into one container for a configurable size.
Windows has compressed drives. Pretty sure every major OS does as well.
However, your requirements still don't create that need. Again, your requirements don't require a specialized system; current file systems are more than capable of handling that trivial amount of data.
|
|
|
|
|
Yes.
However, we need to further shrink the file size and get better file handling,
instead of relying on the OS.
|
|
|
|
|
Mercurius84 wrote: However, we need to further shrink the file size and get better file handling.
And you base that on what, exactly?
What are your criteria? What is your desired improvement? How did you measure against those criteria using the file system?
How do you escape the fact that any such solution will STILL rely on the file system?
Mercurius84 wrote: Instead of OS handling.
The OS has been optimized to handle files, given that file systems are a key component of desktop OSes.
|
|
|
|
|
I had assumed that programming could do 'almost' wonderful things.
For example:
I have a file sized 10 MB.
This special program shrinks it down by ~30% (to 7 MB, for example)
and then stores it in a container.
What I mean by OS handling is something other than a normal compression step that stores the result in a folder.
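Something like this minimal C# sketch is what I have in mind (assuming .NET 4.5's System.IO.Compression; the file names are placeholders, and the actual ratio depends entirely on the data, so ~30% is not guaranteed):

using System.IO;
using System.IO.Compression; // .NET 4.5; also reference System.IO.Compression.FileSystem

class ContainerDemo
{
    static void Main()
    {
        // Open (or create) the container and append a compressed entry.
        // ZipArchiveMode.Update allows adding entries later, which a
        // finalized multi-volume archive would not.
        using (FileStream container = new FileStream("store.zip", FileMode.OpenOrCreate))
        using (ZipArchive archive = new ZipArchive(container, ZipArchiveMode.Update))
        {
            archive.CreateEntryFromFile("input.dat", "input.dat", CompressionLevel.Optimal);
        }
    }
}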
|
|
|
|
|
Mercurius84 wrote: This special program shrinks it down by ~30% (to 7 MB, for example) and then stores it in a container.
Fine - but why do you need to do that? What is the business or technical need that requires this?
Mercurius84 wrote: What I mean by OS handling is something other than a normal compression step that stores the result in a folder.
Not sure what you mean by that - as I already said, desktop OSes already support compression. So that point by itself is moot.
|
|
|
|
|
Hi,
Actually, our company wants to develop something similar to a COLD system.
But more generically... we are building a new product based on this concept...
|
|
|
|
|
Mercurius84 wrote: similar to a COLD system.
No idea what that is, but if it is a static storage system, then your stated requirements are still nowhere close to justifying creating and marketing such a system.
Just to make it clear again: your stated requirements do not present a need for anything.
I can only presume that either you work in a highly unusual environment (an unstated requirement) or the stated file count/size does not reflect the actual need (another unstated requirement).
|
|
|
|
|
|
Hi,
We are in the process of implementing a new eCommerce platform, which would replace our existing Commerce Server 2007 installation.
We are looking at some solutions provided out there. Does anyone have any suggestions?
Thanks!!
|
|
|
|
|
Start by collecting requirements for what the system needs to do.
And also size your business needs.
|
|
|
|
|
Hi, I want to start studying process-based software.
I don't know where to start. Can you give me the names of the courses to take, in order?
For example: Workflow, BPMN, ...
Thanks
|
|
|
|
|
I'm fairly new to writing unit tests and I've run into something of a design/architecture question.
I'm operating under the belief that any given method should not be overly large, and any that is should be refactored into smaller methods.
As a result I end up with this:
public class MyClass : IMyClass
{
    public string MyMainMethod()
    {
        // The public entry point delegates to the private helpers.
        Method1();
        Method2();
        return "somestring";
    }

    private void Method1()
    {
    }

    private void Method2()
    {
    }
}
Obviously this is grossly simplified, but it serves my purposes for this question. I have my classes loosely coupled, so when writing a unit test I can stub any interface that is injected into MyClass and isolate the code under test.
How do I go about stubbing Method1 and Method2? I'm using Moles (and can't change because of company restrictions), but I suspect/hope this is a platform-independent testing question.
Should I be designing this differently? Using public virtual methods instead of private ones would allow me more flexibility, but that doesn't feel like the right approach.
Any advice or pointers would be greatly appreciated.
- Andrew
|
|
|
|
|
Andrew, there are two schools of opinion on this. In one school, you'd effectively elevate the visibility of these methods purely for the purpose of testing. In the second school, you wouldn't directly test these methods. Instead, you'd test the public method that calls them. In other words, if there is no route to your private method, then it shouldn't exist.
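To make the second school concrete with your example, a minimal MSTest sketch might look like this (no stubbing of Method1 or Method2 at all; they are covered through the public entry point):

using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class MyClassTests
{
    [TestMethod]
    public void MyMainMethod_ReturnsExpectedString()
    {
        // Exercising the public method also exercises the private helpers.
        var sut = new MyClass();
        Assert.AreEqual("somestring", sut.MyMainMethod());
    }
}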
|
|
|
|
|
That makes sense. I guess I was trying to test the private methods individually, separately from the public method that calls them. It sounds like you're saying I could just test them through tests on the public method.
This feels cleaner to me; I really didn't like the idea of elevating them just for tests. I made them private for a reason, and changing that just for a test felt wrong.
Thanks for the feedback, Pete.
|
|
|
|
|
If there is some need to verify the code of those functions, I'd use either:
- protected instead of private, and create a subclass for testing, or
- reflection to invoke the private methods (see the sketch below).
But testing is normally about public functions only, and you may have found a situation where that stringency is not fully appropriate.
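For the reflection option, here is a minimal sketch against the MyClass example above (the binding flags are the important part):

using System;
using System.Reflection;

class ReflectionInvokeDemo
{
    static void Main()
    {
        var instance = new MyClass();
        // NonPublic | Instance is what reaches a private instance method.
        MethodInfo method = typeof(MyClass).GetMethod(
            "Method1", BindingFlags.NonPublic | BindingFlags.Instance);
        method.Invoke(instance, null); // null = no arguments
    }
}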
|
|
|
|
|
Hi,
If you want to know how much of your private code is tested by your unit tests, there are tools to measure test coverage, e.g. NCover.
Such a tool may report which code lines are not hit by tests.
Usually, each assembly is assigned a minimum coverage that must be reached. 100% coverage can actually be very difficult to reach.
Getting the coverage feedback can be a useful experience.
It can also make you more conscious of how your coding style makes testing easier or harder; branches and exception paths are usually up for discussion.
Kind Regards,
Keld Ølykke
|
|
|
|
|
Hi,
I am looking for a way to solve my problem in a proper way:
I have a pool of objects as input (let's name them p).
I have a lot of different decision criteria (let's name them c).
I now want to sort those p with weighted c to get the best decision about the ordering of p.
I am thinking about a system where the c are objects derived from a base class (name it Criteria). I want to be able to weight those c in a way that I am not sure of yet.
Maybe it would be good to create a directed graph with those weights and let a search algorithm like Dijkstra's run over it and get back the best result.
Sorry for the confusing description; I can't get it sorted in my brain.
I think I am not the first one with a problem like this, and hopefully there are some design concepts/patterns to solve such a problem without using if-else.
Another question: is there some design concept where the system can learn from its results and maybe start weighting those c on its own, based on earlier results?
I appreciate any input from you!
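As a starting point, here is a minimal sketch of the weighted-criteria idea in C#. All names besides Criteria are illustrative; a plain weighted sum is probably the simplest thing that could work before reaching for graphs or Dijkstra:

using System.Collections.Generic;
using System.Linq;

// Base class for one decision criterion: it scores a candidate
// and carries a weight expressing its importance.
abstract class Criteria<T>
{
    public double Weight { get; set; }
    public abstract double Score(T candidate); // ideally normalized to 0..1
}

class Ranker<T>
{
    private readonly List<Criteria<T>> criteria;

    public Ranker(IEnumerable<Criteria<T>> criteria)
    {
        this.criteria = criteria.ToList();
    }

    // Sort the pool p by the weighted sum of all criterion scores, best first.
    public IEnumerable<T> Rank(IEnumerable<T> pool)
    {
        return pool.OrderByDescending(
            p => criteria.Sum(c => c.Weight * c.Score(p)));
    }
}

For the self-learning question, one common direction is to keep this structure and adjust the Weight values from feedback on earlier results, e.g. with simple perceptron-style updates; at that point it becomes a small machine-learning problem.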
|
|
|
|
|