|
Yes. No diff.
And she just drove away with it...
She says "We always have it plugged in on the desk, we never take it anywhere."
So I say "Why don't you get a desktop?"
|
|
|
|
|
Most likely something wrong with the motherboard... on laptops it's usually not worth trying to replace. Recommend they get something else, then move any data over for them. It's easy to mount a drive on another machine even if it's bootable; you can do it either after you boot, or just make sure you have the proper boot order.
|
|
|
|
|
It seems like my hard disk is running too slow. Perhaps it started after Windows 7 SP1 was installed a couple of months ago; I don't know. I just know it is painful to boot. Painful to copy huge directories around. Painful to load large apps and projects and such. Anyone else have this problem?
I have Windows 7 64-bit, a quad-core Xeon, 6 GB RAM, 2 WD SATA II 500 GB hard disks, etc. I ran WINSAT (right-click CMD, choose Run as Administrator, then run the following from the Vista/W7 command prompt):
winsat disk -read -ran -ransize 4096 -drive c
winsat disk -write -ran -ransize 4096 -drive c
winsat disk -read -ran -ransize 524288 -drive c
winsat disk -write -ran -ransize 524288 -drive c
And got this:
D:\>winsat disk -read -ran -ransize 4096 -drive c
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-read -ran -ransize 4096 -drive c'
> Run Time 00:00:10.59
> Disk Random 4.0 Read 0.39 MB/s
> Total Run Time 00:00:11.31
D:\>winsat disk -write -ran -ransize 4096 -drive c
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-write -ran -ransize 4096 -drive c'
> Run Time 00:00:03.39
> Disk Random 4.0 Write 1.44 MB/s
> Total Run Time 00:00:04.06
D:\>winsat disk -read -ran -ransize 524288 -drive c
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-read -ran -ransize 524288 -drive c'
> Run Time 00:00:26.38
> Disk Random 512.0 Read 20.25 MB/s
> Total Run Time 00:00:27.50
D:\>winsat disk -write -ran -ransize 524288 -drive c
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-write -ran -ransize 524288 -drive c'
> Run Time 00:00:15.24
> Disk Random 512.0 Write 43.47 MB/s
> Total Run Time 00:00:16.11
Any idea if that seems normal? Is your system faster? Thanks!
|
|
|
|
|
Here are my results (Seagate Barracuda 7200.11, 1 TB, SATA II):
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-read -ran -ransize 4096 -drive c'
> Run Time 00:00:12.51
> Disk Random 4.0 Read 0.34 MB/s
> Total Run Time 00:00:13.65
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-write -ran -ransize 4096 -drive c'
> Run Time 00:00:02.86
> Disk Random 4.0 Write 2.00 MB/s
> Total Run Time 00:00:03.88
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-read -ran -ransize 524288 -drive c'
> Run Time 00:00:19.05
> Disk Random 512.0 Read 27.93 MB/s
> Total Run Time 00:00:20.12
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-write -ran -ransize 524288 -drive c'
> Run Time 00:00:08.03
> Disk Random 512.0 Write 74.69 MB/s
> Total Run Time 00:00:09.08
You should first check that the disk is not heavily fragmented; get a disk defragmenter that gives you a visual idea of the fragmentation spread (Auslogics is free, I use it).
Heavy fragmentation can present as exactly the symptoms you are seeing.
|
|
|
|
|
Thanks. That's pretty much the same as my tests. I thought it would be a lot faster.
I'm running Diskeeper, so there's essentially zero fragmentation on the drive.
I don't understand why it is so slow at smaller sizes. Larger reads and writes are a lot faster (over 100 MB/s once you get big enough).
But as developers, we have source trees and such with TONS of small files in them. So copying a folder from one drive to another takes FOREVER. Just emptying the recycle bin can be a painful process if you throw about 10,000 small files in it.
What happens is that the CPU (8 cores) ends up sitting around with not much to do, because it is constantly waiting on the hard disk.
It just seems like there's something horribly wrong with that, and there's nothing that can be done about it. I wonder if a RAID array would be faster? Because right now, my girlfriend's el cheapo Vista box gets better results than my super expensive, tricked-out dev box.
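If you want to put a number on the small-file penalty, here's a minimal sketch (Python; the C:\temp paths and the 10,000-file count are placeholders, not a recommendation) that writes the same 40 MB payload once as a single file and once as 10,000 small files; on a spinning disk the second run is dramatically slower because every small file costs seeks plus metadata updates:

import os, time

def write_files(path, count, size):
    # Write `count` files of `size` bytes each; return elapsed seconds.
    os.makedirs(path, exist_ok=True)  # path is a placeholder directory
    data = b"x" * size
    start = time.time()
    for i in range(count):
        with open(os.path.join(path, "f%d.bin" % i), "wb") as f:
            f.write(data)
    return time.time() - start

# Same total payload: one 40 MB file vs 10,000 x 4 KB files.
big = write_files(r"C:\temp\big", 1, 40 * 1024 * 1024)
small = write_files(r"C:\temp\small", 10000, 4 * 1024)
print("1 x 40 MB: %.1fs   10,000 x 4 KB: %.1fs" % (big, small))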
|
|
|
|
|
chimera967 wrote: we have source trees and such with TONS of small files in them
There you have the reason.
chimera967 wrote: I wonder if a RAID array would be faster?
Hint: SSD.
|
|
|
|
|
|
Quite right. Can't do that on my laptop though.
|
|
|
|
|
Okay, thanks. Do you know if people actually use SSD for dev boxes?
|
|
|
|
|
I do.
It's the biggest single hardware improvement I've made since getting a hard drive as such (instead of a floppy).
|
|
|
|
|
Cool.
|
|
|
|
|
You will almost always see a significantly lower transfer rate for smaller file sizes.
Look at the graph at the bottom of my article about my NAS box; you will see a much lower transfer rate for small block sizes compared to larger block sizes.
QNAP NAS Memory Upgrade, Hardware Change and Performance Benefits[^]
Also in the article, I have included some benchmark software I use and an Excel spreadsheet template I use for tracking benchmarks between mods, etc.
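If anyone wants to reproduce that block-size effect without extra software, here is a rough sketch (Python; the test-file path is a placeholder, and note that the OS cache will flatter repeated runs) that reads one large existing file with different block sizes and reports the throughput:

import time

def read_throughput(path, block_size):
    # Sequentially read `path` in `block_size` chunks; return MB/s.
    total = 0
    start = time.time()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total / (1024.0 * 1024.0) / (time.time() - start)

for size in (4 * 1024, 64 * 1024, 512 * 1024):
    # C:\temp\testfile.bin is a placeholder for any large existing file.
    print(size, "bytes:", round(read_throughput(r"C:\temp\testfile.bin", size), 1), "MB/s")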
|
|
|
|
|
(hope this forum is the right one, then)
Hi all,
I'm about to begin a small project in which I must be able to store and look up as many as 20 million files - in the best possible way. Needless to say, fast.
For this I have looked around:
http://en.wikipedia.org/wiki/NTFS#Limitations
http://www.ntfs.com/ntfs_vs_fat.htm
And now my question: dealing with a production load of around 60,000 files (pictures) per day, each around 300 KB in size, what ratio of files to directories would give the best search time? Obviously I will not put all the files in one directory, but in a number of dirs. So what would be the best economy for such a thing?
It seems to be hard to find information about this on the web.
Thanx' in advance,
Kind regards,
Michael Pauli
|
|
|
|
|
Hi,
1.
I tend to limit the number of files per folder to 50 or 100. In my experience it is not very relevant if you never need to browse the folder with, say, Windows Explorer; when your app knows which file to access, it does not matter. If you can group the files logically (say, by topic), then by all means do so. OTOH, if you have to open the folder in Explorer, especially on a remote computer, things may slow down considerably when the folder holds hundreds of files/folders or more. If so, use a two-stage or three-stage organization; with a maximum of N files per folder, that can hold N*N or N*N*N files (see the sketch after this list).
2.
Search what? File content? File names? Partial file names? If file names, then again, organize a multi-level folder hierarchy based on what matters most to you (it could be the first and second characters of the file names).
3.
Whatever it is you really need, just give it a try. In a matter of minutes a test app could create and store a huge number of files (real or dummy), and you could experiment with the result.
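To make points 1 and 3 concrete, a minimal sketch (Python; the root path, the split on the first two characters, and the dummy-file count are all assumptions for illustration) that derives a two-stage folder from each file name and generates dummy files to experiment with:

import os

def file_path(root, name):
    # Two-stage layout keyed on the first two characters of the name,
    # e.g. "042137.jpg" -> root/0/4/042137.jpg
    sub = os.path.join(root, name[0], name[1])
    os.makedirs(sub, exist_ok=True)
    return os.path.join(sub, name)

# Throwaway test app: create 100,000 dummy files, then experiment.
for i in range(100000):
    with open(file_path(r"D:\teststore", "%06d.jpg" % i), "wb") as f:
        f.write(b"\0" * 300)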
PS: I'm sure all this is in the wrong forum, it isn't hardware related, is it?
|
|
|
|
|
Seriously, use a database instead. You're losing very little storage space and winning so much on the lookup. At least if it's properly indexed.
|
|
|
|
|
Hi Jörgen,
I totally agree with your comment, but my customer wants to use a filesystem and not an Oracle DB, etc. I really don't understand why, but it is something about maintenance and backup, I'm told.
Kind regards,
Michael Pauli
|
|
|
|
|
Yeah, that's utter bullshit.
Your customer is going to find that method non-performant and limited, as well as very easy to screw up while doing "maintenance".
The more files and directories you shove into the directory structure, the slower a single search is going to get. Indexing won't help much as the indexes will be limited to the properties of the files themselves as well as the metadata stored in the image files.
The more files and directories you add, the more the NTFS data structures grow and grow, eventually taking up gigabytes of space and slowing your machine's boot time; and if something should happen to those tables, God help you when performing a CHKDSK on it. Bring a cot to sleep on.
The backup argument is also garbage, as it's just as easy to back up a database as it is to back up the massive pile of debris you're about to litter the drive with.
|
|
|
|
|
Very nice and clear summary.
|
|
|
|
|
Dave Kreskowiak wrote: The more files and directories you shove into the directory structure, the slower a single search is going to get. Indexing won't help much as the indexes will be limited to the properties of the files themselves as well as the metadata stored in the image files.
Not sure I understand that.
I am rather certain that both MS SQL Server and Oracle provide a file-based blob storage mechanism. And of course using a URL string for a blob entry is an option for any database. There are tradeoffs as to whether one wants to keep it in the database or the file system.
And it isn't that hard to implement at least a simplistic indexing scheme if one doesn't want to use a database. That requires using another file; it doesn't require searching the files themselves. And if one were using a database, then one would still have to export the metadata from the files. If one didn't, then I wouldn't be surprised if attempting to extract metadata from image blobs were slower with a database.
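As a concrete (if deliberately naive) sketch of what I mean by "another file": a single tab-separated index mapping a key to a path, so lookups never have to touch the images themselves. The format and function names here are made up for illustration:

def add_entry(index_path, key, file_path):
    # Append one "key<TAB>path" line to the index file.
    with open(index_path, "a") as idx:
        idx.write("%s\t%s\n" % (key, file_path))

def lookup(index_path, key):
    # Linear scan; for a hot index, load it into a dict once instead.
    with open(index_path) as idx:
        for line in idx:
            k, path = line.rstrip("\n").split("\t", 1)
            if k == key:
                return path
    return None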
Dave Kreskowiak wrote: The more files and directories you add, the more the NTFS data structures grow and grow, eventually taking up gigabytes of space and slowing your machine's boot time
What does the storage requirement have to do with anything? If you store something in a database it takes space too.
I have never heard anyone make that claim about any OS slowing down. Could you provide a reference?
|
|
|
|
|
jschell wrote: Not sure I understand that.
In order to search 20,000,000 files and have a request return something in your lifetime, you better have the Indexing service turned on and your app better be using it.
Check the OP. He's specifically avoiding using a database because of stupid customer requirements.
jschell wrote: What does the storage requirement have to do with anything?
The size of the NTFS tables on disk grows and grows with the number of files and folders you stick in the volume. Directory entries take up space on the disk.
Not so much if you put everything into a database since the database is only a few files.
jschell wrote: I have never heard anyone make that claim about any OS slowing down. Could you provide a reference?
Don't have to. Think about it. The NTFS tables take up memory. The bigger you make those tables, the more memory that's going to be eaten up and the less available for apps. Of course, what effect this has depends on how much memory is in the machine.
I meant to say that the server will take longer and longer to boot, not necessarily slow down the app once everything is loaded and running.
You want documentation? Try it yourself. Load up your C: drive with 20,000,000 files in a few thousand folders, reboot your machine and watch what happens. To take it a bit further, try scheduling a CHKDSK and reboot. Don't forget to have a pot of coffee standing by.
|
|
|
|
|
Dave Kreskowiak wrote: In order to search 20,000,000 files and have a request return something in your lifetime, you better have the Indexing service turned on and your app better be using it.
Searching the image data of 20 million blobs in a database is going to take just as long, and probably longer.
The only way to avoid that in the database is to extract the metadata from the images and store it somewhere else in the database.
And again, one can do exactly the same thing with a file-based system.
Dave Kreskowiak wrote: The size of the NTFS tables on disk grows and grows with the number of files and folders you stick in the volume. Directory entries take up space on the disk.
The size of the database on the disk grows with the number of blobs you stick in it.
So how exactly is that different?
Dave Kreskowiak wrote: Don't have to. Think about it. The NTFS tables take up memory. The bigger you make those tables, the more memory that's going to be eaten up and the less available for apps. Of course, what effect this has depends on how much memory is in the machine.
That isn't how any modern file system works.
It doesn't load the entire file system into memory. As a matter of fact, the database is going to load more into memory than the file system will - quite a bit more, unless you constrain it.
Not that it would matter anyway, since it would be using virtual memory.
Dave Kreskowiak wrote: I meant to say that the server will take longer and longer to boot
That clarifies it for me - I don't believe that. Please provide a reference - one that specifically refers to booting the machine.
(Since I was interested, I also determined that I have over 500,000 files on my personal development computer. If there were in fact some impact, then I would certainly expect a server-class machine with a server-class file system to handle more files than a personal dev box.)
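(For anyone curious how I got that number, a quick count along these lines works - a sketch, starting from C:\ here:)

import os

# os.walk silently skips directories it cannot read.
total = sum(len(files) for _, _, files in os.walk("C:\\"))
print(total, "files")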
|
|
|
|
|
Hi Dave!
Thank you for your opinion. I must say I tend to go your way here, but to avoid any problems of a more political nature I'll go for the file system solution. In my career I've never done a thing like this, and I find it hard to write - even though it's simplistic by nature.
To begin with we go for 500 directories, each holding 500 subdirs, each holding 500 subdirs. That is 500³ = 125,000,000. I'll have a server for this, so it's not on my local dev PC.
My feeling is that we would be better off with an Oracle DB or the like. But the decision is made.
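For a 500-wide, three-level tree like that, the storage path can be derived directly from a running picture id; a minimal sketch (Python; the root and the .jpg naming are placeholders), which never overflows for ids below 500³ = 125,000,000:

import os

def path_for(root, pic_id):
    # Split the id base-500 into three directory levels,
    # e.g. id 123456789 -> root/493/413/289/123456789.jpg
    a = pic_id // (500 * 500) % 500
    b = pic_id // 500 % 500
    c = pic_id % 500
    return os.path.join(root, str(a), str(b), str(c), "%d.jpg" % pic_id)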
Thanx' again.
Kind regards,
Michael Pauli
|
|
|
|
|
Maintenance and backup are among the best reasons to use a database.
Tell them to educate their staff.
Dave's summary is spot on, in my opinion.
|
|
|
|
|
Yeah, sure - I agree, but some technicians here would like to have this file-based rather than put it in a database, for some more or less obscure reasons. So they get what they want. I have less than a week left on this assignment... if you get my point
Kind regards,
Michael Pauli
|
|
|
|
|
Jörgen Andersson wrote: You're losing very little storage space and winning so much on the lookup.
+5; there's probably little worry about fragmentation, as the pictures do not change, and it'd be the fastest solution for retrieving a blob.
Bastard Programmer from Hell
|
|
|
|