Introduction
This article discusses the pros and cons of data storage for files v/s blobs in databases while designing data store application systems.
Background
For most of the web applications, this remains a common design question “whether the images should be stored in the database or in a file system?” In the web based apps like shopping cart, the site has to deal with a number of product images and the best choice is to store the images in the database together with the meta data. However, having said that, there are pros and cons for both the approaches.
While designing any data store applications, Sys Architect should take the right decision on this by taking into account the behavior of the proposed systems in a broader perspective.
As everyone knows, the IO operation is a bit faster and hence the file retrieval is when we use the file servers for storage whereas the database as the document or file store will deliver a central repository of data for the organizations and hence data management is more efficient and hence more under control. A database may follow good entity modelling and have a logical design, but if the application and users cannot access and manipulate the BLOB data stored in it efficiently, the database is not effective.
BLOBs require database designers to understand how the application will access and use the BLOBs in practice. Spend time designing how the BLOB objects will be represented in the database. How you lay out the large objects within the database affects both the response time and throughput of the application.
Advantages and Disadvantages
BLOB Storage as the Best Solution
- For better scalability. Although file systems are designed to handle a large number of objects of varying sizes, say files and folders, actually they are not optimized for a huge number (tens of millions) of small files. Database systems are optimized for such scenarios.
- For better availability. Database servers have availability features that extend beyond those provided by the file system. Database replication is a set of solutions that allow you to copy, distribute, and potentially modify data in a distributed environment whereas Log shipping provides a way of keeping a stand-by copy of a database in case the primary system fails.
- For central repository of data with controlled growth. DBA has the privilege to control and monitor the growth of database and split the database as and when needed.
- For full-text index and search operations. You can index and search certain types of data stored in BLOB columns. When a database designer decides that a table will contain a BLOB column and the column will participate in a full-text index, the designer must create, in the same table, a separate character-based data column that will hold the file extension of the file in the corresponding BLOB field. During the full-text indexing operation, the full-text service looks at the extensions listed in the character-based column (.txt, .doc, .xls, etc.), applies the corresponding filter to interpret the binary data, and extracts the textual information needed for indexing and querying.
File System Storage as the Best Solution
- For the application in which the images will be used requires streaming performance, such as real-time video playback.
- For applications such as Microsoft PhotoDraw® or Adobe PhotoShop, which only know how to access files.
- If you want to use some specific feature in the NTFS file system such as Remote Storage.
History
- 7th August, 2008: Initial post