Download catch_es.zip - 6.45 KB
Introduction
When accesses a mount of data, process often exits abnormally because it's memory is over 2.9G bytes. There are two common methods to solve this problem.
- Use database systems (for example BDB) to save the data. When wants to use a data, the process read a data from the DB, operate the data , and write it back to the DB.
- Use file to save the data. When wants to use a data, the process read a data from the file, operate the data , and write it back to the file.
Both of the methods will slow down the performance a process because they operate file systems frequently. This article gives a method which apply the MRU algorithm of the cache to save the frequent accessing data in process and the file systems to save the infrequent accessing data out of process. The main advantage of this method is high performance and can break the memory limitation of a process.
Background
This algorithm is very simple. It maintains a hash map and a bidirectional list to keep the frequent accessing data in the process memory, and the newest accessing data will push the inactivest data out of the process memory. The pushed out data will be saved in a file. The data structure of the process is as follow:
In order to map the key in the bidirectional list and the key in the hash table fast , the program replace the key in the bidirectional list with the pointer of the iterator of the the hash map.
The procedure is as follow:
A: Insert a data
- If the size of hash table is not over the setting value, the hash table will add a row. At the same time the Bata Block will allocate a unused data space and adjust the new key to the head of the bidirectional list.
- If the size of hash table is over the setting value, program will get the key of the tail of the bidirectional list, write the data of the key to the file and the vacated data space will be allocateded to the new data. And then program will adjust the tail of the bidirectional list to it's head.
B: Delete a data
- If the key of the deleted data is in the hash table, program will set the boolean of IsReleased to true and delete the corresponding data in the file.
- If the key of the deleted data is not in the hash table, program will delete the corresponding data in the file.
C: Access a data
- If the key of the accessing data is in the hash table, program will get the index of the data in the data block and adjust the key to the head of the bidirectional list. At last program will return the pointer of the data in the data block.
- If the key of the accessing data is not in the hash table. program will get the last key of the bidirectional list, write the data of the last key to the file. after the last steps there will be a vacated data space for future use. And then program read the wanted data from the file and put it to the vacated data space. Last program adjust the bidirectional list to change the tail to head and return the pointer of the data space which is put wanted data.
From the above description we can see that use the offset of the data in a file as the data key is an easy way. In another word every data will have it's own space in the file whenever it is in the memory of the process.
Because the program needs high performance and the file may be the memory of the computer, controling the size of the file and reducing the times of accessing the file are main factors in design file structure. In order to reduce the times of accessing the file, I advice that program do accessing operation of the file one time when gets or puts the data from the cache. In the source code only getting the data can cause reading and writing operation of the file. In order to control the size of the file, program adopts the rule of <bdo id="io-z0">redistributing </bdo>the used vacated data space in the file first and allocating the new data space in the file secondarily. In order to obey the rule. The file structure is as follow:
The read parts represent the vacated data space in the file and green parts represent the data space in using. Every vacated data space uses 4 bytes to record the offset of the next vacated data space in the file. The program only know the offset of the first vacated data space in the file, it can redistribute all vacated data space first.
Limitation of The Program
- The data in the cache must be bigger than 4 bytes.
- The data must be the same size.
- The limitation of the file is about 4G.
- not thread safety.
Environment
windows , Linux
gcc , vc
Using the Code
- Get data with adjusting frequency
CFileManager_EqualSize tempFileManager_ES; tempFileManager_ES.createAFile("/var/tempdatafile", sizeof(int));
tempFileManager.setblocksize(10000); int aintpos = tempFileManager.malloc(0); int **intptr;
tempFileManager.read(aintpos, 0, (void**)intptr); **inteptr = 10; tempFileManager.deleteMe();
- Get data without adjusting frequency
CFileManager_EqualSize tempFileManager_ES; tempFileManager_ES.createAFile("/var/tempdatafile", sizeof(int));
tempFileManager.setblocksize(10000); int aintpos = tempFileManager.malloc(0); int **intptr;
tempFileManager.read_nsort(aintpos, 0, (void**)intptr); **inteptr = 10; tempFileManager.write(aintpos, 0, intptr)
tempFileManager.deleteMe();
Using the Code
For Chinese Language