Introduction
When programming modules are processing huge volumes of data stored in the RAM, data storage structure affects RAM consumption and performance. Using more primitive data types, structures instead classes, native data instead structures, can be used to economize computer’s resources. This approach breaks down the OOP and returns to applying “old” programming methods. However, in some cases, such optimization is capable of handling many problems. A simple study of the issue has proved the possibility to reduce memory consumption up to three times.
The following issues are covered in this article:
- Impact of software architecture on memory consumption and performance
- Difference in running in 32 and 64-byte modes
- Difference between pointers and indexes of array
- Influence of intra-class/structure data alignment
- Influence of CPU cache on performance
- Cost assessment related to supporting OOP in high-level programming languages
- Recognizing the need to take into account the low-level features of a platform even when using high-level languages
Background
We initially applied this method for developing a new Optimal Path Finder for the portal www.GoMap.Az. The newly-created algorithm consumed more RAM than the previous one and, as a result, the application began to jam after being installed on the test server. Upgrading the hardware in this case required several days, while optimizing RAM consumption by the software enabled to solve the problem more quickly. In this article, we share our experience and describe in simple words what actions have been implemented and what benefits have been gained.
Arranging the storage of a large number of data structures and access to them is an essential problem for information systems working with geographical data. Such kind of problems tend to occur when developing other types of modern information systems.
Let’s review data storage and access on example of roads – edges of the graph. In a simplified form, the road can be represented by class Road
and the container of the roads by class RoadsContainer
. Moreover, there is class Node
representing node of the graph. Regarding the Node
, we should only know that it is a class. Let’s assume that data structures do not contain methods and are beyond inheritance relations, in other words, they are used only for storing and manipulating data.
Herein, we are discussing the method using C# language, although originally it had been applied on ?++. More specifically, the problem and its solution is lying within the domain of system programming. However, the study also showed how high the costs of OOP supporting can be. C# can be the best way to show these hidden costs, while not being a system programming language.
public class Road
{
public float Length;
public byte Lines ;
public Node NodeFrom;
public Node NodeTo;
}
public class RoadsContainer
{
public Road[] getRoads(float X1, float Y1, float X2, float Y2)
{
}
}
RAM and Performance
While assessing the performance and memory consumption, the features of the platform architecture should be considered, including:
- Data alignment. The alignment is made for the correct and rapid access of the CPU to the memory cells. Thus, depending on the CPU type, the location of the classes/structures in the memory can start at an address that is a multiple of 32 or 64. Within the classes/structures fields can also be aligned on 32-, 16- or 8- byte boundary (for example, field Lines in class
Road
can take in memory 4 bytes but not 1 byte). In this way, the unused memory space increases the memory consumption.;
- CPU cache. As it is known, the basic purpose of CPU cache is to provide faster access to frequently used memory cells. The size of the cache is very small, as it is the most expensive memory. When handling classes/structures, the unused memory space which appear as a result of data alignment also goes through the processor cache and clogs it, while not carrying any useful information. This reduces the effectiveness of caching.
- Pointer size. On 32-bit systems, a pointer to an object in memory is also usually 32-bit which restricts the ability to work with RAM above 4GB. 64-bit systems can address much more memory, while using 64-bit pointers. Objects always have a pointer to them (otherwise it is a memory leak or object is listed to be removed by the garbage collector). In our example fields
NodeFrom
and NodeTo
of class Road
will occupy 8 bytes each in a 64-bit system and 4 bytes each in a 32-bit one.
As a rule, the compiler tries to generate the most efficient code it can, but only software-architectural solutions enable to achieve the highest effectiveness.
Arrays of Objects
Data can be stored in various containers - lists, hash tables, etc. Storage in an array is possibly the most simple and popular way, so we decided to consider this method. Other containers can be studied in a similar way.
In C#, arrays of objects actually store references to objects, while every object occupies its own separate address space in the heap. This enables to easily manipulate sets of objects, since you are working with pointers instead of full objects. Thus, in our example function getRoads
of class RoadsContainer
transmits a set of specific objects of the class Road
conveniently – that is, by references instead of copying objects' internals. This behavior occurs because the objects in C# are reference data types.
The disadvantage of storing objects as arrays is primarily additional storage costs for object pointers and alignment of each object in the heap. On 64-bit platforms, each pointer takes 8 bytes of memory and each object is aligned at an address that is a multiple of 8.
Arrays of Structures
Classes designed to store roads and nodes can be converted into the structures (as mentioned in our example, there are no restrictions from the part of the OOP). Integer indexes can be used instead of pointers to the objects. The resulting code would be:
public struct Road
{
public float Length;
byte Lines ;
Int32 NodeFrom;
Int32 NodeTo;
}
public class RoadsContainer
{
Road[] Roads;
public Int32[] getRoads(float X1, float Y1, float X2, float Y2)
{
}
public Road getRoad(Int32 Index)
{
return Roads[Index];
}
}
public class NodesContainer
{
Node []Nodes;
public Node getNode (Int32 Index)
{
return Nodes[Index];
}
}
What is the result? Below we discuss in detail.
The roads are stored here as structures (C# struct
s), but not as objects. Array of Road
s in class RoadsContainer
is used for storage. getRoad
function of the same class is used to access individual structures. 32-bit integer index assumes the role of a pointer to the data structure of a particular road. The nodes and the storage class NodesContainer
are converted similarly.
Using a 32-bit index instead of a 64-bit pointer leads to reduce memory consumption and simplifies operations with it. Using indexes to refer to nodes by fields NodeFrom
and NodeTo
in Road
structure would reduce the memory consumption of the structure by 8 bytes (if the alignment equals to 32, 16, or 8 bits).
Memory allocation for storing roads runs by one call (by calling operator “new
”). The road structures are being created at the same time when an array is created. In the case of storing references to objects, each object must be created separately. Creating of a separate object not only requires time, but also consumes a certain amount of overhead memory for alignment, registering an object in the heap, and in the garbage collection system.
The disadvantage of using structures instead of objects is, strictly speaking, inability to use pointers to structures (structure is a value type, but the class is reference type). This fact leads to a restriction in manipulating sets of objects. Therefore, function getRoads
of class RoadsContainer
now returns the indexes of the structures in the array. At the same time, function getRoad
provides the structure. However, this function would copy the entire returned structure, which would lead to increased memory traffic and CPU time.
Arrays of Primitive Types
Arrays of structures can be converted into arrays of individual fields of this structure. In other words, the structure can be decapsulated and abolished. For example, after decapsulation, and the abolition of structure Road
, we would have the following code:
public class RoadsContainer
{
float[] Lengths;
byte[] Lines;
Int32[] NodesFrom;
Int32[] NodesTo;
public Int32[] getRoads(float X1, float Y1, float X2, float Y2)
{
}
public float getRoadLength(Int32 Index)
{
return Lengths[Index];
}
public byte getRoadLines(Int32 Index)
{
return Lines[Index];
}
public Int32 getRoadNodeFrom(Int32 Index)
{
return NodesFrom[Index];
}
public Int32 getRoadNodeTo(Int32 Index)
{
return NodesTo[Index];
}
}
What is the result? Below we discuss in details.
Instead of storing entire structures in a single array, individual fields of the structure are stored in different arrays. Access to the fields also produced separately by the index.
Memory waste because of the alignment of fields inside the structure is eliminated as the data of primitive types are stored close to each other. The memory is requested from the operating system not by one big block for storing all structures at once, but by several blocks for storing arrays of fields respectively. To a certain extent, such division is useful as for the system it is often easier to deliver several fairly small portions of continuous memory than one large continuous segment.
Access to each field requires the use of an index each time, while access to the entire structure requires the use of the index only once. In practice, this feature can be seen both as a disadvantage and as an advantage. Treatment of only a portion of fields, for example, only three fields Lengths
, NodesFrom
, and NodesTo
would optimize the use of CPU cache if they are located in separate arrays. Using all advantages of cache depends on the data access algorithm, however, in any case the advantage can be obvious.
Garbage Collection and Memory Management
The given problem is related to memory management. After all, the location of objects in memory affects access time. Nowadays, there is a large number of ways of organizing memory management, including the system of automatic garbage collection. These automated systems not only monitor the memory cleaning, but also defragment memory (like a file system).
Memory management systems work mainly with the pointers to objects located in the heap. In the case of an array of structures/fields memory management would not be able to work with the elements of the arrays and all the work related to their creation and destruction would lie on the shoulders of the programmer. Thus, the use of arrays of structures/fields in a certain sense deactivates the garbage collector for them. This restriction depending on the application could be considered as an advantage or disadvantage.
Measures
To evaluate benefits of the decapsulation, a small test has been performed. The source code for the test is located here. For the test, arrays for storing 10 million roads have been created, read, and written. The test was run in both 32-bit and 64-bit modes. It should be mentioned that the 32-bit mode can easily overflow the memory when working with large amounts of data. Although the 32-bit mode for server and desktop systems nowadays are seldom used, for mobile systems the 32-bit mode is currently essential. Therefore, the assessment of indicators is used for both modes.
Memory
Memory consumption, 32-byte mode:
Memory consumption, 64-byte mode:
As you can see, the most wasteful storage is an array of objects. At the same time, on a 64-bit system, storage cost rises sharply. The storage in the form of an array of structures or fields in general is equally expensive for the 32 and 64-bit mode. Storing in the form of arrays of fields has some benefits with respect to the amount of memory, but this gain is not critical. This gain includes the costs of data alignment within structures.
Access Time
Notes:
* - the number is too small comparing to the time measure accuracy.
The access time to data structures when storing objects in the array shows the biggest time consumption. At the same time, there is a quicker access to the fields if they are located in separate arrays. This acceleration is the result of a more efficient use of the CPU cache. It is worth mentioning that within the test access is being implemented through continuous reading/writing the array elements, and in this case the use of the cache is close to optimal.
Conclusion
- Avoiding the use of OOP when working with large amounts of data can lead up to 3 times more efficient use of RAM on 64-bit systems, and 2 times on 32-bit systems. This problem arises because of special aspects of hardware architecture, so, to some extent, it affects all programming languages.
- The access time in C# by means of the array index can be substantially less than the access time by means of a pointer to the object.
- Increasing of a programming technology level is resource-intensive. Low level (system) and working with primitive data (obtained after decapsulation of classes/structures) uses least resources, but requires more lines of source code, and more programmer’s efforts.
- Working with primitive data types is an optimization of code. Therefore, this architecture can be used not as the initial design but as the measures to reduce consumption of resources if needed.
- In C++, many considered problems may be solved transparently, but C# hides its underlying realization. Moreover, when C# is studied, influence of the platform often is not considered.
- In C#, structures should be considered to use instead of classes whenever possible.
References