Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / desktop / Win32

Pre-allocated Arrayed List

4.67/5 (6 votes)
21 Aug 2011CPOL14 min read 55.8K   462  
List that combines the advantage of array and linked-list and has better performance

Introduction

Two different data structures used for storing elements are array and linked-list. An array needs to be pre-allocated before use and is limited by its storage capacity. A linked-list is created and its nodes are allocated memory during the run time. The usage of linked-list results in efficient usage of memory as the nodes that form the list are allocated dynamically. However, in the long run after a series of allocation and de-allocation operations, memory fragmentation occurs. An advantage of array over linked-list is that it is easy to use, maintain and the retrieval of an element at a particular position is simple and fast.

There are some disadvantages though. During addition operation in an array, all the elements that come after the position of insertion of the new element need to be moved forward by 1 position. For deletion of an element, all elements that come after the element to be deleted should be moved back by 1 position.

Adding and deletion of elements at a particular position in a linked list consists of two operations:

  • If the position is greater than 0 (i.e. if it is not the first element in the list), find the element that comes before the point of addition/deletion.
  • Updating the link of the element (that comes before the element in question) to point to the next of the next element or to point to the new element in case of deletion and addition operations respectively. In the latter case, the next of the new node also needs to update to point to next node from the original element (previous to the insertion) at that position. In case of deletion, the memory allocated for the deleted element is freed.

Hence, the deletion and addition operations are more efficient in linked-list than in an array. A disadvantage of linked-list is that for retrieving an element, we need to traverse all the elements occurring before that element in the list. The pre-allocated array list tries to combine some of the advantages of array and linked-list while eliminating most of their disadvantages. The pre-allocated linked- list tries to eliminate the main disadvantage of linked-list, i.e., memory fragmentation which occurs after a large number of node deletions and additions. It also tries to be more efficient in random insertion/deletion operations (as we will see later) as well as accessing an element as compared to the linked list.

Motivation

Usage of linked-list causes memory fragmentation after a number of addition and deletion operations. Usage of arrays is inefficient in case of deletion and addition operations that causes whole elements to be copied. The pre-allocated array list tries to eliminate both disadvantages while retaining most of the benefits of array and linked-list. If there is a requirement to store fixed amount of elements, the store on which operations like addition, deletion, sorting, etc. is frequent then pre-allocated arrayed list is the way to go. In such cases, the pre-allocation of memory will be offset by efficiency of operations on the list, as we see in the performance comparison for PAL, Vector and List later.

Data Structures Used

As the name suggests in pre-allocated list, memory is allocated beforehand (as in array) and so the number of allowable elements are limited. Like the linked-list, each element is stored in a node, which along with space for an element also includes space for a node pointer. The pointer points to the same or different node that holds the element for that position. Hence, the only difference from linked-list is that rather than the link pointing to the node at immediate next position, it (i.e., the pointer in a pre-allocated list node) points to the node containing the element at the current position. Like the array, we can retrieve element at any position by using its position to index into the store. This means that to retrieve an element at a position is to first retrieve the node at a particular position in the store and then to retrieve the element contained in the node pointed to by the former's node pointer.

C++
template <class T>
struct node{
   T val;
   struct node *posptr;
};

template <class T=int, int N=100>
class CPArrayList{
  private:
   // The Buffer or store
      char BUFFER[N];
   // The size of each node
      int m_nodesize;
   // The maximum no of nodes that can be stored in the store
      int m_capacity;
   // No of elems currently in use
      int m_filledelems;
   // No of elems allocated from the list
      int m_allocelems;
   // Method that returns a free node(a new region from buffer or a free region
   // (that was deleted))
      node<T> *getfreenode(void);
   // Method that returns the node at a particular position
      node<T> *getNodeAtPos(int);

  public:
      CPArrayList();
      int nodesize(void) { return m_nodesize; }
      int capacity(void) { return m_capacity; }

      bool insert(T info, int pos=0);
      bool set(T info, int pos);
      T get(int pos);
      void remove(int pos);
      void printlist(void);
};
  • BUFFER: An array or buffer that holds the memory for the list.
  • m_capacity: A variable holding the limit for the store.
  • m_filledelems: The number of elements inserted/relevant in the list at a point of time. The (m_filledelems-1) value indexes to the node that has pointer to the last node of the list at that point of time.
  • m_allocelems: The number of nodes ever used from the store regardless of whether it is currently in use or not. A node is said to be not in use if it is deleted. Deletion is a special condition where the node is not returned to the Store in the real sense of the term but is added to a free list. The free list is actually the name for a subset store in the main store itself that is a list of nodes that each point to a deleted element node but which itself may not be deleted (this will be explained later) and contain a list element. The number of freed elements can be calculated as m_allocelems – m_filledelems.

Operations on the List

Element Addition

When adding an element in the list, first a free node is found. This is done by first searching it in the free node list (Alloc_elems > Filled_elems means there are free element(s) in the list). If there are no elements in the free list and Alloc_elems < Max_elems, then an unused node from the array list is procured and used. If Alloc_elems >= Max_elems, then the limit of store has been reached and so further no more nodes can be allocated.

The position to insert the element can be found by indexing to the particular position in the store. The node pointer in this position is pointed to the free node and the value of the free node is set to the new value. Before insertion, the nodes that come after the newly inserted node are updated in such a way that each of the node pointers (in the node) is assigned the previous node’s pointer as per their positions prior to the insertion. This is extended to the node at position following the last node so that the element pointed by this node now becomes the last node in the list. The count of the number of list elements is incremented by 1. If a new element is to be appended to the list, then only operations related to free node procurement and its pointing by the node in that position is required.

When compared with addition of an element in an array, the operations look almost similar to PAL. However, a major difference here is that instead of copying each element value (byte by byte), only the pointer of the node is updated. Thus the addition operation is more efficient when compared to addition in array but lesser efficient than addition in linked list.

C++
/** 
 * While adding an element at a particular position which is not the
 * last position of the list, care should be taken to update all the 
 * element pointers that begin with the old element at the point of 
 * addition.
 * For example if the current list allocation is as follows:
 * 1,3,4,5
 * If an element 2 is now added at position 2, all position pointers 
 * starting from position 2 need to be updated.
 */
template <class T,int N>
bool CPArrayList<T,N>::insert(T info, int pos)
{
  int nextpos    = m_filledelems;
  node <T> *freenode  = 0;
  
  /**
   * First look at the free node list to return any
   * node that was freed.
   */
  if(m_filledelems < m_allocelems)
  {
     freenode  = getNodeAtPos(m_filledelems)->posptr;
  }
   // Next return a freenode if any from the buffer.
  else if(m_allocelems < m_capacity)
  {
     freenode  = (node<T> *) (BUFFER + m_allocelems * m_nodesize);
     ++m_allocelems;
  }
  else
    return false;

  freenode->val  = info;

  for(; nextpos > pos; --nextpos)
  {
     getNodeAtPos(nextpos)->posptr =  getNodeAtPos(nextpos-1)->posptr;
  }
  getNodeAtPos(pos)->posptr = freenode;
  ++m_filledelems;
  return true;
}

Element Deletion

If an element is to be deleted from position x (where x>0) in the list, then first the node pointed by the node at this position is pushed to the free list stack. After this, all node pointers of the node from this position to the second last (i.e., n-1th position for n nodes) are updated such that each node’s pointer is updated with the next node’s pointer in a move to compact the list after the deletion operation. Thus, now the new last node of the list was the second last node prior to the deletion operation. The count of the number of list elements is decremented by 1.

C++
/**
 * The deletion of an element at a particular position is a critical operation
 * in the list.
 * In most of the cases related to deletion it is necessary to update links for
 * elements that follow the newly deleted element.
 */
template <class T,int N>
void  CPArrayList<T,N>::remove(int pos)
{
  /**
   * On deletion of an element at a position, we need to update
   * the position pointers of all nodes that come after the 
   * deletion position.
   */
   node <T> *newfreenode  = getNodeAtPos(pos)->posptr;

   int nextpos = pos;

   for(;nextpos < (m_filledelems-1); nextpos++)
   {
      getNodeAtPos(nextpos)->posptr =  getNodeAtPos(nextpos+1)->posptr;
   }
 
   getNodeAtPos(nextpos)->posptr = newfreenode;
   --m_filledelems;
}   

Example Illustration

  1. Initially after pre-allocation

    Allocated nodes -0, Filled nodes - 0 (All 5 nodes have been pre-allocated, but have yet to be used and hence not allocated logically, resulting in empty allocated and filled nodes count. Whenever a new node is returned from the store (and not the free list which has deleted nodes), the allocated nodes count is incremented. The Filled node count represents the number of elements being used by the program using PAL at a particular time.

    1.jpg

  2. After addition of a node with value 1 at position 1:

    Allocated nodes -1, Filled nodes – 1

    2.jpg

  3. After addition of a node with value 4 at position 2:

    Allocated nodes - 2, Filled nodes - 2

    3.jpg

  4. After addition of a node with value 2 at position 2:

    Allocated nodes - 3, Filled nodes - 3

    4.jpg

  5. After addition of a node of value 3 at position 3, total allocated nodes are 4 (1, 2, 3 and 4)

    Allocated nodes - 4, Filled nodes - 4

    5.jpg

  6. After addition of a node of value 5 at position 5 (1, 2, 3, 4, 5)

    Allocated nodes - 5, Filled nodes - 5

    6.jpg

  7. After deletion of a node at position 3 (1, 2, 4 and 5); Note that now the 4th node is holding a value that has been deleted, but it still points to a valid node (with value 5), which in turn is pointing to the deleted node. As it points to a deleted node, node 5 is part of the free list.

    Allocated nodes - 5, Filled nodes - 4, Free list nodes - 1

    7.jpg

  8. After deletion of a node at position 2 (1, 4 and 5); Along with pointing to a deleted node (the 3rd node with value 2), the 4th node itself has been deleted. Node 5 holds a valid value but points to a deleted node. The 4th Node is now inducted into the free list (which already contains the 5th node). Thus, regardless of the position of deletion, the last node (of the filled nodes before deletion) always points to the recently deleted node. This also allows ease of re-use of the deleted nodes during future addition operations.

    Allocated nodes - 5, Filled nodes - 3, Free list nodes - 2

    8.jpg

  9. After addition of node with value 3 at position 2 (1, 3, 4 and 5); As Filled nodes < Allocated nodes, the node (with value 2 above) pointed to by the node that comes after the last node is replaced with the new value. The no of Filled nodes is incremented by 1 following the decrement of the total Free nodes.

    Allocated nodes - 5, Filled nodes - 4, Free list nodes - 1

    9.jpg

  10. After addition of node with value 2 at position 2 (1, 2, 3, 4 and 5); As Filled nodes < Allocated nodes, the node (4th above with value 3) pointed to by the node that comes after the last node is filled with the new value. The no of Filled nodes is incremented by 1 following the decrement of the total Free nodes.

    Allocated nodes - 5, Filled nodes - 5, Free list nodes - 0

    10.jpg

Comparison with Vector and List

Vectors are sequence containers with elements ordered following a strict linear sequence. Vector containers are implemented as dynamic arrays and have their elements stored in contiguous storage locations, allowing its elements to be accessed using offsets on regular pointers to elements.

Vectors are good at:

  • Accessing individual elements by their position index (constant time).
  • Iterating over the elements in any order (linear time).
  • Add and remove elements from its end (constant amortized time as we see in the results table below).

List containers are implemented as doubly-linked lists. Doubly linked lists can store each of the elements they contain in different and unrelated storage locations. The ordering is kept by the association to each element of a link to the element preceding it and a link to the element following it.

This provides the following advantages to list containers:

  • Efficient insertion and removal of elements anywhere in the container (constant time).
  • Efficient moving elements and block of elements within the container or even between different containers (constant time).
  • Iterating over the elements in forward or reverse order (linear time).

Compared to the other base standard sequence containers (like list), vectors are generally the most efficient in time for accessing elements and to add or remove elements from the end of the sequence. For operations that involve inserting or removing elements at positions other than the end, they perform worse than list.

PAL while having all the above specified advantages of Vector; operations like insertion and deletion are faster in PAL when compared to Vector (as only pointers are copied and not whole elements as explained before).

Showdown with Linked list and vector container

Elements

Element size(bytes)

PAL cost (secs)

Vector cost(secs)

List cost(secs)

start

random

end

start

random

end

start

random

end

2383

77

0.016

0.02

0

0.12

0.06

0.008

0.004

0.02

0

2383

110

0.017

0.013

0

0.14

0.065

0

0

0.03

0

2383

210

0.017

0.013

0

0.22

0.111

0.001

0

0.029

0

2383

325

0.017

0.001

0

0.25

0.125

0

0.001

0.03

0

2383

435

0.016

0.015

0

0.3

0.141

0.01

0.004

0.03

0

2383

540

0.017

0.016

0

0.37

0.178

0.005

0.001

0.029

0

2383

650

0.020

0.012

0

0.44

0.219

0.01

0.004

0.023

0

2383

760

0.023

0.003

0

0.48

0.234

0.005

0

0.033

0

2383

870

0.017

0.013

0

0.62

0.307

0.01

0

0.031

0

Table 1: Costs of different insertion operations for PAL, Vector and List

For PAL and Vector, the insertions made at the start of the list is considered to be one of the most expensive operations. For List, this should be the random insertion operation.

As shown in the table above, as the size of element increases, the cost of using Vector increases in comparison with PAL (because PAL only copies address of the node(s) and not the whole node). The List performs better only in additions at the beginning, in case of random additions PAL fares better. Independent of the element size, the cost of insertion operations remains almost constant for similar operation in both PAL and List.

Elements

Element size(bytes)

PAL cost (secs)

Vector cost(secs)

List cost(secs)

start

random

end

start

random

end

start

Random

End

2383

77

0.03

0.011

0

0.113

0.052

0

0

0.027

0

2383

110

0.016

0.005

0

0.14

0.073

0

0

0.022

0.002

2383

210

0.02

0.005

0

0.211

0.106

0

0

0.029

0

2383

325

0.017

0.014

0

0.24

0.114

0

0

0.029

0

2383

435

0.016

0.014

0

0.28

0.14

0

0

0.017

0.002

2383

540

0.024

0.002

0

0.35

0.179

0

0

0.023

0

2383

650

0.015

0.005

0

0.412

0.211

0

0

0.019

0.002

2383

760

0.019

0.01

0

0.455

0.223

0

0

0.026

0

2383

870

0.018

0.011

0

0.555

0.272

0

0

0.029

0

Table 2: Costs of different deletion operations for PAL, Vector and List

For PAL and Vector, the deletions made at the start of the list is considered to be one of the most expensive operations. For List, this should be the random deletions.

As shown in the table above, as the size of element increases, the cost of using Vector increases in comparison with PAL (because PAL only copies address of the node(s) and not the whole node). The List performs better only in deletions at the beginning, in case of random deletions PAL fares better. Independent of the element size, the cost of deletion operations remains almost constant for similar operations in both PAL and List.

Conclusion

If random insertions/deletions are considered to be the norm in practice, then PAL has the advantage in terms of performance when compared to both Vector and List.

Thanks for the comments on this article that helped in its updating. Looking forward to more valuable comments and please do not forget to rate this article.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)