Click here to Skip to main content
16,015,756 members
Articles / Programming Languages / C++
Article

Multi Dimensional Arrays using Template Meta Programming

Rate me:
Please Sign up or sign in to vote.
3.05/5 (8 votes)
20 Mar 2004CPOL3 min read 87.9K   465   17   20
A Real Multi-Dimensional Array

Introduction

Multi-dimensional arrays provide a nice mathematical data structure for mathematical and soft computing projects. Tensor is usually the data structure used to store data in multi-dimensional arrays. Helper classes helps in design subscripting operators to select rows, cols, and plates. Helper classes may be implemented by adding constant template arguments describing the state of the array, or sub-array. Refer to "Tensor templates", " 2D Matrix Container with [][] indexing", and "Template-Based Helper Classes: An application to Multidimensional Arrays" for more details. A tensor is a uni-dimensional array with indexing facility that allows manipulation as N-dimensional matrix. Computational cost for accessing a specific element grows exponentially as the number of dimensions increases. Helper classes force many constructors/destructors calls for temporary objects. Actually helper classes provide a nice solution for little projects with short loops.

In mathematics, finding a solution in some space costs a lot. One way to find fast solution is transforming the problem (actually the search space) into a larger space where more dimensions may lead to faster solution.

Now, consider the following piece of code:

int* pi1=new int[20];
int** pi2=new int*[4];
for(int i=0; i<4; i++)
    pi2[i]=new int[5];

Now, what do you think? How shall I access the element located in 2nd row and 3rd col? How much does it cost for each of them?

Obviously, pi2[2][3] is faster than pi1[2*4+3].

How about very large N-dimensional arrays with complex objects!? Can you imagine this!?

"Template-Based Helper Classes: An application to Multidimensional Arrays" provides a nice solution, but subscripting operators cost a lot for computations, costs much if repeated in a very long loop, uses lots of helper temporary classes, and fragments the memory.

As described in linear algebra, adding more dimensions fastens the access to data elements.

Problem Statement

How shall we develop a template class that generates multi-dimensional pointers!!?

int i; 
int* pi; 
int** ppi; 
int*** pppi; 
...

How shall we add number of dimensions as a template argument!?

Basic Solution

The basic idea makes use of Template Meta-Programming, implementing a template class with argument M indicating the number of nested pointers.

Consider the following class:

template<_sizetype M>
class multi_ptr
{
public:
    typedef double T;
    typedef multi_ptr<M-1>::pointer* pointer; 
};

class multi_ptr<0>
{
public:
    typedef double T;
    typedef T pointer;
};

Now, consider how to declare a multi-dimensional pointer:

multi_ptr<3>::pointer p3;

Now, more facilities are needed; we need to add class template argument T indicating the type of this array.

More Enhancement - Adding Class Template Argument

Template Meta-Programming works only for single argument template classes. Adding a class template argument cannot be achieved directly.

An outer class should be used to pass the template argument through. Consider the following multi_array class:

template<class T, _sizetype N>
class multi_array
{
public:
    template<_sizetype M>
    class multi_ptr
    {
    public:
        typedef multi_ptr<M-1>::pointer* pointer; 
    };
    
    class multi_ptr<0>
    {
    public:
        typedef T pointer;
    };

    typedef multi_ptr<N>::pointer pointer;
    typedef const pointer const_pointer;
};

Class template argument T is passed through the outer multi_array class with the number of dimensions N. The inner multi_ptr class makes use of the outer template argument T for type specialization.

Now the very important question is:

How shall we allocate this multi dimensional array!? We need to specialize nested allocating loops!!

How shall we build a varying number of nested allocating loops!?

We need a little addition for allocation!!

Little Enhancement - Adding Allocators

A simple compile-time loop provides a nice solution for generating a variable number of nested loops. An inner class called multi_allocator is responsible for allocating multi-dimensional pointers.

template<_sizetype M>
class multi_allocator
{
public:
    static multi_ptr<M>::pointer alloc(_sizetype* dims)
    {
        _sizetype i;
        multi_ptr<M>::pointer rtn;
        rtn=new multi_ptr<M-1>::pointer[dims[0]];
        for(i=0; i<dims[0]; i++)
            rtn[i]=multi_allocator<M-1>::alloc(dims+1);
        return rtn;
    }
};
    
class multi_allocator<1>
{
public:
    static multi_ptr<1>::pointer alloc(_sizetype* dims)
    {
        multi_ptr<1>::pointer rtn;
        rtn=new T[dims[0]];
        return rtn;
    }
};

Now, fine touches is needed to finalize the multi_array class. Constructors, casting operators, and data members should be added.

Finalizing multi_array - Adding constructors and casting operators

typedef unsigned int _sizetype;
    
template<class T, _sizetype N>
class multi_array
{
public:
    template<_sizetype M>
    class multi_ptr
    {
    public:
        typedef multi_ptr<M-1>::pointer* pointer; 
    };
    
    class multi_ptr<0>
    {
    public:
        typedef T pointer;
    };


    template<_sizetype M>
    class multi_allocator
    {
    public:
        static multi_ptr<M>::pointer alloc(_sizetype* dims)
        {
            _sizetype i;
            multi_ptr<M>::pointer rtn;
            rtn=new multi_ptr<M-1>::pointer[dims[0]];
            for(i=0; i<dims[0]; i++)
                rtn[i]=multi_allocator<M-1>::alloc(dims+1);
            return rtn;
        }
    };
    
    class multi_allocator<1>
    {
    public:
        static multi_ptr<1>::pointer alloc(_sizetype* dims)
        {
            multi_ptr<1>::pointer rtn;
            rtn=new T[dims[0]];
            return rtn;
        }
    };
    
    typedef multi_ptr<N>::pointer pointer;
    typedef const pointer const_pointer;
    typedef multi_allocator<N> allocator;

    
    operator pointer(){return _pdata;}
    
    multi_array(const _sizetype d1, ...)
    {
        memcpy(_pdims, &d1, N*sizeof(_sizetype));
        _pdata=allocator::alloc(_pdims);
    }
    
    multi_array(const _sizetype* pdims)
    {
        memcpy(_pdims, pdims, N*sizeof(_sizetype));
        _pdata=allocator::alloc(_pdims);
    }
    
protected:
    pointer _pdata;
    _sizetype _pdims[N+1];

};

Thank You!!

Thanx a lot for your time. I hope you enjoyed this idea!! Hear from you soon:).

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Engineer
Australia Australia
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
Generalerror using g++ 4.4.5 Pin
axllaruse10-May-11 2:55
axllaruse10-May-11 2:55 
QuestionDeallocate? Pin
GregoryBook29-Sep-06 15:50
GregoryBook29-Sep-06 15:50 
AnswerRe: Deallocate? Pin
Mo Hossny23-Oct-06 18:27
Mo Hossny23-Oct-06 18:27 
Generalinvalid template argument 'N' Pin
SMJT17-Sep-04 7:31
SMJT17-Sep-04 7:31 
GeneralRe: invalid template argument 'N' Pin
SMJT17-Sep-04 7:38
SMJT17-Sep-04 7:38 
GeneralRe: invalid template argument 'N' Pin
Mo Hossny17-Sep-04 12:24
Mo Hossny17-Sep-04 12:24 
GeneralRe: invalid template argument 'N' Pin
SMJT17-Sep-04 21:35
SMJT17-Sep-04 21:35 
GeneralRe: invalid template argument 'N' Pin
Mo Hossny19-Sep-04 9:12
Mo Hossny19-Sep-04 9:12 
GeneralCompiler error on VC++.NET 2003 Pin
Rui A. Rebelo8-Apr-04 4:11
Rui A. Rebelo8-Apr-04 4:11 
GeneralRe: Compiler error on VC++.NET 2003 Pin
Mo Hossny10-Apr-04 22:01
Mo Hossny10-Apr-04 22:01 
GeneralRe: Compiler error on VC++.NET 2003 Pin
Mo Hossny3-Jun-04 1:56
Mo Hossny3-Jun-04 1:56 
Questionpi2[2][3] is faster than pi1[2*4+3] ?? Pin
sahn021-Mar-04 3:20
sahn021-Mar-04 3:20 
AnswerRe: pi2[2][3] is faster than pi1[2*4+3] ?? Pin
Mo Hossny21-Mar-04 3:23
Mo Hossny21-Mar-04 3:23 
GeneralRe: pi2[2][3] is faster than pi1[2*4+3] ?? Pin
Tim Smith21-Mar-04 6:14
Tim Smith21-Mar-04 6:14 
Nope, the code generates the same machine code.

int main (int argc, char * argv[])
{
	int pi1[20];
	int pi2[4][5];
	memset (pi1, 0, sizeof (pi1));
	memset (pi2, 0, sizeof (pi2));
	int a = 0;
	int b = 0;
	for (int i = 0; i < 4; i++)
	{
		for (int j = 0; j < 4; j++)
		{
            a += pi2[i][j];
		}
	}
	for (int i = 0; i < 4; i++)
	{
		for (int j = 0; j < 4; j++)
		{
			b += pi1[i*4+j];
		}
	}
	printf ("%d %d", a, b);
	return 0;
}


Has this output

PUBLIC	_main
PUBLIC	??_C@_05OKMLJOMC@?$CFd?5?$CFd?$AA@		; `string'
EXTRN	_printf:NEAR
;	COMDAT ??_C@_05OKMLJOMC@?$CFd?5?$CFd?$AA@
; File c:\dev\exo\exo.cpp
CONST	SEGMENT
??_C@_05OKMLJOMC@?$CFd?5?$CFd?$AA@ DB '%d %d', 00H	; `string'
; Function compile flags: /Ogty
CONST	ENDS
;	COMDAT _main
_TEXT	SEGMENT
_pi2$ = -160
_pi1$ = -80
_argc$ = 8
_argv$ = 12
_main	PROC NEAR					; COMDAT

; 42   : {

  00000	55		 push	 ebp
  00001	8b ec		 mov	 ebp, esp
  00003	83 e4 f8	 and	 esp, -8			; fffffff8H
  00006	81 ec a4 00 00
	00		 sub	 esp, 164		; 000000a4H
  0000c	53		 push	 ebx
  0000d	56		 push	 esi
  0000e	57		 push	 edi

; 43   : 	int pi1[20];
; 44   : 	int pi2[4][5];
; 45   : 	memset (pi1, 0, sizeof (pi1));

  0000f	33 c0		 xor	 eax, eax
  00011	b9 14 00 00 00	 mov	 ecx, 20			; 00000014H
  00016	8d 7c 24 60	 lea	 edi, DWORD PTR _pi1$[esp+176]
  0001a	f3 ab		 rep stosd

; 46   : 	memset (pi2, 0, sizeof (pi2));

  0001c	b9 14 00 00 00	 mov	 ecx, 20			; 00000014H
  00021	8d 7c 24 10	 lea	 edi, DWORD PTR _pi2$[esp+176]
  00025	f3 ab		 rep stosd

; 47   : 	int a = 0;

  00027	33 f6		 xor	 esi, esi

; 48   : 	int b = 0;

  00029	33 d2		 xor	 edx, edx
  0002b	8d 44 24 18	 lea	 eax, DWORD PTR _pi2$[esp+184]
  0002f	b9 04 00 00 00	 mov	 ecx, 4
$L690:

; 49   : 	for (int i = 0; i < 4; i++)
; 50   : 	{
; 51   : 		for (int j = 0; j < 4; j++)
; 52   : 		{
; 53   :             a += pi2[i][j];

  00034	8b 58 fc	 mov	 ebx, DWORD PTR [eax-4]
  00037	8b 78 f8	 mov	 edi, DWORD PTR [eax-8]
  0003a	03 fb		 add	 edi, ebx
  0003c	03 78 04	 add	 edi, DWORD PTR [eax+4]
  0003f	03 38		 add	 edi, DWORD PTR [eax]
  00041	03 f7		 add	 esi, edi
  00043	83 c0 14	 add	 eax, 20			; 00000014H
  00046	49		 dec	 ecx
  00047	75 eb		 jne	 SHORT $L690

; 54   : 		}
; 55   : 	}
; 56   : 	for (int i = 0; i < 4; i++)

  00049	8d 44 24 68	 lea	 eax, DWORD PTR _pi1$[esp+184]
  0004d	b9 04 00 00 00	 mov	 ecx, 4
$L698:

; 57   : 	{
; 58   : 		for (int j = 0; j < 4; j++)
; 59   : 		{
; 60   : 			b += pi1[i*4+j];

  00052	8b 58 fc	 mov	 ebx, DWORD PTR [eax-4]
  00055	8b 78 f8	 mov	 edi, DWORD PTR [eax-8]
  00058	03 fb		 add	 edi, ebx
  0005a	03 78 04	 add	 edi, DWORD PTR [eax+4]
  0005d	03 38		 add	 edi, DWORD PTR [eax]
  0005f	03 d7		 add	 edx, edi
  00061	83 c0 10	 add	 eax, 16			; 00000010H
  00064	49		 dec	 ecx
  00065	75 eb		 jne	 SHORT $L698

; 61   : 		}
; 62   : 	}
; 63   : 	printf ("%d %d", a, b);

  00067	52		 push	 edx
  00068	56		 push	 esi
  00069	68 00 00 00 00	 push	 OFFSET FLAT:??_C@_05OKMLJOMC@?$CFd?5?$CFd?$AA@
  0006e	e8 00 00 00 00	 call	 _printf
  00073	83 c4 0c	 add	 esp, 12			; 0000000cH

; 64   : 	return 0;
; 65   : }

  00076	5f		 pop	 edi
  00077	5e		 pop	 esi
  00078	33 c0		 xor	 eax, eax
  0007a	5b		 pop	 ebx
  0007b	8b e5		 mov	 esp, ebp
  0007d	5d		 pop	 ebp
  0007e	c3		 ret	 0
_main	ENDP
_TEXT	ENDS
END


*NOTE* The inner loop has been unrolled.

The heart of the loop uses almost the EXACT same machine code.

Tim Smith

I'm going to patent thought. I have yet to see any prior art.
GeneralRe: pi2[2][3] is faster than pi1[2*4+3] ?? Pin
Mo Hossny21-Mar-04 6:26
Mo Hossny21-Mar-04 6:26 
GeneralRe: pi2[2][3] is faster than pi1[2*4+3] ?? Pin
Tim Smith21-Mar-04 7:16
Tim Smith21-Mar-04 7:16 
GeneralRe: pi2[2][3] is faster than pi1[2*4+3] ?? Pin
Don Clugston21-Mar-04 15:28
Don Clugston21-Mar-04 15:28 
GeneralRe: pi2[2][3] is faster than pi1[2*4+3] ?? Pin
Nitron22-Mar-04 3:16
Nitron22-Mar-04 3:16 
GeneralRe: pi2[2][3] is faster than pi1[2*4+3] ?? Pin
John M. Drescher22-Mar-04 3:58
John M. Drescher22-Mar-04 3:58 
GeneralRe: pi2[2][3] is faster than pi1[2*4+3] ?? Pin
Mo Hossny22-Mar-04 22:58
Mo Hossny22-Mar-04 22:58 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.