Bird Programming Language: Part 1

Dávid Kocsis

1 Jan 2013 | GPL3 | 12 min read

A new general purpose language that aims to be fast, high level and simple to use.

Download Bird.zip - 3.8 MB

Introduction
Requirements for Running a Program
Creating Programs with Bird
Syntax
History

Introduction

I'm developing this language because I don't find any of the other languages ideal for everything. C++ is known to be one of the fastest languages, but its syntax (especially header files) and its lack of C#-like high-level features make development slower, in my opinion. Debugging can also be hard in C++. C#, on the other hand, is a managed language, which limits low-level programming and application performance. 3D graphics is about 1.5-2 times slower in C# than in C++.

I have been working on Bird in C# since March 2010. It's a strongly typed native language. Its performance currently seems to be competitive with C++ compilers, and it's going to have features from high-level languages as well as some new ones. There are many things that I haven't implemented yet, but it can be used for smaller programs. The syntax is similar to C# and C++, with some modifications to make code smaller and improve readability. I was planning to make a C# parser too, but I have stopped working on it for now. I will start working on a new one in the future. The libraries are similar to .NET, and the basic functions are going to be implemented, so I think they won't be hard to understand.

Requirements for Running a Program

The samples come with equivalent C++ code for comparing performance. To compile the C++ versions, MinGW and Clang need to be installed and added to the PATH variable, but this is optional. Using the Visual C++ compiler requires the path to "vcvarsall.bat" to be set in the "Run - VC++.bat" files.

Creating Programs with Bird

The compiler can be run from the command line with "Bird.exe", which is in the "Binaries" directory:

Bird.exe -x -nodefaultlib -lBirdCore -entry Namespace.Main Something.bird -out Something.exe

I've made .bat files for the samples, so using the command line for them is not needed. The -x option means that the compiler should run the output file after it has been compiled. The input files can be Bird, C, C++ or object files.

Libraries can be specified with the -l option. Currently BirdCore and BlitzMax are available, and both are included by default. The -nodefaultlib option disables them. BlitzMax is another programming language; its functions are needed for graphics because I haven't implemented my own yet.

Object and archive files can also be produced as output, for use in other languages. The output type is specified with the -format option. These are its possible values:

app	Executable file
arc	Archive file, it doesn't contain the libraries, they need to be linked to the exe.
obj	Object file, only contains the `.bird` files' assembly. The other files and libraries are not included.

Syntax

A Simple Function

using System

void Main()
    Console.Write "Enter a number: "
    var Number = Convert.ToInt32(Console.ReadLine())
    if Number == 0: Console.WriteLine "The number is zero"
    else if Number > 0: Console.WriteLine "The number is positive"
    else Console.WriteLine "The number is negative"
    
    for var i in 1 ... 9
        Console.WriteLine "{0} * {1} = {2}", i, Number, i * Number

    Console.WriteLine "End of the program"

Code blocks are indicated by the whitespace in front of lines: every line in a scope has the same amount of leading whitespace. A colon allows a command to have its inner block on the same line; the compiler needs it to know where the preceding expression ends. If there is no expression, as with an else statement without an if, the colon is not needed.

Functions can be called without brackets if the returned value is not used. In the for loop, the var keyword stands for the type of i, which is the same as the type of the initial value (1 -> int). The three dots mean that the first value of i is 1 and that the range includes the value on the right side, so the last value is 9.

I was thinking about allowing variables to be declared without a type (or the var keyword), but that could lead to bugs if the name of a variable is misspelled.

Literals

Number literals can have different radixes and types. $ means hexadecimal and % means binary. Hexadecimal letters have to be uppercase to distinguish them from the type notation, which is the lowercase short form of the type at the end of the number:

$FFb        // An unsigned byte
-$1Asb      // A signed byte
%100        // A binary number
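
To make the notation concrete, here is a small Python model of how such literals could be read. It is only a sketch, not the Bird compiler's lexer: the function name parse_literal is hypothetical, and the suffix table contains only the two suffixes shown above (b and sb). It also shows why uppercase hexadecimal digits matter: in $FFb the lowercase b can only be a type suffix.

```python
import re

# Assumption: only the "b" and "sb" suffixes appear in the article,
# so the table is limited to them.
SUFFIXES = {"b": "unsigned byte", "sb": "signed byte"}

# sign, radix prefix, uppercase digits, lowercase type suffix
_LITERAL = re.compile(r"^(-?)([$%]?)([0-9A-F]+)([a-z]*)$")

def parse_literal(text):
    """Return (value, type_name) for a Bird-style number literal.

    type_name is None when the literal has no type suffix.
    """
    match = _LITERAL.match(text)
    if match is None:
        raise ValueError(f"not a number literal: {text!r}")
    sign, prefix, digits, suffix = match.groups()
    radix = {"$": 16, "%": 2, "": 10}[prefix]
    value = int(digits, radix)  # raises ValueError for e.g. "%102"
    if sign:
        value = -value
    if suffix and suffix not in SUFFIXES:
        raise ValueError(f"unknown type suffix: {suffix!r}")
    return value, SUFFIXES.get(suffix)
```

For example, parse_literal("$FFb") gives the value 255 typed as an unsigned byte, and parse_literal("%100") gives 4 with no explicit type.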

Chained Comparison Operators

I think this could have been implemented in the C languages, because it is useful in some cases. Each sub-expression is evaluated only once, so it's also faster than writing two relations connected with and.

bool IsMouseOver(int x, y, w, h)
    return x <= MouseX() < x + w and y <= MouseY() < y + h

The relational operators in a chain must all face the same direction, to make them distinguishable from generic parameters.
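
Python happens to have chained comparisons with the same "evaluate once" rule, so it can be used to illustrate the point. In this sketch, mouse_x is a hypothetical stand-in for MouseX() that counts how often it is called; none of these names come from Bird.

```python
calls = 0

def mouse_x():
    """Stand-in for MouseX(); counts how often it is evaluated."""
    global calls
    calls += 1
    return 15

def is_over_chained(x, w):
    # The middle operand is evaluated a single time.
    return x <= mouse_x() < x + w

def is_over_two_relations(x, w):
    # Written as two relations, the call happens twice.
    return x <= mouse_x() and mouse_x() < x + w

calls = 0
chained_result = is_over_chained(10, 20)
chained_calls = calls

calls = 0
split_result = is_over_two_relations(10, 20)
split_calls = calls
```

Both forms return the same result, but the chained form calls mouse_x once while the two-relation form calls it twice.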

Aliases

It's similar to the using alias directive in C# and the typedef keyword of C++, but in Bird aliases can be created for everything, even for variables. The order of declaration doesn't matter, so it's possible to do this:

alias int32 = int
alias int64 = long
alias varalias = variable

int64 variable

Tuples

Tuple Basics

Tuples are similar to structs, but they don't have a name. Unlike .NET tuples, Bird tuples are value types. A tuple is a grouping of named or unnamed values that can have different types. For example:

var a = (1, 2)            // Type: (int, int)
var b = (1, 2f)           // Type: (int, float)
var c = ("Something", 10) // Type: (string, int)

In this case members are referenced by their index (e.g. a_tuple.0), but they can also be given names:

alias vec2 = (float x, y)

const var v = (2, 3) to vec2
const var vx = v.x
const var vy = v.y

These variables can be declared as constant because the compiler interprets (2, 3) as a single constant.

Tuples can also be used to swap the values of variables:

a, b = b, a

Tuples as Vectors

Vector operations are based on tuples. The SSE/SSE2 packed instructions are emitted for tuple operations rather than through automatic vectorization. The Cross function can be written as:

alias float3 = (float x, y, z)

float3 Cross(float3 a, b)
    return a.y * b.z - a.z * b.y,
           a.z * b.x - a.x * b.z,
           a.x * b.y - a.y * b.x

Vector functions will be defined in the Math class; some of them are already implemented. Without the float3 type, this is how Cross can be written using unnamed members:

float, float, float Cross((float, float, float) a, b)
    return a.1 * b.2 - a.2 * b.1,
           a.2 * b.0 - a.0 * b.2,
           a.0 * b.1 - a.1 * b.0

Tuple Extraction

It's possible to extract a tuple in a similar way as swapping variables:

float x, y, z
x, y, z = Cross(a, b)

Or if var is used, it can be written in a single line:

(var x, var y, var z) = Cross(a, b)

The var has to be written before every new variable so that the compiler can decide which names refer to existing variables. For example, if (var x, y, z) = ... were interpreted as three new variables, it wouldn't be possible to refer to an existing y or z.
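
For readers who know Python, tuple swapping and extraction behave much like Python's tuple packing and unpacking. The sketch below mirrors the unnamed-member version of Cross in Python (indices instead of .0, .1, .2). It only models the semantics and says nothing about the SSE code Bird generates.

```python
def cross(a, b):
    """Cross product of two 3-element tuples, using index access."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

# Extraction: the returned tuple is split into three variables.
x, y, z = cross((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))

# Swap: the right side is packed into a tuple before assignment.
p, q = 1, 2
p, q = q, p
```

Here the cross product of the x and y unit vectors is the z unit vector, and after the swap p holds 2 and q holds 1.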

For Loops

for var i in 0 ... 9
for var i in 0 .. 10

Both loops mean the same. Two dots mean that i won't take the value on the right; with three dots it will.

for var x, y in 0 .. Width, 0 .. Height

This is the same as two nested loops: x goes from 0 to Width-1 and y goes from 0 to Height-1. The loop over y is the inner one. The break command exits both. It can also be written like this:

for var x, y in (0, 0) .. (Width, Height)

If only one number is specified, it is used as the initial or the final value of all variables. So this is the same as the previous loop:

for var x, y in 0 .. (Width, Height)

Given two points, a single for loop can run over all the points in the rectangle they span. In this case x goes from P1.0 to P2.0 and y goes from P1.1 to P2.1:

var P1 = (10, 12)
var P2 = (100, 110)

for var x, y in P1 ... P2
    // Something

The step keyword specifies how much the loop variables are increased on each iteration. It can be either a scalar or a tuple, with the same rules as above. The next loop increases i by 1 and j by 2 on every cycle:

for var i, j in 1 .. 20 step (1, 2)
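
The range rules above can be modelled in a few lines of Python. This is only a sketch of the semantics as described in this section, not of how the compiler implements them: bird_range and _broadcast are hypothetical names, and positive integer steps are assumed.

```python
from itertools import product

def _broadcast(value, count):
    """Repeat a scalar once per loop variable; pass tuples through."""
    if isinstance(value, tuple):
        if len(value) != count:
            raise ValueError("tuple size does not match the loop variables")
        return value
    return (value,) * count

def bird_range(start, end, inclusive=False, step=1):
    """Yield the values a Bird for loop would visit.

    inclusive=False models "..", inclusive=True models "...".
    With tuples, the last loop variable is the innermost loop.
    """
    sizes = [len(v) for v in (start, end, step) if isinstance(v, tuple)]
    count = max(sizes, default=1)
    starts = _broadcast(start, count)
    ends = _broadcast(end, count)
    steps = _broadcast(step, count)
    axes = [
        range(s, e + 1 if inclusive else e, st)
        for s, e, st in zip(starts, ends, steps)
    ]
    for point in product(*axes):
        # Plain scalars in, plain scalars out.
        yield point if sizes else point[0]
```

With this model, bird_range(0, 9, inclusive=True) and bird_range(0, 10) visit the same values, and bird_range(0, (Width, Height)) visits the same points as the nested x, y loops above.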

Other Loops

The while and do-while loops are similar to those of the C languages:

var i = 1
while i < 100
    i *= 2

i = 1
do
    i *= 2
while i < 100

I created two new loops that make code shorter. The repeat loop runs its body as many times as specified in its parameter. The cycle loop is an infinite loop.

Structures

Structures can contain fields, methods, constructors, etc. If no type is specified for the new operator, it creates an object of the type the expression is converted to; in this program that is the return type. Initially the expression has the var type, which is always automatically changed to another type:

struct Rect
    public float X, Y, Width, Height

    public Rect(float X, Y, Width, Height)
        this.X = X
        this.Y = Y
        this.Width = Width
        this.Height = Height

    public Rect Copy()
        return new(X, Y, Width, Height)

    public Rect Copy_2()
        return new:
            X = this.X
            Y = this.Y
            Width = this.Width
            Height = this.Height

    public float GetValue(int i)
        switch i
            case 0: return X
            case 1: return Y
            case 2: return Width
            case 3: return Height
            default return 0

I would note that there is never a need for the break command at the end of a case block. I'm not sure there is a need for the switch statement at all, though; I never use it. In my opinion if conditions are much simpler, especially in C#, where a case block must be left with some jump command.

Strings

The most important .NET functions have been implemented. I haven't made a GC yet, so objects will remain allocated until the application exits. It's not a problem for now.

using System

void Main()
    Console.WriteLine "adfdfgh".PadRight(10) + "Something"
    Console.WriteLine "adfdh".PadRight(10) + "Something"
    Console.WriteLine 
    Console.WriteLine "adfdfgh".Contains("fdf")
    Console.WriteLine "adfdfgh".Contains("fdfh")
    Console.WriteLine 
    Console.WriteLine "adfdfgh".Replace('d', 'f')
    Console.WriteLine "adfdfgh".Replace("d", "ddd")
    Console.WriteLine "adfdfghléáőúó".ToUpper()

Arrays

Reference Typed Arrays

This is how a 1D reference array can be declared and initialized:

var Array1D_1 = new int[234]
var Array1D_2 = new[]: 1, 2, 3

var Array1D_3 = new[]:
    1
    2
    3
    
var Array1D_4 = new[]:
    1, 2
    3, 4

The compiler takes into account how many dimensions there are before interpreting the initial values. The values can be separated by both brackets and new lines. If it finds one dimension fewer than specified, new lines act as dimension separators too. I'm not sure this is good; I may remove it in the future because it's a bit ambiguous. The same thing can also be written using only brackets.

var Array2D_1 = new[,]: (1, 2), (3, 4)

var Array2D_2 = new[,]:
    1, 2
    3, 4
    
var Array2D_3 = new[,]:
    (1000, 1001, 1002, 1003, 1004, 1005
     1006, 1007, 1008, 1009, 1010, 1011)
	 
    (2000, 2001, 2002, 2003, 2004, 2005
     2006, 2007, 2008, 2009, 2010, 2011)

Fixed Size Arrays

These are value types and are stored on the stack. Unlike reference arrays, their type includes the size (e.g. int[10]). This is how they can be created:

int[5] Arr1 = new
int[5] Arr2 = default

The default keyword is the same as in C#, except that specifying the type is optional when it can be inferred. In this case it is the type of the destination variable. The same thing happens with new: written out in full, it would be new (int[5])(). For value types, new means the same as default. All values in both arrays are initialized to zero. Initial values can be specified as:

var FixedArr1D = [0, 1, 2, 3]      // Type: int[4]
var FixedArr2D = [(0, 1), (2, 3)]  // Type: int[2, 2]
byte[4] FixedArr1D_2 = [0, 1, 2, 3]

The FixedArr1D_2 array can be declared without an error, because the compiler takes the type of the variable into account before evaluating the initial value.
Fixed size arrays can be converted to reference types with an implicit conversion:

double[] Arr = [0, 1, 2]
Func [0, 1, 2, 3]

void Func(double[] Arr)
    // Something
    
long[], byte[] GetArrays()
    return [0, 1, 2], [2, 3, 4, 5]

Pointer and Length

The notation for this kind of array (or rather tuple) is T[*] (where T is an arbitrary type), which is actually a short form of (T*, uint_ptr Length). It can be useful for unsafe programming. I created it because I kept having to write two variables for the same purpose. Both reference type and fixed size arrays can be converted to it implicitly:

using System

void OutputFloats(float[*] Floats)
    for var i in 0 .. Floats.Length
        Console.WriteLine Floats[i]

void Main()
    OutputFloats [0, 1, 2]

    var Floats = Memory.Allocate(sizeof(float) * 3) to float*
    for var i in 0 .. 3: Floats[i] = i + 10.5f
    OutputFloats (Floats, 3)

Parameters with ref, out

With ref, a parameter can be used as both input and output. The out modifier is for output only, but it makes sure that the variable gets a value:

using System

void OutputFunc(ref int x)
    Console.WriteLine x
    x++

void Func(out int x)
    x = 10

void Main()
    Func out var x
    OutputFunc ref x
    OutputFunc ref x
    OutputFunc ref x

A variable passed with ref must have a value before the function is called; out parameters must be set to a value before leaving the function. These checks can be bypassed with unsafe_ref.

Named and Optional Parameters

Only the parameters that don't have a default value have to be specified:

// The definition of BlitzMax.Graphics
IntPtr Graphics(int Width, Height, Depth = 0, Hertz = 60, Flags = 0)

Graphics 800, 600
Graphics 800, 600, 32

With named parameters, earlier optional parameters don't need to be specified:

Graphics Width: 800, Height: 600
Graphics 800, 600, Hertz: 75
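
Python's default and keyword arguments follow the same idea, so the four calls above can be mirrored directly. The graphics function below is a hypothetical stand-in that only reports the arguments it received; it is not the BlitzMax function.

```python
def graphics(width, height, depth=0, hertz=60, flags=0):
    """Hypothetical stand-in: returns the arguments it was called with."""
    return (width, height, depth, hertz, flags)

only_required = graphics(800, 600)
with_depth = graphics(800, 600, 32)
all_named = graphics(width=800, height=600)
skip_depth = graphics(800, 600, hertz=75)  # depth keeps its default
```

In the last call, depth keeps its default of 0 even though hertz, which comes after it, is given explicitly.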

Properties and Indexers

They are marked with a colon. Properties are handled like variables; when they are used, the compiler calls the get and set methods. For indexers, parameters can be specified too:

class Class
    int _Something
    public int Something:
        get return _Something
        set _Something = value
	
    public int AutomaticallyImplementedProperty:
        get
        set
		
    public int this[int Index]:
        get return Index * 2

    public int NamedIndexer[int Index]:
        get return this[Index]
		
void Main()
    Class Obj = new
    Console.WriteLine Obj[3]
    Console.WriteLine Obj.NamedIndexer[4]

Operator Functions

Operators can be defined for structures and classes that wouldn't allow it by default:

class Class
    int _Something
	
    public static void operator ++(Class Obj)
        Obj._Something++
		
void Main()
    Class Obj = new
    Obj++

Getting the Address of an R Value

Sometimes a parameter has to be passed as a pointer. In Bird, the address of constants and R values can be taken too; the value is automatically copied to a variable first:

using System

/* The constructor of Array class:
   public Array(IDENTIFIER_PTR ArrayType, uint_ptr[*] Dimensions,
                uint_ptr ItemSize, void* InitialData = null) */

int[,] CreateIntArray2D(uint_ptr Width, Height)
    var Obj = new Array(id_desc_ptr(int[,]), [Width, Height], 4)
    return reinterpret_cast<int[,]>(Obj)

int[] CreateIntArray1D()
    const uint_ptr Length = 16
    uint_ptr[*] Dimensions = (new: Pointer = &Length, Length = 1)
    var Obj = new Array(id_desc_ptr(int[]), Dimensions, 4)
    return reinterpret_cast<int[]>(Obj)

The type of the [Width, Height] expression is uint_ptr[2], so when it is cast to uint_ptr* the compiler has to take its address. It therefore creates a new variable, assigns [Width, Height] to it, and takes the address of that variable. It does the same with &Length in the second function. reinterpret_cast basically does nothing; it just changes the type of an expression node, like casting a pointer.

Reference Equality Operator

The === and !== operators can be used to compare the references of objects. They do the same thing as Object.ReferenceEquals. The == operator can also be used for this, but it can be overridden with an operator function.

public bool StringReferenceEquals(string A, B)
    return A === B
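
The distinction is the same as the one between is and == in Python, which makes for a quick illustration. In this sketch the Point class is hypothetical; its __eq__ method plays the role of an operator function that overrides ==.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Value comparison, like a user-defined == operator function.
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

a = Point(1, 2)
b = Point(1, 2)   # equal value, different object
c = a             # same object

value_equal = a == b        # uses __eq__
same_reference = a is b     # reference comparison, like ===
alias_reference = a is c
```

Here a and b compare equal by value but are different objects, while a and c are the same object.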

Higher Order Functions

The type of a function is written with ->. The input parameters are on the left side and the output parameters on the right. The calling convention and modifiers can also be specified, e.g. birdcall string, object -> int, float. When there are multiple outputs, the return type becomes a tuple. In the future I plan to allow all functions to have multiple outputs in a similar way.

using System

int GetInt(int x)
    return x + 1
	
int Test((int -> int) Func)
    return Func(2)

void Main()
    var Time = Environment.TickCount
    var Sum = 0

    for var i in 0 .. 100000000
        Sum += Test(GetInt)

    Time = Environment.TickCount - Time
    Console.WriteLine "Time: " + Time + " ms"
    Console.WriteLine "Sum: " + Sum

This little sample shows how it works. I wrote it in C# too, and these are the performance results on my machine:

Compiler	Bird	C#
Time	719 ms	2234 ms

Actually, it is implemented very simply. Higher order functions are just tuples of an object and a function pointer: (object Self, void* Pointer). The Self member can be null if the function is static. It's possible to create a static function pointer type with the static keyword: static int -> float. When a nonstatic function is called, the Pointer member is converted to a function pointer. If Self is not null, it is also added to the parameters. This is how the Test function is expanded:

int Test((int -> int) Func)
    return if Func.Self == null: (Func.Pointer to (static int -> int))(2)
           else (Func.Pointer to (static object, int -> int))(Func.Self, 2)
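
The same dispatch can be modelled in Python to show the idea without pointers. This is a sketch of the description above, not of the compiler's generated code: FuncValue, call and Counter are hypothetical names, and a Python callable stands in for the raw function pointer.

```python
from typing import Any, Callable, NamedTuple, Optional

class FuncValue(NamedTuple):
    self_obj: Optional[Any]       # models the Self member
    pointer: Callable[..., Any]   # models the Pointer member

def call(func, *args):
    """Call a function value the way the expanded Test function does."""
    if func.self_obj is None:
        # Static function: pass only the declared parameters.
        return func.pointer(*args)
    # Nonstatic function: Self is added as the first parameter.
    return func.pointer(func.self_obj, *args)

def get_int(x):
    return x + 1

class Counter:
    def __init__(self, base):
        self.base = base

    def add(self, x):
        return self.base + x

static_value = FuncValue(None, get_int)
bound_value = FuncValue(Counter(10), Counter.add)

static_result = call(static_value, 2)
bound_result = call(bound_value, 2)
```

The null check on every call is the extra cost that the static variant of the function type avoids.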

To make it run faster, the parameter can be replaced with a static function pointer. This way it runs in 542 ms.

int Test((static int -> int) Func)
    return Func(2)

History

1/1/2013: Higher order functions, stack alignment, and a lot of refactoring
10/11/2012: Scope resolution operator, changed casting operator, the to keyword has the same syntax as is and as operator and it doesn't allow ambiguous code.
22/9/2012: Parameter arrays, better x86 performance
18/8/2012: Implemented stackalloc, pointer and length arrays, new and default without specifying the type, static constructors, checked, unchecked, generic parameters with <> (only at reinterpret_cast)
18/7/2012: Object type casting, boxing, unboxing, is, as and xor operators, low-level reflection, improved x86 code generation
16/6/2012: Exception handling, try-catch-finally, constants also can be declared inside a function
19/5/2012: Arrays, object initializations, address can be taken of r values, ref, out parameters, added Visual C++ compilation of samples
2/5/2012: Improved performance, identifier aliases instead of typedefs, strings, reference equality operator (===, !==), binary files can be linked into the assembly, changed the name from Anonymus to Bird

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)