AzharDNA New Bioinformatics Program (Basic Tools for the Analysis of DNA)

18 Jul 2011
Basic tools for the analysis of DNA like transcription and reversion

Introduction

Bioinformatics is an interdisciplinary field joining biology and computer science. It is concerned with the organization and analysis of biological data. For molecular biologists, it is the practice of using software tools to analyze biological data in order to extract new knowledge, generate new hypotheses, or search for effective molecules. For computer scientists, it is the development of new algorithms and software tools for handling biological data. The most striking feature of bioinformatics, and the one that attracts researchers from different fields, is the free and unlimited access to molecular data and software tools.

My current major goal is to develop and provide software for Windows that offers useful tools for the analysis of biological data, especially DNA sequence data, and that can be installed on an ordinary personal computer.

Background

In this project, I focus on using C# and SQL in bioinformatics applications and on developing bioinformatics algorithms. All of the C# code in this research is collected in a single program called "AzharDNA", which can be installed on any personal computer running Microsoft Windows. The program exports its results in different formats, such as images, text, or tables.

Readers will need a good background in the fields linked to bioinformatics, such as molecular biology, algorithms, and mathematics. I developed a simple algorithm for every operation in this program, defining how each operation begins and ends. Most of these algorithms are demonstrated in this project using flowcharts.

DNA Complementary

DNA is a double-stranded molecule. Each type of base on one strand bonds with just one type of base on the other strand, according to a specific rule called the base-pairing rule. Purines form hydrogen bonds with pyrimidines: adenine (A) pairs with thymine (T), and guanine (G) pairs with cytosine (C).

This method takes the first strand's sequence and returns the sequence of the second strand, that is, the complementary sequence.

C#
public static string DNA_complementry(string Seq)
        {
            // Work on a lowercase copy so that 'A' and 'a' are treated alike.
            char[] d = Seq.ToLower().ToCharArray();
            for (int n = 0; n < d.Length; ++n)
            {
                // Swap each base for its partner; other characters are left unchanged.
                switch (d[n])
                {
                    case 't': d[n] = 'a'; break;
                    case 'a': d[n] = 't'; break;
                    case 'c': d[n] = 'g'; break;
                    case 'g': d[n] = 'c'; break;
                }
            }

            // Build the result once instead of concatenating inside the loop.
            return new string(d);
        }

Alternatively, this version was suggested by Jaime Olivares:

C#
public static string DNA_complementry(string Seq)
        {
            // Expects lowercase input that does not contain '*',
            // which serves as a temporary placeholder while swapping.
            string ret = Seq.Replace('t', '*');
            ret = ret.Replace('a', 't');
            ret = ret.Replace('*', 'a');

            ret = ret.Replace('c', '*');
            ret = ret.Replace('g', 'c');
            return ret.Replace('*', 'g');
        }
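
The first version above returns its result in lowercase, and the Replace-based version only handles lowercase input and relies on '*' never appearing in the sequence. As a minimal sketch of another option, a single pass over the characters avoids the placeholder and keeps the original letter case. The names `ComplementSketch` and `ComplementPreservingCase` are hypothetical and are not part of AzharDNA.

```csharp
using System.Text;

public static class ComplementSketch
{
    // Hypothetical helper: returns the partner of a single base and leaves
    // any character that is not a/c/g/t (in either case) unchanged.
    private static char ComplementBase(char b)
    {
        switch (b)
        {
            case 'a': return 't';
            case 't': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default: return b;
        }
    }

    // Builds the complementary sequence in one pass, preserving letter case.
    public static string ComplementPreservingCase(string seq)
    {
        StringBuilder sb = new StringBuilder(seq.Length);
        foreach (char b in seq)
        {
            sb.Append(ComplementBase(b));
        }
        return sb.ToString();
    }
}
```

For example, `ComplementSketch.ComplementPreservingCase("ATgc")` returns `"TAcg"`.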

DNA Transcription (Converting DNA into RNA)

Transcription is the process of creating a complementary RNA copy of a sequence of DNA. Both RNA and DNA are nucleic acids, which use base pairs of nucleotides as a complementary language that can be converted back and forth between DNA and RNA by the action of the correct enzymes. During transcription, a DNA sequence is read by RNA polymerase, which produces a complementary, antiparallel RNA strand. Unlike DNA replication, transcription results in an RNA complement that includes uracil (U) in all instances where thymine (T) would have occurred in a DNA complement.

C#
public static string DNA_To_RNA(string Seq)
        {
            // Treats Seq as the coding strand: the RNA has the same
            // sequence, with u in place of t. The result is lowercase.
            return Seq.ToLower().Replace('t', 'u');
        }

DNA Reverse Transcription (Converting RNA into DNA)

Reverse transcription is the opposite process: the enzyme reverse transcriptase creates single-stranded DNA from an RNA template.

C#
public static string RNA_To_DNA(string Seq)
        {   //suggested by Pete O'Hanlon
            // Replace every u with t; the result is lowercase.
            string DNA = Seq.ToLower().Replace('u', 't');
            return DNA;
        }

RNA Complementary

This works like the DNA complement, except that uracil (U) takes the place of thymine (T).

C#
public static string RNA_complementry(string Seq)
        {
            // Expects lowercase input; '*' is a temporary placeholder while swapping.
            string RNA_Comp = Seq.Replace('u', '*');
            RNA_Comp = RNA_Comp.Replace('a', 'u');
            RNA_Comp = RNA_Comp.Replace('*', 'a');

            RNA_Comp = RNA_Comp.Replace('c', '*');
            RNA_Comp = RNA_Comp.Replace('g', 'c');
            return RNA_Comp.Replace('*', 'g');
        }

DNA Reversion

This method returns the reversed sequence:

C#
public static string Reversion(string Seq)
        {
            // Reverse a lowercase copy of the sequence in place.
            char[] d = Seq.ToLower().ToCharArray();
            Array.Reverse(d);

            // Build the result once instead of concatenating inside a loop.
            return new string(d);
        }
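
Because the two strands of DNA run antiparallel, reversion and complementation are often applied together: reading the second strand in its own 5' to 3' direction gives the reverse complement of the first. The following is a minimal sketch of that combination. The class `ReverseComplementSketch` and its method names are hypothetical; the two private helpers simply mirror the behaviour of `DNA_complementry` and `Reversion` so that the example is self-contained.

```csharp
using System;

public static class ReverseComplementSketch
{
    // Same mapping as DNA_complementry: lowercase the input, swap a<->t and c<->g.
    private static string Complement(string seq)
    {
        char[] d = seq.ToLower().ToCharArray();
        for (int n = 0; n < d.Length; ++n)
        {
            switch (d[n])
            {
                case 't': d[n] = 'a'; break;
                case 'a': d[n] = 't'; break;
                case 'c': d[n] = 'g'; break;
                case 'g': d[n] = 'c'; break;
            }
        }
        return new string(d);
    }

    // Same result as Reversion: a reversed, lowercase copy of the input.
    private static string Reverse(string seq)
    {
        char[] d = seq.ToLower().ToCharArray();
        Array.Reverse(d);
        return new string(d);
    }

    // Complement first, then reverse. The order does not matter here,
    // because the complement acts on each base independently of its position.
    public static string ReverseComplement(string seq)
    {
        return Reverse(Complement(seq));
    }
}
```

For example, `ReverseComplementSketch.ReverseComplement("aacg")` returns `"cgtt"`.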

Nucleotide Percentage

This method calculates the percentage of each nucleotide type relative to the total length of the sequence. The actual percentages vary between species and organisms, although the order of the nucleotides, of course, also matters.

First, we count the occurrences of each nucleotide.

C#
public static void G_C_A_T_Content
	(string Seq, out int A, out int C, out int G, out int T)
        {
            // Only lowercase bases are counted, so callers should pass Seq.ToLower().
            int g = 0;
            int a = 0;
            int c = 0;
            int t = 0;
            for (int i = 0; i < Seq.Length; i++)
            {
                if (Seq[i] == 'a')
                    a++;
                else if (Seq[i] == 't')
                    t++;
                else if (Seq[i] == 'c')
                    c++;
                else if (Seq[i] == 'g')
                    g++;
            }
            G = g;
            C = c;
            T = t;
            A = a;
        }

Then we will use this method in the percentage method.

C#
public static void Nu_Percentage
	(string Seq, out float Apr, out float Cpr, out float Tpr, out float Gpr)
        {
            // G_C_A_T_Content reports integer counts, so the out arguments are ints.
            int an;
            int cn;
            int gn;
            int tn;
            G_C_A_T_Content(Seq.ToLower(), out an, out cn, out gn, out tn);

            // Divide by a float so that the division is not truncated to an integer.
            // For an empty sequence every percentage is NaN (0 divided by 0).
            float total = Seq.Length;
            Apr = an / total * 100;
            Tpr = tn / total * 100;
            Gpr = gn / total * 100;
            Cpr = cn / total * 100;
        }

GC & AT Content

In molecular biology, GC-content (or guanine-cytosine content) is the percentage of nitrogenous bases on a DNA molecule that are either guanine or cytosine (from a possibility of four different ones, also including adenine and thymine). This may refer to a specific fragment of DNA or RNA, or that of the whole genome. When it refers to a fragment of the genetic material, it may denote the GC-content of part of a gene (domain), single gene, group of genes (or gene clusters), or even a non-coding region. G (guanine) and C (cytosine) undergo a specific hydrogen bonding, whereas A (adenine) bonds specifically with T (thymine).

In PCR experiments, the GC-content of primers is used to predict their annealing temperature to the template DNA. A higher GC-content indicates a higher melting temperature.

C#
public static void GC_AT_Content(string Seq, out int GC_Content, out int AT_Content)
        {
            int gc = 0;
            int at = 0;
            for (int s = 0; s < Seq.Length; s++)
            {
                // Compare in uppercase so that lowercase sequences are counted too.
                char b = char.ToUpper(Seq[s]);
                if (b == 'C' || b == 'G')
                    gc++;
                else if (b == 'A' || b == 'T')
                    at++;
            }
            GC_Content = gc;
            AT_Content = at;
        }
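
GC-content is defined above as a percentage, while `GC_AT_Content` returns raw counts. As a minimal sketch, the counts can be turned into a percentage as follows. The class `GcContentSketch` and the method `GcPercent` are hypothetical names, and returning 0 for a sequence with no countable bases is an assumption made here, not something the article specifies.

```csharp
public static class GcContentSketch
{
    // Returns GC-content as a percentage of the A/C/G/T bases in the sequence.
    // Characters other than A, C, G, T (in either case) are ignored.
    public static double GcPercent(string seq)
    {
        int gc = 0;
        int at = 0;
        foreach (char b in seq.ToUpperInvariant())
        {
            if (b == 'G' || b == 'C')
                gc++;
            else if (b == 'A' || b == 'T')
                at++;
        }

        int total = gc + at;
        if (total == 0)
            return 0.0; // assumption: report 0 when there is nothing to count

        // 100.0 is a double, so the division is not truncated.
        return 100.0 * gc / total;
    }
}
```

For example, `GcContentSketch.GcPercent("atgc")` returns 50, because two of the four bases are G or C.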

DNA Molecular Weight or Molecular Mass

The molecular mass (m) of a substance is the mass of one molecule of that substance, in unified atomic mass units (u), where 1 u equals 1/12 of the mass of one atom of the isotope carbon-12. This is numerically equivalent to the relative molecular mass (Mr) of a molecule, frequently referred to as molecular weight, which is the ratio of the mass of that molecule to 1/12 of the mass of carbon-12 and is a dimensionless number. Thus, it is incorrect to express relative molecular mass (molecular weight) in daltons (Da).

This will be used in PCR Primer design equations and many tools in upcoming articles.

C#
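public static double DNA_MW(string Seq)
        {
            int a = 0;
            int c = 0;
            int g = 0;
            int t = 0;
            // G_C_A_T_Content counts lowercase bases only, so lowercase the input first.
            G_C_A_T_Content(Seq.ToLower(), out a, out c, out g, out t);

            // Weighted sum: each count times the mass used for that nucleotide.
            double MW = 329.2 * g + 313.2 * a + 304.2 * t + 289.2 * c;
            return MW;
        }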
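
The method above amounts to a weighted sum of the nucleotide counts. Written out, with the coefficients taken directly from the code and the symbols $n_A$, $n_C$, $n_G$, $n_T$ (notation introduced here for illustration) standing for the counts of each base:

```latex
M_W = 329.2\,n_G + 313.2\,n_A + 304.2\,n_T + 289.2\,n_C

% Worked example for the sequence "acgt", where each base occurs once:
M_W = 329.2 + 313.2 + 304.2 + 289.2 = 1235.8
```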

DNA Melting Temperature

DNA denaturation, also called DNA melting, is the process by which double-stranded deoxyribonucleic acid unwinds and separates into single strands through the breaking of the hydrogen bonds between the bases.

C#
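public static int DNA_Melting_Temp(string Sequence)
        {
            int GC_Content;
            int AT_Content;
            // Pass an uppercase copy so that lowercase sequences are counted as well.
            GC_AT_Content(Sequence.ToUpper(), out GC_Content, out AT_Content);

            // Estimate in degrees Celsius: 4 per G or C base plus 2 per A or T base.
            int Melt = 4*GC_Content + 2*AT_Content;
            return Melt;
        }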
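
The formula used in `DNA_Melting_Temp` is commonly known as the Wallace rule. It is generally treated as a rough estimate for short oligonucleotides such as primers, not as a precise value for long sequences. With $n_A$, $n_C$, $n_G$, $n_T$ (notation introduced here for illustration) standing for the counts of each base, it reads:

```latex
T_m = 4\,(n_G + n_C) + 2\,(n_A + n_T) \quad [^{\circ}\mathrm{C}]

% Worked example for the sequence "atgcgc": n_G + n_C = 4 and n_A + n_T = 2
T_m = 4 \cdot 4 + 2 \cdot 2 = 20\,^{\circ}\mathrm{C}
```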

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
