Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / VB

The use of MS Chart Control and VB.NET for Statistical Purposes: Plotting Kaplan-Meier Estimate.

4.90/5 (15 votes)
9 Nov 2012CPOL4 min read 61.3K  
We used VB.NET and MS Chart in a Web Application to extract the data from a user-uploaded or online database, create life tables using the Kaplan-Meier Formula and plot the survival curve. The whole operation is performed through a user friendly web-based wizard.

Introduction

Since the 17th century, the survival analysis appeared with the beginning of the development of actuarial science and demography. [1] Survival analysis, today being used extensively in clinical research, is not only confined to the analysis of actual survival, measuring the death rates, but has a much broader meaning including any “Event” an Investigator in a clinical trial might want to study. A research paper, published by Kaplan and Meier in 1958 and cited more than 33,000 times [2] presenting their “famous” survival curve estimator, led to a major advance in survival analysis.

 Image 1

Kaplan and Meier proposed a “continuous-time” version of the classical life table, the latter based on division of time into fixed intervals[1]. 

 Image 2

There are a lot of commercially available desktop software, that are used to calculate the Overall or Event-Free Survival and to plot the Kaplan-Meier Estimate; for example IBM SPSS ®  and MedCalc ®. Opensource statistical software like “R” are available, however, “standard” users will still need a specific training to be able to use the software efficiently. Web enabled data collection and analysis interface, though doable, is not an easy task to perform. 

We used VB.Net and MS Chart in a Web Application to Extract the data from a user-uploaded or online database, create life tables using the Kaplan-Meier Formula and plot the survival curve. The whole operation is performed through a user friendly web-based wizard.

Image 3

The wizard starts by asking the user to choose a database table to analyze. Once selected, the user will match the “Date Start”, “Date End”, “Event” and “Basis for Stratification” with fields from his uploaded table.

Image 4 

The curve will then be plotted along with life table. The programmatically generated chart will describe survival data at specified time points through the tooltip property of the MS Chart.

 Image 5 

Our tool provides a means to analyze survival data "real-time", the code can be manipulated to be compliant to the the study protocol statistical consideration section (for example setting the optimum follow up period). the produced life tables can be exported to be studied furthermore. Being a web-based application, no client set-up is needed. In contrast to other statistical software, no additional software is needed to be installed on the server. 

Background 

Let S(t) be the probability that a patient from a given study population will have a lifetime exceeding t [3,4].For a sample of size N from this population, let the observed times until death of the N sample members be

 Image 6

In the following example, ten patients are enrolled in a study, the “Duration” represents [Date End - Date Start] where Date Start is Date of Enrollment of the Study or Clinical Trial. Date End represent the Date of Death or Date of Event if “Death” or “Event” happens, respectively. If the patient is still alive or the specified “Event” did not happen, then Date End is the Last Follow-Up Date.

Image 7  

Corresponding to each ti is ni (the number "at risk" just prior to time ti), and di (the number of deaths or events at time ti ).

 Image 8

To apply this on the previous example, lets set the time points where “Event” = 1

First Time-point @ 3 months 

ni = 10

di = 1 

Second Time-point  @ 11 months

ni = 10 (initial) – 1 (patient died at timepoint 1) – 1 (patient alive at 9 months duration) = 8

di = 2 (two patient died @ duration of 11 months)

Image 9

i.e. The 11 months’ survival for this example = 67.5 % 

Using the code 

The Project consists of two web-pages (Wizard and Plot) in addition to a VB.Net code file (SurvivalCurve.vb) placed in the "App_Code" folder. The following code is documented in-line. 

The Wizard Page 

The following are the ASP.Net controls needed to construct the wizard 

ASP.NET
<asp:Label ID="Label1" runat="server" Text="Label"> </asp:Label> 
<asp:Label ID="Label2" runat="server" Text="Label"> </asp:Label> 
<asp:DropDownList ID="myList" runat="server"> </asp:DropDownList>
<asp:Button ID="Button3" runat="server" Text="Select Table" />
<asp:TextBox ID="choosetbl" runat="server">Table / Query Name</asp:TextBox>
<asp:Button ID="Button2" runat="server" Text="Cancel" />
<asp:Button ID="Button1" runat="server" Text="Next" /> 

 And the following is the code-behind file 

VB.NET
 Imports System.Data
Imports System.Data.OleDb
Partial Class DynamicKaplan
    Inherits System.Web.UI.Page
    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        If Session("ChosenTable") = "" Then InitMyDB()
    End Sub
    Private Sub InitMyDB()
        Dim myDB As New OleDbConnection
        Try
            myDB = New OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + Session("CurrentDB") + ";Persist Security Info=True")
            myDB.Open()
        Catch
            Threading.Thread.Sleep(1)
            Response.Redirect("Default.aspx")
        End Try
        'Retrieve the name of the current database
        Dim tmpString = myDB.DataSource.ToString
        Dim tmpInd = tmpString.LastIndexOf("|")
        tmpString = Right(tmpString, Len(tmpString) - (tmpInd + 1))
        Label1.Text = "The currently used Database is " & tmpString
        Label2.Text = "Please select a table or query from the database to estimate Kaplan Meier curves from: "
        'Retrieve the tables & queries schema
        Dim mytables = myDB.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, New Object() {})
        myDB.Close()
        'Iterate through the schema & fill the names to the dynamically created list
        '  myList.Items.Clear()
        
        For i = 1 To mytables.Rows.Count
            'to clean the list from annoying system tables
            If mytables.Rows(i - 1).Item(3).ToString = "VIEW" Then
                'Item(2 is the third column containing table name from the schema)
                myList.Items.Add(mytables.Rows(i - 1).Item(2).ToString)
            End If
        Next i
    End Sub
    Protected Sub Button1_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles Button1.Click
        If Session("ChosenTable") = "" Then
            'this is the first time click
            Session("ChosenTable") = choosetbl.Text
            Session("CurrentLoadState") = "Tables"
        End If
        LoadCases()
    End Sub
    Private Sub LoadCases()

        Dim CurrentLoadStateEx = Session("CurrentLoadState")
        Select Case CurrentLoadStateEx
            Case "Tables"
                'The user has pressed the "Next" Button for the first time, and chosen a table. It's time to fill the list with the available columns in the chosen table.
                InitSurvivalStatus()
                Session("CurrentLoadState") = "SurvivalStatus"
            Case "SurvivalStatus"
                Session("SurvivalStatusColumn") = myList.SelectedItem.Text
                Label1.Text = Label1.Text & ", 'EVENT STATUS' Column is '" & Session("SurvivalStatusColumn") & "' "
                Label2.Text = "Please select the field of 'REGISTRATION DATE' (The date of first contact) : "
                myList.Items.Remove(myList.SelectedItem)
                Session("CurrentLoadState") = "RegDate"
            Case "RegDate"
                Session("RegDateColumn") = myList.SelectedItem.Text
                Label1.Text = Label1.Text & "& the 'REGISTRATION DATE' Column is '" & Session("RegDateColumn") & "' "
                Label2.Text = "Please select the field of 'LAST CONTACT DATE' : "
                myList.Items.Remove(myList.SelectedItem)
                Session("CurrentLoadState") = "LastContact"
            Case "LastContact"
                Session("LastContactColumn") = myList.SelectedItem.Text
                Label1.Text = Label1.Text & "& the 'LAST CONTACT DATE' Column is '" & Session("LastContactColumn") & "'"
                Label2.Text = "Please select the field of 'STRATIFICATION' : "
                myList.Items.Remove(myList.SelectedItem)
                myList.Items.Insert(0, "None")
                Session("CurrentLoadState") = "Strata"
            Case "Strata"
                Session("Strata") = myList.SelectedItem.Text
                If Session("Strata") <> "None" Then
                    CheckStrata()
                Else
                    Session("StrataCount") = 1
                End If
                Response.Redirect("KaplanAccess.aspx")
        End Select
    End Sub
    Private Sub InitSurvivalStatus()
        Dim myDB = New OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & Session("CurrentDB") & ";Persist Security Info=True")
        myDB.Open()
        Label1.Text = "The currently chosen table or view is '" & Session("ChosenTable") & "'"
        Label2.Text = "Please select the field of 'SURVIVAL STATUS' where 1 means occurence of the event while 0 means no event : "
        'Retrieve the tables & queries schema
        Dim SqlString = "SELECT * FROM [" & Session("ChosenTable") & "];"
        Dim myDBCommand = New OleDbCommand(SqlString, myDB)
        Dim myReader As IDataReader = myDBCommand.ExecuteReader()
        Dim myColumns = myReader.GetSchemaTable
        myDB.Close()
        myList.Items.Clear()
        For I = 1 To myColumns.Rows.Count
            myList.Items.Add(myColumns.Rows(I - 1).Item(0).ToString)
        Next I
    End Sub
    Private Sub CheckStrata()
        Dim myDB = New OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + Session("CurrentDB") + ";Persist Security Info=True")
        myDB.Open()
        Dim SqlString = "SELECT DISTINCT [" + Session("Strata") + "] FROM [" + Session("ChosenTable") + "] WHERE ([" + Session("Strata") + "] Is Not Null);"
        Dim myDBCommand = New OleDbCommand(SqlString, myDB)
        Dim myReader As IDataReader = myDBCommand.ExecuteReader()
        Dim StrataCount As Integer
        Dim StrataArray As New ArrayList
        Do While myReader.Read
            StrataArray.Add(myReader.GetValue(0))
            StrataCount = StrataCount + 1
        Loop
        Session("StrataCount") = StrataCount
        Session("StrataArray") = StrataArray
    End Sub
    Protected Sub Button2_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles Button2.Click
        Response.Redirect("default.aspx")
    End Sub
    Protected Sub Button3_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles Button3.Click
        choosetbl.Text = myList.SelectedItem.Text
    End Sub
<pre lang="vb.net">End Class  

 

 The Plotting Page 

Here we add MS Chart as follows

ASP.NET
<asp:Chart ID="Chart1" runat="server" Height="600px" Width="742px">
<pre lang="aspnet">     <ChartAreas>
            <asp:ChartArea Name="ChartArea1">
             </asp:ChartArea>
     </ChartAreas>
</asp:Chart> 

 and the code behind file

VB.NET
 Imports System.Web.UI.DataVisualization.Charting
Imports Microsoft.VisualBasic
Imports System.Data
Imports System.Data.OleDb
Imports System.Math

Partial Class KaplanAccess
    Inherits System.Web.UI.Page

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load

        LoadKaplanData()
        'Chart visual settings
        Chart1.ChartAreas(0).AxisY.Maximum = 1
        Chart1.ChartAreas(0).AxisY.Interval = 0.1
        Chart1.ChartAreas(0).AxisX.MajorGrid.Enabled = False
        Chart1.ChartAreas(0).AxisY.MajorGrid.Enabled = False
        Chart1.ChartAreas(0).AxisX.Title = "Duration of Survival in months"
        Chart1.ChartAreas(0).AxisY.Title = "Cumulative Survival Propabaility"
        Dim myTitle As New Title
       
        myTitle.Text = "Kaplan Meier Survival Curve(s) for total of " & Session("TotalCasesCount").ToString & " cases"
        Chart1.Titles.Add(myTitle)

    End Sub
    Private Sub LoadKaplanData()
        'Get all settings from the session
        Dim CurrentDB = Session("CurrentDB")
        Dim ChosenTable = Session("ChosenTable")
        Dim SurvivalStatusColumn = Session("SurvivalStatusColumn")
        Dim RegDateColumn = Session("RegDateColumn")
        Dim LastContactColumn = Session("LastContactColumn")
        Dim StrataColumn = Session("Strata")
        Dim StrataCount = CInt(Session("StrataCount"))
        Dim StrataArray = CType(Session("StrataArray"), ArrayList)
        'Open the Database
        Dim myDB = New OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & CurrentDB & ";Persist Security Info=True")
        myDB.Open()
        'Base of the SQL String, Note that you can manipulate the DateDiff to add the optimum followup period recommended by the study protocol statistical consideration
        Dim SQLString = "SELECT [" & RegDateColumn & "], [" & LastContactColumn & "], [" & SurvivalStatusColumn & "] , DateDiff('m',[" & RegDateColumn & "],[" & LastContactColumn & "]) AS survivalduration" _
                        & " FROM([" & ChosenTable _
                        & "]) WHERE ((([" & RegDateColumn & "]) Is Not Null) AND (([" & LastContactColumn & "]) Is Not Null) AND (([" & SurvivalStatusColumn & "]) Is Not Null)) and ([" & RegDateColumn & "] < [" & LastContactColumn & "])"
        'Do this as many times as there is strata
        For I = 1 To StrataCount
            Dim FinalSQLString, SQLString2, Stratum As String
            SQLString2 = SQLString
            Stratum = "All cases"
            'Append strata selection criteria if there is any strata
            If StrataCount > 1 Then
                If IsNumeric(StrataArray.Item(I - 1)) Then
                    SQLString2 = SQLString & "and ([" & StrataColumn & "]=" & StrataArray.Item(I - 1).ToString & ")"
                Else
                    SQLString2 = SQLString & "and ([" & StrataColumn & "]='" & StrataArray.Item(I - 1).ToString & "')"
                End If
                Stratum = StrataArray.Item(I - 1).ToString
            End If
            'Formulate the final SQL String
            FinalSQLString = SQLString2 & "ORDER BY DateDiff('m',[" & RegDateColumn & "],[" & LastContactColumn & "]), " & SurvivalStatusColumn & " DESC;"
            Dim myDBCommand = New OleDbCommand(FinalSQLString, myDB)
            'Generate a temporary reader to get the number of cases
            Dim myReader As IDataReader = myDBCommand.ExecuteReader()
            Dim CasesCount As Integer = 0
            Do While myReader.Read
                CasesCount = CasesCount + 1
            Loop
            myReader.Close()
            'Generate the main reader
            myReader = myDBCommand.ExecuteReader()
            'Call the function which performs calculations
            Session("TotalCasesCount") = CInt(Session("TotalCasesCount")) + Kaplan.KaplanTables(Me, myReader, CasesCount, Stratum, Chart1)
            myReader.Close()
        Next I
        myDB.Close()
    End Sub
End Class  

 

 SurvivalCurve.vb 

VB.NET
Imports Microsoft.VisualBasic
Imports System.Web.UI.DataVisualization.Charting
Imports System.Data
Imports System.Data.OleDb
Imports System.Math
Public Class Kaplan
    Public Shared Function KaplanTables(ByRef myPage As Page, ByRef myReader As IDataReader, ByVal CasesCount As Integer, ByVal Stratum As String, ByRef myChart As Chart) As Int32
        'This function will take two variables, an OLE Database Connection with a table named "Query1", _
        'This query should contain the Survival status as the third column, and the survival duration as the fourth colum,
        'the other variable passed to this function should be a gridview to produce the lifetables in.
        'The function will bind the generated lifetables to the gridview & return the total number of processed cases as Int32


        'Start generating the life tables
        Dim myTable As New DataTable
        myTable.Columns.Add("Survial Status")
        myTable.Columns.Add("Remaining Patients")
        myTable.Columns.Add("Survial Duration")
        myTable.Columns.Add("Survial Propability")
        myTable.Columns.Add("Cumulative Survial Propability")
        Dim myFirstRow As DataRow = myTable.NewRow
        myFirstRow.Item(0) = 1
        myFirstRow.Item(1) = CasesCount
        myFirstRow.Item(2) = 0
        myFirstRow.Item(3) = 1
        myFirstRow.Item(4) = 1
        myTable.Rows.Add(myFirstRow)
        Dim Ptnseq = CasesCount
        For I = 1 To CasesCount
            Dim myRow As DataRow = myTable.NewRow
            'Get only one record  from KaplanTable
            Dim Kaplantmp = myReader.Read
            Ptnseq = Ptnseq - 1
            myRow.Item(0) = myReader.GetValue(2)
            myRow.Item(1) = Ptnseq 'Sets the total number of remaining patients
            myRow.Item(2) = myReader.GetValue(3)
            If myRow.Item(0) = 0 Then
                myRow.Item(3) = myTable.Rows(I - 1).Item(3)
                myRow.Item(4) = myTable.Rows(I - 1).Item(4)
            ElseIf myRow.Item(0) = 1 Then
                myRow.Item(3) = myRow.Item(1) / myTable.Rows(I - 1).Item(1)
                myRow.Item(4) = myRow.Item(3) * myTable.Rows(I - 1).Item(4)
            End If
            myTable.Rows.Add(myRow)
        Next I
        'Finished generating the lifetables, bind it to a grid
        Dim myGrid As New GridView 'Create a new dynamic Grid
        Dim myLabel As New Label 'Create a new dynamic label for this grid
        myPage.Form.Controls.Add(myLabel) 'add the label, then
        myPage.Form.Controls.Add(myGrid) 'add the grid
        myGrid.DataSource = myTable 'Bind the grid to the calculated lifetables
        myGrid.DataBind()
        DrawKaplanCurve(myTable, myChart, Stratum)
        myLabel.Text = "Current Stratum is: " & Stratum & "<br/>" & "Total Number of cases is: " & (myTable.Rows.Count - 1).ToString & " Cases" '(if  strata)
        Return myTable.Rows.Count - 1
    End Function
    Public Shared Sub DrawKaplanCurve(ByVal myTable As DataTable, ByVal myChart As Chart, ByVal Stratum As String)
        Dim KaplanSeries As New Series
        KaplanSeries.ChartType = SeriesChartType.StepLine
        KaplanSeries.Name = Stratum
        Dim CensoredSeries As New Series
        CensoredSeries.ChartType = SeriesChartType.Stock
        CensoredSeries.Name = "Censored " & Stratum
        For I = 1 To myTable.Rows.Count - 1
            Dim myPoint As New DataPoint
            Dim xval As Double = myTable.Rows(I).Item(2)
            Dim yval As Double = myTable.Rows(I).Item(4)
            myPoint.SetValueXY(xval, yval)

            ' If alive case, then add to censored data
            If myTable.Rows(I).Item(0) = 0 Then
                Dim CensoredPoint As New DataPoint
                CensoredPoint.SetValueXY(myPoint.XValue, yval - 0.01, yval + 0.01)
                CensoredPoint.ToolTip = "Censored Case Number " & myTable.Rows(I).Item(1).ToString & vbNewLine & "Survival Duration = " & myTable.Rows(I).Item(2).ToString & " months" & vbNewLine & "Cumulative Survival Propability = " & Round(yval * 100, 2).ToString & "%"
                CensoredPoint.Color = myPoint.Color
                If I <> myTable.Rows.Count - 1 Then CensoredSeries.Points.Add(CensoredPoint) 'add all except the last point because it shouldn't be censored
            End If
            
            If I = myTable.Rows.Count - 1 Then myPoint.Label = Round(yval * 100, 2).ToString & "%"
            KaplanSeries.Points.Add(myPoint)
        Next
        myChart.Series.Add(KaplanSeries)
        myChart.Series.Add(CensoredSeries)
       
        Dim myLegend As New Legend
        myLegend.TitleForeColor = myChart.Series(myChart.Series.Count - 1).Color
        myChart.Legends.Add(myLegend)
    End Sub
End Class 

References 

1. History of applications of martingales in survival analysis. Odd O.AALEN, PerKragh ANDERSEN , Ørnulf BORGAN, Richard D. GILL and Niels KEIDING. June 2009, Electronic Journal for History of Propability and Statistics, Vol. 5.

2. Non-parametric estimation from incomplete observations. Kaplan, E. L. and Meier. 1958, Journal of the American Statistical Association, pp. P.53, 457–481 and 562–563.

3. community, wiki. Kaplan–Meier estimator. wikipedia.org. [Online] 2012. 

4. Dunn, Steve. Survival Curves: Accrual and The Kaplan-Meier Estimate. cancerguide.org. [Online] 2002. http://www.cancerguide.org/scurve_km.html.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)