Reading a Whole File at Once (slurp)

19 Jun 2024 | CPOL | 2 min read
These two functions load an entire text file or binary file into memory, avoiding some common pitfalls.

Introduction

Whilst still a beginner-level task, loading an entire file into memory in C is not entirely easy. One problem is that you have to grow the buffer dynamically, because some streams are not seekable, and for those it is impossible to determine in advance how much data is in the file. This is also a situation where memory allocation failure has to be treated as a real rather than theoretical possibility, as some files can get quite large.
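
To illustrate the seekability problem, here is a minimal sketch of the seek-and-tell approach to sizing a stream. The helper name probe_size is hypothetical and is not part of the functions in this tip. It typically works for ordinary disk files opened in binary mode, but it reports failure on pipes and terminals, and the C standard does not promise that the value is a byte count for text streams. That is why the loaders here grow a buffer instead.

```c
#include <stdio.h>

/*
  Hypothetical helper: try to size a stream by seeking to its end.
  Returns the end offset, or -1 if the stream cannot be sized this way
  (for example a pipe or a terminal). The read position is restored.
*/
static long probe_size(FILE *fp)
{
    long start;
    long end;

    start = ftell(fp);
    if (start < 0)
        return -1;                       /* stream is not seekable */
    if (fseek(fp, 0, SEEK_END) != 0)
        return -1;
    end = ftell(fp);                     /* may itself be -1 on failure */
    if (fseek(fp, start, SEEK_SET) != 0)
        return -1;
    return end;
}
```

A caller that gets -1 back has no choice but to read until EOF and grow the buffer as it goes.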

Using the code

Snip and paste these two functions

C
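#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
  Load a text file into memory.

  Returns a nul-terminated string allocated with malloc(), or 0 if
  memory runs out or the input would not fit in an int-indexed buffer.
  A read error is treated as end of file; check ferror(fp) afterwards
  if the difference matters.
*/
static char *fslurp(FILE *fp)
{
  char *answer;
  char *temp;
  int buffsize = 1024;
  int i = 0;
  int ch;

  answer = malloc(buffsize);
  if(!answer)
    return 0;
  while( (ch = fgetc(fp)) != EOF )
  {
    /* keep room for this character and the terminating nul */
    if(i == buffsize-2)
    {
      /* refuse to grow past INT_MAX */
      if(buffsize > INT_MAX - 100 - buffsize/10)
      {
        free(answer);
        return 0;
      }
      /* grow by 100 bytes plus a tenth of the current size */
      buffsize = buffsize + 100 + buffsize/10;
      temp = realloc(answer, buffsize);
      if(temp == 0)
      {
        free(answer);
        return 0;
      }
      answer = temp;
    }
    answer[i++] = (char) ch;
  }
  answer[i++] = 0;

  /* trim to the size actually used; keep the old block if that fails */
  temp = realloc(answer, i);
  if(temp)
    return temp;
  else
    return answer;
}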

We have the canonical fgetc() loop. We grow the buffer when it runs out of space, and we assign the result of realloc() to a temporary pointer so that the original block can still be freed if the reallocation fails, which avoids a leak. We also sanity test the input, limiting it to slightly under INT_MAX. This means that the string can always be indexed by an int, and that avoids a lot of subtle bugs.

Of course you can disable this test if the inputs are encyclopedias or other massive texts, though you would then also need to change the int counters to size_t. But you will know when you are dealing with such data. This is the function for general use. We write the function to take a FILE * rather than a path. The most likely error is failure to open the stream with fopen(), and user code is best equipped to handle that. You can write a wrapper that takes a filename very easily, and some data comes in streams rather than named files. So writing the function to take a FILE * is the better approach.
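
As a rough illustration of the filename wrapper mentioned above, here is one possible sketch. The name slurp_path is hypothetical, and the copy of fslurp() is a condensed stand-in (without the final trimming realloc) included only so that the example compiles on its own.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Condensed stand-in for fslurp(), so this sketch is self-contained. */
static char *fslurp(FILE *fp)
{
    char *answer;
    char *temp;
    int buffsize = 1024;
    int i = 0;
    int ch;

    answer = malloc(buffsize);
    if (!answer)
        return 0;
    while ((ch = fgetc(fp)) != EOF)
    {
        if (i == buffsize - 2)
        {
            if (buffsize > INT_MAX - 100 - buffsize / 10)
            {
                free(answer);
                return 0;
            }
            buffsize = buffsize + 100 + buffsize / 10;
            temp = realloc(answer, buffsize);
            if (!temp)
            {
                free(answer);
                return 0;
            }
            answer = temp;
        }
        answer[i++] = (char) ch;
    }
    answer[i] = 0;
    return answer;
}

/*
  Hypothetical wrapper: load the text file at path.
  Returns 0 if the file cannot be opened or loaded; after a failed
  fopen() the caller can usually inspect errno for the reason.
*/
static char *slurp_path(const char *path)
{
    FILE *fp;
    char *text;

    fp = fopen(path, "r");
    if (!fp)
        return 0;
    text = fslurp(fp);
    fclose(fp);
    return text;
}
```

The caller owns the returned string and releases it with free(). Note that the wrapper collapses "could not open" and "out of memory" into a single null return, which is one reason to keep the FILE * version as the primary interface.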

C
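/*
 * Load a binary file into memory.
 * Needs <limits.h>, <stdio.h> and <stdlib.h>.
 *
 * Returns a buffer allocated with malloc() and stores the number of
 * bytes read in *len. On failure returns 0 and sets *len to -1.
 */
static unsigned char *fslurpb(FILE *fp, int *len)
{
    unsigned char *answer = 0;
    unsigned char *temp;
    int capacity = 1024;
    int N = 0;
    int ch;

    answer = malloc(capacity);
    if (!answer)
        goto out_of_memory;
    while ( (ch = fgetc(fp)) != EOF)
    {
        answer[N++] = (unsigned char) ch;
        if (N >= capacity)
        {
            /* refuse to grow if capacity * 1.5 could overflow an int */
            if (capacity > INT_MAX/2)
                goto out_of_memory;
            temp = realloc(answer, capacity + capacity / 2);
            if (!temp)
                goto out_of_memory;
            answer = temp;
            capacity = capacity + capacity / 2;
        }
    }
    *len = N;

    return answer;

out_of_memory:
    *len = -1;
    free(answer);   /* free(0) is harmless if malloc() itself failed */
    return 0;
}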

This is very similar to the previous function, but it uses a rather more aggressive approach to growing the buffer, multiplying the capacity by one and a half each time. Because you cannot obtain the length by looking for a null byte, it has to take a length return parameter, which is set to -1 on failure. It's hard to know which strategy is best, and often realloc(), after the first reallocation, puts the block in a location where it can keep expanding in place, so further reallocations are very cheap.
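
To get a feel for the difference between the two strategies, the following sketch counts how many reallocations each needs to reach a given size, starting from 1024 bytes. All three function names are hypothetical. grow_tenth models the step that the text loader's overflow test is written for (100 bytes plus a tenth), and grow_half models the binary loader's step. It only counts calls and says nothing about how cheap each realloc() turns out to be.

```c
#include <limits.h>

/* Text loader step: add 100 bytes plus a tenth. Returns -1 on overflow. */
static int grow_tenth(int size)
{
    if (size > INT_MAX - 100 - size / 10)
        return -1;
    return size + 100 + size / 10;
}

/* Binary loader step: add half as much again. Returns -1 near INT_MAX. */
static int grow_half(int size)
{
    if (size > INT_MAX / 2)
        return -1;
    return size + size / 2;
}

/*
  Count the growth steps needed to get from 1024 bytes to at least
  target bytes. Returns -1 if the step function refuses to grow.
*/
static int count_steps(int (*grow)(int), int target)
{
    int size = 1024;
    int steps = 0;

    while (size < target)
    {
        size = grow(size);
        if (size < 0)
            return -1;
        steps++;
    }
    return steps;
}
```

Calling count_steps(grow_tenth, n) and count_steps(grow_half, n) for a few values of n shows the trade-off: the gentler step reallocates more often but wastes less memory at the end.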

Don't hesitate to use goto as an out-of-memory handler. Technically the function is no longer structured as a result. But it is neater than using lots of if statements to treat out-of-memory conditions as part of normal control flow.

Points of Interest

You can actually put the reallocation strategy on a firmer mathematical footing by looking at the statistical distribution of file sizes, which is approximately log-normal. The means and medians are different for PC, Linux, and Mac, however.

History

I try to keep these functions handy. You never know when you will reach for them.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United Kingdom
I started programming when we were taught Basic on the Commodore Pet at school, and was immediately hooked. But my parents were not generous with money, and it was a while before I saved up enough money to buy a second-hand ZX81. Then a friend gave me "Machine Code on your ZX81" by Toni Baker (not "Tony", a lady), and that book changed my life, because it enabled me to master something that most adults couldn't do. And I realised the power of good textbooks and self study. I have written two books on programming in consequence.

Then I went to Oxford to study English Literature, and programming came to an end, except for a brief course on Snobol, and statistical analysis of grammar words (words like "and" and "he"). But the expected job with the Civil Service did not materialise, I needed to earn a living somehow, and so it was to games programming that I turned. But I was never entirely happy as a game programmer. Whilst I enjoy programming games, I'm not so fond of playing them, except for Dungeons and Dragons style games. And for a games programmer, that's a big handicap.

I've got other interests aside from programming, and after I had collected a big pile of cash from games programming, I decided to spend it on doing a second degree, at Leeds University, in biology. That then led naturally to a PhD in computational biochemistry, working on the protein folding problem, and that turned me into a really good programmer.

However there's only one faculty position for every 10 PhDs, and I was one of the unlucky nine, and so I found a job doing graphics programming. Which I kept until unfortunately ill health forced me to give up work. And I am now a full time hobby programmer.

And my main project is Baby X and its attendant subsystems.


Comments and Discussions

 
My vote of 1 (CHill60, 5-Jul-24 4:40): Basically this line - goto out_of_memory;
