C-Strings in C++

Strings in C are simply arrays of char values - or in other words, a contiguous block of memory containing chars. C++ inherits that representation, but also provides a safer and easier-to-use option called std::string. In C++, the old C-style strings are often called C-Strings.

Most C++ gurus would advise you to avoid C-Strings and just use std::string. And it is true that std::string is safer and easier to use than C-Strings. Whereas std::string manages memory for you and has a ton of built-in functionality, C-Strings are essentially just blocks of char memory that you must manipulate with error-prone and inconsistent functions.

However, avoiding C-Strings entirely is difficult - sometimes you inherit code that’s using them, sometimes SDKs or libraries require you to use them, sometimes they are the most efficient option.

C-Strings can be confusing to work with. There are a variety of functions used to manipulate C-Strings, but some are deprecated or insecure, some are only available in certain compilers, and some have intricate ins and outs for using them properly.

So, my goal with this post is to catalogue some common operations you’d want to perform on C-Strings and identify the best options available, what to avoid, and what pitfalls exist.

Creating C-Strings

You may think that creating a C-String is very simple, but that isn’t necessarily the case. C-Strings are inherently tied to concepts like memory, pointers, and arrays, so some understanding of those concepts is required.

The simplest option is to create a C-String using a string literal (that is, a sequence of characters in double quotes):

const char* str = "Hello";

String literals are typically stored in read-only memory at runtime, so you aren’t supposed to modify them (attempting to do so is undefined behavior). To reflect this, string literals should be stored in variables of type const char*.

const char* str1 = "Hello"; // correct
char* str2 = "Hello"; // a warning on older compilers, an error since C++11
str2[0] = 'x'; // undefined behavior - usually a crash or exception

If you want to modify a C-String that is created using a string literal, you need to use slightly different creation syntax that copies the string literal into a writable array:

char str[] = "Hello"; // creates an array and copies literal into it
str[0] = 'x'; // ok to modify our local copy

We did not specify the size of the char array - it was calculated automatically based on the size of the string literal. How big do you think the array is?

The word “Hello” consists of 5 characters. But actually the size of the array is 6! This is because all C-Strings end with a special character called the null-terminator (\0). All C-Strings should end with a null-terminator character, as this is the only way to detect the end of a C-String when you don’t have the length stored elsewhere.

So, the contents of the above array are actually Hello\0.
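To make the "5 characters, 6 elements" point concrete, here is a minimal sketch. CStringStorageSize is a hypothetical helper I made up for illustration (it is not a standard function); it just reports how many chars a C-String needs once the null-terminator is counted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of the standard library):
// returns the number of chars needed to store str, including the null-terminator.
std::size_t CStringStorageSize(const char* str)
{
    assert(str != nullptr); // assumes a valid, null-terminated C-String
    return std::strlen(str) + 1; // strlen stops at the terminator and does not count it
}
```

For char str[] = "Hello", both sizeof(str) and CStringStorageSize(str) give 6, while strlen(str) gives 5.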

We can also specify an explicit size:

char str[256]; // can hold 255 chars, plus the null-terminator

The contents of that array are indeterminate (assuming it is a local variable) - it is not guaranteed to be filled with a specific value. Depending on your needs, you may want to initialize the contents of the array in some way:

char str[256] = { 0 }; // all elements are zeroed
char str[256] = "Hello"; // first 6 elements contain Hello\0, rest is zeroed out

Arrays and pointers in C++ are closely related: an array converts implicitly to a pointer to its first element, and you can index a pointer just like an array. It is very important to understand this, since the majority of C-String operations take in a char* or const char* as an argument. Passing an array is totally fine:

char str[64] = "Hello"; // initialize so there is a valid C-String to print
char* strPtr = str; // strPtr points to first element of array
strPtr[0] = 'x'; // we can use array accessors on pointers
printf("%s", str); // can pass an array as a char* argument
printf("%s", strPtr); // or pass the pointer if you prefer
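One catch with this array-to-pointer conversion is that the size information is lost along the way: inside a function that receives a char*, sizeof gives the size of the pointer, not the array. A rough sketch of the usual workaround is to pass the buffer size alongside the pointer. FillWithChar is a hypothetical function invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: fills buffer with `value` and null-terminates it.
// The size must be passed in, because sizeof(buffer) here would only be
// the size of a pointer, not the size of the caller's array.
void FillWithChar(char* buffer, std::size_t bufferSize, char value)
{
    assert(buffer != nullptr && bufferSize > 0);
    for (std::size_t i = 0; i + 1 < bufferSize; ++i)
    {
        buffer[i] = value;
    }
    buffer[bufferSize - 1] = '\0'; // last element is reserved for the terminator
}
```

At the call site the array type is still visible, so FillWithChar(str, sizeof(str), 'x') passes the real array size.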

Dynamic Size C-Strings

All the above examples use a maximum array size known at compile time. The size is either a constant, or the size can be derived by the compiler (the compiler can calculate that Hello is 6 characters long, including the null terminator).

If you need a C-String that is variable length, you must allocate the C-String on the heap:

char* AllocCString(int size)
{
    return new char[size];
}

Of course, if the C-String is allocated on the heap, it must be deleted at a later time.

char* cstring = AllocCString(10);
delete[] cstring;
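As a slightly more realistic sketch of heap allocation, here is one way to make a writable heap copy of an existing C-String. DuplicateCString is a hypothetical name of my own; the key detail is allocating strlen + 1 chars so the null-terminator fits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns a heap-allocated copy of source.
// The caller owns the result and must release it with delete[].
char* DuplicateCString(const char* source)
{
    assert(source != nullptr); // assumes a valid, null-terminated C-String
    const std::size_t size = std::strlen(source) + 1; // +1 for the null-terminator
    char* copy = new char[size];
    std::memcpy(copy, source, size); // copies the characters and the terminator
    return copy;
}
```

Usage follows the same pattern as above: char* copy = DuplicateCString("Hello"); and later delete[] copy;.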

Modifying C-Strings

Once you’ve created a C-String, modifying the contents is not very easy or intuitive. It requires some memory manipulation that is sometimes error-prone or dangerous.

As mentioned earlier, if the C-String points at a string literal in read-only memory, you cannot modify it. Doing so is undefined behavior and will likely crash the program. To guard against this, always store these types of C-Strings as const char* (the compiler should complain if you forget).

But even if a C-String is writable, modifying it isn’t immediately clear:

char array[256] = "Hello";
array = "Goodbye"; // Doesn't work!
array[0] = "Goodbye"; // Also doesn't work!

// Works, but tedious
array[0] = 'G';
array[1] = 'o';
array[2] = 'o';
array[3] = 'd';

If you want to modify just a single element of the string, use the array accessors as shown above. But if you want to do more complex manipulation, dedicated functions are preferred.

The standard library provides many different functions to change a C-String’s contents. It can be unclear which option to choose.

strcpy

The most basic option is strcpy, which copies the contents of one C-String to another:

char str[] = "Hello";
strcpy(str, "Ah"); // copies "Ah\0" (the source) into str (the dest)

Note that this simply overwrites existing characters and leaves any extra characters alone. After the above operation, the array contains Ah\0lo\0.

strcpy is a bit dangerous because you might accidentally specify a source string that is longer than the destination buffer size. This can lead to dangerous memory corruption that is hard to detect.

char str[] = "Hello"; // the char array is 6 elements long
strcpy(str, "Bad Idea"); // copy 9 chars into a 6-element array - not good!

This usually produces no error or exception (though some compilers can warn about obvious cases), and it may appear to work fine. However, we just modified memory outside the bounds of the C-String array, possibly corrupting unrelated memory. The program may crash or malfunction later, and it’ll be hard to determine that this was the cause.

strncpy

To avoid that problem, you can instead use strncpy:

char str[] = "Hello";
strncpy(str, "Wow", 6); // memory corruption is not possible

The whole point of strncpy is that you specify the max number of chars to write to the destination. In most cases, this should just be the size of the destination array, which can be derived using sizeof:

char str[] = "Hello";
strncpy(str, "Wow", sizeof(str)); // memory corruption is not possible

However, strncpy has one fatal flaw. It will avoid writing too much data to the destination buffer, BUT it will not add the null terminator if the source string (not counting its own null terminator) is at least as long as the size specified:

char str[] = "Hello";
strncpy(str, "Bad Idea", sizeof(str)); // no null terminator!

This leaves the contents of str as Bad Id with no null terminator. As a result, it isn’t a valid C-String, and other operations that depend on the null-terminator will probably fail or crash or behave oddly. Dangerous stuff.

So, neither strcpy nor strncpy is ideal. Both are only safe if you KNOW that the source C-String, including its null terminator, fits in the destination buffer. Alternatively, you need to add checks for string length and manually add a null-terminator “just in case.”
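For reference, the "manually add a null-terminator" approach might look something like the sketch below. CopyCString is a hypothetical wrapper, and returning a bool to report truncation is just one possible design choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical wrapper around strncpy that always leaves dest null-terminated.
// Returns true if the whole source fit, false if it had to be truncated.
bool CopyCString(char* dest, std::size_t destSize, const char* source)
{
    assert(dest != nullptr && source != nullptr && destSize > 0);
    std::strncpy(dest, source, destSize - 1); // leave room for the terminator
    dest[destSize - 1] = '\0';                // terminate even if source was too long
    return std::strlen(source) < destSize;
}
```

With char str[] = "Hello", calling CopyCString(str, sizeof(str), "Bad Idea") leaves str holding Bad I\0 and returns false, so the result is truncated but still a valid C-String.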

But fortunately, there are additional functions that fix these problems AND provide a ton of useful formatting options.

Formatting C-Strings

Sometimes, you want to change the contents of a C-String, but the new contents contain some variable elements. For example, you might want the user to input their name and age and then save a C-String that contains My name is X and I'm Y years old.

Fortunately, the standard library provides sprintf and snprintf to enable this sort of C-String formatting.

sprintf works by taking a special format string, which is a string literal with the dynamic parts replaced by special symbols indicating the type of data to substitute in later:

const char* name = "Clark";
int age = 32;

char str[256];
sprintf(str, "My name is %s and I'm %i years old.", name, age);
// str contains "My name is Clark and I'm 32 years old.\0"

In the format string, anything starting with % is meant to be replaced with a provided variable. The letter after the % indicates the type of variable.

sprintf has the same problem as strcpy - it’s possible for the destination to be too small, causing you to overwrite memory outside the C-String. And like strcpy, this is resolved by using a separate version of the function that takes in the size of the destination buffer.

snprintf does just that - it formats a destination C-String, but also takes in the size of the destination to avoid overflow:

char str[256];
snprintf(str, sizeof(str), "My name is %s and I'm %i years old.", name, age);
// str contains "My name is Clark and I'm 32 years old.\0"

snprintf is probably the safest option for modifying the contents of a C-String. Not only does it avoid writing outside of the destination buffer, it also always appends a null terminator (as long as the size you pass is greater than zero), meaning that the resulting C-String is always valid. So, regardless of whether the destination buffer is big enough, at least memory won’t be corrupted.

char str[5]; // way too small buffer
snprintf(str, sizeof(str), "My name is %s and I'm %i years old.", name, age);
// str contains "My n\0" - not ideal, but also not dangerous!
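If you need to know whether the output was cut short, snprintf's return value can tell you: it returns the number of characters the full text would have needed (not counting the null terminator), or a negative value on error. Here is a minimal sketch using that; FormatIntroduction is a hypothetical helper built around the format string from the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: formats the introduction text into dest.
// Returns true only if the complete text fit into the buffer.
bool FormatIntroduction(char* dest, std::size_t destSize, const char* name, int age)
{
    assert(dest != nullptr && name != nullptr && destSize > 0);
    const int needed = std::snprintf(dest, destSize,
        "My name is %s and I'm %i years old.", name, age);
    // needed < 0: formatting error. needed >= destSize: output was truncated.
    return needed >= 0 && static_cast<std::size_t>(needed) < destSize;
}
```

Called with a 5-element buffer this returns false (the buffer holds My n\0); with a 256-element buffer it returns true.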

snprintf can be used as an alternative to strncpy if you use a very simple format string. For example:

char str[256] = "Hello";
snprintf(str, sizeof(str), "%s", "Goodbye");
// str contains "Goodbye\0"

One other cool use for snprintf is determining the amount of memory you need for a dynamic char array. If you pass in nullptr as the destination buffer and 0 as the size, the function will return the number of characters needed for the formatted text (not counting the null terminator). This can then be used to allocate a buffer.

int size = snprintf(nullptr, 0, "Hello %s", "Clark"); // returns 11 (doesn't include null-terminator)
char* str = new char[size + 1]; // need +1 for null terminator
snprintf(str, size + 1, "Hello %s", "Clark"); // actually copy formatted string into str

So, to summarize:

  • snprintf is the safest and most feature-rich way to modify the contents of a C-String. Prefer it over strcpy, strncpy, or sprintf.
  • Because they don’t do any formatting, strcpy or strncpy may be faster if speed is of utmost importance. Just use with caution.
  • If you plan to use these functions frequently, consider using wrappers to hide the boilerplate and gotchas in one place.

Concatenating C-Strings

Unfortunately, you can’t use the + operator to concatenate C-Strings. You can use strcat to concatenate one C-String onto another:

char str[256] = "Hello";
strcat(str, "Goodbye"); // str now contains "HelloGoodbye"

As you might guess, this function has the same buffer overflow issues as strcpy or sprintf. If the C-String is not big enough to hold the original contents plus the appended text, you’ll corrupt your memory.

There is a strncat variant to help alleviate this, but it annoyingly takes the max number of characters to append, rather than the size of the destination buffer:

char str[256] = "Hello";
strncat(str, "Goodbye", sizeof(str) - 6); // Append max of (256 - 6) chars.

In the above example, you can’t simply pass sizeof(str) because that gives 256 and the available space for concatenation is really only 250 due to the existing Hello text and space needed by the null terminator.

You might think we can again turn to snprintf to get a clever solution:

char str[256] = "Hello";
snprintf(str, sizeof(str), "%s%s", str, "Goodbye");

This might appear to work, but passing the destination buffer as a source argument is undefined behavior: snprintf is not required to handle copying between overlapping objects. It seemed to work when I tested it on my compiler, but I would not risk it!

Ultimately, concatenation is surprisingly problematic. It’s important to be absolutely sure you have enough space available at the end of the destination buffer to hold the appended text. Consider creating a wrapper function to hide complexity and gotchas.
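As a rough idea of what such a wrapper could look like, the sketch below computes the remaining space from the current length rather than hardcoding it. AppendCString is a hypothetical name, and it assumes dest already contains a valid C-String.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical wrapper around strncat that takes the full size of dest.
// Returns true if all of source was appended, false if it was truncated.
bool AppendCString(char* dest, std::size_t destSize, const char* source)
{
    assert(dest != nullptr && source != nullptr && destSize > 0);
    const std::size_t used = std::strlen(dest); // assumes dest is already null-terminated
    assert(used < destSize);
    const std::size_t available = destSize - used - 1; // space left, minus the terminator
    std::strncat(dest, source, available); // strncat always writes a terminator
    return std::strlen(source) <= available;
}
```

With char str[256] = "Hello", AppendCString(str, sizeof(str), "Goodbye") produces HelloGoodbye and returns true. With an 8-element buffer the result is HelloGo and the function returns false.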

C-String Length

Taking the length of a C-String is one of the simplest operations, using strlen:

char str[256] = "Hello";
size_t length = strlen(str); // returns 5 (null terminator not included in count)

Note that strlen does not include the null-terminator in the count. Though “Hello” would take up 6 chars with the null-terminator, strlen only returns 5.

Taking the length of a C-String is fairly inefficient. The length is not stored, so it must be computed each time the function is called. This is an O(n) operation, since the entire array must be iterated until a null terminator is discovered. To see how this can be problematic in practice, check out this fun case study highlighting how this oversight tanked loading times in GTA V for years.
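One way to sidestep repeated length scans is to walk the C-String once and stop at the null terminator, instead of calling strlen in a loop condition. The sketch below does that for a simple task; CountChar is a hypothetical function used only to illustrate the pattern.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts how many times target appears in str.
// It makes a single pass and never calls strlen, so the cost stays O(n).
std::size_t CountChar(const char* str, char target)
{
    assert(str != nullptr); // assumes a valid, null-terminated C-String
    std::size_t count = 0;
    for (const char* p = str; *p != '\0'; ++p)
    {
        if (*p == target)
        {
            ++count;
        }
    }
    return count;
}
```

Writing for(size_t i = 0; i < strlen(str); ++i) instead may rescan the whole string on every iteration, which is where the accidental O(n²) behavior comes from.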

If you just need to know whether the string is empty or not, it can be a lot more efficient to just check whether the first element is a null terminator or not!

bool CStringEmpty(const char* str)
{
    // Assuming we are passed a valid C-String.
    // If first char is the null terminator, the string is empty.
    return str[0] == '\0';
}

Comparing C-Strings

You can’t use the == operator to compare the contents of two C-Strings. Instead, you must use strcmp:

if(strcmp(str, "Hello") == 0)
{
    // str is equal to "Hello" 
}
else
{
    // str is not equal to "Hello"
}

strcmp will return zero if the two C-Strings are equal.

To perform an equality check, it does what you’d probably expect: it iterates both strings, comparing the char at each position. If they aren’t the same char, or a null terminator is found in one string but not the other, then they aren’t equal.

You might think you can be clever and do a length check to early out - after all, two strings can’t be equal if they’re different lengths. However, this is actually less efficient because both length check and comparison must iterate the elements of the C-Strings. At least with strcmp it can early out when the values aren’t equal.

If you want to compare only the start of the string (perhaps to check for a prefix), you can use strncmp, which compares at most the first n characters of the strings.
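A prefix check built on strncmp could be sketched like this. HasPrefix is a hypothetical helper, and using strlen(prefix) as the comparison count is the one design decision worth noting.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: returns true if str begins with prefix.
bool HasPrefix(const char* str, const char* prefix)
{
    assert(str != nullptr && prefix != nullptr);
    // Compare only as many chars as the prefix contains. If str is shorter
    // than prefix, its null terminator mismatches and the result is false.
    return std::strncmp(str, prefix, std::strlen(prefix)) == 0;
}
```

Note that an empty prefix matches any string, since comparing zero characters always reports equality.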

One final note: case-insensitive comparisons are unfortunately not part of the standard library. You can apply toupper or tolower to each character as you compare to build one yourself. I would recommend creating a wrapper/utility function for this.
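Such a utility might look roughly like the following. EqualsIgnoreCase is a hypothetical name, and this sketch only handles single-byte characters in the current C locale, so treat it as a starting point rather than a complete solution.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical helper: case-insensitive equality check for two C-Strings.
bool EqualsIgnoreCase(const char* a, const char* b)
{
    assert(a != nullptr && b != nullptr);
    while (*a != '\0' && *b != '\0')
    {
        // Cast to unsigned char first: passing a negative char value
        // to tolower is undefined behavior.
        const int lowerA = std::tolower(static_cast<unsigned char>(*a));
        const int lowerB = std::tolower(static_cast<unsigned char>(*b));
        if (lowerA != lowerB)
        {
            return false;
        }
        ++a;
        ++b;
    }
    // Equal only if both strings ended at the same position.
    return *a == *b;
}
```

Like strcmp, this stops at the first mismatch, so it does not need a separate length check up front.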

Conclusion

Even if you try to avoid C-Strings, they are bound to pop up from time to time in just about any application. Even std::string keeps its characters in a null-terminated char buffer (that is what c_str() hands back), so you’re using them whether you like it or not!

Determining the correct functions to use for various C-String operations can be an adventure. There are many non-standard and non-portable functions out there, and there are plenty of Stack Overflow answers recommending that you use this or that helper function that only works on Windows or isn’t part of the standard. Be careful, and always check before using! In my opinion, stick to the core set in the standard, plus your own wrappers, and you’ll do fine.

By the way, notice how many times I mentioned wrapper functions in this post? I can’t stress enough how valuable wrapping this complexity can be:

  1. To use many of these functions safely, you need null checks, length checks, etc. Instead of copy/pasting boilerplate a million times, put it in a single spot.

  2. Does every programmer on your team know the ins and outs of strcpy, strncpy, sprintf, snprintf, etc? Which one will they choose when they need to do string manipulation? To maintain consistency in your codebase, create wrappers and require everyone to use them.

  3. The standard library doesn’t provide everything (case-insensitive searches are a good example). Along with basic wrappers, you can also build your own additional C-String functions using a consistent singular API.
