FAQs in section [26]:
- [26.1] Can sizeof(char) be 2 on some machines? For example, what about double-byte characters?
- [26.2] What are the units of sizeof?
- [26.3] Whoa, but what about machines or compilers that support multibyte characters. Are you saying that a "character" and a char might be different?!?
- [26.4] But, but, but what about machines where a char has more than 8 bits? Surely you're not saying a C++ byte might have more than 8 bits, are you?!?
- [26.5] Okay, I could imagine a machine with 9-bit bytes. But surely not 16-bit bytes or 32-bit bytes, right?
- [26.6] I'm sooooo confused. Would you please go over the rules about bytes, chars, and characters one more time?
- [26.7] What is a "POD type"?
- [26.8] When initializing non-static data members of built-in / intrinsic / primitive types, should I use the "initialization list" or assignment?
- [26.9] When initializing static data members of built-in / intrinsic / primitive types, should I worry about the "static initialization order fiasco"?
- [26.10] Can I define an operator overload that works with built-in / intrinsic / primitive types?
- [26.11] When I delete an array of some built-in / intrinsic / primitive type, why can't I just say delete a instead of delete[] a?
- [26.12] How can I tell if an integer is a power of two without looping?
[26.1] Can sizeof(char) be 2 on some machines? For example, what about double-byte characters?
No, sizeof(char) is always 1. Always. It is never 2. Never, never, never.
Even if you think of a "character" as a multi-byte thingy, char is not.
sizeof(char) is always exactly 1. No exceptions, ever.
Look, I know this is going to hurt your head, so please, please just
read the next few FAQs in sequence and hopefully the pain will go away by
sometime next week.
[26.2] What are the units of sizeof?
Bytes.
For example, if sizeof(Fred) is 8, the distance between two Fred objects
in an array of Freds will be exactly 8 bytes.
As another example, this means sizeof(char) is one
byte. That's right: one byte. One, one, one, exactly one byte,
always one byte. Never two bytes. No exceptions.
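If you'd rather see it than take my word for it, here is a minimal sketch. The members of Fred and the helper name bytesBetweenNeighbors are made up for illustration; only the name Fred comes from the text above. It measures the distance between two neighboring array elements using char pointers, which count in bytes.

```cpp
#include <cstddef>

// Hypothetical example type: the members are invented for this sketch.
struct Fred {
    int  id;
    char tag;
};

// Distance, in bytes, between element 0 and element 1 of an array of Freds.
inline std::size_t bytesBetweenNeighbors()
{
    Fred a[2] = {};
    const char* first  = reinterpret_cast<const char*>(&a[0]);
    const char* second = reinterpret_cast<const char*>(&a[1]);
    return static_cast<std::size_t>(second - first);
}

// Checked at compile time (static_assert needs C++11 or later).
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
```

Whatever value sizeof(Fred) has on your compiler (padding may make it larger than you expect), bytesBetweenNeighbors() should return that same value.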
[26.3] Whoa, but what about machines or compilers that support multibyte characters. Are you saying that a "character" and a char might be different?!?
Yes, that's right: the thing commonly referred to as a "character" might be
different from the thing C++ calls a char.
I'm really sorry if that hurts, but believe me, it's better to get all the
pain over with at once. Take a deep breath and repeat after me: "character
and char might be different." There, doesn't that feel better? No? Well,
keep reading: it gets worse.
[26.4] But, but, but what about machines where a char has more than 8 bits? Surely you're not saying a C++ byte might have more than 8 bits, are you?!?
Yep, that's right: a C++ byte might have more than 8 bits.
The C++ language guarantees a byte must always have at least 8 bits.
But there are implementations of C++ that have more than 8 bits per byte.
[26.5] Okay, I could imagine a machine with 9-bit bytes. But surely not 16-bit bytes or 32-bit bytes, right?
Wrong.
I have heard of one implementation of C++ that has 64-bit "bytes." You read
that right: a byte on that implementation has 64 bits. 64 bits per byte. 64.
As in 8 times 8.
And yes, you're right, combining with the above would
mean that a char on that implementation would have 64 bits.
[26.6] I'm sooooo confused. Would you please go over the rules about bytes, chars, and characters one more time?
Here are the rules:
- The C++ language gives the programmer the impression that memory is
laid out as a sequence of something C++ calls "bytes."
- Each of these things that the C++ language calls a byte has at
least 8 bits, but might have more than 8 bits.
- The C++ language guarantees that a char* (a pointer to char) can
address individual bytes.
- The C++ language guarantees there are no bits between two
bytes. This means every bit in memory is part of a byte. If you grind your
way through memory via a char*, you will be able to see every
bit.
- The C++ language guarantees there are no bits that are part of two
distinct bytes. This means a change to one byte will never cause a change
to a different byte.
- The C++ language gives you a way to find out how many bits are in a
byte in your particular implementation: include the header <climits>,
then the actual number of bits per byte will be given by the CHAR_BIT
macro.
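The rules above can be sketched in a few lines of code. The helper names bitsIn and countSetBits are hypothetical; CHAR_BIT and <climits> are the real thing. The second helper grinds through an object one byte at a time via an unsigned char*, which is one way to "see every bit."

```cpp
#include <climits>
#include <cstddef>

// Number of bits occupied by an object of type T on this implementation.
template <typename T>
inline std::size_t bitsIn()
{
    return sizeof(T) * CHAR_BIT;
}

// Counts the bits set to 1 in an object by walking it one byte at a time
// through an unsigned char*, which can address every byte of the object.
template <typename T>
inline std::size_t countSetBits(const T& object)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&object);
    std::size_t count = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        for (unsigned char byte = p[i]; byte != 0; byte >>= 1)
            count += byte & 1u;
    return count;
}
```

On most desktop machines bitsIn<char>() will be 8, but the only thing the language promises is that it equals CHAR_BIT and is at least 8.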
Let's work an example to illustrate these rules. The PDP-10 has 36-bit words
with no hardware facility to address anything within one of those words. That
means a pointer can point only at things on a 36-bit boundary: it is not
possible for a pointer to point 8 bits to the right of where some other
pointer points.
One way to abide by all the above rules is for a PDP-10 C++ compiler to define
a "byte" as 36 bits. Another valid approach would be to define a "byte" as 9
bits, and simulate a char* by two words of memory: the first could point to
the 36-bit word, the second could be a bit-offset within that word. In that
case, the C++ compiler would need to add extra instructions when compiling
code using char* pointers. For example, the code generated for *p =
'x' might read the word into a register, then use bit-masks and bit-shifts
to change the appropriate 9-bit byte within that word. An int* could
still be implemented as a single hardware pointer, since C++ allows
sizeof(char*) != sizeof(int*).
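To make the "two words of memory" idea concrete, here is a rough simulation that runs on an ordinary machine. Everything in it is an assumption made for illustration: each 36-bit word is stored in the low 36 bits of a std::uint64_t, the names Word36, BytePtr, load, store, and next are hypothetical, and the bit offset is counted from the low end of the word (a real compiler could just as well count from the other end).

```cpp
#include <cstdint>

// Simulation only: each 36-bit word lives in the low 36 bits of a uint64_t.
typedef std::uint64_t Word36;

// A simulated char*: a word address plus a bit offset (0, 9, 18, or 27).
struct BytePtr {
    Word36*  word;
    unsigned bitOffset;
};

// Equivalent of "*p": read the word, then shift and mask out one 9-bit byte.
inline unsigned load(BytePtr p)
{
    return static_cast<unsigned>((*p.word >> p.bitOffset) & 0x1FFu);
}

// Equivalent of "*p = value": clear the target 9 bits, then merge the new value.
inline void store(BytePtr p, unsigned value)
{
    const Word36 mask = static_cast<Word36>(0x1FFu) << p.bitOffset;
    *p.word = (*p.word & ~mask) | ((static_cast<Word36>(value) << p.bitOffset) & mask);
}

// Equivalent of "++p": step to the next byte, moving on to the next word
// after the fourth byte of the current one.
inline BytePtr next(BytePtr p)
{
    p.bitOffset += 9;
    if (p.bitOffset == 36) {
        p.bitOffset = 0;
        ++p.word;
    }
    return p;
}
```

Notice that store() touches only the 9 bits it owns, so changing one byte never changes its neighbor, and four calls to next() land exactly on the following word, so no bits are skipped.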
Using the same logic, it would also be possible to define a PDP-10 C++ "byte"
as 12 bits or 18 bits. However, the above technique wouldn't allow us to
define a PDP-10 C++ "byte" as 8 bits: since 4*8 is 32, each 36-bit word would
hold four bytes with 4 bits left over, so a char* walking through memory would
skip 4 bits after every fourth byte. A more complicated approach could be used
to reach those 4 bits, e.g., by packing nine bytes (of 8 bits each) into two
adjacent 36-bit words. The important point here is that memcpy() has to be
able to see every bit of memory: there can't be any bits between two adjacent
bytes.
Note: one of the popular non-C/C++ approaches on the PDP-10 was to pack 5
bytes (of 7 bits each) into each 36-bit word. However, this won't work in C or
C++ since 5*7 = 35, meaning using char*s to walk through memory would "skip"
a bit every fifth byte (and also because C++ requires bytes to have at least 8
bits).
[26.7] What is a "POD type"?
A type that consists of nothing but Plain Old
Data.
A POD type is a C++ type that has an equivalent in C, and that uses the same
rules as C uses for initialization, copying, layout, and addressing.
As an example, the C declaration struct Fred x; does not initialize the
members of the Fred variable x. To make this same behavior happen in C++,
Fred would need to not have any constructors. Similarly, to make the
C++ version of copying the same as the C version, the C++ Fred must not have
overloaded the assignment operator. To make sure the other rules match, the
C++ version must not have virtual functions, base classes, non-static members
that are private or protected, or a destructor. It can, however, have
static data members, static member functions, and non-static non-virtual
member functions.
The actual definition of a POD type is recursive and gets a little gnarly.
Here's a slightly simplified definition of POD: a POD type's
non-static data members must be public and can be of any of these types:
bool, any numeric type including the various char variants, any
enumeration type, any data-pointer type (that is, any type convertible to
void*), any pointer-to-function type, or any POD type, including arrays of
any of these. Note: data-pointers and pointers-to-function are okay, but
pointers-to-member are not. Also note that
references are not allowed. In addition, a POD type can't have constructors,
virtual functions, base classes, or an overloaded assignment operator.
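Here is a small sketch of the distinction. The struct names and the LooksLikePod helper are hypothetical. The checks use std::is_trivial and std::is_standard_layout from <type_traits>, which arrived in C++11 and are therefore newer than the simplified definition above; their rules differ from it in some corner cases, so treat the helper as an approximation rather than as the definition.

```cpp
#include <cstring>
#include <type_traits>

// Fits the description above: public data members of built-in types only,
// no constructors, no virtual functions, no base classes.
struct PodFred {
    int         count;
    double      ratio;
    const char* label;
};

// Not a POD: it has a user-declared constructor and a virtual destructor.
struct NonPodFred {
    NonPodFred() : count(0) { }
    virtual ~NonPodFred() { }
    int count;
};

// Approximation of "is a POD" built from C++11 type traits.
template <typename T>
struct LooksLikePod {
    static const bool value =
        std::is_trivial<T>::value && std::is_standard_layout<T>::value;
};

// Because PodFred is a POD, copying its bytes copies its value, as in C.
inline PodFred copyBytes(const PodFred& source)
{
    PodFred target;
    std::memcpy(&target, &source, sizeof target);
    return target;
}
```

LooksLikePod<PodFred>::value should be true and LooksLikePod<NonPodFred>::value should be false. Doing the memcpy() trick on NonPodFred would not be safe.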
[26.8] When initializing non-static data members of built-in / intrinsic / primitive types, should I use the "initialization list" or assignment?
For symmetry, it is usually best to initialize all non-static data members in
the constructor's "initialization list," even those that are of a built-in /
intrinsic / primitive type. The FAQ shows you why and
how.
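A minimal sketch of what that looks like, using a made-up class called Counter whose members are both plain ints:

```cpp
// Hypothetical class, for illustration only.
class Counter {
public:
    // Both members are initialized in the initialization list, listed in
    // the same order in which they are declared in the class.
    Counter(int start, int step)
        : value_(start)
        , step_(step)
    { }

    // Advances the counter by one step and returns the new value.
    int next() { value_ += step_; return value_; }

private:
    int value_;
    int step_;
};
```

For ints, assignment inside the constructor body would produce the same result; the point of the initialization list here is that every member, built-in or not, is handled the same way.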
[26.9] When initializing static data members of built-in / intrinsic / primitive types, should I worry about the "static initialization order fiasco"?
Yes, if you initialize your built-in / intrinsic / primitive variable by an
expression that the compiler doesn't evaluate solely at compile-time. The FAQ
provides several solutions for
this (subtle!) problem.
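As a rough illustration, here is one widely used way to sidestep the problem: put the value in a function-local static instead of a static data member, so it is initialized the first time it is needed. The class Circle and its members are hypothetical, and this is a sketch of one technique, not necessarily the one you should pick in every case.

```cpp
#include <cmath>

// Hypothetical class whose static data is of a built-in type but is
// initialized by an expression evaluated at run time.
class Circle {
public:
    // Instead of a static data member such as "static const double twoPi_;",
    // the value lives in a function-local static, which is initialized the
    // first time control passes through its declaration.
    static double twoPi()
    {
        static const double value = 8.0 * std::atan(1.0);
        return value;
    }

    static double circumference(double radius) { return twoPi() * radius; }
};
```

Code that runs during static initialization in some other source file can call Circle::twoPi() and still get a properly initialized value, which is not guaranteed for a static data member initialized by the same expression.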
[26.10] Can I define an operator overload that works with built-in / intrinsic / primitive types?
No, the C++ language requires that your operator overloads take at least one
operand of a "class type" or enumeration type. The C++ language will not let you define an
operator all of whose operands / parameters are of primitive types.
For example, you can't define an
operator== that takes two char*s and uses string comparison. That's
good news because if s1 and s2 are of type char*, the
expression s1 == s2 already has a well defined meaning: it compares
the two pointers, not the two strings pointed to by those pointers.
You shouldn't use pointers anyway. Use
std::string instead of char*.
If C++ let you redefine the meaning of operators on built-in types, you
wouldn't ever know what 1 + 1 is: it would depend on which headers got
included and whether one of those headers redefined addition to mean, for
example, subtraction.
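Here is a short sketch of both halves of that rule. The enumeration Color and the helper names are hypothetical. The overload that takes an enumeration is legal; the commented-out one, whose operands are all built-in types, would be rejected by the compiler.

```cpp
#include <string>

// Hypothetical enumeration, for illustration only.
enum Color { Red, Green, Blue };

// Allowed: one operand is of an enumeration type.
// (This sketch assumes n is not negative.)
inline Color operator+(Color c, int n)
{
    return static_cast<Color>((static_cast<int>(c) + n) % 3);
}

// Not allowed (would not compile): every operand is a built-in type.
// bool operator==(const char* a, const char* b);

// On char*, == compares the two addresses...
inline bool samePointer(const char* s1, const char* s2) { return s1 == s2; }

// ...whereas std::string's == compares the characters.
inline bool sameText(const char* s1, const char* s2)
{
    return std::string(s1) == std::string(s2);
}
```

Given two separate arrays that both hold "hello", samePointer() reports false and sameText() reports true.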
[26.11] When I delete an array of some built-in / intrinsic / primitive type, why can't I just say delete a instead of delete[] a?
Because you can't.
Look, please don't write me an email asking me why C++ is what it is.
It just is. If you really want a rationale, buy Bjarne Stroustrup's excellent
book, "Design and Evolution of C++" (Addison-Wesley publishers). But if your
real goal is to write some code, don't waste too much time figuring out
why C++ has these rules, and instead just abide by its rules.
So here's the rule: if a points to an array of thingies that was
allocated via new T[n], then you must,
must, must delete it via delete[] a. Even if the
elements in the array are built-in types. Even if they're of type char or
int or void*. Even if you don't understand why.
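In code, the rule looks like this. The function sumOfFirst is a made-up example; the only part that matters is that new[] is paired with delete[].

```cpp
#include <cstddef>

// Allocates n ints with new[], sums 1..n, and releases them with delete[].
inline long sumOfFirst(std::size_t n)
{
    int* a = new int[n];    // array form of new...
    for (std::size_t i = 0; i < n; ++i)
        a[i] = static_cast<int>(i) + 1;

    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += a[i];

    delete[] a;             // ...so array form of delete ("delete a" is undefined behavior)
    return total;
}
```

If you'd rather not think about this at all, a container such as std::vector<int> does the pairing for you.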
[26.12] How can I tell if an integer is a power of two without looping?
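// A positive power of two has exactly one bit set, so clearing its lowest
// set bit with i & (i - 1) leaves zero.
inline bool isPowerOf2(int i)
{
    return i > 0 && (i & (i - 1)) == 0;
}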
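Why does this work? Subtracting 1 flips the lowest set bit to 0 and every bit below it to 1, so i & (i - 1) is i with its lowest set bit cleared. For 8 (binary 1000), 8 - 1 is 7 (binary 0111) and the AND is 0. For 6 (binary 110), 6 - 1 is 5 (binary 101) and the AND is 4 (binary 100), which is not 0. The sketch below spells out those two steps for unsigned values; the helper names are hypothetical.

```cpp
// Clears the lowest set bit of i: 12 (binary 1100) becomes 8 (binary 1000).
inline unsigned clearLowestSetBit(unsigned i)
{
    return i & (i - 1);
}

// A power of two has exactly one bit set, so clearing it leaves zero.
// Zero itself has no bits set and is excluded explicitly.
inline bool hasExactlyOneBitSet(unsigned i)
{
    return i != 0 && clearLowestSetBit(i) == 0;
}
```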
Revised Mar 1, 2006