Archive for the 'Math' Category

Integer overflows

Wednesday, November 1st, 2006

"What is a string library? It's a way to pretend that computers can manipulate strings just as easily as they can manipulate numbers."

-- Joel Spolsky, The Law of Leaky Abstractions.

Most C++ code uses the integer mod 232 (or 264) type C++ calls "int" as if they were integers. This is great for performance -- many operations on int32 are a single CPU instruction -- but dangerous for security and correctness when the numbers can be large. This can cause security holes in at least two ways.

First, code might use int32 arithmetic to decide how much memory to allocate. Consider an image decoder that allocates width * height * 4 bytes to store RGBA pixels and then decodes the image data into the structure. But since width and height are unsigned ints, it doesn't really allocate width*height*4 bytes; it allocates width*height*4 mod 232 bytes. If the integer used to decide how much memory to allocate has overflowed in such a way that it comes out as a small integer, the code is likely to overflow the buffer as it writes the decoded image into the structure.

Second, code might use int32 arithmetic to decide when to deallocate an object. In code that uses reference counting, an extra call to "release" can obviously lead to a dangling pointer situation. But thanks to integer overflows, 232 unbalanced calls to "addref" followed by a normal "release" can have the same effect. (Luckily, you can't cause this situation by merely making 232 objects point to a specific object, because you'd run out of memory first. So this could be addressed by auditing for addref-without-release leak bugs rather than modifying the addref function to make it safer.)

Explicit checks

Some code in Gecko has explicit checks to prevent overflows. (This must be done carefully -- "width*height*4 > 232" doesn't mean anything to a C++ compiler!) If you remember to think "integer mod 232 or 264" every time you see "int", you may be able to avoid introducing new security holes due to integer overflow when you write code.

Michael Howard at Microsoft advocates this approach, at least for C code that is used near things like allocation sizes and reference counts, and provides functions to do checked arithmetic operations. These functions return a boolean indicating whether the arithmetic operation succeeded. This leads to code where it is hard to see what calculation is being done but easy to see that each step of the calculation is done safely.

Safe integer classes

Another strategy is to avoid using "int", at least in code used to compute allocation sizes, and instead use a "safe" integer class. A safe class might do correct arithmetic on large numbers, allocating extra memory when needed, but perhaps that is overkill for keeping allocations safe. A proponent of this approach might say "int is the new char *", referring to how string buffer overflows have been nearly eliminated through the use of string classes, and make fun of Joel Spolsky for the quote at the beginning of this post.

David LeBlanc, also at Microsoft, advocates a slightly different approach: using a class that treats overflow as an error and can throw exceptions. This keeps arithmetic formulas readable at the expense of having to design the function to handle exceptions correctly.

Static analysis can be used to scan for calls to malloc that use "int" and need to be converted to using SafeInt.

Will Gecko soon have a multitude of integer classes, each with different performance characteristics, signedness, and overflow behavior? Probably not, because numbers used to decide how much memory to allocate are almost always unsigned integers where overflows can be treated as errors. But I wouldn't be surprised to see different parts of the code use different strategies, with C code using the "explicit checks with helpers" strategy and XPCOM C++ code using another strategy.

Other languages

Many languages share C++'s behavior of exposing "integer mod 232" types as "int", but JavaScript and Python are two major exceptions. JavaScript has a hybrid "number" type that is sometimes stored as an integer and sometimes stored as a floating-point number. Overflowing integer arithmetic turns your numbers into floating-point numbers, while treating a floating-point number as a bit field tries to turn it back into an integer by computing its value mod 232. While JavaScript's behavior is more useful in most situations than wrapping around, you wouldn't want to use it for memory allocation.

Python instead takes advantage of its dynamic type system to make integers safe. Overflowed integers are replaced with a "long integer" type that is slower to operate on but has safe, correct behavior for integers of any size (until you run out of memory).

Fifty million Firefox downloads

Friday, April 29th, 2005

Firefox 1.0.x reached approximately 50,000,000 downloads today. That's an average of almost 300,000 downloads per day since Firefox was released on November 9, 2004. Spread Firefox is celebrating with a photo contest, which already has several winners. It's great to be involved in such a successful project.

I don't have anything against the convention of celebrating numbers that have a lot of zeros at the end when written in base 10, but I'd like to make fun of it by pointing out that Firefox is likely to reach 0x3000000 (50,331,648) downloads within a week.

How to find an octahedron

Saturday, May 1st, 2004

I was working on my last Abstract Algebra homework problem: "In how many ways can you color the faces of an octahedron with n colors?" Solving the problem required figuring out all the ways you can rotate an octahedron onto itself, which I was having trouble doing in my head. I asked Selene for an octahedron, not expecting her to have one.

JesseRud: do you have an octahedron i can borrow?
AshLykos: you could go to random.org and look at the first 3 bits of a binary result
JesseRud: what does that have to do with anything?
JesseRud: oh

She had a d8 and let me borrow it after I explained why I needed a physical octahedron. I never expected having friends who role-play to come in handy.

Bad abstract algebra jokes

Thursday, April 22nd, 2004
  • A carpool is a group of people who commute with each other. Therefore, a carpool is an abelian group.
  • If 1=0 in the zero ring, why don't they call it The One Ring instead?

Yes, I came up with them.

Honest prof

Tuesday, February 24th, 2004

The Abstract Algebra homework is due next Friday evening. Jamie wouldn't say exactly when he would pick it up, but he did give us a probability distribution:

Everything is a set

Monday, December 22nd, 2003

I think I'm taking Zermelo's idea that "everything is a set" too seriously.

I saw a freeway billboard advertising a SUV or small truck, saying "Category of One". But the Pairing Axiom tell us that for any x, the set of x (written {x}) exists.

My family uses the following strategy to light the Hanukkah menorah: light the Shamash, use the Shamash to melt the wax at the bottom of the other candles and light the other candles, use one of the other candles to melt the wax at the bottom of the Shamash, put the Shamash in place. During Set Theory lectures, I always pointed out when the prof implicitly assumed that a set was nonempty in his proofs. I told my family that our strategy would not work on the 0th night of Hanukkah. (Question for any other Jews reading my blog: Is using another Hanukkah candle to melt the wax of the Smamash disallowed?)

I'm going to take Abstract Algebra I next semester at Pomona College. Groups, rings, and fields... this could be dangerous.