Real mode
Real mode used segmented addresses. Originally, there were two kinds of programs: .COM and .EXE.
COM (command) files run inside 64 KB of memory, which allowed programs written for 8080-compatible CPUs to be easily ported to the 8086. COM files don't care where their 64 KB page of RAM is located; the operating system can reserve 64 KB anywhere it wants. Programmers can use a single 16-bit address to refer to locations within the memory page, and the operating system will keep track of where that 64 KB is located.
Programs that need more than 64 KB of RAM can be compiled as EXE (executable) files. Executable files can access more than 64 KB of memory by using the segmented address. Each address was broken into two 16-bit parts: one to describe a location within a 64 KB page of memory, and the other to describe how far the beginning of that page was from the beginning of RAM. A brief explanation of memory addresses will make things a bit clearer.
Computers store data in binary because every number can be broken into digits that are either 0 or 1, as opposed to the decimal system where numbers are broken into the ten digits, 0 to 9. Binary allows computers to store data using "states" to represent 0 or 1, and those states can be read very quickly, and changed very easily. For instance, the first programmable computers used switches which the programmer could flick on or off to represent 0 or 1. Floppy drives store data magnetically, using north and south polarity as 0 and 1. CDs store data reflectively, using "lands" and "pits" to either reflect a laser beam or diffuse it to represent 0 or 1. RAM stores data electrically by having tiny capacitors that are either charged or not charged to represent 0 or 1. Modern hard drives rely on spintronics, using the direction of electron spin to represent 0 or 1. Fibre optic cable transmits data photonically, using the presence or absence of light to represent 0 or 1. Early computers used vacuum tubes to either conduct or block current to represent 0 or 1. The discovery of semiconductors led to the integrated circuit, which allows transistors to either conduct or not conduct electricity to represent 0 or 1. However it's accomplished, the result of any query is as simple as any answer can ever be: on or off, true or false, 0 or 1.
The switches on switchboard computers were replaced by transistors in integrated circuits, but however it's accomplished, the "switches" are arranged in logic gates, which allows any group of 0s or 1s to pass through a series of transistors, and boolean arithmetic determines what number will come out. Logic gates can be arranged so that they can perform addition and subtraction, perform boolean arithmetic, compare two numbers, shift the digits of a number left or right, or just about anything else. These logic gates represent instructions which, collectively, become the instruction set for any CPU architecture.
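To make the idea of gates performing addition concrete, here is a minimal sketch in Python. It is only an illustration: the function names full_adder and add_16bit are made up for this example, and a real CPU does this in hardware rather than in a loop. Each full adder uses nothing but AND, OR, and XOR, and sixteen of them chained together add two 16-bit numbers.

```python
def full_adder(a, b, carry_in):
    """Add three single bits using only XOR, AND, and OR gates."""
    partial = a ^ b                       # XOR gate
    total = partial ^ carry_in            # XOR gate
    carry_out = (a & b) | (partial & carry_in)  # two AND gates and an OR gate
    return total, carry_out


def add_16bit(x, y):
    """Ripple-carry adder: chain 16 full adders, lowest bit first."""
    result = 0
    carry = 0
    for bit in range(16):
        a = (x >> bit) & 1
        b = (y >> bit) & 1
        s, carry = full_adder(a, b, carry)
        result |= s << bit
    # The final carry is what falls off the end of a 16-bit register.
    return result, carry


if __name__ == "__main__":
    total, carry = add_16bit(0x1234, 0x0FFF)
    print(format(total, "04X"), carry)
```

Adding FFFF and 1 with this sketch gives a result of 0000 with a carry of 1, which is exactly the overflow a 16-bit register would see.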
Binary is great for computers and their ability to quickly use true or false statements to calculate 0s and 1s. It's not so great for humans, because the numbers become quite long. The largest address that a 16-bit address allows is 1111111111111111 in binary. It's awkward to translate binary into decimal because 10 isn't a power of 2. In decimal, the largest 16-bit number is 65,535. That's why programmers use hexadecimal, which is base 16. Digits go from 0 to 9, and then from A to F. A single hexadecimal digit can represent four binary digits, known as a nibble. This makes binary numbers four times shorter, and turns 1111 1111 1111 1111 into FFFF.
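The relationship between binary, decimal, and hexadecimal can be checked with a few lines of Python. This is just a sketch; the helper names nibbles and to_hex_by_nibble are invented for the example.

```python
HEX_DIGITS = "0123456789ABCDEF"


def nibbles(value, bits=16):
    """Split a number into groups of four binary digits, highest first."""
    binary = format(value, "0{}b".format(bits))
    return [binary[i:i + 4] for i in range(0, bits, 4)]


def to_hex_by_nibble(value, bits=16):
    """Translate each nibble into one hexadecimal digit."""
    return "".join(HEX_DIGITS[int(n, 2)] for n in nibbles(value, bits))


largest_16bit = 0b1111111111111111

if __name__ == "__main__":
    print(largest_16bit)                    # decimal value
    print(nibbles(largest_16bit))           # four groups of four bits
    print(to_hex_by_nibble(largest_16bit))  # one hex digit per nibble
```

Because 16 is a power of 2, each hexadecimal digit lines up with exactly one nibble, which is why the conversion can be done one group at a time.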
So, let's get back to addresses. 20 binary digits (bits) can be described in 5 hexadecimal digits, such as FFFFF. The 8080 used 16-bit addresses going from 0000 to FFFF, and programs written for it could be ported to COM files and continue to use addresses in that range. EXE files could have just used flat 20-bit addresses that went one nibble higher than the addresses that COM files used, but 20 bits don't fit in a 16-bit register, so that would have required a programming workaround that would have slowed programs down. 8086 CPUs retrieve data 16 bits at a time and store the results in 16-bit registers, so Intel decided to use two 16-bit addresses to describe where an address is within a 20-bit address space. The segment address tells me where a 64 KB page begins, counted in 16-byte steps from the beginning of RAM, and the offset address tells me how far into that page the memory location is. A segment address of C000 tells me that the page begins at C0000, and an offset address of A000 tells me that the memory location is the A000th byte from the start of that page. It's like saying, "I can't tell you how far away Sydney is from London because it's more than 9999 miles away, and I can only use 4 digits. I can tell you that London is 5945 miles from Tokyo, and Tokyo is 4869 miles from Sydney." Not exactly because it wouldn't be a straight line, but you get the idea.
So, if I have an address of C000:A000, I'm saying that the address is A000 bytes past the start of the page at C0000, which is CA000. Since we're using a total of 32 bits to describe a 20-bit address, there is significant overlap. Raising the segment by 1 moves the page up by 16 bytes (10 in hexadecimal), so I could describe the same memory location as C001:9FF0, or C100:9000, or CA00:0000. I could lower the segment and raise the offset and describe the same location as BFFF:A010, or BF00:B000, or BB00:F000. In fact, there are up to 2^12 (4096) ways that I could describe the same memory location. The computer can very quickly shift the segment by one hexadecimal digit and add the two numbers together to produce the correct address, but having to create addresses by adding two numbers together, and having multiple ways of referring to the same address, made things confusing for programmers. The first headache of the backwards compatible era was born.
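The arithmetic is easy to try out. In real mode the 8086 forms a 20-bit physical address by shifting the segment left by four bits (one hexadecimal digit) and adding the offset. The Python sketch below does the same thing; the function names physical_address and aliases are made up for this illustration.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    # Shifting left by 4 bits is the same as multiplying by 16.
    # The mask keeps the result inside the 8086's 20-bit address space.
    return ((segment << 4) + offset) & 0xFFFFF


def aliases(address):
    """List every segment:offset pair that reaches the given address.

    Pairs that only work by wrapping past the top of memory are ignored.
    """
    pairs = []
    for segment in range(0x10000):
        offset = address - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs


if __name__ == "__main__":
    print(format(physical_address(0xC000, 0xA000), "05X"))
    print(format(physical_address(0xC100, 0x9000), "05X"))
    print(len(aliases(0xCA000)))
```

Both calls to physical_address land on the same location, and counting the aliases shows how many different segment:offset pairs a programmer could use to name a single byte.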
Anyway, this was called real mode. Real mode assumes that you have no more than 1 MB of RAM, and you have to use a segmented address to describe memory locations. Operating systems that use real mode also need to be able to handle COM files that will only use an offset address, not a segment address, so the operating system needs to pick a segment address and keep track of it in order to run COM files.
CP/M was ported to the 8086, but some of the command names weren't obvious. The 8086 was too expensive to use in computers that were designed to compete with 6502 and Z80-based computers, which had created an enormous market for home computers. Intel made a cheaper version of the 8086 called the 8088, which reduced the external data bus from 16 bits to 8 bits. IBM selected it for its Personal Computer, the first "PC", where it ran at 4.77 MHz, and a little company called Microsoft supplied an operating system known as MS-DOS.
1 MB was a lot of memory at the time, but DOS only reserved 640 KB for programs, and the remaining 384 KB was reserved for the BIOS and add-in hardware such as graphics cards. The first 640 KB is known as conventional memory, and the last 384 KB is known as upper memory. It's famously claimed that Bill Gates said, "640K of RAM ought to be enough for anybody." Later versions of DOS could try to load themselves into upper memory to leave more conventional memory free for programs, but eventually 640 KB of RAM wasn't enough any more.
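The 640 KB and 384 KB split is easy to express as address ranges. The following Python sketch is an illustration only; the constant and function names are invented here, and B8000 is used as a sample address because colour text-mode video memory conventionally lives there.

```python
KB = 1024

CONVENTIONAL_END = 640 * KB   # A0000 in hexadecimal
REAL_MODE_END = 1024 * KB     # 100000 in hexadecimal, the 1 MB limit


def region(address):
    """Name the part of the first megabyte that an address falls in."""
    if address < 0 or address >= REAL_MODE_END:
        raise ValueError("address is outside the 1 MB real mode space")
    if address < CONVENTIONAL_END:
        return "conventional memory"
    return "upper memory"


if __name__ == "__main__":
    print(format(CONVENTIONAL_END, "X"))
    print((REAL_MODE_END - CONVENTIONAL_END) // KB, "KB of upper memory")
    print(region(0x9FFFF))
    print(region(0xB8000))
```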
In 1982, Intel released the 80286, which was still backwards compatible with the 8086. The success of the IBM PC ensured that every generation of Intel CPU would have to retain backwards compatibility and, to this day, a Core 2 Quad or a Phenom processor uses the same instructions and registers as the 8086, although the instruction set has been greatly expanded.
Operating at 6, 8, and 12.5 MHz, the 286 also significantly increased performance per clock and added the ability to read 24-bit memory addresses, allowing 16 MB of RAM. This, again, created a problem with backwards compatibility with software that used 20-bit segmented addresses. Real mode programs needed a new way to access data beyond the 1 MB barrier, and extended memory was born.
Programs wouldn't normally need more than 640 KB of code, but they might need extra memory to store data, such as graphics, music, and level maps. The Extended Memory Specification (XMS) allowed real mode software to access data -- but not executable code -- from the extended memory space by using a special instruction called an interrupt, which temporarily lets some other program (such as the operating system) run some code before returning control to the program. Programmers didn't need to code extended memory support themselves, because they could call the interrupt and let XMS worry about the technical details.
Other programs, like mouse, display, sound card, printer, and CD-ROM drivers could be run before starting a real mode program and could remain active. These were known as Terminate and Stay Resident programs (TSRs). Real mode programs could access the computer's hardware directly, although they didn't always need to. Programmers didn't need to know how to control a floppy drive in order to write programs that could read and write files, because the operating system already did that. Instead, the programmer would place an interrupt into the program that called the operating system's disk access routines. Programs could also access the graphics card directly to take advantage of new graphics standards like MDA, CGA, Hercules, EGA, MCGA, VGA, 8514, and a number of competing Super VGA (SVGA) standards. Standards like AdLib, Gravis UltraSound, and Sound Blaster brought music, voice, and MIDI instruments to DOS software.
Thanks to generations of backwards compatibility, as well as solutions that allowed greater and greater amounts of memory to be accessed on CPUs that supported larger addresses, real mode software evolved from programs that ran inside 64 KB of RAM, displayed monochrome text, produced a single note at a time using the PC speaker, and used a keyboard for input, to programs that could access 16 MB of RAM, displayed images at 1024×768 with 256 colors or higher, played dozens of notes and voices simultaneously, and could use a mouse or joystick for control. DOS games were sold on everything from 360 KB 5.25" floppy disks to 650 MB CD-ROMs. The base of real mode software was so great that every version of DOS supports real mode exclusively, even when alternatives arrived.
Protected Mode
Real mode programs were designed to run one at a time, with no multi-tasking. XMS only allowed data to be stored beyond the first megabyte of RAM, and could not execute code stored in that space. The 286 tried to resolve both problems by introducing a 24-bit protected mode.
To ensure compatibility, the 286 CPU would enter real mode when it was powered up, and the operating system could set the CPU to protected mode to take advantage of the 286's extra capabilities. Since real mode was the only mode prior to the 286, real mode got its name when the 286 came along.
The idea behind the protected memory model was that the operating system could reserve memory for a specific program so that no other program could access or overwrite that memory space, thus protecting programs from each other. This would allow multi-tasking -- that is, the use of multiple programs at the same time. Protected mode also enabled 24-bit addressing so that all 16 MB of RAM could be accessed by software for whatever purpose it wanted. The problem was that the 286 was still using 16-bit offsets, so only 64 KB of RAM could be accessed through any one segment. The CPU also had to be reset to return to real mode. This was never a popular solution.
In 1985, Intel introduced the first 32-bit x86 CPU, the 80386. The 386 extended the general purpose registers from 16 to 32 bits, but continued to allow the lower 16 bits to be called by software written for the 8086 through 80286, and the 16-bit registers could still be broken into 8-bit registers for software ported from the 8080. The 386 also added 32-bit memory addresses, allowing it to access 4 GB of memory. Segment offsets were also widened to 32 bits, so a single segment could span all 4 GB without the need to switch between multiple segments. This was presumed to be enough to future-proof the architecture for a very long time, and it was. As of 2007, very few PCs have more than 4 GB of RAM.
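The way one register contains the smaller ones can be sketched with bit masks. The names EAX, AX, AH, and AL are the actual x86 names for one such register and its 16-bit and 8-bit parts, although they are not mentioned above; the Python helper functions themselves are made up for this illustration.

```python
def ax(eax):
    """The lower 16 bits of the 32-bit register, as 8086 software sees it."""
    return eax & 0xFFFF


def ah(eax):
    """The high byte of the 16-bit register."""
    return (eax >> 8) & 0xFF


def al(eax):
    """The low byte of the 16-bit register."""
    return eax & 0xFF


def write_ax(eax, value):
    """Store a 16-bit value, leaving the upper 16 bits untouched."""
    return (eax & 0xFFFF0000) | (value & 0xFFFF)


if __name__ == "__main__":
    eax = 0x12345678
    print(format(ax(eax), "04X"), format(ah(eax), "02X"), format(al(eax), "02X"))
    print(format(write_ax(eax, 0xABCD), "08X"))
```

Older 16-bit code that only ever touches the lower half keeps working, because writing to it leaves the upper 16 bits of the 32-bit register alone.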
32-bit addresses could only be used in protected mode, and protected mode finally allowed significantly greater memory access than real mode. Computers could not only run multiple programs simultaneously, but they finally had enough memory to actually do it!
The 24-bit, segmented protected mode of the 286 came to be known as standard mode, and the full 32-bit memory model came to be known as protected mode. Both were protected but, in practice, only the 32-bit protected mode of 386 and newer CPUs is referred to as protected mode.
Long Mode
On 22 April 2003, AMD released the first 64-bit x86 CPU, the Opteron, followed by the desktop variant, the Athlon 64, on 23 September 2003. Intel later released 64-bit Xeon and Pentium 4 processors in 2004. Known as x86-64 processors, or just x64, these processors extended the general purpose registers to 64 bits, but the registers are still sub-divided into 32, 16, and 8-bit registers for backwards compatibility. They can also still operate in real or protected mode.
64-bit x86 CPUs can also support 40-bit memory addresses, which allows 1 terabyte of RAM. The architecture also supports expanding memory addresses to 52 bits in the future, to allow up to 4 petabytes of RAM. To access the 64-bit registers and use 40-bit memory addresses, the operating system needs to run in long mode. Long mode is a protected memory model, but differs from the memory model known as protected mode by supporting 64-bit registers and 40-bit memory addresses.
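Every memory limit mentioned so far comes from the same calculation: an address that is n bits wide can name 2^n bytes. A short Python sketch makes the pattern visible; the helper names are invented for this example, and 52 bits is the width that works out to 4 petabytes.

```python
def addressable_bytes(bits):
    """Number of distinct byte addresses an address of this width can name."""
    return 2 ** bits


def human_readable(size):
    """Express a byte count in the largest unit that divides it evenly here."""
    units = ["bytes", "KB", "MB", "GB", "TB", "PB"]
    index = 0
    while size >= 1024 and index < len(units) - 1:
        size //= 1024
        index += 1
    return "{} {}".format(size, units[index])


if __name__ == "__main__":
    for bits in (16, 20, 24, 32, 40, 52):
        print(bits, "bits:", human_readable(addressable_bytes(bits)))
```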
Running DOS games in any operating system