MASM Basics
Optional leading + or - sign.
Binary, Decimal, Hexadecimal.
Common radix characters:
h => Hexadecimal (use as much as possible)
d => Decimal (when hex makes no sense)
b => Binary (for bitwise clarity)
r => Encoded real
Examples: 30d, 6Ah, 42, 1101b
NOTE: A hexadecimal literal can't begin with a letter, or the assembler would read it as an identifier. Prefix it with a zero. => 0A2h
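The radix rules above can be sketched as a short program. This is a minimal example that assumes 32-bit MASM with the flat memory model and `ExitProcess` from kernel32 at link time; the values are only illustrative.

```asm
; Sketch: integer literals in different radixes (32-bit MASM assumed).
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.code
main PROC
    mov eax, 30d        ; decimal, explicit radix character
    mov eax, 42         ; no radix character: decimal by default
    mov eax, 6Ah        ; hexadecimal
    mov eax, 0A2h       ; hex that starts with a letter needs a leading 0
    mov al, 1101b       ; binary
    mov eax, -26        ; optional leading sign
    INVOKE ExitProcess, 0
main ENDP
END main
```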
As any programming language
Enclose a character in single or double quotes.
'a', "a" => (ASCII char == 1 byte).
Enclose strings in single or double quotes.
"Hello", 'Hello' => (Each ASCII char == 1 byte).
Embedded quotes are allowed
'Say "Abuqasem" is learning'
"This isn't a test"
NOTE: A string is not terminated automatically. End it with a null byte (the value 0, not the character '0') so that a print routine knows where to stop. => "Hello world",0 (DOS function 09h of INT 21h expects a '$' terminator instead.)
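A minimal sketch of the character and string rules above, assuming 32-bit MASM with the flat memory model. The variable names (`letter`, `greeting`, `quoted1`, `quoted2`) are hypothetical.

```asm
; Sketch: character and string constants, each string null-terminated.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
letter   BYTE 'a'                               ; one ASCII character = 1 byte
greeting BYTE "Hello world", 0                  ; 11 characters + null byte
quoted1  BYTE 'Say "Abuqasem" is learning', 0   ; double quotes inside single quotes
quoted2  BYTE "This isn't a test", 0            ; apostrophe inside double quotes

.code
main PROC
    mov al, letter              ; AL = 61h, the ASCII code of 'a'
    INVOKE ExitProcess, 0
main ENDP
END main
```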
Reserved words cannot be used as identifiers.
Instruction mnemonics, directives, type attributes, operators, predefined symbols.
Identifiers
1-247 characters, including digits.
Not case sensitive.
First character must be a letter, _, @, ?, or $ (never a digit).
Used for labels (Procedure names, variables, constants).
Instructions on how to assemble (Not at runtime).
Commands that are recognized and acted upon by the assembler.
Not part of the Intel instruction set.
Used to declare code and data areas, select the memory model, declare procedures, variables, etc.
Not case sensitive (.data,.DATA,.DAta).
Different assemblers have different directives.
The GNU assembler and the Netwide Assembler (NASM) do not use the same directives as MASM.
One important function of assembler directives is to define program sections, or segments
.data (define variables)
.code (write code)
.stack 100h
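The three segment directives above fit together roughly as follows. This is a sketch of a 32-bit MASM program skeleton; the `.386` and `.model` lines, the stack size of 4096, and the use of `ExitProcess` (resolved from kernel32 at link time) are assumptions that the notes do not state.

```asm
; Sketch: minimal program skeleton showing the three segments.
.386
.model flat, stdcall
.stack 4096                     ; reserve stack space
ExitProcess PROTO, dwExitCode:DWORD

.data                           ; variables go here
sum DWORD 0

.code                           ; instructions go here
main PROC
    mov eax, 5
    add eax, 6
    mov sum, eax                ; sum = 11
    INVOKE ExitProcess, 0
main ENDP
END main
```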
Act as place markers.
Mark the address (offset) of code and data.
Follow identifier rules
Data label (Variable names)
Must be unique.
count DWORD 100 => (not followed by a colon)
Code label
Target of jump and loop instructions.
L1: => (followed by a colon)
No operands
stc => (set carry flag)
One operand
inc eax => (register)
inc myByte => (memory)
Two operands
add ebx,ecx => (register, register)
sub myByte,25 => (memory, constant)
add eax,36 * 25 => (register, const-expr)
No Operation.
Uses 1 byte of storage.
CPU: reads it, decodes it, ignores it.
Used to align code to doubleword boundaries (addresses that are multiples of 4).
x86 processors are designed to load code and data more quickly from even doubleword addresses.
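As a rough illustration of NOP used as padding, the sketch below assumes 32-bit MASM, where `mov ax, bx` assembles to 3 bytes. The claim about the final alignment additionally assumes that `main` itself starts on a doubleword boundary.

```asm
; Sketch: one NOP pads a 3-byte instruction so the next one starts 4 bytes later.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.code
main PROC
    mov ax, bx                  ; 3 bytes in 32-bit code (16-bit operand)
    nop                         ; 1 byte of padding
    mov edx, ecx                ; begins 4 bytes after the start of main
    INVOKE ExitProcess, 0
main ENDP
END main
```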
Assembled into machine code by assembler and executed at runtime by CPU.
An instruction contains:
Label => (Optional)
Mnemonic => (Required)
Operands => (Depends on the instruction)
Comment => (Optional) begins with a ';'
BYTE, SBYTE: 8-bit unsigned & signed integers.
WORD, SWORD: 16-bit unsigned & signed integers.
DWORD, SDWORD: 32-bit unsigned & signed integers.
QWORD: 64-bit integer. => (Not signed/unsigned)
TBYTE: 80-bit integer. => (ten bytes)
REAL4, REAL8: 4-byte IEEE short real & 8-byte IEEE long real.
REAL10: 10-byte IEEE extended real.
DB: 8-bit integer.
DW: 16-bit integer.
DD: 32-bit integer or real.
DQ: 64-bit integer or real.
DT: 80-bit integer. => (ten bytes)
A data definition statement sets aside storage in memory for a variable.
May optionally assign a name (label) to the data.
Syntax ![[variable statement.png]]
Use the ? symbol for uninitialized variables.
All initializers become binary data in memory.
Each of the following defines a single byte of storage.
Value1 BYTE 'A' => character constant.
Value2 BYTE 0 => smallest unsigned byte.
Value3 BYTE 255 => largest unsigned byte.
Value4 SBYTE -128 => smallest signed byte.
Value5 SBYTE +127 => largest signed byte.
Value6 BYTE ? => uninitialized byte.
The optional name is a label marking the variable's offset from the beginning of its enclosing segment.
If value1 is located at offset 0000 in the data segment and consumes 1 byte of storage, value2 is automatically located at offset 0001.
If you declare an SBYTE variable, the Microsoft debugger will automatically display its value in decimal with a leading sign.
![[offset.png]]
An array is simply a set of sequential memory locations.
The directive (BYTE) indicates the offset needed to get to the next array element.
No length, no termination flag, no special properties.
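A minimal sketch of array definitions, assuming 32-bit MASM with the flat memory model. The names `list1`, `list2`, and `list3` are hypothetical; each array is just a run of consecutive elements with nothing marking its end.

```asm
; Sketch: arrays are consecutive elements of one type, nothing more.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
list1 BYTE  10, 20, 30, 40          ; 4 elements, 1 byte apart
list2 WORD  1000h, 2000h, 3000h     ; 3 elements, 2 bytes apart
list3 DWORD 1, 2, 3, 4, 5           ; 5 elements, 4 bytes apart

.code
main PROC
    mov al, list1                   ; AL  = 10 (first element)
    mov bx, list2                   ; BX  = 1000h
    mov ecx, list3                  ; ECX = 1
    INVOKE ExitProcess, 0
main ENDP
END main
```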
A string is implemented as a sequence of characters.
For convenience, it's usually enclosed in quotation marks.
It's usually null terminated.
Characters are bytes.
Hex characters 0Dh (CR) and 0Ah (LF) are useful.
Example ![[define strings.png]]
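Since the original image is not reproduced here, the following sketch shows one way such string definitions can look. It assumes 32-bit MASM; the names `prompt` and `menu` and the text are hypothetical.

```asm
; Sketch: null-terminated strings with embedded CR/LF line breaks.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
prompt BYTE "Enter your name: ", 0
menu   BYTE "1. Create a new account", 0Dh, 0Ah   ; CR, LF end the first line
       BYTE "2. Open an existing account", 0Dh, 0Ah
       BYTE "Choice> ", 0                         ; one null byte ends the whole string

.code
main PROC
    mov al, prompt                  ; AL = 'E', the first character
    INVOKE ExitProcess, 0
main ENDP
END main
```

The unlabeled `BYTE` lines continue directly after the previous bytes in memory, so `menu` is a single string spread over three source lines.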
Data Transfer Instructions
Operand types
MOV, MOVZX, MOVSX
LAHF, SAHF
XCHG
Addition and Subtraction
INC, DEC
ADD, SUB
NEG
Data-related operators and directives
Indirect addressing
JMP, LOOP
Immediate (a constant integer of 8, 16, or 32 bits)
Register (the name of a register)
Memory (a reference to a location in memory)
The memory address is either encoded in the instruction, or a register holds the address of the memory location.
Move from source to destination
MOV destination, source
Both operands must be the same size.
No more than one memory operand permitted.
CS, EIP, and IP cannot be the destination.
No immediate-to-segment-register moves.
To MOV memory to memory, copy the value through a register.
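Because MOV allows at most one memory operand, a memory-to-memory copy takes two instructions. A minimal sketch, assuming 32-bit MASM and hypothetical variables `var1` and `var2`:

```asm
; Sketch: copy var1 to var2 through AX.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
var1 WORD 1234h
var2 WORD ?

.code
main PROC
    ; mov var2, var1            ; would not assemble: two memory operands
    mov ax, var1                ; memory -> register
    mov var2, ax                ; register -> memory, var2 = 1234h
    INVOKE ExitProcess, 0
main ENDP
END main
```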
Direct memory operands
When you copy a smaller value into a larger destination, the MOVZX instruction fills (extends) the upper bits of the destination with zeros. ![[zeroext.png]]
The MOVSX instruction fills the upper bits of the destination with copies of the source operand's sign bit. ![[signext.png]]
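The difference between zero extension and sign extension can be sketched with one source byte whose sign bit is set. The example assumes 32-bit MASM; the value 10001111b is only illustrative.

```asm
; Sketch: MOVZX vs. MOVSX on the same 8-bit source.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.code
main PROC
    mov bl, 10001111b           ; BL = 8Fh, sign bit is 1
    movzx ax, bl                ; AX  = 008Fh      (upper bits zeroed)
    movsx cx, bl                ; CX  = FF8Fh      (upper bits copy the sign bit)
    movzx edx, bl               ; EDX = 0000008Fh
    movsx esi, bl               ; ESI = FFFFFF8Fh
    INVOKE ExitProcess, 0
main ENDP
END main
```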
XCHG exchanges the values of two operands.
At least one operand must be a register.
No immediate operands are permitted.
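A minimal sketch of swapping two memory values with XCHG, assuming 32-bit MASM. Because at least one operand must be a register, the swap goes through AX; `val1` and `val2` are hypothetical names.

```asm
; Sketch: swap two WORD variables using XCHG and one register.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
val1 WORD 1000h
val2 WORD 2000h

.code
main PROC
    mov ax, val1                ; AX = 1000h
    xchg ax, val2               ; AX = 2000h, val2 = 1000h
    mov val1, ax                ; val1 = 2000h
    INVOKE ExitProcess, 0
main ENDP
END main
```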
There is no "range checking" - the address is calculated and used.
Size of transfer is based on the destination.
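The two points above can be sketched with direct-offset operands. The example assumes 32-bit MASM and two hypothetical arrays defined back to back, so that reading past the end of the first one lands in the second.

```asm
; Sketch: direct-offset operands, including an out-of-range access.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
arrayB BYTE 10h, 20h, 30h, 40h
arrayW WORD 1000h, 2000h, 3000h

.code
main PROC
    mov al, arrayB+1            ; AL = 20h
    mov al, [arrayB+2]          ; AL = 30h (brackets are optional)
    mov ax, [arrayW+2]          ; AX = 2000h (WORD elements are 2 bytes apart)
    mov al, [arrayB+4]          ; no range check: reads the low byte of arrayW, AL = 00h
    INVOKE ExitProcess, 0
main ENDP
END main
```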
Write a program that adds the following three bytes:
Add/Subtract 1 from operand (register/memory)
INC destination => (e.g. destination++)
DEC destination => (e.g. destination--)
ADD destination, source
SUB destination, source
NOTE: Same operand rules as for the MOV instruction.
NEG reverses the sign of an operand in a register/memory location (two's complement).
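The arithmetic instructions above can be traced in a short sketch. It assumes 32-bit MASM; `myWord` is a hypothetical variable and the values are only illustrative.

```asm
; Sketch: INC, DEC, ADD, SUB and NEG on a register and a memory operand.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
myWord WORD 1000h

.code
main PROC
    mov eax, 10
    inc eax                     ; EAX = 11
    dec eax                     ; EAX = 10
    add eax, 5                  ; EAX = 15
    sub eax, 3                  ; EAX = 12
    neg eax                     ; EAX = -12 (FFFFFFF4h)
    inc myWord                  ; myWord = 1001h
    sub myWord, 1               ; myWord = 1000h
    INVOKE ExitProcess, 0
main ENDP
END main
```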
The ALIGN directive aligns a variable on a byte, word, doubleword, or paragraph boundary.
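A minimal sketch of ALIGN in a data segment, assuming 32-bit MASM with the flat memory model. The variable names are hypothetical, and the offsets in the comments are relative to the start of this data block.

```asm
; Sketch: ALIGN inserts padding so the next variable starts on the given boundary.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
bVal  BYTE ?                    ; offset 0
ALIGN 2                         ; skip 1 byte
wVal  WORD ?                    ; offset 2
bVal2 BYTE ?                    ; offset 4
ALIGN 4                         ; skip 3 bytes
dVal  DWORD ?                   ; offset 8

.code
main PROC
    mov dVal, 0
    INVOKE ExitProcess, 0
main ENDP
END main
```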
Overrides the default type of a label (Variable)
Provides the flexibility to access part of a variable.
Requires a prefixed size specifier
Little Endian order (revise)
PTR example
Combine elements of a smaller data type into a larger operand
The CPU will automatically reverse the bytes
More examples
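The PTR behavior described above can be sketched as follows, assuming 32-bit MASM. `myDouble` and `wordList` are hypothetical names; the results follow from little-endian storage, where the lowest byte sits at the lowest address.

```asm
; Sketch: PTR reads part of a larger variable, or combines smaller ones.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
myDouble DWORD 12345678h        ; stored in memory as 78h 56h 34h 12h
wordList WORD  5678h, 1234h     ; stored in memory as 78h 56h 34h 12h

.code
main PROC
    ; mov ax, myDouble          ; would not assemble: operand sizes differ
    mov ax, WORD PTR myDouble       ; AX = 5678h
    mov al, BYTE PTR myDouble       ; AL = 78h
    mov al, BYTE PTR [myDouble+1]   ; AL = 56h
    mov ax, WORD PTR [myDouble+2]   ; AX = 1234h
    mov eax, DWORD PTR wordList     ; EAX = 12345678h (two WORDs read as one DWORD)
    INVOKE ExitProcess, 0
main ENDP
END main
```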
TYPE returns the size of a single element of a data declaration (in bytes).
LENGTHOF counts the number of elements in a single data declaration.
SIZEOF returns the total number of bytes: SIZEOF = LENGTHOF * TYPE.
Spanning multiple lines ![[spanningmultiplelines.png]]
Anonymous data ![[anonymousdata.png]]
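A sketch of the three operators, including the two cases named above, assuming 32-bit MASM. The array names are hypothetical. A declaration that continues after a trailing comma counts as one declaration, while unlabeled follow-up lines are separate declarations that the operators do not see.

```asm
; Sketch: TYPE, LENGTHOF and SIZEOF on three declarations.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
list    DWORD 1, 2, 3, 4
spanned WORD  10, 20,           ; trailing comma: declaration continues
              30, 40,
              50, 60
anon    WORD  10, 20            ; declaration ends here
        WORD  30, 40            ; separate, unnamed declaration
        WORD  50, 60

.code
main PROC
    mov eax, TYPE list          ; 4
    mov eax, LENGTHOF list      ; 4
    mov eax, SIZEOF list        ; 16
    mov eax, LENGTHOF spanned   ; 6
    mov eax, SIZEOF spanned     ; 12
    mov eax, LENGTHOF anon      ; 2
    mov eax, SIZEOF anon        ; 4
    INVOKE ExitProcess, 0
main ENDP
END main
```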
Assigns an alternate label name and type to an existing storage location.
Does not allocate any storage of its own.
Avoids the need for the PTR operator.
dwList, wordList, and intList are at the same offset (address).
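A minimal sketch of the LABEL directive using the three names mentioned above, assuming 32-bit MASM. The byte values are only illustrative.

```asm
; Sketch: three names with different types for the same four bytes.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
dwList   LABEL DWORD            ; no storage allocated
wordList LABEL WORD             ; no storage allocated
intList  BYTE 00h, 10h, 00h, 20h

.code
main PROC
    mov eax, dwList             ; EAX = 20001000h
    mov cx, wordList            ; CX  = 1000h
    mov dl, intList             ; DL  = 00h
    INVOKE ExitProcess, 0
main ENDP
END main
```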
Used for indirect addressing
OFFSET returns the distance in bytes of a label from the beginning of its enclosing segment.
Protected mode: 32 bits
Real mode: 16 bits
Example: Assume that bVal is located at offset 0040400h
Another example
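A sketch of OFFSET on consecutive variables, assuming 32-bit MASM with the flat memory model. The actual address of `bVal` depends on where the program is loaded, so the comments only give offsets relative to it; the variable names other than `bVal` are hypothetical.

```asm
; Sketch: OFFSET yields addresses that follow the sizes of the preceding variables.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
bVal  BYTE  ?
wVal  WORD  ?
dVal  DWORD ?
dVal2 DWORD ?

.code
main PROC
    mov esi, OFFSET bVal        ; ESI = address of bVal
    mov esi, OFFSET wVal        ; ESI = address of bVal + 1
    mov esi, OFFSET dVal        ; ESI = address of bVal + 3
    mov esi, OFFSET dVal2       ; ESI = address of bVal + 7
    INVOKE ExitProcess, 0
main ENDP
END main
```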
Is an indirect operand (Register as a pointer).
It holds the address of a variable, usually an array or a string.
It can be de-referenced (just like a pointer) using [ESI].
Works with OFFSET to produce the address to de-reference.
Use it to clarify the size attribute of a memory operand.
When we have an address (offset), we don't know the size of the value at that offset and must specify it explicitly.
Offsets are of size DWORD.
A variable of size DWORD can hold an offset.
i.e. you can declare a pointer variable that contains the offset of another variable.
Indirect operands are ideal for traversing an array.
NOTE: The register in brackets must be incremented by a value that matches the array type (i.e. 2 for WORD, 4 for DWORD, 8 for QWORD).
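The points about indirect operands, PTR, and pointer variables can be combined in one sketch. It assumes 32-bit MASM; `arrayW` and `ptrW` are hypothetical names.

```asm
; Sketch: walk a WORD array through ESI, then reach it through a pointer variable.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
arrayW WORD  1000h, 2000h, 3000h
ptrW   DWORD arrayW             ; pointer variable: holds the offset of arrayW

.code
main PROC
    mov esi, OFFSET arrayW
    mov ax, [esi]               ; AX = 1000h
    add esi, TYPE arrayW        ; step by 2, the size of a WORD
    add ax, [esi]               ; AX = 3000h
    add esi, TYPE arrayW
    add ax, [esi]               ; AX = 6000h
    inc WORD PTR [esi]          ; size must be stated: third element becomes 3001h
    mov esi, ptrW               ; load the stored offset
    mov bx, [esi]               ; BX = 1000h
    INVOKE ExitProcess, 0
main ENDP
END main
```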
Jumps are the basics of most control flow.
HLL compilers turn loops, if statements, switches etc. into some kind of jump.
JMP is an unconditional jump to a label that is usually within the same procedure.
Syntax: JMP target
Logic: EIP <- target
A jump outside the current procedure must be to a special type of label called a global label.
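A minimal sketch of an unconditional jump within one procedure, assuming 32-bit MASM. The label name `done` is hypothetical.

```asm
; Sketch: JMP skips over the instruction between it and its target.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.code
main PROC
    mov eax, 1
    jmp done                    ; EIP <- address of done
    mov eax, 2                  ; never executed
done:
    INVOKE ExitProcess, 0       ; EAX was still 1 when the jump landed here
main ENDP
END main
```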
LOOP creates a counted loop using ECX.
Syntax: LOOP target
Target should precede the instruction.
ECX must contain the iteration count.
Logic:
ECX <- ECX - 1
If ECX != 0, jump back to target, else go to the next instruction.
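The LOOP logic above can be traced with a small counted loop. This sketch assumes 32-bit MASM and adds the counter values 5 down to 1.

```asm
; Sketch: sum 5+4+3+2+1 with a counted loop.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.code
main PROC
    mov eax, 0                  ; running total
    mov ecx, 5                  ; iteration count
L1:
    add eax, ecx                ; add the current counter value
    loop L1                     ; ECX <- ECX - 1, jump to L1 while ECX != 0
    ; here EAX = 15 and ECX = 0
    INVOKE ExitProcess, 0
main ENDP
END main
```

If ECX were 0 on entry, the first decrement would wrap around and the loop would run 4,294,967,296 times, so the count should be set before the loop begins.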