Jesse's Software Engineering Blog
MySQL Data Types
When designing a database it is very important that you use the correct data types for your tables. Data types affect everything from overall size to querying efficiency. This is a great article to bookmark for reference as data types are a very important part of database design. I will be covering numeric, both integer and float, string, including binary, and temporal (time) data types in this article. These are some notes I took while studying for the MySQL Developer exam.
Numeric Data Types
Integer Data Types
Integer data types may be declared with a display width: INT (4). This is for display only, will not truncate data, nor is it related to storage requirements. If none specified, will default to required amount needed to display full range of values based on type range.
|Type||Storage Size||Unsigned Range|
|TINYINT||1 byte||0 to 255|
|SMALLINT||2 bytes||0 to 65,535|
|MEDIUMINT||3 bytes||0 to ~16.7 million|
|INT||4 bytes||0 to ~4.2 trillion|
|BIGINT||8 byte||0 to ~18 0s|
Floating-Point Data Types
Two diff float point types, FLOAT and DOUBLE. They are very efficient for storage and computation but subject to rounding errors, don’t use for currency columns. FLOAT represents single-precision, requires 4 bytes, DOUBLE represents double precision and requires 8 bytes. You can specify precision and scale with declaration: FLOAT (10,3) – 10 digits with 3 decimal places. By default will use maximum accuracy allowed by MySQL host hardware.
Fixed-Point Data Types
DECIMAL, used to represent exact value numbers. These are less efficient but more accurate than the float point numbers. All values in declared column will have the exact same number of decimal places. DECMIMAL (10,2) – 10 digits with 2 decimal places. DECIMAL Defaults to (10,0) and requires ~ 4 bytes per nine digits on each side of the decimal point of storage. In MySQL, NUMERIC does the exact same thing as DECIMAL.
String Data Types
Things to consider when choosing string data types:
Binary vs. non-binary (see more)
Fixed or variable length
How trailing spaces are handled
You can mix binary and non binary in a single table, as well as have different character sets and collations for different columns. Consider the following example:
CREATE TABLE myTable ( column1 CHAR(32) CHARACTER SET latin1, column2 CHAR(32) CHARACTER SET latin1 COLLATE latin1_general_cs )
When matching against column1 ABC = aBC where as column2 ABC != aBC. This a good technique so that you don’t have to worry about matching case during comparisons for either the whole table or just specific columns.
Non Binary String Data Types
There are three non binary string data types: CHAR, VARCHAR, and TEXT.
|Type||Storage Size||Max Length|
|CHAR (M)||M characters||255 characters|
|VARCHAR (M)||L chars + 1 or 2 bytes||65,535 characters|
|TINYTEXT||L chars + 1 byte||255 characters|
|TEXT||L chars + 2 bytes||65,535 characters|
|MEDIUMTEXT||L chars + 3 bytes||~16.7 million characters|
|LONGTEXT||L chars + 4 bytes||~4 billion characters|
CHAR has a fixed length with a maximum of 255 characters. Values entered that don’t meet the column requirements are padded with spaces which are removed upon data retrieval. This may result in retrieval value being different than the value that was stored. MySQL allocates the specified storage space even when columns are empty; they are just populated with spaces.
VARCHAR is variable length but requires a () when declaring, i.e. VARCHAR(200). Values that exceed the specified number will be truncated and either a warning or error will be given depending on the SQL mode. Trailing spaces are neither added nor removed when storing or retrieving values. Generally more efficient on storage as only the characters stored is allocated for storage where as CHAR has to allocate enough space regardless of what is actually stored in column.
All non binary strings have both a character set and collation. By default they are assigned that of the table’s.
Binary String Data Types
There are three binary string data types: BINARY, VARBINARY, and BLOB.
BINARY has the same characteristics of CHAR, VARBINARY like VARCHAR, and BLOB like TEXT.
|Type||Storage Size||Max Length|
|BINARY (M)||M characters||255 characters|
|VARBINARY (M)||L chars + 1 or 2 bytes||65,535 characters|
|TINYBLOB||L chars + 1 byte||255 characters|
|BLOB||L chars + 2 bytes||65,535 characters|
|MEDIUMBLOB||L chars + 3 bytes||~16.7 million characters|
|LONGBLOB||L chars + 4 bytes||~4 billion characters|
ENUM and SET Data Types
ENUM and SET are used when column values should be selected from a specific fixed set of values. Values are defined as strings but stored as integers.
ENUM is an enumeration type. Every value in an ENUM column must match one of the specified values. ENUM column can list up to 65,535 members. Sample ENUM declaration:
column1 ENUM (‘value1’,’value2’)
Internally values are stored as integers, starting with the number 1. 0 is reserved for illegal values, which is displayed as an empty string. When retrieving ENUM values you can use either the string value or the numerical index. (In strict SQL mode an error will be thrown for illegal insertions).
SET is similar to ENUM except that entries may be any combination of values. The SET column is declared the same way:
column1 SET (‘value1’,’value2’,’value3’)
SET columns may contain up to 64 values. These values are stored internally as a bitmap using one int per member. If you enter invalid members they are simply ignored. (Except in strict mode).
Temporal Data Types
|DATE||3 bytes||’1000-01-01′ to ’9999-12-31′|
|TIME||3 bytes||‘-838:59:59′ to ’838:59:59′|
|DATETIME||8 bytes||’1000-01-01 00:00:00′ to ’9999-12-31 23:59:59′|
|TIMESTAMP||4 bytes||’1970-01-01 00:00:00′ to mid-year 2037|
|YEAR||1 byte||1901 to 2155 YEAR(4)
1970 to 2069 YEAR(2)
DATE, TIME, DATETIME, and YEAR
Each temporal data type has a “zero” value used when you try and store an illegal value.
MySQL represents DATE values in ‘YYYY-MM-DD’ format. This can be changed using the DATE_FORMAT() function. When submitting values, the leading zeros can be omitted. You can also use a different delimiter like ‘/’. Two digit years will automatically be converted to four digits with the assumption 70 to 99 are from the 1900s. When putting poorly formatted values into a DATE column manipulate them with string functions first to get them to the correct ISO format before inserting.
TIME values are represented as hh:mm:ss. Again, the trailing 0s may be omitted. To change the format can use TIME_FORMAT(). These columns can be used to record lapsed time therefore have a range outside of the time in a day
DATETIME is similar to both DATE and TIME except that TIME must be within hours of a day.
YEAR is a very efficient temporal storage, use when possible. Defaults to YEAR(4).
TIMESTAMP is a representation of seconds since Jan. 1, 1970. It is converted to the server’s time zone for storage and to user’s time zone on retrieval. The TIMESTAMP column can be initialized and updated automatically without having to declare explicit values. TIMESTAMP columns are the most useful when using the auto updating functionality.
Two useful auto date manipulations are DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTATMP, both of which can only be used with the TIMESTAMP column. You also have the choice of just using one of the attributes but you can’t use one on one column and one on another. However you can get around this by setting a TIMESTAMP column to DEFAULT to 0 and passing NULL on creation.
Time Zone Manipulation
The TIMESTAMP columns are affected by the server’s time zone due to the way in which they are stored. The time zone can be read/set with the variable time_zone:
SELECT @@global.time_zone, @@session.time_zone;
The session variable is instantiated from the global variable when the server starts running. Changing the global variable will not affect the current session until the server is restarted. You can change the time_zone by:
SET GLOBAL time_zone = ‘America/Anchorage’; SET SESSION time_zone = ‘America/Anchorage’;
There are various different formats the time_zone variable can accept. For textual versions see table mysql. time_zone_name. You can use offsets in the format ‘+hh:mm’ i.e. ‘-05:00’ for New York. Or you can simply use the keyword SYSTEM to set it to the system where the server is running’s time. When you change the session time all the TIMESTAMP values will update to display the new time zone, but the other temporal columns will remain untouched. Changing the time zone also will affect the NOW() and CURDATE() functions. So if you enter data using those functions then switch time zones, those values will not be updated to reflect the new time zone, unless they are stored in a TIMESTAMP column.
You can do explicit time zone conversions on data using the CONVERT_TZ() function.