Mappings from generic datatypes to those in specific databases are defined in the sqltypes module. There are explicit mappings for MySQL, Oracle, and PostgreSQL. SQLite is also supported but no custom mappings are provided for its datatypes.

A comprehensive chart within the Felis documentation shows the mappings for many different database flavors and file formats.

MySQL Data Mappings


TypeMySQL TypeHas Length?Notes
binaryLONGBLOBYESMaps to LONGBLOB but could also potentially use BLOB or MEDIUMBLOB (more efficient in certain cases as BLOB is in-lined with the row); a user override could potentially be useful for this type.
booleanBIT(1)NOBOOL or BOOLEAN could be used for the mapping instead, as these are already aliases to TINYINT(1) in MySQL DDL. In either case, overriding this type should not be necessary in practice.
byteTINYINTNO
charCHARYES
doubleDOUBLENO
floatFLOATNO
intINTEGERNO
longLONGNO
shortSMALLINTNO
stringVARCHARYESWhen overridden by a mysql type, length has to be provided twice. (See below for more details.)
textLONGTEXTYESText type is defined incorrectly in Felis as it requires a length, and none of the supported databases accept a length on the mapped type.
timestampTIMESTAMPNOTimestamps in MySQL do not have a length. Some of the production schemas have a length argument for timestamp fields but these are ignored in the output DDL.
unicodeVARCHARYESThere is a minor difference in the output where the default mapping has the "NATIONAL" keyword in the DDL, and the "mysql:datatype" mapping does not.

This type is unused in the current set of production schemas.

Notes on MySQL Type Mappings

  • After examining and verifying the default datatype mappings, the vast majority of "mysql:datatype" annotations in the SDM schemas appear to be superfluous and unnecessary.
    • The default type mappings should result in the same DDL as when "mysql:datatype" is provided explicitly in all but a few uncommonly used cases.
  • Length values currently have to be specified twice if overridden by a "mysql:datatype" annotation as in: 

    length: 255
    mysql:datatype: VARCHAR(255)


    If the mysql type is listed as "VARCHAR" only without the length, then an error is thrown: sqlalchemy.exc.CompileError: (in table 'datatype_test', column 'string_t_mysql'): VARCHAR requires a length on dialect mysql 
    • This is the case for any type that accepts a length when it is overridden by "mysql:datatype" (VARCHAR, CHAR, etc.).
    • When using the default type mapping, this is not a problem, so that should be the preferred syntax when possible.
  • Many of the schemas contain timestamp fields which also have a "length" value. The timestamp type does not have a length in any of the database flavors (MySQL, PostgreSQL, Oracle), so this does not make sense. It is actually ignored when generating the output DDL.
  • The DDL for the unicode type is slightly different when overridden, in that the "NATIONAL" keyword is present when using the default type mapping and it disappears when overridden.
    • It is unclear whether this matters or not in practice. VARCHAR fields support utf8 by default.

Proposed Jira Tickets

  • Remove most "mysql:datatype" annotations from production schemas in favor of the default mapping. (sdm_schemas)
    • Check carefully that the "mysql:datatype" corresponds to the Felis default before doing this for a particular column.
    • A tool could be developed that emits a warning if the "mysql:datatype" definition is redundant and results in the same DDL as the default mapping.
  • Remove length from timestamp fields in production schemas. (sdm_schemas)
    • This information is ignored and does not make sense, because timestamp fields in MySQL, PostgreSQL and Oracle do not accept a length specifier. Timestamp fields in all of these databases have a fixed length.
    • Is the length information useful or meaningful in some other context? Does "length: 6" actually mean anything in a schema for a timestamp column or is this value arbitrary?
  • Fix the type mapping to TEXT. (felis)
    • This is broken for MySQL and likely does not work for PostgreSQL or Oracle either.
    • This type should not require a length as it does now, because TEXT fields do not have a fixed length and one cannot be provided in the DDL or an error will occur. 
  • Verify that the default unicode settings and keywords are correct in the DDL and the resultant database tables for applicable fields such as VARCHAR. (felis)
    • Determine if it matters whether the NATIONAL keyword is present or not.
    • Is the utf8 default okay in most/all cases?
    • Is it actually even possible currently to customize the unicode settings in the DDL via the schema? If not, does it need to be?
  • Disallow a length field on column types which do not support it. (felis)
    • Or at least generate a warning in the schema checker that says the length is being ignored (e.g. for TIMESTAMP).

Files

YAML schema file for comparing MySQL mappings with defaults.

datatype_check.yaml

DDL output used to compare MySQL mappings with defaults


  • No labels