taeluf/json-to-mysql

README

Convert large JSON files into MySQL. Uses fgets() to parse the JSON line-by-line, which keeps memory usage low. It also works for smaller JSON, and it chunks the output SQL files to your desired size.

Example

JSON must be in this format:

[  
    {"name": "row1", "index": 0},  
    {"name": "row2", "index": 1}  
]  

Executing it:

<?php  
// variables you'll want to change  
$source_file = 'test/input/main.json';  
$out_prefix = 'test/input/main.out/test_sql';  
$table_name = 'test_sql';  
// max `strlen()` of the sql put into any one output file  
$max_sql_len = 1024*1024;  
  
// a pdo instance for running the create & insert (SQLite in-memory here for the demo; point it at your MySQL database in practice)  
$pdo = new \PDO("sqlite::memory:");  
  
// generate all the sql  
$jtm = new \Tlf\Jtm();  
$schema_info = $jtm->generate_schema_info($source_file);  
$schema = $jtm->generate_schema($source_file, $schema_info);  
$create_sql = $jtm->generate_sql_create($table_name, $schema);  
$jtm->generate_sql_insert($source_file, $out_prefix, $table_name, $schema, $max_sql_len);  
  
//execute the sql  
$pdo->exec($create_sql);  
$num_rows = $jtm->execute_insert_files($out_prefix, $pdo);  
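
As a quick sanity check (assuming, as the variable name suggests, that $num_rows holds the number of inserted rows), you can compare it against a plain COUNT(*) on the new table:

<?php  
// Sanity check: the row count in the table should match $num_rows.  
$count = (int) $pdo->query("SELECT COUNT(*) FROM test_sql")->fetchColumn();  
echo "inserted $num_rows rows; table holds $count rows\n";  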

Install

composer require taeluf/json-to-mysql v2.0.x-dev   

or in your composer.json

{"require":{ "taeluf/json-to-mysql": "v2.0.x-dev"}}  

Versions (branches)

  • v1.0: old version with a disappointing api
  • v2.0: new version with a much cleaner api

Features

  • Read a small or large JSON source file and output a MySQL create statement & insert statements
  • Chunk insert statements, limiting the max length of the output files
  • Output column data gathered from all rows of the JSON file
  • Convert nested arrays into JSON strings for INSERT (see the sketch after this list)
  • TRIES to use the correct column type (text, bool, varchar, int, float)
  • Build the table from the generated schema
  • Very small RAM footprint - reads the input file line-by-line & writes the insert sql files line-by-line
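
For example, given a source file where one column holds a nested array, the insert value for that column should come through as a JSON string. A minimal sketch (the tags.json file name and the exact generated column type are assumptions; only the JSON-string conversion is stated by the feature list):

<?php  
// Hypothetical input: the "tags" column holds a nested array in every row.  
$rows = [  
    ['name' => 'row1', 'tags' => ['a', 'b']],  
    ['name' => 'row2', 'tags' => ['c']],  
];  
// Write one row object per line, matching the format shown above.  
$lines = array_map('json_encode', $rows);  
file_put_contents('tags.json', "[\n" . implode(",\n", $lines) . "\n]");  
  
$jtm = new \Tlf\Jtm();  
$info = $jtm->generate_schema_info('tags.json');  
// $info['tags']['is_array'] should be 2 here (both rows hold arrays);  
// per the feature list, the INSERT value for "tags" would be the  
// JSON-encoded string, e.g. '["a","b"]'.  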

Extra Notes

Warnings

  • Set your length to at least 1024*1024 (roughly one megabyte), or you might end up with several thousand files from large datasets
  • Setting a length SMALLER than the length of an insert statement... is probably really bad

Column Data ($jtm->generate_schema_info())

  • Each is_ entry counts the number of rows whose value matched that check.
  • sampleValue is the longest value found for that column
<?php  
[  
    'name' => [  
        'maxlen' => 4,  
        'minlen' => 4,  
        'count' => 2,  
        'is_string' => 2,  
        'is_numeric' => 0,  
        'is_int' => 0,  
        'is_float' => 0,  
        'is_bool' => 0,  
        'is_array' => 0,  
        'is_date' => 0,  
        'is_json' => 0,  
        'sampleValue' => 'row2',  
    ],  
    'index' => [  
        'maxlen' => 1,  
        'minlen' => 1,  
        'count' => 2,  
        'is_string' => 0,  
        'is_numeric' => 2,  
        'is_int' => 2,  
        'is_float' => 2,  
        'is_bool' => 0,  
        'is_array' => 0,  
        'is_date' => 0,  
        'is_json' => 0,  
        'sampleValue' => 1,  
    ],  
]  
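
To see these stats for your own file, dump the return value of generate_schema_info() (a minimal sketch, reusing the source file from the example above):

<?php  
$jtm = new \Tlf\Jtm();  
$info = $jtm->generate_schema_info('test/input/main.json');  
// Dump the per-column stats to sanity-check the counts  
// before generating the schema & the create statement.  
var_export($info);  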

TODO (maybe)

  • a function like create_and_insert_from_json($json_file_path, $pdo) ... a one-liner that does it all (see the sketch after this list)
  • Write a bash script to run the generation from the CLI
  • Write a bash script to compress, upload, & uncompress the sql files
  • Write a bash script that will import all the sql files into the database
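
For reference, such a one-liner could simply be a thin wrapper around the calls already shown in the example. A hypothetical sketch (this function is not part of the package; it only chains the documented calls):

<?php  
// Hypothetical convenience wrapper (not part of the package):  
// chains the documented calls from the README example.  
function create_and_insert_from_json(string $json_file_path, \PDO $pdo,  
    string $table_name, string $out_prefix, int $max_sql_len = 1024*1024): int  
{  
    $jtm = new \Tlf\Jtm();  
    $schema_info = $jtm->generate_schema_info($json_file_path);  
    $schema = $jtm->generate_schema($json_file_path, $schema_info);  
    $create_sql = $jtm->generate_sql_create($table_name, $schema);  
    $jtm->generate_sql_insert($json_file_path, $out_prefix, $table_name, $schema, $max_sql_len);  
  
    $pdo->exec($create_sql);  
    return $jtm->execute_insert_files($out_prefix, $pdo);  
}  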