Install Docker
Official Guide:
The following steps assume you have Ubuntu installed on WSL:
Update and Upgrade
sudo apt update && sudo apt upgrade
Install Dependencies
sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common
Tell the system to trust software provided by Docker (download Docker's GPG key and add it to the system):
curl -fsSL | sudo apt-key add -
You should see “OK” as the response.
Add Docker Repository (docker source)
sudo add-apt-repository "deb [arch=amd64] $(lsb_release -cs) stable"
Note: Docker may not publish a release under your Ubuntu-based distro's own release name. On Linux Mint, for example,
lsb_release -cs
returns “ulyssa” (the Mint release name) instead of “focal” (the Ubuntu release name it is based on), and Docker has no release for “ulyssa”. Check “” for the available Ubuntu release names, then change the command to use the matching one, for example:
sudo add-apt-repository "deb [arch=amd64] focal stable"
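Instead of looking the name up by hand, you can usually read it from /etc/os-release. The sketch below assumes, as on recent Linux Mint releases, that the file sets UBUNTU_CODENAME (e.g. “focal”) alongside the distro's own VERSION_CODENAME (e.g. “ulyssa”), and falls back to VERSION_CODENAME on plain Ubuntu. The helper name ubuntu_codename is hypothetical.

```shell
# Sketch (hypothetical helper): print the Ubuntu release name that Docker's
# apt repository is likely to have. Reads /etc/os-release unless a path is given.
# Assumes Ubuntu-based distros such as Linux Mint set UBUNTU_CODENAME there.
ubuntu_codename() {
    local os_release="${1:-/etc/os-release}"
    # Source the file in a subshell so its variables don't leak into the caller.
    (
        unset UBUNTU_CODENAME VERSION_CODENAME
        . "$os_release"
        # Prefer the upstream Ubuntu name; fall back to the distro's own codename.
        echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}"
    )
}
```

On Linux Mint 20.1, ubuntu_codename should print focal, which is the name to put in the add-apt-repository command in place of $(lsb_release -cs).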
Refresh the package index
sudo apt update
Install Docker-ce
sudo apt-get install docker-ce docker-ce-cli
Manage Docker as non-root user
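sudo groupadd docker
sudo usermod -aG docker $USER
groupadd may report that the docker group already exists, since installing docker-ce usually creates it; that is harmless.
Log out and log back in so that the new group membership takes effect.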
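To check whether the group change is active, compare the groups of your current session with the groups recorded for your user. In this minimal sketch (in_group is a hypothetical helper), id -nG without a user name reports the groups of the running process, which only gains docker after you log back in, while id -nG USER reads the group database, which already reflects usermod -aG.

```shell
# Sketch (hypothetical helper): succeed if the current session, or the named
# user when a second argument is given, belongs to the given group.
in_group() {
    local group="$1" user="${2:-}"
    # Without a user, `id -nG` lists the groups of the current process;
    # with a user, it lists that user's groups from the group database.
    id -nG ${user:+"$user"} | tr ' ' '\n' | grep -qx -- "$group"
}

# Example: warn if this shell does not yet have the docker group.
# in_group docker || echo "docker group not active yet: log out and back in (or run: newgrp docker)"
```

If in_group docker "$USER" succeeds but in_group docker does not, the membership was recorded and you only need a fresh login.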
Check the Docker installation with hello-world
docker run hello-world
Install MySQL Docker image
Pull the mysql image from Docker Hub:
sudo docker pull mysql
Verify that the mysql image was pulled:
docker images
Create a persistent data directory and start the MySQL server container:
sudo mkdir -p /var/lib/mysql
sudo docker run -d --name mysql-server -p 3306:3306 \
    -v /var/lib/mysql:/var/lib/mysql \
    -e "MYSQL_ROOT_PASSWORD=123" mysql
Use any MySQL client to connect to “” with password “123” to verify that the server was installed successfully.
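On first start the container can take several seconds to initialise, so a client may fail to connect right after docker run. A minimal polling sketch with a hypothetical wait_for helper; the example command assumes the container name mysql-server and password 123 from the docker run above, and relies on mysqladmin being included in the official mysql image. Note that mysqladmin ping only tells you the server is up, not that your credentials work.

```shell
# Sketch (hypothetical helper): run a command up to TRIES times, sleeping
# DELAY seconds after each failed attempt; succeed as soon as it succeeds.
wait_for() {
    local tries="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$tries" ]; do
        # Hide the probe's output; only its exit status matters here.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Example: poll for up to about a minute before connecting with a client.
# wait_for 30 2 docker exec mysql-server mysqladmin ping -uroot -p123 --silent \
#     && echo "MySQL server is up"
```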
Load csv file into MySQL
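Install pip3, csvkit and PyMySQL
sudo apt update
sudo apt install python3-pip
export PATH="$HOME/.local/bin:$PATH"
pip3 install csvkit
pip3 install PyMySQL
The export puts ~/.local/bin, where pip3 places user-installed commands such as csvsql, on your PATH for the current shell.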
Use csvsql to load the CSV file into MySQL without creating the table yourself; csvsql infers the column types and creates the table:
csvsql --db mysql+pymysql://root:123@localhost:3306/Test --tables HNPQCountry --insert HNPQCountry.csv
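csvsql creates the table but not the database, so the Test database named in the URL has to exist before the load. The --db value is a SQLAlchemy URL of the form dialect+driver://user:password@host:port/database. A sketch assuming the container and credentials from above; mysql_url is a hypothetical helper, and a password containing characters such as @ or / would need URL-encoding.

```shell
# Sketch (hypothetical helper): assemble the SQLAlchemy URL passed to csvsql --db.
mysql_url() {
    # Arguments: user password host port database
    printf 'mysql+pymysql://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

DB_URL="$(mysql_url root 123 localhost 3306 Test)"

# Assumed steps (container name and root password from the docker run above):
# docker exec mysql-server mysql -uroot -p123 -e 'CREATE DATABASE IF NOT EXISTS Test'
# csvsql --db "$DB_URL" --tables HNPQCountry --insert HNPQCountry.csv
```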
If you want to control the data types before importing the data, you can use csvsql to generate the table-creation SQL first:
csvsql --dialect mysql HNPQCountry.csv > /tmp/createTable.sql
cat /tmp/createTable.sql
CREATE TABLE `HNPQCountry` (
    `Country Code` VARCHAR(3) NOT NULL,
    `Short Name` VARCHAR(30) NOT NULL,
    `Table Name` VARCHAR(30) NOT NULL,
    `Long Name` VARCHAR(73) NOT NULL,
    `2-alpha code` VARCHAR(2),
    `Currency Unit` VARCHAR(42) NOT NULL,
    `Special Notes` VARCHAR(939),
    `Region` VARCHAR(26) NOT NULL,
    `Income Group` VARCHAR(19) NOT NULL,
    `WB-2 code` VARCHAR(2),
    `National accounts base year` VARCHAR(50),
    `National accounts reference year` VARCHAR(9),
    `SNA price valuation` VARCHAR(36),
    `Lending category` VARCHAR(5),
    `Other groups` VARCHAR(9),
    `System of National Accounts` VARCHAR(61),
    `Alternative conversion factor` VARCHAR(18),
    `PPP survey year` BOOL,
    `Balance of Payments Manual in use` VARCHAR(33),
    `External debt Reporting status` VARCHAR(11),
    `System of trade` VARCHAR(20),
    `Government Accounting concept` VARCHAR(31),
    `IMF data dissemination standard` VARCHAR(51),
    `Latest population census` VARCHAR(166) NOT NULL,
    `Latest household survey` VARCHAR(77),
    `Source of most recent Income and expenditure data` VARCHAR(88),
    `Vital registration complete` VARCHAR(48),
    `Latest agricultural census` VARCHAR(130),
    `Latest industrial data` DECIMAL(38, 0),
    `Latest trade data` DECIMAL(38, 0),
    CHECK (`PPP survey year` IN (0, 1))
);
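After editing the generated SQL, the remaining steps would be to create the table from the file and then load the rows with csvsql --no-create, so that csvsql inserts into your table instead of creating its own. The sketch below assumes GNU sed and the container name, password and Test database used above; retype_column is a hypothetical helper, and widening `Short Name` is only an illustrative change (csvsql sizes each VARCHAR to the longest value it saw in the CSV).

```shell
# Sketch (hypothetical helper): change the declared type of one column in a
# csvsql-generated CREATE TABLE file, editing the file in place (GNU sed).
# Only lines that start with the backtick-quoted column name are touched,
# so CHECK constraints that mention the column are left alone.
# The column name is used as regex text: names containing . ( [ etc. need escaping.
retype_column() {
    local file="$1" column="$2" new_type="$3"
    sed -i -E "s/^([[:space:]]*\`${column}\`) [[:upper:]]+(\([0-9, ]+\))?/\1 ${new_type}/" "$file"
}

# Example: allow longer short names than the 30 characters seen in the CSV.
# retype_column /tmp/createTable.sql "Short Name" "VARCHAR(100)"
#
# Then create the table from the edited file and load the rows into it:
# docker exec -i mysql-server mysql -uroot -p123 Test < /tmp/createTable.sql
# csvsql --db mysql+pymysql://root:123@localhost:3306/Test --tables HNPQCountry \
#     --no-create --insert HNPQCountry.csv
```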
Make Spark Docker image
GitHub hosts an official Dockerfile that we can use to build a customized image: Official Spark Dockerfile
Download the Dockerfile:
wget -O /tmp/Dockerfile
Build the image