Spring Boot and Database initialization

Spring Boot is hands down a great framework, saving the developer a lot of time and energy when developing a Spring application.

One of its great features is database initialization.
You can use Spring Boot to initialize your SQL database.

We will start with the Gradle file:

buildscript {
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath("org.springframework.boot:spring-boot-gradle-plugin:1.3.3.RELEASE")
    }
}

group 'com.gkatzioura'
version '1.0-SNAPSHOT'

apply plugin: 'idea'
apply plugin: 'java'
apply plugin: 'spring-boot'

sourceCompatibility = 1.8

repositories {
    mavenCentral()
}

dependencies {
    compile("org.springframework.boot:spring-boot-starter-web") {
        exclude module: "spring-boot-starter-tomcat"
    }
    compile("org.springframework.boot:spring-boot-starter-jetty")
    compile("org.springframework:spring-jdbc")
    compile("org.springframework.boot:spring-boot-starter-actuator")
    compile("com.h2database:h2:1.4.191")
    testCompile group: 'junit', name: 'junit', version: '4.11'
}

Pay special attention to the org.springframework:spring-jdbc dependency; this is the dependency that enables database initialization.
The H2 database is more than enough for this example.

The application's main class:

package com.gkatzioura.bootdatabaseinitialization;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.ApplicationContext;

/**
 * Created by gkatzioura on 29/4/2016.
 */
@SpringBootApplication
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

}

The next step is to specify the data source:

package com.gkatzioura.bootdatabaseinitialization.config;

import org.h2.jdbcx.JdbcDataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import javax.sql.DataSource;

/**
 * Created by gkatzioura on 29/4/2016.
 */
@Configuration
public class DataSourceConfig {

    private static final String TEMP_DIRECTORY = System.getProperty("java.io.tmpdir");

    @Bean(name = "mainDataSource")
    public DataSource createMainDataSource() {

        JdbcDataSource ds = new JdbcDataSource();
        ds.setURL("jdbc:h2:"+TEMP_DIRECTORY+"/testdata;MODE=MySQL");
        return ds;
    }

}

We will add a schema.sql file to the resources folder so that it is loaded onto the classpath. The schema.sql file contains all the table definitions needed for our database.

CREATE TABLE IF NOT EXISTS `Users` (
    `user_id` bigint(20) NOT NULL AUTO_INCREMENT,
    `name` varchar(200) NOT NULL,
    PRIMARY KEY (`user_id`)
);

The next file to add is data.sql, also in the resources folder. This file contains the SQL statements needed to populate our database.

INSERT INTO `Users` (`user_id`,`name`) VALUES (null,'nick');
INSERT INTO `Users` (`user_id`,`name`) VALUES (null,'george');

On initialization Spring Boot will search the classpath for the schema.sql and data.sql files and execute them with its database initializer.
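To verify that the scripts actually ran, we can query the Users table on startup. Below is a minimal sketch, not part of the original project: a CommandLineRunner of our own that counts the rows loaded by data.sql.

package com.gkatzioura.bootdatabaseinitialization;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

import javax.sql.DataSource;

@Component
public class UserCountRunner implements CommandLineRunner {

    private final JdbcTemplate jdbcTemplate;

    @Autowired
    public UserCountRunner(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    @Override
    public void run(String... args) {
        // schema.sql and data.sql have already been executed by this point
        Integer users = jdbcTemplate.queryForObject("SELECT COUNT(*) FROM Users", Integer.class);
        System.out.println("Users loaded: " + users);
    }
}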

So far so good; however, when you have two data sources defined, things get complicated.
We shall add a secondary data source.

package com.gkatzioura.bootdatabaseinitialization.config;

import org.h2.jdbcx.JdbcDataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;

import javax.sql.DataSource;

/**
 * Created by gkatzioura on 29/4/2016.
 */
@Configuration
public class DataSourceConfig {

    private static final String TEMP_DIRECTORY = System.getProperty("java.io.tmpdir");

    @Bean(name = "mainDataSource")
    public DataSource createMainDataSource() {

        JdbcDataSource ds = new JdbcDataSource();
        ds.setURL("jdbc:h2:"+TEMP_DIRECTORY+"/testdata;MODE=MySQL");
        return ds;
    }

    @Bean(name = "secondaryDataSource")
    public DataSource createSecondaryDataSource() {

        JdbcDataSource ds = new JdbcDataSource();
        ds.setURL("jdbc:h2:"+TEMP_DIRECTORY+"/secondarydata;MODE=MySQL");
        return ds;
    }
}

On starting the application we get an error:

Caused by: org.springframework.beans.factory.NoUniqueBeanDefinitionException: No qualifying bean of type [javax.sql.DataSource] is defined: expected single matching bean but found 2: mainDataSource,secondaryDataSource

The problem is that the database initializer gets injected with a DataSource bean, and with two candidate beans the injection is ambiguous, so an exception is thrown.
The workaround is to specify which data source bean is the primary one:

    @Bean(name = "mainDataSource")
    @Primary
    public DataSource createMainDataSource() {

        JdbcDataSource ds = new JdbcDataSource();
        ds.setURL("jdbc:h2:"+TEMP_DIRECTORY+"/testdata;MODE=MySQL");
        return ds;
    }

By doing so, the initializer will run the schema.sql and data.sql scripts using the mainDataSource bean.
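If another part of the application needs the secondary data source, we can still inject it by name with a qualifier. A minimal sketch, assuming a repository class of our own:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Component;

import javax.sql.DataSource;

@Component
public class SecondaryRepository {

    private final DataSource dataSource;

    @Autowired
    public SecondaryRepository(@Qualifier("secondaryDataSource") DataSource dataSource) {
        // without the qualifier, the @Primary mainDataSource bean would be injected
        this.dataSource = dataSource;
    }
}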

Another great feature of Spring Boot's database initialization is that it can be integrated with Flyway. Get more information on Flyway here.

You can find the project source code here.

Add SSL to MySQL and PostgreSQL

Adding SSL support to a relational database like MySQL or PostgreSQL is a standard task.

First we need to have our certificates ready.
We can either use MySQL Workbench, which has a nice wizard, or create them using OpenSSL, as shown below.
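A minimal sketch of the OpenSSL route, along the lines of the MySQL documentation; key sizes and validity periods are assumptions you may want to adjust:

# create the CA key and a self-signed CA certificate
openssl genrsa 2048 > ca-key.pem
openssl req -new -x509 -nodes -days 3650 -key ca-key.pem -out ca.pem

# create the server key and certificate request, then sign the request with the CA
openssl req -newkey rsa:2048 -days 3650 -nodes -keyout server-key.pem -out server-req.pem
openssl rsa -in server-key.pem -out server-key.pem
openssl x509 -req -in server-req.pem -days 3650 -CA ca.pem -CAkey ca-key.pem -set_serial 01 -out server-cert.pem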

In the end we will have three files:

ca.pem
server-cert.pem
server-key.pem

We can also check that everything is OK by running a basic test.
Start an OpenSSL server:

/usr/bin/openssl s_server -cert server-cert.pem -key server-key.pem

and a client to connect:

openssl s_client -CAfile ca.pem -connect 127.0.0.1:4433

In case of no errors you are good to go.

In the case of MySQL, we shall create a directory and put our certificates in it:

mkdir /etc/mysql-ssl
mv ca.pem /etc/mysql-ssl
mv server-cert.pem /etc/mysql-ssl
mv server-key.pem /etc/mysql-ssl
chown -R mysql /etc/mysql-ssl

Now we shall edit /etc/my.cnf and add the following to the [mysqld] section:

[mysqld]
ssl-ca=/etc/mysql-ssl/ca.pem
ssl-cert=/etc/mysql-ssl/server-cert.pem
ssl-key=/etc/mysql-ssl/server-key.pem

After restarting MySQL, when we log in and issue show global variables like '%ssl%' we get:

mysql> show global variables like '%ssl%';
+---------------+--------------------------------+
| Variable_name | Value                          |
+---------------+--------------------------------+
| have_openssl  | YES                            |
| have_ssl      | YES                            |
| ssl_ca        | /etc/mysql-ssl/ca.pem          |
| ssl_capath    |                                |
| ssl_cert      | /etc/mysql-ssl/server-cert.pem |
| ssl_cipher    |                                |
| ssl_crl       |                                |
| ssl_crlpath   |                                |
| ssl_key       | /etc/mysql-ssl/server-key.pem  |
+---------------+--------------------------------+

Suppose we have a database called tutorial_database. We will create a user that has access to it only through SSL:

create user 'tutorial_user'@'%' identified by 'yourpass';
grant all privileges on tutorial_database.* to 'tutorial_user'@'%' REQUIRE SSL;

In order to connect with this user, for example using the mysql client, you need the ca.pem created previously:

mysql --ssl-ca=ca.pem -u tutorial_user -h yourhost -p
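To verify that the session is actually encrypted, we can check the SSL cipher from within the client; an empty value would mean a plain connection:

mysql> SHOW STATUS LIKE 'Ssl_cipher';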

Now on PostgreSQL things are easy too.

Place your server certificate and your server key in your PostgreSQL data directory:

cp server-cert.pem $PGDATA/server.crt
cp server-key.pem $PGDATA/server.key

Also change the permissions of your server key, or else PostgreSQL will not start:

chmod og-rwx server.key

The next step is to edit postgresql.conf and add:

ssl=on

After restarting, we will be able to connect to PostgreSQL through SSL; just add the sslmode setting:

psql "sslmode=require host=yourhost dbname=tutorial_database" tutorial_user

However, if we want a specific user to connect to a database only through SSL, we should edit pg_hba.conf:

# TYPE  DATABASE          USER            ADDRESS  METHOD
hostssl tutorial_database tutorial_user   all      md5
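Changes to pg_hba.conf take effect once the server configuration is reloaded, for example:

pg_ctl reload -D $PGDATA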

Testing has become Mandatory (and there are no more excuses)

There are many posts out there about the value of automated testing and why it is a must throughout the life cycle of a software product.

Most people and teams totally agree with this statement, so what goes wrong?

It is just a prototype

It seems to be a valid statement, but if we consider how many times a prototype has made it into production, it is not so valid.
In fact most prototype codebases end up in production, because that is the goal of a prototype.
Adding tests in the prototype phase makes absolute sense.

Too difficult to test

Yes, there are cases where testing is really difficult due to limitations. For example, the Android and iPhone emulators do not give you the ability to simulate accelerometer events, or you use a service which provides you with no test utilities or testing environment at all.
Even if mocking does not produce a complete test case, it can really assist you in writing a specification of how things should work.
The other scenario is testing being hard to implement due to the codebase.
Consider this an indicator that things are not simple enough. If it is just your codebase then you are lucky: it is up to you to make it simpler and more testable.

Too much to do, too little time

This is our weakest spot and the one we are most prone to succumb to.
Suppose you develop a smartphone application. With manual testing each mistake will cost you 3-5 minutes: start the emulator, or even worse deploy to a physical device, load the application, press some buttons to create events, and in case of an error repeat and lose another 3-5 minutes.
In case you develop a server application things might get even worse: connect to the server, upload the application, check that the upload was successful, then manually test that it works, and in case of an error repeat and lose time again.
Also keep in mind that in projects with more than one developer involved, mistakes are more common.
We tend to believe that the best case will happen and everything will work as expected.
It might work most of the time, but when a problem occurs you lose big. We end up being hooked on an "It's ok, all it needs is another version upload" mode.
Next time this happens, just count how much time you end up losing by testing manually until everything is ok.
Then estimate how much time tests would cost you and how much time you would win in such cases.

Everything changes so rapidly

Yes, a project in its initial phase will have a completely different codebase in two months.
But when big changes are made, more errors are likely to occur. A minimal amount of tests ensures that basic features continue to work.

All in all, testing is not as hard as it used to be.
Nowadays almost every service or utility that we use comes with some test utilities.
As the software industry becomes more challenging, codebases without tests will become extinct.
Pick whatever methodology you want, but just do it. Your life will become much easier.

Scheduling jobs on Node.js with node-schedule

Batching is a big part of today's software development. The business world runs on batch, from bank statements to promotional emails.

Node.js has some good libraries for such cases.

Node Schedule is a lightweight cron-like scheduler for Node:

npm install node-schedule

In case you are used to cron and the cron expression format, it will be pretty easy for you:


var scheduler = require('node-schedule');

var monthlyJob = scheduler.scheduleJob('0 0 1 * *', function() {
  console.log('I run on the first day of the month');
});

But you also have a JavaScript object approach:

var scheduler = require('node-schedule');

var rule = new scheduler.RecurrenceRule();
rule.hour = 7;
rule.dayOfWeek = new scheduler.Range(0, 6);

var dailyJob = scheduler.scheduleJob(rule, function() {
  console.log('I run every day at 7:00');
});

You can also schedule a job by giving a specific date:

var scheduler = require('node-schedule');

// months are zero-indexed in JavaScript dates, so 0 is January
var date = new Date(2017, 0, 1, 0, 0, 0);
var newYearJob = scheduler.scheduleJob(date, function() {
    console.log("Happy new year");
});

However, in case your job is no longer needed, you can cancel it pretty easily:


newYearJob.cancel();

Systemd and Upstart Services

Most Linux servers that I use are either Debian-based or RedHat-based.

A common task is adding daemon services.

Suppose that we want to start a Tomcat application on startup.

First we shall install Tomcat:

mkdir /opt/tomcat
groupadd tomcat
useradd -s /bin/false -g tomcat -d /opt/tomcat tomcat
wget http://apache.cc.uoc.gr/tomcat/tomcat-8/v8.0.33/bin/apache-tomcat-8.0.33.tar.gz
tar xvf apache-tomcat-8.0.33.tar.gz
mv apache-tomcat-8.0.33/* /opt/tomcat
rm -r apache-tomcat-8.0.33 apache-tomcat-8.0.33.tar.gz 
cd /opt/tomcat
chgrp -R tomcat conf
chmod g+rwx conf
chmod g+r conf/*
chown -R tomcat work/ temp/ logs/

In the case of systemd, we should add a tomcat.service file under /etc/systemd/system.
The file /etc/systemd/system/tomcat.service shall contain:

[Unit]
Description=Apache Tomcat Web Application Container
After=syslog.target network.target

[Service]
Type=forking

Environment=JAVA_HOME=/usr/java/default
Environment=CATALINA_PID=/opt/tomcat/temp/tomcat.pid
Environment=CATALINA_HOME=/opt/tomcat
Environment=CATALINA_BASE=/opt/tomcat
Environment='CATALINA_OPTS=-Xms512M -Xmx1024M -server -XX:+UseParallelGC'
Environment='JAVA_OPTS=-Duser.timezone=UTC -Djava.awt.headless=true -Djava.security.egd=file:/dev/./urandom'

ExecStart=/opt/tomcat/bin/startup.sh
ExecStop=/bin/kill -15 $MAINPID

User=tomcat
Group=tomcat

[Install]
WantedBy=multi-user.target

The unit is specified to start after the syslog and network targets are available.
Since the service type is forking, systemd treats Tomcat as a daemon: it tracks the process through the PID file and kills that PID on stop.
With User and Group we specify the user and the group that the process should run as.

To pick up the new unit file, then enable and start the service, issue:

systemctl daemon-reload
systemctl enable tomcat
systemctl start tomcat

In the case of Upstart, we should create a tomcat.conf file in /etc/init/.
The content of /etc/init/tomcat.conf:

description     "Tomcat instance"
author          "Emmanouil Gkatziouras"

respawn
respawn limit 2 5

start on runlevel [2345]
stop on runlevel [!2345]

setuid tomcat
setgid tomcat

env CATALINA_HOME=/opt/tomcat

script
        $CATALINA_HOME/bin/catalina.sh run
end script

post-stop script
        rm -rf $CATALINA_HOME/temp/*
end script

It will start on run levels 2, 3, 4 or 5.
The user and group the process is executed as will be tomcat.
After Tomcat is stopped, the post-stop script block will remove the temp files.
Instead of starting the process in the background as a daemon, Upstart will handle the process in the foreground.

To start, just issue:

sudo initctl start tomcat
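To check whether the job is running, Upstart also provides a status command:

sudo initctl status tomcat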

Implement a DynamoDB Docker Image

When you use DynamoDB and you have good codebase test coverage, chances are that you use DynamoDB Local a lot.
Docker comes in really handy for distributing a pre-configured DynamoDB Local among your dev teams or your continuous integration server.

I will use a CentOS image.

We will need Java.
I prefer the Oracle JDK, therefore I have to accept the license and download the Java rpm locally.

In case you want OpenJDK, you can just install it through yum, as shown below.
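For the OpenJDK route, a single line in the Dockerfile would do; the exact package name is an assumption depending on the CentOS repositories:

RUN yum install -y java-1.8.0-openjdk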

So we create the Dockerfile.
I will use the default port, which is 8000, so I will expose port 8000.
jdk-8u91-linux-x64.rpm is the Oracle Java rpm I downloaded previously.

FROM centos

ADD jdk-8u91-linux-x64.rpm /

RUN rpm -Uvh jdk-8u91-linux-x64.rpm

RUN rm /jdk-8u91-linux-x64.rpm

RUN mkdir /opt/DynamoDB

RUN curl -O -L http://dynamodb-local.s3-website-us-west-2.amazonaws.com/dynamodb_local_latest.tar.gz

RUN mv dynamodb_local_latest.tar.gz /opt/DynamoDB/

RUN cd /opt/DynamoDB && tar xvf dynamodb_local_latest.tar.gz && rm dynamodb_local_latest.tar.gz

EXPOSE 8000

ENTRYPOINT ["java","-Djava.library.path=/opt/DynamoDB/DynamoDBLocal_lib","-jar","/opt/DynamoDB/DynamoDBLocal.jar","-sharedDb"]

Then we build our image:

docker build -t dynamodb .

Now we run the container in the background:

docker run -p 8000:8000 -d dynamodb
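To check that DynamoDB Local is up, we can list its tables through the AWS CLI, pointing it at the container's endpoint (this assumes the AWS CLI is installed and configured with dummy credentials):

aws dynamodb list-tables --endpoint-url http://localhost:8000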

Implement a SciPy Stack Docker Image

SciPy is a powerful Python library, but it has many dependencies, including Fortran ones.
So running your SciPy code in a Docker container makes absolute sense.

We will use a private registry:

docker run -d -p 5000:5000 --name registry registry:2

I will use a CentOS image.
CentOS is a very popular Linux distribution based on RedHat, which is a commercial Linux distribution. Oracle Linux and Amazon Linux are also based on Red Hat Linux.

docker pull centos
docker tag centos localhost:5000/centos
docker push localhost:5000/centos

Then we start a container:

docker run -i -t --name centoscontainer localhost:5000/centos /bin/bash

We install all the binary dependencies:

yum install -y epel-release
yum -y update
yum -y groupinstall "Development Tools"
yum -y install python-devel
yum -y install blas --enablerepo=epel
yum -y install lapack --enablerepo=epel
yum -y install Cython --enablerepo=epel
yum -y install python-pip

Then we install the SciPy stack:

pip install boto3
pip install numpy
pip install pandas
pip install scipy

And we are ready. Now we can commit the image:

docker commit -m 'Added scipy stack' -a "Emmanouil Gkatziouras" 4954f603d93b localhost:5000/scipy
docker push localhost:5000/scipy

Now we are ready to run our SciPy-enabled container:

docker run -t -i localhost:5000/scipy /bin/bash
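Inside the container, a quick import check confirms the stack works; the version printed will depend on what pip installed:

python -c "import numpy, scipy, pandas; print(scipy.__version__)"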

Last but not least, we stop and remove our registry:

docker stop registry && docker rm -v registry