Spring Boot – Load users from database

This is the third article in a series on authentication with Spring Boot.
In the first article we authenticated using social networks, and allowed any user to access our application.
In the second article we used inMemoryAuthentication for users that used the login form. In essence, we hardcoded our users.
This article is about adding users to a database. We are not going to allow users to sign up; we’re just going to add them manually.

Setup Postgres

For our user entity, we want to save the following fields:

  • username
  • password (optional)
  • role
  • email
  • name

You could also store the clientIds of the social networks that you allow your users to connect with, for extra security, but we won’t do that here.
We do want the email to be unique: every user must have their own email address.

CREATE TABLE public."user"
(
    user_name text NOT NULL,
    password text,
    role text NOT NULL,
    email text NOT NULL,
    name text,
    UNIQUE(email)
);

To test our login later, we need to add a user. The password field contains the BCrypt hash of “password”.

insert into "user"
(user_name, password, role, email, name)
values
('user','$2a$10$bXetyuwpEai6LomSykjZAuQ5mxU8WqhMBXGuWYnxlveCySRlGxh2i', 'USER', 'test@example.com', 'Test User')
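The hash was generated with Spring Security’s BCryptPasswordEncoder. If you want a hash for a different password, a quick sketch like this works (the output differs on every run, because BCrypt generates a random salt, but any of the generated hashes will match):

import org.springframework.security.crypto.bcrypt.BCryptPasswordEncoder;

public class GenerateHash {
    public static void main(String[] args) {
        // prints something like $2a$10$...
        System.out.println(new BCryptPasswordEncoder().encode("password"));
    }
}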

JPA Database access

Once we have the database in place, it would be nice to actually use it in our code. To do this, we need to configure Spring to connect to our database, create a representation of the database in our code, and create a Repository that glues it together.
First, the configuration. Since we created the database schema ourselves, we don’t want Hibernate to create it for us. You could set spring.jpa.hibernate.ddl-auto to ‘validate’ to make sure the schema matches your model. We also want to show the SQL that is being executed, so it’s easier to see what’s wrong; turn this off when you’re done, because it generates a lot of logging.

spring:
  jpa:
    hibernate:
      ddl-auto: none
    show-sql: true
  datasource:
    url: jdbc:postgresql://localhost:5432/database
    username: databaseuser
    password: password

For the model we’re going to use Lombok, so we don’t have to deal with the boilerplate code of getters and setters. We use the Java Persistence API (JPA) to handle the database mapping.

@Getter
@Setter
@Entity
@Table(name="user", schema = "public")
public class User {

    @Id
    private String email;

    @Column(name="user_name")
    private String userName;
    private String name;
    private String password;
    private String role;
}

We use a Repository to get the User objects from the database. Spring will do a lot of magic for us, so we only need to specify the JPA query and a method in an interface to retrieve the data.

@Repository
public interface UserRepository extends CrudRepository<User, String> {

    @Query("SELECT u FROM User u WHERE u.userName = :username")
    User getUserByUsername(String username);

    @Query("SELECT u FROM User u WHERE u.email = :email")
    User getUserByEmail(String email);
}

UserDetails and UserDetailsService

At this point we need UserDetails, and a service to get them from the database. Since we’re using both form login and OAuth2, I’ve decided to implement both UserDetails and OAuth2User in the same class. This makes things easier later on.

public class MyUserDetails implements UserDetails, OAuth2User {

    private final User user;

    public MyUserDetails(User user){
        this.user = user;
    }

    @Override
    public Map<String, Object> getAttributes() {
        return Collections.emptyMap();
    }

    @Override
    public Collection<? extends GrantedAuthority> getAuthorities() {
        SimpleGrantedAuthority authority = new SimpleGrantedAuthority(user.getRole());
        return Collections.singletonList(authority);
    }

    @Override
    public String getPassword() {
        return user.getPassword();
    }

    @Override
    public String getUsername() {
        return user.getUserName();
    }

    @Override
    public boolean isAccountNonExpired() {
        return true;
    }

    @Override
    public boolean isAccountNonLocked() {
        return true;
    }

    @Override
    public boolean isCredentialsNonExpired() {
        return true;
    }

    @Override
    public boolean isEnabled() {
        return true;
    }

    @Override
    public String getName() {
        return user.getName();
    }
}

The service tries to load a user by username, and throws an exception when no user with that username could be found.

@Component
public class MyUserDetailsService implements UserDetailsService {

    private final UserRepository userRepository;

    @Autowired
    public MyUserDetailsService(UserRepository userRepository) {
        this.userRepository = userRepository;
    }

    @Override
    public UserDetails loadUserByUsername(String username)
            throws UsernameNotFoundException {
        User user = userRepository.getUserByUsername(username);

        if (user == null) {
            throw new UsernameNotFoundException("Could not find user");
        }

        return new MyUserDetails(user);
    }
}

Update FormLogin configuration

Previously we configured in-memory authentication. Now that we have all the pieces in place to retrieve our users from the database, we need to configure Spring Security to use them.
First, we configure the UserDetailsService.

@Bean
public UserDetailsService userDetailsService(){
	return new MyUserDetailsService(userRepository);
}

Then we’ll configure a DaoAuthenticationProvider using the UserDetailsService. The passwordEncoder was already configured in the previous blog post; it’s repeated below for reference.

@Bean
public DaoAuthenticationProvider authenticationProvider() {
	DaoAuthenticationProvider authProvider = new DaoAuthenticationProvider();
	authProvider.setUserDetailsService(userDetailsService());
	authProvider.setPasswordEncoder(passwordEncoder());
	return authProvider;
}
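The passwordEncoder bean is simply a BCryptPasswordEncoder, matching the BCrypt hash we inserted into the database:

@Bean
public PasswordEncoder passwordEncoder() {
	return new BCryptPasswordEncoder();
}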

And then, to tie it together, we use this AuthenticationProvider as the source of users for the login form:

@Override
protected void configure(AuthenticationManagerBuilder auth) throws Exception {
	auth.authenticationProvider(authenticationProvider());
}

Update OAuth configuration

Since we now use the database to store our users, we also need to update the OAuth configuration. We need to verify whether the user who’s trying to log in using OAuth is actually known to us. The key information we can use here is the email address; that’s why the repository has a method to get the user by email address, which we are going to use here. If there is no user with the email address found in the OAuth2 principal, we throw an exception. Otherwise, we return that user.

@Bean
public OAuth2UserService<OAuth2UserRequest, OAuth2User> oauth2UserService() {
	DefaultOAuth2UserService delegate = new DefaultOAuth2UserService();
	return request -> {
		OAuth2User auth2User = delegate.loadUser(request);
		String email = auth2User.getAttribute("email");
		User user = userRepository.getUserByEmail(email);
		if (user != null){
			return new MyUserDetails(user);
		}
		throw new InternalAuthenticationServiceException("User not registered");
	};
}
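Depending on how your security configuration is structured, this service still has to be registered on the OAuth2 login. With the WebSecurityConfigurerAdapter style used in this series, the wiring would look roughly like this (a sketch; your configure method will contain more than this):

@Override
protected void configure(HttpSecurity http) throws Exception {
	// ... existing authorization rules ...
	http.oauth2Login()
		.userInfoEndpoint()
		.userService(oauth2UserService());
}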

Update WebController and frontend

Now that we have all the pieces in place to use the database for user verification, we want to use this information on our site. Since some login attempts will fail (maybe the user misspelled their username), we would like to be able to show an error message on the login page. So we update the /login endpoint like this:

@RequestMapping(value = "/login")
public String login(HttpServletRequest request, Model model){
	if (request.getSession().getAttribute("error.message") != null) {
		String errorMessage = request.getSession().getAttribute("error.message").toString();
		log.info("Error message: "+errorMessage);
		model.addAttribute("errormessage", errorMessage);
	}
	return "login";
}

On the login page, we need to add the following to display this error message:

<div class="alert alert-danger" role="alert" th:if="${errormessage}">
    <span id="user" th:text="${errormessage}"></span>
</div>

In other places we would like to get the user’s name. To do this, we need to get the principal from the authentication token.

private Optional<MyUserDetails> extractMyUserDetails(Principal principal){
	if (principal instanceof UsernamePasswordAuthenticationToken) {
		return Optional.of((MyUserDetails) ((UsernamePasswordAuthenticationToken) principal).getPrincipal());
	} else if (principal instanceof OAuth2AuthenticationToken){
		return Optional.of((MyUserDetails) ((OAuth2AuthenticationToken) principal).getPrincipal());
	}
	log.severe("Unknown Authentication token type!");
	return Optional.empty();
}

And then we get the user’s name from the MyUserDetails class:

@RequestMapping(value = "/welcome")
public String welcome(Principal principal, Model model) {
	MyUserDetails userDetails = extractMyUserDetails(principal)
			.orElseThrow(IllegalStateException::new);
	model.addAttribute("name", userDetails.getName());
	return "welcome";
}

Apache HttpClient

For a hackathon I wanted to read some files from our BitBucket server. I knew the URLs of the files, but there were some complications. First, you need to be authenticated. According to the documentation, the preferred way of authenticating is HTTP Basic Authentication over SSL. We are using an SSL connection, but with self-signed certificates.

When working with SSL, Java uses keystores and truststores. The difference between the two is that keystores store private keys, which are used by the server side, and truststores contain public keys, which are used by the client side. We have our own custom truststore, and we can tell the JVM to use that one by passing the following parameters:

-Djavax.net.ssl.trustStore=/truststore/location -Djavax.net.ssl.trustStorePassword=password

This works when you only want to access sites using your custom truststore. As soon as you want to make a connection to public sites, this fails: by default you can only use one truststore at a time. If you want more, you have to write some custom code.

Or you can use the Apache HttpClient.

HTTP Basic Authentication

There are a couple of ways to use Basic Authentication. For BitBucket we need to use Preemptive Basic Authentication, which means we need to configure an HttpClientContext.
The first thing we need to do is set up the CredentialsProvider. This doesn’t need much explanation.

        // Configure CredentialsProvider
        final CredentialsProvider provider = new BasicCredentialsProvider();
        final UsernamePasswordCredentials credentials
                = new UsernamePasswordCredentials("username", "password");
        provider.setCredentials(AuthScope.ANY, credentials);

Next we need to configure an AuthCache. We’re going to cache the authentication for a specific host.

        // configure AuthCache
        final HttpHost targetHost = new HttpHost("host", PORT, "https");
        final AuthCache authCache = new BasicAuthCache();
        authCache.put(targetHost, new BasicScheme());

Last, we use the previous steps to configure our HttpClientContext.

        // configure HttpClientContext
        final HttpClientContext context = HttpClientContext.create();
        context.setCredentialsProvider(provider);
        context.setAuthCache(authCache);

HttpClient with custom SSL context

Now it’s time to configure our HttpClient. We’re going to load our truststore specifically for this client. This means that other clients and connections will still use the default Java truststore.

        SSLContext sslContext = new SSLContextBuilder()
                .loadTrustMaterial(
                        new File(configuration.getTruststoreLocation()),
                        configuration.getTruststorePassword().toCharArray()
                ).build();

        SSLConnectionSocketFactory sslSocketFactory = 
                new SSLConnectionSocketFactory(sslContext);
        return HttpClientBuilder.create()
                .setSSLSocketFactory(sslSocketFactory)
                .build();

Using the HttpClient to execute the request

Now that we have configured both the HttpClient and its context, executing a request also becomes easy. Note that we need to pass the context to the execute method.

        final HttpGet request = new HttpGet(link);
        HttpResponse response = httpClient.execute(request, context);
        InputStream connectionDataStream = response.getEntity().getContent();
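If all you need is the body as a String, EntityUtils from the same HttpClient library can consume the stream and release the connection for you (assuming HttpClient 4.x):

        String body = EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8);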

Multiple Full-Screens in Java

Working with Java’s Full-Screen Exclusive Mode is a bit different from working with AWT, Swing or JavaFX. The official tutorial describes how, and why, this works, but the information is a bit spread out. It also doesn’t mention how to work with multiple full screens at the same time. Not that this is very different from working with a single screen, but it’s fun to try out.

The Frame

The first part of this exercise is to create a Frame to display. Since the OS manages the video memory, we have to guard against losing it: the OS can reclaim the memory, and with it our drawing, at any time. The draw method looks a bit complex because of this, with its double loop structure. On the positive side, we can simply ignore any hint from the OS that the frame should be repainted.

package nl.ghyze.fullscreen;

import java.awt.Color;
import java.awt.Frame;
import java.awt.Graphics;
import java.awt.image.BufferStrategy;

public class TestFrame extends Frame {
    private final String id;

    public TestFrame(String id) {
        this.id = id;

        // ignore OS initiated paint events
        this.setIgnoreRepaint(true);
    }

    public void draw() {
        BufferStrategy strategy = this.getBufferStrategy();
        do {
            // The following loop ensures that the contents of the drawing buffer
            // are consistent in case the underlying surface was recreated
            do {
                // Get a new graphics context every time through the loop
                // to make sure the strategy is validated
                Graphics graphics = strategy.getDrawGraphics();

                int w = this.getWidth();
                int h = this.getHeight();

                // clear screen
                graphics.setColor(Color.black);
                graphics.fillRect(0,0,w,h);

                // draw screen
                graphics.setColor(Color.ORANGE);
                graphics.drawString("Screen: " + this.id, w / 2, h / 2);

                // Dispose the graphics
                graphics.dispose();

                // Repeat the rendering if the drawing buffer contents
                // were restored
            } while (strategy.contentsRestored());

            // Display the buffer
            strategy.show();

            // Repeat the rendering if the drawing buffer was lost
        } while (strategy.contentsLost());
    }
}

The ScreenFactory

This is just a simple utility class to figure out which screens are available, and what quality of images we can show on these screens.

package nl.ghyze.fullscreen;

import java.awt.DisplayMode;
import java.awt.GraphicsDevice;
import java.awt.GraphicsEnvironment;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class ScreenFactory {

    private final GraphicsDevice[] graphicsDevices;

    /**
     * Constructor. Finds all available GraphicsDevices.
     */
    public ScreenFactory(){
        GraphicsEnvironment localGraphicsEnvironment = GraphicsEnvironment.getLocalGraphicsEnvironment();
        graphicsDevices = localGraphicsEnvironment.getScreenDevices();
    }

    /**
     * Get a list of the IDs of all available GraphicsDevices
     * @return the list of the IDs of all available GraphicsDevices
     */
    public List<String> getGraphicsDeviceIds(){
        return Arrays.stream(graphicsDevices).map(GraphicsDevice::getIDstring).collect(Collectors.toList());
    }

    /**
     * Get a single GraphicsDevice, by ID. Return an empty optional if none is found.
     * @param graphicsDeviceId the ID of the requested GraphicsDevice
     * @return an optional which contains the GraphicsDevice, if found.
     */
    public Optional<GraphicsDevice> getGraphicsDevice(String graphicsDeviceId){
        return Arrays.stream(graphicsDevices)
                .filter(graphicsDevice -> graphicsDevice.getIDstring().equals(graphicsDeviceId))
                .findAny();
    }

    /**
     * Get all available DisplayModes for the selected GraphicsDevice
     * @param graphicsDeviceId the ID of the GraphicsDevice
     * @return a list of DisplayModes
     */
    public List<DisplayMode> getDisplayModes(String graphicsDeviceId){
        GraphicsDevice gd = Arrays.stream(graphicsDevices)
                .filter(graphicsDevice -> graphicsDevice.getIDstring().equals(graphicsDeviceId))
                .findFirst()
                .orElseThrow(IllegalArgumentException::new);

        return Arrays.stream(gd.getDisplayModes()).collect(Collectors.toList());
    }

    /**
     * Get the best DisplayMode for the selected GraphicsDevice.
     * Best is defined here as the most pixels, highest bit-depth and highest refresh-rate.
     * @param graphicsDeviceId the ID of the GraphicsDevice
     * @return the best DisplayMode for this GraphicsDevice
     */
    public DisplayMode getBestDisplayMode(String graphicsDeviceId){
        List<DisplayMode> displayModes = getDisplayModes(graphicsDeviceId);
        DisplayMode best = null;
        for (DisplayMode displayMode : displayModes){
            if (best == null){
                best = displayMode;
            } else if (isScreensizeBetterOrEqual(best, displayMode)
                    && isBitDepthBetterOrEqual(best, displayMode)
                    && isRefreshRateBetterOrEqual(best, displayMode)){
                // only accept a candidate that is at least as good on all three criteria
                best = displayMode;
            }
        }
        return best;
    }

    private boolean isScreensizeBetterOrEqual(DisplayMode current, DisplayMode potential){
        return potential.getHeight() * potential.getWidth() >= current.getHeight() * current.getWidth();
    }

    private boolean isBitDepthBetterOrEqual(DisplayMode current, DisplayMode potential){
        if (current.getBitDepth() == DisplayMode.BIT_DEPTH_MULTI) {
            return false;
        } else if (potential.getBitDepth()  == DisplayMode.BIT_DEPTH_MULTI){
            return true;
        }
        return potential.getBitDepth() >= current.getBitDepth();
    }

    private boolean isRefreshRateBetterOrEqual(DisplayMode current, DisplayMode potential){
        if (current.getRefreshRate() == DisplayMode.REFRESH_RATE_UNKNOWN) {
            return false;
        } else if (potential.getRefreshRate()  == DisplayMode.REFRESH_RATE_UNKNOWN){
            return true;
        }
        return potential.getRefreshRate() >= current.getRefreshRate();
    }
}

Bringing it together

For every screen that we have, we’re going to create a Frame. If the screen supports a full-screen mode, we use it; otherwise we fall back to a maximized Frame. Once we’ve set up the screens, we loop over the frames and draw each of them, in an infinite loop until we reach some stop condition. Here we stop after two seconds, but you can implement any condition you’d like. I’ve found that if you don’t implement a stop condition and just use an infinite loop, it can be quite challenging to actually stop the program. When the program should shut down, we reset the screens to normal and dispose of the Frames. Once the Frames are disposed, the program exits.

package nl.ghyze.fullscreen;

import java.awt.DisplayMode;
import java.awt.GraphicsDevice;
import java.util.ArrayList;
import java.util.List;

public class MultiFullScreen {
    private final List<TestFrame> frames = new ArrayList<>();

    private final long startTime = System.currentTimeMillis();
    private final ScreenFactory screenFactory = new ScreenFactory();

    public MultiFullScreen(){
        try {
            for (String graphicsDeviceId : screenFactory.getGraphicsDeviceIds()) {
                GraphicsDevice graphicsDevice = screenFactory.getGraphicsDevice(graphicsDeviceId).orElseThrow(IllegalStateException::new);
                DisplayMode best = screenFactory.getBestDisplayMode(graphicsDeviceId);

                TestFrame tf = new TestFrame(graphicsDeviceId);
                // remove borders, if supported. Not needed, but looks better.
                tf.setUndecorated(graphicsDevice.isFullScreenSupported());

                // first set fullscreen window, then set display mode
                graphicsDevice.setFullScreenWindow(tf);
                graphicsDevice.setDisplayMode(best);

                // can only be called after it has been set as a FullScreenWindow
                tf.createBufferStrategy(2);

                frames.add(tf);
            }
            run();
            // no explicit shutDown() here: the finally block takes care of it
        } catch (Exception e){
            e.printStackTrace();
        } finally {
            shutDown();
        }
    }

    private void shutDown() {
        // unset full screen windows
        for (String graphicsDeviceId : screenFactory.getGraphicsDeviceIds()) {
            try {
                GraphicsDevice graphicsDevice = screenFactory.getGraphicsDevice(graphicsDeviceId).orElseThrow(IllegalStateException::new);
                graphicsDevice.setFullScreenWindow(null);
            } catch (Exception e){
                e.printStackTrace();
            }
        }

        // Dispose frames, so the application can exit.
        for (TestFrame frame : frames){
            frame.dispose();
        }
    }

    public void run(){
        while(shouldRun()){
            for(TestFrame tf : frames){
                tf.draw();
            }
        }
    }

    private boolean shouldRun(){
        final long runningTime = System.currentTimeMillis() - startTime;
        return runningTime < 2000L;
    }

    public static void main(String[] args) {
        new MultiFullScreen();
    }
}

Java XML API

Who uses XML in 2021? We all use JSON these days, don’t we? Well, it turns out XML is still being used. These code fragments should help get you up to speed when you’re new to the Java XML API.

Create an empty XML document

To start from scratch, you’ll need to create an empty document:


Document doc = DocumentBuilderFactory
                .newInstance()
                .newDocumentBuilder()
                .newDocument();

Create document from existing file

To load an XML file, use the following fragment.


File source = new File("/location/of.xml");
Document doc = DocumentBuilderFactory
                    .newInstance()
                    .newDocumentBuilder()
                    .parse(source);

Add a new element

Now that we have the document, it’s time to add some elements and attributes. Note that Document and Element are both Nodes.


// "node" is an existing parent Node; the element and attribute names are just examples
final Element newNode = doc.createElement("child");
newNode.setAttribute("exampleAttribute", "exampleValue");

node.appendChild(newNode);

Get nodes matching XPath expression

XPath is a powerful way to search your XML document. Every XPath expression can match multiple nodes, or it can match none. Here’s how to use it in Java:


final String xPathExpression = "//node";
final XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xpath.evaluate(xPathExpression, 
        doc, 
        XPathConstants.NODESET);
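A NodeList isn’t an Iterable, so you iterate over the matches with a classic indexed loop:

for (int i = 0; i < nodeList.getLength(); i++) {
    final Node node = nodeList.item(i);
    // getTextContent() returns the concatenated text of the node and its descendants
    System.out.println(node.getNodeName() + ": " + node.getTextContent());
}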

JSON to XML

JSON is simpler and much more widely used these days. However, it has fewer features than XML: namespaces and attributes, for example, are missing. Usually these aren’t needed, but they can be useful. To convert JSON to XML, you can use the org.json:json dependency. This will create an XML structure similar to the input JSON.

<properties>
	<org-json.version>20201115</org-json.version>
</properties>

<dependencies>
	<dependency>
		<groupId>org.json</groupId>
		<artifactId>json</artifactId>
		<version>${org-json.version}</version>
	</dependency>
</dependencies>

JSONObject content = new JSONObject(json);
String xmlFragment = XML.toString(content);

Writing XML

When we’re done manipulating the DOM, it’s time to write the XML to a file or to a String. The following fragments do the trick:


private void writeToConsole(final Document doc) 
    throws TransformerException{
	final StringWriter writer = new StringWriter();
	writeToTarget(doc, new StreamResult(writer));
	System.out.println(writer.toString());
}

private void writeToFile(final Document doc, File target) 
    throws TransformerException, IOException{
	try (final FileWriter fileWriter = new FileWriter(target)) {
		final StreamResult streamResult = new StreamResult(fileWriter);
		writeToTarget(doc, streamResult);
	}
}

private void writeToTarget(final Document doc, final StreamResult target) 
    throws TransformerException {
	final Transformer xformer = TransformerFactory.newInstance()
              .newTransformer();
	xformer.transform(new DOMSource(doc), target);
}
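By default everything ends up on a single line. If you want readable output, you can ask the Transformer to indent it by adding these lines to writeToTarget before the transform call. The indent-amount key is specific to the built-in Xalan implementation, so treat it as optional:

	xformer.setOutputProperty(OutputKeys.INDENT, "yes");
	xformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");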

Leveraging Lucene

Imagine a catalog of a few hundred thousand items. These items have been labeled into a few hundred categories. Each item can be linked to up to three categories. New categories need to be added in order to make things easier to find. However, categories without content are useless. So some content needs to be linked to the new categories. Luckily both the items and the categories have a description, and that makes things easier.

The idea is simple:

  • Put all items into a searchable index.
  • For each category, find out what the most important words are.
  • Create a search term using these most important words, by just sticking them together.
  • Search the index of items for best matches.

And we’re done, sort of. It’s a bit more complicated than that, but that’s mostly because of finetuning.

Building the searchable index

This is where Apache Lucene comes in. Lucene is an open source full text indexing and search library, supported by the Apache Software Foundation.
First released in 1999, it is still in active development.

To create an index, you need to create an IndexWriter, and use it to add Documents to the index.

public IndexWriter createIndexWriter() throws IOException {
    Directory indexDir = FSDirectory.open(Paths.get(INDEX_DIR));
    Analyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
    iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

    return new IndexWriter(indexDir, iwc);
}

Only one writer is needed for adding items to the index. Note that IndexWriterConfig.OpenMode.CREATE will create a new index.
If there is anything already in the index, it will be removed. IndexWriterConfig.OpenMode.CREATE_OR_APPEND could be used if you want to add to an existing index.

Next up is actually adding things to the index. Each item in the index is called a Document. Documents have Fields that can be used for searching.

Creating and adding a document can be done like this:

try {
	Document document = new Document();
	document.add(new StringField("url", "http://ghyze.nl", Field.Store.YES));
	document.add(new TextField("title", "My awesome blog",  Field.Store.YES));
	document.add(new TextField("description", "Blogging about things",  Field.Store.YES));

	writer.addDocument(document);
} catch (IOException e){
	e.printStackTrace();
}

When we’re done with adding the documents, we need to close the IndexWriter:

writer.close();
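Since IndexWriter implements Closeable, a try-with-resources block is a tidy alternative that closes the writer even when an exception occurs:

try (IndexWriter writer = createIndexWriter()) {
    // add documents here; the writer is closed automatically
}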

Find important words

Let’s first define what an “important word” is. Important words are words that are the most relevant for each document in the collection.
There is a nice algorithm to determine the relevance of each word: tf-idf (short for term frequency–inverse document frequency).

The premise of this algorithm is that a word is more relevant for a document the more it appears in that document, and less relevant for a single document when it appears in more documents.

  • For each document, we count how many times each word appears and we divide that by the total number of words in this document. This is the term-frequency part.
  • For each word in every document, take the total number of documents and divide it by the number of documents containing this word. We don’t want this number to be too large, so we take the log of this. This is the inverse document frequency part.
  • Multiply these numbers to get the relative importance of each word for every document. A higher score means the word is more important.
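For example, a word that appears 5 times in a 100-word document has a term frequency of 5/100 = 0.05. If 10 out of 1,000 documents contain that word, the inverse document frequency is log(1000/10) ≈ 4.6, so the word scores roughly 0.05 × 4.6 ≈ 0.23 for that document.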

The code for this algorithm consists of two classes and a value object.

The first class represents a single document in the collection. It is responsible for calculating the importance of the words it contains, relative to the words of all other documents.

import lombok.Getter;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Document {

    /**
     * The identifier for this document
     */
    @Getter
    private final String id;

    /**
     * The complete text for this document
     */
    @Getter
    private final Collection<String> lines;

    /**
     * Every word in this document, with the number of times it appears
     */
    private Map<String, Integer> wordsInDocument;

    /**
     * Constructor
     */
    public Document(String id, Collection<String> lines){
        this.id = id;
        this.lines = lines;
    }

    /**
     * Get the map with the unique words in this document, and the number of times they appear
     */
    public Map<String, Integer> getWordsInDocument(){
        if (wordsInDocument == null){
            calculateWordMap();
        }
        return wordsInDocument;
    }

    /**
     * Calculate the number of times each unique word appears in this document
     */
    private void calculateWordMap(){
        wordsInDocument = new HashMap<>();
        for (String line : lines){
            String[] words = line.split("\\s");
            for (String word : words){
                if (word.trim().length() > 1) {
                    Integer wordCount = wordsInDocument.getOrDefault(word, Integer.valueOf(0));
                    wordsInDocument.put(word, wordCount+1);
                }
            }
        }
    }

    /**
     * Calculate the importance of each word, compared to other words in this document
     * and all other documents in the index
     * @param index The collection of documents that also contains this document.
     * @return An ordered list indicating the importance of each word in this document.
     */
    public List<WordImportance> calculateWordImportance(WordIndex index){
        List<WordImportance> wordImportance = new ArrayList<>();
        double totalWordsInDocument = getWordsInDocument().values().stream().mapToInt(Integer::intValue).sum();
        double totalNumberOfDocuments = index.getNumberOfDocuments();
        for (String word : getWordsInDocument().keySet()){
            double tf = ((double) getWordsInDocument().get(word)) / totalWordsInDocument;
            double idf = Math.log(totalNumberOfDocuments / ((double) index.getNumberOfDocumentsContaining(word)));
            wordImportance.add(new WordImportance(word, tf*idf));
        }

        // most important word first
        wordImportance.sort(Comparator.comparing(WordImportance::getImportance).reversed());

        return wordImportance;
    }
}

The next class represents the collection of all documents. It is responsible for calculating for each word the number of documents that contain it.

import lombok.Getter;

import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WordIndex {

    /**
     * The index, key is the identifier of the document
     */
    @Getter
    private Map<String, Document> index = new HashMap<>();

    /**
     * A map with all the words in all the documents, and the number of documents containing those words
     */
    private Map<String, Integer> documentCountForWords = null;

    /**
     * Constructor
     */
    public WordIndex(){
    }

    /**
     * Add a document to the index, overwriting when it already exists.
     * @param document the document to add
     */
    public void addDocument(Document document) {
        if (index.containsKey(document.getId())) {
            System.out.println("Overwriting document with ID "+ document.getId());
        }

        index.put(document.getId(), document);
    }

    /**
     * Get all words in all documents. If a word appears multiple times, it is only returned once.
     * @return All words in this document
     */
    public Collection<String> getAllWords(){
        Set<String> allWords = new HashSet<>();

        index.values()
                .forEach(e -> allWords.addAll(e.getWordsInDocument().keySet()));

        return allWords;
    }

    /**
     * Get the map with number of documents per word
     * @return the map with number of documents per word
     */
    public Map<String, Integer> getDocumentCountForWords(){
        if (documentCountForWords == null){
            calculateDocumentCountForAllWords();
        }
        return documentCountForWords;
    }

    /**
     * Iterate over every word in every document, and count the number of documents that word appears in.
     */
    private void calculateDocumentCountForAllWords(){
        Collection<String> allWords = getAllWords();
        documentCountForWords = new HashMap<>();
        for (String word : allWords){
            for (String documentId : index.keySet()){
                Map<String, Integer> document = index.get(documentId).getWordsInDocument();
                if (document.keySet().stream().anyMatch(e -> e.equals(word))){
                    Integer count = documentCountForWords.getOrDefault(word, 0);
                    documentCountForWords.put(word, count+1);
                }
            }
        }
    }

    /**
     * Get the total number of documents
     * @return the total number of documents in the index
     */
    public int getNumberOfDocuments(){
        return index.size();
    }

    /**
     * Get the number of documents this word appears in.
     * @param word The word we're interested in
     * @return The number of documents containing this word
     */
    public int getNumberOfDocumentsContaining(String word){
        Map<String, Integer> wordCount = getDocumentCountForWords();
        return wordCount.getOrDefault(word, 0);
    }
}

Then we have a value object, which the document uses to indicate the relative importance of each word.

import lombok.AllArgsConstructor;
import lombok.Value;

@Value
@AllArgsConstructor
public class WordImportance {
    String word;
    double importance;
}
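Wiring the three classes together looks something like this (a minimal sketch with two made-up documents):

WordIndex index = new WordIndex();
index.addDocument(new Document("fox", Arrays.asList("the quick brown fox")));
index.addDocument(new Document("dog", Arrays.asList("the lazy dog sleeps")));

// importance of each word in "fox", relative to the whole collection
List<WordImportance> ranked = index.getIndex().get("fox").calculateWordImportance(index);
ranked.forEach(wi -> System.out.println(wi.getWord() + ": " + wi.getImportance()));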

The easy steps

Now it’s time to search for content for the new categories. To do this, we take the most important words of each new category and string them together, separated by spaces. Then we use that search term to search the index and extract the results.

This is what the code for performing the search would look like:

/** 
 * Prepare the search engine
 */
public void initializeSearch(){
	try {
		indexReader = DirectoryReader.open(FSDirectory.open(Paths.get(BuildStudyIndex.INDEX_DIR)));
		searcher = new IndexSearcher(indexReader);
		analyzer = new StandardAnalyzer();

		queryParser = new MultiFieldQueryParser(new String[]{"title", "shortDescription", "longDescription"}, analyzer);
	} catch (IOException e){
		e.printStackTrace();
	}
}

/**
 * Perform search
 */
public TopDocs search(Collection<String> words) throws IOException, ParseException{
	String searchTerm = String.join(" ", words);
	Query query = queryParser.parse(searchTerm);
	TopDocs results = searcher.search(query, 200000);
	return results;
}

The code to extract the search results would be something like this:

TopDocs result = search(searchTerms);
for (ScoreDoc hit : result.scoreDocs){
	Document found = searcher.doc(hit.doc);
	double score = hit.score;
}

Conclusion

There are some steps that I haven’t mentioned, but those are mainly plumbing and fine-tuning.

We have seen how to use Apache Lucene as a custom search engine. First we’ve built a searchable index, and later we have searched that index for relevant items.
We have also seen how to implement an algorithm that determines the most relevant words in a specific document, compared to the other documents in a collection.

The reason this works is that words have meaning. I know, stating the obvious. Each word gives meaning to the text, and this meaning has varying degrees of relevance to that text. The words that are most relevant to the text distinguish the meaning of the text from the other texts in the collection. We don’t need to know the actual meaning of the text, we just need to separate it from all other texts. Then, through the magic of search engines, we can match the texts that have the most similar meanings.

Spring Boot, MongoDB and raw JSON

Sometimes you want to store and retrieve raw JSON in MongoDB. With Spring Boot storing the JSON isn’t very hard, but retrieving can be a bit more challenging.

Setting up

To start using MongoDB from Spring Boot, you add the dependency to spring-boot-starter-data-mongodb

	<dependency>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-data-mongodb</artifactId>
	</dependency>

And then you inject MongoTemplate into your class

@Autowired
private MongoTemplate mongoTemplate;

Inserting into MongoDB

Inserting JSON is just a matter of converting the JSON into a Document, and inserting that document into the right collection

String json = getJson();
Document doc = Document.parse(json);
mongoTemplate.insert(doc, "CollectionName");

Retrieving JSON

Retrieving JSON is a bit more complicated. First you need to get a cursor for the collection, which allows you to iterate over all the documents within that collection. Then you retrieve each document from the collection and cast it to a BasicDBObject. Once you have that, you can retrieve the raw JSON.

DBCursor cursor = mongoTemplate.getCollection("CollectionName").find();
while (cursor.hasNext()){
   BasicDBObject next = (BasicDBObject) cursor.next();
   String json = next.toJson();
   // do stuff with json
}
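With the newer driver API, the same loop can be written with org.bson.Document, which has its own toJson() method. Whether getCollection returns the old or the new collection type depends on your Spring Data MongoDB version, so treat this as an alternative sketch:

for (org.bson.Document document : mongoTemplate.findAll(org.bson.Document.class, "CollectionName")) {
   String json = document.toJson();
   // do stuff with json
}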

Transforming raw JSON to Object

With Jackson you can transform the retrieved JSON into an object. However, your class will probably be missing a few fields, since MongoDB adds some (such as _id) to keep track of the stored documents. To get around this problem, configure the ObjectMapper to ignore those extra fields.

ObjectMapper mapper = new ObjectMapper()
        .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
MyObject object = mapper.readValue(json, MyObject.class);

Java: Remove an element from a List

One of the more common tasks in programming is removing a specific element from a list. Although this seems straightforward in Java, it’s a bit trickier than it looks.

Before we start, we should build our list:

   public ArrayList<String> createList(){
      ArrayList<String> myList = new ArrayList<String>();
      
      myList.add("String 1");
      myList.add("String 2");
      myList.add("String 3");
      myList.add("String 4");
      myList.add("String 5");
      
      
      return myList;
   }

Let’s say we want to remove the String “String 2”. The first thing that comes to mind is to loop through the list until you find the element “String 2”, and remove it:

   public List<String> removeFromListUsingForEach(List<String> sourceList){
      for(String s : sourceList){
         if (s.equals("String 2")){
            sourceList.remove(s);
         }
      }
      return sourceList;
   }

Unfortunately, in this case, this will throw a

java.util.ConcurrentModificationException

I said “in this case”, because the exception is not always thrown. The details of this strange behavior are out of scope for this blogpost, but can be found here.

There are several ways to remove an element from a list. Depending on your personal preference, and which version of Java you use, here are some examples.

1. Use a for-loop which loops backwards
You can use a for-loop that runs from the end of the list to the beginning. The reason to loop in this direction is that when you remove an element at a given index, every element after it shifts one position towards the beginning of the list. When looping backwards, the shifted elements are the ones you’ve already visited, so there is nothing to compensate for. If you’d run the loop forward, you’d have to adjust the index after every removal, which just isn’t worth the effort.

   public List<String> removeFromListUsingReversedForLoop(List<String> sourceList){
      for(int i = sourceList.size()-1; i >= 0; i--){
         String s = sourceList.get(i);
         if (s.equals("String 2")){
            sourceList.remove(i);
         }
      }
      return sourceList;
   }

This works in every Java version since 1.2, although you can’t use generics until Java 1.5.

2. Use an Iterator
Another way to remove an element from a list is to use an Iterator. The Iterator will loop through the list and, if needed, can remove the current element from that list. This is done by calling Iterator.remove():
   public List<String> removeFromListUsingIterator(List<String> sourceList){
      Iterator<String> iter = sourceList.iterator();
      while (iter.hasNext()){
         if (iter.next().equals("String 2")){
            iter.remove();
         }
      }
      return sourceList;
   }

This works in every Java version since 1.2, although you can’t use generics until Java 1.5.

3. Use Java 8 Streams
What you’re essentially doing here is making a copy of the list and filtering out the unwanted elements.

   public List<String> removeFromListUsingStream(List<String> sourceList){
      List<String> targetList = sourceList.stream()
            .filter(s -> !s.equals("String 2"))
            .collect(Collectors.toList());
      return targetList;
   }

This works since Java 1.8. More about Java 8 can be found here.
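Java 8 also introduced Collection.removeIf, which modifies the list in place and hides the iterator bookkeeping from the previous approach for you:

   public List<String> removeFromListUsingRemoveIf(List<String> sourceList){
      sourceList.removeIf(s -> s.equals("String 2"));
      return sourceList;
   }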

The Pomodoro Technique

One of my favorite time management methods is the Pomodoro Technique. The method is basically as follows:

  1. Pick a task.
  2. Work on it for 25 minutes. This is called a Pomodoro.
  3. Take a 5 minute break. Do something totally unrelated to your work.
  4. Work for another Pomodoro, or 25 minutes. This can be on the same task, or a new one if the previous one is finished.
  5. After 4 Pomodoros, take a longer break of 15 to 30 minutes.

Because of the breaks, your mind gets just enough rest to stay focused. Also, because you’re supposed to work on one task, and one task only, during a Pomodoro, the quality of your work can go up.

Of course, the technique is completely customizable. Do you think 25 minutes is too short (or too long)? Try 45 minutes (or 10 minutes). Do you need a long break sooner? Take one every 3 Pomodoros.

Now I’ve made my own Pomodoro tracker. It’s written in Java 8, and it’s open source. You can find the source here. A runnable version can be found here.

This is still work in progress, and I mainly made it as a challenge to myself. If you like it, do whatever you want with it.

The pomodoro is running!

When the screen and the tray icon are red, a pomodoro is running. Right clicking the tray icon will bring up a menu.

Pomodoro break.

When the screen and the tray icon are green, you are on a break.

Waiting for the next Pomodoro to be started.

When the screen and the tray icon are blue, the program is waiting.

Changing the settings

The settings allow you to specify the location of the program, the times and the number of Pomodoros between long breaks.

Java 8

On 18 March 2014, Oracle released Java 8. I’ve had a little time to play with it. Here are my first experiences.

Eclipse

Eclipse 4.4 (Luna) will get support for Java 8. However, Eclipse 4.3.2 (Kepler) can support Java 8 by installing a feature patch. This page shows how to install the patch.

Once installed, you’ll need to tell your projects to use Java 8. First add the JDK to Eclipse:

  • Go to Window -> Preferences
  • Go to Java -> Installed JREs
  • Add Standard VM, and point to the location of the JRE
  • Then go to Compiler
  • Set Compiler compliance level to 1.8

Then tell the project to use JDK 1.8:

  • Go to Project -> Properties
  • Go to Java Compiler
  • Enable project specific settings
  • Set Compiler compliance level to 1.8

Now you should be able to develop your applications using Java 8.

Maven

To enable Java 8 in Maven, two things need to be done:

  1. Maven must use JDK 1.8
  2. Your project must be Java 8 compliant.

To tell Maven to use JDK 1.8, point the JAVA_HOME variable to the correct location.
For the second point, making the project Java 8 compliant, add the following snippet to your pom.xml file:

<build>
  <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
  </plugins>
</build>

Feature: streams

Streams can be used to iterate over collections, possibly in parallel. This takes advantage of the multi-core architecture of modern computers. But more importantly, it makes the code shorter and more readable.

Case in point, consider the following code, for getting the minimum and maximum number in an array:

   public void minMax(int[] array){
      int min = array[0], max = array[0];
      for (int i : array) {
         if (i < min) {
            min = i;
         } else {
            if (i > max)
               max = i;
         }
      }
      System.out.println("Max is :" + max);
      System.out.println("Min is :" + min);
   }

Nothing too shocking. But with Java 8 this can be done in a shorter, easier way:

   public void java8(int[] array){
      IntSummaryStatistics stats = 
            IntStream.of(array)
            .summaryStatistics();

      System.out.println("Max is :" + stats.getMax());
      System.out.println("Min is :" + stats.getMin());
   }

This method converts the array to an IntStream, and then collects the statistics of all numbers in that stream into an IntSummaryStatistics object. When testing this with an array of 10,000,000 items spanning a range of 1,000,000 numbers, the classic loop was still more than 5 times faster, though: 12 ms for the first method versus 69 ms for the second.

Feature: lambda expressions

The biggest new feature of Java 8 is lambda expressions. These are a sort of inline method, mostly used in combination with streams. To explain this, let’s take a look at the following pieces of code, which get all the files ending in “.csv” from a directory.

First, using a FilenameFilter:

      File sourceDir = new File("D:\\Tools");
      List<String> filteredList = Arrays.asList(sourceDir.list(new FilenameFilter(){

         @Override
         public boolean accept(File dir, String name)
         {
            return name.toLowerCase().endsWith(".csv");
         }
         
      }));

Now, using a Lambda:

      File sourceDir = new File("D:\\Tools");
      List<String> filteredList = Arrays.asList(sourceDir.list())
            .stream()
            .filter(s -> s.toLowerCase().endsWith(".csv"))
            .collect(Collectors.toList());

Notice the filter step. This replaces the accept method of the FilenameFilter. What it effectively does is the following:

For each String in the stream:
 - assign the String to s
 - Call s.toLowerCase().endsWith(".csv"), this will return a boolean
 - If the result is true, the String is passed to the next method in the stream
 - If the result is false, the next String is evaluated