# Leveraging Lucene

Imagine a catalog of a few hundred thousand items. These items have been labeled into a few hundred categories. Each item can be linked to up to three categories. New categories need to be added in order to make things easier to find. However, categories without content are useless. So some content needs to be linked to the new categories. Luckily both the items and the categories have a description, and that makes things easier.

The idea is simple:

• Put all items into a searchable index.
• For each category, find out what the most important words are.
• Create a search term using these most important words, by just sticking them together.
• Search the index of items for best matches.

And we’re done, sort of. It’s a bit more complicated than that, but that’s mostly because of fine-tuning.

## Building the searchable index

This is where Apache Lucene comes in. Lucene is an open source full text indexing and search library, supported by the Apache Software Foundation.
First released in 1999, it is still in active development.

To create an index, you need to create an IndexWriter, and use it to add Documents to the index.

public IndexWriter createIndexWriter() throws IOException {
Directory indexDir = FSDirectory.open(Paths.get(INDEX_DIR));
Analyzer analyzer = new StandardAnalyzer();
IndexWriterConfig icw = new IndexWriterConfig(analyzer);
icw.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

return new IndexWriter(indexDir, icw);
}


Only one writer is needed for adding items to the index. Note that IndexWriterConfig.OpenMode.CREATE will create a new index.
If there is anything already in the index, it will be removed. IndexWriterConfig.OpenMode.CREATE_OR_APPEND could be used if you want to add to an existing index.

Next up is actually adding things to the index. Each item in the index is called a Document. Documents have Fields that can be used for searching.

Creating and adding a document can be done like this:

try {
Document document = new Document();
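document.add(new TextField("title", "My awesome blog", Field.Store.YES));
// hand the document to the IndexWriter; this is the call that can throw an IOException
writer.addDocument(document);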
} catch (IOException e){
e.printStackTrace();
}


When we’re done with adding the documents, we need to close the IndexWriter:

writer.close();


## Find important words

Let’s first define what an “important word” is. Important words are words that are the most relevant for each document in the collection.
There is a nice algorithm to determine the relevance of each word: tf-idf (short for term frequency–inverse document frequency).

The premise of this algorithm is that a word is more relevant for a document the more it appears in that document, and less relevant for a single document when it appears in more documents.

• For each document, we count how many times each word appears and we divide that by the total number of words in this document. This is the term-frequency part.
• For each word in every document, take the total number of documents and divide it by the number of documents containing this word. We don’t want this number to be too large, so we take the log of this. This is the inverse document frequency part.
• Multiply these numbers to get the relative importance of each word for every document. A higher score means the word is more important.
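As a worked illustration of the three steps above, here is the calculation for one word. The numbers are made up for the example, and the natural logarithm is assumed; here $N$ is the total number of documents and $n_w$ the number of documents containing the word $w$.

```latex
\begin{align*}
\mathrm{tf}(w, d) &= \frac{\text{occurrences of } w \text{ in } d}{\text{total words in } d}, &
\mathrm{idf}(w) &= \ln\frac{N}{n_w}, &
\mathrm{score}(w, d) &= \mathrm{tf}(w, d) \cdot \mathrm{idf}(w) \\
\mathrm{tf} &= \frac{3}{100} = 0.03, &
\mathrm{idf} &= \ln\frac{1000}{10} \approx 4.605, &
\mathrm{score} &\approx 0.03 \cdot 4.605 \approx 0.138
\end{align*}
```

A word that appears in every document gets an idf of $\ln 1 = 0$, so its score is zero no matter how often it occurs.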

The code for this algorithm consists of two classes and a value object.

The first class represents a single document in the collection. It is responsible for calculating the importance of the words it contains in relation to the words of all other documents.

import lombok.Getter;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Document {

/**
* The identifier for this document
*/
@Getter
private final String id;

/**
* The complete text for this document
*/
@Getter
private final Collection<String> lines;

/**
* Every word in this document, with the number of times it appears
*/
private Map<String, Integer> wordsInDocument;

/**
* Constructor
*/
public Document(String id, Collection<String> lines){
this.id = id;
this.lines = lines;
}

/**
* Get the map with the unique words in this document, and the number of times they appear
*/
public Map<String, Integer> getWordsInDocument(){
if (wordsInDocument == null){
calculateWordMap();
}
return wordsInDocument;
}

/**
* Calculate the number of times each unique word appears in this document
*/
private void calculateWordMap(){
wordsInDocument = new HashMap<>();
for (String line : lines){
String[] words = line.split("\\s");
for (String word : words){
if (word.trim().length() > 1) {
Integer wordCount = wordsInDocument.getOrDefault(word, Integer.valueOf(0));
wordsInDocument.put(word, wordCount+1);
}
}
}
}

/**
* Calculate the importance of each word, compared to other words in this document
* and all other documents in the index
* @param index The collection of documents that also contains this document.
* @return An ordered list indicating the importance of each word in this document.
*/
public List<WordImportance> calculateWordImportance(WordIndex index){
List<WordImportance> wordImportance = new ArrayList<>();
double totalWordsInDocument = getWordsInDocument().values().stream().mapToInt(Integer::intValue).sum();
double totalNumberOfDocuments = index.getNumberOfDocuments();
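for (String word : getWordsInDocument().keySet()){
double tf = ((double) getWordsInDocument().get(word)) / totalWordsInDocument;
double idf = Math.log(totalNumberOfDocuments / ((double) index.getNumberOfDocumentsContaining(word)));
// tf-idf: the product of both factors is the importance of this word
wordImportance.add(new WordImportance(word, tf * idf));
}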

// most important word first
wordImportance.sort(Comparator.comparing(WordImportance::getImportance).reversed());

return wordImportance;
}
}


The next class represents the collection of all documents. It is responsible for calculating for each word the number of documents that contain it.

import lombok.Getter;

import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WordIndex {

/**
* The index, key is the identifier of the document
*/
@Getter
private Map<String, Document> index = new HashMap<>();

/**
* A map with all the words in all the documents, and the number of documents containing those words
*/
private Map<String, Integer> documentCountForWords = null;

/**
* Constructor
*/
public WordIndex(){
}

/**
* Add a document to the index, overwriting when it already exists.
* @param document the document to add
*/
public void addDocument(Document document){
if (index.containsKey(document.getId())) {
System.out.println("Overwriting document with ID "+ document.getId());
}

index.put(document.getId(), document);
}

/**
* Get all words in all documents. If a word appears multiple times, it is only returned once.
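* @return All unique words in the index
*/
public Collection<String> getAllWords(){
Set<String> allWords = new HashSet<>();

// collect the unique words of every document into one set
index.values().forEach(document -> allWords.addAll(document.getWordsInDocument().keySet()));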

return allWords;
}

/**
* Get the map with number of documents per word
* @return the map with number of documents per word
*/
public Map<String, Integer> getDocumentCountForWords(){
if (documentCountForWords == null){
calculateDocumentCountForAllWords();
}
return documentCountForWords;
}

/**
* Iterate over every word in every document, and count the number of documents that word appears in.
*/
private void calculateDocumentCountForAllWords(){
Collection<String> allWords = getAllWords();
documentCountForWords = new HashMap<>();
for (String word : allWords){
for (Document document : index.values()){
if (document.getWordsInDocument().containsKey(word)){
Integer count = documentCountForWords.getOrDefault(word, 0);
documentCountForWords.put(word, count+1);
}
}
}
}

/**
* Get the total number of documents
* @return The total number of documents in the index
*/
public int getNumberOfDocuments(){
return index.size();
}

/**
* Get the number of documents this word appears in.
* @param word The word we're interested in
* @return The number of documents containing this word
*/
public int getNumberOfDocumentsContaining(String word){
Map<String, Integer> wordCount = getDocumentCountForWords();
return wordCount.getOrDefault(word, 0);
}
}


Then we have a value object that the document uses to indicate the relative importance of each word.

import lombok.AllArgsConstructor;
import lombok.Value;

@Value
@AllArgsConstructor
public class WordImportance {
String word;
double importance;
}


## The easy steps

Now it’s time to search for content for the new categories. To do this, we take the most important words of each new category and string them together, separated by spaces. Then we use that search term to search the index, and extract the result.
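Building the search term could look roughly like the sketch below. `ScoredWord` is a hypothetical stand-in for the `WordImportance` value object so that the example is self-contained, and the limit of three words is an arbitrary choice for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SearchTermBuilder {

    /** Hypothetical stand-in for the WordImportance value object. */
    public static final class ScoredWord {
        final String word;
        final double importance;

        public ScoredWord(String word, double importance) {
            this.word = word;
            this.importance = importance;
        }
    }

    /** Joins the maxWords highest-scoring words with spaces, most important first. */
    public static String buildSearchTerm(List<ScoredWord> words, int maxWords) {
        return words.stream()
                .sorted(Comparator.comparingDouble((ScoredWord w) -> w.importance).reversed())
                .limit(maxWords)
                .map(w -> w.word)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        List<ScoredWord> words = Arrays.asList(
                new ScoredWord("the", 0.0),
                new ScoredWord("espresso", 0.21),
                new ScoredWord("machine", 0.09),
                new ScoredWord("grinder", 0.15));

        System.out.println(buildSearchTerm(words, 3)); // prints "espresso grinder machine"
    }
}
```

Lucene's query parser treats characters such as `:` and `(` as syntax, so words taken straight from free text may need to go through `QueryParser.escape(...)` before they are joined.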

This is what the code for performing the search would look like:

/**
* Prepare the search engine
*/
public void initializeSearch(){
try {
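analyzer = new StandardAnalyzer();
// open the index that was built earlier; this is the call that can throw an IOException
searcher = new IndexSearcher(DirectoryReader.open(FSDirectory.open(Paths.get(INDEX_DIR))));
queryParser = new MultiFieldQueryParser(new String[]{"title", "shortDescription", "longDescription"}, analyzer);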
} catch (IOException e){
e.printStackTrace();
}
}

/**
* Perform search
*/
public TopDocs search(Collection<String> words) throws IOException, ParseException{
String searchTerm = String.join(" ", words);
Query query = queryParser.parse(searchTerm);
TopDocs results = searcher.search(query, 200000);
return results;
}



The code to extract the search results would be something like this:

TopDocs result = search(searchTerms);
for (ScoreDoc hit : result.scoreDocs){
Document found = searcher.doc(hit.doc);
double score = hit.score;
}


## Conclusion

There are some steps that we needed to do that I haven’t mentioned. But those are mainly plumbing and fine-tuning.

We have seen how to use Apache Lucene as a custom search engine. First we’ve built a searchable index, and later we have searched that index for relevant items.
We have also seen how to implement an algorithm that determines the most relevant words in a specific document, compared to the other documents in a collection.

The reason this works is that words have meaning. I know, stating the obvious. Each word gives meaning to the text, and this meaning has varying degrees of relevance to that text. The words that are most relevant to the text distinguish the meaning of the text from the other texts in the collection. We don’t need to know the actual meaning of the text, we just need to separate it from all other texts. Then, through the magic of search engines, we can match the texts that have the most similar meanings.

# Spring Boot, MongoDB and raw JSON

Sometimes you want to store and retrieve raw JSON in MongoDB. With Spring Boot, storing the JSON isn’t very hard, but retrieving it can be a bit more challenging.

## Setting up

To start using MongoDB from Spring Boot, add a dependency on spring-boot-starter-data-mongodb:

<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-mongodb</artifactId>
</dependency>


And then you inject MongoTemplate into your class:

@Autowired
private MongoTemplate mongoTemplate;


## Inserting into MongoDB

Inserting JSON is just a matter of converting the JSON into a Document, and inserting that document into the right collection:

String json = getJson();
Document doc = Document.parse(json);
mongoTemplate.insert(doc, "CollectionName");


## Retrieving JSON

Retrieving JSON is a bit more complicated. First you need to get a cursor for the collection. This allows you to iterate over all the documents within that collection. Then you’ll retrieve each document from the collection, and cast it to a BasicDBObject. Once you have that, you can retrieve the raw JSON.

DBCursor cursor = mongoTemplate.getCollection("CollectionName").find();
Iterator iterator = cursor.iterator();
while (iterator.hasNext()){
BasicDBObject next = (BasicDBObject) iterator.next();
String json = next.toJson();
// do stuff with json
}


## Transforming raw JSON to Object

With Jackson you can transform the retrieved JSON to an object. However, your class might not declare every field in the JSON, since MongoDB adds some of its own (such as _id) to keep track of the stored documents. To get around this problem, you need to configure the ObjectMapper to ignore those extra fields.

ObjectMapper mapper = new ObjectMapper().configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);


# Lessons learned

Pressure makes diamonds, as the saying goes. I worked on a high-pressure project for a couple of weeks (as in, it needed to be done before we even started), and these are some of the lessons we learned as a team. The lessons are mostly tips and tricks, as we learned a lot on the job.

## General lessons learned

### Way of working

Bring (at least) two developers to the project. One will focus on the algorithm, the other will focus on the code quality and support as much as possible. Notice the choice of words: “focus”. This means that all developers do all the things, but their main task is different.
Don’t underestimate the impact of code quality. Code should be as clear as possible, so that it doesn’t get in the way of solving the business problem. When you’re constantly thinking about what the code does, you’re not thinking about how to solve the business problem. On that note, the first versions were set up as procedural. Refactor to object oriented. OO has advantages over procedural, and it would be a waste to not have access to those advantages. This refactoring was well worth the effort, as we had our codebase audited. No major flaws were encountered during the audit.

### Version control

Get a version control tool in place, and choose the one that is easiest to use. You can share code by emailing .zip files, but that’s too cumbersome. Besides, errors get made. Use Git, ask around how to do that, and ignore project managers who tell you not to do this. Even a paid GitHub repository is better than nothing.

### Maven

#### Manually include dependencies

It is possible to add dependencies to the build, without the need for those dependencies to be available in a repository. You’ll include them from a /lib folder or something like that:

<dependency>
<groupId>group.id</groupId>
<artifactId>artifact</artifactId>
<version>1.0</version>
<scope>system</scope>
<systemPath>${project.basedir}/src/test/resources/to-include.jar</systemPath>
</dependency>

#### Create complete jar

To build the resulting jar with dependencies, use the following command:

mvn assembly:assembly -DdescriptorId=jar-with-dependencies

#### Version tracking

Use resource filtering to update variables in your resources with Maven properties. Filter only the files that need it; all other files should not be filtered, because that might corrupt them:

<build>
<resources>
<resource>
<directory>src/main/resources</directory>
<filtering>false</filtering>
</resource>
<resource>
<directory>src/main/resources</directory>
<filtering>true</filtering>
<includes>
<include>project-version.properties</include>
</includes>
</resource>
</resources>
</build>

Contents of project-version.properties:

version=${build.version}

where ${build.version} is a property in the pom file, along with the format for this timestamp:

<properties>
<maven.build.timestamp.format>yyyyMMdd-HHmm</maven.build.timestamp.format>
<build.version>${maven.build.timestamp}</build.version>
</properties>
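Once the filtered project-version.properties ends up on the classpath, the application could read it at runtime along these lines. This is a minimal sketch: the class name `ProjectVersion` and the "unknown" fallback are assumptions for illustration, not part of the original project.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class ProjectVersion {

    /** Resource name from the text above; assumed to sit at the root of the classpath. */
    static final String RESOURCE = "/project-version.properties";

    /** Reads the "version" key, or returns "unknown" when the stream is null or the key is absent. */
    public static String readVersion(InputStream in) throws IOException {
        Properties props = new Properties();
        if (in != null) {
            props.load(in);
        }
        return props.getProperty("version", "unknown");
    }

    public static void main(String[] args) throws IOException {
        // getResourceAsStream returns null when the file is missing; try-with-resources accepts that
        try (InputStream in = ProjectVersion.class.getResourceAsStream(RESOURCE)) {
            System.out.println("Running version " + readVersion(in));
        }
    }
}
```

Logging this value at startup makes it easy to tell which build is actually running on the server.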


#### Download sources

To download the source code of your dependencies, run:

mvn dependency:sources

This will allow you to inspect the actual source code when you’re in a debugging session.

#### Skip tests

There are two ways of skipping unit tests:

mvn -DskipTests <task>

Only skips _executing_ the tests. The unit tests will still be compiled.

mvn -Dmaven.test.skip=true <task>

Does not compile the tests, and therefore the tests are not executed.

### One piece of software

For testing purposes, we made our program so it ran locally. The same program could run, without modifications, on the server. We used hard-coded paths and keys for the server version, with fallbacks for the local standalone version. This allowed us to focus on the algorithms, and find and fix environment issues quite fast.

### Patching jars

We had to patch the Mendelson jars a few times, before we decided to create a maven build script for the source code.

javac -classpath <jar to be patched>;<jars containing non-local classes used by the class to be compiled> path\to\modified\file.java

Then open the jar with a zip-tool (7zip, for example), and replace the old class with the newly compiled version.

### Logging

Add as much logging as useful. This is probably more than you think. In our case, logging wasn’t showing up. So we wrote a LoggingFacade which wrote its output to the default logging framework, AND to System.out or System.err if needed.
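A facade like the one described could be sketched as follows. This is not the project's actual LoggingFacade: it assumes java.util.logging as the "default logging framework", and the method names and the echo flag are made up for the example.

```java
import java.io.PrintStream;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggingFacade {

    private final Logger logger;
    private final boolean echoToConsole;

    public LoggingFacade(String name, boolean echoToConsole) {
        this.logger = Logger.getLogger(name);
        this.echoToConsole = echoToConsole;
    }

    public void info(String message) {
        log(Level.INFO, message, System.out);
    }

    public void error(String message) {
        log(Level.SEVERE, message, System.err);
    }

    /** Always logs through the framework, and optionally echoes to the given console stream. */
    private void log(Level level, String message, PrintStream console) {
        logger.log(level, message);
        if (echoToConsole) {
            console.println(level + ": " + message);
        }
    }

    public static void main(String[] args) {
        LoggingFacade log = new LoggingFacade("demo", true);
        log.info("message received");
        log.error("could not decrypt message");
    }
}
```

The echo is a fallback for environments where the framework's output gets swallowed; once logging shows up properly, it can be switched off with the constructor flag.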

### Debugging

Debugging will provide more information than logging, but is not always possible.
Make one version that runs standalone, so you can attach a debugger while developing.
Make sure you can remotely debug the server. Start the server with debug enabled, with the following command-line parameter:

-agentlib:jdwp=transport=dt_socket,address=localhost:4000,server=y,suspend=y

This starts the program in debug mode, listening to debuggers on TCP port 4000. You can choose any port that is convenient for you. Because of suspend=y, the JVM waits for a debugger to attach before it starts running the program.

You might need to open an SSH tunnel to your server, listening locally to port 4000, and forwarding it to localhost:4000. Notice that localhost is the localhost of the server, not the localhost from which you make the connection to the server.

Then configure your IDE to connect to a remote application.

### Spring-Boot

One of the avenues we’ve explored was to build a standalone program to intercept and process the messages in a more controllable way. Spring-Boot was introduced for this, but not continued. It is worth exploring these kinds of avenues when you’re stuck, because they might give some insight into how to continue.
Spring-Boot offers quite a lot of extras that we can use for our project, such as a standalone server (run with mvn spring-boot:run). Any application can still be run from within the IDE, because the applications still have a main() function.

Choosing which main class to run: https://stackoverflow.com/questions/23217002/how-do-i-tell-spring-boot-which-main-class-to-use-for-the-executable-jar

To test the producing service, use Postman (https://www.getpostman.com/apps)
The service can be reached with a POST request on http://localhost:8080
Body type: raw
Body contents can be found on the producing link, file is called “request.xml”

## Project specific

### Decrypting XML

The XML might have been encrypted with a cipher that isn’t available to you. Find the correct cipher in the following section:

	<xenc:EncryptionMethod Algorithm="http://www.w3.org/2009/xmlenc11#rsa-oaep">
<ds:DigestMethod xmlns:ds="http://www.w3.org/2000/09/xmldsig#" Algorithm="http://www.w3.org/2001/04/xmlenc#sha256"/>
<xenc11:MGF xmlns:xenc11="http://www.w3.org/2009/xmlenc11#" Algorithm="http://www.w3.org/2009/xmlenc11#mgf1sha256"/>
</xenc:EncryptionMethod>


Take special note of the Digest Method and the Mask Generation Function, as these might not be available to you. You need to use a third-party library that implements the exact cipher that is used. In our case that is Apache Santuario.

### Initializing Santuario

Santuario must be initialized before it’s used. However, before initializing the main cryptography engine, the Internationalization framework needs to be initialized. Normally this is initialized with the locale en-US, but only the en (without the _US_ part) properties file is available. This should not be a problem, since this properties file is part of a fallback mechanism. However, in our case, this fallback mechanism doesn’t work.
First initialize Santuario with an empty resource bundle, then initialize the cryptography engine.

### Binary data

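In one instance of our project, the binary file had a repeating sequence EF BF BD. This is the UTF-8 encoding of the Unicode replacement character (U+FFFD). It shows up when you create a String from binary data and then request the bytes from that String: every byte sequence that isn’t valid text gets replaced. Strings and binary aren’t the best of friends, so keep them separated!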
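The effect is easy to reproduce. The sketch below assumes UTF-8 was the charset involved, which is what the EF BF BD pattern suggests; the sample bytes and the class name are made up for the example.

```java
import java.nio.charset.StandardCharsets;

public class BinaryRoundTrip {

    /** Pushes binary data through a String and back, the way the faulty code did. */
    public static byte[] throughString(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    /** Formats bytes as space-separated uppercase hex. */
    public static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // 0x89 on its own is not valid UTF-8, so it cannot survive the round trip
        byte[] original = {(byte) 0x89, 0x50, 0x4E, 0x47};
        byte[] mangled = throughString(original);

        System.out.println(hex(original)); // 89 50 4E 47
        System.out.println(hex(mangled));  // EF BF BD 50 4E 47
    }
}
```

The damage cannot be undone afterwards, because every invalid sequence maps to the same replacement character. Passing the data around as byte[] or as streams avoids the problem.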

# Class size

Imagine that you need to maintain two applications. Both are about 20,000 lines of code. Now imagine that one has about 10 classes, and the other has 200 classes. Which one would you rather work with?

I’ve had discussions about whether you should favor many classes over fewer. When you only take into account the amount of functionality delivered through those classes, it doesn’t matter. The same functionality will be in the codebase, whether there are few classes or many. Since creating a new class takes effort (not much, but still), it’s easier to have a few big ones. One could say that having a few big classes would contain related functionality in the same files.

The amount of functionality in the system isn’t the only metric. The ease of adding functionality and solving defects, and unit-testing are examples of other metrics that should be taken into account.

Big classes usually have lots of private methods. So, how are you going to write unit-tests for them? Are you going to use reflection to make those methods accessible? Are you going to write extensive setup code to reach those methods? Or are you going to extract classes containing those methods, and make them publicly accessible?

How are you going to change the functionality? How are you going to fix defects? Big classes are big, and usually it’s hard to keep track of what’s going on. Because of this, you’re spending more time figuring out what the code is doing, and what it actually should do. The clearer the intention of your code, the less time you need to spend on getting to know what it’s doing.

Personally, I prefer lots of small classes. But how do we get there? When you’re presented with a legacy project, it requires a lot of refactoring. But beware, don’t just go out and refactor. If there are no issues, and the required functionality doesn’t change, that part of the codebase is just fine. On the other hand, when you start a new project, it’s a bit easier.

One of the first things I’d recommend is to read up on the SOLID principles. SOLID stands for Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation and Dependency Inversion. Knowing and applying these principles will help you create a well-factored system. You probably won’t be able to apply these principles all of the time, but it definitely helps to know about them.

Put some tests in place, and make sure these tests are of the highest quality. The more and better tests you have, the more secure your refactorings will be. As an added bonus, you gain knowledge of and insight into the system you’re working on. As you progress with fixing defects and implementing new functionality, the amount of code under test will increase, and the faster you can develop and refactor.

Practice Test Driven Development. Write a test, make it pass, and refactor to optimise readability. Make sure you do the last step, TDD won’t work otherwise. TDD will help you create a clear system with very high test-coverage. And that coverage will be high quality.

Use as few if-statements and switch/cases as possible. Using as few conditionals as possible makes the codebase more usable, because it forces you to use more object oriented design. You could use an inheritance structure, or a table-/map-based approach. There may be other patterns, if you’re creative enough to discover them.
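To illustrate the map-based approach, here is a minimal sketch in which a lookup table replaces a switch over the operation name. The `Calculator` class and its operations are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleBinaryOperator;

public class Calculator {

    /** The table that replaces a switch statement: operation name -> behaviour. */
    private static final Map<String, DoubleBinaryOperator> OPERATIONS = new HashMap<>();

    static {
        OPERATIONS.put("add", (a, b) -> a + b);
        OPERATIONS.put("subtract", (a, b) -> a - b);
        OPERATIONS.put("multiply", (a, b) -> a * b);
        OPERATIONS.put("divide", (a, b) -> a / b);
    }

    public static double calculate(String operation, double a, double b) {
        DoubleBinaryOperator op = OPERATIONS.get(operation);
        if (op == null) {
            throw new IllegalArgumentException("Unknown operation: " + operation);
        }
        return op.applyAsDouble(a, b);
    }

    public static void main(String[] args) {
        System.out.println(calculate("multiply", 3, 4)); // 12.0
    }
}
```

Adding an operation now means adding one entry to the table, instead of another branch in every place that switches on the operation name. The one remaining conditional guards against unknown keys.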

In our current project there is a guideline that all of the public API must have Javadoc. I disagree with this guideline. Our project does not have a public API, since we’re the only team that works on this project and no other team is allowed to use the codebase. Public classes and methods are not the only criterion for a public API.

So I asked, how many times is the Javadoc actually read, versus looking at the actual code to see what the method does? Personally, I never look at the Javadoc, and I suspect I’m not alone in this practice.

So why don’t I read Javadoc in our project? I don’t trust it. We’re using IntelliJ Ultimate, and this IDE also finds errors in Javadoc. And there are lots and lots of errors. So I don’t trust it.

When you’re working on the code, you’re not updating the accompanying Javadoc. The reason for this is the same as why you don’t write unit tests (unless you’re using TDD): you’re solving a problem, and you need your attention there.

Javadoc is useful when you’re writing a library, and actually want third parties to use it. They need this form of documentation.

As for our project, there are other ways of documenting the code. You can (and must, in my opinion) factor the code to be as clear as possible. Good naming practices are a good step towards self-documenting code.

Another way of documenting is writing clear unit tests. You don’t need to write them first, though that is often the easiest way to write them. As long as you run them, you have a form of documentation that fails if incorrect.

For the same reason, you can use the assert keyword. Assertions are disabled by default, so to enable them you need to pass a runtime JVM argument (-ea). This means assert statements are not executed in production, but neither is Javadoc. However, during development, asserts can provide a treasure of information and verify the correctness of your program. The assert keyword IS documentation.
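As a small illustration of an assert doubling as documentation, consider the sketch below. The class and method are invented for the example.

```java
public class AssertDemo {

    /**
     * Returns the average of the given values.
     * The assert states the precondition: callers never pass an empty array.
     */
    public static double average(int[] values) {
        assert values.length > 0 : "values must not be empty";
        long sum = 0;
        for (int value : values) {
            sum += value;
        }
        return (double) sum / values.length;
    }

    public static void main(String[] args) {
        System.out.println(average(new int[] {2, 4, 6})); // 4.0
    }
}
```

When the program runs with `java -ea AssertDemo`, a call with an empty array fails immediately with an AssertionError carrying the message. Without the flag the check is skipped and the method quietly returns NaN. Unlike a Javadoc comment, the precondition cannot drift away from the code unnoticed while assertions are enabled during development.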

No documentation is bad, wrong documentation is worse. Javadoc tends to turn wrong when the codebase changes, and should therefore be avoided if possible.

# Pomodoro 0.2 released

It has been a while since the first version of this pomodoro tracker was released. Development has continued, and there have been a few changes:

• Backported to Java 1.7.
• Added popup dialog when Pomodori are done, to ask whether to register it as a complete one
• Added popup dialog when breaks are done, to ask whether to start a new Pomodoro
• Added the ability to manually reset the current amount of done pomodori
• The current amount of done pomodori can be automatically reset after X amount of minutes (default 60 minutes)
• Popup dialogs will auto-close after 5 minutes
• Reduced font size of wait screen so it will display correctly on Linux

The updated version can be downloaded here. Source code can be found here.

# Java: Remove an element from a List

One of the more common tasks in programming is removing a specific element from a list. Although this seems to be straightforward in Java, it’s a bit trickier than it looks.

Before we start, we should build our list:

   public ArrayList<String> createList(){
ArrayList<String> myList = new ArrayList<String>();

return myList;
}


Let’s say we want to remove the String “String 2”. The first thing that comes to mind is to loop through the list, until you find the element “String 2”, and remove that element:

   public List<String> removeFromListUsingForEach(List<String> sourceList){
for(String s : sourceList){
if (s.equals("String 2")){
sourceList.remove(s);
}
}
return sourceList;
}


Unfortunately, in this case, this will throw a

java.util.ConcurrentModificationException

I said “in this case”, because the exception is not always thrown. The details of this strange behavior are out of scope for this blogpost, but can be found here.
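For readers who want to see the "not always" for themselves, here is a minimal sketch. The list contents are assumptions chosen for the demonstration, and the behavior shown is that of java.util.ArrayList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

public class ForEachRemoval {

    /** Returns true when removing the target inside a for-each loop throws. */
    public static boolean throwsOnRemove(List<String> list, String target) {
        try {
            for (String s : list) {
                if (s.equals(target)) {
                    list.remove(s);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> four = new ArrayList<>(Arrays.asList("String 1", "String 2", "String 3", "String 4"));
        List<String> three = new ArrayList<>(Arrays.asList("String 1", "String 2", "String 3"));

        System.out.println(throwsOnRemove(four, "String 2"));  // true
        System.out.println(throwsOnRemove(three, "String 2")); // false
    }
}
```

In the three-element list, "String 2" is the second-to-last element. After removing it, the iterator's position equals the new size of the list, so the loop simply ends before the iterator gets a chance to notice the modification. The last element is silently skipped, which is arguably worse than an exception.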

There are several ways to remove an element from a list. Depending on your personal preference, and which version of Java you use, here are some examples.

1. Use a for-loop which loops backwards
You can use a for-loop that runs from the end of the list to the beginning. The reason you want to loop in this direction is that, once you’ve found the element you want to remove, you remove the element at that index. Every element after this one will shift one position towards the beginning of the list. If you’d run the loop forward, you’d have to compensate for this, which just isn’t worth the effort.

   public List<String> removeFromListUsingReversedForLoop(List<String> sourceList){
for(int i = sourceList.size()-1; i >= 0; i--){
String s = sourceList.get(i);
if (s.equals("String 2")){
sourceList.remove(i);
}
}
return sourceList;
}


This works in every Java version since 1.2, although you can’t use generics until Java 1.5.

2. Use an Iterator
Another way to remove an element from a list is to use an Iterator. The Iterator will loop through the list, and, if needed, can remove the current element from that list. This is done by calling Iterator.remove():

public List<String> removeFromListUsingIterator(List<String> sourceList){
Iterator<String> iter = sourceList.iterator();
while (iter.hasNext()){
if (iter.next().equals("String 2")){
iter.remove();
}
}
return sourceList;
}


This works in every Java version since 1.2, although you can’t use generics until Java 1.5.

3. Use Java 8 Streams
What you’re essentially doing here is making a copy of the list, and filtering out the unwanted elements.

   public List<String> removeFromListUsingStream(List<String> sourceList){
List<String> targetList = sourceList.stream()
.filter(s -> !s.equals("String 2"))
.collect(Collectors.toList());
return targetList;
}


This works since Java 1.8. More about Java 8 can be found here.

# Java 8

On 18 March 2014, Oracle launched Java 8. I’ve had a little time to play with it. Here are my first experiences.

# Eclipse

Eclipse 4.4 (Luna) will get support for Java 8. However, Eclipse 4.3.2 (Kepler) can support Java 8 by installing a feature patch. This page shows how to install the patch.

Once installed, you’ll need to tell your projects to use Java 8. First add the JDK to Eclipse:

• Go to Window -> Preferences
• Go to Java -> Installed JREs
• Add Standard VM, and point to the location of the JDK
• Then go to Compiler
• Set Compiler compliance level to 1.8

Then tell the project to use JDK 1.8:

• Go to Project -> Properties
• Go to Java Compiler
• Enable project specific settings
• Set Compiler compliance level to 1.8

Now you should be able to develop your applications using Java 8.

# Maven

To enable Java 8 in Maven, two things need to be done:

1. Maven must use JDK 1.8
2. Your project must be Java 8 compliant.

To tell Maven to use JDK 1.8, point the JAVA_HOME variable to the correct location.
For the second point, making the project Java 8 compliant, add the following snippet to your pom.xml file:

<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
</plugins>
</build>


# Feature: streams

Streams can be used to iterate over collections, possibly in a parallel way. This has the advantage of making use of the multi-core architecture in modern computers. But more importantly, it makes the code shorter and more readable.

Case in point, consider the following code, for getting the minimum and maximum number in an array:

   public void minMax(int[] array){
int min = array[0], max = array[0];
for (int i : array) {
if (i < min) {
min = i;
} else {
if (i > max)
max = i;
}
}
System.out.println("Max is :" + max);
System.out.println("Min is :" + min);
}


Nothing too shocking. But with Java 8 this could be done shorter and easier:

   public void java8(int[] array){
IntSummaryStatistics stats =
IntStream.of(array)
.summaryStatistics();

System.out.println("Max is :" + stats.getMax());
System.out.println("Min is :" + stats.getMin());
}


This method converts the array to an IntStream, and then collects the statistics of all numbers in that stream into an IntSummaryStatistics object. When testing this with an array of 10,000,000 items, spanning a range of 1,000,000 numbers, the first method is more than 5 times faster though: it runs in 12 ms, the second in 69 ms.
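The parallel option mentioned at the start of this section is a one-word change. The sketch below shows it with a made-up class name and sample data; whether it actually runs faster depends on the size of the array and the machine, so it is worth measuring before relying on it.

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

public class ParallelMinMax {

    /** Collects min, max, sum and count, letting the stream split the work over several threads. */
    public static IntSummaryStatistics stats(int[] array) {
        return IntStream.of(array)
                .parallel()
                .summaryStatistics();
    }

    public static void main(String[] args) {
        int[] array = {7, -3, 42, 0};
        IntSummaryStatistics stats = stats(array);

        System.out.println("Max is :" + stats.getMax()); // Max is :42
        System.out.println("Min is :" + stats.getMin()); // Min is :-3
    }
}
```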

# Feature: lambda expressions

The biggest new feature of Java 8 is Lambda Expressions. These are sort of inline methods, and are mostly used in combination with streams. To explain this, let’s take a look at the following pieces of code. This will get all the files ending in “.csv” from a directory.

First, using a FilenameFilter:

File sourceDir = new File("D:\\Tools");
List<String> filteredList = Arrays.asList(sourceDir.list(new FilenameFilter() {

    @Override
    public boolean accept(File dir, String name) {
        return name.toLowerCase().endsWith(".csv");
    }

}));


Now, using a Lambda:

File sourceDir = new File("D:\\Tools");
List<String> filteredList = Arrays.asList(sourceDir.list())
        .stream()
        .filter(s -> s.toLowerCase().endsWith(".csv"))
        .collect(Collectors.toList());


Notice line 4, with the filter call. It replaces the accept method of the FilenameFilter. Effectively, it does the following:

For each String in the stream:
- assign the String to s
- Call s.toLowerCase().endsWith(".csv"), this will return a boolean
- If the result is true, the String is passed to the next method in the stream
- If the result is false, the String is dropped and the next String is evaluated
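
The steps above can be made visible by writing the same filter twice: once with a stream, and once as a plain loop. This is only an illustration. The class `FilterExplained` and the constant `IS_CSV` are names invented here, and a fixed list of file names stands in for the directory listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical example class; names are made up for illustration.
final class FilterExplained {

    // The lambda from the example, stored in a variable. A lambda that takes
    // one String and returns a boolean fits the Predicate<String> interface.
    static final Predicate<String> IS_CSV = s -> s.toLowerCase().endsWith(".csv");

    // Stream version: filter() calls IS_CSV.test(s) for each element.
    static List<String> withStream(List<String> names) {
        return names.stream()
                .filter(IS_CSV)
                .collect(Collectors.toList());
    }

    // Loop version: spells out what filter() does for us.
    static List<String> withLoop(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String s : names) {      // assign the String to s
            if (IS_CSV.test(s)) {     // evaluate the lambda, get a boolean
                result.add(s);        // true: pass it on
            }                         // false: move on to the next String
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("data.csv", "notes.txt", "REPORT.CSV");
        System.out.println(withStream(names)); // [data.csv, REPORT.CSV]
        System.out.println(withLoop(names));   // [data.csv, REPORT.CSV]
    }
}
```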

# Using JNA to get the active program on Windows

This question on Stack Overflow explains how to get the currently active program on Windows, meaning the program that is in the foreground and receiving user input. However, a lot of what goes on there isn’t explained, and the code could use a bit of cleaning up.

The example uses JNA, or Java Native Access. JNA is a way of accessing platform-dependent functions without the development overhead that JNI (Java Native Interface) requires. You’ll need to read through the tutorial to really get going with JNA, since it’s not that easy.

# Library mapping

The first thing you need to do to access the native functions is to map the library you want to use. You can do this using an interface, or using a class. In this case, we’ll use a class. The following code will load the Process Status API.

static class Psapi
{
static
{
Native.register("psapi");
}
}


# Function mapping

When the library is mapped, you need to map the functions of that particular library you want to use. Depending on the method you chose for mapping the library, this too can be done in an interface or a class. Since we chose the class, the method will be added there.

static class Psapi
{
//mapping the library is skipped.
public static native int GetModuleBaseNameW(Pointer hProcess, Pointer hmodule, char[] lpBaseName, int size);
}


# Getting the needed functions

The example displays two properties of the active window: the title, and the name of the corresponding process. To get this information, we need the following libraries and functions:

user32:
GetForegroundWindow
GetWindowTextW
GetWindowThreadProcessId

psapi:
GetModuleBaseNameW

kernel32:
OpenProcess

In addition, we need some static fields to correctly call kernel32’s OpenProcess:

public static int PROCESS_QUERY_INFORMATION = 0x0400;
public static int PROCESS_VM_READ = 0x0010;
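
These two values are bit flags: each one sets a single bit, and they are combined with a bitwise OR into the access mask that is passed to OpenProcess. The following sketch shows that arithmetic in plain Java, without calling any native code. The class `AccessMask` and its methods are hypothetical names used only here.

```java
// Hypothetical illustration of how the access rights are combined.
final class AccessMask {

    static final int PROCESS_QUERY_INFORMATION = 0x0400;
    static final int PROCESS_VM_READ = 0x0010;

    // The mask requested in this example: both rights at once.
    static int desiredAccess() {
        return PROCESS_QUERY_INFORMATION | PROCESS_VM_READ;
    }

    // True if every bit of 'right' is set in 'mask'.
    static boolean allows(int mask, int right) {
        return (mask & right) == right;
    }

    public static void main(String[] args) {
        int mask = desiredAccess();
        System.out.printf("0x%04X%n", mask);                // 0x0410
        System.out.println(allows(mask, PROCESS_VM_READ));  // true
    }
}
```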


# Getting the title of the active window

To get the title of the active window, we need to do the following:

• Create a buffer
• Get the active window
• Read the title to the buffer
• Convert the contents of the buffer to a String

In code, it looks like this:

private static String getActiveWindowTitle(){
char[] buffer = new char[MAX_TITLE_LENGTH * 2]; // create buffer
HWND foregroundWindow = User32DLL.GetForegroundWindow(); // get active window
User32DLL.GetWindowTextW(foregroundWindow, buffer, MAX_TITLE_LENGTH); // read title into buffer
String title =  Native.toString(buffer); // convert buffer to String
return title;
}
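
The last step, converting the buffer, deserves a short note. Windows writes the title followed by a terminating zero character, so most of the buffer stays unused. Native.toString cuts the text off at that terminator. The plain-Java sketch below mimics that behaviour so it can be tried without JNA. The class `BufferToString` and the method `fromNullTerminated` are made-up names, and this is not JNA's actual implementation.

```java
// Hypothetical stand-in for the buffer conversion, for illustration only.
final class BufferToString {

    // Returns the characters up to (not including) the first '\0',
    // or the whole buffer if there is no terminator.
    static String fromNullTerminated(char[] buffer) {
        int length = 0;
        while (length < buffer.length && buffer[length] != '\0') {
            length++;
        }
        return new String(buffer, 0, length);
    }

    public static void main(String[] args) {
        char[] buffer = new char[16];               // filled with '\0' by default
        "Notepad".getChars(0, 7, buffer, 0);        // simulate the native call

        System.out.println(fromNullTerminated(buffer));   // Notepad
        System.out.println(new String(buffer).length());  // 16, padding included
    }
}
```

Calling new String(buffer) directly would keep all the trailing zero characters, which is why the conversion step is needed.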


# Getting the active window process name

Getting the name of the process is a bit more complicated. To do this, we need the following steps:

• Create a buffer
• Create a pointer
• Get the active window
• Get a reference to the process ID
• Get a reference to the process
• Read the name of the process to the buffer
• Convert the contents of the buffer to a String

In code, it looks like this:

private static String getActiveWindowProcess(){
char[] buffer = new char[MAX_TITLE_LENGTH * 2]; // create buffer
PointerByReference pointer = new PointerByReference(); // create pointer
HWND foregroundWindow = User32DLL.GetForegroundWindow(); // get active window
User32DLL.GetWindowThreadProcessId(foregroundWindow, pointer); // Get a reference to the process ID
Pointer process = Kernel32.OpenProcess(Kernel32.PROCESS_QUERY_INFORMATION | Kernel32.PROCESS_VM_READ, false, pointer.getValue()); // get a reference to the process
Psapi.GetModuleBaseNameW(process, null, buffer, MAX_TITLE_LENGTH); // read the name of the process into buffer
String processName = Native.toString(buffer); // convert buffer to String
return processName;
}


# Full code

The complete program checks every second which window is in the foreground, and reports any changes. It also displays how long the previous window was in the foreground.

import com.sun.jna.Native;
import com.sun.jna.Pointer;
import com.sun.jna.platform.win32.WinDef.HWND;
import com.sun.jna.ptr.PointerByReference;

public class EnumerateWindows
{
private static final int MAX_TITLE_LENGTH = 1024;

public static void main(String[] args) throws Exception
{
String lastTitle = "none";
String lastProcess = "none";
long lastChange = System.currentTimeMillis();

while (true)
{
String currentTitle = getActiveWindowTitle();
String currentProcess = getActiveWindowProcess();
if (!lastTitle.equals(currentTitle))
{
long change = System.currentTimeMillis();
long time = (change - lastChange) / 1000;
lastChange = change;
System.out.println("Change! Last title: " + lastTitle + " lastProcess: " + lastProcess + " time: " + time + " seconds");
lastTitle = currentTitle;
lastProcess = currentProcess;
}
try
{
Thread.sleep(1000); // wait one second before checking again
}
catch (InterruptedException ex)
{
// ignore
}
}
}

private static String getActiveWindowTitle()
{
char[] buffer = new char[MAX_TITLE_LENGTH * 2];
HWND foregroundWindow = User32DLL.GetForegroundWindow();
User32DLL.GetWindowTextW(foregroundWindow, buffer, MAX_TITLE_LENGTH);
String title = Native.toString(buffer);
return title;
}

private static String getActiveWindowProcess()
{
char[] buffer = new char[MAX_TITLE_LENGTH * 2];
PointerByReference pointer = new PointerByReference();
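HWND foregroundWindow = User32DLL.GetForegroundWindow();
User32DLL.GetWindowThreadProcessId(foregroundWindow, pointer); // fills pointer with the process ID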
Pointer process = Kernel32.OpenProcess(Kernel32.PROCESS_QUERY_INFORMATION | Kernel32.PROCESS_VM_READ, false, pointer.getValue());
Psapi.GetModuleBaseNameW(process, null, buffer, MAX_TITLE_LENGTH);
String processName = Native.toString(buffer);
return processName;
}

static class Psapi
{
static
{
Native.register("psapi");
}

public static native int GetModuleBaseNameW(Pointer hProcess, Pointer hmodule, char[] lpBaseName, int size);
}

static class Kernel32
{
static
{
Native.register("kernel32");
}

public static int PROCESS_QUERY_INFORMATION = 0x0400;
public static int PROCESS_VM_READ = 0x0010;

public static native Pointer OpenProcess(int dwDesiredAccess, boolean bInheritHandle, Pointer pointer);
}

static class User32DLL
{
static
{
Native.register("user32");
}

public static native int GetWindowThreadProcessId(HWND hWnd, PointerByReference pref);
public static native HWND GetForegroundWindow();
public static native int GetWindowTextW(HWND hWnd, char[] lpString, int nMaxCount);
}
}
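
The change detection in the main loop does not depend on JNA at all, so it can be tried out separately. The sketch below pulls that logic into a small class, with the title and the current time passed in as parameters. `ForegroundTracker` and `update` are names invented for this example, and the process name is left out to keep it short.

```java
import java.util.Optional;

// Hypothetical extraction of the change detection from the main loop.
final class ForegroundTracker {

    private String lastTitle = "none";
    private long lastChange;

    ForegroundTracker(long startMillis) {
        this.lastChange = startMillis;
    }

    // Returns a report when the title changed, or an empty Optional otherwise.
    Optional<String> update(String currentTitle, long nowMillis) {
        if (lastTitle.equals(currentTitle)) {
            return Optional.empty();
        }
        long seconds = (nowMillis - lastChange) / 1000;
        String report = "Change! Last title: " + lastTitle + " time: " + seconds + " seconds";
        lastTitle = currentTitle;
        lastChange = nowMillis;
        return Optional.of(report);
    }

    public static void main(String[] args) {
        ForegroundTracker tracker = new ForegroundTracker(0);
        // prints: Change! Last title: none time: 2 seconds
        tracker.update("Editor", 2_000).ifPresent(System.out::println);
        // same title, prints nothing
        tracker.update("Editor", 3_000).ifPresent(System.out::println);
        // prints: Change! Last title: Editor time: 5 seconds
        tracker.update("Browser", 7_500).ifPresent(System.out::println);
    }
}
```

In the real program, the title would come from getActiveWindowTitle() and the time from System.currentTimeMillis().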