Java XML Api

Who uses XML in 2021? We all use JSON these days, aren’t we? Well, it turns out XML is still being used. These code fragments could help get you up to speed when you’re new to the Java XML API.

Create an empty XML document

To start from scratch, you’ll need to create an empty document:


Document doc = DocumentBuilderFactory
                .newInstance()
                .newDocumentBuilder()
                .newDocument();

Create document from existing file

To load an XML file, use the following fragment.


File source = new File("/location/of.xml");
Document doc = DocumentBuilderFactory
                    .newInstance()
                    .newDocumentBuilder()
                    .parse(target);

Add a new element

Now that we have the document, it’s time to add some elements and attributes. Note that Document and Element are both Nodes.


final Element newNode = doc.createElement(xmlElement.getName());
newNode.setAttribute(attributeName, attributeValue);
	
node.appendChild(newNode);

Get nodes matching XPath expression

XPath is a powerful way to search your XML document. Every XPath expression can match multiple nodes, or it can match none. Here’s how to use it in Java:


final String xPathExpression = "//node";
final XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xpath.evaluate(xPathExpression, 
        doc, 
        XPathConstants.NODESET);

JSON to XML

JSON is simpler and much more in use these days. However, it has less features than XML. Namespaces and attributes are missing, for example. Usually these aren’t needed, but they can be useful. To convert JSON to XML, you could use the org.json:json dependency. This will create an XML structure similar to the input JSON.

<properties>
	<org-json.version>20201115</org-json.version>
</properties>

<dependencies>
	<dependency>
		<groupId>org.json</groupId>
		<artifactId>json</artifactId>
		<version>${org-json.version}</version>
	</dependency>
</dependencies>

JSONObject content = new JSONObject(json);
String xmlFragment = XML.toString(content);

Writing XML

When we’re done manipulating the DOM, it’s time to write the XML to file or to a String. The following fragment does the trick


private void writeToConsole(final Document doc) 
    throws TransformerException{
	final StringWriter writer = new StringWriter();
	writeToTarget(doc, new StreamResult(writer));
	System.out.println(writer.toString());
}

private void writeToFile(final Document doc, File target) 
    throws TransformerException, IOException{
	try (final FileWriter fileWriter = new FileWriter(target)) {
		final StreamResult streamResult = new StreamResult(fileWriter);
		writeToTarget(doc, streamResult);
	}
}

private void writeToTarget(final Document doc, final StreamResult target) 
    throws TransformerException {
	final Transformer xformer = TransformerFactory.newInstance()
              .newTransformer();
	xformer.transform(new DOMSource(doc), target);
}

Can you solve this puzzle?

Every once in a while, I come across this little puzzle. I have no idea who made it, or what the intended answer is.

Usually there are a lot of people giving the same answer. And although that answer is reasonable, I don’t think it’s correct. When we modify the calculation to (x * y) + x, we’ll get the answers matching the question. But we’ll need to modify the calculation, and I don’t think that is needed.

Another solution would be to take the previous answer, and add the current sum.


There’s an old joke among nerds that goes like this:

There’s a hint in that joke that points to a different answer to the puzzle. You can write numbers in different ways. In computer science we use a couple of different number systems, like binary, or Base 2, which deals with the digits 0 and 1. That’s not the only system we use. There’s also octal (Base 8, digits 0 to 7) and hexadecimal (Base 16, digits 0 to 9 and A to F). And of course we still use decimal.

That joke about 10 kind of people, we can do better:


If we now go back to the puzzle, taking into account these number systems, we can arrive at a solution that doesn’t need to modify the calculation. We only need to modify the representation of the answer.

And there we have it, the answer is 13! Which is written in Base 3 as 111. There’s no need to modify the calculation. All we needed to do is change the representation of the answer.

Leveraging Lucene

Imagine a catalog of a few hundred thousand items. These items have been labeled into a few hundred categories. Each item can be linked to up to three categories. New categories need to be added in order to make things easier to find. However, categories without content are useless. So some content needs to be linked to the new categories. Luckily both the items and the categories have a description, and that makes things easier.

The idea is simple:

  • Put all items into a searchable index.
  • For each category, find out what the most important words are.
  • Create a search term using these most important words, by just sticking them together.
  • Search the index of items for best matches.

And we’re done, sort of. It’s a bit more complicated than that, but that’s mostly because of finetuning.

Building the searchable index

This is where Apache Lucene comes in. Lucene is an open source full text indexing and search library, supported by the Apache Software Foundation.
First released in 1999, it is still in active development.

To create an index, you need to create an IndexWriter, and use it to add Documents to the index.

public IndexWriter createIndexWriter() throws IOException {
        Directory indexDir = FSDirectory.open(Paths.get(INDEX_DIR));
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig icw = new IndexWriterConfig(analyzer);
        icw.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

        return new IndexWriter(indexDir, icw);
    }

Only one writer is needed for adding items to the index. Note that IndexWriterConfig.OpenMode.CREATE will create a new index.
If there is anything already in the index, it will be removed. IndexWriterConfig.OpenMode.CREATE_OR_APPEND could be used if you want to add to an existing index.

Next up is actually adding things to the index. Each item in the index is called a Document. Documents have Fields that can be used for searching.

Creating and adding a document can be done like this:

try {
	Document document = new Document();
	document.add(new StringField("url", "http://ghyze.nl", Field.Store.YES));
	document.add(new TextField("title", "My awesome blog",  Field.Store.YES));
	document.add(new TextField("description", "Blogging about things",  Field.Store.YES));

	writer.addDocument(document);
} catch (IOException e){
	e.printStackTrace();
}

When we’re done with adding the documents, we need to close the IndexWriter:

writer.close();

Find important words

Let’s first define what an “important word” is. Important words are words that are the most relevant for each document in the collection.
There is a nice algorithm to determine the relevance of each word: tf-idf (short for term frequency–inverse document frequency).

The premise of this algorithm is that a word is more relevant for a document the more it appears in that document, and less relevant for a single document when it appears in more documents.

  • For each document, we count how many times each word appears and we divide that by the total number of words in this document. This is the term-frequency part.
  • For each word in every document, take the total number of documents and divide it by the number of documents containing this word. We don’t want this number to be too large, so we take the log of this. This is the inverse document frequency part.
  • Multiply these numbers to get the relative importance of each word for every document. A Higher score means the word is more important.

The code for this algorithm consists of two classes and a value object

De first class represents a single document in the collection. It is responsible for calculating the importance of the words it contains in relation to the words of all other documents.

import lombok.Getter;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Document {

    /**
     * The identifier for this document
     */
    @Getter
    private final String id;

    /**
     * The complete text for this document
     */
    @Getter
    private final Collection<String> lines;

    /**
     * Every word in this document, with the number of times it appears
     */
    private Map<String, Integer> wordsInDocument;

    /**
     * Constructor
     */
    public Document(String id, Collection<String> lines){
        this.id = id;
        this.lines = lines;
    }

    /**
     * Get the map with the unique words in this document, and the number of times they appear
     */
    public Map<String, Integer> getWordsInDocument(){
        if (wordsInDocument == null){
            calculateWordMap();
        }
        return wordsInDocument;
    }

    /**
     * Calculate the number of times each unique word appears in this document
     */
    private void calculateWordMap(){
        wordsInDocument = new HashMap<>();
        for (String line : lines){
            String[] words = line.split("\\s");
            for (String word : words){
                if (word.trim().length() > 1) {
                    Integer wordCount = wordsInDocument.getOrDefault(word, Integer.valueOf(0));
                    wordsInDocument.put(word, wordCount+1);
                }
            }
        }
    }

    /**
     * Calculate the importance of each word, compared to other words in this document
     * and all other documents in the index
     * @param index The collection of documents that also contains this document.
     * @return An ordered list indicating the importance of each word in this document.
     */
    public List<WordImportance> calculateWordImportance(WordIndex index){
        List<WordImportance> wordImportance = new ArrayList<>();
        double totalWordsInDocument = getWordsInDocument().values().stream().mapToInt(Integer::intValue).sum();
        double totalNumberOfDocuments = index.getNumberOfDocuments();
        for (String word : getWordsInDocument().keySet()){
            double tf = ((double) getWordsInDocument().get(word)) / totalWordsInDocument;
            double idf = Math.log(totalNumberOfDocuments / ((double) index.getNumberOfDocumentsContaining(word)));
            wordImportance.add(new WordImportance(word, tf*idf));
        }

        // most important word first
        wordImportance.sort(Comparator.comparing(WordImportance::getImportance).reversed());

        return wordImportance;
    }
}

The next class represents the collection of all documents. It is responsible for calculating for each word the number of documents that contain it.

import lombok.Getter;

import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WordIndex {

    /**
     * The index, key is the identifier of the document
     */
    @Getter
    private Map<String, Document> index = new HashMap<>();

    /**
     * A map with all the words in all the documents, and the number of documents containing those words
     */
    private Map<String, Integer> documentCountForWords = null;

    /**
     * Constructor
     */
    public WordIndex(){
    }

    /**
     * Add a document to the index, overwriting when it already exists.
     * @param document the document to add
     */
    public void addDocument(Document document) {
        if (index.containsKey(document.getId())) {
            System.out.println("Overwriting document with ID "+ document.getId());
        }

        index.put(document.getId(), document);
    }

    /**
     * Get all words in all documents. If a word appears multiple times, it is only returned once.
     * @return All words in this document
     */
    public Collection<String> getAllWords(){
        Set<String> allWords = new HashSet<>();

        index.values()
                .forEach(e -> allWords.addAll(e.getWordsInDocument().keySet()));

        return allWords;
    }

    /**
     * Get the map with number of documents per word
     * @return the map with number of documents per word
     */
    public Map<String, Integer> getDocumentCountForWords(){
        if (documentCountForWords == null){
            calculateDocumentCountForAllWords();
        }
        return documentCountForWords;
    }

    /**
     * Iterate over every word in every document, and count the number of documents that word appears in.
     */
    private void calculateDocumentCountForAllWords(){
        Collection<String> allWords = getAllWords();
        documentCountForWords = new HashMap<>();
        for (String word : allWords){
            for (String documentId : index.keySet()){
                Map<String, Integer> document = index.get(documentId).getWordsInDocument();
                if (document.keySet().stream().anyMatch(e -> e.equals(word))){
                    Integer count = documentCountForWords.getOrDefault(word, 0);
                    documentCountForWords.put(word, count+1);
                }
            }
        }
    }

    /**
     * Get the total number of documents
     * @return
     */
    public int getNumberOfDocuments(){
        return index.size();
    }

    /**
     * Get the number of documents this word appears in.
     * @param word The word we're interested in
     * @return The number of documents containing this word
     */
    public int getNumberOfDocumentsContaining(String word){
        Map<String, Integer> wordCount = getDocumentCountForWords();
        return wordCount.getOrDefault(word, 0);
    }
}

Then we have a value object, that is used by the document to indicate the relative importance of each word.

import lombok.AllArgsConstructor;
import lombok.Value;

@Value
@AllArgsConstructor
public class WordImportance {
    String word;
    double importance;
}

The easy steps

Now it’s time to search for content for the new categories. To do this, we take most important words of the new categories and string them together, separated by spaces. Then we use that search term to search the index, and extract the result.

This is what the code for performing the search would look like:

/** 
 * Prepare the search engine
 */
public void initializeSearch(){
	try {
		indexReader = DirectoryReader.open(FSDirectory.open(Paths.get(BuildStudyIndex.INDEX_DIR)));
		searcher = new IndexSearcher(indexReader);
		analyzer = new StandardAnalyzer();

		queryParser = new MultiFieldQueryParser(new String[]{"title", "shortDescription", "longDescription"}, analyzer);
	} catch (IOException e){
		e.printStackTrace();
	}
}

/**
 * Perform search
 */
public TopDocs search(Collection<String> words) throws IOException, ParseException{
	String searchTerm = String.join(" ", words);
	Query query = queryParser.parse(searchTerm);
	TopDocs results = searcher.search(query, 200000);
	return results;
}

The code to extract the search results would be something like this:

TopDocs result = search(searchTerms);
for (ScoreDoc hit : result.scoreDocs){
	Document found = searcher.doc(hit.doc);
	double score = hit.score;
}

Conclusion

There are some steps that we needed to do that I haven’t mentioned. But those are mainly plumbing and finetuning.

We have seen how to use Apache Lucene as a custom search engine. First we’ve built a searchable index, and later we have searched that index for relevant items.
We have also seen how to implement an algorithm that determines the most relevant words in a specific document, compared to the other documents in a collection.

The reason this works is that words have meaning. I know, stating the obvious. Each word gives meaning to the text, and this meaning has varying degrees of relevance to that text. The words that are most relevant to the text distinguish the meaning of the text from the other texts in the collection. We don’t need to know the actual meaning of the text, we just need to separate it from all other texts. Then, through the magic of search engines, we can match the texts that have the most similar meanings.

Be agile, don’t do “Agile”

Over the last decade I’ve been involved in several projects that are doing Agile. Most of them were using Scrum, and most of them didn’t really deliver what was promised. The promise of Scrum is that the team delivers more value, but what it actually delivers is more reporting tools and ceremonies to keep managers busy. Functionality is also delivered, but I’ve found that the process is more of a burden than it helps.
Our team at Studyportals uses something inspired on GrowthBan, which is a variation on Kanban aimed at teams that are concerned with growth. I’m not really a fan of naming things, because that leads to cargo-culting. “Hey, look! That team uses GrowthBan, and it works for them! Let’s use that too!”. And then you implement the rituals and stuff, and you find out that it may not be working for you at all.

Why does this work for us?

Team setup

First of all, we have a multi-disciplinary team with access to anybody in our organisation that might be able to help us. Our team consists of the following roles:

  • A Product Owner who decides what we should work on
  • A Marketeer who handles the marketing side of our product, like partnerships
  • A Growth Hacker who performs experiments to increase the KPIs of our product
  • A Team Lead who makes sure that the team performs optimally
  • Two frontend engineers, one of them also has another role.
  • A backend engineer
  • Since we’re not doing scrum, we don’t have a scrummaster. We invented the role of Mastermaster instead.

Since our team is not a pure engineering team, we have more knowledge about what can be done and why we do it. This helps us to deliver the right things faster. Each of the team members has his own area of expertise, and is trusted to do the best he can.

Autonomy

What I’m used to in Scrum is that the product owner, toghether with business analysts and architects decide what needs to be done. They create an epic, divide it up in stories, and describe what they would like to see as outcome. In this way, Scrum is a bit like mini-waterfall. Engineers only have a say in how these stories are built, not in what we should work on.
Since our team setup is a bit different, we have actual experts in the domains that we are working on. Everybody can, and should, create stories for work that they think is important. In the end it’s the product owner that decides on the priority, but he can be influenced.

Rhytm

While on projects that use Scrum, I was used to a rhytm om two to four weeks. We now have a rhytm of one week: we have a week planning on monday morning, and a short retrospective on thursday. Fridays are always a strange day here, since we have hack-days. During our week planning session on monday, we look back on the previous week, and determine the focus for the upcomming week. What we don’t do is commit to the work that must be finished at the end of the week. If we’re finished early, we start on new stories. If stories are not finished (because, unfortunate accidents happen), we continue next week. Every day we have a normal standup meeting, where we look at the tasks at hand. I did this before, and I prefer this over the format where you tell about what you did, what you’re going to do and if you need help. At the end of the standup meeting, we take a short look at our KPIs to see if we’re still on track.

Looking back

Retrospective is the time where you look back on the past sprint. Most of the time this session is spent on trying to make the process better and more effective. We also do this.
However, we also look at the effect of our work. Did we improve our google ranking? Did we increase the speed of our website? Did google pick up our content changes? Did we make a difference? Success is celebrated.
We also make a big deal about kicking off and finishing stories. Each story that we start to work on is kicked off, and the team is informed about that. Each story that is finished is burned, and celebrated.

Agile

In the end, it looks like we’re finally doing what the agile manifesto told us to do. Working together is more important than our processes (though we have them), we deliver working software (though we also document), we track what our users are doing to see if they like what we did, and when we need to change course, we do. We still have plans and things that we would like to work on.

Spring Boot, MongoDB and raw JSON

Sometimes you want to store and retrieve raw JSON in MongoDB. With Spring Boot storing the JSON isn’t very hard, but retrieving can be a bit more challenging.

Setting up

To start using MongoDB from Spring Boot, you add the dependency to spring-boot-starter-data-mongodb

	<dependency>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-data-mongodb</artifactId>
	</dependency>

And then you inject MongoTemplate into your class

@Autowired
private MongoTemplate mongoTemplate;

Inserting into MongoDB

Inserting JSON is just a matter of converting the JSON into a Document, and inserting that document into the right collection

String json = getJson();
Document doc = Document.parse(json);
mongoTemplate.insert(doc, "CollectionName");

Retrieving JSON

Retrieving JSON is a bit more complicated. First you need to get a cursor for the collection. This allows you to iterate over all the documents within that collection. Then you’ll retrieve each document from the collection, and cast it to a BasicDBObject. Once you have that, you can retrieve the raw JSON.

DBCursor cursor = mongoTemplate.getCollection("CollectionName").find();
Iterator iterator = cursor.iterator();
while (iterator.hasNext()){
   BasicDBObject next = (BasicDBObject) iterator.next();
   String json = next.toJson();
   // do stuff with json
}

Transforming raw JSON to Object

With Jackson you can transform the retrieved JSON to an object. However, your object might miss a few fields, since MongoDB adds some to keep track of the stored documents. To get around this problem, you need to configure the ObjectMapper to ignore those extra fields.

ObjectMapper mapper = new ObjectMapper().configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
MyObject object = mapper.readValue(json, MyObject.class);

Lessons learned

Pressure makes diamonds, as the saying goes. I worked on a high-pressure project for a couple of weeks (as in, it needed to be done before we even started), and these are some of the lessons we learned as a team. The lessons are mostly tips and tricks, as we learned a lot on the job.

General lessons learned

Way of working

Bring (at least) two developers to the project. One will focus on the algorithm, the other will focus on the code quality and support as much as possible. Notice the choice of words: “focus”. This means that all developers do all the things, but their main task is different.
Don’t underestimate the impact of code quality. Code should be as clear as possible, so that it doesn’t get in the way of solving the business problem. When you’re constantly thinking about what the code does, you’re not thinking about how to solve the business problem. On that note, the first versions were set up as procedural. Refactor to object oriented. OO has advantages over procedural, and it would be a waste to not have access to those advantages. This refactoring was well worth the effort, as we had our codebase audited. No major flaws were encountered during the audit.

Version control

Get a version control tool in place, and choose the one that is easiest to use. You can share code by emailing .zip files, but that’s too cumbersome. Besides, errors get made. Use git, ask around how to do that, and ignore project managers who tell you not to do this. Even a paid github repository is better than nothing.

maven

Manually include dependencies

It is possible to add dependencies to the build, without the need for those dependencies to be available in a repository. You’ll include them from a /lib folder or something like that:

        <dependency>
            <groupId>group.id</groupId>
            <artifactId>artifact</artifactId>
            <version>1.0</version>
            <scope>system</scope>
            <systemPath>${project.basedir}/src/test/resources/to-include.jar</systemPath>
        </dependency>

Create complete jar

To build the resulting jar with dependencies, use the following command:

mvn assembly:assembly -DdescriptorId=jar-with-dependencies

Version tracking

Resource filtering, to update variables in your resources with maven properties. But only variables in certain files, all other files should not be filtered because that might corrupt them:

   <build>
        <resources>
            <resource>
                <directory>src/main/resources</directory>
                <filtering>false</filtering>
            </resource>
            <resource>
                <directory>src/main/resources</directory>
                <filtering>true</filtering>
                <includes>
                    <include>project-version.properties</include>
                </includes>
            </resource>
        </resources>
    </build>

Contents of project-version.properties:

version = ${build.version}

where ${build.version} is a property in the pom file, along with the format for this timestamp:

<properties>
   <maven.build.timestamp.format>yyyyMMdd-HHmm</maven.build.timestamp.format>
   <build.version>${maven.build.timestamp}</build.version>
</properties>

Download sources

To download all sources from the dependencies (when available), type

 mvn dependency:sources

This will allow you to inspect the actual source code when you’re in a debugging session.

Skip tests

There are two ways of skipping unit tests:

mvn -DskipTests <task>

Only skips _executing_ the tests. The unit tests will still be compiled

mvn -Dmaven.test.skip=true

Does not compile the tests, and therefore the tests are not executed.

One piece of software

For testing purposes, we made our program so it ran locally. The same program could run, without modifications, on the server. We used hard-coded paths and keys for the server version, with fallbacks for the local standalone version. This allowed us to focus on the algorithms, and find/fix environments issues quite fast.

Patching jars

We had to patch the Mendelson jars a few times, before we decided to create a maven build script for the source code.

javac -classpath <jar to be patched>;<jars containing non-local classes used by the class to be compiled> path\to\modified\file.java

Then open the jar with a zip-tool (7zip, for example), and replace the old class with the newly compiled version.

Logging

Add as much logging as useful. This is probably more than you think. In our case, logging wasn’t showing up. So we wrote a LoggingFacade which wrote its output to the default logging framework, AND to System.out or System.err if needed.

Debugging

Debugging will provide more information than logging, but is not always possible.
Make one version that run standalone, so you can attach a debugger while developing.
Make sure you can remotely debug the server. Start the server with debug enabled, with the following command-line parameter:

-agentlib:jdwp=transport=dt_socket,address=localhost:4000,server=y,suspend=y

This starts the program in debug mode, listening to debuggers on TCP port 4000. You can choose any port that is convenient for you.

You might need to open an SSH tunnel to your server, listening locally to port 4000, and forwarding it to localhost:4000. Notice that localhost is the localhost of the server, not the localhost from which you make the connection to the server.

Then configure your IDE to connect to a remote application.

Spring-Boot

One of the avenues we’ve explored was to build a standalone program to intercept and process the messages in a more controllable way. Spring-Boot was introduced for this, but not continued. It is worth exploring these kinds of avenues when you’re stuck, because they might give some insight in how to continue.
Spring-Boot offers quite a lot of extras that we can use for our project, such as a standalone server (run with mvn spring-boot:run). Any application can still be run from within the IDE, because the applications still have a main() function.

Links:
About the producing service: https://spring.io/guides/gs/producing-web-service/
About the consuming service: https://spring.io/guides/gs/consuming-web-service/
Switching from application: https://stackoverflow.com/questions/23217002/how-do-i-tell-spring-boot-which-main-class-to-use-for-the-executable-jar

To test the producing service, use Postman (https://www.getpostman.com/apps)
The service can be reached with a POST request on http://localhost:8080
Headers: content-type: text/xml
Body type: raw
Body contents can be found on the producing link, file is called “request.xml”

Project specific

Decrypting XML

The XML might have been encrypted with a cipher that isn’t available to you. Find the correct cipher in the following section:

	<xenc:EncryptionMethod Algorithm="http://www.w3.org/2009/xmlenc11#rsa-oaep">
		<ds:DigestMethod xmlns:ds="http://www.w3.org/2000/09/xmldsig#" Algorithm="http://www.w3.org/2001/04/xmlenc#sha256"/>
		<xenc11:MGF xmlns:xenc11="http://www.w3.org/2009/xmlenc11#" Algorithm="http://www.w3.org/2009/xmlenc11#mgf1sha256"/>
	</xenc:EncryptionMethod>

Take special note of the Digest Method and the Mask Generation Function, as these might not be available to you. You need to use a third party library that implements the exact cipher that is used. In our case that is Apache Santuario.

Initializing Santuario

Santuario must be initialized before it’s used. However, before initializing the main cryptography engine, the Internationalization framework needs to be initialized. Normally this is initialized with the locale en-US, but only the en (without the _US_ part) properties file is available. This should not be a problem, since this properties file is part of a fallback mechanism. However, in our case, this fallback mechanism doesn’t work.
First initialize Santuario with an empty resource bundle, then initialize the cryptography engine.

Binary data

In one instance of our project, the binary file had a repeating sequence EF BF BD. This is caused by creating a String from the binary data, and requesting the bytes from that String. Strings and binary aren’t the best of friends, keep them separated!

JVMCon

On 30 January 2018, the first edition of JVMCon was organised. It was a small conference, only a couple of hundred attendees, but it was sold-out anyway. I attended five sessions, and I will list them in order of least to most awesome.

The first session I attended was by Alexander Yadayadayada (his joke, not mine) about gamification to increase code quality. Most of the ideas came down to “make it a contest!”. And most of the ideas that he tried so far weren’t very succesful. When you make a set of rules for engineers, they will always find ways to game the system and get ahead without any effort. For example, one point per unittest: create empty unittests. Give someone a taco on slack for doing something awesome: “Hey, I don’t have any tacos, could anyone throw me some?” But the biggest problem seems to be that people usually decide to stop playing, because the game isn’t balanced properly.

Hadi Hariri’s session was named Kotlin 102. It didn’t cover the basics of Kotlin, but a bit more advanced stuff. Hadi had a live-demo presentation, which is always impressive. However, he was talking to a tough crowd. Perhaps the reason was because his session was after lunch. Or maybe Kotlin isn’t known well enough yet to get into the more advanced stuff.

The third session in this list is that of Angelo van der Sijpt: What you didn’t know you wanted to know about the JVM. This one started to tickle my nerdy-senses. He spoke of Java, bytecode, C and even assembler, right down to the individual instructions that will be executed by the CPU. Then there were bits about how the memory is really used. There was a quizz: is this word a CPU instruction or not? Hint, IDCLIP is not. It was an interesting talk, but a bit too advanced for me.

Then there is Venkat Subramaniam’s talk about Java 9 Modularization. Modules are here, and they are here to stay. However, I don’t think the world is ready for it just yet. And with the new Java release cycle (And totally messed-up versioning system. WTF Oracle, really?), I don’t think there will be many production systems that will run Java 9. Anyway, Venkat started his talk with the remark “If there are any questions, please interrupt”. Then he started to spew information faster than the audience could process it. So, when you wanted to ask a question, he was already three topics ahead. He also had a live demo, which didn’t always go as planned. But then again, he disguised a typo with a joke: “If you do this, things go so wrong that you don’t even get an error. You get an ERRRO!” If you want to go to an information rollercoaster, see Venkat live.

Take the sum of the awesomeness of all the previous talks, and then multiply it with the sum of their nerdyness, and you’re still not even close to the last talk. This one was in a league of it’s own: Don’t hack the platform? by Jan Ouwens. If you need inspiration for messing with your colleagues, this is the one for you. From Unicode-hacks to overwriting JVM constants to changing complete method implementations. …On a running system. …remotely. This guy had some evil, evil hacks.

Class size

Imagine that you need to maintain two applications. Both are about 20.000 lines of code. Now imagine that one has about 10 classes, and the other has 200 classes. Which one would you rather work with?

I’ve had discussions about whether you should favor many classes over fewer. When you only take into account the amount of functionality delivered through those classes, it doesn’t matter. The same functionality will be in the codebase, whether there are few classes or many. Since creating a new class takes effort (not much, but still), it’s easier to have a few big ones. One could say that having a few big classes would contain related functionality in the same files.

The amount of functionality in the system isn’t the only metric. The ease of adding functionality and solving defects, and unit-testing are examples of other metrics that should be taken into account.

Big classes usually have lots of private methods. So, how are you going to write unit-tests for them? Are you going to use reflection to make those methods accessible? Are you going to write extensive setup code to reach those methods? Or are you going to extract classes containing those methods, and make them publicly accessible?

How are you going to change the functionality? How are you going to fix defects? Big classes are big, and usually it’s hard to keep track of what’s going on. Because of this, you’re spending more time figuring out what the code is doing, and what it actually should do. The clearer the intention of your code, the less time you need to spend on getting to know what it’s doing.

Personally, I prefer lots of small classes. But how do we get there? When you’re presented with a legacy project, it requires a lot of refactoring. But beware, don’t just go out and refactor. If there are no issues, and the required functionality doesn’t change, that part of the codebase is just fine. On the other hand, when you start a new project, it’s a bit easier.

One of the first thing I’d recommend is to read up on the SOLID principles. SOLID stands for Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation and Dependency Inversion. Knowing and applying these principles will help you create a well-factored system. You probably won’t be able to apply these principles all of the time, but it definitely helps to know about them.

Put some tests in place, and make sure these tests are of the highest quality. The more and better tests you have, the more secure your refactorings will be. As an added bonus, you gain knowledge of and insight in the system you’re working on. As you progress with fixing defects and implementing new functionality, the amount of code under test will increase, and the faster you can develop and refactor.

Practice Test Driven Development. Write a test, make it pass, and refactor to optimise readability. Make sure you do the last step, TDD won’t work otherwise. TDD will help you create a clear system with very high test-coverage. And that coverage will be high quality.

Use as few if-statements and switch/cases as possible.Using as few conditionals as possible makes the codebase more usable, because it forces you to use more object oriented design. You could use an inheritance structure, or a table-/map-based approach. There may be other patterns, if you’re creative enough to discover them.

Javadoc

In our current project there is a guideline that all of the public API must have Javadoc. I disagree with this guideline. Our project does not have a public API, since we’re the only team that works on this project and no other team is allowed to use the codebase. Public classes and methods are not the only criterium for a public API.

So I asked, how many times is the Javadoc actually read, versus looking at the actual code to see what the method does? Personally, I never look at the Javadoc, and I suspect I’m not alone in this practice.

So why don’t I read Javadoc in our project? I don’t trust it. We’re using IntelliJ Ultimate, and this IDE also finds errors in Javadoc. And there are lots and lots of errors. So I don’t trust it.

When you’re working on the code, you’re not updating the accompanying Javadoc. The reason for this is the same as why you don’t write unittests – unless you’re using TDD: you’re solving a problem, and you need your attention there.

Javadoc is useful when you’re writing a library, and actually want third parties to use it. They need this form of documentation.

As for our project, there are other ways of documenting the code. You can (and must, in my opinion) factor the code to be as clear as possible. Good naming practices are a good step towards self-documenting code.

Another way of documenting, is writing clear unittests. You don’t need to write them first, though that is often the easiest way to write them. As long as you run them, you have a form of documentation that fails if incorrect.

For the same reason, you can use the assert keyword. To enable this, you need a runtime JVM argument, by default it is disabled. This means assert statements are not executed in production, but neither is Javadoc. However, during development, asserts can provide a treasure of information and verify the correctness of your program. The assert keyword IS documentation.

A nice Stackoverflow thread about assert can be found here: https://stackoverflow.com/questions/1208404/do-you-use-assertions

No documentation is bad, wrong documentation is worse. Javadoc tends to turn wrong when the codebase changes, and should therefore be avoided if possible.

Internet of Things

On 19 April 2017 about 1300 tech enthusiasts gathered in Utrecht for the IoT tech day. During this day I noticed two themes: on the one hand there were project demos, on the other hand there were talks about why you should use certain products and services to connect your projects. This showed me the state of IoT at this moment: it’s still early on the technology curve. This means that the tools and frameworks need to be developed further, and, more importantly, that standards still need to emerge. This in turn will lead to new ways of thinking about computing.

Part of the problem is that the term Internet of Things (IoT) is not very well defined yet. Is virtual/augmented/mixed reality part of IoT? How about cloud computing? Artificial Intelligence or Machine Learning? Mobile apps?

In my view, IoT consists of three parts: data collection using sensors, data processing using a cloud infrastructure, and presentation. In this regard, IoT is not different from any computer program. The difference is in the way the data is collected and processed, the infrastructure of IoT. In traditional computing, the data is sourced from a few sources, processed, and presented to the user. In IoT data is generated from thousands, perhaps millions of sensors. Due to the volume of data that is potentially generated, processing speed and storage becomes a challenge.

This is where the infrastructure comes in. Instead of one monolithic program, running on a giant machine, we build and connect any number of (micro-) services, running on multiple servers, located all around the world. These services aren’t written in a single language: we can choose whatever seems best for that particular step.

This is a game-changer. The role of a software engineer is not just to translate functional requirements to source-code. Maintaining changeability is also a key objective. We do this by abstracting complexity away, and organizing the code so its intention is as clear as possible. Methods should do only one thing, and they should do it well. Classes should have one responsibility, one reason to change. Now we can add services to this list, although I’m not sure what role these services will play in the abstraction and organization of code.

For now, the Internet of Things is the domain of tech-enthusiasts and big companies cashing in on infrastructure. However, this is the next revolution in computing. The personal computer, the Internet, and mobile each supplied their own challenges and solutions. Each revolution built on the previous one. IoT is the next revolution moving computing forward.