Saturday, May 24, 2014

Parsing a file with Stream API in Java 8

Streams are everywhere in Java 8. Just look around and you will surely find them. This also applies to java.io.BufferedReader: its lines() method returns a Stream<String> of the lines it reads. That makes parsing a file in Java 8 with the Stream API extremely easy.
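To see what that means in isolation, here is a minimal sketch that feeds the sample data through BufferedReader.lines(). It assumes an in-memory java.io.StringReader in place of a file, and the LinesDemo class and its readLines() method are hypothetical names used only for illustration:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

public class LinesDemo {

    // Returns every line of the given text, read through BufferedReader.lines().
    static List<String> readLines(String text) {
        // StringReader stands in for a file; any java.io.Reader would do.
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            // lines() is lazy: the source is only read when collect() runs.
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            // BufferedReader.close() declares IOException
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        readLines("username;visited\njdoe;10\nkolorobot;4")
                .forEach(System.out::println);
    }
}
```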

I have a CSV file that I want to read. Here is an example:

username;visited
jdoe;10
kolorobot;4

The contract for my reader is to provide the header as a list of strings and all records as a list of lists of strings. The reader accepts a java.io.Reader as the source to read from.

I will start by reading the header. The algorithm for reading the header is as follows:
  • Open a source for reading,
  • Get the first line,
  • Split the line by a separator,
  • Convert the result to a list of strings and return it.
And the implementation:

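import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

class CsvReader {

    private static final String SEPARATOR = ";";

    private final Reader source;

    CsvReader(Reader source) {
        this.source = source;
    }

    // Reads only the first line and splits it into column names.
    // Closing the BufferedReader also closes the underlying source.
    List<String> readHeader() {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.lines()
                    .findFirst()
                    .map(line -> Arrays.asList(line.split(SEPARATOR)))
                    .get(); // throws NoSuchElementException if the source is empty
        } catch (IOException e) {
            // close() may throw; rethrow unchecked to keep the signature clean
            throw new UncheckedIOException(e);
        }
    }
}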
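One detail worth knowing: findFirst() returns an Optional, and get() throws NoSuchElementException when the source is empty. Below is a hedged sketch of a more forgiving variant, assuming an empty list is an acceptable result for an empty file (HeaderDemo is a hypothetical name, and the method takes the Reader directly for brevity):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class HeaderDemo {

    private static final String SEPARATOR = ";";

    // Same idea as CsvReader.readHeader(), but an empty source yields an
    // empty list instead of a NoSuchElementException from Optional.get().
    static List<String> readHeader(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.lines()
                    .findFirst()                                   // Optional<String>
                    .map(line -> Arrays.asList(line.split(SEPARATOR)))
                    .orElse(Collections.emptyList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readHeader(new StringReader("username;visited\njdoe;10"))); // [username, visited]
        System.out.println(readHeader(new StringReader("")));                          // []
    }
}
```

Whether an empty header or an exception is the better answer for an empty file is a design choice; the original get() fails fast, which can be exactly what you want.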

Fairly simple. Self-explanatory. Similarly, I created a method to read all records. The algorithm for reading the records is as follows:
  • Open a source for reading,
  • Skip the first line,
  • Apply a mapper to each remaining line that splits it by the separator into a list of strings,
  • Collect the results into a list and return it.
And the implementation:

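import java.util.stream.Collectors; // additional import needed by readRecords()

class CsvReader {

    // SEPARATOR, source, the constructor and readHeader() as before

    List<List<String>> readRecords() {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.lines()
                    .skip(1) // drop the header line; substream(1) is not in the released Java 8 API
                    .map(line -> Arrays.asList(line.split(SEPARATOR)))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}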
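A minimal in-memory sketch of the records step, using skip(1) to drop the header. It also shows a pitfall of this design: closing the BufferedReader closes the underlying Reader, so a Reader that has been consumed once cannot be read again. The RecordsDemo class is a hypothetical name, and StringReader stands in for the CSV file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class RecordsDemo {

    private static final String SEPARATOR = ";";

    // Records: every line after the header, split into fields.
    static List<List<String>> readRecords(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.lines()
                    .skip(1) // drop the header line
                    .map(line -> Arrays.asList(line.split(SEPARATOR)))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "username;visited\njdoe;10\nkolorobot;4";

        // A fresh Reader per call works as expected.
        System.out.println(readRecords(new StringReader(csv))); // [[jdoe, 10], [kolorobot, 4]]

        // Reusing one Reader fails: the first call closed it on the way out.
        Reader shared = new StringReader(csv);
        readRecords(shared);
        try {
            readRecords(shared);
        } catch (UncheckedIOException e) {
            System.out.println("source already closed");
        }
    }
}
```

This is why each test below builds its own CsvReader: calling readHeader() and then readRecords() on the same instance would hit the already-closed source.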

Nothing fancy here. What you may notice is that the mapper in both methods is exactly the same. In fact, it can easily be extracted to a variable:

// shared by readHeader() and readRecords(); needs java.util.function.Function
Function<String, List<String>> mapper
    = line -> Arrays.asList(line.split(SEPARATOR));
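A sketch of the reader with the extracted mapper used by both methods, assuming it is kept as a private static final field named MAPPER (a hypothetical choice; MapperCsvReader is also just an illustrative name):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MapperCsvReader {

    private static final String SEPARATOR = ";";

    // The shared mapper: one CSV line -> list of its fields.
    private static final Function<String, List<String>> MAPPER =
            line -> Arrays.asList(line.split(SEPARATOR));

    private final Reader source;

    MapperCsvReader(Reader source) {
        this.source = source;
    }

    List<String> readHeader() {
        try (BufferedReader reader = new BufferedReader(source)) {
            // Optional.map() accepts the same Function as Stream.map()
            return reader.lines().findFirst().map(MAPPER).get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    List<List<String>> readRecords() {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.lines().skip(1).map(MAPPER).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "username;visited\njdoe;10\nkolorobot;4";
        System.out.println(new MapperCsvReader(new StringReader(csv)).readHeader());
        System.out.println(new MapperCsvReader(new StringReader(csv)).readRecords());
    }
}
```

Because Optional.map() and Stream.map() both take a Function, a single field can serve both methods without any adaptation.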

To finish up, I created a simple test.

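import static org.assertj.core.api.Assertions.assertThat; // assumes AssertJ for fluent assertions

import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

import org.junit.Test;

public class CsvReaderTest {

    @Test
    public void readsHeader() {
        CsvReader csvReader = createCsvReader();
        List<String> header = csvReader.readHeader();
        assertThat(header)
                .contains("username")
                .contains("visited")
                .hasSize(2);
    }

    @Test
    public void readsRecords() {
        CsvReader csvReader = createCsvReader();
        List<List<String>> records = csvReader.readRecords();
        assertThat(records)
                .contains(Arrays.asList("jdoe", "10"))
                .contains(Arrays.asList("kolorobot", "4"))
                .hasSize(2);
    }

    // Each test needs a fresh reader: reading closes the underlying source.
    private CsvReader createCsvReader() {
        try {
            Path path = Paths.get("src/test/resources", "sample.csv");
            Reader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
            return new CsvReader(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}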

3 comments:

  2. Thank you for sharing! I'm really glad that streams have been introduced into Java 8.

    I have just two small remarks:

    It seems there is a typo in the CsvReader constructor. It is a recursive constructor invocation and it will not compile. I assume there should be: this.source = source;

    In real life we probably wouldn't want to access/read the same file twice just to parse header and body, because of performance cost (even if only the first line is read in the first example). I know that these are only the examples, but still I had to write about it ;)

  3. Thanks for your comment.

    You are right. This is for illustration purposes only; it is not a real-life example. It only shows how simple it is to use Streams on BufferedReader.
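For completeness, a minimal sketch of the single-pass approach suggested in the comment above. It assumes the whole file fits in memory; the SinglePassCsv class, the CsvData holder and the readAll() method are hypothetical names, not part of the post:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class SinglePassCsv {

    private static final String SEPARATOR = ";";

    // Hypothetical holder for the header and the records parsed in one pass.
    static final class CsvData {
        final List<String> header;
        final List<List<String>> records;

        CsvData(List<String> header, List<List<String>> records) {
            this.header = header;
            this.records = records;
        }
    }

    // Reads the source exactly once; the first row becomes the header.
    static CsvData readAll(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            List<List<String>> rows = reader.lines()
                    .map(line -> Arrays.asList(line.split(SEPARATOR)))
                    .collect(Collectors.toList());
            if (rows.isEmpty()) {
                return new CsvData(Collections.emptyList(), Collections.emptyList());
            }
            // Copy the tail so the records are not a view over 'rows'.
            return new CsvData(rows.get(0), new ArrayList<>(rows.subList(1, rows.size())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CsvData data = readAll(new StringReader("username;visited\njdoe;10\nkolorobot;4"));
        System.out.println(data.header);   // [username, visited]
        System.out.println(data.records);  // [[jdoe, 10], [kolorobot, 4]]
    }
}
```

The trade-off: the file is opened and read once, but every row is held in memory before the header is split off, just as readRecords() already collects all records into a list.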
