Read and parse a CSV containing key-value pairs using Akka Streams

Let's say we want to read and parse a CSV file containing key-value pairs.
We will use Alpakka's CsvParsing stage for this.

A snippet of the file (src/main/resources/CountryNicCurrencyKeyValueMap.csv), which maps country NIC codes to currency codes using a pipe (|) as the field delimiter:
AD|EUR
AE|AED
AF|AFN
AG|XCD
AI|XCD
AL|ALL
AM|AMD
AN|ANG
AO|AOA
AQ|AQD
AR|ARS
AS|EUR
AT|EUR
AU|AUD
AW|ANG
AX|EUR
AZ|AZN
BA|BAM
BB|BBD
BD|BDT
BE|EUR
BF|XOF
BG|BGN
BH|BHD
BI|BIF
BJ|XOF
BL|EUR
BM|BMD
BN|BND
BO|BOB
BR|BRL
BS|BSD
BT|INR
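Before looking at the stream version, here is a plain-Scala sketch (no Akka, names are illustrative) of the transformation we want: split each pipe-delimited line into a (key, value) pair and fold the pairs into a Map.

```scala
// Each line "AD|EUR" becomes the pair ("AD", "EUR"), and the pairs
// are accumulated into a single Map, exactly as the stream below does.
val lines = List("AD|EUR", "AE|AED", "AF|AFN")

val currencyByNic: Map[String, String] =
  lines
    .map(_.split('|'))               // Array("AD", "EUR"), ...
    .map(cols => cols(0) -> cols(1)) // ("AD", "EUR"), ...
    .foldLeft(Map.empty[String, String])(_ + _)

println(currencyByNic("AD")) // EUR
```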

Here is the code:
import java.io.File
import java.nio.charset.StandardCharsets

import akka.actor.ActorSystem
import akka.stream._
import akka.stream.alpakka.csv.scaladsl.CsvParsing
import akka.stream.scaladsl.{FileIO, Flow, Sink}
import akka.util.ByteString

import scala.collection.immutable
import scala.concurrent.{ExecutionContext, _}
import scala.concurrent.duration._

implicit val system: ActorSystem = ActorSystem("TestApplication")
implicit val materializer: ActorMaterializer = ActorMaterializer()
implicit val ec: ExecutionContext = system.dispatcher

// The resource sits at the classpath root, hence the leading slash
val path = getClass.getResource("/CountryNicCurrencyKeyValueMap.csv").getPath
val file = new File(path)

val f: Future[Map[String, String]] = FileIO.fromPath(file.toPath)
  // lineScanner takes Byte parameters: delimiter, quote char, escape char, max line length
  .via(CsvParsing.lineScanner('|'.toByte, '"'.toByte, '\\'.toByte, 256))
  .via(
    Flow[immutable.Seq[ByteString]]
      .map { fields =>
        // first column is the key, second column is the value
        fields.head.decodeString(StandardCharsets.UTF_8) -> fields(1).decodeString(StandardCharsets.UTF_8)
      }
  )
  .runWith(
    // accumulate all (key, value) pairs into a single Map
    Sink.fold[Map[String, String], (String, String)](Map.empty[String, String])(_ + _)
  )

val printed = f.map(println)

Await.result(printed, 10.seconds)
system.terminate() // shut down the actor system so the JVM can exit
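The map stage above calls decodeString(StandardCharsets.UTF_8) on each ByteString cell. Outside Akka, the same decode is just the JDK byte-to-String conversion; a stdlib-only sketch for illustration:

```scala
import java.nio.charset.StandardCharsets

// Round-trip: encode a cell value to UTF-8 bytes, then decode it back,
// mirroring what ByteString.decodeString does per CSV field.
val cell: Array[Byte] = "EUR".getBytes(StandardCharsets.UTF_8)
val decoded = new String(cell, StandardCharsets.UTF_8)
println(decoded) // EUR
```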

If your CSV file is embedded inside an uber JAR, the JVM cannot open it via a plain file path (getResource returns a jar: URL that java.io.File cannot handle). Reading it as an InputStream with StreamConverters.fromInputStream works in both cases:
import java.nio.charset.StandardCharsets

import akka.actor.ActorSystem
import akka.stream._
import akka.stream.alpakka.csv.scaladsl.CsvParsing
import akka.stream.scaladsl.{Flow, Sink, StreamConverters}
import akka.util.ByteString

import scala.collection.immutable
import scala.concurrent.duration._
import scala.concurrent.{ExecutionContext, _}

implicit val system: ActorSystem = ActorSystem("TestApplication")
implicit val materializer: ActorMaterializer = ActorMaterializer()
implicit val ec: ExecutionContext = system.dispatcher

val f: Future[Map[String, String]] = StreamConverters
  // works both for plain files and for resources packaged inside a JAR
  .fromInputStream(() => getClass.getResourceAsStream("/CountryNicCurrencyKeyValueMap.csv"))
  .via(CsvParsing.lineScanner('|'.toByte, '"'.toByte, '\\'.toByte, 256))
  .via(
    Flow[immutable.Seq[ByteString]]
      .map { fields =>
        // first column is the key, second column is the value
        fields.head.decodeString(StandardCharsets.UTF_8) -> fields(1).decodeString(StandardCharsets.UTF_8)
      }
  )
  .runWith(
    // accumulate all (key, value) pairs into a single Map
    Sink.fold[Map[String, String], (String, String)](Map.empty[String, String])(_ + _)
  )

val printed = f.map(println)

Await.result(printed, 10.seconds)
system.terminate() // shut down the actor system so the JVM can exit
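One caveat about Sink.fold(Map.empty)(_ + _): adding a pair to a Map replaces any existing value for that key, so if the CSV ever repeated an NIC code, the last row would silently win. A stdlib-only illustration:

```scala
// Map + (k -> v) overwrites an existing entry for k, so folding
// duplicate keys keeps only the last occurrence.
val pairs = List("AD" -> "EUR", "AD" -> "USD")
val merged = pairs.foldLeft(Map.empty[String, String])(_ + _)
println(merged("AD")) // USD
```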
