Apache Kafka is a high-throughput, low-latency event processing system designed to handle millions of data feeds per second. It has the capability to store, read and analyze streaming data. In an object-based system, what we are interested is the objects and its interaction with one another and the state of each object which we may store in a database. In an event-based system, our interest will be more on events generated by those objects or the system than the objects itself. Event data is stored in a structure called log and log is an ordered sequence of these events. Unlike databases, these logs are easy to scale. And Apache Kafka is a system to manage and process these logs. The main components of Kafka system are

Continue Reading ...

Parquet is a binary file format designed with big data in mind where we must access data frequently and efficiently. The way it stores file on the disk is also different from other file formats. It is a column-based data file. And in reality it uses both row based and column based approach to bring the best of both worlds. The data is encoded on disk which ensures that the size remains small compared to actual data and is then compressed where the file is scanned as whole and cut out redundant parts. The query/read speed is dramatically fast when compared to other file formats. Nested data is handled efficiently which is quite cumbersome in other file format to achieve. Doesn’t require to parse the entire file to find data due to its way of storing data. This makes it efficient in reading data. Works quite efficiently with data processing frameworks. Automatically stores schema information. SQL querying is possible with this file format using proper tools.

Data formats:

Data formats can be

  • Unstructured – When there is no specific structure. e.g Text, csv
  • Semi structured – XML, Json
  • Structured – Has records and rows, well defined schema, has very predictable locations where you can find the data - SQL, Parquet
Continue Reading ...

Here in the scenario, what I have was a folder repository where users will commit the changes. Users will treat the folder which is in a network location and will commit the changes to this repository. Now I want this to be moved to a bit bucket server so that I can use the proper process to get the changes merged. Here the challenge is to retain the history of changes without losing it while porting. Let’s see how this can be achieved. for this what we need is

Continue Reading ...

It seems to be difficult to check the read permission on a directory in Delphi 2009. Here we will see how to find the read permission in an alternate way. In later version of Delphi we have more straight way of achieving this.

Continue Reading ...

In this post we will evaluate a scenario where we try to execute a process and are trying to collect the output from the process but the output from a process is not collected by the calling process. If you are in this page, then you must have experienced the similar issue where the process is tested to collect the output from the process which we executed but it fails intermittently while collecting the output from the process. You might have done everything right by collecting the output from the process in an asynchronous manner but still it fails to collect the output in between. If you are using process.WaitForExit with a timeout, then it is the culprit. process.WaitForExit with a timeout is known to create issue when we have some parallelism in place and we are trying to execute say the same process multiple times in parallel or many different processes in parallel. The way to get out of this is to wait for the process without a timeout. This may have practical difficulties because if we don’t have a timeout, then there can be situation where we may wait forever for the process to exit. The way to get out of this is to run the process and wait for it without a timeout but timeout via the thread which is executing it. The below example will show you how to implement this.

Continue Reading ...