Kafka, Sadly It's Time To Part Ways

I had big dreams for the perfect union between my company and Kafka.  I could see jagigabytes (technical term for a huge number) upon jagigabytes of data passing through our network from the massive clickstream data that we would produce.  The power of having our data in-house, and not relying on paid services to store and cull it, was huge.

That was the dream; and now for the reality :(.  We tried to bend Kafka to our will to fit our use case, but Kafka didn't break.  I badly wanted the pub/sub application to work at our small scale.  When I say our scale, I mean somewhere south of 1,000 messages per day for business transaction purposes.

My thinking was that if we could get it to work at our scale, then we would have learned a great deal to help us with my grander vision.  I can say that I achieved the goal of learning, but not much more.

The first issue we had was that messages were not always returned from the queue on a single fetch request.  I saw that during development, but I didn't pay enough attention to what I was seeing.  That turned out to be a fatal flaw.

We were losing messages

When we configured our jobs to read from various topics, we configured them to poll at specific intervals.  When we spaced them out to an hour or greater, we shrank the number of chances we had to read data before the retention policy threw it away.  For example, if we have a retention policy of 16 hours and a poll interval of one hour, then we have 16 chances to read the data.  If data was not returned during those 16 individual read attempts, it was lost.

What happened is that we were missing critical event data and we couldn't figure out why.  It took some time before I figured out that you have to keep asking for the data until it is returned.  That was issue number one.
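
To make the fix concrete, here is a minimal sketch of that retry loop in C#.  The FetchRecordsAsync helper and its consumer URL are hypothetical placeholders for whatever client or REST call you actually use to read from the topic; this is an illustration of "keep asking", not our production code.

using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class RetryingConsumer
{
    private static readonly HttpClient Http = new HttpClient();

    // Hypothetical fetch: asks the consumer endpoint for records and returns the
    // raw response body, which can be an empty array ("[]") even when messages
    // are sitting on the topic.
    private static async Task<string> FetchRecordsAsync(string consumerUri)
    {
        var response = await Http.GetAsync(consumerUri);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }

    // Keep asking until something comes back or we give up.
    public static async Task<string> FetchUntilDataAsync(string consumerUri, TimeSpan deadline)
    {
        var stopAt = DateTime.UtcNow + deadline;
        while (DateTime.UtcNow < stopAt)
        {
            var body = await FetchRecordsAsync(consumerUri);
            if (body.Trim() != "[]")
                return body;                               // finally got data

            await Task.Delay(TimeSpan.FromSeconds(5));     // nothing yet, back off and ask again
        }
        return "[]";                                       // deadline hit; caller decides what that means
    }
}

The important part is treating an empty response as "nothing this time" rather than "nothing at all", and asking again well before the retention window closes.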

We were losing messages

Now that we were able to get the data back, all of a sudden all of the data was gone.  This was really baffling!  I thought we had solved our problems with receiving the data, but from the outside it looked as if we were having the same issue again.  I couldn't figure out why, after 16 hours, our queue was empty regardless of how recently the last message had been published.

I did all the reading that someone should have to do in a lifetime (except for you, please continue reading) and I couldn't solve it.  So I turned to the Kafka mailing list for help.  It turns out that Kafka deletes the whole log segment file once it holds a message that is outside of the retention policy, taking every message in that file with it.  This was exactly what we were seeing.

We could send a steady stream of data and, like clockwork, all of it would be gone once the retention cleanup kicked in.  It turns out that the default log segment is a gigabyte in size.  Remember, my volumes are very low and we wouldn't fill that up in a year.  That could be solved by setting the segment size really low; we set it to 1024 bytes.
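
For reference, this is roughly what that looks like as broker configuration in server.properties; double-check the property names and defaults against your Kafka version, and keep in mind that segments this tiny only make sense at toy volumes like ours.

# Roll to a new log segment after ~1 KB instead of the default ~1 GB,
# so old messages actually become eligible for deletion at low volume.
log.segment.bytes=1024

# The retention window used in the example above (16 hours).
log.retention.hours=16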

We were losing messages

That brings us to our third and last issue.  The straw that broke the camel's back.  The nail in the coffin.  Ok, I will stop.  Now that we were receiving data reliably and our log files were right-sized, what else could be going on?

With Confluent's REST client, there are two methods of committing an offset back when operating in a consumer group.  You can auto-commit, where the cursor is set to the last entry that was returned, or you can wait and commit that cursor position once you are done with the data.  To be fair, we had some issues in our code that were causing processing to halt partway through a batch.  Those messages had already been committed, but were never processed.
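
To make the distinction concrete, here is a minimal sketch of the "process first, then commit" pattern.  The IQueueConsumer interface and its methods are hypothetical placeholders for illustration, not the REST proxy's actual API.

using System;
using System.Collections.Generic;

// Hypothetical consumer abstraction: fetch a batch, then commit the offset
// explicitly once the batch has actually been handled.
public interface IQueueConsumer
{
    IReadOnlyList<string> Fetch();   // may return an empty batch
    void Commit();                   // advance the group's cursor to what was fetched
}

public static class ProcessThenCommit
{
    public static void Drain(IQueueConsumer consumer, Action<string> process)
    {
        var batch = consumer.Fetch();

        foreach (var message in batch)
        {
            // If this throws, nothing has been committed yet, so the batch
            // is re-delivered on the next fetch instead of silently skipped.
            process(message);
        }

        // Only after every message in the batch succeeded do we move the cursor.
        if (batch.Count > 0)
            consumer.Commit();
    }
}

With auto-commit, the cursor moves as soon as the batch is handed back, so a crash mid-batch leaves messages committed but never processed, which is exactly what bit us.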

Without the ability to grab a single message at a time we were stuck.  We had hoped that Confluent 3.0 (Kafka 0.10) was going to save the day with max.poll.records, but they didn't roll that into the REST client.  Disappointed, we realized that we had really hit a wall.

We sucked it up and decided to turn our backs on Kafka for now.  We were diligent about creating abstractions that will allow us to swap it out with reasonable ease.  We will be taking a day to research and design what the new solution will be.  I think this was a good lesson in picking a solution that matches the current use case.  Even though I really wanted to set us up to use Kafka for my grander vision, it just wasn't the right choice.

I haven't turned my back on Kafka completely; I still think it is awesome and it will have a home with us in the future.  Sadly, for now you don't come in our size, so I will have to leave you on the rack.  Goodbye.

SSH.NET & echo | sudo -S

This is an extension of yesterday's post about getting the test harness to connect to vSphere.  I am going to show how to use SSH.NET to run commands on a server, and more specifically how to make sudo calls remotely.

The goal for today was to take the fresh new VM and install and configure Kafka.  Sounds easy, right?  I use SSH.NET to upload all of the files that I need, as well as a shell script to orchestrate the operations.  We are using systemd as the daemon manager, so the *.service files need to land in the right directory:

/usr/lib/systemd/system

There is one snag: you need sudo because that directory is protected.  Looking at some of the options for sudo, I came across the -S flag, which tells sudo to read the password from standard input.

echo "my_password" | sudo -S mv myfile /protected_directory/

This takes the password and pipes it into the sudo move command.  I tried a bunch of ways to make the same thing work through SSH.NET.

using (var client = new SshClient("my_host_server", "my_user_name", "my_password"))
{
    client.Connect();

    // try to feed the password to sudo over the pipe
    var command =
        client.CreateCommand(
            "echo 'my_password' | sudo -S sh /home/my_user_name/scripts/install_and_configure.sh");

    var output = command.Execute();
    client.Disconnect();
}

This command works great on the server, but it doesn't work in the SshClient's session.  I tried a bunch of variations of the code above, but none of them worked.  It was pretty frustrating.  If you take a peek at the output of the Execute() method, you will see a message along the lines of "sorry, you must have a tty to run sudo".  This is a significant clue.  A tty, or terminal, is what you would be using if you created an SSH session with PuTTY.  When we use SSH.NET like we do above, there is no virtual terminal running, and that is what the error is telling us.  After googling the interwebs, I started to piece together what a solution might look like.


We need to somehow emulate a tty using the library.

The first thing we need to do is create a new client and connect a session.

SshClient client = new SshClient(server_address, 22, login, password);
client.Connect();

Next we need to set up the terminal that the ShellStream will use.  First we create a dictionary of the terminal modes that we want to enable.

IDictionary<Renci.SshNet.Common.TerminalModes, uint> modes =
    new Dictionary<Renci.SshNet.Common.TerminalModes, uint>();
modes.Add(Renci.SshNet.Common.TerminalModes.ECHO, 53);

Adding the ECHO mode (opcode 53) enables echo in the session.  Now we can create our ShellStream, specifying the terminal emulator that we want, the dimensions for the terminal and, lastly, the modes that we want to enable.

ShellStream shellStream =
    client.CreateShellStream("xterm", 80, 24, 800, 600, 1024, modes);

Now that we have a terminal emulator to work with, we can start sending commands.  There are three steps, and each one has a response that we expect:

  1. Login
  2. Send our sudo command
  3. Send the password

We have already created our session and logged in, so there should be output (the shell prompt) waiting for us.  After we send our command, we should expect the password prompt.  Once we know that we are at the right spot, we can forward on the password.

// wait for the shell prompt left over from the login
var output = shellStream.Expect(new Regex(@"[$>]"));

// run the script through sudo, then wait for the password prompt
shellStream.WriteLine("sudo sh /home/my_user_name/scripts/install_and_configure.sh");
output = shellStream.Expect(new Regex(@"([$#>:])"));

// answer the prompt with the password
shellStream.WriteLine(password);

At this point, we should have executed our install_and_configure.sh script successfully.  Putting it all together:

using System.Collections.Generic;
using System.Text.RegularExpressions;
using Renci.SshNet;
using Renci.SshNet.Common;

SshClient client = new SshClient(server_address, 22, login, password);
client.Connect();

IDictionary<TerminalModes, uint> modes = new Dictionary<TerminalModes, uint>();
modes.Add(TerminalModes.ECHO, 53);

ShellStream shellStream =
    client.CreateShellStream("xterm", 80, 24, 800, 600, 1024, modes);

// wait for the shell prompt
var output = shellStream.Expect(new Regex(@"[$>]"));

// run the script through sudo, then wait for the password prompt
shellStream.WriteLine("sudo sh /home/my_user_name/scripts/install_and_configure.sh");
output = shellStream.Expect(new Regex(@"([$#>:])"));

// answer the prompt with the password
shellStream.WriteLine(password);

client.Disconnect();

That is pretty much it.  I hope that this helps someone!  Cheers

Eating Crow


Last year we started work on a project that was pretty high profile within the business.  The dates had been adjusted several times as we got more information, which added more and more levels of stress.  In parallel, we had been re-architecting our platform, so there were opportunities to change how we process business transactions.  Based on these factors we had to make several design decisions.

We decided that we wanted to start making our business transactions more asynchronous.  One of the easy ways to do this is with message queueing.  Leading up to this, for another initiative, I had been investigating the various queueing tools available, both open source and commercial, and my conclusion was that Kafka was the one we should use.

Given the tight schedule, we decided that we would go with it for now, forgoing any prototyping.  I had ulterior motives for choosing Kafka.  We currently shuttle all of our click data to Adobe SiteCatalyst, and many of us feel we should keep that data internally.  Kafka was designed at LinkedIn to handle the ridiculous amount of data they were producing.  I believed that by choosing Kafka as our message queue we would be set up to bring our data in-house.

When we started to integrate the queue into our processes, the prototyping we had skipped floated to the surface.  I took on the task of creating the infrastructure so that the client applications could leverage it.  I learned that a lot of things weren't as straightforward as I thought.

Some of the challenges were with the available tooling for interacting with the queue.  I tried a few of the libraries that were out there, but for one reason or another I didn't think they would meet our needs.  I decided to use the REST proxy from Confluent.  This part of the integration was promising; everyone knows REST!  There was one problem… the REST proxy is stateful.

I don't want to detail all of the challenges that we encountered, and continue to encounter, in this post; maybe another day.  Skipping ahead, we have turned off the queue for now until we can build integration tests to make sure that Kafka, or any queue, will fit our needs.  It may turn out that we need to choose a different technology like RabbitMQ or ZeroMQ, but I will eat that crow when/if the time comes.  I still believe that Kafka is a great tool and can meet our needs short and long term.  For now, I need to iron out all of the wrinkles so we can get it back into production.  I wonder what crow tastes like if you brine it first?

Call for papers, yeah right!

Lucky for me, a co-worker turned me on to Audible.com.  For the last two weeks I have listened to one lecture series and two books (well, 1 1/2, I round up).  One of the books that I am reading right now is John Sonmez's Soft Skills: The Software Developer's Life Manual.

One of the topics that he reiterates, a bunch of times, is about marketing your brand.  One of the methods that he is adamant about is blogging.  He remarks that a lot of engineers either do not blog at all or do not blog regularly. The feedback that he gets from engineers is that they don’t have anything to say or they don’t have time.  I have both!

The question that I need to ask myself is: why don't I do it regularly?  I guess there are a bunch of reasons that I could proclaim, but it is unlikely that they are worth a nickel.  One of the things he said that makes a lot of sense to me is to take note of the ideas that I have, the things that interest me and the trends that I see.  That is the reason for the title "Call For Papers".  This is my call to come up with ideas that are blog-worthy!

Here are some topics that I think are worthwhile:

  • Docker
  • Kafka
  • Zookeeper
  • DDD
  • Architecture
  • Home Automation

These are all relevant to my work today.  Let's see how this goes.

Kafka Client for .NET

They say that every journey starts with a single step; well, even those can get screwed up.  I am going to attempt to write a comprehensive Kafka client in my native tongue, C#, and I want to document the progress.

And now for the misstep… I have been working on this client for several days now and I have been making a great deal of progress.  I felt that this evening was a good enough time to add my work to GitHub.  It turns out that when Xamarin says the folder is not empty and asks whether I want it deleted, it means it.  I didn't notice that the folder I had put in the box was the root of my projects.  Luckily this is a fairly new laptop, so I only had a single project.

The documentation that I am using as a reference guide is located here https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Requests.

Some of my first struggles were due to my lack of familiarity with the Backus-Naur syntax.  Another nuance that I glossed over was the default serialization that Kafka expects.  It is obvious now, but I had missed some of the requirements around variable-width fields such as strings.  The Kafka protocol expects every array to be preceded by a 32-bit count of its elements.  For instance, for an array of 32 fields you would write the 32 to the stream and then the contents of the array.  The same pattern holds for strings, except they are preceded by a 16-bit length.
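
Here is a minimal sketch of that framing in C#, written in the protocol's big-endian byte order (more on that below).  The helper names are my own, not part of any existing client.

using System;
using System.IO;
using System.Text;

public static class KafkaWireSketch
{
    // Write a 32-bit value in big-endian (network) order.
    public static void WriteInt32BigEndian(Stream stream, int value)
    {
        var bytes = BitConverter.GetBytes(value);
        if (BitConverter.IsLittleEndian) Array.Reverse(bytes);
        stream.Write(bytes, 0, bytes.Length);
    }

    // Write a 16-bit value in big-endian order.
    public static void WriteInt16BigEndian(Stream stream, short value)
    {
        var bytes = BitConverter.GetBytes(value);
        if (BitConverter.IsLittleEndian) Array.Reverse(bytes);
        stream.Write(bytes, 0, bytes.Length);
    }

    // A protocol string: a 16-bit length followed by the bytes.
    public static void WriteString(Stream stream, string value)
    {
        var bytes = Encoding.UTF8.GetBytes(value);
        WriteInt16BigEndian(stream, (short)bytes.Length);
        stream.Write(bytes, 0, bytes.Length);
    }

    // An array of strings: a 32-bit element count followed by each element.
    public static void WriteStringArray(Stream stream, string[] values)
    {
        WriteInt32BigEndian(stream, values.Length);
        foreach (var v in values)
            WriteString(stream, v);
    }
}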

Another big thing that I luckily picked up on early was that the protocol is big-endian, so I had to reverse the bytes of every field on my little-endian machine.  I am going to continue the journey tomorrow with the Metadata API.  Here is the git repo if you want to follow along:

https://github.com/hivie7510/KafkaSharp