Kafka, Sadly It’s Time To Part Ways

I had big dreams for the perfect union between my company and Kafka.  I could see jagigabytes (technical term for a huge number) upon jagigabytes of data passing through our network from the massive clickstream data that we would produce.  The power of having our data in-house and not relying on the paid services to store and cull our data was huge.

That was the dream; and now for the reality :(.  We have tried to bend the will of Kafka to meet our use case, but Kafka didn’t break.  I badly wanted the pub/sub application to work at our small scale.  When I say our scale, I mean somewhere south of 1000 messages per day for business transaction purposes.

My thinking was that if we could get it to work at our scale, then we would have learned a great deal to help us with my grander vision.  I can say that I achieved the goal of learning, but not much more.

The first issue that we had was that messages were not always returned from the queue during a single fetch request.  I saw that during development, but I didn’t pay enough attention to what I was seeing.  That turned out to be a fatal flaw.

We were losing messages

When we configured our jobs to read from various topics, we configured them to poll at specific intervals.  When we spaced the polls out to an hour or more, we were narrowing the window between the retention policy and the opportunities to read data.  For example, if we have a retention policy of 16 hours and a poll interval of one hour, then we have 16 chances to read data.  If data was not returned during any of those 16 individual read attempts, it was lost.

What happened is that we were missing critical event data and we couldn’t figure out why.  It took some time before I figured out that you have to ask for the data until it is returned.  That was issue number one.
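The "ask until it is returned" lesson can be sketched in a few lines of Python.  This is only an illustration and not the code we ran: fetch stands in for whatever call reads from the topic, and the names read_opportunities and fetch_until_data are made up for this example.

```python
import time


def read_opportunities(retention_hours, poll_interval_hours):
    """How many scheduled polls fit inside the retention window."""
    return retention_hours // poll_interval_hours


def fetch_until_data(fetch, max_attempts=10, delay_seconds=0.0):
    """Call fetch() until it returns a non-empty batch or attempts run out.

    fetch is a hypothetical callable that returns a list of messages,
    possibly empty. Returns (batch, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        batch = fetch()
        if batch:
            return batch, attempt
        time.sleep(delay_seconds)
    return [], max_attempts


# Usage with a fake fetch that comes back empty twice before returning data.
fake_responses = iter([[], [], ["order-created", "order-paid"]])
batch, attempts = fetch_until_data(lambda: next(fake_responses))
print(read_opportunities(16, 1), batch, attempts)
```

The point is that one scheduled job run can make several fetch requests instead of treating an empty response as "nothing to read".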

We were losing messages

Now that we were able to get the data back, all of a sudden all the data was gone.  This was really baffling!  I thought we had solved our problems with receiving the data, but to the outside it looked as if we were having the same issue again.  I couldn’t figure out why after 16 hours our queue was empty regardless of how recently the last message was published.

I did all the reading that someone should have to do in a lifetime (except for you, please continue reading) and I couldn’t solve it.  So I turned to the Kafka mailing list for help.  It turns out that Kafka does not delete individual messages; it deletes the whole log file that contains a message outside of the retention policy.  This was exactly what we were seeing.

We could send a steady stream of data and like clockwork, all of it would be gone once the flush began.  It turns out that the initial log file is a gigabyte in size.  Remember, my volumes are very low and we wouldn’t fill that up in a year.  That could be solved by setting the log file size really low; we set it to 1024B.
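To make the effect of the log file size concrete, here is a toy Python model of the behavior described above.  It is not Kafka’s retention code: the function names are invented, and the expiry rule (drop a file once it holds a message older than the retention window) is a simplification based on what we observed.

```python
def split_into_segments(messages, segment_bytes):
    """Group (timestamp_hours, payload) messages into log files of limited size."""
    segments, current, used = [], [], 0
    for timestamp, payload in messages:
        size = len(payload)
        # Start a new file when the next message would not fit in this one.
        if current and used + size > segment_bytes:
            segments.append(current)
            current, used = [], 0
        current.append((timestamp, payload))
        used += size
    if current:
        segments.append(current)
    return segments


def apply_retention(segments, now_hours, retention_hours):
    """Drop whole files that hold a message older than the retention window.

    Assumption: simplified expiry rule, modeled on the post's description.
    """
    cutoff = now_hours - retention_hours
    return [seg for seg in segments if min(t for t, _ in seg) >= cutoff]


# One 100-byte message per hour for 20 hours, checked at hour 20 with 16h retention.
messages = [(hour, b"x" * 100) for hour in range(20)]

one_big_file = split_into_segments(messages, segment_bytes=2**30)
small_files = split_into_segments(messages, segment_bytes=1024)

print(len(apply_retention(one_big_file, now_hours=20, retention_hours=16)))
print(len(apply_retention(small_files, now_hours=20, retention_hours=16)))
```

With one huge file, everything lives or dies together; with small files, only the files that hold expired messages are dropped.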

We were losing messages

That brings us to our third and last issue.  The straw that broke the camel’s back.  Nail in the coffin.  Ok, I will stop.  Now that we are receiving data reliably and our log files are right-sized, what else could be going on?

With their REST client, there are two methods of committing back an offset when operating in a group.  You can auto-commit, where your cursor is set to the last entry that was returned, or you can wait and commit that cursor position once you are done with the data.  To be fair, we had some issues in our code that were causing the processing to halt and stop processing messages.  These were messages that were already committed, but were not processed.
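The difference between the two commit styles can be shown with a small in-memory stand-in.  This is a sketch and not the REST client’s API: ToyConsumer, consume, and the flaky handler are all hypothetical names used only to show what happens when processing halts partway through a batch.

```python
class ToyConsumer:
    """In-memory stand-in for a consumer in a group; not a real Kafka client."""

    def __init__(self, log):
        self.log = log        # list of messages; the list index is the offset
        self.committed = 0    # next offset the group will resume from

    def poll(self):
        """Return (offset, message) pairs from the committed offset onward."""
        return list(enumerate(self.log))[self.committed:]

    def commit(self, next_offset):
        self.committed = next_offset


def consume(consumer, handler, auto_commit):
    batch = consumer.poll()
    if not batch:
        return
    if auto_commit:
        # The cursor moves as soon as the batch is handed over.
        consumer.commit(batch[-1][0] + 1)
    for offset, message in batch:
        handler(message)
        if not auto_commit:
            # The cursor moves only after this message was handled.
            consumer.commit(offset + 1)


def flaky_handler(message):
    if message == "b":
        raise RuntimeError("processing halted")


for auto_commit in (True, False):
    consumer = ToyConsumer(["a", "b", "c"])
    try:
        consume(consumer, flaky_handler, auto_commit)
    except RuntimeError:
        pass
    # What is still available to read after the failure?
    print(auto_commit, consumer.poll())
```

With auto-commit, the failure on "b" leaves nothing to re-read, which matches what we saw: messages committed but never processed.  Committing after each message keeps "b" and "c" available for the next attempt.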

Without the ability to grab a single message at a time we were stuck.  We had hoped that Confluent 3.0 (Kafka 0.10) was going to save the day with the max.poll.records setting, but they didn’t roll that into the REST client.  Disappointed, we realized that we had really hit a wall.

We sucked it up and decided to turn our backs on Kafka for now.  We were diligent about creating abstractions that will allow us to change with reasonable ease.  We will be taking a day to research and design what the new solution will be.  I think that this was a good lesson on picking a solution that matches the current use case.  Even though I really wanted to set us up to use Kafka for my grander vision, it just wasn’t the right choice.

I haven’t turned my back on Kafka completely; I still think it is awesome and will have a home with us in the future.  Sadly, for now I can’t fit your size so I will have to leave you on the rack.  Goodbye.

Is Scrum Agile?

You may think that the title is utterly ridiculous, but bear with me.  I recently had the opportunity to sit through a class with Allen Holub on Designing for Volatility and it was there that he disrupted one of my long held beliefs.  I was trained on Scrum by Ken Schwaber in 2008 and again in 2012, so I was sold, but now I am thinking a little differently.  I want to explore the question, “Is Scrum Agile?”

Scrum works on a timed boundary that begins with a planning session and ends with a review/retrospective.  These are designed to set up an interval (sprint) where the work is immutable.  Typically a team sets the interval at two, three or four weeks.  If there is a necessary change in the work, Scrum does have an allowance to abort the interval and start over.

Aborting a sprint is a significant decision and activity and is not to be taken lightly.  In this event, the team would stop what they are doing, button up, and start over with a planning session to plan the new work.  This seems to be agile, but what I see all too often is that we try to make sure we have enough work for the team to work on, as opposed to making sure that we are delivering the most value to the customer as fast as a quality job can provide.

I believe that trying to find enough items so that each member is busy may start to divert from agile.  If this is your practice, you will inevitably set lower value items higher in the backlog to fill the team’s time.  That seems to fly in the face of the agile principle of delivering the highest value items as fast as possible.

“Our highest priority is to satisfy the customer
through early and continuous delivery
of valuable software” – Agile Manifesto

The team should always be working on the next highest priority items, no exception.  I would guess the question that follows that statement is, “What will the rest of the team members do during the sprint?”  It is a good question to ask, but it is also easy to answer.

In a development cycle, there are many activities that have to take place such as requirement refinement, test case development, test automation and of course development.  Teams can rally around a single item to see it through to the release.  I was very skeptical about swarming around a user story and I assumed it was full of waste, but I have since been proven wrong.

Swarming around the user stories is the optimal activity of a self-organizing team.  If you have a cross functional development team (quality, development and business) then you execute on another one of the principles.

The best architectures, requirements, and designs emerge from self-organizing teams. – Agile Manifesto

Another divergence from agile concerns continuous delivery.  The models that I have seen place deployment at the end of a sprint.  This is in opposition to the first principle in the Agile Manifesto.

Our highest priority is to satisfy the customer through early and continuous delivery
of valuable software – Agile Manifesto

Now to be fair, there is nothing in scrum that says you cannot deliver software as often as possible.  The goal of each sprint is to complete a potentially deployable increment of software.  At face value, though, this end of sprint demarcation promotes deployment at the end instead of when it is ready.

One of the main things that Allen drove home was how agile means that you are always working on the highest priority without the need for artificial boundaries.  I think I agree with that.  Scrum has several ceremonies that occur every sprint and I wonder how many of them are needed in that regimented fashion.

Whether you are moving around backlog items to fill time or you are waiting till the end of the interval to deploy, you have to ask if you are really an agile shop.  The team that I am on is taking a more agile approach.  We are a scrum shop, so we have to operate at some level within that process.  We have only been taking in one work item at a time and we swarm until it is done and then we ask the product owner what is next.  As a self-organizing team, we have decided that this is what allows us to be agile and how we can deliver quality stories to the customer, and it works.  These are just my thoughts, but I would love to debate this further.  Cheers

Efficacy Of The Daily Scrum

For the last 8 years, I have seen many interpretations of how a scrum team is to operate.   All have been modified in some way, some good and some bad.  One component that I have not seen changed is the format of the daily scrum. What I would like to talk about is the efficacy of the daily scrum as it is defined in the scrum.org handbook.

All the way back to 2008, the daily stand-up has always meant answering the following three questions:

  1. What did I do yesterday that helped the Development Team meet the Sprint Goal?
  2. What will I do today to help the Development Team meet the Sprint Goal?
  3. Do I see any impediment that prevents me or the Development Team from meeting the Sprint Goal?

Each morning, we would go around the room (or voice chat) and talk about these three questions.  I noticed that there was very low engagement and most people were just reporting status to the scrum master.  I have even seen a fourth question asked, “What have I learned since yesterday?” and it woke people up.  I think that it is a great question since it can help the team and I always liked to share anyway.

Recently I was on a particular scrum team working on a project that had a lot of dependencies.  As each person spoke up, I started to realize that I was not getting any value out of what they did yesterday.  This began to needle at me, since this was a ceremony that was required to happen daily.

I started to only answer the second question, “What will I do today to help the Development Team meet the Sprint Goal?”.  I felt that this was really what the team cared about: what is interesting about the work that I am about to do that will help to drive us forward?

It didn’t take too long before it caught on and now the meetings are shorter.  Not only are the meetings shorter, but we are acting more collaboratively.  I don’t think that all of the credit for the increased collaboration can be attributed to the modified daily scrum, because the team was already pretty kick-ass to begin with!

I know that scrum is a guide and you have to inspect / adapt to make it work for you, but the daily stand-up has, in my experience, remained unchanged over the last 8 years.  If I were asked about the efficacy of the traditional daily stand-up, I would have to say it is low.

Changing the format of this meeting has helped our team be more successful.  One of the next posts that I am going to write is inspired by Allen Holub.  I want to write about the efficacy of scrum compared to a more bare bones agile process and kanban.  Until then, cheers.

Self Organized Architecture

How does an agile developer know which frameworks can be used?  How does the product team know which open source licenses can be used in their products?  In a multi-team environment, who coordinates the architecture to ensure that it is sustainable?  These are the questions that I want to examine.

One of the tenets of the Agile Manifesto states that the architecture is best crafted by the product team.

“The best architectures, requirements, and designs emerge from self-organizing teams” – Agile Manifesto

Framework du jour is something that is hot today and is the new shiny object.  I have found its use to be prevalent in web applications and JavaScript libraries.  There is something to be said about the boundaries that should be defined.  If you have not seen framework du jour, please trust me; I have witnessed and taken part in this practice.

What are the suitability parameters that aid in the selection process when choosing a supporting framework?  Is the longevity of the project sufficient?  Can the open source software (OSS) license model be adhered to?  Is the library actually supported by an organization or by an individual?  These are important questions that should be asked when the product team wants to augment their code with external projects.

Some oversight should be there to make sure that the products and their dependencies are built on a solid foundation.  One of the gotchas that is present in most projects is their use of OSS.  Who is there to monitor the licenses that are being used to ensure compliance?  I do not see this to be any different than making sure that the appropriate Microsoft licenses are being maintained.

Open source license violations are not monitored like those of Microsoft’s software.  For the majority of the users of OSS, their use will go unseen by the community as well as any corporate oversight.  This does not mean that it is OK, but without oversight the likelihood of violation is greater.

The manifesto quote above speaks about the architecture spawning from within the product team.  This makes sense to me when you look at a team that is chartered to build a single product.  Does this statement still hold true when you span across several products and their teams?  If they all stay within their boundaries, this statement still makes sense.

Who designs the architecture when the product teams must integrate with each other?  That is a question that I have been trying to understand for quite some time.  There is nothing that precludes the product teams from building an excellent integration for their products, but how do they match to the long term IT plan?

One of the things that I see and read is how the extreme agile practices are in use at Spotify or Facebook.  These are awesome examples of how teams can quickly iterate over a product to release amazing software very quickly.  Can this be applied in a corporate environment where you are building tools to support internal business users?  I don’t know the answer, but I believe that you cannot do a 1:1 comparison of an internal business application with a social product where there are little or no regulations.  Also, business applications typically have uptime requirements and other SLAs, and a down system sings a much different song.

I believe that product teams should be equipped with the parameters so that they can pick the best fit frameworks to help them deliver awesome software.  It makes sense since they are closest to the problem.  Open source licensing should still be monitored, because that is just being a good citizen.

Building an infrastructure to support the organizational growth is a different story.  I think there needs to be some central body, whether an architect or not, to look at the future of the portfolio and make sure that the products are driving in that direction.  I would love to hear others’ opinions on this topic.  Cheers

Reading Rainbow, err audio book rainbow

Growing up I was never interested in reading books.  Up until I joined the military, I could count the number of fiction books that I had read on one hand.  They just couldn’t keep my interest.

One of the issues with reading books is that I have a hard time focusing for more than a minute or two.  And that is meant to be literal.  What this means is that I end up having to re-read a sentence or paragraph several times in order to grasp it.  This leads to a great deal of frustration and surprisingly peaceful slumber.  To read perchance to dream 🙂

There was one day that actually changed my life, and the lesson wasn’t intended for me.  I was standing in the hallway of the van that we worked in, when I overheard my sergeant tell another Marine to read a book on combustion engines.  My friend replied with “Why would I read about engines?”.  The response is what lit the fire.  The sergeant explained that reading anything, regardless of immediate need, teaches you how to learn and thereby improves your problem solving.

I took that lesson and ran with it.  At the time, my job was as a calibrator (making sure things are measured correctly) and it was not very exciting.  Once I got out I was working in the civilian world doing the same thing, and it was worse.  It was a rat race; the only goal was to do as much work as we could as fast as we could to increase our billing.

I couldn’t stand to do this work much longer.  I have a hard time doing repetitive work; I go crazy.  I decided that I wanted to write software to do the work for me (I will spare you the details).  Taking the lesson that I overheard, I went to town.

I started reading books, magazines and anything else I could get my hands on to help me learn how to develop software.  Somehow I was now able to read and read without falling asleep, staying up all night to learn more.  Interestingly enough, I was learning to program on my own.

Here I am, 15 years later and I am still reading.  Not as much as I would like, but things are changing.  Remembering that I had the issue with having to re-read some things several times, I have been turned on to audiobooks.  These are great because Audible has the feature to go forward or backwards by 30 seconds.  This works for me and I do work those buttons like a pro.

I still haven’t got around to listening to fiction, but there is always hope.  I am on a leadership book binge currently and the next topic that I want to listen to is history.  As I get older, I realize that you have to read and read often so that you can stay sharp and, in my field, relevant.

I am not sure that I told my sergeant about the lesson that he taught, but it was one of the best lessons that I have learned in life and I thank him for that.  I wish everyone could experience when that torch is lit.

Change My Tune

One of the things that I learned from the Marine Corps was how to make command decisions.  I also learned how to make demands to get tough problems solved and how to lead from the front.  Has this worked well for me in the civilian world?  Yes and no.

All of my life, I have been taught to be very efficient in how I do all things.   I should clarify, it was “my idea of efficiency”.  I was always very quick to identify and make it known where there was something out of place.  I think that I had the need to ingratiate myself and that need was met.

The labels given to me over the years have undergone several transitions, from “bull in a china shop” to “fireman”.  I used to wear these like a badge of honor, thinking that they made me special or unique.  The truth of the matter is that I was a one man show and those labels aren’t flattering.

I always thought that being the hero was what was wanted and needed, but now I know differently.  The company that I work for has gone through many transitions over the last 6 1/2 years.  When I started there were many opportunities to improve the software, the development process and other facets of the IT group.

I have had many achievements and probably an equal number of failures.  I was fortunate to work for a group that is very compassionate.  I went through some challenging times, but they were there for me.  As you can see, I was not the team member that I should have been.

I was very fortunate that I had the opportunity to work with a leadership consulting firm.  The coach that I worked with was top notch; she helped me navigate through troubled waters many times.  You could say that my emotional IQ was 0 back then.  I was not capable of empathy, and a leader must, must have the ability to empathize.

Like I said, she helped me with a lot of the skills that I lacked.  I would like to say that she solved all of my problems, but I would be lying.  And now things have changed again and I need to make sure I can adapt like the mice in “Who Moved My Cheese?”.  I have adapted fine, but I feel that I have to put in a lot of effort to prove to others that I am ok with the changes.  I can understand how it can be hard for people who have worked with me over the years to see that I am ok with the shift in responsibility.

In some ways, a lot of the burden has been removed.  I don’t have to constantly look out for landmines that I have to navigate to always prove my worth.  Now I can focus on doing my part to help my team and fellow team members to achieve better results.

Side Bar:  They already kick ass, I am lucky to work with them.

Lately, I have been reading various books to help me with my communication and my social skills.  Here is a list of the books so far, and I recommend them very much:

  1. How Conversation Works: 6 Lessons for Better Communication
  2. Emotional Intelligence: 100+ Skills, Tips, Tricks & Techniques to Improve Interpersonal Connection, Control Your Emotions, Build Self Confidence & Find Long Lasting Success!
  3. Notes to a Software Team Leader
  4. Soft Skills: The Software Developer’s Life Manual
  5. Speaking As a Leader
  6. How to Win Friends & Influence People

These are all great books and I highly recommend each of them.

The Marine Corps taught me many things, but how to play nice with others wasn’t one of them.  My view on the world that I live in and how I should behave has been changing.  I haven’t changed my personal goals, but I think I’m building my tool belt that will help me get there.  Sometimes change is happening around you and it is up to you if you want to change your tune or continue on a sour note.

“You must be in tune with the times and prepared to break with tradition” – James Agee

Side Bar:  I would feel guilty if I didn’t give credit to Michelle Saul of Possibilities Consulting

First Chance Exception Settings, Your Friend

“I am trying to debug, but this stupid exception keeps happening”.  I have seen that situation play out many times over the years.  You are trying to exercise your code, but the same spot keeps throwing an exception.  This doesn’t impact your functionality, but it’s an annoyance.

Sometimes it takes a while for the frustration to hit a peak, but when it does you are beside yourself.  There is functionality in Visual Studio to help you out, but before that let’s talk philosophy.

There are many schools of thought with regards to developing code.  On one side, if your tests (unit, integration, etc.) are not passing then you cannot check in.  This would imply that if the code base you are working on is throwing exceptions, then you cannot check in until it is fixed.  Under this philosophy, everything stops until the code is in good working order.

On the other side, you have blinders on and you only focus on getting your code to work.  In this scenario it is OK to skip over some exceptions, because it is someone else’s problem.  I don’t have to preach about the issue with the “someone else’s problem” mindset; it does not help your team.

The scenario that I left out  is less philosophy and more practice.  Sometimes in the code there are bugs that are OK to not be fixed.  This could be for many reasons, like priority or planned obsolescence.

I always recommend that my team members use a little known set of toggles in Visual Studio called Exception Settings.  Exception settings allow you to have a first crack at code (first chance) that is about to throw an exception.

[Image: ExceptionSettings]

Without having this functionality turned on, the code will throw and you may not be able to proceed.  Breaking on first chance exceptions allows you to move the execution pointer to a different line that would function normally and continue.  This is very powerful when you are trying to figure out where an exception is coming from and the state of the system (threads, call stack, variables, etc.) when it happens.

I have been in situations where it was OK and even expected for code to throw, but I don’t want to have to view it each time.  In this case, Visual Studio allows you to deselect the exceptions that you don’t care about.  You can see below that I have chosen to break on all exceptions except for Microsoft.JScript.JScriptException.

[Image: ExceptionSettings_CLR]

To use this functionality you have to set these toggles in advance of starting your application.  Consider this: you are running a long process and you get interrupted by an exception that you did not expect, but you do not care if it throws.  In that case, you have the option to turn off first chance exceptions for that exception type for all future executions.  You need only deselect the “Break when this exception type is thrown” checkbox.

[Image: ExceptionSettingsDisable]

In the case above, the exception is expected as part of the SSPI authentication process so I can ignore this exception.
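If you have not worked with first chance exceptions before, the idea can be illustrated outside of Visual Studio.  The Python sketch below is only an analogy and has nothing to do with how the Visual Studio debugger is implemented: it uses the standard sys.settrace hook to notice an exception at the moment it is raised, even when the code handles it later, and it takes a list of exception types to ignore, much like deselecting a type in Exception Settings.  The names FirstChanceLogger, parse, and load are made up for this example.

```python
import sys


class FirstChanceLogger:
    """Record exceptions when they are raised, even if they are handled later."""

    def __init__(self, ignored=()):
        self.ignored = tuple(ignored)   # exception types we do not care about
        self.seen = []                  # (exception name, function name) pairs
        self._last = None

    def _global_trace(self, frame, event, arg):
        # Returning the local tracer enables 'exception' events for new frames.
        return self._local_trace

    def _local_trace(self, frame, event, arg):
        if event == "exception":
            exc_type, exc_value, _tb = arg
            # The event can repeat while the exception unwinds; record it once.
            if exc_value is not self._last and not issubclass(exc_type, self.ignored):
                self.seen.append((exc_type.__name__, frame.f_code.co_name))
            self._last = exc_value
        return self._local_trace

    def __enter__(self):
        self._previous = sys.gettrace()
        sys.settrace(self._global_trace)
        return self

    def __exit__(self, *exc_info):
        sys.settrace(self._previous)
        return False


def parse(text):
    return int(text)            # raises ValueError for non-numeric text


def load(text):
    try:
        return parse(text)
    except ValueError:          # handled, so callers never see the exception
        return 0


with FirstChanceLogger() as logger:
    result = load("abc")
print(result, logger.seen)

with FirstChanceLogger(ignored=(ValueError,)) as quiet:
    load("abc")
print(quiet.seen)
```

The first run shows that the handled ValueError was still observed at the point where it was thrown; the second run shows the same code with that exception type switched off.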

It doesn’t matter which school of thought you or your team subscribes to; the ability to toggle first chance exceptions is an important hammer to have in your toolbox.

Side Bar:  I can’t recommend enough to turn on this functionality.  It has helped me catch a lot of bugs before they got into the wild.  Hope this helps, cheers.