Edward Capriolo

Saturday Mar 03, 2018

Interview tips: Two proverbs

Keep in mind these two famous sayings as you read this post: "Honesty is the best policy" and "Discretion is the better part of valor."

The conundrum

Sometimes telling the truth can yield unfortunate results. For example, let's go through a scenario. Imagine a candidate is interviewing at three companies over the next few weeks. During the second interview the candidate is asked:

  • Q: "Are you interviewing at any other companies?"

Honesty is the best policy

The candidate answers the question honestly:

  • A: "Yes, I am."

The interviewer is juggling a variety of thoughts. On one extreme, the interviewer understands that the best thing for the candidate is to explore all options to find the best possible position for them. On the other extreme, they are motivated by their personal concerns and the concerns of the company (which are intermingled). For example, the interviewer may believe they are wasting their time with the candidate because the candidate could take another offer. The candidate could also leverage the other offer further along in the process (for more money, etc.) once the interviewer is more invested in the candidate (interview time, plans to leverage their skills).

One benefit of honesty is that the candidate presents themselves as highly desirable. If a company really likes the candidate, they may increase their offer to outbid potential competitors. The interviewer might also attempt to close on the candidate faster if the market is highly competitive.

The candidate can answer the question dishonestly:

  • A: "No" (or: "No, I love this place. It is my dream job," etc.)

We all know lying is bad. Beyond the ethics of lying there are several other downsides:

  • Lying is harder than telling the truth: The candidate has to keep track of which lies they told and to whom. If someone wishes to dive in and discuss the topic further, the candidate has to stack the lies on top of each other.
  • Getting caught in a lie hurts your credibility and at worst puts an end to the entire process.

The odds of getting caught in this particular lie are low. One way the interviewer could find out is if they happen to know someone else who is interviewing the candidate. Of course, if the interviewer reaches back out in two days and the candidate has accepted another offer, they will probably guess that they were lied to. That could come back to haunt the candidate in an "it is a small world" scenario in the future.

One-to-many scenarios:

One other thing to keep in mind: There is not necessarily a one-to-one relationship between the interviewer and the candidate. The job could potentially have N candidates. Now, this gets more interesting. Suppose there exist two equally qualified candidates (candidate-a and candidate-b) applying for the same job, and both candidates actually are applying for other jobs. Suppose candidate-a answers, "Yes, I am applying to other jobs" and candidate-b answers, "No, I am not applying to other jobs." Most companies will only make an offer to a single person at a time, so it is reasonable to assume they might extend the offer to candidate-b because that person has a higher chance of accepting the job. They may not want to extend to candidate-a because, while they wait for what might be a rejection, they might miss the chance to offer the job to candidate-b.

Discretion is the better part of valor

"it is better to avoid a dangerous situation than to confront it."

If the candidate wishes to avoid the conversation, using discretion is one way to approach the response.

When asked:
Q: "Are you interviewing at any other companies?"

A candidate can reply with:

A: "I would only like to discuss this opportunity and whether we are a good fit."

That is not answering the question. However, the candidate is respectfully stating they wish to avoid the conversation. Sometimes the interviewer will not accept this, but generally they will not press as hard as in "you-go-first" style salary negotiation (probably assuming that most candidates will lie anyway).

If the interviewer does press the issue it might come in a variety of forms. They might say something like:

  • "I am just curious"
  • "This helps us understand if we need to hurry along the process"

If the candidate came into the conversation not wanting to answer this question, they probably should stick to their guns and not answer. If the candidate is swayed by the interviewer's rationale, they could make an attempt at trading information. Calling to mind the one-to-many scenario above, the candidate can attempt to determine if there are other candidates in the mix.

The candidate might wish to ask a question like:

  • "Do you need to have this position filled quickly?"
  • "How long has the position been open?"
  • "Has the position been offered to anyone else yet?"

The candidate has entered the waters of some "close-to-the-vest" information trading :) (where both sides are trading hard-to-verify facts).

Tuesday Feb 27, 2018

Resume Design Tips

While I was a student at THE Westchester Community College I had a work study job at the Career and Transfer Center. This was one of the best jobs I ever had. I wanted to summarize some of the skills I learned and fine-tuned after years of resume reading and writing.

Tip #1: No matter how proud you are of your bash scripts, do not mention them more than once!

Job 2 - Company 2 - 200x - 200y

  • Wrote bash scripts

Job 1 - Company 1 - 200x - 200y

  • Wrote bash scripts

First, you want to show career development. Second, you do not want to waste space. Third, you should not be bragging about shell scripting. Fourth, do not mention the same technology twice. Fifth, see the next tip.

Tip #2 Talk about the effect not the tech

Job 1 - Company 1 - 200x - 200y

  • Wrote bash scripts

Is it affect or effect? (Sub-tip: have a friend read it over.) Regardless, Agile stories have a certain format:

A user story typically follows a simple template:

As a <type of user>,
I want <to perform some task>
so that I can <achieve some goal/benefit/value>.

Your bullet points should have a consistent format. Do not:

  • Use "I": I wrote bash scripts
  • Write sentences with periods

Focus on what you did and how it changed the business. The technology should be an afterthought.

Example bullet point:

Streamlined deploy automation using shell scripting resulting in faster setup of customer portals

In this case "faster setup" should be a clear win, but you can also embellish more if you believe the description is too abstract.

Streamlined deploy automation using shell scripting resulting in faster setup of customer portals saving operations three hours a week

With this description you transform a lame bullet point into a clear example of how you saved time and money. Everyone likes saving time and money. 

Tip #3 If you talk about it, be about it

Job 1 - Company 1 - 200x - 200y

  • Wrote bash scripts

One huge red flag is when I probe people on bullet points and they clearly do not know much about the things they say they are experienced in. This is what a bad dialog might sound like:

Ed: "From your resume, I see you wrote bash scripts. You mentioned it 3 times. Can you tell me why you used bash and not perl?"
Them: "They were just small scripts, cleaned some data."
Ed: "Can you name a feature of bash that made you choose it for this task?"
Them: "...(clearly grasping) You do not need to compile it."

In short: do not list it if you cannot speak to it. Do not list it if your co-worker did all the work. Do not list things you have only a passing knowledge of. If I find that a candidate lists multiple things they did not do, I trust the other bullet points on the resume less. Every once in a while it is safe for a candidate to say "I forgot the implementation," but if the candidate is pleading the 5th on every bullet point, they are fairly exposed in terms of trust during the process.

Tuesday Feb 20, 2018

BlockChain! Building a NoSQL store - Part 15

My last blog entry in this series was on:

04/17/15 10:53 PM     04/17/15 11:36 PM     Triggers and Coprocessors - Building a NoSQL store - Part 14

That is a God damn shame. That is what that is. So we are back! Humans are creatures of habit. I started looking at blockchain recently and the first thing I thought was (you might have guessed), 'Let's build a NoSQL with this!' and here we are.

First up: I really like the Ethereum stack. I am not interested in mining coins using GPUs or trying to make a big stock trade. I barely even contribute to my 401k. What I am interested in is Ethereum's smart contracts. Smart contracts look something like an RPC language; however, the interesting part is that you also define the data that the contracts manipulate.

It is a really heavy and interesting topic that I have not fully wrapped my brain around yet, but I made it work and I am impressed with myself.


Ethereum offers Remix, an online IDE for smart contracts. This is a pretty nifty way to get started.


Contract basics

Let's cover some smart contract basics.

contract ColumnFamily1 {

    struct Right {
        string column;
        string value;
        int256 version;
    }

    address public owner;
    string public columnFamilyName;
    uint8 keepVersions;

    function ColumnFamily1(string name, uint8 versions) public {
        owner = msg.sender;
        columnFamilyName = name;
        keepVersions = versions;
    }
}
On the surface this looks like a class definition. Constructors are functions that have the same name as the contract. Variable definitions placed outside of a function are considered storage variables. If a function modifies a storage variable, the result is stored in the blockchain.

Some things to note:

  • address is a type that represents an entity, like a person or contract, in the blockchain.
  • msg is an implicit variable that has information about the message.

One pattern for access control is storing the address of the entity that created the contract so that subsequent calls can require that the caller is the creator. Keep in mind that is not the only or even the preferred way to handle auth. There are other schemes, for example where maps are keyed by address and users can only update their own keys. That discussion is out-of-scope (aka I do not understand it well).

There is a particularly interesting feature called mapping.

mapping (string => Right[]) data;
//Maps only the latest value of a single cell
mapping (string => mapping(string => string)) latestData;

Mappings are like hashmaps inside the blockchain. They can contain other mappings; however, the key side of the mapping needs to be an immutable simple type. This makes sense, as the key serves as a pointer to data in the blockchain.
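As a rough analogy (Python, not Solidity; the names here are invented for illustration), a mapping of mappings behaves like a dict of dicts keyed by strings, with one notable difference: a Solidity mapping returns a default value for a missing key rather than failing.

```python
# Rough Python analogy for Solidity's nested mappings:
# mapping (string => mapping(string => string)) latestData;
latest_data = {}

def set_cell(rowkey, column, value):
    # create the inner dict on first touch, like a mapping does implicitly
    latest_data.setdefault(rowkey, {})[column] = value

def get_cell(rowkey, column):
    # Solidity mappings return the type's default value for missing keys;
    # mimic that here with an empty string instead of raising KeyError
    return latest_data.get(rowkey, {}).get(column, "")

set_cell("row1", "name", "ed")
print(get_cell("row1", "name"))   # ed
print(get_cell("row2", "name"))   # (empty string)
```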

Getting started NoSQL in block chain

First, let's briefly talk about what flavor of NoSQL we want to implement. To keep it simple we will implement a versioned ColumnFamily, which is somewhat like hbase. We will punt on deleting (tombstones) for now.


  • put (string rowkey, string column, string value, long version)
  • get (string rowkey, string column)

put("ecapriolo", "firstname", "edward", 1)
put("ecapriolo", "firstname", "bla", 0)

get("ecapriolo", "firstname") --> returns "edward"
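To pin down those semantics before reading the Solidity, here is a hypothetical Python model (a sketch of the API contract above, not the smart contract itself): put stores every version, and get returns the value carrying the highest version.

```python
# Hypothetical Python model of the versioned ColumnFamily semantics.
class ColumnFamily:
    def __init__(self):
        # rowkey -> list of (column, value, version) cells
        self.data = {}
        # rowkey -> {column -> latest value}, the fast get path
        self.latest = {}

    def put(self, rowkey, column, value, version):
        cells = self.data.setdefault(rowkey, [])
        cells.append((column, value, version))
        # only overwrite the fast path if this version is the newest seen
        newest = max(v for c, _, v in cells if c == column)
        if version == newest:
            self.latest.setdefault(rowkey, {})[column] = value

    def get(self, rowkey, column):
        return self.latest.get(rowkey, {}).get(column)

cf = ColumnFamily()
cf.put("ecapriolo", "firstname", "edward", 1)
cf.put("ecapriolo", "firstname", "bla", 0)
print(cf.get("ecapriolo", "firstname"))  # edward
```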

contract ColumnFamily1 {

    struct Right {
        string column;
        string value;
        int256 version;
    }

    address public owner;
    string public columnFamilyName;
    //number of live versions of a cell to keep
    uint8 keepVersions;
    //Maps a rowkey to all columns
    mapping (string => Right[]) data;
    //Maps only the latest value of a single cell
    mapping (string => mapping(string => string)) latestData;


Notice there are two mappings, data and latestData. This is more of an implementation detail; however, in Ethereum the closest notion to CPU time is "gas". Gas is consumed during execution.

The data mapping keeps multiple versions of all cells inside a row. During a scan you would have to loop through it, consuming gas.

The second mapping, latestData, holds the most recent version of each cell, which provides an optimized get path. This will hopefully make more sense after seeing how set and get are implemented.

Implementing get

function get(string rk, string column) public constant returns (string value) {
    return latestData[rk][column];
}

'That was easy' button pressed. In short, we know that latestData is a mapping of mappings (string => string) designed specifically to make this lookup "optimal". Somehow the pointers around the blockchain manage this scalable structure.

Implementing set

This is meatier than get. The purpose of writeCell is to manipulate both mapping structures.

function writeCell(string rk, string column, string value, int256 version) public returns (int code) {

    Right[] memory r = data[rk];
    //If r is empty simply add
    if (r.length == 0) {
        data[rk].push(Right({column: column, value: value, version: version}));
        latestData[rk][column] = value;
        return 10;
    }

Note: during debugging I added some return codes to make it easier to see which branches were hit and to avoid using the built-in Remix debugger. 'return 10' above is an example.

    int256 highestVersion = int256(-1);
    int256 highestIndex = int256(-1);
    //figure out what the max value is for real
    uint256 lowestIndex = uint256(10000);
    int256 lowestVersion = int256(10000);
    uint256 matchedCount = uint256(0);
    for (uint256 i = 0; i < r.length; i++) {
        if (StringUtils.equal(r[i].column, column)) {
            matchedCount = matchedCount + 1;
            if (r[i].version > highestVersion) {
                highestIndex = int256(i);
                highestVersion = r[i].version;
            }
            if (r[i].version < lowestVersion) {
                lowestIndex = i;
                lowestVersion = r[i].version;
            }
        }
    }

Above is the standard pattern: iterate the list and find the highest- and lowest-versioned cells.

    //if not found add it
    if (matchedCount == uint256(0)) {
        latestData[rk][column] = value;
        data[rk].push(Right({column: column, value: value, version: version}));
        return 20;
    }



This is where the bulk of the latest-version logic is implemented. The only magic being done here is that we replace the oldest cell if the matched cell count reaches keepVersions. This keeps the structure inside a rowkey a fixed size per column. We could do more to sort it, etc., but manipulation also uses gas, so it becomes a workload optimization problem at that point.

    if (version > highestVersion) {
        if (matchedCount >= keepVersions) {
            latestData[rk][column] = value;
            data[rk][lowestIndex] = Right({column: column, value: value, version: version});
            return 30;
        } else {
            latestData[rk][column] = value;
            data[rk].push(Right({column: column, value: value, version: version}));
            return 40;
        }
    }
    return 50;
}
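As a sanity check on the branches above, here is my hypothetical Python rendering of writeCell with the same return codes (10, 20, 30, 40, 50). It reflects my reading of the contract, not code generated from it.

```python
# Hypothetical Python sketch of writeCell's branch logic; keep_versions
# bounds how many versions of one column are stored per rowkey.
def write_cell(data, latest, rk, column, value, version, keep_versions):
    row = data.setdefault(rk, [])
    if not row:                        # empty row: just add (code 10)
        row.append((column, value, version))
        latest.setdefault(rk, {})[column] = value
        return 10
    matches = [(i, c) for i, c in enumerate(row) if c[0] == column]
    if not matches:                    # column not present yet (code 20)
        latest.setdefault(rk, {})[column] = value
        row.append((column, value, version))
        return 20
    highest = max(c[2] for _, c in matches)
    lowest_i = min(matches, key=lambda m: m[1][2])[0]
    if version > highest:
        latest.setdefault(rk, {})[column] = value
        if len(matches) >= keep_versions:
            row[lowest_i] = (column, value, version)  # evict oldest (code 30)
            return 30
        row.append((column, value, version))          # room left (code 40)
        return 40
    return 50                          # stale write: keep the old latest

data, latest = {}, {}
print(write_cell(data, latest, "rk", "c", "v1", 1, 2))  # 10
print(write_cell(data, latest, "rk", "c", "v2", 2, 2))  # 40
print(write_cell(data, latest, "rk", "c", "v3", 3, 2))  # 30 (evicts version 1)
```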

Running it from Remix

Remix makes it pretty simple to get things running. You cannot do "everything", but for a JavaScript editor I was impressed. Here is a dialog where I try the constructor and then call the writeCell and get functions.

Maybe next we can implement scan or something fun. But there you have it, NoSQL + BlockChain. If you are interested in this as a unicorn please ping me! I want to get rich and retire.

I checked in the code here https://github.com/edwardcapriolo/ethereum-nosql

Friday Mar 03, 2017

Support Hive


My important email to hive-dev, discussing actions by those in the Apache Spark community.


I have compiled a short (non exhaustive) list of items related to Spark's
forking of Apache Hive code and usage of Apache Hive trademarks.

The original spark proposal repeatedly claims that Spark "inter operates"
with hive.


"Finally, Shark (a higher layer framework built on Spark) inter-operates
with Apache Hive."

(EC note: Originally spark may have linked to hive, but now the situation
is much different.)

Spark distributes jar files to maven repositories carrying the hive name.


(EC note: These are not simple "ports"; features are added/missing/broken in
artifacts named "hive")

Spark carries forked and modified copies of hive source code


Spark has "imported" and modified components of hive


(EC note: Further discussions of the code make little to no reference to its
origins in propaganda)

Databricks, a company heavily involved in spark development, uses the Hive
trademark to make claims


"The Databricks platform provides a fully managed Hive Metastore that
allows users to share a data catalog across multiple Spark clusters."

This blog defining hadoop (draft) is clear on this:

"Products that are derivative works of Apache Hadoop are not Apache Hadoop,
and may not call themselves versions of Apache Hadoop, nor Distributions of
Apache Hadoop."



"Apache Spark supports multiple versions of Hive, from 0.12 up to 1.2.1. "

Apache spark can NOT support multiple versions of Hive because they are
working with a fork, and there is no standards body for "supporting hive"

Some products have been released that have been described as "compatible"
with Hadoop, even though parts of the Hadoop codebase have either been
changed or replaced. The Apache™ Hadoop® developer team are not a standards
body: they do not qualify such (derivative) works as compatible. Nor do
they feel constrained by the requirements of external entities when
changing the behavior of Apache Hadoop software or related Apache software.

The spark committers openly use the word "take" during the process of
"importing" hive code.

"are there unit tests from Hive that we can take?"

Apache foundation will not take a hostile fork for a proposal. Had the
original Spark proposal implied they wished to fork portions of the hive
code base, I would have considered it a hostile fork. (this is open to

(EC Note: Is this the Apache way? How can we build communities? How would
small projects feel if, for example, hive "imported" (copied) code while they
sat in incubation?)

Databricks (after borrowing slabs of hive code, using our trademarks, etc)
makes disparaging comments about the performance of hive.


"Spark-based pipelines can scale comfortably to process many times more
input data than what Hive could handle at peak. "

(EC Note: How is this statement verifiable?)


It's easily enough added, to the code, there's just the risk of the fork
diverging more from ASF hive.

(EC Note: Even those responsible for this admit the code is diverging and
will diverge more from their actions.)


My opinion of all of this:
The above points are hurtful to Hive. First, we are robbed of community.
People could be improving hive by making it more modular, but instead they
are improving Spark's fork of hive. Next, our code base is subject to
continued "poaching". Apache Spark "imports", copies, alters, and claims
compatibility with/from Hive (I pointed out above why the compatibility
claims should not be made). Finally, we are subject to unfair performance
comparisons, "x is faster than hive", by software (spark) that is

*POWERED BY Hive (via the forking and code copying).*

Hive has always had a bullseye on it as the best hadoop SQL

In my hood we have a saying, "Haters gonna hate"

For every Impala and every Spark claiming to be better than hive, there are
10 HadoopDBs that collapsed under their own weight. We outlasted
fleets of them.

That being said, software like the Hive Metastore is our baby. It is our TM. It is
our creation. It is what makes us special. People have the right to fork it
via the license. We can not stop that. But it can't be both ways: either
downstream needs to bring in our published artifacts, or they fork and give
what they are doing another name.

None of this activity represents what I believe is the "Apache Way". I
believe the Apache Way would be to communicate with us, the hive community,
about ways to make the components more modular and easier to use in other
projects. Users suffer when the same code "moves" between two projects:
there is fragmentation, and typically it leads to negative effects for both.




Saturday Feb 04, 2017

Forget Bowling Green, worry about the Lusitania

There has been a lot of outrage, talk, and comedy about the Bowling Green Massacre. While revolting in its own way, it is not new ground for the current state of affairs in the US. We know where the administration stands on immigration and we know this is a divisive issue. Sides are dug in and we all know where we stand. However, I find two events far more troubling:

The first event is where the White House Press Secretary, Sean Spicer, claimed Iranians attacked a US ship. This claim was instantly debunked inside the press conference when a savvy reporter pointed out that the ship was a Saudi ship. Now, why is this a big deal? Well, in school we learned that the sinking of the RMS Lusitania is what drew the US into World War I. Claiming that Iran attacked a US ship is a big deal.

For the second event, examine the US raid on Yemen in which a US soldier and several civilians were killed. After the event the US released videos that they claim were captured during the raid. It was very quickly determined that the videos were not captured in the raid. Instead, the videos were 10 years old.

These two events are shocking. Either the administration is willfully misleading us, or they are grossly incompetent. Which is worse: being the White House press secretary and confusing whether a ship is US or Saudi, or purposely manipulating us? When the country is a tinderbox of opinions, what purpose does pulling a video out of the archives and parading it as a result serve?

Knee-jerk twitter reactions are unavoidable now. Even if something is "retracted" a day later, the damage is already done. Had a reporter not been 'Johnny on the spot' during Spicer's blunder there would have been a massive fallout and more alternative-facts that might never be corrected in the mind of some. The result could be more far reaching than executive orders on immigration.

Thank you for listening 


Sunday Jan 29, 2017

Apache Incubator Gossip first release

In case you are wondering where I have been in the blogging world: a good portion of my free time has been spent with https://github.com/apache/incubator-gossip which is now in the apache incubator.



We are building a lot of very cool stuff, including talk of adding support for the SWIM protocol.

Friday Oct 28, 2016

Feature Flag dark launch library from the guy who has his own everything


Hopefully you do not deploy "dangerous" code, but sometimes if you want to "get dangerous" you might only want to roll the code out to a portion of your users. So if someone says "let's get dangerous", think


If you really want a good argument for why you should have this, read this thing at Facebook

Tuesday Oct 18, 2016

What is collusion?


I just happened upon this article: 

WIKILEAKS : More Media Collusion – HuffPo Reporter Emails Hillary Camp to WARN Them About FOX NEWS Programming


First, huffpost bloggers are not the same as huffpost reporters. I have a huffpo blog and I am not affiliated with huffpost anymore. (I am not sure if the writer mentioned in the article is a blogger or an employee.)

Second, the huffingtonpost has been openly against trump from the beginning. At the end of every trump article is this footer: http://www.huffingtonpost.com/entry/donald-trump-women-sick_us_5804d6ece4b0e8c198a8fb66

Editor’s note: Donald Trump regularly incites political violence and is a serial liar, rampant xenophobe, racist, misogynist and birther who has repeatedly pledged to ban all Muslims — 1.6 billion members of an entire religion — from entering the U.S.

Third, the definition of collusion:

secret or illegal cooperation or conspiracy, especially in order to cheat or deceive others.

The huffingtonpost has declared it does not like trump; it is not a secret. I can not speak to what is legal, but without the secret you can not have a conspiracy, and I also do not see how the information is deceptive.


Tuesday Mar 29, 2016

Deus Ex: The Fall - Ed's review

I have decided to change gears a bit and review one of my favorite Android games, Deus Ex: The Fall. I was a big fan of Deus Ex 3, which came out on the Xbox. For those not familiar, Deus Ex is a sneak shooter. I actually play 'The Fall' on train rides home; it took me a few months of playing it periodically to beat it.

What makes this game special? 

In the near future humans can be outfitted with augmentations, "augs". They do things like steady your gun arm, provide mimetic camouflage, etc. The way Deus Ex balances the game is that you can not afford all the augs, so you pick and choose ones that match your game play. For example, if you like the run-and-gun type, you focus on body armor, speed enhancements, and take downs, but if you want to sneak around you focus on stealth enhancements.

What makes a BAD sneak shooter ?

What makes a bad sneak shooter is huge missions. When you are walking through a warehouse and you have to choke out 500 people over 4 hours of game play, it is just annoying. Think about it: could you imagine that in three or four hours no one realized that 500 security guards had not checked in? Or that in 4 hours the one guy at the computer would not go for a bathroom break and just happen to look in one of the 90 lockers you have hidden bodies in? Just not possible and kinda silly.

Why does 'The Fall' avoid this ?

Well, obviously this is an Android game, so by its nature it avoids huge levels. This actually gives the game the right feel: they are small levels with a few rooms, you execute a few tactical take downs, and you get a reward! In the Xbox game a lot of time is spent moving/hiding bodies so as not to alert others and bring about a free-for-all. In 'The Fall' the bodies just vanish after a few seconds. Bodies vanishing is not realistic, but I think it goes with the style: you knock someone out and you move on. When I play I simply force myself into the mind of a character and play a 'realistic' way. There is no way an augmented human is going to huddle in a corner waiting for 3 hours for 3 different people to be in the "perfect place". You just make a move and be damned with the consequences.


I was rather impressed with the controls; in fact, I enjoyed them more than the console version. On screen you can switch weapons fast, and icons appear when you are in take down range. A rather cool thing is that in the settings menu you can adjust the placement of each of the on-screen controls. I was super impressed by this. I really did not have to move anything, but the fact that you could I thought was pretty neat.


One thing I enjoy is that around the game there are PDAs and computers that you can read or hack into to get some back story and hints into what is unfolding. I really like that in all games; they did this in Gears of War with journals and cogs. The nice part is this is always optional. You are not forced to watch 10 minute movies, but if you care you can review the data in the world better. You can also talk to random people like in a standard RPG, and while they do not have a ton to say, that is still pretty cool.



You are an ex special forces character with augs, drawn into something bigger than you. You are living below the radar and have to go on a variety of missions to acquire the drugs that keep you from rejecting your augs. As that goes down you have to deal with people who offer you what you need in exchange for your services, and you are free to embark on side quests. For a 99 cent Android game this plot is amazing, and it would still be a fairly in-depth plot for a console game.


Flexible game play, a large environment to explore, upgradable character attributes, and upgradeable weapons. Nice graphics and controls for a cell phone game. It retained a lot of the feel of the Xbox game while moving to a cell platform.


While it is a sneak shooter, the game is more biased towards the sneak; even with armor upgrades a couple of well placed shots from enemies can put you down. The game is less fun to play as a shooter IMHO. Environments seem more detailed than characters.


If you like the console game and you have a 30 minute train ride home every day, this game is amazing. Since it is an older game it is totally worth the cost of ~$0.99. I would still happily pay 3 or 4 dollars for it.

My score is 9.  

Wednesday Mar 09, 2016

Great Moments in Job Negotiation: Volume 1

Huffington Post is my current employer. Huffington Post is owned by AOL. The interview process has to go through two stages of HR. At the time, the head of AOL also approved each hire.

After multiple interviews with multiple people over three weeks I finally got my offer letter.

I replied to the recruiter, "This is a nice offer, but if I don't have a floppy disk in my mailbox by Monday with 30 free hours of AOL, the deal is off."

Sunday Mar 06, 2016

Rasp PI 3 is here

Up until this point you have had to attach Wi-Fi or a 4G card to your 'internet of thing'. Well, no more! The new Raspberry Pi 3 has built-in wireless networking. This is going to get interesting.



Wednesday Feb 17, 2016

Python users / Data Scientists measuring PITA levels

Before I get started trashing people, let me say I have the greatest respect for former and current colleagues, but there is a large looming problem that needs to be addressed.

The fanboy level of Python usage in people, mainly data scientists, needs to stop.

A sick, blind devotion to python, completely unchecked by reason

I was talking to a Python user about Spark:
Me: "What were you looking to use spark for?"
Them: "I hear there is PySpark."
Me: "Yes, very interesting. What are you looking to use it for?"
Them: "PySpark."

ROFL: the only takeaway about the spark platform is PySpark? Nothing else seemingly was interesting or caught your attention? Really, nothing about streaming or in-memory processing, just PySpark? lol #blinders

You would think [data] scientists want to learn things?

I encounter this debate mostly with hive-streaming. When someone asks me about hive streaming I look at the problem. Admittedly, there are a couple of tasks most easily addressed with streaming. But the majority of streaming things can be solved much more efficiently and correctly by writing a simple UDF/UDAF in Java. What is the common reply when a Hive committer, who wrote a book on hive, explains unequivocally that a UDF is better for performance, debugging, and testability, and is not that hard to write?

"I don't want to learn how to compile things | learn about java | learn about what you think is the right way to do things." You would think that a data scientist searching for great truths would actually want to find the best way to use a tool they have been working with for years.

Just to note: in hive streaming everything moves between processes via pipes, and there are something like 4 context switches and two serializations for each row (not including the processing that has to happen in the pipe).

I don't care that 100% of the environment is Java, im f*ckin special

A few years back someone (prototyping in python) suggested we install LibHDFS. Later someone suggested we install WebHDFS. The only reason to install these things is that they must use python to do things, even if there already are prior examples of doing this exact task in java in our code base. Sysadmins should install new libraries, open new ports, monitor new services, and we should change our architecture, just because the python user does not want to use Java for a task that 10 previous people have used Java for.

"I'm Just prototyping"

This is the biggest hand wave. When scoping out a new project, don't bother looking for the best tool for the job. Just start hacking away at something, and then whatever type of monstrosity appears, just say it's already done; someone will have you jam it into production anyway. Good luck supporting the "prototype" with no unit tests in production for the next 4 years. You would think that someone would take the lead from a professional coder and absorb their best practices. No, of course not; they will instead just tell you how best practices don't apply to them. #ThisISSparta!

Anyway, it's 7:00 am and I woke up to write this so that I can vent. But yeah, it's not python, it's not data scientists; there is just a hybrid intersection of the two that is so vexing.


Friday Jan 22, 2016

My day

[edward@bjack event-horizon-app]$ git log
commit 9de21fbc97a7f573f6b0564daff20f5ce23c723e
Author: Edward Capriolo <edward.capriolo@.com>
Date:   Fri Jan 22 16:14:20 2016 -0500

    Ow yes yaml cares about spaces...beacause ansible

commit de07401a0087e86253cbf9c0369010e21d248eb9
Author: Edward Capriolo <edward.capriolo@.com>
Date:   Fri Jan 22 16:10:57 2016 -0500

    Why not

commit 0be598151962f647528406bad21b3b8c8e887ffd
Author: Edward Capriolo <edward.capriolo@com>
Date:   Fri Jan 22 16:05:06 2016 -0500

    This is soo much better than just writing a shell script

commit 4f4ea0b8b462a61e3ecde71ff656da9e1324095b
Author: Edward Capriolo <edward.capriolo@.com>
Date:   Fri Jan 22 16:01:53 2016 -0500

    Why dont we have a release engineer

commit b77264618f2fbe689ecc09e4575e10935ba20600
Author: Edward Capriolo <edward.capriolo@.com>
Date:   Fri Jan 22 15:57:56 2016 -0500


commit 912597f1ba4284a5312398ad770f6fd1d76301a1
Author: Edward Capriolo <edward.capriolo@.com>
Date:   Fri Jan 22 15:52:21 2016 -0500

    The real yaml apparently

commit ee64c5c4340202b95a0f05784f30b63abd755d2d
Author: Edward Capriolo <edward.capriolo@.com>
Date:   Fri Jan 22 15:32:28 2016 -0500

    Always asume kill worked. so we can start if nothing is running

Tuesday Jan 12, 2016

'No deal' is better than a 'Bad Deal'

After working for a few companies, a few things have become clear to me. Some background: I have been at small companies with no code, large companies with little code, small companies with a lot of code, and large companies where we constantly re-write the same code.

I was watching an episode of 'Shark Tank'. Contestant X had a product, call it 'Product X', and four of the five sharks offered nothing. The 5th shark, being very shark-like, used this opportunity to offer a 'bad' deal. The maker of 'Product X' thought it over, refused the deal, and left with no deal. The other sharks were more impressed with 'Contestant X' than 'Product X'. They remarked, "No deal is better than a bad deal". This statement is profound, and software products should be managed the same way.

Think about the phrase tech-debt. People might say tech-debt kills your agility. But it is really not the tech-debt alone that kills your agility; it is 'bad deals' that lead to tech-debt. As software gets larger it becomes harder to shape and harder to manage. At some point software becomes very big, and change causes a cascade of tech-debt. Few people want to remove a feature. Think about 'Monkeys on a Ladder', and compare this to your software. Does anyone ever ask you to remove a feature? Even if something is rarely or never used, someone might advocate keeping it, as it might be used later. Removing something is viewed as a loss, even if it really is addition by subtraction. Even if no one knows who asked for this rule, people might advocate keeping it anyway! Heck, even if you find the person who wanted the feature and they are no longer at the company, and no one else uses it, people might advocate keeping it anyway!

The result of just-keep-it thinking is that you end up keeping around code you won't use, which prevents you from easily adding new code. How many times have you heard someone say, 'Project X (scoff)!? That thing is a mess! I can re-write that in scala-on-rails in 3 days'? Four weeks later, when Project-X-on-scala-on-rails is released, a customer contacts you about how they were affected because some small business rule was not ported correctly due to an oversight.

The solution to these oversights is not test coverage or sprints dedicated to removing tech-debt. The solution is never to make a bad deal. Do not write software with niche cases. Do not write software with surprising rules. The way I do this is a mental litmus test: take the exit criteria of an issue and ask yourself, "Will I remember this rule in one year?" If someone asks you to implement something and you realize it was implemented a year ago and no one ever used it, push back and let them know the software has already gone in this direction and it led nowhere. If you're a business and you're struggling to close deals because the 'tech people' cannot implement X in time, close a deal that does not involve X.

'No deal' is better than a 'Bad Deal'

'No code' is better than 'Bad Code'

'No feature' is better than 'Bad Feature' 



Saturday Dec 26, 2015

Introducing TUnit

Some of my unit tests have annoying sleep statements in them. I open sourced TUnit to change this.

The old way:

Thread.sleep(10000); // hope the cluster has converged by now
Assert.assertEquals(2, s[2].getClusterMembership().getLiveMembers().size());

The new way:

TUnit.assertThat(new Callable<Object>() {
    public Object call() throws Exception {
        return s[2].getClusterMembership().getLiveMembers().size();
    }
}).afterWaitingAtMost(11, TimeUnit.SECONDS).isEqualTo(2);
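The idea behind this style of assertion can be sketched in a few lines: re-evaluate the Callable until it returns the expected value or a deadline passes, instead of sleeping a fixed amount and asserting once. This is an illustrative sketch of the technique, not TUnit's actual internals; the names here are made up.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Minimal sketch of a poll-until-timeout assertion, in the spirit of TUnit.
public class PollAssert {
    // Polls the Callable until it yields the expected value or the
    // deadline passes; fails with the last observed value otherwise.
    public static void assertEventually(Callable<Object> actual, Object expected,
            long timeout, TimeUnit unit) throws Exception {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        Object last = null;
        while (System.nanoTime() < deadline) {
            last = actual.call();
            if (expected.equals(last)) {
                return; // condition met, possibly long before the timeout
            }
            Thread.sleep(50); // small poll interval between retries
        }
        throw new AssertionError("expected " + expected + " but was " + last);
    }
}
```

The win over a bare sleep is that the test returns as soon as the condition holds, and the timeout is an upper bound rather than a fixed cost paid on every run.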

You can see this in action here.