Tableau Filter the data on display and not the underlying data
I was recently set a challenge to filter a chart based on something that happened in the past, but still to show all time periods. The specifics of the task was to look at the number of employees a company had in 2010, and depending on that amount either show or hide all years for those companies that had the right amount of employees. In other words create a ‘Number of employees in 2010′ filter.
The dataset contained only 3 fields, Company, Year and Employees.
Initially it seems simple. Create a calculated field to calculate the number of employees the company had in 2010 and use that as a filter. This is what I did, the field was called [2010 Employee Count] and the formula I used was SUM(IF [Year] = 2010 THEN [Employees] END). Next I had to set up the visualisation to test what I created worked.
I dragged the Year to the columns shelf, Employees to the rows shelf and summed it, the Company to the colour shelf, giving me a line graph with annual employee counts over time split by company. This gives me the following line chart:
Next I tested the filter by dragging the calculated field [2010 Employee Count] to the filter shelf and using the At Least filter to show all companies with over X employees in 2010. This is where it became evident that this wasn’t going to work – the filter affected the entire dataset and not only the selected companies as can be seen below:
The reason this filter doesn’t work is because it’s returning Null for all years that are not 2010.
The challenge was to work out how to filter the data that was already displayed and not the entire underlying dataset. I was browsing the internet for solutions to this and came across a comment on the forums that if using a table calculation as a filter this only filters the data on display and not the entire source.
This meant a change in strategy to complete this task. First of all I had to change the calculated field [2010 Employee Count] to be a table calculation. I used the WINDOW_SUM as I still wanted to sum the number of employees in 2010. The formula became:
WINDOW_SUM(SUM(IF [Year] = 2010 THEN [Employees] END))
The default calculate by table(across) works fine in this case but make sure when using the field that the results are calculated along Year for each Company.
If this new field is dragged into the filters, exactly as like previously, it will now work showing all years for all companies with the selected amount of employees in 2010.
To tidy this up I actually used an integer parameter to select the minimum number of employees and created another calculated field based on the parameter value and the 2010 employee count returned by the calculated field. The filter field [2010EmployeeFilter] is
IF [2010EmployeeCount] >= [MinEmployees2010] THEN 1 ELSE 0 END
Put this into the filter shelf, make sure it’s the only filter on the shelf, set the value to be At Least 1 and then you have the same as I created above.
The lesson from this exercise is about using table calculations as filters as they only filter the results and don’t filter the entire dataset.