Engineering Analytics

Unlock True Engineering Intelligence: Mastering Custom Formulas in Keypup

Thomas Williams

June 10, 2025

min

Back to blogs

Table of Content

In today's data-driven software development landscape, generic dashboards and pre-canned reports often fall short. Engineering leaders and teams need the power to slice, dice, and interpret their data in ways that answer their specific questions, reflect their unique workflows, and drive their strategic goals.

This is where the true magic of Keypup shines: Custom Formulas.

Imagine having a Business Intelligence (BI) tool, but one meticulously designed for the intricacies of software development data. That's the level of flexibility Keypup's custom formulas unlock. No longer are you confined to out-of-the-box metrics. You can now build bespoke dimensions and metrics that precisely model your team's performance, project health, and operational efficiency. This isn't just reporting; it's deep, actionable engineering intelligence.

Why Embrace Custom Formulas? The Power of Tailored Insights

Go Beyond the Standard: While Keypup offers a rich set of default metrics, your team's needs might be unique. Custom formulas let you define what "Lead Time" truly means for your process, or how you categorize PR complexity.
Answer Unique Business Questions: "What's the average time spent by senior developers on bug-fix PRs for Project X?" or "What percentage of our features are delivered without any P1/P2 bugs in the subsequent sprint?" Custom formulas can tackle these.
Combine Data in Novel Ways: Mix and match fields, apply conditional logic, and perform calculations across different data points to uncover correlations and trends you might otherwise miss.
Total Flexibility and Control: You define the logic, the calculations, and the categorizations. This empowers you to create dashboards that are not just informative, but transformative for your decision-making.
Evolve with Your Processes: As your development practices change, your custom formulas can be adapted, ensuring your reporting always stays relevant.

How Custom Formulas Work in Keypup: The Basics

At its core, a Keypup custom formula is a single-line expression that you write to compute a new value. This value can then be used as a dimension (for grouping or categorizing data) or a metric (a numerical value, often aggregated).

These formulas operate on the rich data available within Keypup's datasets (or "facts"), such as:

ISSUES & PULL REQUESTS: Your central hub for issues and pull requests.
ACTIVITY EVENTS: Granular events occurring on issues/PRs (e.g., status changes, assignments).
COMMITS: Detailed information about individual code commits.
COMMENTS: Textual interactions on issues, PRs, and reviews.
REVIEWS: Data related to pull request review processes.

Each field within these datasets (like created_at, lines_changed, author_username, label_names, state) becomes a variable you can use in your formulas.

The Anatomy of a Keypup Formula: Operators and Functions

Keypup's formula engine supports a wide array of operatorsand functions, giving you extensive capabilities:

1. Standard Operators:

Arithmetic: + (plus), - (minus), * (multiply), / (divide), ^(power, e.g., 2^3), % (modulo, e.g., 17 % 3).
Comparison: == (equal to), != (not equal to), > (greater than), >= (greater than or equal to), < (less than), <= (less than or equal to).
Logical Combinators:
- AND or &&: True if both expressions are true.
- OR or ||: True if at least one expression is true.
- NOT(expression) or ! (expression): Inverts the boolean value of an expression.
Ternary Operator: A concise way to write an if-else statement: condition ? value_if_true : value_if_false.

2. Data Types in Formulas:

Strings: Enclosed in double (") or single (') quotes.Example: "PULL_REQUEST", 'OPEN'.
Integers: Whole numbers. Example: 23, 100.
Floats: Numbers with decimals. Example: 14.3.
Booleans: true or false.
NULL: Represents the absence of a value.

3. Working with Dates & Times:

Keypup handles dates and datetimes in ISO8601 format.

Date Arithmetic:
- Subtracting two dates (date2 -date1) returns the difference in seconds.
- You can add or subtract a duration (in seconds) from a date.
Time Constants: To convert seconds into more readable units, divide by time constant functions:
- HOUR(): Seconds in an hour (3600).
- DAY(): Seconds in a day (86400).
- WEEK(), MONTH(), QUARTER(),YEAR(): Standardized second counts for these periods.
- Example: (date2 - date1) / DAY() gives the difference in days.
NOW(): Returns the current date and time. Incredibly useful for calculating ages or durations against the present.
Date Manipulation & Formatting Functions:
- BEGINNING_OF_HOUR(datetime), BEGINNING_OF_DAY(datetime), etc.: Truncate a datetime to the start of the specified period.
- END_OF_HOUR(datetime), END_OF_DAY(datetime), etc.: Get the end of the specified period.
- YEAR(datetime), MONTH(datetime), DAY(datetime), WEEK(datetime): Extract parts of a date.
- YEAR_MONTH(datetime), YEAR_QUARTER(datetime), YEAR_WEEK(datetime): Format dates for grouping (e.g.,"2023-Q4").
- DATE(year, month, day): Construct a date.

4. Array Magic:

Many fields in Keypup datasets are arrays (e.g. label_names, assignee_usernames).

Syntax: ["foo", "bar"] for string arrays, [1, 2, 3] for integer arrays.
Accessing Elements: my_array[0] (first element), my_array[-1] (last element). AT_INDEX(my_array, index) is an alternative.
Concatenation: array1 + array2 (arrays must be of the same type).
Removal: array1 - array2 (removes elements in array2 from array1).
FLATTEN(array_field) (Dimension Formulas Only): This powerful operator creates a new record for each value in the array. For example, FLATTEN(label_names) would allow you to count items per individual label, rather than per combination of labels. It must be a top-level operator.
Array Functions:
- LENGTH(array_or_string): Returns the number of elements or characters.
- CONTAINS(list,term_or_list_of_terms): Checks if a list contains one or more specified terms.
- CONTAINS_ALL(list,term_or_list_of_terms): Checks if a list contains all specified terms.
- CONTAINS_EXACTLY(list,term_or_list_of_terms): Checks for exact, order-insensitive equality between two lists.
- ARRAY_FIND(list, value1, value2,...): Returns the first value from value1, value2,... found in list. Useful for prioritizing.
- ARRAY_FILTER(list, regex): Returns a new array containing only elements from list that match the regex.

5. Text Manipulation & Pattern Matching:

CONCAT(value1, value2, ..., valueN): Joins multiple values into a single string. You can also use + for string concatenation (e.g.,"Hello " + world_field).
Regex Matching:
- string ~ regex_pattern: Returns true if string matches the regex_pattern. (e.g., title ~ "bug")
- string !~ regex_pattern: Returns true if string does NOT match.
- MATCH(string, regex_pattern): Equivalent to ~.
- NOT_MATCH(string, regex_pattern): Equivalent to !~.
- IF_MATCH(target_string, regex1,result1, regex2, result2, ..., else_result): Evaluates a string against multiple regexes and returns the corresponding result for the first match.
Substring Extraction:
- LEFT(string, num_chars): Extracts characters from the left.
- RIGHT(string, num_chars): Extracts characters from the right.
SPLIT(string, delimiter): Splits a string into an array of substrings based on a delimiter.

6. Conditional Logic & Control Flow:

IF(condition, value_if_true, [value_if_false]): The classic conditional. Supports else if by adding more condition/value pairs.
IF_NULL(value, fallback_value): Returns value if it's not NULL, otherwise returns fallback_value. Both must be of the same type.
IF_ZERO(value, fallback_value): Returns value if it's not 0, otherwise returns fallback_value. Both must be of the same type.
BETWEEN(value, start, end): Checks if value is inclusively between start and end.
IN(value, list_of_values): Checks if value is present in the list_of_values.

7. Type Conversion:

TO_INT(value): Converts a value to an integer.
TO_FLOAT(value): Converts a value to a float.
TO_STR(value): Converts a value to its string representation.

Dimensions vs. Metrics: Understanding the Difference

Custom formulas can be used for both dimensions and metrics,but their roles and rules differ:

Dimension Formulas:
- Purpose: To group, categorize, or label your data. The result of a dimension formula is often a string, boolean, or a transformed date.
- Operators: Can use all standard operators and dimension-specific functions (like FLATTEN, YEAR_MONTH, ARRAY_FIND).
- No Aggregators (Generally): Dimension formulas themselves do not perform aggregations like SUM or AVG.FLATTEN is a special case that restructures data before aggregation by a metric.
- Example: IF(lines_changed >500, "Large PR", "Small PR") - This creates a new dimension to group PRs by size.

Metric Formulas:
- Purpose: To perform calculationsa nd return a numerical result that is typically aggregated.
- Operators: Can use all standard operators, dimension functions, AND must include at least one metric aggregator function.
- Aggregators are Key: Functions like COUNT(), SUM(field), AVG(field), MAX(field), MIN(field),COUNT_IF(condition), SUM_IF(condition, field) are essential.
- Example: AVG((IF_NULL(merged_at,NOW()) - created_at) / DAY()) - This calculates the average cycle time in days.The AVG() is the crucial aggregator.

Crafting Your Custom Formulas: Practical Examples

Let's bring this to life with some examples, primarily using the ISSUES & PULL REQUESTS dataset:

1. Advanced PR Categorization (Dimension Formula)

Purpose: Categorize PRs based on size and whether they resolve issues.

Formula:

Explanation:

Checks if type is "pull_request".
If true, it concatenates a size category ("Large" or "Small") with a linkage status ("- Linked" if it resolves issues, "- Orphan" otherwise).
LENGTH(resolved_issue_sysids) > 0 checks if the PR is linked to any issues.
If not a PR, it returns "Not a PR".

Resulting Insight:

2. Average Coding Time Before First Review Request (Metric Formula - GitHub context)

Purpose: Calculate the average time from the first commit (or PR creation) to when a review is first formally requested (for platforms like GitHub that track this).

Formula:

Explanation:

LEAST(created_at, first_commit_at): Takes the earlier of PR creation or the first commit as the true start of coding.
IF_NULL(review_requested_at, NOW()): If a review hasn't been requested, uses NOW() for ongoing PRs (or you might filter for PRs where review_requested_at is not null).
The difference is converted to hours.
AVG(): Calculates the average across all relevant PRs.

Resulting Insight:

3. Percentage of "Bug" Fixes in Merged PRs (Metric Formula)

Purpose: Determine what percentage of merged PRs are tagged as fixing a "Bug".

Formula:

‍Explanation:

COUNT_IF(type == "pull_request" AND state == "MERGED" AND CONTAINS(metatags, "Bug")): Counts merged PRs that have "Bug" in their metatags.
COUNT_IF(type == "pull_request" AND state == "MERGED"): Counts all merged PRs.
IF_ZERO(..., 1): Prevents division by zero.
* 100.0: Multiplies by a float to ensure float division for percentage.
ROUND(..., 1): Rounds to one decimal place.

Resulting Insight:

Sign-up and accelerate your engineering organization today !

Request a demo

4. Workload Distribution by Issue Sub-Type (Dimension Formula)

Purpose: Group issues by a standardized set of sub-types, even if the raw sub_type field has many variations.

Formula:

Explanation:

IF_MATCH(): Checks sub_type against a series of regex patterns.
(?i): Makes the regex case-insensitive.
If "bug" or "defect" is found, it returns "Bug/Defect", and so on.
"Other" is the fallback if no patterns match.

Resulting Insight:

5. Identifying Stale Open PRs (Dimension Formula)

Purpose: Flag open PRs that haven't been updated in over 7 days.

Formula:

Explanation:

This formula will return true for stale open PRs, and false otherwise. This boolean dimension can then be used for filtering or grouping.
(NOW() - updated_at) / DAY() > 7: Checks if the time since the last update is greater than 7 days.

Resulting Insight:

The insight's drilldown provides the details of all PRs in both categories:

Tips for Success with Custom Formulas

Start Simple: Don't try to build overly complex formulas from the get-go. Start with a basic idea and add complexity incrementally.
Know Your Data: Familiarize yourself with the datasets and fields available in Keypup.
Test Incrementally: If building a complex formula, test parts of it to ensure they behave as expected.
Use Parentheses: Generously use parentheses () to control the order of operations and make your formulas easier to read and debug.
Metric Formulas Need Aggregators: This is a common pitfall. If you're creating a metric, ensure you have at least one aggregator function (e.g., SUM, AVG, COUNT).

Conclusion: Your Data, Your Rules

Keypup's custom formulas transform it from a simple reporting tool into a powerful analytics engine tailored for software development. By mastering these formulas, you gain the ability to ask nuanced questions, define precise KPIs, and uncover the insights that truly matter to your team and organization.

The journey to data-driven engineering excellence is paved with understanding, and custom formulas are your vehicle. So, dive in, experiment, and start building the reports that will illuminate your path to better software delivery. The power is now truly in your hands!

Back to blogs