Can we open source this?

Can a design system for the Norwegian postal service be open source? Can the GitHub repo be public? YES! We think that would be awesome.

But we work in a large organisation… And while rogue dev teams may be happy to open source, it’s understandable if other departments around us are not equally familiar with the idea. So we wrote up the benefits, showed that we had considered the risks and concerns, and found cool examples of other companies doing what we want to do.


Benefits ✨

Working in the open has some really cool benefits:

Community involvement

We depend on the community for code, knowledge, inspiration and tutorials every single day in the work that we do. Giving back to this community by sharing our work should very much be in the spirit of our company, too.

Transparency is cool

There are no secrets in a design system, and everything is there in the browser on the sites using it. We can’t see a reason to develop it behind closed doors.

Awesome for recruiting

We do a lot of fun work in our team, and it’s super positive to be able to show some of it off and talk about more of it. Having a public repo is like a bat-signal for developers: come join us.

Fun and useful for contributors

There’s a personal benefit for contributors who can show off their work, and even ask peers for help.

Higher standard

Knowing that anyone and everyone can see your work will hold us to a higher standard. If it helps us write better code and more useful commit messages, this is excellent.


Risks and concerns ✅

There are a couple of things we need to be aware of:

Security

Developers pushing to a public repo need to be conscious of this. Our mindset should be that the moment we push a commit, that code is public knowledge. It’s important to make sure that certain details are never part of the repo: passwords, server names, deploy scripts and so on.

  • We can write guidelines around this.
  • We can have an onboarding for new contributors.
  • We can consider Git Hooks as a preventive measure.
  • We can consider having a private repo with a public mirror. But this is perhaps more relevant for applications with a higher risk profile, in case of accidentally pushing code that shouldn’t be pushed.

Privacy

Developers working with the design system could feel that a level of transparency they don’t want is being imposed on them. The people involved right now are excited about a public repo, but it’s possible to imagine someone else later feeling differently about this.

  • If someone joining the project wants anonymity, they could use whatever GitHub account they want. It doesn’t have to contain their real name, or be the same one they’ve used before.

Licensing

Code in a public repo needs a license. We need to make sure that there are no conflicts between the license we choose, and the licenses of other parts of code we use.

  • Which license should our code have?
  • Which licenses are there on other code that we use?
  • Are there any conflicts between these licenses?
  • We can consider hosting fonts, icons and assets in a different repo.

Credits

Hat tips are polite anyway, but especially important in a public repo.

  • We can maintain a credits.md to say thank you for any code, tools and ideas we use.

Worth the extra effort?

There is potentially “more work” created than when developing behind closed doors. It holds us to a higher standard on everything from hacks to commit messages to discussions in issues.

  • We can change our mind. If it turns out the benefits do not outweigh the extra effort, we can revert to private.

Do any other companies do this? 🤔

They sure do. Plenty of companies keep their design systems, UI frameworks or pattern libraries in public repos.

Our favourite example is Salesforce and their Lightning Design System. It’s useful to point to their repo and show that absolutely anyone with access to the internet can see both code and commits. And if this is okay for a company like Salesforce — we can do it too.


So where is the design system…? We’re ironing out final details, but stay tuned and we’ll get there soon.

Connection Pooling with c3p0

We at Bring have been using c3p0 in several of our applications over the past few years. Recently we faced an issue where we started to see lots of CannotGetJdbcConnectionException errors. For reference, the underlying database is MySQL.

At first we thought the issues were related to a migration to a new data center, but we ruled that out once we saw that there were not enough slow queries to back up that theory.

Stacktrace:

org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: An attempt by a client to checkout a Connection has timed out.
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:80)
	at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:628)
	at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:693)
	at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:725)
	at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:735)
	at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:790)
Caused by: java.sql.SQLException: An attempt by a client to checkout a Connection has timed out.
	at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:118)
	at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:77)
	at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:690)
	at com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource.getConnection(AbstractPoolBackedDataSource.java:140)
	at sun.reflect.GeneratedMethodAccessor93.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at net.bull.javamelody.JdbcWrapper$3.invoke(JdbcWrapper.java:765)
	at net.bull.javamelody.JdbcWrapper$DelegatingInvocationHandler.invoke(JdbcWrapper.java:285)
	at com.sun.proxy.$Proxy52.getConnection(Unknown Source)
	at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
	at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)
	... 99 more
Caused by: com.mchange.v2.resourcepool.TimeoutException: A client timed out while waiting to acquire a resource from com.mchange.v2.resourcepool.BasicResourcePool@7d3c22a5 -- timeout at awaitAvailable()
	at com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable(BasicResourcePool.java:1467)
	at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:644)
	at com.mchange.v2.resourcepool.BasicResourcePool.checkoutResource(BasicResourcePool.java:554)
	at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutAndMarkConnectionInUse(C3P0PooledConnectionPool.java:758)
	at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:685)
	... 108 more

Upon digging further, we found that:

  • There are not enough slow queries in the database to explain these errors.
  • The thread dumps for this application look odd: lots of threads are stuck waiting on something called GooGooStatementCache in c3p0.
  • A heap dump reveals several hundred NewPreparedStatement objects from c3p0.
  • The application logs contain lots and lots of “APPARENT DEADLOCK” WARN entries from c3p0, indicating that the problem comes from the connection pool library, not MySQL.

We then analysed the application logs further; they looked something like this:

WARN  [c.ThreadPoolAsynchronousRunner] [user:] - com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@15046d35 -- APPARENT DEADLOCK!!! Creating emergency threads for unassigned pending tasks!
WARN  [c.ThreadPoolAsynchronousRunner] [user:] - com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@15046d35 -- APPARENT DEADLOCK!!! Complete Status:

These CONNECTION TIMEOUT and APPARENT DEADLOCK warnings were how we identified this as a c3p0 issue. We then found more details on Stack Overflow, where a reply from Steve Waldman, the developer of c3p0, helped us solve the problem.

Looking at our configuration, we found one property that was misconfigured: maxStatements in c3p0 was set to a high value, which enables c3p0’s global statement cache.

So we decided to set maxStatements=0, which disables the statement cache entirely, so c3p0 no longer needs to defer closing statements. Problem solved, right?
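For illustration, here is a minimal sketch of a programmatic c3p0 setup with the statement cache disabled. The class name, driver, connection details and pool sizes are placeholders for the example, not our actual configuration:

import java.beans.PropertyVetoException;

import com.mchange.v2.c3p0.ComboPooledDataSource;

// Illustrative sketch: a c3p0 DataSource with the statement cache disabled.
// The driver, URL, credentials and pool sizes below are placeholders.
public class PooledDataSourceFactory {

    public static ComboPooledDataSource createDataSource() throws PropertyVetoException {
        ComboPooledDataSource dataSource = new ComboPooledDataSource();
        dataSource.setDriverClass("com.mysql.jdbc.Driver");
        dataSource.setJdbcUrl("jdbc:mysql://localhost:3306/exampledb");
        dataSource.setUser("example_user");
        dataSource.setPassword("example_password");
        dataSource.setMinPoolSize(5);
        dataSource.setMaxPoolSize(25);
        // maxStatements = 0 turns off c3p0's global PreparedStatement cache
        dataSource.setMaxStatements(0);
        return dataSource;
    }
}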

It did solve the problem, and what we saw after deploying the fix was quite fascinating. We set up a metric in Kibana to check the response time for this application, and the graphs made the improvement obvious.

The response time was reduced by almost a factor of 5: last week the 90th percentile response time was ~900 ms, now it is ~200 ms. Quite impressive. Hopefully this helps someone who has a similar problem.

b-scripts

In the web team we have a suite of scripts that we call the b-scripts. This is a collection of scripts that automate some common tasks. Because of our faith in automation, this collection keeps growing over time, and we thought it would be fun to review some of them. We can already reveal that the most difficult part of any of the scripts has been to set up good auto-completion, but it has been well worth the effort. All of our scripts are sub-commands of the b command. The b command sets up some environment variables, metrics support, help, and auto-completion for the other commands.

Application-specific scripts can be created by putting them in the scripts folder of the application’s git repository. A lot of applications have their own tooling, but we try to put commonly used functionality in the b-script repository. Currently there are scripts written in Python, Ruby and Bash. In principle we can use any language, but generally people write scripts in the language they are most comfortable with.

Some of the commonly used scripts are:

  • b blame – Used to find out who has recently deployed an application
  • b branch – Links a git branch with a trello card
  • b deploy – The most used script, we will certainly blog about this script in detail in the future
  • b good-morning – Updates all your local git repositories
  • b heapdump – Generates and downloads a heap dump from a running application
  • b help – Show detailed usage information for another b-script
  • b liquibase-migrate – Apply liquibase database migrations for an application
  • b logs – Shows logs for an application
  • b list-pull-requests – Shows every pull request in a GitHub repository and how long it was active
  • b release – Builds a release version of an application and uploads it to our Nexus
  • b restart – Restarts an application without user-visible downtime
  • b ssh – SSH into the app servers for an application in a given environment
  • b show – Shows any section of an application’s deployment configuration
  • b build-status – Used by Jenkins to notify GitHub whether a branch has built successfully

Altogether we have over 100 different b-scripts registered across all our repositories. We make sure to instrument them, so we can check how well they are working, and we will definitely write about a few of them in more detail in our future blog posts.

Alerting in Grafana

As mentioned in our previous blog post, we at Bring use InfluxDB and Grafana extensively as monitoring tools to collect statistics and visualize different aspects of application performance.

We have been quite excited about the latest version of Grafana, which now provides an alerting engine, letting us set up alert rules on the statistics that we collect.

Grafana provides built-in support for configuring different types of notification channels, which determine where you get notified when an alert rule set up on your monitoring is triggered.

We use Slack as our common communication tool and wanted alerts from Grafana to reach us there. Grafana also provides built-in support for sending these alerts to Slack, but an issue with proxy support in the latest version of Grafana means the Slack integration does not work for us.

So we went for the next simplest solution: Grafana can also post alerts to a webhook, with a predefined payload that looks something like this:

{
  "title": "My alert",
  "ruleId": 1,
  "ruleName": "Load peaking!",
  "ruleUrl": "http://url.to.grafana/db/dashboard/my_dashboard?panelId=2",
  "state": "alerting",
  "imageUrl": "http://s3.image.url",
  "message": "Load is peaking. Make sure the traffic is real and spin up more webfronts",
  "evalMatches": [
    {
      "metric": "requests",
      "tags": {},
      "value": 122
    }
  ]
}

So it was as simple as creating a webhook endpoint in one of our custom monitoring applications, parsing the payload above, and posting it to Slack via a home-grown library. The advantage of using a webhook is that it lets you do any custom processing you might need when an alert rule fires. A rough sketch of such an endpoint is shown below.
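For illustration, a minimal sketch of what an endpoint like this could look like with Spring MVC and a Slack incoming webhook. The class name, endpoint path, property name and message format are made up for the example; they are not our actual monitoring application or Slack library:

import java.util.Collections;
import java.util.Map;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

// Hypothetical sketch: receives Grafana's webhook payload and forwards a short
// message to a Slack incoming webhook. Names and paths are illustrative only.
@RestController
public class GrafanaAlertController {

    private final RestTemplate restTemplate = new RestTemplate();

    // Slack incoming webhook URL, injected from configuration (placeholder property)
    @Value("${slack.webhook.url}")
    private String slackWebhookUrl;

    @PostMapping("/hooks/grafana-alert")
    public void onAlert(@RequestBody Map<String, Object> payload) {
        // Pick out the fields we care about from the payload Grafana sends
        String state = String.valueOf(payload.get("state"));
        String ruleName = String.valueOf(payload.get("ruleName"));
        String message = String.valueOf(payload.get("message"));
        String ruleUrl = String.valueOf(payload.get("ruleUrl"));

        String text = String.format("[%s] %s: %s (%s)", state, ruleName, message, ruleUrl);

        // Post a JSON body of the form {"text": "..."} to the Slack incoming webhook
        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        HttpEntity<Map<String, String>> request =
                new HttpEntity<>(Collections.singletonMap("text", text), headers);
        restTemplate.postForEntity(slackWebhookUrl, request, String.class);
    }
}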

Once the notification channel is configured (in our case the webhook), we can set up alert rules on the stats that we measure, and Grafana takes care of posting to the configured notification channel when a rule fires.

Configuring a rule is pretty straightforward.

We have started using Grafana alerts on some of the statistics that we collect, which we used to monitor manually before. It sure is a really cool feature to take advantage of. Hope this was helpful, and happy reading.

Measuring JVM stats

As part of creating a new application that has strict performance requirements and needs to deal with large files, one of the teams working on mybring wanted a tool to investigate how much memory their JVM was using. It’s easy to see how much memory a process has allocated on a Linux system, but it’s more difficult to find out how much of that allocated memory it is actually using. The JVM likes to allocate up to its maximum heap size very aggressively, but it might take a long time before it actually uses all that memory.

There are some tools you can use to connect to a running JVM to inspect this, but for applications with these requirements, historical data is going to be very interesting. Previously, we’ve used tools like Java Melody to get this kind of information. Java Melody is a good tool, but it has some disadvantages. To use it in every JVM application we have, we would need to add code to all of them somehow. Also, for this kind of metric data we prefer to use Grafana, which we’ve previously blogged about here. We don’t know if it’s possible to retrieve the data from Java Melody into Grafana.

We already had telegraf running on all our servers. It collects system information such as free memory, load averages and CPU utilization every 10 seconds, and it seemed natural that we could leverage it to do something similar for JVM metrics. As it turns out, recent versions of telegraf have a very cool plugin for a tool called Jolokia, an agent that you can attach to the JVM to get an HTTP interface where you can perform JMX queries. You can use JMX queries to answer questions such as “How much memory is in the heap now?” or “How many threads are there?”, which is exactly what we needed.

Since all of our applications are set up and configured with configuration management, it seemed that with this approach we could make a one-time effort to get this sort of monitoring for every JVM application we run. We used Puppet to add the Jolokia agent to our servers and to add a startup parameter to a couple of our applications to activate it. After that, we added configuration to telegraf to make it perform some interesting JMX queries every time it runs. We used a template that looks a lot like this:

# Read JMX metrics through Jolokia
[[inputs.jolokia]]
## This is the context root used to compose the jolokia url
## NOTE that Jolokia requires a trailing slash at the end of the context root
context = "/jolokia/"
## List of servers exposing jolokia read service
[[inputs.jolokia.servers]]
name = "<%= @name %>"
host = "127.0.0.1"
port = "<%= @port %>"

[[inputs.jolokia.metrics]]
name = "heap_memory_usage"
mbean = "java.lang:type=Memory"
attribute = "HeapMemoryUsage"

Puppet will put one configuration file for each application into /etc/telegraf/telegraf.d for us. Since telegraf runs on the same server as the applications, it can reach Jolokia on the loopback interface, which means we do not have to expose JMX on an external IP address.

After testing with a couple of applications and getting some visualization up and running, we decided that it was working well and we activated this puppet configuration for most of our applications. The nice thing for us is that we got this monitoring working for all of our applications for pretty much the same amount of work as doing it for a single one.

It’s easy to add more JMX queries to the configuration files; other data we’ll collect includes connection pool data, thread counts, non-heap memory usage and the number of loaded classes. A couple of example metric blocks are sketched after this paragraph. One scenario where this will be very useful is when we have a memory leak and want to figure out which change caused it, or when we run out of Compressed Class Space and want to figure out which change caused us to start leaking classloaders.
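For example, thread counts and loaded class counts are exposed by the standard java.lang platform MBeans and could be collected with metric blocks along these lines, added to the same template (the metric names here are just examples):

[[inputs.jolokia.metrics]]
name = "thread_count"
mbean = "java.lang:type=Threading"
attribute = "ThreadCount"

[[inputs.jolokia.metrics]]
name = "loaded_classes"
mbean = "java.lang:type=ClassLoading"
attribute = "LoadedClassCount"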

Additionally, the latest version of Grafana lets us set up alerting based on thresholds on the data we collect, which seems like something we’ll end up using. All in all, we’re very happy with the telegraf, InfluxDB and Grafana stack which makes it really easy to add new monitoring to our systems.