Category Archives: Five-Eyes

Canadian government expects another Snowden-level leak, documents say - Toronto Star 20160709

Revelations about Five Eyes mass surveillance have “changed the tone” on Internet issues, but Canada wants free and open cyberspace.

It’s not a matter of if there will be another Edward Snowden, it’s a matter of when, according to internal government documents obtained by the Star.

Global Affairs officials warned minister Stéphane Dion in November that an event on the scale of Snowden’s disclosures about Internet surveillance is inevitable.

“Incidents similar to the Snowden disclosures and the Sony hack will happen again and we can expect that sudden events will affect international debates on cyberspace,” the document reads.

The briefing note, prepared for Dion in November and obtained under access to information law, suggests that Snowden’s disclosures about Western mass surveillance “altered the tone” of the international discussion on cyberspace.

In 2013 Snowden, a former contractor for the U.S. National Security Agency (NSA), pulled back the curtain on mass surveillance online, detailing the capabilities of the “Five Eyes” countries — Canada, the United States, the U.K., Australia and New Zealand — to monitor activity online. His release of classified NSA documents triggered outrage among those who said he put lives at risk, and praise from others who argued he shed light on questionable practices and has forced needed change. He was forced to flee the U.S. and was granted asylum in Russia.

Then in 2014, hackers broke into Sony Pictures computers and released thousands of emails, documents and sensitive personal information. U.S. federal investigators blamed North Korea.

While Canada has long advocated for an open and free Internet, suggestions that the nation’s spy agency, the Communications Security Establishment (CSE), has engaged in mass online surveillance have complicated that narrative.

But the documents state Ottawa remains committed to a free Internet — not only from a democratic point of view, but for the potential for Canadian businesses and consumers to access ever-broadening online markets.
“The Internet owes its success to its open design, its global and interconnected nature, and its flexible and inclusive governance structure,” the documents read.

“All states are grappling with how to harness the potential of networked technologies while managing their far-ranging impacts … The goal (for Canada) is to protect human rights and democratic space, recognize legitimate public safety needs, and preserve the openness and dynamism that has brought about such enormous benefit.”

In a statement Saturday, a spokesperson for Global Affairs said the federal government believes that protecting online privacy and supporting human rights go hand in hand.

“Canada is concerned about rising threats emanating from cyberspace, including from repressive governments and their proxies, as well as the growing threats posed by cybercrime and terrorists’ use of the Internet,” wrote spokesperson John Babcock in an email to the Star.

“While addressing cyber threats, we must not legitimize Internet controls that will be used to restrict human rights and freedoms and hinder the free flow of information.”

The Star reported in 2015 that CSE has stepped up its efforts to guard against “insider threats” since Snowden shared an unprecedented trove of intelligence documents with journalist Glenn Greenwald in 2013. The move was also prompted by a Halifax-based Royal Canadian Navy officer, Jeffrey Delisle, who sold secrets to Russia in 2012.

“Following the unauthorized disclosures of Canadian Navy Sub-Lieutenant Jeffrey Delisle and NSA contractor Edward Snowden, CSE has intensified its efforts to tighten already stringent security,” read CSE’s 2013-14 report to the minister of national defence.

The documents note that Canadian media coverage about Internet security has tended to focus on large-scale hacks, such as the 2014 breach at the National Research Council, or the Heartbleed exploit used on the Canada Revenue Agency that same year.

But officials make clear Canada’s interest in the file goes beyond playing defence against malicious actors. The documents note that a number of “authoritarian regimes” are hoping to impose greater control over their citizens’ access to cyberspace.

“Domestically, they employ repression and censorship. Internationally, they lobby for greater state regulation of cyberspace, including calls to bring it under UN control,” the documents read.

“They also seek to rewrite current understandings of international law to shape the international cyber environment to reflect their values and interests. The same states also exploit cyberspace through espionage and theft of sensitive information from government and private sector networks, including those of Canada.”

Officials censored the names of the individual countries they accused of such actions, although Ottawa has previously called out China as the hand behind the NRC hack. At the same time, other countries have accused Five Eyes partners of conducting economic espionage of their own.

The documents note that Global Affairs has been involved in a range of activities promoting an open and free internet, including advocating for human rights and freedoms online and committing $8 million over the last decade to promote cyber security in the Americas and Southeast Asia.

A technical reading of the “HIMR Data Mining Research Problem Book” - Conspicuous Chatter 20160203

Boing Boing just released a classified GCHQ document that was meant to act as the September 2011 guide to open research problems in Data Mining. The intended audience, the Heilbronn Institute for Mathematical Research (HIMR), is part of the University of Bristol and is composed of mathematicians who spend half their time working on classified problems with GCHQ.

First off, a quick perusal of the actual publication record of the HIMR makes for sad reading from GCHQ’s perspective: it seems that very little data mining research was actually performed in the 2011-2014 period, despite this pitch. I guess this is what you get when you try to make pure mathematicians solve core computer science problems.

However, the document presents one of the clearest explanations of GCHQ’s operations and their scale at the time, as well as a very interesting list of open problems along with salient examples.

Overall, reading this document very much resembles reading the needs of any other organization struggling to process big data and extract value from it. The constraints under which they operate (see below), and in particular the limitation to roughly O(n log n) storage overall and O(1) processing per edge event, are a serious restriction, but of course this applies only to un-selected traffic. The 5,000 or so Tor nodes would probably have a little more space and processing allocated to them, and so would known botnets, I presume.
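
To make that constraint concrete, here is a minimal sketch of an algorithm that fits the regime; it is my own illustration, not anything taken from the document. A union-find structure answers connectivity-style queries over a stream of edges while keeping only a few words of state per vertex and doing near-constant work per edge, and the edges themselves are discarded after processing. All names and the example data are invented.

```python
# Minimal sketch of working under the "semi-streaming" constraint described
# above: roughly O(n log n) state for n vertices, near-O(1) amortised work per
# edge event, and no storage of the edges themselves. Illustrative only.

class StreamingConnectivity:
    """Union-find over a stream of edges."""

    def __init__(self, n):
        self.parent = list(range(n))   # one word of state per vertex
        self.rank = [0] * n            # one word of state per vertex

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def process_edge(self, u, v):
        """Consume one edge event, then forget the edge."""
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        if self.rank[ru] < self.rank[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        if self.rank[ru] == self.rank[rv]:
            self.rank[ru] += 1

    def connected(self, u, v):
        return self.find(u) == self.find(v)


if __name__ == "__main__":
    cc = StreamingConnectivity(6)
    for u, v in [(0, 1), (1, 2), (4, 5)]:   # edges arrive one at a time
        cc.process_edge(u, v)
    print(cc.connected(0, 2))   # True
    print(cc.connected(0, 4))   # False
```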

Secondly, there is clear evidence that timing information is recognized as key to correlating events and streams, and that it is being recorded and stored at increasing granularity. There is no smoking gun as of 2011 to say they casually de-anonymize Tor circuits, but the writing is on the wall for the onion routing system: GCHQ in 2011 had all the ingredients needed to trace Tor circuits, and it would take extraordinary incompetence not to have refined their traffic analysis techniques in the past 5 years. The Tor project would do well not to underestimate GCHQ’s capabilities on this point.
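
As an illustration of why coarse timing records are enough to worry about, here is a textbook-style flow correlation sketch, not a reconstruction of any GCHQ technique: bin the packet timestamps of two observed flows into fixed windows and compute the Pearson correlation of the per-window counts. The bin size, observation window and synthetic data are all assumptions made for the example.

```python
# Hedged illustration of coarse timing correlation: with only per-second
# packet counts, a flow entering and the same flow exiting a network remain
# strongly correlated, while unrelated flows do not.

import numpy as np

def binned_counts(timestamps, bin_size=1.0, duration=60.0):
    """Histogram of events per fixed-size time bin over the window (seconds)."""
    edges = np.arange(0.0, duration + bin_size, bin_size)
    counts, _ = np.histogram(timestamps, bins=edges)
    return counts

def flow_correlation(ts_a, ts_b, bin_size=1.0, duration=60.0):
    """Pearson correlation of per-bin packet counts for two flows."""
    a = binned_counts(ts_a, bin_size, duration)
    b = binned_counts(ts_b, bin_size, duration)
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    entry = np.sort(rng.uniform(0, 60, 300))          # flow seen entering
    exit_same = entry + rng.normal(0.2, 0.05, 300)    # same flow, plus latency
    exit_other = np.sort(rng.uniform(0, 60, 300))     # unrelated flow
    print(flow_correlation(entry, exit_same))   # high, close to 1
    print(flow_correlation(entry, exit_other))  # much lower, near 0
```

Even with one-second bins, the genuinely related pair stands out against unrelated traffic, which is exactly why long-term storage of fine-grained timing is so dangerous for onion routing.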

Thirdly, one should wonder why we had to wait 3 years for such clear documents from the Snowden revelations to finally be published. If these had been published first, instead of the obscure, misleading and largely uninformative slides, it would have saved a lot of time, and might even have engaged the public rather more than bad PowerPoint.

Some interesting points in the document, in order:

  • It turns out that GCHQ has a written innovation strategy, reference [I75]. If someone has a copy it would be great to see it, and also understand where the ACE program fits in it.
  • GCHQ at the time relied heavily on two families of technologies: Hadoop, for bulk processing of raw collected data (retained for 6 months in the case of meta-data, apparently), and IBM Streams (DISTILLERY) for real-time stream processing. A lot of the discussion and open problems relate to the fact that bulk collection can only provide a limited window of visibility, so intelligence-related selection and processing has to happen within this window. Hence the interest in processing streaming data.
  • Section 2 is probably the clearest explanation of how modern SIGINT works. I would like to congratulate the (anonymous) author, and will be setting it as the key reading for my PETS class. It summarizes well what countless crappy articles on the Snowden leaks have struggled to piece together. I wish journalists just released this first, and skipped the ugly slides.
  • The intro provides a hint at the scale of cable interception operations as of 2011. It seems that 200 “bearers” of 10 Gigabit / sec were being collected at any one time, and the document makes clear that many more sources were available to switch to depending on need. That is about 2 Terabit / sec, across 3 sites (Cheltenham, Bude, and LECKWITH).
  • Section 2.1.2 explains that a lot (the majority) of data is discarded very early on, by special hardware performing simple matching on internet packets. I presume this is to filter out bulk downloads (from CDNs), known sources of spam, YouTube videos, etc.
  • The same section (2.1.2) also explains that all meta-data is pulled from the bearer, and provides an interpretation of what meta-data is.
  • Finally (2.1.2) there is a hint at indexing databases (Query Focused Databases / QFDs) that are specialized to store meta-data, such as IP traffic flow data, for quick retrieval based on selectors (like IP addresses).
  • Section 2.1.3 explains the problem of “target development”, namely when no good known selectors exist for a target, and it is the job of the analyst to find a match through either contact chaining or modus-operandi (MO) matching. It is a very instructive section, and it is the technical justification underpinning a lot of the mass surveillance going on (a toy sketch of contact chaining appears after this list).
  • The cute example chosen to illustrate it (Page 12, end of 2.1.3): Apparently GCHQ developed many of those techniques to spy on foreign delegations during the 2009 G20 meeting. Welcome to London!
  • Section 2.2.2 provides a glimpse of the cybersecurity doctrine and world-view at GCHQ, already in 2011. In particular, there is a vision that CESG will act as a network security service for the nation, blocking attacks at the “firewalls” and doing attribution (as if attacks will always come “from outside”). GCHQ would then counter-attack the hostile sources, or simply use the material intercepted by others (“4th party collection”, as the euphemism goes).
  • Section 2.2.3 provides a glimpse of the difficulties of running implants on compromised machines: something that is openly admitted. Apparently exfiltrating traffic and establishing command-and-control with implants are susceptible to passive SIGINT, which is both a problem and an opportunity.
  • Section 3 and beyond describes research challenges that are very similar to those of any other large organization or research group: the difficulty of creating labelled data sets for training machine learning models; the challenges of working with partial or streaming data; the need for succinct representations of data structures; and the problem of inferring “information flow”, namely chains of communications that are related to each other.
  • It seems the technique of choice when it comes to machine learning is Random Decision Forests. Good choice; I also prefer it to others. They have an in-house innovation: they weight the outputs of each decision tree. (Something that is sometimes called gradual learning in the open literature, I believe.)
  • Steganography detection seems to be a highlight; however, there is no explanation of whether steganography is a real problem they encountered in the field, or simply an easy dataset to generate synthetically.
  • Section 4 deals with research problems of “Information Flow in Graphs”. This is the problem of associating multiple related connections with each other, including across types of channels, detecting botnet command-and-control nodes, and also tracing Tor connections. Tracing Tor connections is in fact a stated problem, with a stated solution.
  • Highlights include the simple “remit” algorithm developed by Detica (page 26, Sect. 4.2.1); PRIME TIME, which looks at chains of length 2; and finally HIDDEN OTTER, which specifically targets Tor and botnets. (Apparently an internal group, ICTR-NE, developed it.)
  • Section 4.2.2 looks at associating communications through temporal correlation: one more piece of evidence that timing analysis, at a coarse scale, is on the cards for mining associations. What is cute is that the example used is how to detect all GCHQ employees: they are the ones whose phones are inactive between 9am and 5pm, while they are at work.
  • Besides these, they are interested in change / anomaly detection (4.4.1), the spread of information (such as extremist material), etc. Not that dissimilar from, say, an analysis Facebook would perform.
  • Section 5 poses problems relating to algorithms on streaming graph data. It provides a definition of the tolerable costs of analysis algorithms (the semi-streaming paradigm): for a graph of n vertices (nodes), they can store a little information per vertex, but cannot store, or even fully process, all edges. So they have O(n log n) storage overall and can only do O(1) processing per event / edge. That could be distilled into a set of security assumptions.
  • Sections 5.2.2 / 5.2.3 contain an interesting discussion about relaxations of cliques, and also point out that very popular nodes (the pizza delivery line) are probably noise and should be discarded.
  • As of 2011 (sect 5.2.4) it was an open problem how many hops of contact chaining are required; the document notes that analysts usually use 2 hops from targets. Note that the only other plausible numbers are 3, 4 and 5, since after 6 you have probably included nearly everyone in the world. So it is not that exciting a problem, and one cannot blame the pure mathematicians for not tackling it.
  • Section 5.5.1 asks whether there is an approximation of the correlation matrix that avoids storing and processing a full n x n matrix. It generally seems that matching identifiers with identifiers is big business.
  • Section 6 poses problems relating to the processing, data mining, and analysis of “expiring” graphs, namely graphs with edges that disappear after a deadline. This is again related to the constraint that storage for bulk un-selected data is limited.
  • In section 6.3.2 the semi-streaming model, allowing only O(n log n) storage overall and O(1) processing per incoming event / edge, is reiterated.
  • Appendix A deals with models of academic engagement. I have to say it is very enlightened: it recognizes the value of openly publishing the research, after some sanitization. Nice.
  • Appendix B and C discuss the technical details and size of the IBM Streams and Hadoop clusters. Section D presents the production clusters (652 nodes, total 5216 cores, and 32 GB memory for each node).
  • Section E discusses the legalities of using intercepted data for research, and bless them they do try to provide some logging and Human Rights Justification (bulk authorization for research purposes).
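
Since contact chaining (Sections 2.1.3 and 5.2.4 above) is the technical core of target development, here is a toy sketch of what 2-hop chaining amounts to. The graph, selectors and function names are invented for illustration and are not taken from the problem book.

```python
# Toy illustration of 2-hop "contact chaining": starting from a seed selector,
# collect everything reachable within k hops in a contact graph.

from collections import deque

def contact_chain(graph, seed, max_hops=2):
    """Breadth-first expansion of a contact graph up to max_hops from seed.
    Returns a dict mapping each reached selector to its hop distance."""
    seen = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        hops = seen[current]
        if hops == max_hops:
            continue
        for contact in graph.get(current, ()):
            if contact not in seen:
                seen[contact] = hops + 1
                queue.append(contact)
    return seen

if __name__ == "__main__":
    # Adjacency list keyed by selector (e.g. a phone number or e-mail address);
    # entirely made-up data.
    graph = {
        "target@example.org": ["a@example.org", "b@example.org"],
        "a@example.org": ["c@example.org"],
        "c@example.org": ["d@example.org"],   # 3 hops out: excluded at k=2
    }
    print(contact_chain(graph, "target@example.org", max_hops=2))
    # {'target@example.org': 0, 'a@example.org': 1,
    #  'b@example.org': 1, 'c@example.org': 2}
```

The open question in Section 5.2.4 is essentially how large max_hops needs to be before the marginal intelligence value vanishes, given that each extra hop grows the reached set enormously.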

Canada’s hacking power awes Brazilian security expert - The Globe and Mail 20131012

Brazilian security expert Paulo Pagliusi says he is “astonished” by Canada’s hacking power.

He recently spent three hours reviewing the leaked Communications Security Establishment Canada (CSEC) slides on behalf of Brazil’s Fantástico TV program, which broadcast a report last week alleging CSEC spied on internal communications at the Brazilian Ministry of Mines and Energy (MME).

A retired navy officer-turned-chief executive for Procela IT Security Intelligence, a security-intelligence company, Mr. Pagliusi answered questions from The Globe and Mail via e-mail. The exchange has been edited.

You said that you were amazed by the “sheer power” of this attack. Can you expand on why you said this?

I was astonished by the power of these tools to infiltrate the ministry, such as the “Olympia” program from CSEC. I was especially surprised by the detailed and straightforward way in which the process is explained to intelligence agents, and how thoroughly the Brazilian ministry’s communications were dissected.

The leaked documents have also shown how the data gleaned through espionage was shared with an international spy network, named the “Five Eyes.” [An alliance of five English-speaking countries – Australia, Britain, Canada, New Zealand and the United States – to share intelligence and electronic eavesdropping is commonly known as “Five Eyes.”]

How would you describe the nature of the Olympia program?

As a result of using Olympia to infiltrate the ministry over an unspecified period, CSEC has developed a detailed map of the institution’s communications. As well as monitoring e-mail and electronic communications, the Olympia program screens I have seen in that presentation show that CSEC could also have eavesdropped on telephone conversations.

The MME uses an encrypted server. What could CSEC see by getting inside it?

These MME servers use private encryption, for instance, to contact the National Oil Agency, Petrobras, Eletrobras, the National Department of Mineral Production and even the president of the Republic. CSEC could see state conversations, government strategies upon which no one should be able to eavesdrop.

What is the significance of the CSEC metadata maps showing MME communications to Saudi Arabia, Jordan, Eritrea, even Canada?

It means that CSEC has mapped a number of communications with the countries mentioned, and is able to monitor e-mail and electronic communications and eavesdrop on telephone conversations.

What is the significance of the slide saying CSEC wanted to call in “TAO” for a “man on the side” operation?

Tailored Access Operations (TAO) is a cyber-warfare intelligence-gathering unit of the U.S. National Security Agency.

TAO identifies, monitors, infiltrates and gathers intelligence on computer systems. In my opinion, the author of the CSEC presentation makes the next steps very clear. Among the actions suggested, there is a joint operation with TAO for an invasion known as “Man on the Side.” All incoming and outgoing communications in the network can be copied, but not altered.

It would be like working on a computer with someone looking over your shoulder.

Do you have any theories about what precisely Canada wanted inside the MME servers?

Considering only the documents leaked by Edward Snowden that I have seen, it is not possible to conclude what precisely Canada wanted inside the MME servers.

However, the speculation that it could be broad-based economic trend information makes perfect sense to me. In my opinion, specific technology (i.e., “Does Brazil have tech to explore ocean fields that the rest of the world lacks?”) would not be found on MME servers.

Heilbronn Institute for Mathematical Research - Data mining research problem book

Excerpt from Snowden document, published by Boing Boing - 20160202

Ways of working

This section gives a few thoughts on ways of working. The aim is to build on the positive culture already established in the Institute’s crypt work. HIMR researchers are given considerable freedom to work in whatever way suits them best, but we hope these ideas will provide a good starting-point.

A.1 Five-eyes collaboration

As on the crypt side, we hope that UKUSA collaboration will be a foundation-stone of the data mining effort at HIMR. This problem book is full of links to related research being carried out by our five-eyes partners, and researchers are very strongly urged to pursue collaborative angles wherever possible—above all, to get to know the people working on the same problems and build direct relationships. Researchers are encouraged to attend and present at community-wide conferences (principally SANAR and ACE), as funding and opportunity allows. We hope that informal short visits to and from HIMR will also be a normal part of data mining life. HIMR has a tradition of holding short workshops to focus intensively on particular topics, where possible with participation from experts across the five eyes community. Frequently these are held during university vacations, to allow our cleared academic consultants to take part. Each summer, HIMR hosts a SWAMP: a two-month long extended workshop on (traditionally) two topics of high importance, similar to the SCAMPs organized by IDA. We hope that HIMR researchers will feel inspired to suggest possible data mining sub-topics for future SWAMPs.

A.2 Knowledge sharing

Inevitably, there is a formal side to reporting results: technical papers, conference talks, code handed over to corporate processing, and so on. But informal dissemination of ideas, results, progress, set-backs and mistakes is also extremely valuable. This is especially true at HIMR, for several reasons.

  • There is a high turnover of people, and it is important that a researcher’s ideas (even the half-baked ones) don’t leave with him or her.
  • Academic consultants form an important part of the research effort: they may only have access to classified spaces a few times a year for a few days at a time, so being able to catch up quickly with what’s happened since their last visit is crucial to help them make the most of their time working with us.
  • HIMR is physically detached from the rest of GCHQ, and it’s important to have as many channels of communication as possible—preferably bidirectional!—so that this detachment doesn’t become isolation. The same goes even more so for second party partners.

In HIMR’s METEOR SHOWER work, knowledge sharing is now primarily accomplished through two compartmented wikis hosted by CCR Princeton. For data mining, there should be more flexibility, since almost none of the methods and results produced will be ECI, and in fact they will usually be STRAP1 or lower. Paradoxically, however, the fact that work can be more widely shared can mean that there is less of a feeling of a community of interest with whom one particularly aims to share it: witness the fact that there is no shining model of data mining knowledge sharing elsewhere in the community for HIMR to copy!

We suggest that as far as possible, data miners at HIMR build up a set of pages on GCWiki (which can then be read and edited by all five-eyes partners) in a similar way to how crypt research is recorded on the CCR wikis. They can then encourage contacts at GCHQ and elsewhere to watch, edit and comment on relevant pages. In particular, the practice of holding regular bull sessions[10] and taking live wiki notes during them is highly recommended. If any researchers feel so inclined, GCBlog and the other collaborative tools on GCWeb are available, and quite suitable for all STRAP1 work. For informal communications with people from MCR and ICTR, there is a chat-room called himr_dm: anyone involved in the HIMR data mining effort can keep this open in the background day by day. There is also a distillery room that is sadly under-used: in principle, it discusses SPL and the corporate DISTILLERY installations. For any STRAP2 work that comes along, there are currently no good collaborative options: creating an email distribution list would be one possibility.

A.3 Academic engagement

The first test for HIMR’s classified work must be its applicability and usefulness for SIGINT, but given that constraint, GCHQ is keen to encourage HIMR researchers to build relationships and collaborate with academic data miners, and publish their results in the open literature. Of course, security and policy will impose some red lines on what exactly is possible, but the basic principle is that when it comes to data mining, SIGINT data is sensitive, but generally applicable techniques used to analyse that data often are not.

Just about everyone nowadays, whether they are in academia, industry or government, has to deal with big data, and by and large they all want to do the same things to it: count it, classify it and cluster it. If researchers develop a new technique that can be published in an open journal once references to SIGINT are excised, and after doing a small amount of extra work to collect results from applying it to an open source dataset too, then this should be a win-win situation: the researcher adds to his or her publication tally, and HIMR builds a reputation for data mining excellence.

Of course, there may be occasions when publication is not appropriate, for example where a problem comes from a very specific SIGINT situation with no plausible unclassified analogy. Day-to-day contact with the Deputy Director at HIMR should flag up cases like this early on. There are also cases where we feel we have an algorithmic advantage over the outside that is worth trying to maintain, and this can be further complicated if equity from other partners is involved, or if a technique brings in ideas from areas like crypt where strict secrecy is the norm. The Deputy Director should be consulted before discussing anything that might be classified in a non-secure setting: he or she can further refer the question to Ops Policy if necessary.

[10] Informal meetings at blackboards where people briefly describe work they have been doing and problems they have encountered, with accompanying discussion from others in the room. The rules: people who wish to speak bid the number of minutes they need (including time for questions). Talks are ordered from low to high bid, with ties broken arbitrarily. You can ask questions at any time. You can leave at any time. If you manage to take the chalk from the speaker, you can give the talk.