Using SiLK and Mothra to Identify Data Exfiltration via the Domain Name Service


A variety of modern network threats involve data theft via abuse of network services, which is termed data exfiltration. To track such threats, analysts monitor data transfers out of the organization’s network, particularly data transfers occurring via network services not primarily intended for bulk transfer. One such service is the Domain Name System (DNS), which is essential for many other Internet services. Unfortunately, attackers can manipulate DNS to exfiltrate data in a covert manner.

This SEI blog post focuses on how the DNS protocol can be abused to exfiltrate data by adding bytes of data onto DNS queries or making repeated queries that contain data encoded into the fields of the query. The post also examines the general traffic analytic we can use to identify this abuse and applies several available tools to implement the analytic. The aggregate size of DNS packets can provide a ready indicator of DNS abuse. However, because the DNS protocol has grown from a simple address resolution mechanism to distributed database support for network connectivity, interpreting the aggregate size requires understanding the context of queries and responses. By understanding the volume of DNS traffic, both in isolation and in aggregate, analysts may better match outgoing queries and incoming responses.
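
To make the idea of data encoded into query fields concrete, here is a minimal sketch of one common encoding: hex-encoding payload bytes into the labels of a query name. The zone exfil.example.com, the object name, and the 60-byte chunk size are assumptions chosen for illustration; they are not taken from the data set used in this post.

```scala
// Illustrative only: hex-encode a payload into DNS labels under a
// hypothetical attacker-controlled zone (e.g., exfil.example.com).
object DnsEncodingSketch {
  // RFC 1035 limits each label to 63 characters.
  val MaxLabelLength = 63

  /** Split a payload into query names carrying at most `bytesPerQuery` bytes each. */
  def encode(payload: Array[Byte], zone: String, bytesPerQuery: Int = 60): Seq[String] =
    payload.grouped(bytesPerQuery).zipWithIndex.map { case (chunk, seq) =>
      // Two hex characters per payload byte.
      val hex = chunk.map(b => f"${b & 0xff}%02x").mkString
      // Break the hex string into labels that respect the per-label limit.
      val labels = hex.grouped(MaxLabelLength).mkString(".")
      // Prefix a sequence number so the receiver can reassemble the chunks.
      s"$seq.$labels.$zone"
    }.toSeq
}
```

Names built this way are far longer than ordinary host names, which is one reason the size of DNS packets is a useful first indicator.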

The data used in this blog post is the CIC-BELL-DNS-EXF 2021 data set, as published in conjunction with the paper Lightweight Hybrid Detection of Data Exfiltration using DNS based on Machine Learning by Samaneh Mahdavifar et al.

The Role of DNS

DNS supports several types of queries. These queries are described in a variety of Internet Engineering Task Force (IETF) Request for Comments (RFC) documents. The query types include the following:

  • A and AAAA queries for the IP address corresponding to a domain name (e.g., “which address corresponds to www.example.com?” with a response like “192.0.2.27”)
  • pointer record (PTR) queries for the name corresponding to an IP address (e.g., “which name corresponds to 192.0.2.27?” with a response of “www.example.com”)
  • name server (NS), mail exchange (MX), and service locator (SRV) queries for the identification of key servers in a given domain
  • start of authority (SOA) queries for information about addresses on which the queried server may speak authoritatively
  • certificate (CERT) queries for encryption certificates pertaining to the server’s covered domains
  • text record (TXT) queries for additional information (as configured by the network administrator) in a text format
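
Flow tools usually report these record types by their numeric codes rather than by name (the Mothra output later in this post shows type 12, for example). The lookup below is a small convenience sketch; the object and method names are our own, while the numeric values are the registered codes for the record types listed above.

```scala
// Lookup between the DNS record types listed above and their numeric codes.
object RRTypes {
  val codes: Map[String, Int] = Map(
    "A" -> 1, "NS" -> 2, "SOA" -> 6, "PTR" -> 12, "MX" -> 15,
    "TXT" -> 16, "AAAA" -> 28, "SRV" -> 33, "CERT" -> 37)

  // Reverse mapping, from numeric code to mnemonic.
  val names: Map[Int, String] = codes.map(_.swap)

  /** Return the mnemonic for a code, or a generic TYPEnnn label if it is not in the table. */
  def describe(code: Int): String = names.getOrElse(code, s"TYPE$code")
}
```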

A given DNS query packet will request information on a given domain from a particular server, but the response from that server may include multiple resource records. The size of the response will depend on how many resource records are returned and the type of each record.

Once analysts understand the reasons for monitoring DNS traffic and the context needed for interpreting the monitoring results, they can then determine what information is desired from the monitoring. This blog post assumes the analyst wants to track external hosts that may be receiving exfiltrated information.

Overview of the Analytic for Identifying Data Exfiltration

The analytic covered in this blog post assumes that the networks of interest are covered by traffic sensors that produce network flow records or at least packet captures that can be aggregated into network flow records. A variety of tools are available to generate these flow records. Once produced, the flow records are archived in a flow repository or appropriate database tables, depending on the analysis tool suite.

The approach taken in this analytic is, first, to aggregate DNS traffic associated with external destinations acting like servers and, second, to profile the traffic for these destinations. The first step (association) involves identifying DNS traffic (either by service port or by actual examination of the application protocol), then identifying the external destinations involved. The second step (profiling) examines how many sources are communicating with each of the destinations, the aggregate byte count, packet count, and other revealing information as described in the following sections.

Several different tools can be used for this analysis. This blog post will discuss two sets of SEI-developed tools:

  • The System for Internet-Level Knowledge (SiLK) is a collection of traffic analysis tools developed to facilitate security analysis of large networks. The SiLK tool suite supports the efficient collection, storage, and analysis of network flow data, enabling network security analysts to rapidly query large historical traffic data sets. SiLK is ideally suited for analyzing traffic on the backbone or border of a large, distributed enterprise or mid-sized ISP.
  • Mothra is a collection of Apache Spark libraries that support analysis of network flow records in Internet Protocol Flow Information Export (IPFIX) format with deep packet inspection fields.

Each of the following sections will present an analytic for detecting exfiltration via DNS queries in the corresponding tool set.

Implementing the Analytic via SiLK

Figure 1 below presents a series of SiLK commands to implement an analytic to detect exfiltration. The first command applies a filter to normal, benign DNS traffic, isolating DNS traffic (identified by protocol recognition as indicated by the application label of 53) coming from the internal network (classless inter-domain routing [CIDR] block 192.168.0.0/16) and of relatively long (70 bytes or more) packets. The output of the filter is then summarized by destination address and transport protocol, counting bytes, flow records, and packets for each combination of address and protocol. The resulting counts are only shown if the accumulated bytes are 500 or more. After applying the analytic to benign DNS data, it is applied in the second sequence to DNS data encompassing compressed data for exfiltration.
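
For readers without the SiLK tools at hand, the same filter-then-summarize logic can be sketched in plain Scala over an in-memory list of flows. This is not the SiLK command sequence from the figure: the Flow and Summary types and their field names are hypothetical, the address test is a simple prefix match, and we assume the 70-byte threshold applies to the average bytes per packet of each flow.

```scala
// A plain-Scala sketch of the filter and summary described above.
object SilkStyleProfile {
  // Hypothetical, simplified flow record; field names are illustrative.
  final case class Flow(sIP: String, dIP: String, protocol: Int,
                        appLabel: Int, bytes: Long, packets: Long)
  final case class Summary(bytes: Long, flows: Long, packets: Long)

  // True when a dotted-quad address falls in 192.168.0.0/16.
  def isInternal(ip: String): Boolean = ip.startsWith("192.168.")

  def profile(flows: Seq[Flow]): Map[(String, Int), Summary] =
    flows
      // DNS by application label, internal source, average packet of 70+ bytes.
      .filter(f => f.appLabel == 53 && isInternal(f.sIP) &&
                   f.packets > 0 && f.bytes.toDouble / f.packets >= 70.0)
      // Summarize by destination address and transport protocol.
      .groupBy(f => (f.dIP, f.protocol))
      .map { case (key, group) =>
        key -> Summary(group.map(_.bytes).sum, group.size.toLong,
                       group.map(_.packets).sum)
      }
      // Show only destinations that accumulated 500 bytes or more.
      .filter { case (_, s) => s.bytes >= 500 }
}
```

Running the same function over benign and suspect captures gives two per-destination tables that can be compared side by side.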

Figure 1: SiLK Analytic and Results

The results in Figure 1 show that the network talks to a primary DNS server, a secondary DNS server, and a public server. In the benign case, the data is primarily directed to the primary DNS server and the public server. In the exfiltration case, the data is primarily directed to the primary DNS server and the secondary DNS server. This shift of destination, in isolation, is not enough to make the exfiltration traffic suspicious or provide a basis for moving beyond suspicion into investigation. In the benign case, there is a notable fraction of the traffic directed to the public DNS server at 8.8.8.8. In the traffic labeled as abusive, this fraction is lessened, and the fraction to a private DNS server (the exfiltration target) at 224.0.0.252 is increased. Unfortunately, given the limited nature of SiLK flow records, security analysts have a hard time characterizing this traffic any further. To go further, more DNS-specific fields are required. These fields are provided by deep packet inspection (DPI) data in expanded flow records in IPFIX format. While SiLK cannot process these expanded IPFIX flow records, other tools such as Mothra and databases can.

Implementing the Analytic via Mothra

The code sample below shows the analytic implemented in Spark using the Mothra libraries. These libraries allow definition and loading of data frames with network flow record data in either SiLK or IPFIX format. A data frame is a collection of data organized into named columns. Data frames can be manipulated by Spark functions to isolate flows of interest and to summarize those flows. Defining the data frames involves identifying the columns and the data to populate the columns. In the code sample, the data frames are defined by the spark.read.fields function and populated by data from either the captured benign traffic or the captured exfiltration traffic via Mothra’s ipfix function. Together, these functions establish the data data frame.

The result data frame is built from the data data frame via a series of filtering and summarization functions. The initial filter restricts it to traffic labeled as DNS traffic, followed by another filter that ensures the records contain DNS resource record queries or responses. The select function that follows isolates specific record features for summarization: time, traffic source and destination, byte and packet volumes, DNS names, DNS flags, and DNS resource record types. The groupBy function generates the summarization for each unique DNS name and resource record type combination. The agg function specifies that the summarization contain the count of flow records, the counts of source and destination IP addresses, and the totals for bytes and packets. The filter function (after the summarization) restricts output to just those showing a bytes-per-packet ratio of more than 70 with fewer than three entries in the DNS name list. This last filter excludes summarizations of traffic that is large only due to the length of the response list rather than to the length of individual queries.

This filtering and summarization process creates a profile of large DNS requests and responses (separated by DNS flag values). The use of DNS names as a grouping value allows the analytic to distinguish repeated queries to similar domains. The counts of source and destination IP addresses allow the analyst to distinguish repeated traffic to a few locations as opposed to unusual traffic to multiple locations or from multiple sources.

val data_dir = ".../path/to/data"
import org.cert.netsa.mothra.datasources._
import org.cert.netsa.mothra.datasources.ipfix.IPFIXFields
import org.apache.spark.sql.functions._

// In dnsIDBenign.sc:
val data_file = s"$data_dir/light_benign.ipfix"
// In dnsIDAbuse.sc:
// val data_file =
//   s"$data_dir/light_compressed.ipfix"

// Load IPFIX flow records with the default fields plus the DNS DPI fields.
val data = {
  spark.read.fields(
    IPFIXFields.default, IPFIXFields.dpi.dns
  ).ipfix(data_file)
}

val result = {
  data
    // Keep flows labeled as DNS that carry at least one DNS resource record.
    .filter(($"silkAppLabel" === 53) &&
            (size($"dnsRecordList")>0))
    // Isolate the record features used in the summary.
    .select(
       $"startTime",
       $"sourceIPAddress",
       $"destinationIPAddress",
       $"octetCount",
       $"packetCount",
       $"dnsRecordList.dnsRRType" as "dnsRRType",
       $"dnsRecordList.dnsQueryResponse" as "dnsQR",
       $"dnsRecordList.dnsResponseCode" as "dnsResponse",
       $"dnsRecordList.dnsName" as "dnsName")
     // Summarize per DNS name list and resource record type list.
     .groupBy($"dnsName",$"dnsRRType")
     .agg(count($"*") as "flows",
          countDistinct($"sourceIPAddress") as "#sIP",
          countDistinct($"destinationIPAddress") as "#dIP",
          sum($"octetCount") as "bytes",
          sum($"packetCount") as "packets")
 //    .filter($"packets" > 20)
     // Keep large average packets with fewer than three names in the list.
     .filter($"bytes"/$"packets" > 70)
     .filter(size($"dnsName") < 3)
     .orderBy($"bytes".desc)
}
result.show(20,false)

The code sample below shows the output of dnsIDExfil.sc on benign and on compressed data, the data sets used in the preceding SiLK discussion. The presence of multicast (224/8 and 239/8 CIDR blocks) and RFC1918 private addresses (192.168/16 CIDR blocks) is due to this data coming from an artificial collection environment instead of live Internet traffic capture.

Contrasting the benign output against the abuse output, we see a smaller number of lookup addresses being queried in the abuse results and a much quicker drop-off in the number of queries per host. In the benign results, there are six DNSNames that are queried repeatedly; in the abuse results, there are two. All of the queries shown are PTR (reverse, RRType=12) queries, and all are going to the same server. In the high-volume DNSName queries, the maximum average packet length is slightly larger for the abuse data than for the benign data (81 vs. 78). Taken together, these differences show a slow-and-steady release of more data as part of the DNS data transfer, which reflects the file transfer taking place.

dnsIDBenign.sc output:
+-------------------------------------+---------+-----+----+----+------+-------+
|dnsName                              |dnsRRType|flows|#sIP|#dIP|bytes |packets|
+-------------------------------------+---------+-----+----+----+------+-------+
|[252.0.0.224.in-addr.arpa.]          |[12]     |2835 |1   |1   |416539|5901   |
|[150.20.168.192.in-addr.arpa.]       |[12]     |982  |1   |1   |242585|3125   |
|[200.20.168.192.in-addr.arpa.]       |[12]     |895  |1   |1   |134756|1836   |
|[15.20.168.192.in-addr.arpa.]        |[12]     |901  |1   |1   |133490|1844   |
|[100.20.168.192.in-addr.arpa.]       |[12]     |757  |1   |1   |112173|1533   |
|[2.20.168.192.in-addr.arpa.]         |[12]     |635  |1   |1   |91734 |1288   |
|[3.20.168.192.in-addr.arpa.]         |[12]     |315  |1   |1   |45438 |640    |
|[_ipps._tcp.local., _ipp._tcp.local.]|[12, 12] |122  |32  |1   |13161 |136    |
|[250.255.255.239.in-addr.arpa.]      |[12]     |74   |1   |1   |11328 |152    |
|[101.20.168.192.in-addr.arpa.]       |[12]     |31   |1   |1   |4666  |64     |
+-------------------------------------+---------+-----+----+----+------+-------+
only showing top 10 rows

dnsIDAbuse.sc output:
+-------------------------------------+---------+-----+----+----+------+-------+
|dnsName                              |dnsRRType|flows|#sIP|#dIP|bytes |packets|
+-------------------------------------+---------+-----+----+----+------+-------+
|[252.0.0.224.in-addr.arpa.]          |[12]     |1260 |1   |1   |191398|2696   |
|[2.20.168.192.in-addr.arpa.]         |[12]     |255  |1   |1   |130725|1615   |
|[150.20.168.192.in-addr.arpa.]       |[12]     |416  |1   |1   |63606 |866    |
|[200.20.168.192.in-addr.arpa.]       |[12]     |388  |1   |1   |57686 |788    |
|[15.20.168.192.in-addr.arpa.]        |[12]     |379  |1   |1   |56492 |781    |
|[100.20.168.192.in-addr.arpa.]       |[12]     |340  |1   |1   |50738 |694    |
|[3.20.168.192.in-addr.arpa.]         |[12]     |125  |1   |1   |17750 |250    |
|[250.255.255.239.in-addr.arpa.]      |[12]     |32   |1   |1   |4736  |64     |
|[_ipps._tcp.local., _ipp._tcp.local.]|[12, 12] |46   |30  |1   |4467  |51     |
|[_ipp._tcp.local., _ipps._tcp.local.]|[12, 12] |13   |9   |1   |1782  |19     |
+-------------------------------------+---------+-----+----+----+------+-------+
only showing top 10 rows
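
The average packet lengths quoted above (81 vs. 78) can be checked directly from the tables. The short sketch below divides bytes by packets for the three highest-volume rows of each output; the object and method names are our own, and the numbers are copied from the tables.

```scala
// Recompute average bytes per packet for the highest-volume rows above.
object BytesPerPacket {
  // (dnsName, bytes, packets) from the top three rows of each table.
  val benign = Seq(
    ("252.0.0.224.in-addr.arpa.", 416539L, 5901L),
    ("150.20.168.192.in-addr.arpa.", 242585L, 3125L),
    ("200.20.168.192.in-addr.arpa.", 134756L, 1836L))
  val abuse = Seq(
    ("252.0.0.224.in-addr.arpa.", 191398L, 2696L),
    ("2.20.168.192.in-addr.arpa.", 130725L, 1615L),
    ("150.20.168.192.in-addr.arpa.", 63606L, 866L))

  /** Largest average packet length (bytes divided by packets) among the rows. */
  def maxAverage(rows: Seq[(String, Long, Long)]): Double =
    rows.map { case (_, bytes, packets) => bytes.toDouble / packets }.max
}
```

For the benign rows the maximum is 242585 / 3125, or about 77.6 bytes per packet; for the abuse rows it is 130725 / 1615, or about 80.9.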

Understanding Data Exfiltration

Whichever form of tooling is used, analysts generally need an understanding of the data transfers from their network. Repetitive queries for DNS resolution should be relatively rare, since caching should eliminate many of these repetitions. As repetitive queries for resolution are identified, several groups of hosts may be found:

  • Hosts that generate repetitive queries not indicative of data exfiltration are likely to exist, characterized by very consistent query size, periodic timing, and use of expected name servers.
  • Hosts that generate repetitive queries with unusual name servers or timing may require further investigation.
  • Hosts that generate repetitive queries with unusual name servers or query sizes should be examined carefully to identify potential exfiltration.
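
These three groups can be turned into a simple triage rule. The sketch below is one possible reading, not a prescribed procedure: the type names are hypothetical, the three Boolean observations are assumed to come from the analyst's own baselines, and because unusual name servers appear in both of the last two groups, we assume the more serious group applies whenever a host matches both.

```scala
// One possible triage rule for hosts that issue repetitive DNS queries.
object HostTriage {
  sealed trait Group
  case object LikelyBenign extends Group
  case object InvestigateFurther extends Group
  case object ExaminePotentialExfiltration extends Group

  // Hypothetical per-host observations relative to the local baseline.
  final case class Observation(unusualNameServer: Boolean,
                               unusualTiming: Boolean,
                               unusualQuerySize: Boolean)

  def classify(o: Observation): Group =
    // Assumption: the more serious group applies when a host matches two groups.
    if (o.unusualNameServer || o.unusualQuerySize) ExaminePotentialExfiltration
    else if (o.unusualTiming) InvestigateFurther
    else LikelyBenign
}
```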

The impact of these hosts on network security will vary depending on the range and criticality of assets these hosts access, but some of the traffic may demand immediate response.

What Might a Security Analyst Want to Know

This post is part of a series addressing a simple question: What might a security analyst want to know at the start of each shift regarding the network? In each post we will discuss one answer to this question and the application of a variety of tools that may implement that answer. Our goal is to provide some key observations that help analysts monitor and defend their networks, focusing on useful ongoing measures, rather than those specific to one event, incident, or issue.

We will not focus on signature-based detection, since there are multiple sources for such, including intrusion detection systems (IDS)/intrusion prevention systems (IPS) and antivirus products. The tools used in these articles will primarily be part of the CERT/NetSA Analysis Suite, but we will include other tools if helpful. Earlier posts examined tools for tracking software updates and proxy bypass.

Our approach will be to highlight a given analytic, discuss the motivation behind the analytic, and provide the application as a worked example. The worked example, by intention, is illustrative rather than exhaustive. The decision of what analytics to deploy, and how, is left to the reader.

If there are specific behaviors that you would like to suggest, please send them by email to netsa-help@cert.org with “SOC Analytics Idea” in the subject line.
