Calico Cloud Pro documentation

Flow log data types

Big picture

Calico Cloud sends the following data to Elasticsearch.

The following table details the key/value pairs in the JSON blob, including their Elasticsearch datatype.

Name	Datatype	Description
`host`	keyword	Name of the node that collected the flow log entry.
`start_time`	date	Start time of log collection in UNIX timestamp format.
`end_time`	date	End time of log collection in UNIX timestamp format.
`action`	keyword	- `allow`: Calico Cloud accepted the flow. - `deny`: Calico Cloud denied the flow.
`bytes_in`	long	Number of incoming bytes since the last export.
`bytes_out`	long	Number of outgoing bytes since the last export.
`dest_ip`	ip	IP address of the destination pod. A null value indicates aggregation.
`dest_name`	keyword	Contains one of the following values: - Name of the destination pod. - Name of the pod that was aggregated or the endpoint is not a pod. Check `dest_name_aggr` for more information, such as the name of the pod if it was aggregated.
`dest_name_aggr`	keyword	Contains one of the following values: - Aggregated name of the destination pod. - `pvt`: endpoint is not a pod. Its IP address belongs to a private subnet. - `pub`: endpoint is not a pod. Its IP address does not belong to a private subnet. It is probably an endpoint on the public internet.
`dest_namespace`	keyword	Namespace of the destination endpoint. A `-` means the endpoint is not namespaced.
`dest_port`	long	Destination port. Not applicable for ICMP packets.
`dest_service_name`	keyword	Name of the destination service. A `-` means the original destination did not correspond to a known Kubernetes service (e.g. a services ClusterIP).
`dest_service_namespace`	keyword	Namespace of the destination service. A `-` means the original destination did not correspond to a known Kubernetes service (e.g. a services ClusterIP).
`dest_service_port`	keyword	Port name of the destination service. A `-` means : - the original destination did not correspond to a known Kubernetes service (e.g. a services ClusterIP), or - the destination port is aggregated. A `*` means there are multiple service port names matching the destination port number.
`dest_type`	keyword	Destination endpoint type. Possible values: - `wep`: A workload endpoint, a pod in Kubernetes. - `ns`: A Networkset. If multiple Networksets match, then the one with the longest prefix match is chosen. - `net`: A Network. The IP address did not fall into a known endpoint type.
`dest_labels`	array of keywords	Labels applied to the destination pod. A hyphen indicates aggregation.
`dest_domains`	array of keywords	Find all the destination domain names for use in a DNS policy by examining `dest_domains`. The field displays information on the top-level domains linked to the destination IP. Applies to flows reported from the source to destinations outside the cluster. If `flowLogsDestDomainsByClient` is disabled, having `dest_domains`: ["A"] doesn't guarantee that the flow corresponds to a connection with domain name A. The destination IP may also be linked to other domain names not yet captured by Calico.
`reporter`	keyword	- `src`: flow came from the pod that initiated the connection. - `dst`: flow came from the pod that received the initial connection.
`num_flows`	long	Number of flows aggregated into this entry during this export interval.
`num_flows_completed`	long	Number of flows that were completed during the export interval.
`num_flows_started`	long	Number of flows that were started during the export interval.
`num_process_names`	long	Number of unique process names aggregated into this entry during this export interval.
`num_process_ids`	long	Number of unique process ids aggregated into this entry during this export interval.
`num_process_args`	long	Number of unique process args aggregated into this entry during this export interval.
`nat_outgoing_ports`	array of ints	List of NAT outgoing ports for the packets that were Source NAT'd in the flow
`packets_in`	long	Number of incoming packets since the last export.
`packets_out`	long	Number of outgoing packets since the last export.
`proto`	keyword	Protocol.
`policies`	array of arrays	List of policies that interacted with this flow. See Format of the policies field.
`process_name`	keyword	The name of the process that initiated or received the connection or connection request. This field will have the executable path if flowLogsCollectProcessPath is enabled. A "-" indicates that the process name is not logged. A "*" indicates that the per flow process limit has exceeded and the process names are now aggregated.
`process_id`	keyword	The process ID of the corresponding process (indicated by the `process_name` field) that initiated or received the connection or connection request. A "-" indicates that the process ID is not logged. A "*" indicates that there are more than one unique process IDs for the corresponding process name.
`process_args`	array of strings	The arguments with which the executable was invoked. The size of the list depends on the per flow process args limit.
`source_ip`	ip	IP address of the source pod. A null value indicates aggregation.
`source_name`	keyword	Contains one of the following values: - Name of the source pod. - Name of the pod that was aggregated or the endpoint is not a pod. Check `source_name_aggr` for more information, such as the name of the pod if it was aggregated.
`source_name_aggr`	keyword	Contains one of the following values: - Aggregated name of the source pod. - `pvt`: Endpoint is not a pod. Its IP address belongs to a private subnet. - `pub`: the endpoint is not a pod. Its IP address does not belong to a private subnet. It is probably an endpoint on the public internet.
`source_namespace`	keyword	Namespace of the source endpoint. A `-` means the endpoint is not namespaced.
`source_port`	long	Source port. A null value indicates aggregation.
`source_type`	keyword	The type of source endpoint. Possible values: - `wep`: A workload endpoint, a pod in Kubernetes. - `ns`: A Networkset. If multiple Networksets match, then the one with the longest prefix match is chosen. - `net`: A Network. The IP address did not fall into a known endpoint type.
`source_labels`	array of keywords	Labels applied to the source pod. A hyphen indicates aggregation.
`original_source_ips`	array of ips	List of external IP addresses collected from requests made to the cluster through an ingress resource. This field is only available if capturing external IP addresses is configured.
`num_original_source_ips`	long	Number of unique external IP addresses collected from requests made to the cluster through an ingress resource. This count includes the IP addresses included in the `original_source_ips` field. This field is only available if capturing external IP addresses is configured.
`tcp_mean_send_congestion_window`	long	Mean tcp send congestion window size. This field is only available if flowLogsEnableTcpStats is enabled
`tcp_min_send_congestion_window`	long	Minimum tcp send congestion window size. This field is only available if flowLogsEnableTcpStats is enabled
`tcp_mean_smooth_rtt`	long	Mean smooth RTT in micro seconds. This field is only available if flowLogsEnableTcpStats is enabled
`tcp_max_smooth_rtt`	long	Maximum smooth RTT in micro seconds. This field is only available if flowLogsEnableTcpStats is enabled
`tcp_mean_min_rtt`	long	Mean MinRTT in micro seconds. This field is only available if flowLogsEnableTcpStats is enabled
`tcp_max_min_rtt`	long	Maximum MinRTT in micro seconds. This field is only available if flowLogsEnableTcpStats is enabled
`tcp_mean_mss`	long	Mean TCP MSS. This field is only available if flowLogsEnableTcpStats is enabled
`tcp_min_mss`	long	Minimum TCP MSS. This field is only available if flowLogsEnableTcpStats is enabled
`tcp_total_retransmissions`	long	Total retransmitted packets. This field is only available if flowLogsEnableTcpStats is enabled
`tcp_lost_packets`	long	Total lost packets. This field is only available if flowLogsEnableTcpStats is enabled
`tcp_unrecovered_to`	long	Total unrecovered timeouts. This field is only available if flowLogsEnableTcpStats is enabled

Format of the policies field

The policies field contains three sub-fields, all_policies, enforced_policies, and pending_policies.

Name	Datatype	Description
`all_policies`	array of keywords	A comma-delimited list of all enforced and staged network policies that applied an action to the flow up until the first enforced policy to apply a verdict.
`enforced_policies`	array of keywords	A comma-delimited list of all enforced network policies that applied an action to the flow up until the first enforced policy to apply a verdict.
`pending_policies`	array of keywords	A comma-delimited list of all enforced and staged network policies that applied an action to the flow up until the first enforced or staged policy that applied a verdict.

Each entry in the list has the following format:

<index>|<tier name>|<policy name>|<action>|<rule index>

Where,

<index> numbers the order in which the rules were hit, starting with 0.

tip
Sort the entries of the list by the <index> to see the order that rules were hit. The entries are displayed in random order due to the way they are stored in the datastore.
<tier name> is the name of the policy tier containing the policy, or __PROFILE__ for a rule derived from a Profile resource (this is the internal datatype used to represent a Kubernetes namespace and its associated "default allow" rule).
<policy name> is the name of the policy/profile; its format depends on the type of policy:
- <tier>.<name> for Calico Cloud GlobalNetworkPolicy.
- <namespace>/knp.default.<name> for Kubernetes NetworkPolicy.
- <namespace>/<tier>.<name> for Calico Cloud NetworkPolicy.
Staged policy names are prefixed with "staged:".
<action> is the action performed by the rule; one of allow, deny, pass.
<rule index> if non-negative, is the index of the rule that was matched within the policy, starting with 0. Otherwise, a special value:
- -1 means the reporting endpoint was selected by the policy but no rule matched. The traffic hit the default action for the tier. In this case, the <policy name> is selected arbitrarily from the set of policies within the tier that apply to the endpoint.
- -2 means "unknown". The rule index was not recorded.

Flow log example, with `no aggregation`

A flow log with aggregation level 0, no aggregation, might look like:

  {
    "start_time": 1597166083,
    "end_time": 1597166383,
    "source_ip": "192.168.47.9",
    "source_name": "access-6b687c8dcb-zn5s2",
    "source_name_aggr": "access-6b687c8dcb-*",
    "source_namespace": "policy-demo",
    "source_port": 42106,
    "source_type": "wep",
    "source_labels": {
      "labels": [
        "pod-template-hash=6b687c8dcb",
        "app=access"
      ]
    },
    "dest_ip": "192.168.138.79",
    "dest_name": "nginx-86c57db685-h6792",
    "dest_name_aggr": "nginx-86c57db685-*",
    "dest_namespace": "policy-demo",
    "dest_port": 80,
    "dest_type": "wep",
    "dest_labels": {
      "labels": [
        "pod-template-hash=86c57db685",
        "app=nginx"
      ]
    },
    "proto": "tcp",
    "action": "allow",
    "reporter": "dst",
    "policies": {
      "all_policies": [
        "0|tier1|ns1/tier1.policy1|pass|3",
        "1|tier2|ns2/staged:tier1.policy1|deny|1",
        "2|default|policy-demo/default.access-nginx|allow|2"
      ],
      "enforced_policies": [
        "0|tier1|ns1/tier1.policy1|pass|3",
        "1|default|policy-demo/default.access-nginx|allow|2"      
      ],
      "pending_policies": [
        "0|tier1|ns1/tier1.policy1|pass|3",
        "1|tier2|ns2/staged:tier1.policy1|deny|1",       
      ]
    },
    "bytes_in": 388,
    "bytes_out": 1113,
    "num_flows": 1,
    "num_flows_started": 1,
    "num_flows_completed": 1,
    "packets_in": 6,
    "packets_out": 5,
    "http_requests_allowed_in": 0,
    "http_requests_denied_in": 0,
    "original_source_ips": null,
    "num_original_source_ips": 0,
    "host": "bz-n8kf-kadm-node-1",
    "@timestamp": 1597166383000
  }

The log shows an incoming connection reported by the destination node, allowed by a policy on port 80. The start_time and end_time describe the aggregation period (5 min.) During this interval, one flow ("num_flow": 1) was recorded. At higher aggregation levels, flows from endpoints performing the same operation and originating from the same Deployment/ReplicaSet are grouped into a single log. In this example, the common source endpoints that are prefixed with access-6b687c8dcb-. Parameters like source_ip may be dropped and set to null, depending on the aggregation level. As aggregation levels increase, more flows will be grouped together based on your data. For more details on aggregation levels, see configure flow log aggregation.

Big picture​

Format of the policies field​

Flow log example, with no aggregation​

Big picture

Format of the policies field

Flow log example, with `no aggregation`