# filter - Clojure Standard Library

For a quick intro to this series of blog posts check out Clojure Standard Library - Intro. It includes a lot of useful info, including notes about presentation of examples and more.

filter

## Quick Overview

### Description

clojure.core/filter is a function that is used to filter out items within a collection. It accepts a predicate function and a collection. It returns a lazy sequence. filter lazily evaluates the items in the collection, returning the items for which the predicate returns true.

A transducer is returned when no collection is supplied.

### Examples

(filter even? [0 1 2 3 4 5 6 7 8 9 10])
; => (0 2 4 6 8 10)

;; Filter out everything but the letter 'a'.
(filter #(= % \a) "Mississippi")
; => ()

;; Filter out everything but the letter 's'.
(filter #(= % \s) "Mississippi")
; => (\s \s \s \s)

;; Filter out everything but the letter 'p'.
(filter #(= % \p) "Mississippi")
; => (\p \p)


1

2

## How To Use

### Parameters and Return Values

filter is a multiple-arity function, which means that it uses arity overloading in order to provide different behavior based on the number of arguments provided. filter accepts one or two arguments.

• One Argument: (filter pred)

Providing only pred (predicate), returns a transducer; this is not the same as partial application or currying.

Quick note on predicates: In Clojure a predicate is a function that evaluates some condition and returns either true or false.

Ex:

;; -- Transducer Examples --

;; Create a transducer that filters collections using the even? function and
;; bind it to the symbol 'even?-t'.
(def even?-t (filter even?))
; => #'user/even?-t

;; Use the transducer 'even?-t' to filter the collection of numbers.
(transduce even?-t conj [1 2 3 4 5 6 7])
; => [2 4 6]

• Two Arguments: (filter pred coll)

Providing both pred and a collection (coll), returns a lazy sequence of items for which the predicate returned true.

Ex:

(filter #(< (count %) 4) ["aa" "aaaa" "a"])
; => ("aa" "a")

;; Creating a vector of maps that represent different pets.
(def pets [{:name "Spot" :type "dog"}
{:name "Mittens" :type "cat"}
{:name "Jack" :type "parrot"}
{:name "Fred" :type "dog"}])
; => #'user/pets

;; Filtering the collection of pets based on which are dogs.
(filter #(= (:type %) "dog") pets)
; => ({:name "Spot", :type "dog"} {:name "Fred", :type "dog"})

;; Filter the collection of pets to find which pets have a name that starts
;; with the character \M.
(filter #(= (first (:name %)) \M) pets)
; => ({:name "Mittens", :type "cat"})


## Expanded Description

### Examining Simplified Source Code

Below is a simplified version of the source code for filter. This simplified version is called filter-smpl. The main difference between the original and smpl-random-sample, is that I have changed it from being a multiple-arity function to a two-arity funcion. This means that it no longer accepts a variable number of arguments. It requires both pred and coll and behaves identically to filter, when both pred and coll are provided, in most cases. 3 I made this change in order reduce the number of concepts involved and to make it easier to understand.

(defn filter-smpl                      ; Name
"Simplified version of filter."      ; Doc String
([pred coll]                         ; Parameters
(lazy-seq                          ; Function Body: start
(when-let [s (seq coll)]
(let [f (first s) r (rest s)]
(if (pred f)
(cons f (filter pred r))
(filter pred r))))))))     ; Function Body: end



#### Breaking It Down:

The meat of the function, the portion that contains the main “filtering” logic, is this bit right here:

(let [f (first s) r (rest s)]  ; line A
(if (pred f)                 ; line B
(cons f (filter pred r))   ; line C
(filter pred r)))          ; line D


Now that all of the other parts have been removed, it’s easier to see how the function works. Let’s start off by breaking down this section of the filter function line by line.

##### Line A: (let [f (first s) r (rest s)]

Note that the value of s is a sequence backed by the collection coll.

The first thing you will notice is that this line makes use of a let form. A quick explanation of let is that it is a macro that allows you to bind values to names and have those bindings only exist within the let expression. Also, those bindings will not interfere with existing bindings within contexts outside of the let expression (e.g. global).

Within this let form the results of (first s) and (rest s) are being bound to the names f and r respectively. (first s) makes use of the function first which, in this expression, will return the first item in the sequence s. (rest s) uses the function rest which will return a sequence of all the items in s after the first item.

So, now we have the first item in s bound to the name f and the rest of the items in s bound to the name r.

##### Line B: (if (pred f)

Line B contains the start of a standard ‘if’ statement. If the result of the predicate (pred) being called on the first item (f) from the collection (coll by way of s) is true then the expression on line C will be evaluated, otherwise the expression on line D will be evaluated.

##### Line C: (cons f (filter pred r))

Note that this line is only evaluated if the expression (pred f) returns true.

Let’s examine this in the order that it would be evaluated:

(filter pred r): is a recursive call 4 5. For an explanation of what filter does go back to the beginning of this post. In the recursive call pred is being provided, again, as the predicate and r, the leftover items from coll (by way of s), is being provided as the collection to be used. This expression will return a lazy sequence of items that have been filtered.

(cons f (filter pred r)): this expression uses the function cons, simply put, cons will return a new sequence where f will be the first item followed by all the items in the sequence that results from the expression (filter pred r). This is the part of the function that places the items that “pass” the predicates test in the final result.

##### Line D: (filter pred r)))

Note that this line is only evaluated if the expression (pred f) returns false.

You will likely notice that the expression (filter pred r) is also in line C, for a detailed explanation of it refer to the section above about line C.

An understanding of the purpose of this line can be gained by comparing it to line C. The only real difference between the two lines is this bit: (cons f. Remember that the purpose of the cons expression from line C is to place f at the front of the result of the recursive call to filter. So, not including (cons f will result in f not being included in the result. This effectively filters out f from the result.

##### Putting the lines together

Now lets examine the expression as a whole, this will make it easier to understand how the individual lines contribute to the “filtering” action.

In line A we started by separating the first item (f) from the sequence (s) from the rest (r). Then in line B we called the predicate on f to see if it should be filtered out or not. If the predicate returned true, meaning that f should be included in the result, we move to line C. In line C we made a recursive call, this results in this same process being applied to the rest of the sequence (r). Then we took the result of the recursive call and placed f at the front of it. If the predicate returned false, meaning that f should be filtered out, then we skip line C and move to line D where we only make the recursive cal, effectively filtering out f.

#### The Other Bits

##### The lazy-seq expression

The use of the macro lazy-seq creates a lazily evaluated sequence. The use of lazy-seq not only creates a lazy sequence, it is also what allows for a recursive call. A recursive call can be made without overflowing the stack, because the traversal state is stored on the heap instead of the stack. 6

##### when-let

The when-let expression, in this context, tests if coll is empty or not. If it is empty the body is not run, effectively signaling the end of the result sequence. If it is not empty a sequence, backed by coll, is bound to s and the body is computed.

1. The special form #(= % \a) is a lambda (anonymous function) that returns true if the argument passed to it is the character ‘a’. ↩︎
2. Check out the post on: even? ↩︎
3. I removed the part that returns a transducer when only pred is supplied. I also removed a section that is optimized for dealing with chunked sequences. For the most up to date version of the actual source code click on the source code link in this page for random-sample↩︎
5. For how this is possible with out issue refer to The lazy-seq expression section below. ↩︎