Query expressions
DRAFT DRAFT DRAFT DRAFT
These are my raw notes on section 7.16 of the C# Language Specification. Section 7.16 falls within section 7 on expressions.
Heuristic Model
- Notes. Add notes for each section.
- Definitions. Add definitions for the chapter.
- Examples. After adding definitions, then add examples.
- Edit. After adding examples, then edit for readability etc.
My Personal Conventions
- terminology is italicized
- code is in `back ticks`
Intro
query expression syntax is similar to that of relational and hierarchical query languages
- begins with a `from` clause
- ends with either a `select` or `group` clause
- after the initial `from` can come zero or more of these clauses: `from`, `let`, `where`, `join`, `orderby`
- each `from` clause is a generator and includes:
  - a range variable...
  - which ranges over the elements of a sequence
- each `let` clause
  - introduces a range variable
  - representing a value computed by means of previous range variables
- each `where` clause
  - is a filter
  - that excludes items from the result
- each `join` clause
  - compares specified keys of the source sequence
  - with keys of another sequence
  - yielding matching pairs
- each `orderby` clause
  - reorders items
  - according to specified criteria
- the final `select` or `group` clause
  - specifies the shape of the result
  - in terms of the range variables
- an `into` clause
  - can "splice" queries
  - by treating the results of one query
  - as a generator in a subsequent query
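To make the roles of the clauses concrete, here is a minimal sketch of one query that uses most of them. The sample data, the property names (`Id`, `Name`, `CustomerId`, `Total`) and the tax factor are assumptions made up for illustration; they do not come from the specification.

```csharp
using System;
using System.Linq;

// Hypothetical sample data (assumed for illustration only).
var customers = new[]
{
    new { Id = 1, Name = "Ann" },
    new { Id = 2, Name = "Bob" },
};
var orders = new[]
{
    new { CustomerId = 1, Total = 40m },
    new { CustomerId = 1, Total = 15m },
    new { CustomerId = 2, Total = 70m },
};

var query =
    from c in customers                            // generator: c ranges over customers
    join o in orders on c.Id equals o.CustomerId   // pairs each c with its matching orders
    let withTax = o.Total * 1.2m                   // new range variable computed from o
    where withTax > 20m                            // filter: drops the small order
    orderby withTax descending                     // reorders the remaining items
    select new { c.Name, withTax };                // shape of the result

foreach (var row in query)
    Console.WriteLine($"{row.Name}: {row.withTax}");
```

Under these assumptions the order of 15 is filtered out, and Bob's order is listed before Ann's because of the descending sort.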
Ambiguities
How to use contextual keywords as identifiers.
from
where
join
on
equals
into
let
orderby
ascending
descending
select
group
by
The above are keywords when they occur anywhere within a query expression.
To use these words as identifiers within a query expression, prefix them with `@`
from @select
in (new string[] { "from", "select" })
select @select
Here, a query expression is any expression that
- starts with `from identifier`
- followed by any token except `;`, `=`, or `,`
Translation
The steps for turning a query expression into fluent syntax.
- C# does not specify the execution semantics of query expressions.
- Rather, the compiler translates query expressions into invocations of these methods: `Where`, `Select`, `SelectMany`, `Join`, `GroupJoin`, `OrderBy`, `OrderByDescending`, `ThenBy`, `ThenByDescending`, `GroupBy`, `Cast`
- These methods must have particular
- signatures
- result types
- These methods can be
- instance methods of the object being queried, or
- extension methods that are external to the object.
- [I'd like to see an example of overriding the LINQ extension methods]
- The translation:
- is a syntactic mapping
- occurs prior to any type binding or overload resolution
- is guaranteed to be syntactically correct
- is NOT guaranteed to produce semantically correct C# code
- After the translation:
- the resulting methods are invoked as regular methods
- and this may result in normal method call errors
- The compiler repeats the following translation until further reductions are impossible
- the compiler applies each translation section in order
- each section is applied exhaustively
- once exhausted, a section is not later revisited in the same query
- Two notes:
- assignment to range variables is NOT allowed in a query expression, though this rule need not be strictly enforced in all C# implementations
- certain translations inject range variables with transparent identifiers denoted by `*`
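Because the translation is purely syntactic, any type that exposes suitably shaped `Where` and `Select` methods can be queried, and instance methods are found by ordinary member lookup before extension methods are considered. The sketch below illustrates this with a hypothetical `Numbers` class (the class, its members and the log messages are assumptions for illustration, not part of the specification or of LINQ).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new Numbers(new[] { 1, 2, 3, 4 });

// Translated to numbers.Where(n => n % 2 == 0).Select(n => n * 10),
// which binds to the instance methods declared on Numbers below.
var evens = from n in numbers
            where n % 2 == 0
            select n * 10;

Console.WriteLine(string.Join(", ", evens)); // 20, 40

// Hypothetical type: it does not implement IEnumerable<T>,
// so only its own Where and Select can satisfy the translation.
class Numbers
{
    private readonly IEnumerable<int> _items;

    public Numbers(IEnumerable<int> items) => _items = items;

    public Numbers Where(Func<int, bool> predicate)
    {
        Console.WriteLine("Numbers.Where called");
        return new Numbers(_items.Where(predicate));
    }

    public IEnumerable<TResult> Select<TResult>(Func<int, TResult> selector)
    {
        Console.WriteLine("Numbers.Select called");
        return _items.Select(selector);
    }
}
```

Running this prints the two log lines before the result, which shows that the query syntax reached the type's own methods rather than the `System.Linq.Enumerable` extension methods.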
1. `Select` and `groupby` clauses with continuations
- from ... into x ...
- translates into
- from x in ( from ... ) ...
Example
from c in customers group c by c.Country into g select new { Country = g.Key }
becomes
from g in ( from c in customers group c by c.Country ) select new { Country = g.Key }
then becomes
customers.GroupBy(c => c.Country).Select(g => new { Country = g.Key })
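The three forms above can be checked against each other at run time. This is a minimal sketch with made-up sample data; the `Count` member is an addition for illustration and is not part of the example above.

```csharp
using System;
using System.Linq;

// Hypothetical sample data (assumed for illustration only).
var customers = new[]
{
    new { Name = "Ann", Country = "UK" },
    new { Name = "Bob", Country = "France" },
    new { Name = "Cy",  Country = "UK" },
};

// Original form, with a continuation.
var viaInto =
    from c in customers
    group c by c.Country into g
    select new { Country = g.Key, Count = g.Count() };

// After step 1: the continuation becomes a nested query.
var viaNested =
    from g in (from c in customers group c by c.Country)
    select new { Country = g.Key, Count = g.Count() };

// Fully translated into method calls.
var viaMethods = customers
    .GroupBy(c => c.Country)
    .Select(g => new { Country = g.Key, Count = g.Count() });

Console.WriteLine(viaInto.SequenceEqual(viaNested));  // True
Console.WriteLine(viaInto.SequenceEqual(viaMethods)); // True
```

The comparison works because anonymous types with the same property names and types, in the same order, are the same type and compare by value.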
2. Explicit range variable types
from
- from T x in e
- translates into
- from x in (e).Cast<T>()
join
- join T x in e on k1 equals k2
- translates into
- join x in ( e ).Cast<T>() on k1 equals k2
Example
from Customer c in customers where c.City == "London" select c
becomes
from c in customers.Cast<Customer>() where c.City == "London" select c
then becomes
customers.Cast<Customer>().Where(c => c.City == "London")
Note
The .Cast<T>() operates on each object in the collection (as opposed to casting the collection).
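An explicit range variable type is most useful with non-generic collections, which offer no element type for the compiler to infer. A minimal sketch, assuming an `ArrayList` of city names chosen for illustration:

```csharp
using System;
using System.Collections;
using System.Linq;

// ArrayList implements only the non-generic IEnumerable,
// so Where and Select are not directly available on it.
var cities = new ArrayList { "London", "Paris", "Lima" };

// The explicit type "string" makes the compiler insert Cast<string>().
var viaQuery =
    from string s in cities
    where s.StartsWith("L", StringComparison.Ordinal)
    select s;

// Hand-written equivalent of the translation.
var viaMethods = cities
    .Cast<string>()
    .Where(s => s.StartsWith("L", StringComparison.Ordinal));

Console.WriteLine(string.Join(", ", viaQuery));        // London, Lima
Console.WriteLine(viaQuery.SequenceEqual(viaMethods)); // True
```

Since the cast is applied element by element during enumeration, an element of the wrong type would only fail when the query is actually enumerated.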
3. Degenerate query expressions
A degenerate query expression is one that trivially selects the elements from the source.
- from x in e select x
- translates into
- ( e ).Select(x => x)
Example
from c in customers select c
becomes
customers.Select(c => c)
Notes
- if a query expression includes only a degenerate query,
- then the translation appends a .Select()
- that said, if there are further translations
- a later phase of the translation
- will replace the degenerate query with just its source
- This happens because...
- it is important to ensure that the result of a query expression is not the source
- lest we reveal the type and identity of the source to the client of the query
- [why would that be problematic?]
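One possible reading of why this matters: if the query returned the source object itself, a caller could cast the result back to the concrete collection type and modify it. The sketch below, using a `List<int>` chosen for illustration, shows that the degenerate query yields a different object.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = new List<int> { 1, 2, 3 };

// Translated to source.Select(x => x): same elements, different object.
IEnumerable<int> degenerate = from x in source select x;

Console.WriteLine(ReferenceEquals(degenerate, source)); // False
Console.WriteLine(degenerate is List<int>);             // False
Console.WriteLine(degenerate.SequenceEqual(source));    // True
```

A caller holding `degenerate` can enumerate the elements but cannot reach `List<int>.Add` through it, whereas a caller holding `source` typed as `IEnumerable<int>` could cast it back to `List<int>`.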
4. `From`, `let`, `where`, `join`, and `orderby` clauses
A query expression with a...
...second `from` clause followed by a...
- This is the SelectMany. It isn't a query continuation.
- The `select` clause has access to the range variables from both the first and second `from` clauses.
...`select` clause
- from x1 in e1 from x2 in e2 select v
- ( e1 ) . SelectMany ( x1 => e2, ( x1 , x2 ) => v )
- from c in customers from o in c.Orders select new { c.Name, o.OrderId, o.Total }
- customers.SelectMany(c => c.Orders, (c, o) => new { c.Name, o.OrderId, o.Total } )
...something other than a `select` clause
- from x1 in e1 from x2 in e2 ...
- from * in ( e1 ) . SelectMany( x1 => e2 , ( x1, x2 ) => new { x1, x2 } ) ...
- from c in customers from o in c.Orders...
- from * in customers.SelectMany( c => c.Orders, ( c, o ) => new { c, o } ) ...
Recall that the * is the transparent identifier. It stands in for an anonymous-type object that captures multiple range variables, and later steps propagate it into anonymous functions and anonymous object initializers. In the above case, it stands for new { x1, x2 }
Note, in both the above examples, the range variables of both `from` clauses stay in scope; that is, both are available in subsequent clauses.
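Both cases can be sketched side by side. The data below is made up for illustration, and the lambda parameter `t` is a hand-written stand-in for the transparent identifier, which has no real syntax.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: each customer carries an array of order totals.
var customers = new[]
{
    new { Name = "Ann", Orders = new[] { 10, 20 } },
    new { Name = "Bob", Orders = new[] { 30 } },
};

// Case 1: second from followed directly by select.
var q1 = from c in customers
         from o in c.Orders
         select new { c.Name, Total = o };
var m1 = customers.SelectMany(c => c.Orders, (c, o) => new { c.Name, Total = o });

// Case 2: second from followed by something else (here, where).
var q2 = from c in customers
         from o in c.Orders
         where o >= 20
         select new { c.Name, Total = o };
var m2 = customers
    .SelectMany(c => c.Orders, (c, o) => new { c, o }) // the pair plays the role of *
    .Where(t => t.o >= 20)                             // c and o are reached through t
    .Select(t => new { t.c.Name, Total = t.o });

Console.WriteLine(q1.SequenceEqual(m1)); // True
Console.WriteLine(q2.SequenceEqual(m2)); // True
```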
`let` clause
The variable defined within the let clause has access to the initial range variable and, along with it, is available through the rest of the query.
- from x in e let y = f ...
- from * in ( e ) . Select ( x => new { x, y = f } ) ...
- from o in orders let t = o.Details.Sum(d => d.UnitPrice * d.Quantity) ...
- from * in orders.Select(o => new { o, t = o.Details.Sum(d => d.UnitPrice * d.Quantity ) } ) ...
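A smaller sketch of the same `let` translation, using an array of words chosen for illustration. The lambda parameter `t` is a hand-written stand-in for the transparent identifier.

```csharp
using System;
using System.Linq;

var words = new[] { "from", "let", "where", "join" };

var viaQuery =
    from w in words
    let n = w.Length          // n is computed once per element
    where n > 3
    select w.ToUpperInvariant();

var viaMethods = words
    .Select(w => new { w, n = w.Length }) // let becomes a Select that pairs w with n
    .Where(t => t.n > 3)
    .Select(t => t.w.ToUpperInvariant());

Console.WriteLine(string.Join(", ", viaQuery));        // FROM, WHERE, JOIN
Console.WriteLine(viaQuery.SequenceEqual(viaMethods)); // True
```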
`where` clause
- from x in e where f ...
- from x in ( e ) . Where ( x => f ) ...
- from o in orders where o.Id > 0 ...
- from o in orders.Where(o => o.Id > 0) ...
`join` clause without an `into` followed by a...
...`select` clause
- from x1 in e1 join x2 in e2 on k1 equals k2 select v
- ( e1 ) . Join ( e2, x1 => k1, x2 => k2, ( x1, x2 ) => v )
...something other than a `select` clause
In this case, the transparent identifier * holds the place of the anonymous new { x1, x2 }
- from x1 in e1 join x2 in e2 on k1 equals k2 ...
- from * in ( e1 ) . Join ( e2, x1 => k1, x2 => k2, ( x1, x2 ) => new { x1, x2 } ) ...
`join` clause with an `into` followed by a...
The `into` makes the `join` into a group join.
...`select` clause
The output here is the initial range variable x1 and the group formed from the second range variable x2. In other words, x1 remains in scope but x2 doesn't because it's behind g.
- from x1 in e1 join x2 in e2 on k1 equals k2 into g select v
- ( e1 ) . GroupJoin ( e2, x1 => k1, x2 => k2, ( x1, g ) => v )
...something other than a `select` clause
- from x1 in e1 join x2 in e2 on k1 equals k2 into g ...
- from * in ( e1 ) . GroupJoin ( e2, x1 => k1, x2 => k2, ( x1, g ) => new { x1, g } ) ...
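The difference between `Join` and `GroupJoin` is easiest to see on data where one customer has no orders. This is a minimal sketch with made-up data and property names.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: Bob has no orders.
var customers = new[] { new { Id = 1, Name = "Ann" }, new { Id = 2, Name = "Bob" } };
var orders = new[]
{
    new { CustomerId = 1, Total = 10 },
    new { CustomerId = 1, Total = 20 },
};

// join without into: one result per matching pair, so Bob disappears.
var inner = from c in customers
            join o in orders on c.Id equals o.CustomerId
            select new { c.Name, o.Total };
var innerM = customers.Join(orders, c => c.Id, o => o.CustomerId,
                            (c, o) => new { c.Name, o.Total });

// join with into: one result per customer, with g holding that customer's orders.
var grouped = from c in customers
              join o in orders on c.Id equals o.CustomerId into g
              select new { c.Name, Count = g.Count() };
var groupedM = customers.GroupJoin(orders, c => c.Id, o => o.CustomerId,
                                   (c, g) => new { c.Name, Count = g.Count() });

Console.WriteLine(inner.SequenceEqual(innerM));     // True
Console.WriteLine(grouped.SequenceEqual(groupedM)); // True
foreach (var row in grouped)
    Console.WriteLine($"{row.Name}: {row.Count}");  // Ann: 2, then Bob: 0
```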
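`orderby` clause
- from x in e orderby k1, k2, k3 ...
- from x in ( e ) . OrderBy ( x => k1 ) . ThenBy ( x => k2 ) . ThenBy ( x => k3 ) ...
if an ordering is followed by `descending`, the translation uses `OrderByDescending` or `ThenByDescending` for that key instead
- from x in e orderby k1 descending, k2 descending ...
- from x in ( e ) . OrderByDescending ( x => k1 ) . ThenByDescending ( x => k2 ) ...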
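A short sketch mixing a descending and an ascending key, with made-up data. Each key becomes a lambda over the range variable.

```csharp
using System;
using System.Linq;

// Hypothetical sample data (assumed for illustration only).
var people = new[]
{
    new { Name = "Cy",  Age = 30 },
    new { Name = "Bob", Age = 25 },
    new { Name = "Ann", Age = 30 },
};

var viaQuery =
    from p in people
    orderby p.Age descending, p.Name
    select p.Name;

var viaMethods = people
    .OrderByDescending(p => p.Age) // first key, marked descending
    .ThenBy(p => p.Name)           // second key, ascending by default
    .Select(p => p.Name);

Console.WriteLine(string.Join(", ", viaQuery));        // Ann, Cy, Bob
Console.WriteLine(viaQuery.SequenceEqual(viaMethods)); // True
```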
5. `Select` clauses
- from x in e select v
- ( e ) . Select ( x => v )
The `=>` is a projection from each value of `x` into `v`. If `v` is simply a repeat of `x`, then the translation is just `( e )`.
6. `Group by` clauses
- from x in e group v by k
- ( e ) . GroupBy ( x => k , x => v )
The exception is when `v` is the identifier `x`, in which case the result is `( e ) . GroupBy ( x => k )`
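Both shapes of the `group ... by` translation can be sketched with made-up data. In the first query the grouped value differs from the range variable, so an element selector is passed. In the second it is the range variable itself, so only the key selector remains.

```csharp
using System;
using System.Linq;

// Hypothetical sample data (assumed for illustration only).
var customers = new[]
{
    new { Name = "Ann", Country = "UK" },
    new { Name = "Bob", Country = "France" },
    new { Name = "Cy",  Country = "UK" },
};

// group v by k  ->  GroupBy(x => k, x => v)
var names = from c in customers group c.Name by c.Country;
var namesM = customers.GroupBy(c => c.Country, c => c.Name);

// group x by k  ->  GroupBy(x => k)
var whole = from c in customers group c by c.Country;
var wholeM = customers.GroupBy(c => c.Country);

foreach (var g in names)
    Console.WriteLine($"{g.Key}: {string.Join(", ", g)}"); // UK: Ann, Cy then France: Bob
```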
7. Transparent identifiers
- some translations inject range variables with transparent identifiers
- the `*` denotes these
  - they are NOT a proper language feature
  - rather, they exist only as an intermediate step during translation
- further translation steps propagate the `*` into either
  - anonymous functions
  - anonymous object initializers
- cases:
  - when a `*` occurs as a parameter in an anonymous function,
    - then the members of the associated anonymous type
    - are automatically in scope in the anonymous function body
  - when a `*` occurs as a member declarator in an anonymous object initializer
    - then it introduces a member with a transparent identifier
- As described above, the `*` are always introduced with anonymous types
  - the intent is to capture multiple range variables as members of a single object
  - a C# implementation is allowed to use a different mechanism to accomplish the same intent.
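Since transparent identifiers have no real syntax, the nesting can only be approximated by hand. The sketch below expands a query with two `from` clauses and a `let`, so that one anonymous object ends up wrapped inside another. The parameter names `t` and `u` are stand-ins chosen for illustration, and an actual compiler may use a different mechanism.

```csharp
using System;
using System.Linq;

var xs = new[] { 1, 2, 3 };
var ys = new[] { 10, 11 };

var viaQuery =
    from x in xs
    from y in ys
    let s = x + y
    where s % 2 == 0
    select new { x, y, s };

var viaMethods = xs
    .SelectMany(x => ys, (x, y) => new { x, y }) // first *: captures x and y
    .Select(t => new { t, s = t.x + t.y })       // second *: wraps the first and adds s
    .Where(u => u.s % 2 == 0)                    // in the query, s is simply "in scope"
    .Select(u => new { u.t.x, u.t.y, u.s });     // by hand, x and y need the u.t prefix

Console.WriteLine(viaQuery.SequenceEqual(viaMethods)); // True
Console.WriteLine(viaQuery.Count());                   // 3
```

The explicit `u.t.x` path is exactly what the "members are automatically in scope" rule hides from the query author.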
Pattern
- Types can implement this pattern to support query expressions on those types.
- Types have flexibility in how they implement query expressions.
  - implement as
    - instance methods or
    - extension methods,
    - because the invocation syntax is identical
  - can request
    - delegates or
    - expression trees,
    - because anonymous functions are convertible to both
- The following is the recommended shape of a generic type `C<T>` that supports query expressions.
  - It's possible to implement this with a non-generic type.
  - See more details in Specification-QueryPattern.
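The "delegates or expression trees" point can be sketched directly: the same lambda text converts to either, depending on the parameter type the method asks for. The variable names are assumptions for illustration. `Enumerable.Where` takes a delegate and `Queryable.Where` takes an expression tree.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// The same anonymous function, converted two ways.
Func<int, bool> asDelegate = n => n > 1;         // compiled code
Expression<Func<int, bool>> asTree = n => n > 1; // data describing the code

Console.WriteLine(asDelegate(2));        // True
Console.WriteLine(asTree.Body.NodeType); // GreaterThan
Console.WriteLine(asTree.Compile()(2));  // True

var numbers = new[] { 1, 2, 3 };

// Identical query text; the source type decides which Where is called.
var inMemory = from n in numbers where n > 1 select n;                 // delegate
var queryable = from n in numbers.AsQueryable() where n > 1 select n;  // expression tree

Console.WriteLine(inMemory.SequenceEqual(queryable)); // True
```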
Terminology in Approximate Order of First Occurrence
- query expression
- any expression that starts with `from identifier`
- followed by any token except `;`, `=`, or `,`
- Prefix a contextual query keyword with `@` to use it as an identifier inside a query expression.
- expression
- a sequence of operators and operands
- that evaluates to a value
- clause
- a part of a statement
- that does not constitute a complete statement
- generator
- a special type of routine
- that controls the iteration behavior of a loop
- yields values one at a time
- all generators are iterators
- generators are similar to functions that return arrays
- a generator has parameters
- other code can call a generator
- a generator generates a series of values
- generators are different from functions that return arrays
- because generators yield values one at a time
- instead of returning all the values at once
- a generator looks like a function but behaves like an iterator
- https://en.wikipedia.org/wiki/Generator_%28computer_programming%29
- range variable
- create these in a `from` or `let` clause
- stores each subsequent value that a generator yields
- ranges
- sequence
- read-only
- forward-only
- one item at a time
- can be lazily generated
- potentially infinite
- http://stackoverflow.com/questions/2627172/the-difference-between-lists-and-sequences
- token
- white space and comments are not tokens
- the following are tokens
- identifier
- keyword
- integer-literal
- real-literal
- character-literal
- string-literal
- operator-or-punctuator
- range variable
- sequence
- clauses and keywords
from
select
group
by
let
where
join
on
equals
into
orderby
ascending
descending
- splice
- contextual keywords vs simple names
- query expression translation
- Where
- Select
- SelectMany
- Join
- GroupJoin
- OrderBy
- OrderByDescending
- ThenBy
- ThenByDescending
- GroupBy
- Cast
- translation
- first into another query
- then into Methods
- range variables
- the variable immediately following the `from`
- transparent identifier
- represented with `*`
- exists only as an intermediate step in query translation
- later steps turn it into anonymous functions or anonymous object initializers
- tend to capture multiple range variables as members of a single object
- explicit range variable type
- degenerate query expressions
- trivially selects the elements of the source
- [this prevents calling code from being able to modify the source]
- identifier
- member declarator
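The generator, sequence and range variable entries above can be tied together with a small sketch. It assumes a C# iterator (a method using `yield return`) as the closest C# counterpart of a generator. The function name `Naturals` is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local iterator function: produces a lazy, potentially infinite sequence.
IEnumerable<int> Naturals()
{
    var next = 0;
    while (true)
        yield return next++; // pauses here until the consumer asks for another value
}

// n is the range variable: it holds each value the sequence yields, one at a time.
var firstEvens = (from n in Naturals()
                  where n % 2 == 0
                  select n).Take(3);

Console.WriteLine(string.Join(", ", firstEvens)); // 0, 2, 4
```

The query terminates even though the source never ends, because values are pulled one at a time and `Take(3)` stops asking after the third match.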