May 20, 2012

Women's work...

The most recent take from the Bureau of Labor Statistics says the participation rate of all US workers is 63.6 percent of the population - the lowest figure since Dolly Parton sang "9 to 5" in December 1981.

Descriptive and inferential statistical methods used in burns research.

Burns : journal of the International Society for Burn Injuries , Vol. 36, No. 3. , pp.

May 19, 2012

QOTD: total unimodularity

I started reading Matousek’s discrete geometry book. Specifically, chapter 12 on applications of high-dimensional polytopes. Mostly because he apparently draws a connection between graphs and the Brunn-Minkowski theory, and I’m interested to see exactly what that connection is and if it has any interesting implications.

I just finished the proof of the weak perfect graph conjecture (it’s been a while since I read a nontrivial pure math proof, so I haven’t actually *digested* it entirely). Now I’m on the exercises for that portion. So, here’s an interesting question involving the concept of total unimodularity, which is apparently one of those foundational concepts in an area called polyhedral combinatorics.

A matrix \(\mat{A}\) is called totally unimodular if every square submatrix of \(\mat{A}\) has determinant 0 or \(\pm 1.\) The question is to show that every nonsingular totally unimodular \(n \times n\) matrix maps the lattice \(\mathbb{Z}^n\) bijectively onto itself.

Update (solution)
It’s easy peasy. Clearly each entry of \(\mat{A}\) is in \(\{-1,0,1\}\), so \(\mat{A}\) maps \(\mathbb{Z}^n\) into itself. The fact that the mapping is bijective follows easily from Cramer’s rule, the fact that \(|\mathrm{det}(\mat{A})| = 1,\) and the fact that all the minors of \(\mat{A}\) are integral.

May 18, 2012

Friday math movie: History of Mathematics

This is an excellent program from the BBC.

I’m not sure why YouTube hasn’t taken it down due to copyright reasons, but until they do…

This is the first in a series.

Related posts:

  1. Friday math movie: The $8 billion iPod
  2. Friday math movie: Inspirations – animation of Escher art
  3. Friday math movie: Facebook for Math Nerds

A matter of perspective

Have you ever watched rugby matches on TV and noticed the advertisements on the field that looked 3-dimensional? I remember having the impression that the adverts were superimposed by the TV people. But when I told the wife, she plainly pointed out that when the players ran across the field, their bodies covered the ads [...]

May 17, 2012

Meetings/Conferences/Workshops/Seminars- whats in a name?

In June 11-14 will be a new workshop: Algorithmic Frontiers.




How many venues do we have for meetings?
  1. What the call themselves:
    Meetings (e.g., MATHFEST), Conferences (e.g., STOC), Workshops (Barriers), Seminars (e.g., Dagstuhl), Tutorials, Lunches. More?
    (six options)
  2. Criteria for getting a paper in: Refereed (e.g., STOC): lightly refereed (I think MATHFEST), unrefereed, talks-by-invite-only (Algorithmic Frontiers, Barriers)
    (four options)
  3. Participants: Open (most)or by-invite-only (Dagstuhl and Bertinoro) (two options)
  4. Size: This is more informal. However, CCC is small (100), STOC is larger (around 400), FCRC larger still, and Supercomputing is expecting 11,000. Medical conventions can get around 20,000.
    (five options, though could be more or less depending on your mood.)
  5. Variety of activities: A venue can have any combination of the following: contributed talks (unrefereed), refereed talks (typical STOC), invited talks, rump sessions,
    tutorials, workshops, short courses. (128 combinations, though in reality only about 5 options actually happen, so I'll take it as 5)
  6. Length: Number of days can be anything from 1 day to 2 months. However, I don't think all 60 options really exist. I've seen 1,2,3,4,5 days, 1 week, 2 weeks, 1 month, and 2 months. nine options.
  7. Expense: Either expensive or VERY expensive.
So that would be 6x4x2x5x5x9x2. That's a lot! Of course its far far less since, for example, an invite-only gathering can only have invited talks. I would guess its more like 5 types. And some are inconsistent (e.g., STOC sometimes has tutorials and sometimes doesn't).

Algorithmic Frontiers seems to be a 4-day open workshop that has invited talks only. I don't know how big it will be but I would guess between 100 and 200.

Do the gatherings that we go to work? As my Software engineering friend Jim Purtilo often says If you don't know your goals you are not going to achieve them. So, what are the goals? Ultimately to help both us as individuals and us as a community in our research. The hope is we learn things and get inspired to work on problems. And these can be done by the formal talks in the ballroom and informal talks in the hallways. Does this happen? Yes. Is it cost effective (not just money but time)? Debatable, as this and other blogs have debated. Here are my experiences. What are yours?
  1. DAGSTUHL SEMINARS. These are specialized one-week long meetings by
    invitation only. The talks need not be on the latest/greatest thing and hence are understandable. (I've been to DAGSTUHL- Complexity 3 times.)
  2. Bertinoro is similar to Dagstuhl. I've been to the RATLOCC 2011 and
    RATLOCC 2009 which are on Ramsey Theory and Logic. Here there were even more talks
    that were surveys or historical-perhaps because math moves slower than computer science.
    If the talks were on the INTERSECTION Of Ramsey Theory and Logic then it would be too specialized.
    (I've worked in both, but not quite together.)
    As is, its a nice mix.
    But the main point- I understood the talks.
  3. The Center for Intractability sponsors workshops which anyone can go to
    but the speakers are picked by them. I've been to one of the Barriers workshops and it was very good. The speakers have more time then at conferences,
    which is a plus. The Algorithmic Frontiers workshop seems to be in this model.
  4. The last few CCC's and STOC's that I've gone to I have enjoyed and gotten stuff out of, but not as much as the smaller venues.
    Partly because the talks are shorter.
    which is a plus. The Algorithmic Frontiers workshop seems to be in this model.
  5. MATHFEST is nice in that there is a VARIETY Of activities- workshops, seminar, tutorials, invited talks.
    Very large which is both good (that's why they can have all of these things and bad (I never met the same person twice).
  6. One of my colleagues, Jeff Hollingsworth, is organizing SC12, Supercomputing 2012.
    This will have 11,000 people at it. The program committee has over 100 people
    on it. This seems... large. They seem to have a variety of activities
    so it more like MATHFEST.
  7. Of course the talks are only part of the reason to go to these things. Even so, for me they are a big reason.

What venues to YOU get the most out of? Why or why not?

May 16, 2012

Gödel Prize

The ACM announced the 2012 Gödel Prize awardees, three seminal papers in algorithmic game theory. The prizes themselves will be presented at ICALP in Warwick in July.

Elias Koutsoupias and Christos H. Papadimitriou: Worst-case equilibria, Computer Science Review, 3(2): 65-69, 2009.

Tim Roughgarden and Éva Tardos: How Bad Is Selfish Routing?, Journal of the ACM, 49(2): 236-259, 2002.

Noam Nisan and Amir Ronen: Algorithmic Mechanism Design, Games and Economic Behavior 35: 166-196, 2001.

The first paper introduced the price of anarchy. The second applied price of anarchy to routing problems. The third applied algorithmic techniques and competitive analysis to auctions.

Congrats to all the winners.

May 14, 2012

Wall Street Complexity

There is much blame to go around for JPMorgan Chase's two billion dollar loss last week but part of that blame came back to us. In a New York Times web piece, How Moore’s Law Affects Wall St. Trading, Quentin Hardy argues
Faster, cheaper computing makes it possible to create more and better models for calculating cash movements, which can be turned into trading instruments. Areas like leasing, mortgages and project finance have exploded – as has the entire financial derivatives market — thanks to cheap computing...
Soon, it becomes nearly impossible to say what is going on where, and you get events like the 1998 blow-up at Long Term Capital Management, the creation and destruction of the subprime mortgage market in 2008 and perhaps even the “flash crash” in 2010. JPMorgan’s loss seems to be the latest in that series.
I've argued the dangers of reducing computational friction before. But here computational complexity comes in a different way. A derivative is just a function of current and future security prices. But a derivative complex enough can have a behavior that even its creator cannot understand. The Clay Math Institute offers a million dollars to settle "P v NP" but it cost Chase two billion.

May 13, 2012

Recursively counting numbers with fixed bit counts

I ran across this problem in a reddit side-bar job-ad, and was intrigued by the task (description paraphrased to decrease googleability):

Write a function

uint64_t bitsquares(uint64_t a, uint64_t b);

such that it return the number of integers in [a,b] that have a square number of bits set to 1. Your function should run in less than O(b-a).

I think I see how to do it in something like logarithmic time. Here’s how:

First off, we notice that we can list all the squares between 0 and 64: these are 0, 1, 9, 16, 25, 36, 49, and 64. The function I will propose will run through a binary tree of depth 64, shortcutting through branches whenever it can. In fact; changing implementation language completely, I wonder if I cannot even write it comprehensively in Haskell.

The key insight I had was that whenever you try to find the number of numbers with a bitcount matching some element of some list within the bounds of 0b0000…0000 and 0b000…01111…11, then it reduces to a simple binomial coefficient — n choose k gives the number of numbers with k bits set among the n last. Furthermore, we can reduce the total size of the problem by removing a matching prefix from the two numbers we test from.

Hence, we trace how many bits off the top agree between the two numbers. We count the set bits among these, subtract them from each representative in the list of squares, giving us the counts we need to hit in the remainder.

Write a’ for a with the agreeing prefix removed, and similarly for b’. Then the total count is the count for the reduced things from a’ to 0b000…01…111 plus the count for the reduced things from 0 to b’. The reduction count for b’ needs to be 1 larger than the one for a’ since in one case, we are working with the prefix before the varying bit increases, and in the other, we work with the prefix after the varying bit increases — the latter count is not really from 0 to b’, but this is a useful proxy for the count from 0b0000…010…000 to b’ with the additional high bit set.

In code, I managed to boil this down to:

import Data.Word
import Data.Bits
import Data.List (elemIndices)

bitsquare :: Word64 -> Word64 -> Word64bitsquare a b = bitcountin a b squares -- # integers in [a,b] with square # of 1
s

squares = [1,4,9,16,25,36,49,64] :: [Word64]
allones = [fromIntegral (2^k - 1) | k - [1..64]]

choose n 0 = 1
choose 0 k = 0
choose n k = (choose (n-1) (k-1)) * n `div` k

popCount :: Word64 -> Word64
popCount w = sum [1 | x - [0..63], testBit w x]

-- # integers in [a,b] with 1-counts in counts
bitcountin :: Word64 -> Word64 -> [Word64] -> Word64
bitcountin a b counts
| a > b = 0
| a == b = if popCount b `elem` counts then 1 else 0 | (a == 0) && (b `elem` allones) = sum [choose n k | n - [popCount b], k - c
ounts]
| otherwise = (bitcountin a' low [c-lobits | c - counts, c>= lobits]) +
(bitcountin hi b' [c-hibits | c - counts, c>= hibits])
where
agreements = [(testBit a n) == (testBit b n) | n - [0..63]]
agreeI = elemIndices False agreements
prefixIndex = last agreeI
prefixCount = sum [1 | x - [prefixIndex..63], testBit a x]
a' = a .&. (2^prefixIndex - 1)
b' = b .&. (2^prefixIndex - 1)
low = 2^prefixIndex - 1
hi = 0
lobits = prefixCount
hibits = prefixCount+1

May 11, 2012

So THATS why the 17x17 challenge was so hard. Or maybe not.

On November 30, 2009 I posted the famous 17x17 challenge:

(Paraphrase) Find a 4-coloring of the 17x17 grid that has
no monochromatic rectangles. For $289.00. It was solved in 2012
by Bernd Steinbach and Christian Posthoff (I posted about it
here
and will post their paper when they make it it is public, which should be soon.)
Even though it was solved, it seemed to be a hard problem.

On April 28, 2010 (before the problem was solved)
I wondered WHY it was so hard and posted the following question:

(Paraphrase) Consider the following problem:
Given (N,M,c,f) where f is a partial c-coloring of NxM,
can f be extended to a total c-coloring of NxM (without mono rectangles)?
Is this problem NP-complete?


Kevin Lawler emailed
me a sketch of a proof recently! So YES, it is NP-complete!
I cleaned it up, wrote it up, and put in a few other things, and the paper is
here.
(We will be posting to arXiv after we get comments from this blog.)

  1. Does this really show why the 17x17 challenge was hard?
    Not really since the 17x17 challenge is just one instance.
  2. Does this show that grid coloring problems are hard in general?
    Not really since the case we are really interesting in is where
    f is the empty function. While we do not believe this case is
    easy, we have not ruled this out.
  3. What can we show about the algorithmic complexity?
    The problem is FPT. For fixed c its in time poly(N,M)+O(cc4).
This is a problem for hardness-of-Ramseyian numbers in general. While it seems as if computing (say) Ramsey Numbers is hard, there is no real proof of this. Related problems have been studied by Marcus Shaefer here.
While this work is very interesting
it is NOT the same as showing that finding Ramsey Numbers is hard.I suspect that to prove such things we need a new framework for
lower bounds.


May 10, 2012

Mother's day freebies & more

Aurora from Supercharged Science  is again giving away a great free gift!

She has one of the top homeschool science curricula out there, and this is a free selection of activities and experimentsfrom her summer e-Camp program.


You're going to get actual sections of e-Camp, complete with all the explanations, step-by-step videos, and lots more.

And yet more... Currclick is giving away several Mother's Day freebies, one of which is my book Math Mammoth Place Value 2. So, here's your chance to get one of my books completely free.


Enjoy!

Maria

May 08, 2012

Paying to Publish

You proved a nice theorem, wrote up the paper and submitted it to a major computer science conference. Your paper was accepted! Congratulations. Now pay up.

An author of every paper accepted at a CS conferences is expected to present that paper at the conference. To do so requires at the least paying the registration fees, travel and lodging to go the the meeting. That can easily run one to three thousand dollars (or more) depending mostly on how far you need to travel.

You can use grant money for these expenses. Some conference offer support to those who need it, particularly students. CS departments will often help out if needed. Sometimes people pay out of their own pockets and, in any case, the funds come from limited pots that could have been used for other purposes.

In the "old days" this was less of a problem. There were only one or two conferences relevant to one's field and you were probably going to those conferences anyway. Now as the field has grown and it has been harder to get your papers published in the strongest conferences, you may find yourself traveling just to give the talk. Even many major conferences don't draw many attendees who don't have papers in the conference.

We haven't seen an outcry of these expenses, say compared to the outcry over the cost of digital library subscriptions. Perhaps we consider attending the conference a "reward" for getting published.

We could just eliminate the conferences and publish the proceedings and post videos of talks, made at the home institutions, saving the field huge amounts of money. Then we could actually choose to attend conferences to meet people in our field instead of just talking at them.

Note: Elchanan Mossel had a similar observation in a comment on a recent post.

May 07, 2012

Hanan Samet wins Kanellakis Prize!

(
  1. Symposium on Turing in Princeton in about a month
  2. UMCP is doing a job search for a lecturer
  3. New York Area Theory Day
) The Paris Kanellakis Theory and Practice Award for 2011 went to Hanan Samet from University of Maryland at College Park. See here for the details. He works on data structures THAT PEOPLE ACTUALLY USE! The Kanellakis price honors Theoretical accomplishments that have had significant and demonstrable effect on the practice of computing. This raises the question: What theory accomplishments are
good candidates for the prize (that have not already won it)?

IntMath Newsletter: Math secrets, TED-ED, puzzles

8 May 2012

In this Newsletter:

1. Your math secret
2. TED-ED – Flip your lesson
3. Image conversion project – thank you!
4. Twitter update
5. Math puzzle
6. Friday math movie – The $8 billion iPod
7. Final thought – Apply yourself

1. What’s your math secret?

Math secret

Do you have a secret about your math experiences? Several interesting ones have gone up already.

Share yours too! You don’t have to reveal your real name.

What’s your math secret?

2. TED-ED: Flip your Lessons

This one is mostly for teachers, but math students should also find something interesting here.

TED (the place where the world’s great thinkers get us to re-evaluate the way we look at the world) recently released TED-ED.

TED-ED is a collection of "Lessons worth sharing". There are lesson plans built around interesting TED talks, and you can either use those lessons, modify them to your own needs, or create a completely new lesson based on the talk.

“Flipping” your lesson involves getting students to learn some things before the actual in-class lesson. This allows students more time to process the concepts during class time, rather than hearing the concepts for the first time. Flipping gained a lot of momentum with the growth of resources like the Khan Academy.

This tour explains the TED-ED concept.

Here is the Math Category

There’s also a Math in Real Life series of TED talks.

3. Twitter update

There are now over 3000 people following IntMath on Twitter! The 3,000th follower was a design engineer.

There are many good people sharing educational messages and resources there. It’s a great place for getting to know people and for sharing interesting finds.

Why don’t you join us?

IntMath on Twitter

4. Convert images project – thank you!

A big thank you to all those who helped out in the recent IntMath image conversion project. There were a few thousand images converted and now it’s all done.

I’m in the process now of removing the old images and replacing them with the new ASCIIMathML-produced, MathJax output equivalents.

You an see some examples of converted images in the math on these pages:

Matrix Multiplication and Inverse

Related Rates (calculus)

5. Math puzzles

Several people wrote in to answer the last puzzle about numbers. The correct answers were 41 and 50. Great to see reasons given for the answers!

New Puzzle

Line ABCD is a diameter of a circle whose radius is r. Length AB = BC = CD. Semicircles are drawn on AB and BD to create the shaded figure as shown.

Circle puzzle

What is the perimeter of the shaded figure?

6. Friday math movie: The $8 billion iPod

8 billion dollar iPod

Rob Reid gets us to think about how silly the music and movie industries have been regarding copyright. He explains Copyright Math.

Friday math movie: The $8 billion iPod

7. Final thought – application

Lee Iacocca, colorful chairman of Chrysler Motors for many years, was named one of the top American CEOs of all time. The following quote could apply to what we do with our math knowledge:

Apply yourself. Get all the education you can, but then, by God, do something. Don’t just stand there, make it happen. [Lee Iacocca]

Until next time, enjoy whatever you learn.

Related posts:

  1. IntMath Newsletter – Overcoming fear of math tests, free calculus book
  2. IntMath Newsletter: Cultural math, visual stats, matrices and persistence
  3. IntMath Newsletter: Length of a curve, resources, 1000 posts

Coincidences

I’ll be visiting China tomorrow and my mum kindly handed me a stack of RMB (China’s currency) that was left-over previously. I took a quick count and it totals exactly 1729 Yuan. What are the odds?

May 04, 2012

Is it well known that we need to redefine well known?

A LONG time (so long ago I was a guest poster, not a co-blogger)
I posted
about how calling things that are on You-Tube rare is odd
since ITS ON YOU-TUBE! ANYONE in the world can look at it!
I now have a Contrasting thought: Can something be
well known if its not easily found on the web?

Last week I
posted
a proof that the intersection of a CFL and a REG lang
is CFL that did not use PDA's. I thought it was NOT new and indeed, the
comments politely gave me the proper reference.
So far, so good. But wait--- some of them called the proof Well Known.
Can a proof be well known if its not on the web?
The notion of a proof being well known has always been problematic
since one wonders what the scope is.

  1. Addition
    being commutative is well known but might not
    to my 5-year old niece.
    Someone emailed me that I should take this opp to teach her
    about noncommutative algebras.
  2. Binary search is well known but might not be known to my (then)
    8 year old great nephew.
    Some said I should use this opp to teach him logarithms.
  3. Induction is a well known technique but it still puzzles some undergraduates.i
  4. The poly vdw theorem
    is well known among mathematically inclined
    high school students in Maryland but
    is not even that well known among combinatorists.
The point really is well known to who?. But now with the web we can ask a more focused question: Can you find it easily? The proof I posted was not one that I found on the web. So here is what I want your thoughts on:
  1. Should we redefine our definition of
  2. If I were to make an article out of the three short notes on CFL's
    that I posted about, and submit to Math Archives,
    would that help the problem of what is easy to find
    or would it just clutter up Math Archives making things hard to find.
    (I would of course provide all references and make NO claim to
    originality.)
  3. Should I post to Wikipedia?
  4. Will the web ever replace
    asking someone who knows stuff?

May 03, 2012

Microsoft saves the Yahoo NY Researchers

I started working with David Pennock on prediction markets back when we both were at the NEC Research Institute in New Jersey a decade ago. After a major reorganization the dropped basic research from their mission, I went back to academics but David stayed in industry research first at Overture which soon was swallowed up by Yahoo. He ended up at Yahoo Research New York in a small but amazingly strong research lab including machine learning theorist John Langford and social scientist Duncan Watts. But with a new Yahoo CEO and Prabhakar Raghavan and Andrei Broder's departures for Google, it  became clear that the days of Yahoo research were numbered. 

Today Microsoft announced that they are hiring 13 researchers from the Yahoo lab including David, John and Duncan as they start a new Microsoft Research Lab in New York, initially led by Microsoft New England director Jennifer Chayes. Lots of nice coverage from the New York Times, All Things D,  blog posts from Jennifer and John, and a pictorial take from Chris Maase. Not the first time Microsoft has done something like this, Microsoft Research Silicon Valley got its start by hiring several researchers from the old Xerox PARC.

I'm happy the Yahoo researchers found a great home but I also mourn the loss of yet another company abandoning basic research in computer science.

May 01, 2012

Berkeley Wins the Simons

As reported today in the New York Times, the Simons Foundation has chosen U. C. Berkeley to host the new Simons Institute for the Theory of Computing, a $6,000,000/year center for studying core theoretical computer science and its applications to other disciplines. There will be 70 researchers (faculty, postdocs, grad students) at any given time affiliated with the Institute.

This will be a game changer for CS theory.

April 30, 2012

Is Moore's Law Good for CS?

Roy Friedman writes a blog post on The Expected End of Moore’s Law is Good News for Computer Science.
For a long time, people got used to being lazy. If computers become twice as fast every 1.5 to 2 years, there is no point in investing much efforts in writing efficient code. If something does not run fast enough, simply wait for the next generation of Intel x86 and everything will be resolved. In particular, CPUs became fast enough that traditional programming languages and efficient data structures and algorithms were being abandoned in favor of high level scripting languages whose most sophisticated data structure is an associative array. Suddenly, every Joe-hoe could become a programmer developing sophisticated Web applications with no effort  – no need for hard earned computer science degrees anymore.
Let's ignore the issue as to whether Moore's law is really coming to an end (Moore's law has had 5-10 years left in it since Gordon Moore developed his law). Friedman misses the bigger point, an end of Moore's law will do great harm to Computer Science.

Friendman's mistake is to assume that people always have the same problems to solve. Suppose computers stopped getting more powerful fifteen years ago. Machine Learning as we know now would not have been possible. Nor would we have the advances in graphics, robotics and virtually every other area of computer science. A good deal of CS systems research goes into making these new machines even more efficient, secure and reliable.

Even in theoretical computer science, the advances in computer technology make our work stand out. As we try to tackled larger problems asymptotic algorithmic techniques start dominating the running time. While we can now brute-force solve NP-complete problems on moderate input sizes, as computer performance improves, so does the thirst for solving even larger challenges and the P versus NP problem becomes a harder monster to slay.

Sure Joe-hoe can wait a year or two for his application to run fast without new coding ideas. While he waits Sam-ham takes advantage of new technologies making Joe-hoe's applications out of date.

Without the increased power of computers, we all become hackers to make our programs run better and that is not computer science.

April 26, 2012

I left Push Down Automata out of my class and learned some things!


This semester in the Ugrad Course titled Elementary Theory of Computation
(Syllabus: Reg Languages, CFLs, Computability Theory, P and NP) I decided to NOT
teach PDA's. I mentioned them and told the students they were equivalent
to CFG's but nothing more.

This lead to a NEW (to me at least) piece of mathematics!

Proving
X={a^nb^nc^n | n ∈ N}
is NOT a CFL is easy using the pumping theorem.
Consider
Y={w | number of a's, b's and c's in w is the same}
How do you prove Y is not regular?
Why don't you just intersect Y with a*b*c* to get X which
you know is not a CFL?
I hear you say.
AH- you need to know that CFL intersect REG is CFL.
If we had the PDA/CFG equivalence then this would be an
easy cross product construction. But now?
SO, we need a proof that CFL intersect REG is CFL
that ONLY uses CFL's. That is, DOES NOT use PDA's.
Here is a proof.
We do not believe it is new but have not been able to find a reference.
If you know a reference please comment.
(NOTE ADDED LATER- The commenters politely provided a reference
and I have put it into the paper.)

Another point of interest: How do you show that REG is a subset of CFL.

  1. Normally you would note that DFA's are just PDA's without a stack,
    hence every lang recognized by a DFA can be recognized by a PDA,
    and then use PDA/CFG equivalence. I could not use this.

  2. You could use the proof that any DFA language is a left-linear grammar
    (left linear only has productions of the form X-->aY)
    and this is nice since, in fact, they are equivalent.
    Here is a proof.
    I ended up not doing this but I will next year
    when I do the course.

  3. On the midterm I had the following question:

    Recall that if α is a regular expression then $(α) is the
    set of strings that α generates.
    Recall that if G is a CFG then L(G) is the set of strings generated by G.

    Prove the following by induction on the formation of a regular expression:
    For all regular expressions α there exists a Context Free Grammar G
    such that L(α)=L(G).
    You cannot use PDA's (Push Down Automata). (If you do not know what this is,
    do not worry.)

    I could only ask this BECAUSE they had not seen PDA's or Left Liner Grammars.
    Of the 40 students about 20 got it right.


  4. One of my students, Justin Kruskal, wondered how to go from a DFA directly to a CFG
    (recall that he had not seen left-linear grammars).
    It is interesting to see what untainted students come up with on their own.
    He came up with a proof
    which is
    here.
    It is a weaker result than the left-linear grammar equivalence, but its his!
Questions for YOU, the readers.

  1. What do you think of leaving out PDA's? (I may heed your advice
    next spring when I teach it again.)

  2. Is the proof I point to that CFG intersect REG is CFG, that only uses CFG's, new?
    If not please give a reference. A comment like Bill you idiot, this is a well known proof
    that I have neither a reference nor a website to point to.
    ,
    is not helpful. If you DO have a reference or website to point to you can call me whatever you like.

  3. Have you ever learned some new math be leaving something OUT of a course?

April 25, 2012

Classification of Finite Simple Semigroups and Moufang Loops

I had a question that I was going to ask on Math Overflow, but after some research I managed to find the answer.

Finite simple groups have a complete classification. I was wondering if there were any weakenings of the axioms of group that also allowed a complete classification of the simple objects. (Here, I mean no nontrivial quotients.) Surprisingly, there’s a classification for semigroups. In the theory of semigropus the term “Simple& is used for a weaker notion. Semigroups with no nontrivial quotients are known as “congruence-free”. The classification of finite congruence-free semigroups splits into two cases: for semigroups with a zero (an element 0 such that 0x = 0) there’s an explicit construction, while a congruence-free semigroup without a zero must be a simple group.

Another direction to generalize is weaken the form of associativity. The most-studied weakening is the Moufang property, which includes the octonions as a non-trivial example. Here, the complete classification is also known: a finite simple Moufang loop is either a group or a Paige loop, which is a non-associative construction closely related to the octonions, but defined over a finite field. It’s interesting that in this case, the one non-associative family resembles simple groups of Lie type, in that it’s parameterized by the finite fields. This classification relies non-trivially on the classification of simple groups, in that the explicit classification is used to rule out any other non-associative examples.

The paper Octonions, simple Moufang loops and triality by Gábor Nagy and Petr Vojtechovský, explains Moufang loops, and how the classification of non-associative Moufang Loops reduces to a question about finite simple groups.

See the first 4 million digits of Pi

This online visualization represents the first 4,000,000 decimals of Pi within a single image. Each unique digit of Pi corresponds to a specific color, and is rendered as a 1x1 pixel dot. You can also search for the first occurrences of any specific decimal combination.


April 23, 2012

CS in the Sunshine State

As many of you've heard the Dean of Engineering at the University of Florida is planning deep cuts to the Computer and Information Science and Engineering Department (CISE) and focusing its mission solely on teaching. Here is the relevant part of her plan.
Roughly half of the CISE faculty would be offered the opportunity to move to Electrical and Computer Engineering, Biomedical Engineering or Industrial and Systems Engineering.  These faculty would continue to support the graduate and research mission in the Computer Engineering degree track.  The choice of which faculty and which departments will be made based on fit with the research program and with the receiving departments. Staff positions in CISE which are currently supporting research and graduate programs would be eliminated.  The activities currently covered by TAs would be reassigned to faculty and the TA budget for CISE would be eliminated. The faculty remaining in CISE would then focus their efforts on teaching and advising students in the existing Computer Science BS and MS degree programs, offered through both the College of Engineering and the College of Liberal Arts and Sciences.  Their assignments would change to reflect this new educational mission with sole focus on delivering quality education for students in these degree programs.  Any faculty member who wishes to stay in CISE may do so, but with a revised assignment focused on teaching and advising.
In other words Florida is eliminating core Computer Science research, something that makes no sense for a state flagship research university in this day and age. There is a website and petition protesting the move which has caused the Dean to respond. Perhaps because of geographical closeness, this was a big topic of discussion when I was down at Georgia Tech last week. The current and founding Deans of the College of Computing at GT wrote strong letters to the Florida president. The CRA has also expressed their concern.

Certainly the president and dean deserve much of the blame to allow the targeting of computer science at Florida. But as Steven Salzberg of Forbes points out, the real villains lie in Tallahassee with a governor and legislature that has cut funding for the school by 30% over the past six years.

April 20, 2012

Math Mammoth Grade 1 aligned to the Common Core Standards

Math Mammoth Grade 1 Complete Curriculum is now aligned to the Common Core Standards. I will record here the main changes in its content as compared to the earlier edition / version (through April 2012).

You can tell the new edition and the earlier ones apart by their cover image: the edition aligned to the Common Core Standards has these new cover images.


What used to be chapter 4 (Place Value) switched places with Chapter 3 (Addition and Subtraction Facts within 0-18). 

The topics of rounding, even & odd numbers, and parenthesis were taken off as they are  not in the Common Core standards for grade 1.

There are many new lessons about adding and subtracting within 20, and adding and subtracting two-digit numbers without regrouping, explaining basic  mental math principles.

The lesson "Exploring Measuring" is split into two lessons (Measuring Length and Exploring Measuring). Both lessons are expanded from what they were before.

Geometry section was expanded and changed. It now has more material about shapes.

Math Mammoth Grade 2 aligned to the Common Core Standards

Math Mammoth Grade 2 Complete Curriculum is now aligned to the Common Core Standards. I will record here the main changes in its content as compared to the earlier edition / version (through April 2012).

You can tell the new edition and the earlier ones apart by their cover image: the edition aligned to the Common Core Standards has these new cover images.

The topics of rounding and finding 1/4 of a number were taken off as they are not in Common Core Standards for grade 2.

The topics of regrouping in addition and regrouping in subtraction are now split so that 2-A contains regrouping in addition, and 2-B contains regrouping in subtraction.

There is now more explicit (more scaffolded) instruction about adding four single-digit numbers, and then adding four two-digit numbers in columns (in 2-A).

Regrouping in subtraction with three-digit numbers is also scaffolded more (the progression of the concept presentation is by smaller steps) (in 2-B). Regrouping with zero tens is still left for third grade (as in 506 − 358).

There are more word problems in general. As before the change, the curriculum still has lots of instruction about mental math in addition and subtraction.

Geometry chapter concentrates on naming basic shapes such as triangles, quadrilaterals, pentagons, and hexagons, putting shapes together to form new ones, dividing shapes into new ones, and geometric patterns. The topics of right angle and parallel lines were taken off as they are not in Common Core Standards for grade 2.

In measuring, the topic of volume is moved to third grade in accordance to Common Core Standards. The topics in 2nd grade include inches, half-inches, centimeters, feet, miles, meters, kilometers, and weight in pounds and in kilograms.  Compared to the earlier edition, I have added line plots, exercises about measuring objects both in inches and in centimeters, estimating lengths, and finding how much longer one thing is than another.

The topics of clock, place value with 3-digit numbers, graphs, and introduction to multiplication didn't have any significant conceptual changes (some cosmetic ones).

Fred and Frank running times puzzle

Here's a puzzle for you to solve.

Fred and Frank are two fitness fanatics on a run from A to B. Fred runs half the way and walks the other half. Frank runs for half the time and walks for the other half. They both run and walk at the same speeds. Who finishes first?


Puzzle from Fawn Nguyen's puzzles and brainteasers, set 13

Here's one possible way to solve it. Choose some easy numbers for the distance they travel, for the speeds, and for the time in case of Frank.Then just calculate!

Let's say they run 10 miles per hour and walk 5 miles per hour. Let's also say the distance from A to B is 20 miles.

Fred runs half the way. So, Fred runs 10 miles = 1 h, and he walks 10 miles = 2 h. So he takes a total of 3 hours.

Frank runs half the time and walks half the time. Now, we don't know the time, so let it be t. We can write an equation about the total distance:


(distance running) + (distance walking) = 20 miles

(1/2t) * 10 + (1/2t) * 5 = 20

5t + 2.5t = 20

7.5 t = 20

t = 2 2/3 hours

So, Frank is quicker.



April 19, 2012

How important is the PROOF of Cooks theorem for my ugrad class?

I am teaching Automata theory this semester. The topics are basically (1) Regular Languages, (2) Context Free Languages, (3) Decidable, undecidable, and c.e. sets, (4) P, NP, NP-completeness.

Other things can be added like Primitive Recursive Functions, the arithmetic hierarchy, the polynomial hierarchy.

I am considering NOT proving Cook's Theorem. State it YES, say how important it is YES, but prove it NO. I am not sure if I want to do this. This post is an attempt to get some intelligent comments on this. My mind is not made up yet so this is I raise the question TRULY non-rhetorically. Some thoughts:
  1. The proof is coding a Turing Machine computation into a formula. It is interesting that you can do this. The details of it are not so interesting.
  2. The very good students (I would guess 5 out of my 40) will get it. The bad students never get it. I am oversimplifying--- and if this was a reason to not teach a topic I may not have much of a course left.
  3. The time spend IN CLASS on this is not so much- one lecture. The time spend in OFFICE HOURS on this re-explaining it is a lot. And its not clear that it helps.
  4. Showing them NPC reductions is more important and more interesting. This is what I would put in instead. I usually only get to do a few.
  5. Cook's Theorem is SO fundamental that the students should SEE a proof of it even if they don't understand it. but I am not a fan of they should see it even if they don't understand it.
  6. Cook's Theorem, like many things, you don't get the first time you see it, but you do the second, helped by having seen it once.
  7. Even if we now they mostly won't get it, we want them to know that we WANT them to get it.
  8. Students should know that the proof that SAT is NP-complete goes all the way back to the machine model (very few other NP-completeness proofs do). Is it good enough to TELL them this. I would DEFINE Turing Machines, SAY that you CAN code a computation into a formula, but not actually do it?
  9. I like to teach things that I can ask problems about. Cook's theorem IS okay for this- I usually show how to make a formula out of some TM instructions and leave as HW making a formula out of other TM instructions. Even so, not that much can be asked about it. The students have a hard time with this (Much more so than other parts of the course) so I wonder if its too much sugar for a cent.
  10. I teach this course a lot so I want to mix things up a bit. I realize this is not really a good reason for the students.
  11. An argument for: See how it goes! The decision I make this semester need not be what I always do.

April 18, 2012

Some Announcements

STOC early registration deadline is Thursday. This year STOC will have both tutorials and workshops and its second poster session. Hope to see you all there.

The Simons Foundation is offering Ph.D. Fellowships in Theoretical Computer Science. Deadline May 1.

Finally a big congratulations to Josh Grochow who defended his thesis yesterday at the University of Chicago. Josh was co-advised by Ketan Mulmuley and myself and he'll be joining Toronto as a postdoc next year. Josh is my seventh and last Ph.D. at Chicago.

April 17, 2012

Alan Turing Centenary Videos on Mathtube

The first half of our Alan Turing Centenary lecture series is over, and we've got all three of our talks up on mathtube.org.  You can skip the first one, it's pretty boring, but Mike Williams on early computers and John Ferris on Turing and WWII codebreaking are well worth your time!

April 16, 2012

ACM/IEEE Curriculum 2013

[STOC 2012 Early Registration Deadline Thursday]

On a roughly ten-year cycle, the ACM and IEEE Computer Society get together to create a list of core topics that every getting an undergraduate CS degree should know. The joint task force has a strawman draft proposal and would like your comments for a final report hopefully next year.

This version breaks the requirements into Core-Tier 1 and Tier 2 with the more critical material in Tier 1 and also suggests some elective topics.

In Algorithms and Complexity there is a combined 21 lecture hours of material for Tier 1 and Tier 2 that should comfortably fit into a single quarter long course though that seems aggressive given the list of material. That is built on 41 hours of "Discrete Structures" that includes basic combinatorics and proof techniques. It's a testament to the fundamental importance of theory that it holds more core hours than most other areas.

While I'm always wary of outside committees setting courses and topics, these reports do give guidance in what the "community" believes are important topics for all well-trained computer scientists to know. They can heavily influence curriculum in theoretical computer science especially at the too many CS departments that don't a significant theory group. So take a look at the draft and give your comments to the task force.

April 14, 2012

Are there perturbations that preserve incoherence and give nicely conditioned “submatrices”?

Let \(\mat{P}_{\mathcal{U}}\) denote the projection unto a \(k\)-dimensional subspace of \(\C^{n}.\) We say \(\mathcal{U}\) is \(\mu\)-coherent if \((\mat{P}_{\mathcal{U}})_{ii} \leq \mu \frac{k}{n} \) for all \(i = 1, \ldots, n.\)

Let \(\mat{A} \in \C^{n \times n} \) be a SPSD matrix whose top \(k\)-dimensional invariant subspace is \(\mu\)-coherent. Given \(\delta > 0\) and \(\mat{S} \in \C^{n \times \ell}\), where \(k < \ell \ll n,\)  is there a matrix \(\mat{E}_{\delta,\mat{S}} \in \C^{n \times n} \) such that

  1. \(\hat{\mat{A}} := \mat{A} + \mat{E}_{\delta,\mat{S}}\) is SPSD,
  2. The top \(k\)-dimensional invariant subspace of \(\hat{\mat{A}}\) is also \(\mu\)-coherent,
  3. \(\|\hat{\mat{A}} – \mat{A}\|_2 < \delta \), and
  4. \(\mat{S}^t \hat{\mat{A}} \mat{S} \succeq \delta \cdot \mat{I}\) ?

It’d be fine if instead of (2) holding, the coherence of the top \(k\)-dimensional invariant subspace of \(\hat{\mat{A}}\) is slightly larger than that of \(\mat{A}\).

April 13, 2012

Geometry problems (challenges)

Someone sent me this link

http://fivetriangles.blogspot.com/

It contains challenging geometry problems for middle and high school level. Unfortunately I didn't see solutions!

This blog for teachers contains an unordered series of problems for middle and junior high school students. The problems are not intended to be ``clever'', but rather require only a knowledge of already-learned basic mathematics principles, applied in a more sophisticated manner than for the problems found in a typical textbook.

April 12, 2012

Unique Golf Winners

In the Masters Golf Tournament held last weekend there were 55 players who had a final score between 10 under par and 9 above. By the pigeonhole principle one would expect many players to have the same score and in particular multiple players to have the best score. There was a tie for first that required a playoff.

But this is the exception, there were only four playoffs in the last twenty years of the Masters. In most years the Masters tournament has a unique player with the lowest score. Why?

I used Golf Tournaments to motivate the Mulmuley-Vazirani-Vazirani isolation lemma. If you have a collection of sets over {1,...,n} and choose random weights w1,...,wn from {1,...,r} let the weight of a set be the sum of the weights of its elements. The MVV lemma states there will be a unique maximum weighted set with probability at least 1-n/r.

It's a cool lemma with a short proof: you argue that with high probability each element i is either in all or none of the max weighted sets.

So how do I tie the lemma to golf tournaments? Actually I don't see a direct connection. But the spirit is the same.

April 10, 2012

I'll be on that list until the day I die. Or later.

How hard is it to get OFF of lists?
  1. Univ of MD at College Park (UMCP) Professor Carl Smith was an editor for JCSS before his death in 2004. I put together a memorial issue of JCSS in his honor that appeared in 2008. SO HOW COME HE IS STILL GETTING ISSUES OF JCSS AT UMCP IN 2012? Normally I would email JCSS about this, but whenever I do that they stop it for a while and then it starts again. So I am not going to bother.
  2. I recently got the following email:
    
    William Gasarch
    SIGACT News
    
    Hi William,
    
    The winners of Russia's prestigious Debut Prize are
    presently in the U.S.  The newest of the winning books
    is Irina Bogatryreva's Off The Beaten Track which
    contains her story and two others about hitchhiking in
    Russia.  Irina is also available for interviews as she
    speaks English fluently!
    
    Please let me know if you would like to receive a copy
    of this new and interesting novel.
    
    The Debut Prize winners will be visiting select cities
    (Washington, Boston and New York) during their stay and
    will also return in June for BEA.
    
    I look forward to your thoughts.
    
    Best,
    Shirley
    
Why did I get that? As SIGACT NEWS book review editor I am on lists of book review editor. Technology is sophisticated enough generate lists of book review editors. I suspect it is not cost effective to figure out which editor is appropriate for which types of books since email is free. I have sometimes gotten actual books in the mail that are not math or CS--- that seems like more of a waste of the companies money (though the Biography of Ted Kennedy that I got was a good read.)

Is this a problem? Overall yes since it costs society something (money? time?) to have people (even dead people) on lists where they shouldn't be. But there does not seem to be an incentive on anyone who could fix it to fix it.

April 07, 2012

Generalised entropy

Introduction
The entropy of a probability distribution can be seen as a measure of its uncertainty or a measure of the diversity of samples taken from it. Over the years I've talked lots about how probability theory gives rise to a monad. This suggests the possibility that maybe the notion of entropy can be generalised to monads other than probability. So here goes...

> {-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, GeneralizedNewtypeDeriving #-}
> {-# LANGUAGE FunctionalDependencies, TypeSynonymInstances #-}

> import Control.Monad
> import Control.Monad.Writer hiding (lift)


Shannon entropy
I've talked in the past about how there is some trickiness with defining the probability monad in Haskell because a good implementation requires use of the Eq typeclass, and hence restricted monads. Restricted monads are possible through a bunch of methods, but this time I don't want them.

It's common to represent probability distributions on finite sets as lists of pairs where each pair (p, x) means x has a probability p. But I'm going to allow lists without the restriction that each x appears once and make my code work with these generalised distributions. When I compute the entropy, say, it will only be the usual entropy in the case that each x in the list is unique.

So here's our type and some instances for it:

> data P a = P [(a, Float)] deriving Show

> instance Functor P where
> fmap f (P xs) = P [(f a, p) | (a, p) <- xs]

> instance Monad P where
> return x = P [(x, 1)]
> P xss >>= f = P [(y, p*q) | (pxs, p) <- xss, let P ys = f pxs, (y, q) <- ys]

We can easily compute the expected value of a distribution, and its entropy, like this:

> expectation0 (P xs) = sum [x*p | (x, p) <- xs]
> entropy0 (P xs) = -sum [if p==0 then 0 else p*log p/log 2.0 | (_, p) <- xs]

An important property of entropy is known as the grouping property which can be illustrated through an example tree like this:


The entropy for the probability distribution of the final leaves is the sum of two components: (1) the entropy of the branch at the root of the tree and (2) the expected entropy of the subtrees. Here's some corresponding code. First simple bernoulli trials:

> bernoulli p a b = P [(a, p), (b, 1-p)]

Now the branch at the root of the tree:

> root = bernoulli 0.3 False True

We can compute the entropy for the distrbution on the leaves:

> test1 = entropy0 $ do
> x <- root
> if x
> then bernoulli 0.2 3 4
> else bernoulli 0.4 5 6

Or the sum of the root entropy and the expected subtree entropy:

> test2 = entropy0 root + (expectation0 $ do
> x <- root
> if x
> then return $ entropy0 (bernoulli 0.2 3 4)
> else return $ entropy0 (bernoulli 0.4 5 6))

You can confirm for yourself that test1 == test2.

We can rewrite that a little. We're drawing True or False from root only to decide which distribution to use at the next stage. But we may as will pick the distribution itself at random. So define:

> dist = bernoulli 0.3 (bernoulli 0.4 5 6) (bernoulli 0.2 3 4)

And now we expect the equality of test3 and test4:

> test3 = entropy0 $ do
> x <- dist
> x

> test4 = entropy0 dist + (expectation0 $ do
> x <- dist
> return $ entropy0 x)

There's a more elegant way of writing this. Define:

> left0 dist = entropy0 (join dist)
> right0 dist = entropy0 dist+expectation0 (fmap entropy0 dist)

Now we expect left0 dist and right0 dist to always be equal. We've almost generalised to something that makes sense in the context of monads other than probability.

The algebra of a monad
Here are a couple of important properties of expectation0:

1. expectation0 (return d) = d
2. expectation0 (join d) = expectation0 (fmap expectation d)

In English: the expectation of certainty is just the certain value, and the expectation of an expectation is just the expectation. But these rules are precisely the conditions that define an -algebra, where is a monad.

So let's define a type class:

> class Algebra m a | m -> a where
> expectation :: m a -> a

We'll assume that when m is a monad, any instance satisfies the two laws above. Here's the instance for probability:

> instance Algebra P Float where
> expectation (P xs) = sum [x*p | (x, p) <- xs]

In keeping with the notion that entropy measure diversity let's also define:

> class Diverse m r | m -> r where
> entropy :: m x -> r

with the instance:

> instance Diverse P Float where
> entropy (P xs) = -sum [if p==0 then 0 else p*log p/log 2.0 | (_, p) <- xs]

It's not clear what laws we need but for now we'll assume a generalised entropy satisfies left dist == right dist :

> left dist = entropy (join dist)
> right dist = entropy dist+expectation (fmap entropy dist)

We'll call that the generalised grouping law.

Binary trees
It's not hard to find other structures that satisfy these laws if we cheat and use alternative structures to represent probabilities. For example We can make Tree an instance by assuming Fork represents a 50/50 chance of going one way or another:

> data Tree a = Leaf a | Fork (Tree a) (Tree a) deriving Show

> instance Functor Tree where
> fmap f (Leaf a) = Leaf (f a)
> fmap f (Fork l r) = Fork (fmap f l) (fmap f r)

> instance Monad Tree where
> return x = Leaf x
> Leaf a >>= f = f a
> Fork l r >>= f = Fork (l >>= f) (r >>= f)

> instance Algebra Tree Float where
> expectation (Leaf a) = a
> expectation (Fork l r) = 0.5*expectation l+0.5*expectation r

> instance Diverse Tree Float where
> entropy (Leaf a) = 0
> entropy (Fork l r) = 1+0.5*entropy l+0.5*entropy r

Lists
We could make non-empty lists into an instance by assuming a uniform distribution on the list. But another way to measure the diversity is simply to count the elements. We subtract one so that [x] corresponds to diversity zero. This subtraction gives us a non-trivial instance:

> newtype L a = L [a] deriving (Show, Monad, Functor)

> instance Algebra L Int where
> expectation (L xs) = sum xs

> instance Diverse L Int where
> entropy (L xs) = length xs-1

Tsallis entropy
There are measures of diversity for probability distributions that are distinct from Shannon entropy. An example is Tsallis entropy. At this point I'd like a family of types parametrised by reals but Haskell doesn't support dependent types. So I'll just fix a real number q and we can define:

> q = 2.5

> data T a = T [(a, Float)] deriving Show

> instance Functor T where
> fmap f (T xs) = T [(f a, p) | (a, p) <- xs]

> instance Monad T where
> return x = T [(x, 1)]
> T xss >>= f = T [(y, p*q) | (pxs, p) <- xss, let T ys = f pxs, (y, q) <- ys]

> instance Algebra T Float where
> expectation (T xs) = sum [x*p**q | (x, p) <- xs]

> instance Diverse T Float where
> entropy (T xs) = (1-sum [p**q | (_, p) <- xs])/(q-1)

And again we find our generalised grouping rule for entropy holds.

Operads
This is all derived from Tom Leinster's post last year at the n-category cafe. As I talked about here there's a close relationship between monads and operads. Operads area a bit like container monads where the containers don't contain anything, but just have holes where contents could be placed. This makes operads a better place to work because you don't have the awkward issue I started with: having to disallow lists of value/probability pairs where the same value can appear more than once. Nonetheless, in (unrestricted) Haskell monads you don't have Eq available so you can't actually have definitions of return or >>= that can notice the equality of two elements. If such definitions were possible, the grouping law would no longer work as stated above.

Crossed homomorphisms
The generalised grouping law even makes sense for very different monads. For the Reader monad the law gives the definition of a crossed homomorphism. It's pretty weird seeing a notion from group cohomology emerge like this and I recommend skipping to the final section unless you care about this sort of thing. But if you do, this is related to research I did a long time ago. This is to test that the Schwarzian derivative really does give rise to a crossed homomorphism.

Firstly let me set up some automatic differentiation code:

> data D a = D { re::a, im::a } deriving (Show, Ord, Eq)

> instance Num a => Num (D a) where
> fromInteger n = D (fromInteger n) 0
> D a a'+D b b' = D (a+b) (a'+b')
> D a a'*D b b' = D (a*b) (a*b'+a'*b)
> D a a'-D b b' = D (a-b) (a'-b')

> instance Fractional a => Fractional (D a) where
> fromRational n = D (fromRational n) 0
> D a a'/D b b' = let q = 1/b in D (a*q) ((-a*b'+a'*b)*q*q)

> lift x = D x 0

> d f x = im (f (D x 1))

> raised f = re . f . lift
> raised2 = raised . raised
> raised3 = raised2 . raised

The Cn are the n-times (automatically) differentiable functions. Unfortunately the Endo defined in Data.Monoid acts the wrong way round from what I want so I need a Dual:

> type C1 = Dual (Endo (D Double))
> type C3 = Dual (Endo (D (D (D Double))))
> type C4 = Dual (Endo (D (D (D (D Double)))))

> instance Eq (Endo (D Double))
> instance Ord (Endo (D Double))

A silly Show instance that simply evaluates a function at a number I chose randomly: 1.234.

> instance Show (Endo (D Double)) where
> show (Endo f) = show (f 1.234)

> instance Num C1 where
> fromInteger n = Dual (Endo (\x -> fromInteger n))
> Dual (Endo f)+Dual (Endo g) = Dual (Endo (\x -> f x + g x))
> Dual (Endo f)-Dual (Endo g) = Dual (Endo (\x -> f x - g x))
> Dual (Endo f)*Dual (Endo g) = Dual (Endo (\x -> f x * g x))

> instance Fractional C1 where
> fromRational n = Dual (Endo (\x -> fromRational n))
> Dual (Endo f)/Dual (Endo g) = Dual (Endo (\x -> f x / g x))

> newtype Q a = Q (Writer C4 a) deriving (Monad, Functor)

We can give Q a a geometrical interpretation. The underlying type is a pair (a, C4). If we think of elements of C4 as charts charts on a piece of Riemann surface then for any , an element of (a, C4) represents a local piece of a section of the th tensor power of the canonical bundle. Ie. we can think of it as representing . I'll concentrate on the case which gives quadratic differentials. We can think of an element of ((a, C4), C4) as forms where we're composing two charts. We can collapse down to an ordinary chart by using the chain rule. Here's the code:

> instance Algebra Q C1 where
> expectation (Q ma) = let (Dual (Endo a), Dual (Endo f)) = runWriter ma
> in Dual (Endo (\x -> a (raised3 f x)*(raised2 (d f) x)^2))

Now we can define the Schwarzian derivative:

> schwarzian f x = let f0 = raised3 f x
> f1 = raised2 (d f) x
> f2 = raised (d $ d f) x
> f3 = (d $ d $ d f) x
> in f3/f1-1.5*(f2/f1)^2

And somwehat bizarrely, we now have a generalised entropy:

> instance Diverse Q C1 where
> entropy (Q ma) = let (_, Dual (Endo f)) = runWriter ma
> in Dual (Endo (\x -> schwarzian f x))

This is the construction that gives rise to the Virasoro algebra which plays such an important role in String Theory.

Some tests
And here's a bunch of tests. I'd have used QuickCheck but it won't install for me today...

> test :: (Algebra m t, Diverse m t, Num t, Functor m, Monad m) => m (m x) -> IO ()
> test x = do
> print (left x, right x)

> main = do
> test $ L [L [1, 2, 3], L [2, 3, 4], L [1], L [5], L [2, 7::Int]]
> test $ P [(P [(0, 0.5), (1, 0.5)], 0.5), (P [(2, 0.5), (3::Int, 0.5)], 0.5::Float)]
> test $ T [(T [(0, 0.5), (1, 0.5)], 0.5), (T [(2, 0.5), (3::Int, 0.5)], 0.5::Float)]
> test $ Leaf (Leaf 1 `Fork` Leaf 2) `Fork` Leaf (Leaf 3 `Fork` (Leaf 4 `Fork` Leaf 5))
> test $ (Q (writer
> (Q (writer (Dual (Endo (\x -> x)),
> Dual (Endo (\x -> x^2+1)))),
> Dual (Endo (\x -> (2+x)/(3+x*x))))) :: Q (Q C3))

April 05, 2012

Guest post on Tom Cover by Yoav Freund

(Details on registration and student travel awards for the 2012 IEEE Conference on Computational Complexity in Porto available at here.

New York Area Theory Day information here.)



Tom Cover passed away recently. Today we have a guest post about him from Yoav Freund which was written with help from Gabor Lugosi, Nicolo Cesa-Bianchi, Avrim Blum, Rob Schapire, Sanjoy Dasgupta and Manfred Warmuth.

Tom Cover was a leading light, a mentor and an inspiration. I was shocked to hear of his passing away. It was only a few weeks ago that I saw him in the recent ITA conference in San Diego. As usual, engrossed in conversation with a young colleague, gesticulating with his long arms to punctuate his words.

I was first introduced to Cover's work when, as a graduate student in UCSC, I read the classical book of Cover and Thomas on Information theory. Lured by the psychedelic cover I was enchanted by the lucid examples (betting in the horse races) and the elegant math. Like me, many computer scientists were introduced to the deep connections between probability, information, coding and prediction. It is a standard reference for information theory in computer science.

Beyond the book, Cover wrote wrote many of the foundational papers in information theory and its applications. A very partial list of his contribution include his work with Peter Hart on the convergence of the nearest neighbor classifier, His work on the capacity of linear classifiers preceded Vapnik and Chevonenkis. His work on finite memory algorithm for deciding whether the bias of a coin is rational or irrational is a beautiful mind-bender. His papers on gambling and portfolio management started much of the work on online learning.

Speaking of online learning, Cover took part in the first workshop on online learning that was held in UC Santa Cruz at 1992. Cover, with his typical sense of humor, suggested we call the workshop something for nothing. Later we had a bon-fire by the beach, I will always remember Cover as he was that day, a brilliant scientist, a wonderful communicator, a pursuer of beauty and of fun.

Annotated Bibliography

There's also his work on "Nearest neighbor pattern classification" (mid-60s, with Hart).

Yes, that's a very important one. He had two other early influential papers on nearest neighbor classification:

Thomas M. Cover. Rates of Convergence for Nearest Neighbor Procedures. Proceedings of The Hawaii International Conference on System Sciences, Honolulu, Hawaii, January 1968. Thomas M. Cover. Estimation by the Nearest Neighbor Rule. IEEE Transactions on Information Theory, IT-14(1):50--55, January 1968.

Even before Vapnik-Chervonenkis, he calculated the VC shatter coefficient for half spaces:

Thomas M. Cover. Capacity Problems for Linear Machines. Chapter in the book Pattern Recognition, Thompson Book Co., 1968. ed. by L. Kanal.

Then, of course, he was among the pioneers of online learning:



Thomas M. Cover. Universal Gambling Schemes and the Complexity Measures of Kolmogorov and Chaitin. Stanford University Dept. of Statistics Technical Report No. 12, October 1974.

Robert M. Bell and Thomas M. Cover. Competitive Optimality of Logarithmic Investment. Mathematics of Operations Research, 5(2):161--166, May 1980.

Thomas M. Cover. Log Optimal Portfolios. Chapter in Gambling Research: Gambling and Risk Taking, Seventh International Conference, Vol 4: Quantitative Analysis and Gambling, ed. by W.E. Eadington, 1987, Reno, Nevada.

Paul H. Algoet and Thomas M. Cover. Asymptotic Optimality and Asymptotic Equipartition Properties of Log-Optimum Investment. The Annals of Probability,, 16(2): 876-898, 1988.

Robert Bell and Thomas M. Cover. Game-Theoretic Optimal Portfolios. Management Science, 34(6): 724-733, June 1988.

He was among the first to use Blackwell's approachability for online learning:

Thomas M. Cover and David H. Gluss. Empirical Bayes Stock Market Portfolios. Advances in Applied Mathematics, (7):170-181, 1986 This one was a big seminal paper for structural risk minimization: Andrew R. Barron and Thomas M. Cover. Minimum Complexity Density Estimation. IEEE Transactions on Information Theory, 37(4): 1034-1054, July 1991.

Avrim Blum says: I don't know if the following had a big influence on COLT but I remember Ron Rivest describing this result which totally blew my mind: COVER, THOMAS M (1973). On determining the irrationality of the mean of a random variable. Ann. Math. Statist. 1862-871. COVER & HIRSCHLER (1975). A finite memory test of the irrationality of the parameter of a coin. Annals of Statistics, 939-946

You are observing a sequence of flips of a coin of unknown bias p, and after each flip must predict if p is rational or irrational. Shows that with prob 1 you can do this making only a finite number of mistakes....

April 03, 2012

We Think Like Our Fields

Have lunch with economists and they'll talk about the decision making processes and equilibriums of everything from politics to sports. Computer scientists worry about the misuse of information. Systems people create policies that look like computer programs. Mathematicians internally create models of society and derive consequences from it. I find myself treating the world as one large computational process. Isn't it?

It's not just academics, I've seen the same kind of thinking from lawyers, doctors and business owners. In one sense this isn't a bad thing. It's hard to reason about the world and using our tools and talents to makes sense of it all helps us cope with society. We all have our own religion whether it be spiritual or scientific. 

The problem comes when we just hang with our own, in our social networks both in person and online. We start believing that everyone else thinks the same way we do. Then we have difficulty working with people outside our community and that just isolates us even more. 

April 02, 2012

Measuring videos

I've made a page filled with measuring-related videos, for about 4th grade level but you can use them with 3rd and 5th too.

Topics include measuring to the 1/8th of an inch, in millimeters & centimeters, conversions between inches, feet, and yards, metric units of weight, of length, and of volume, customary units of weight, temperature, and time.

April 01, 2012

Trick question or Stupid question?

April fool days Blogs or Columns may

present an open problem as closed,

present a closed problems as open,

make us wonder if it's a joke or not, (Lance STILL isn't telling!), or

make us question some well held beliefs.

This year I will take a different route- similar to what Lipton did here, which is NOT try to fool you, but present a post ABOUT tricky or foolish or odd things.

When is a trick question a stupid question? I give some examples and my opinions. But I want your opinions- are these questions (with the answers I give) TRICK QUESTIONS or STUPID QUESTIONS?
  1. How many states are in the United States? Answers and Commentary
  2. What is the least common Birthday in America? Answers and Commentary
  3. What is the degree of (x-a)(x-b)(x-c)... (x-z)? Answers and Commentary
  4. What US state has the eastern most point in America? Answers and Commentary
  5. What is the least common first names for a U.S. President? (The answer is a tie.) Answers and Commentary
  6. Which two numbers come at the end of this sequence?
    2,4,6,30,32,34,36,40,42,44,46,50,52,54,56,60,62,64,x,y
    Answers and Commentary. See also this blog entry on the problem.
  7. An expert on tracking animals notices one day that there are bear tracks and rabbit tracks converging on the same spot. He can estimate that they converged at 6:00PM with a margin of error of 17 seconds. Hence they must have been there at the same time. He also notices that from the spot they converged only rabbit tracks can be found. HOW CAN A RABBIT EAT A BEAR FOR DINNER? Answers and Commentary
  8. There are 6 blue socks, 8 white socks, and 10 black socks in a drawer. How many do you need to take out of the drawer in order to get a pair. Answers and Commentary
  9. What is the square root of nine? Give the answer as an anagram of the question. Answers and Commentary

March 30, 2012

The Value of an Academic Publication

Russell O'Connor's paper was accepted into last years ACM SIGPLAN Workshop on Generic Programming. Russell put the final version of his paper on the ArXiv under a public domain dedication. Russell couldn't transfer author's rights to the ACM since he gave them up. After much discussion the ACM decided not to publish the paper in the proceedings. The abstract in the proceedings states
We note that one of the papers presented in the workshop is not included in the proceedings. This paper, "Functor is to Lens as Applicative is to Biplate: Introducing Multiplate" by Russell O'Connor, is accessible as arXiv:1103.2841v2 [cs.PL].
Russell gives his account but he focuses on ArXiv instead of the public domain aspect. In the STOC CFP we encourage putting your submissions on ArXiv and similar sites. The issue that worried ACM was the loss of rights. ACM could have published the paper, it was in the public domain, but it wouldn't have control of that publication and didn't want to set precedent. Scott Delman of the ACM responded here.

An academic paper you write has little direct value to you (the paper itself not the Intellectual Property within). No one is likely to give you any money for that paper. But your paper does have financial value as part of a collection in a journal or conference proceedings. Commercial and non-profit publishers know how to collect on this value. You might complain that this puts your paper behind a firewall but all the major publishers in CS allow you to post earlier drafts on your homepage and archive sites. As long as you make that effort, people will have access to your papers.

It does take some money (or considerable person hours) to maintain even an electronic journal or conference proceedings. But publishers get more value than that. For the ACM, the DL revenue is a major source of funding for ACM activities and those of the SIGs.

Of course you should never trust the opinion of someone who has a financial interest in a position and as SIGACT chair, we certainly make use of the DL revenue. We are hoarding some in case the DL revenue shrinks in the future.

It seems a shame to leave the monetary value of our papers on the table but also that value shouldn't be exploited. Big discussions will continue on the publications issue, at ACM and other publishers and at all levels of the CS community. There will be a Dagstuhl workshop focused on this topic in the fall. Figuring out the right model for publications will not be an easy one.

My biggest fear is that lack of a plan will lead to a degradation in the quality of our publications and then everyone loses.

March 29, 2012

The topology of the set of all types

It is well known that, both in constructive mathematics and in programming languages, types are secretly topological spaces and functions are secretly continuous. I have previously exploited this in the posts Seemingly impossible functional programs and A Haskell monad for infinite search in finite time, using the language Haskell. In languages based on Martin-Löf type theory such as Agda, there is a set of all types. This can be used to define functions $\mathbb{N} \to \mathrm{Set}$ that map numbers to types, functions $\mathrm{Set} \to \mathrm{Set}$ that map types to types, and so on.

Because $\mathrm{Set}$ itself is a type, a large type of small types, it must have a secret topology. What is it? There are a number of ways of approaching topology. The most popular one is via open sets. For some spaces, one can instead use convergent sequences, and this approach is more convenient in our situation. It turns out that the topology of the universe $\mathrm{Set}$ is indiscrete: every sequence of types converges to any type! I apply this to deduce that $\mathrm{Set}$ satisfies the conclusion of Rice’s Theorem: it has no non-trivial, extensional, decidable property.

To see how this works, check:

The Agda pages can be navigated be clicking at any (defined) symbol or word, in particular by clicking at the imported module names.

Sanjeev Arora wins ACM-Infosys Award

Sanjeev Arora will receive the 2011 ACM-Infosys Foundation Award, the highest honor ACM gives to a mid-career scientist.
Sanjeev Arora is one of the architects of the Probabilistically Checkable Proofs (PCP) theorem, which revolutionized our understanding of complexity and the approximability of NP-hard problems. He helped create new approximation algorithms for fundamental optimization problems such as the Sparsest Cuts problem and the Euclidean Travelling Salesman problem, and contributed to the development of semi-definite programming as a practical algorithmic tool. He has played a pivotal role in some of the deepest and most influential results in theoretical computer science, and continues to inspire colleagues and new generations of researchers.
Congratulations to Sanjeev!

In other news, the IEEE Computer Society W. Wallace McDowell Award is being awarded to Ron Fagin, and the US gives a big push for big data (CCC blog has details).

Another reminder for upcoming deadlines: STOC Posters (3/31), ACM Turing Student Travel (4/2), STOC Student Travel (4/4) and FOCS papers (4/4).

March 27, 2012

"Math was a mistake- I made it too hard"

(REMINDER- IF you are a STUDENT who wants to GOTO STOC 2012 but needs money to go then you should GOTO the STOC 2012 homepage and click on Travel Support. Deadline to apply: April 4.)

In the movie Oh God Book II God, played by George Burns, says
Math was a mistake, I made it too hard
Or at least I thought he said it. I have repeated this quote both in the blog and as a comment on Scott's blog. As I noted on my blog, if you Google
"Math was a mistake, I made it too hard"
all of the hits that you get lead back to me. This is still true. (Though it won't be after this post goes up.)

I thought it was a great quote so I am surprised it is not better known. FINALLY Oh God Part II came TV so I watched it just to see if what I've been quoting all these years is really in the movie. (Also, its a pretty good movie, for what it is.)

Alas, that is NOT the quote! What God really said was
"Mathematics, that was a mistake. I should have made the whole thing a little easier"
I think my version is better.

There are other misquoted quotes. My favorite: Stalin never said
The capitalists will sell us the rope with which we will hang them.
He did say:
The capitalists will furnish credits which will serve us for the support of the Communist Party in their countries and, by supplying us materials and technical equipment which we lack, will restore our military industry necessary for our future attacks against our supplier. To put it in other words, they will work on the preparations of their own suicide. (The Yale Book of Quotations (2006))
Both for George Burns and Stalin I am troubled- are we better off using the pithy version of the quotes, which DOES capture what they meant, or the original?

We have a similar issue when we teach- do we teach math the way it was invented or discovered (messy but motivated) or the way it is understood now (clean but unmotivated). Hopefully its not an either-or question and we can do some of both.

March 26, 2012

Zoowhiz - online learning system

Here's a fairly new learning website for children that is free: ZooWhiz. It lets children practice math, reading, and phonics/vocabulary, and earn points while doing so. Then they can spend their points to buy animals for their zoo, or to unlock arcade games.

It's quite a bit for being free, I think. My girls played it today and enjoyed it a lot, especially the younger one. I was impressed by the quality and how it is offered for free. (They have a "premium" option coming soon where you pay for extra features.)

It says that access time outside school hours is limited but I think it may mean Australian school hours.

The educational activities vary from one to the next; it's sort of like giving them "mixed review." They are organized by age groups and also by levels within the age groups.


What is an Elegant Proof?

What is an elegant proof? I do not know and I doubt it can be well defined; however, we all know it when we see it. I welcome comments on the topic; however, I will give a very simple example for a contrast of elegant and non-elegant. All of the math discussed here informally is done formally here.

Consider the following theorem:
There is no 2-digits number that is the sum of the squares of its digits.
One crude measure of elegance is to minimize the number of numbers that you need to check directly are not the sum of the squares of their digits. With this in mind. With this in mind, here are several proof sketches.
  1. One could proof this by enumerating all possible 2-digit numbers. If you wrote a program for this then you could use it on similar problems. Even so, I suspect most of us would call this proof NOT ELEGANT. This requires 89 CHECKS.
  2. There is a proof that first easily eliminates 10, 20, 30, 40, 50, 60, 70, 80, 90 and then looks at every interval [11,19], [21,29], ..., [91,99]. More elegant than enumeration and certainly shorter. This required 8 CHECKS.
  3. There is a proof that looks at the equation 10a + b = a2 + b2. We look at it mod 2, mod 4, and mod 10. This gives what I thought was the most elegant proof; however, after writing it down carefully it was about the same length as the interval proof. This required 2 CHECKS.
Consider the following theorem:
There is no x &geq 2 that is the sum of the squares of its digits.
The cases of 2 &leq x &leq 9 and x &geq 100 can be done with zero checks, so using the best proof of the last theorem, we can do this theorem with 2 checks.

Consider the following theorem.
The only 3-digits number that is the sum of the cubes of its digits is 153. (NOTE ADDED LATER: This is INCORRECT. A commenter pointed out that 370, 371, 407 also work. I will fix the proof and statement later and see if these are the only ones. Score a point for the `do it by a computer enumeration' argument!)
I have a proof of this which is... not quite elegant but not brute force. I get it down to only 21 CHECKS. If you have a better proof in terms of NUMBER OF CHECKS that is not contrived to reduce NUMBER OF CHECKS I would be very interested to see it. Of course, I cannot define contrived rigorously or elegantly.

March 25, 2012

I discover vegetables taste good in Dijon sauce then proceed to spread the gospel

I had a dozen eggs in the fridge that I need to use up by Wednesday; now I’m down to four. I made Jaffa cakes last week (not bad, but not memorable, and definitely not worth the effort: my main complaint is that the cake portion is bland and has a too-tough texture) and I left some chocolate blueberry cookie dough sitting in the fridge overnight which I’ll bake up tonight. That accounted for four of the eggs. This post is about the two meals that account for another four, and maybe the remaining four also :)

Steak, eggs, and vegetables in a Dijon sauce

Steak, eggs, and vegetables in sauce

I’m in love with this dish because it’s super simple to prepare, filling, and not too unhealthy: steak, eggs, and steamed veggies in a Dijon sauce. Surprisingly, considering how much I love meat, the star of the show here is the veggies in sauce: steamed baby carrots and French green beans (haricot verts) with an emulsion of Dijon mustard, red wine vinegar, EVOO, and some salt. Basically, it’s a quick and dirty substitute for Sauce Dijon.

If I’d known sauce + steamed vegetables was such a winning combination (don’t ask me why it took so late in life for me to get a clue) I would be a much healthier person :) My mission for next week is to learn how to make Hollandaise sauce and its derivatives, and find other vegetables to smother in sauce. No doubt Alton Brown has something interesting to say on the matter.

March 24, 2012

David Waltz (1943-2012)

David Waltz, head of the Center for Computational Learning Systems at Columbia, passed away yesterday after a battle with a brain tumor at the age of 68.

David Waltz is best known for his research in artificial intelligence but I'll remember him most for his leadership of the NEC Research Institute when I was there. David fought hard and risked his career to protect basic research, particularly theory, at NEC. He would end up losing the presidency of NEC in this conflict. It is a great leader that is willing to risk all to support his people.

March 22, 2012

ATMCS 5 in Edinburgh

ATMCS 5 – Algebra and Topology, Methods, Computing, and Science

Second Announcement

This meeting will take place in the period July 2-6, at the ICMS in Edinburgh, Scotland. The theme will be applications of topological methods
in various domains. Invited speakers are

J.D. Boissonnat (INRIA Sophia Antipolis) (Confirmed)
R. Van de Weijgaert (Groningen) (Confirmed)
N. Linial (Hebrew University, Jerusalem) (Confirmed)
S. Weinberger (University of Chicago) (Confirmed)
S. Smale (City University of Hong Kong) (Confirmed)
H. Edelsbrunner (IST, Austria) (Confirmed)
E. Goubault (Commissariat à l’énergie atomique, Paris) (Confirmed)
S. Krishnan (University of Pennsylvania) (Confirmed)
M. Kahle (The Ohio State University) (Confirmed)
L. Guibas (Stanford University) (Confirmed)
R. Macpherson (IAS Princeton) (Tentative)
A. Szymczak (Colorado School of Mines) (Confirmed)
P. Skraba/ M. Vejdemo-Johansson (Ljubljana/St. Andrews) (Confirmed)
Y. Mileyko (Duke University) (Confirmed)
D. Cohen (Louisiana State)
V. de Silva (Confirmed)
There will be opportunities for contributed talks. Titles and abstracts should be send to Gunnar Carlsson at gunnar@math.stanford.edu.

The conference website is located at http://www.icms.org.uk/workshops/atmcs5. Those interested should register there as soon as possible so that we can obtain an idea of the number of participants.

For the organizing committee,

Gunnar Carlsson

March 21, 2012

A few fun links for even & odd numbers

Enjoy!

Fruit Shoot
Shoot a fruit with an even or odd number, whichever one your aim tells you. Three levels: 1-10, 1-20, and 1-100.
www.sheppardsoftware.com/mathgames/earlymath/Fruit_shoot_odd_even.htm

Odd or Even?
Drag and drop the number cards to their correct place in the diagram (even or odd). three difficulty levels (numbers 1-10, 1-30, or 1-100).
www.crickweb.co.uk/ks2numeracy-properties-and-ordering.html#

Story of Odd and Even
A simple story about two ladybugs named Odd and Even. It teaches even and odd within 1-10.
www.primarygames.com/storybooks/even_odd/1.htm

March 20, 2012

Stacks Project

I’ve been trying to learn about stacks, something that is much easier in the Internet age. The Stacks Project is a collaborative textbook that introduces the subject from the ground up, including all of the machinery necessary. The book is already up to 3000(!) pages.

March 17, 2012

Do you have a math question?

Recently I was checking the links & adding some new ones on my page about math help and tutoring.

Here are some that are the "cream of the crop" when it comes to math help message boards. Typically, they require a student to show some work before someone helps -- or the helpers will only point out how to get started. You could definitely use these message boards as a parent if you can't figure out a problem in your child's math book, or if you have generic math or math education questions.

Message boards/forums

Ask Dr. Math
Browse old math questions or ask your own. Expert answers. The archives have thousands of answered questions covering all math topics from elementary to college level.
mathforum.org/dr.math

S.O.S. Mathematics CyberBoard
Active discussion board where you can ask math questions about algebra, calculus, geometry, trigonometry, probability, college math, computer science, physics, and more.
www.sosmath.com/CBB

Mrs. Glosser's Math Goodies Forums
For students, teachers & parents. Homework help, Teacher Talk, Parents place.
www.mathgoodies.com/forums

Math Help Boards
From pre-algebra on through university-level math. It has experienced staff, is completely free, and as a policy does not give answers away without the asker showing some effort.
www.mathhelpboards.com

FreeMathHelp.com
A math homework help forum.
www.freemathhelp.com/forum

Overloading Python list comprehension

Introduction
Python is very flexible in the way it allows you to overload various features of its syntax. For example most of the binary operators can be overloaded. But one part of the syntax that can't be overloaded is list comprehension ie. expressions like [f(x) for x in y].

What might it mean to overload this notation? Let's consider something simpler first, overloading the binary operator +. The expression a+b is interpreted as a.__add__(b) if a is of class type. So overloading + means nothing more than writing a function. So if we can rewrite list comprehensions in terms of a function (or functions) then we can overload the notation by providing alternative definitions for those functions. Python doesn't provide a facility for doing this directly, but we can at least think about what it might mean to do this. Later we'll see how to tweak the Python interpreter to make it possible.

map
Consider the expression
[a for x in y]
Here the single letter variables are 'metavariables' representing fragments of Python code. To a good approximation this is equal to:
map(lambda x: a, y)
(BTW Everything I say here is "to a good approximation". Python is an incredibly complex language and I'm not good enough at it to make any categorical statements about when one fragment of code is the same as another.)

So it's tempting to see list comprehensions as syntactic sugar for map, in which case one approach to overloading comprehension is to consider interpreting it in terms of replacements for map. But this isn't a very powerful overloading. It just gives us a slightly different way to write something that's already straightforward.

concatMap
Another reason for not simply seeing list comprehension in terms of map is that nested list comprehensions need another operation. Consider
[(y, z) for y in [1, 2] for z in ['a', 'b']]
This isn't quite the same as
[[(y, z) for z in ['a', 'b']] for y in [1, 2]]
but it's close. The latter produces nested lists whereas the first gives one flat list. We can think of nested comprehensions as applying a flattening operation. Let's use list comprehension to implement flattening:
def concat(xs):
return [y for x in xs for y in x]
We now write our nested comprehension as:
concat([[(y, z) for z in ['a', 'b']] for y in [1, 2]])
We know how to write non-nested comprehensions using map so we get:
concat(map(lambda y: [(y, z) for z in ['a', 'b']], [1, 2]))
And rewriting the inner comprehension we get:
concat(map(lambda y: map(lambda z: (y, z), ['a', 'b']), [1, 2]))
Every time we add another level of nesting we're going to need another concat. But the innermost map doesn't have a concat. Purely for reasons of symmetry we can ensure every map has a concat by enclosing the innermost element as a singleton list:
concat(map(lambda y: concat(map(lambda z: [(y, z)], ['a', 'b'])), [1, 2]))
Every map has a concat so we can simplify slightly. Let's define:
def concatMap(f, xs):
return [f(y) for x in xs for y in x]

def singleton(x):
return [x]
Our expression becomes:
concatMap(lambda y: concatMap(lambda z: singleton((y, z)), ['a', 'b']), [1, 2])
Importantly we've completely rewritten the comprehension in terms of concatMap and singleton. By changing the meaning of these functions we can change the meaning of comprehension notation, or at least we could if the Python interpreter defined comprehension this way. It doesn't, but we can still reason about it. Although any comprehension that doesn't use ifs can be rewritten to use these functions, I won't give a formal description of the procedure. Instead I'll provide code to perform the rewrite later. While I'm at it, I'll also handle the ifs.

Laws
Freely redefining singleton and concatMap to redefine comprehension could get weird. If we're going to redefine them we should at least try to define them so that list comprehension still has some familiar properties. For example, for y a list we usually expect:
y == [x for x in y]
In other words
y == concatMap(lambda x: singleton(x), y)
At this point I could give a whole bunch more laws but it's time to own up.

Monads
A pair of functions singleton and concatMap, along with a bunch of laws, are essentially the same thing as a monad. In Haskell, concatMap is usually called bind and singleton is called return. What I've done here is show how Wadler's Comprehending Monads paper might look like in Python. Haskell has specialised monad notation built into its grammar. But what's less well known is that so does Python! The catch is that although the grammar is right, the semantics can't be generalised beyond lists.

Monad-Python
One great thing about Python is that there seem to be libraries for working with every aspect of Python internals. So it's fairly easy to write a simple Python interpreter that rewrites list comprehensions to use singleton and concatMap. I've placed the source on github. Use mpython.py instead of python as your interpreter. I've tested it with Python 2.6 and 2.7.

When using mpython, list comprehension uses whatever definitions of __mapConcat__ and __singleton__ are currently in scope. By default they are the definitions I gave above so we get something close to the usual list comprehension.

An example of the kind of code you can run with mpython.py is:
import math

def __concatMap__(k, m):
return lambda c:m(lambda a:k(a)(c))

def __singleton__(x):
return lambda f:f(x)

def callCC(f):
return lambda c:f(lambda a:lambda _:c(a))(c)

def __fail__():
raise "Failure is not an option for continuations"

def ret(x):
return __singleton__(x)

def id(x):
return x

def solve(a, b, c):
return callCC(lambda throw: [((-b-d)/(2*a), (-b+d)/(2*a))
for a0 in (throw("Not quadratic") if a==0 else ret(a))
for d2 in ret(b*b-4*a*c)
for d in (ret(math.sqrt(d2)) if d2>=0 else throw("No roots"))
])

print solve(1, 0, -9)(id)
print solve(1, 1, 9)(id)
print solve(0, 1, 9)(id)
I have defined our functions so that comprehension syntax gives us the continuation monad. This makes continuation passing style relatively painless in Python. (At least easier than chaining many lambdas.) I have then defined callCC to be similar to its definition in Haskell. There are many uses for callCC including the implementation of goto. Above I use it in a trivial way to throw exceptions.

Conclusion
My script mpython.py is a long way from an industrial strength interpreter and I'm not proposing the above as an extension to Python. My goal was simply to show how Haskell-style monads are not as alien to Python as you might think. In fact, it's reasonable to say that Python already supports one flavour of specialised monad syntax. Most users don't realise it as such because it has been hard-wired to work with just one monad, lists.

BTW if you attempt to implement all of the other Haskell monads you'll find that Haskell behaves a little differently because of its laziness. You can recover some of that laziness by careful use of continuations in Python. But I've no time to go into that now.

March 16, 2012

Puzzled by the definition of stationarity

I’m reading Santosh Vempala’s survey “Geometric Random Walks: A Survey,” and already I’m puzzled at one of the very first definitions he gives.

Define a Markov chain as a state space sigma algebra pair \((K, \mathcal{A})\) with transition probability measures given by \(P_u\) for each \(u \in K.\)

A distribution \(Q\) on \((K, \mathcal{A})\) is called stationary if one step from it gives the same distribution, i.e., for any \(A \in \mathcal{A},\)
\[
\int_A P_u(A) \, dQ(u) = Q(A).
\]

This definition makes sense in words, but mathematically it doesn’t seem sound: unless \(P_u(A) = 1\) a.e. (with respect to \(u \in A\)), this equality can’t hold. I must be missing something…

Update:
The definition should involve integration over the whole space:

A distribution \(Q\) on \((K, \mathcal{A})\) is called stationary if one step from it gives the same distribution, i.e., for any \(A \in \mathcal{A},\)
\[
\int_K P_u(A) \, dQ(u) = Q(A).
\]

That this is so can be seen from the discrete case. Just goes to show you that you have to be careful even when reading peer-reviewed articles.

March 15, 2012

Alan Turing Year in Calgary

It's Alan Turing's centenary, and we've been celebrating it at the University of Calgary with a series of lectures.  This term, we've had a talk on the decision problem, one (by Mike Williams) on Turing and early electronic comupters, and one coming up on March 27, by John Ferris, on Alan Turing and codebreaking in WWII.  Yesterday, we screened the biopic Breaking the Code, with Derek Jacobi as Alan Turing (which you can watch on YouTube in its entirety!). The Pacific Institute for the Mathematical Sciences is paying to have the lectures videotaped and the'll be appearing on mathtube.org as they become available.  The lecture by my distinguised colleague in the Computer Science department, Mike Williams, was just posted a couple of days ago.  Mike is a former President of the IEEE Computer Society, editor in chief of the Annals of the History of Computing, and head curator for the Computer History Museum.  So he knows his history of computing machinery, and gave us a wonderful talk about Turing's role in the development of early digital computers.  (There's also a lecture by me on the 1936 paper, but that's much less interesting.) Thanks to generous funding from the Faculty of Science, we also have nice posters, like the one below, advertising our last talk for the Winter term, by my distinguished colleague in the History Department, John R. Ferris. 

March 14, 2012

The Rewards of Honesty

The New York Times published an article - which was re-published in the local papers - about a former mathematics professor Kim Myungho who self-published a book decrying the Korean judiciary. (Note the word “published” appeared thrice in the previous sentence.) In fact a movie already seen by several millions was made about Kim who [...]

It’s Pi Day today

A little creative tinkering yielded this mnemonic for Pi (apologies, latex died) to 15 places. “How I need a drink, alcoholic of course, after the heavy lectures involving Euclid’s algorithm” I’ve substituted “quantum mechanics” with “Euclid’s algorithm”, not exactly ideal since it is usually called Euclidean algorithm, plus do you count that apostrophe? But [...]

Hardy on Number Theory

The elementary theory of numbers should be one of the very best subjects for early mathematical instruction. It demands very little previous knowledge; its subject matter is tangible and familiar; the processes of reasoning which it employs are simple, general and few; and it is unique among the mathematical sciences in its appeal to natural [...]

March 13, 2012

Pi Day is Upon Us...

Vi Hart: Pi Is (still) Wrong.



Vi argues against Pi, and for Tau. Even if you disagree with her, I think you will enjoy the pies! And if you're not familiar with the controversy between Pi and Tau, I suggest you read http://tauday.com/

Especially take a look at these images

pi and unit circle

tau and unit circle

(images by Michael Hartl from Tauday.com)

Personally, I am not taking sides! At least not for now!

March 09, 2012

Is this a simulation?

There is reason to suppose that this, all this, is a simulation. Not the least enticing reason, but perhaps a misleading one, is that quantum mechanics tells us that at a small scale the world is described by probabilities and statistics. Not only is it described by a statistical distribution but unless we probe closely the statistical distribution is not rendered for us to notice. This is efficient and puts us in mind of a simulation or video-game where an algorithm decides what will be displayed on screen: if we zoom up close the texture of the simulation breaks down, but how it breaks down into pixels is determined by the rules of the program.

Heisenberg took seriously the notion that only what we can measure exists. Not so long ago I enjoyed Quantum: Einstein, Bohr and the Great Debate about Reality by Manjit Kumar and learnt that Heisenberg had been inspired, long before he knew much about matrices or anti-commuting variables, by the track of a charged particle through a cloud chamber. As the particle passes through the vapour of the cloud chamber it ionises molecules around which other molecule condense. Thus a track of condensed vapour is formed following the path of the charged particle/atom/molecule. But, reasoned Heisenberg, although the path appears to be continuous it is actually only a sequence of points which occur where each ion in the vapour is formed - unless it is measured the particle's postion is not known for certain. Of course faced with this thought Heisenberg opted for the wild solution that when the particle's position is not measured it does not exist. In the film version of Michael Frayn's play Copenhagen, Heisenberg is shown being inspired while walking beneath a series of pools of light and then vanishing in the shadows between. It all leads very temptingly to the idea that, just like in a computer game, wheresoever we do not look is not rendered and that the fine-grain detail could be displayed by according to an algorithm. So is this a simulation?

Let me take a different tack and wonder whether it might be possible one day (let t tend towards infinity...) to construct a simulator. We might dream of something into which we may project all the information in our most advanced brain-state and where all its functions can be replicated as if in the real world, or perhaps I should say the world we conceive as real presently. It will be informative to think about this notion of the  real world. To do so permit me the assumption that there is some grand unified theory possessing a large symmetry and let the simulator we are considering be a perfect simulation. That is there are no flaws in the simulation that would allow the insider, the brain in the vat, to deduce that he/she/it exists inside the simulator (the cat from The Matrix is back in the bag, so to speak, and there are no rounding error problems inside the simulator for a messianic figure to take advantage of...). Note that I am taking this to mean this means that machine must encode all there is to know about M-theory in one form or another. Finally let us introduce a new constraint, let the machine be localised, i.e. it is not infinite in any direction - it can fit in a bounded volume, nutshell or even teapot, a big enough teapot. All the assumptions that have been made can be recast as saying we have a quantum gravity simulator in a box, let us call it M-box.

M-box buzzes, fizzes, gurgles and pops, and by assumption recreates all the information content of a universe. It must be one hell of a machine. If we suppose it could exist what are the corollaries for the physical description of the universe it exists within and simulates perfectly? Since we assumed it is finite in extent and yet contains all of M-theory must we throw out of M-theory any non-local effects that occur? After all the machine is localised and can recreate perfectly M-theory. By localised I had better emphasise that I mean that its internal workings do not rely on any non-local physical effects. This point is worth some more words. Our usual picture of a computer simulation involves information encoded in 1's or 0's in bits of information, more ambitious quantum computers would build information upon data units that can be in a superposition of two states - these are called qubits. Qubits could be built on electron spin for example. An electron is a relatively simple solution of quantum field theory. For the M-box we would like to build its circuitry on solutions of M-theory; we would like the M-box to be able to excite membranes extending into the compact higher-dimensional space. Membranes can be spun and vibrated, but membranes can also be U-dualised into fivebranes. To take this to its limit the clever M-box we imagine will encode data by U-dualising membranes. We try hard not to think about how it might do this. But we emphasise that the M-box is imagined to encode the full set of M-theory solutions by U-dual rotations of a single membrane. Mathematically the object which encodes all these solutions is the coset $\frac{E_{11}}{{\cal R}(E_{11})}$ where ${\cal R}(E_{11})$ is a real-form of $E_{11}$. Now we may ask the question is there a single solution to M-theory which itself possesses the full symmetries of M-theory. The closest we know of is the BKL cosmological singularity which is expected to carry the symmetries of $\frac{E_{10}}{{\cal R}(E_{10})}$. But awe-inspiring as a cosmological singularity may be it is still not enough to be part of the circuit board of our conjectured M-box. If we did construct such an $E_{10}$ M-box then once we embedded our brain-state inside the simulation we would be able to deduce that some parts of the full $E_{11}$ symmetry are missing (we would probably need some simulated psychiatric help after our simulated selves came to this conclusion). So if one could identify a solution to M-theory which itself possessed a full $E_{11}$ symmetry then we would be in business and M-boxes could start to be rolled off the manufacturing line. However as things stand if we deduce that there is an $E_{11}$ symmetry to M-theory and we wondered if we were simulated we might imagine that the simulator is not an M-box but rather an M'-box (surely this is a catchier name than M-box 360?) which is running in a universe which is governed by M'-theory - which has a larger symmetry than $E_{11}$. The argument is then repeated so that M-box $\subset$  M'-box $\subset$  M''-box $\subset \ldots $.  One would agree that this is not an enticing picture for someone looking for a closed description of the universe and hoping that we do live in a simulation. One of the attractions of a Kac-Moody algebra is that it has infinite generators and one might hope yet that a clever way to embed $E_{11}$ inside itself could be found and associated with an M-theory (bound-state) solution. Alternatively keen simulationists might prefer to spend their time in Skyrim instead.

March 08, 2012

Eff 3.0

Matija and I are pleased to announce a new major release of the eff programming language.

In the last year or so eff has matured considerably:

  • It now looks and feels like OCaml, so you won’t have to learn yet another syntax.
  • It has static typing with parametric polymorphism and type inference.
  • Eff now clearly separates three basic concepts: effect types, effect instances, and handlers.
  • How eff works is explained in our paper on Programming with Algebraic Effects and Handlers.
  • We moved the source code to GitHub, so go ahead and fork it!