Monday, December 23, 2019

Bodo Moeller predicted the 512-bit export attacks in TLS in 1998

Bodo Moeller predicted the 512-bit export factoring attacks in TLS in 1998. Since TLS mailing list archives from this time period are hard to find now (I had to use http://web.archive.org/web/20081014015810/http://www.imc.org/ietf-tls/entire-arch.txt), here is one message:

From bounce-ietf-tls-435@lists.consensus.com  Tue Jan 26 18:34:02 1999
Received: from hamlet.consensus.com (hamlet.consensus.com [157.22.240.90])
 by mail.proper.com (8.8.8/8.8.5) with SMTP id SAA20285
 for <ietf-tls-archive@imc.org>; Tue, 26 Jan 1999 18:33:58 -0800 (PST)
Message-Id: <m105KoS-0003bDC@ulf.mali.sub.org>
Date: Wed, 27 Jan 99 03:34 +0100
From: Bodo_Moeller@public.uni-hamburg.de (Bodo Moeller)
To: "IETF Transport Layer Security WG" <ietf-tls@lists.consensus.com>
Subject: Re: RFC 2246
In-Reply-To: <008f01be4976$855fa8c0$3508000a@haruspex>
Organization: Hamburg University
List-Unsubscribe: <mailto:leave-ietf-tls-435N@lists.consensus.com>
List-Software: Lyris Server version 3.0
List-Subscribe: <mailto:subscribe-ietf-tls@lists.consensus.com>
List-Owner: <mailto:owner-ietf-tls@lists.consensus.com>
X-List-Host: Consensus Development Corp <http://www.consensus.com/>
Reply-To: "IETF Transport Layer Security WG" <ietf-tls@lists.consensus.com>
Sender: bounce-ietf-tls-435@lists.consensus.com
X-Lyris-Message-Id: <LYR435-4064-1999.01.26-18.37.52--ietf-tls-archive#imc.org@lists.consensus.com>
Precedence: bulk
X-List-Host: SKYLIST.net <http://www.SKYLIST.net/>

"Tim Dierks" <timd@consensus.com>:

> I don't know if this is final yet: I'm still going over a couple of textual
> nits with the RFC editor. I'd expect it today or tomorrow, though.

It seems that a few days ago somebody fixed the mailing list
administration software for ietf-tls, finally allowing me to subscribe
(after various futile attempts from this and other accounts starting
October 1998 plus an inquiry or two to the administrative address
without any reaction whatsoever).  Congratulations on that.

As already noted in my message of 1998-10-10 to ietf-tls@consensus.com
(rejected because "Only members of ietf-tls are allowed to contribute
messages" although on 1998-10-06, in response to one of my
subscription attempts, I had received a message from
lyris@lists.consensus.com which claimed that "It's a good idea to save
this message somewhere safe so you know how to unsubscribe") and to
TimD@consensus.com, the SSL 3.0 and TLS 1.0 protocols have an
unnecessary weakness which could be used by an attacker to force a
SSL/TLS connection to use weak cryptography even if both systems
support strong cryptography; an edited quote from my old message (all
edits are marked by brackets):

  From: Bodo_Moeller@public.uni-hamburg.de (Bodo Moeller)
  To: ietf-tls@consensus.com
  Cc: TimD@consensus.com
  Subject: Export-PKC attacks on SSL 3.0/TLS 1.0
  Date: Sat, 10 Oct 1998 11:37:42 -0700
  
  I finally found time to read the SSL 3.0 and TLS 1.0 specifications[1],
  and I noticed that there apparently is a potential weakness in those
  protocols.  The scenario to which the attack applies is when both
  client and server are able to perform strong cryptography (i.e., not
  export-weakened), but both are also willing to put up with export
  ciphers if the respective other side does not support strong
  cryptography.

  [Footnote 1 omitted.]

  [...]        All authentication in the protocol relies on the security
  of the algorithms negotiated at [early handshake] stage.  Now assume
  a very strong
  attacker whose equipment and algorithmic know-how allows him to
  decrypt messages encrypted for weak public keys, e.g. by factoring a
  512 bit RSA modulus during its lifetime.  (One might argue that likely
  No Such Attacker exists, but better safe than sorry.)
  
  If the server offers the RSA_EXPORT key exchange method with a 512 bit
  public key, obviously there is no security against this attacker: The
  attacker deletes the strong methods from the cipher suite list in the
  ClientHello message, the server then presents its weak certificate,
  the client sends the weakly encrypted premaster secret across the
  network, the attacker obtains the premaster secret and uses it to
  "authenticate" to both parties.
  
  Thus, the server should not use RSA_EXPORT with 512 bit public keys.
  Fortunately, there are other exportable key exchange method that
  include strong signatures: In the case of RSA_EXPORT with a 1024 bit
  public key, this strong public key can be used to sign a temporary 512
  bit RSA key (Appendix D.1 gives recommendations on how often this
  temporary key should be changed), which is then used to encrypt the
  premaster secret.  Unfortunately, the strong authentication provided
  by the 1024 bit RSA key is not used properly to thwart the attack
  described above: The signature covers, besides the temporary key, only
  the nonces ClientHello.random and ServerHello.random.  This binds the
  temporary key to the current connection, but that is not enough:
  The signature also needs to cover ClientHello.cipher_suites (and, in
  order to be prepared for future protocol changes, also
  ClientHello.client_version).  Only then, there is strong
  authentication for _why_ the server chose an export-weakened cipher.
  
  I did not find this attack mentioned in the David Wagner and Bruce
  Schneier's paper "Analysis of the SSL 3.0 protocol" (November
  19, 1996).  They have a section on "Key-exchange algorithm rollback",
  but their attack is about relabeling Diffie-Hellman parameters as RSA
  parameters (the strong signature should also cover a label that
  explicitly says which key exchange method the server picked).
  
  While I'd bet I am not the first one to notice the weakness pointed
  out above, it obviously has not been dealt with in the TLS protocol
  draft.  The draft should point out that a strong server that offers
  export-weakened [cipher suites] could be forced, by a sufficiently strong
  attacker, to use weak algorithms even if the client also supports
  strong cryptography.  The protocol should be improved by protecting
  more data (as detailed above) with _strong_ authentication in
  ServerKeyExchange messages, so that strong assurance can be achieved
  that the parties will choose the highest strength of cryptography
  supported by both.
  
  Unfortunately, this protocol improvement will be useless until all
  earlier version of the protocol are phased out -- otherwise, the
  attacker can simply change the version number fields and proceed as
  described above.  But there is likely some time until 512 bit
  factoring machines hit the mass market :-)

Later observations on the protocol drafts (quoted from a message which
I mailed on 1998-11-22 to a different list because this one still was
closed) include

              [...] that the draft excludes strong temporary RSA keys
  unless the public key in the certificate "cannot be used for
  encryption" (appendix D.1) -- i.e., strong temporary parameters either
  must be DHE parameters, or the server certificate must have keyUsage
  restricted accordingly.  For some reason, the TLS drafts do not allow
  strong-crypto forward secrecy for RSA-only implementations unless the
  CA put a "keyUsage" extension in the server's X.509 certificate that
  ensures that the key "cannot be used for encryption". -- Maybe the
  draft authors were not even thinking of "keyUsage" restrictions --
  they talk of "X509v3" certificates, but the references list the
  prehistoric 1988 version of X.509 -- i.e., X.509 v1.

The drafts don't even mention forward secrecy, while it is a
subject that definitely ought to be explained in the security
considerations section.

Finally, the definitions of ASN.1Cert and certificate_list have
discrepancies between section 7.4.2 and appendix A.4.2.  The former
are correct:

       opaque ASN.1Cert<1..2^24-1>;

       struct {
           ASN.1Cert certificate_list<0..2^24-1>;
       } Certificate;

The latter, reproduced below, are not -- not that "opaque
ASN.1Cert<2^24-1>" does not even match the "T T'<floor..ceiling>"
syntax defined in section 4.3.

    opaque ASN.1Cert<2^24-1>;

    struct {
        ASN.1Cert certificate_list<1..2^24-1>;
    } Certificate;


Bodo M"oller
<bmoeller@acm.org>

---
You are currently subscribed to ietf-tls as: [ietf-tls-archive@imc.org]
To unsubscribe, forward this message to leave-ietf-tls-435N@lists.consensus.com
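The core of Bodo's observation can be sketched in a few lines of Python. This is a toy model: the MD5||SHA-1 hash construction and the cipher-suite code points follow RFC 2246, but the byte strings are illustrative and the actual RSA signing/verification step is omitted.

```python
import hashlib
import os

def tls10_signed_blob(client_random, server_random, server_params):
    # Per RFC 2246 section 7.4.3, an RSA-signed ServerKeyExchange covers
    # MD5(data) || SHA-1(data), where data is only the two nonces plus the
    # ephemeral parameters -- ClientHello.cipher_suites is NOT included.
    data = client_random + server_random + server_params
    return hashlib.md5(data).digest() + hashlib.sha1(data).digest()

def proposed_signed_blob(client_random, server_random, server_params,
                         client_version, cipher_suites):
    # Moeller's proposal: additionally bind ClientHello.client_version
    # and ClientHello.cipher_suites into the signed data.
    data = (client_random + server_random + client_version +
            b"".join(cipher_suites) + server_params)
    return hashlib.md5(data).digest() + hashlib.sha1(data).digest()

client_random, server_random = os.urandom(32), os.urandom(32)
weak_temp_key = os.urandom(64)             # stand-in for 512-bit RSA params

strong, export = b"\x00\x0a", b"\x00\x03"  # 3DES suite; an RSA_EXPORT suite
offered = [strong, export]                 # what the client really sent
stripped = [export]                        # what the MITM forwards

# TLS 1.0: the signed blob does not depend on the cipher-suite list at
# all, so the client cannot tell its hello was tampered with.
blob = tls10_signed_blob(client_random, server_random, weak_temp_key)
assert len(blob) == 36                     # 16-byte MD5 + 20-byte SHA-1

# Under the proposed scheme, the stripped hello yields different signed
# data, so the server's signature would fail to verify at the client.
honest = proposed_signed_blob(client_random, server_random, weak_temp_key,
                              b"\x03\x01", offered)
forged = proposed_signed_blob(client_random, server_random, weak_temp_key,
                              b"\x03\x01", stripped)
assert honest != forged
```

This is exactly the gap the FREAK attack exploited in 2015: the downgrade is invisible because nothing strong ever signs the suite list the client offered.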



Another message:
From bounce-ietf-tls-435@lists.consensus.com  Tue Feb  2 02:24:50 1999
Received: from hamlet.consensus.com (hamlet.consensus.com [157.22.240.90])
 by mail.proper.com (8.8.8/8.8.5) with SMTP id CAA02265
 for <ietf-tls-archive@imc.org>; Tue, 2 Feb 1999 02:24:46 -0800 (PST)
Message-ID: <A9B9B14D4F3FD211857A00A0C9DDB07B164A3D@schzmxs1.europe.entrust.com>
From: Rene Eberhard <rene.eberhard@entrust.com>
To: "IETF Transport Layer Security WG" <ietf-tls@lists.consensus.com>
Cc: "'Bodo_Moeller@public.uni-hamburg.de'"
  <Bodo_Moeller@public.uni-hamburg.de>
Subject: RE: RFC 2246
Date: Tue, 2 Feb 1999 05:19:33 -0500 
MIME-Version: 1.0
Content-Type: text/plain;
 charset="iso-8859-1"
List-Unsubscribe: <mailto:leave-ietf-tls-435N@lists.consensus.com>
List-Software: Lyris Server version 3.0
List-Subscribe: <mailto:subscribe-ietf-tls@lists.consensus.com>
List-Owner: <mailto:owner-ietf-tls@lists.consensus.com>
X-List-Host: Consensus Development Corp <http://www.consensus.com/>
Reply-To: "IETF Transport Layer Security WG" <ietf-tls@lists.consensus.com>
Sender: bounce-ietf-tls-435@lists.consensus.com
X-Lyris-Message-Id: <LYR435-4822-1999.02.02-02.29.12--ietf-tls-archive#imc.org@lists.consensus.com>
Precedence: bulk
X-List-Host: SKYLIST.net <http://www.SKYLIST.net/>

Hi

>   [...]        All authentication in the protocol relies on 
> the security
>   of the algorithms negotiated at [early handshake] stage.  Now assume
>   a very strong
>   attacker whose equipment and algorithmic know-how allows him to
>   decrypt messages encrypted for weak public keys, e.g. by factoring a
>   512 bit RSA modulus during its lifetime.  (One might argue 
> that likely
>   No Such Attacker exists, but better safe than sorry.)

Am I right in understanding that you are talking about the lifetime
of a temporary RSA key? According to D.1 we assume that the attacker
can factoring the key within one day.

Although the described attack is based on factoring an export RSA key
(which is between 328 and 512 bits) it is worth picking up the 
discussion.


>   If the server offers the RSA_EXPORT key exchange method 
> with a 512 bit
>   public key, obviously there is no security against this 
> attacker: The
>   attacker deletes the strong methods from the cipher suite 
> list in the
>   ClientHello message, the server then presents its weak certificate,
>   the client sends the weakly encrypted premaster secret across the
>   network, the attacker obtains the premaster secret and uses it to
>   "authenticate" to both parties.

The attacker can fake the finished message at the end of the HS
because he owns the premaster secret.


>   Thus, the server should not use RSA_EXPORT with 512 bit public keys.
>   Fortunately, there are other exportable key exchange method that
>   include strong signatures: In the case of RSA_EXPORT with a 1024 bit
>   public key, this strong public key can be used to sign a 
> temporary 512
>   bit RSA key (Appendix D.1 gives recommendations on how often this
>   temporary key should be changed), which is then used to encrypt the
>   premaster secret.  

It is not always possible that an export server can generate strong RSA
signing key pairs. Whether an RSA key can be used for signing only (instead
for
key encipherment) is prescribed by the CA's policy.


> Unfortunately, the strong authentication provided
>   by the 1024 bit RSA key is not used properly to thwart the attack
>   described above: The signature covers, besides the 
> temporary key, only
>   the nonces ClientHello.random and ServerHello.random.  This 
> binds the
>   temporary key to the current connection, but that is not enough:
>   The signature also needs to cover ClientHello.cipher_suites (and, in
>   order to be prepared for future protocol changes, also
>   ClientHello.client_version).  Only then, there is strong
>   authentication for _why_ the server chose an export-weakened cipher.

I completely agree with you. I'd additionally add the selected cipher suite 
to the signature. This prevents the relabeling of a selected cipher suite.

And even more. There could be an additional HS message in the server's
reply.
If the server only sends its certificate which includes key agreement /
exchange
parameters the client never knows whether the server got the entire client 
hello message. (If an attacker obtained the premaster secret during HS.)
But this requires that the server always has an signature-capable
certificate.

   
>   I did not find this attack mentioned in the David Wagner and Bruce
>   Schneier's paper "Analysis of the SSL 3.0 protocol" (November
>   19, 1996).  They have a section on "Key-exchange algorithm 
> rollback",
>   but their attack is about relabeling Diffie-Hellman 
> parameters as RSA
>   parameters (the strong signature should also cover a label that
>   explicitly says which key exchange method the server picked).

I must admit that I don't understand why the signature structure hasn't
been improved. The 'relabeling Diffie-Hellman' problem still exists.
And the relabeling problem could get more importance when TLS uses
ECC ciphers!


> Later observations on the protocol drafts (quoted from a message which
> I mailed on 1998-11-22 to a different list because this one still was
> closed) include
> 
>               [...] that the draft excludes strong temporary RSA keys
>   unless the public key in the certificate "cannot be used for
>   encryption" (appendix D.1) -- i.e., strong temporary 
> parameters either
>   must be DHE parameters, or the server certificate must have keyUsage
>   restricted accordingly.  For some reason, the TLS drafts do 
> not allow
>   strong-crypto forward secrecy for RSA-only implementations 
> unless the
>   CA put a "keyUsage" extension in the server's X.509 certificate that
>   ensures that the key "cannot be used for encryption". -- Maybe the
>   draft authors were not even thinking of "keyUsage" restrictions --
>   they talk of "X509v3" certificates, but the references list the
>   prehistoric 1988 version of X.509 -- i.e., X.509 v1.
> 
> The drafts don't even mention forward secrecy, while it is a
> subject that definitely ought to be explained in the security
> considerations section.

I agree.

> Finally, the definitions of ASN.1Cert and certificate_list have
> discrepancies between section 7.4.2 and appendix A.4.2.  The former
> are correct:
> 
>        opaque ASN.1Cert<1..2^24-1>;
> 
>        struct {
>            ASN.1Cert certificate_list<0..2^24-1>;
>        } Certificate;
> 
> The latter, reproduced below, are not -- not that "opaque
> ASN.1Cert<2^24-1>" does not even match the "T T'<floor..ceiling>"
> syntax defined in section 4.3.
> 
>     opaque ASN.1Cert<2^24-1>;
> 
>     struct {
>         ASN.1Cert certificate_list<1..2^24-1>;
>     } Certificate;


Still appears in the RFC =(.

Regards Rene

---
You are currently subscribed to ietf-tls as: [ietf-tls-archive@imc.org]
To unsubscribe, forward this message to leave-ietf-tls-435N@lists.consensus.com

Wednesday, November 20, 2019

Google DoubleClick Mozilla overview (third draft)


There are many problems with web advertising in general, including annoying formats like autoplay video ads and pop-ups, as well as problems like “click fraud” that matter to advertisers. This essay, however, focuses on the privacy issues with some of the kinds of ads Google produces, the history behind them, and why Larry Page and Sergey Brin apparently did not consider them when, for example, buying DoubleClick. Also discussed is Mozilla's involvement (as in the Google/Mozilla search deal), including Brendan Eich, the creator of JavaScript, who eventually left Mozilla to found Brave. The difficulty of solving these issues is discussed as well. Of course, advertising is not limited to the web, and advertising in general carries many benefits and risks (like deceptive advertising), most of which will not be discussed here.

The history of Google and its advertising comes first. Google was founded in 1998 by Larry Page and Sergey Brin while they were at Stanford, and took VC funding from Kleiner Perkins and other investors. Its first product was the search engine (built on the PageRank algorithm), with products like Gmail added later. Eric Schmidt was brought in as CEO in 2001; he recently stepped down but is still on the board. Google IPOed in 2004, notably using dual-class stock.

Google's first ad product was AdWords, dating back to 2000. AdWords ads were matched to search keywords, displayed as relatively simple text ads at the top of the search results (labelled as ads). Typically the highest bidder was shown, and the advertiser paid Google when the user clicked an ad. AdWords involved relatively little tracking, at least initially, and will not be mentioned much here. At the time, Google was also taking a stand against pop-up ads.

AdSense, introduced in 2003, served ads on web pages themselves via JavaScript. At least initially it matched ads to keywords found on the pages (which Google fetched from its cache, for example), and advertisers bid on those keywords. As with AdWords, Google and the hosting websites get paid when users click the ads. It too involved little tracking at first.

Google bought DoubleClick in 2008. DoubleClick was founded in 1995, and it made sophisticated ad tracking via cookies and the like (often called “retargeting”) famous; the problems with it are described below. At one point DoubleClick itself called its product “Dynamic Advertising Reporting and Targeting”. Initially DoubleClick served mostly banner ads, and many users developed so-called banner blindness from them. Cookies themselves were invented at Netscape in 1994, and the IETF group that developed RFCs 2109 and 2965 already knew that tracking with “third-party cookies” was a problem (it is mentioned in those RFCs). Those attempts at IETF cookie standards ultimately failed, partly because they were incompatible with the browsers of the time, and led to RFC 6265, which is closer to how cookies are implemented in browsers today. The tracking problem also led to W3C P3P, famously implemented in IE6, which was an attempt to get tracking under control but likewise failed (partly because it was too complex) and was eventually removed in Windows 10.
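As a rough illustration of the third-party-cookie mechanism, here is a toy model in Python. The class and domain names are hypothetical, not any real ad server's protocol; it only shows why one tracker embedded on many sites can link visits together.

```python
import uuid

class TrackerServer:
    """Toy model of a third-party ad/tracking server."""
    def __init__(self):
        self.profiles = {}  # cookie id -> first-party pages seen

    def serve_ad(self, cookie, first_party_page):
        # The embedding page triggers a request to the tracker's own
        # domain; the browser attaches the tracker's (third-party) cookie.
        if cookie is None:
            cookie = str(uuid.uuid4())  # Set-Cookie on first contact
        self.profiles.setdefault(cookie, []).append(first_party_page)
        return cookie

tracker = TrackerServer()
jar = {}  # the browser's cookie store for the tracker's domain

# The same browser visits two unrelated sites that both embed the tracker.
for page in ["news.example/article", "shop.example/shoes"]:
    jar["id"] = tracker.serve_ad(jar.get("id"), page)

# Because both sites load content from the one tracker domain, the
# tracker links the visits into a single browsing profile -- the core
# privacy problem with third-party cookies.
print(tracker.profiles[jar["id"]])
# -> ['news.example/article', 'shop.example/shoes']
```

Retargeting is then just a lookup: the next time that cookie appears anywhere, the tracker already knows the shoes were viewed.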

Google bought Urchin in 2005, turning it into Google Analytics. Urchin was founded in 1998. Its original product analyzed web server log files, with JavaScript tags (the “Urchin Traffic Monitor”) added in Urchin 4. A hosted version based entirely on JavaScript, initially called “Urchin on Demand”, was introduced in 2004. Naturally, the original packaged software received little attention once Google bought it and it became Google Analytics, and it was discontinued in 2012.

One problem with these ads is tracking. The current economy is a debt-based economy driven by consumption. The more money advertisers can extract from consumers, the more they are willing to spend on ads. This pushes tracking to become creepier and creepier, and encourages consolidation of data. Most ad tracking falls under “retargeting”, typically based on cookies and JavaScript, and DoubleClick was one of the first to do it. All ads encourage consumption by definition, but tracking ads are particularly bad for these reasons.

For example, DoubleClick introduced cross-device retargeting in 2015. At least initially it was limited to tracking logged-in users via their user accounts (which any website can do), but it illustrates the trend. Google changed its privacy policy in 2016 to allow Google accounts to be used for such logged-in tracking. More recently, Google signed an agreement with MasterCard to obtain credit card sales data. Credit cards, of course, directly tie an increase in debt to consumer spending, which in turn can flow to Google as ad dollars.

According to http://adage.com/article/digital/google-turns-behavioral-targeting-beef-display-ads/135152/, “In December 2008 Google added DoubleClick cookies to AdSense ads”, tying DoubleClick's cookie-based tracking (which long predates Google's purchase) to AdSense. I assume AdSense tracking probably did not exist before Google bought DoubleClick. Google Analytics added AdWords and AdSense support in 2009. In 2012, Google changed its privacy policy to allow data to be consolidated, which was also very controversial. In 2014, Google Analytics integrated with DoubleClick, allowing things like remarketing lists to be shared, according to https://analytics.googleblog.com/2014/05/google-analytics-summit-2014-whats-next.html. Remarketing lists are essentially lists of website visitors who can be uniquely identified by things like cookies, and they are one of the ways of targeting ads at users. It can probably be assumed that sharing remarketing lists effectively ties the tracking together. Sharing of Google Analytics remarketing lists with AdWords was introduced in 2015, along with linking of Google Analytics and AdWords “manager” accounts, according to https://adwords.googleblog.com/2015/11/share-google-analytics-data-and.html. “Google Analytics 360” came in 2016, according to https://analytics.googleblog.com/2016/03/introducing-google-analytics-360-suite.html. Remarketing lists for search ads were introduced in 2012 and were tied to Google Analytics in 2015 (though not all Google Analytics data can be used). They allowed different search ads to be targeted at different visitors based on cookie-based tracking on websites (sites use special tags for this purpose); for example, you can show different search ads to visitors who visit a site every day.

https://www.propublica.org/article/google-has-quietly-dropped-ban-on-personally-identifiable-web-tracking is about a privacy policy change in 2016. To quote from the article: “Google quietly erased that last privacy line in the sand – literally crossing out the lines in its privacy policy that promised to keep the two pots of data separate by default. In its place, Google substituted new language that says browsing habits “may be” combined with what the company learns from the use Gmail and other tools.”

Of course, users often have little control over, and little benefit from, trackers' storage of user data and ad retargeting, especially when many parties are involved. This was raised during the Google/DoubleClick acquisition, for example. Some systems provide more control than others, such as AdChoices. AdChoices was an attempt at self-regulation by ad publishers, using an icon to indicate that data was being collected; clicking the icon displays the privacy policy for the ads or lets you opt out of ad targeting. That is not the same as blocking ads completely, though, and it does not solve all the problems with ads either. There was also an attempt at a Do-Not-Track HTTP header, which was probably too simple (and thus very vague in its meaning), and since it was just an HTTP header there was obviously no guarantee that a site would comply (IE11 enabling it by default was also controversial, and Windows 10 no longer does so by default).

Some of the problems with these opt-out methods resemble the problems with the national “do not email” registry proposed in the US CAN-SPAM Act of 2003; such “opt out” lists for spam are widely considered unacceptable in general. Even “opt-out” or “unsubscribe” links in spam are widely considered untrustworthy for obvious reasons, though legitimate mailing lists also have them. The idea came from the similar “do not call” registry for telephone marketing (marketing phone calls being considered even more annoying than spam), but email and internet advertising turned out to be very different from telephone calls, making such laws difficult to enforce: it is far easier to send an email than to call someone, and email is harder to trace to its origin, especially given that the Internet is global. The FTC has a report at https://www.ftc.gov/reports/can-spam-act-2003-national-do-not-email-registy-federal-trade-commission-report-congress describing these problems (it was a report to Congress required by CAN-SPAM), including the possibility that such a list could be abused by spammers. “Closed-loop opt-in” using confirmation emails for mailing lists, on the other hand, is widely accepted, but it is not mentioned in CAN-SPAM. A related example is the tracking of “opt-out” status using cookies in systems like AdChoices, cookies which can obviously themselves be used for other purposes.

There are some reasons why these problems were not apparent (to Larry and Sergey, for example) when Google bought DoubleClick, or when remarketing lists were shared, or for that matter when Urchin became Google Analytics and its data was merged with ad data.

The difficulty of researching things like the tying together of remarketing lists while writing this essay illustrates some of the problems. It seems that nobody cared about the privacy implications when remarketing lists in AdSense and DoubleClick were shared, for example. In many cases, advertisers managed “remarketing” lists of “anonymous” cookie-tracked visitors from a central console without thinking about the privacy problems, treating visitors almost as numbers. This ties in with the idea of treating people as “consumers” to be extracted from, which is also fundamentally flawed. Another example is AOL, which famously made it difficult to cancel at one point, partly because measuring “customer loyalty” as numbers to be extracted from consumers was part of its culture. To make it worse, AOL once charged consumers by the time spent on the service, so the longer they stayed, the more revenue it made.

The Google-DoubleClick acquisition was also controversial, with EPIC, CDD, and US PIRG, for example, filing a complaint with the FTC in April 2007, a “first supplement” to the complaint in June 2007, and a “second supplement” in September 2007. There was also a Senate hearing on September 27, 2007, with testimony from a variety of sources on the issue. One concern at the time was the aggregation of tracking data and the lack of control by users, though other issues unrelated to ads, like search engines' storage of IP addresses, were also raised. Ultimately it took the FTC until the end of 2007 to approve the deal, after a “second request”.

Before the Google acquisition, DoubleClick had bought Abacus Direct (in 1999), a market research company focused on consumer buying behavior, and planned to combine Abacus's data with its own. Abacus had a lot of personal information about consumers, and there were concerns that this data could be merged with DoubleClick's tracking data and used to deanonymize users. After an FTC investigation and public outcry over the privacy problems, the data-merging plan was abandoned.

In 2012, Jonathan Mayer discovered that Google was using JavaScript tricks to enable tracking in Safari: Google bypassed Safari's cookie-blocking policy by submitting an invisible form, fooling Safari into allowing cookies. The FTC fined Google $22.5 million over this behaviour, and more recently there have been lawsuits about it in the UK, as well as a class-action lawsuit in the US. Google argued at the time that the tracking was unintentional and related to Google+ “+1” buttons on DoubleClick ads (for logged-in users, I believe). It is worth mentioning that many such buttons (like Facebook's Like buttons, to name another example) do their own tracking too (they generally work via IFRAMEs pointing back at the button provider's site), and this has been well known for years. For example, according to https://www.technologyreview.com/s/541351/facebooks-like-buttons-will-soon-track-your-web-browsing-to-target-ads/, Facebook started using the tracking Like buttons to target ads in 2015. The Facebook-WhatsApp acquisition story is probably also famous by now, including how the two eventually allowed data sharing between them (presumably after years of losses), and how even the WhatsApp founders now recommend deleting Facebook (especially after the Cambridge Analytica debacle).

Now, let’s discuss Mozilla. Brendan Eich created JavaScript at Netscape in 1995 and was CTO of the Mozilla Corporation from 2005 to 2014. After stepping down from Mozilla in 2014 (just after becoming CEO, amid bad publicity over his political donations regarding things like gay marriage), he co-founded Brave, with its Basic Attention Token and so on. Andreas Gal joined Mozilla in 2008 and was CTO from 2014 until he left in 2015.

Mozilla signed the Google search deal in 2004, before Google even IPOed (let alone acquired DoubleClick). Mozilla switched to a Yahoo search deal in late 2014 (by then Yahoo search was powered by Microsoft’s Bing, I think), part of Marissa Mayer’s attempt to fix Yahoo before it was sold to Verizon. Recently Mozilla switched back to Google as the default search engine.

Brendan Eich mentioned in https://twitter.com/BrendanEich/status/932747825833680897 that “It's not a simple Newtonian-physics (or fake economics based on same) problem.” This was about the history of the Google search deal with Mozilla and the fact that it was signed before Google IPOed (while Google was still VC-funded). It is worth noting that Google was founded in 1998, as the now-famous dot-com bubble was nearing its peak and VC funding was plentiful (letting many startups grow fast, which was considered more important than profits). Many other dot-com startups of the era ran into trouble and failed when the bubble collapsed around 2001. The DoubleClick acquisition, for its part, dates back to 2007, just before the housing bubble famously collapsed and led to another recession; that bubble probably began just after the dot-com bubble ended.

Brendan Eich mentioned in https://twitter.com/BrendanEich/status/932473969625595904 that “A friend said in 2003 that Sergey declared G would not acquire display ads & arb. Search vs. Display as that would be “evil”.” This, too, was before Google even IPOed (in 2004). Unfortunately no other source was given.

It was mentioned on Twitter that Firefox OS enabled tracking protection by default, unlike desktop Firefox. Andreas Gal wrote in https://twitter.com/andreasgal/status/932757853504339968, “Yup. I was able to sneak that past management”. I then asked, “I wonder if you ever talked to Larry/Sergey.”, and Brendan answered that Andreas of course didn’t. I wonder what would have happened if he had.

https://pagefair.com/blog/2017/gdpr_risk_to_the_duopoly/ has some information on the effect of the EU GDPR on Google ads. Notice, for example, that AdWords complies only if all “personalization” features, including “remarketing”, are removed. I suspect AdWords as first created in 2000 did not have these features. Other features like “remarketing lists for search ads” are also listed as non-compliant; those too were probably added later. There was also the infamous cookie law requiring notification before placing cookies, which was not very effective but was a major step in that direction, given that most ad tracking (including DoubleClick’s) was based on cookies. Google’s implementation of GDPR caused some concern among publishers (http://adage.com/article/digital/tensions-flare-google-publishers-gdpr-looms/313592/), and some publishers blocked EU IP addresses in response to GDPR.

Data breaches are also a problem. The AOL search data breach from 2006 is pretty famous. The data was “anonymized”, but the search terms were often enough to deanonymize users. Ad tracking data is likely similar, including browsing history and the like. Anonymizing data is a useful technique for avoiding accidental abuse, but it is impossible to anonymize most personal data in a way that prevents all abuse. For example, various techniques for anonymizing IP addresses and MAC addresses have been developed, including hashing and truncation. Of course, the more data that is collected and consolidated, the higher the risk and impact of a breach.
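As a quick illustration of why hashing and truncation are not equally strong (a sketch using a hypothetical example address, and assuming plain unsalted SHA-256 for the hashing variant): truncation genuinely discards information, while an unsalted hash of an IPv4 address can be reversed by brute force, since the whole IPv4 space is only 2^32 addresses.

```python
import hashlib
import ipaddress

def truncate_ip(ip):
    """Zero the last octet (a /24 mask) -- coarse, but genuinely lossy."""
    return str(ipaddress.ip_network(ip + "/24", strict=False).network_address)

def hash_ip(ip):
    """Unsalted SHA-256 of the address string (NOT real anonymization)."""
    return hashlib.sha256(ip.encode()).hexdigest()

target = hash_ip("93.184.216.34")

# Brute-force reversal: try candidate addresses until the hash matches.
# (Demonstrated over one /24 here; the full 2**32 space is still feasible.)
recovered = None
for last in range(256):
    candidate = "93.184.216.%d" % last
    if hash_ip(candidate) == target:
        recovered = candidate
        break

assert recovered == "93.184.216.34"       # hash reversed by enumeration
assert truncate_ip("93.184.216.34") == "93.184.216.0"  # truncation is one-way
```

The point is that a deterministic hash of a small identifier space is really just a pseudonym, not anonymization; truncation at least destroys some bits permanently.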

Of course, it is worth noting that Google/DoubleClick isn't the only one involved in the ad bubble (though DoubleClick was one of the first to do ad tracking, I think). I think Taboola is often considered even worse than Google, for example. The same fundamental problems with tracking, however, tend to apply to all of the ad networks. Some of the worse ones may use browser fingerprinting via things like JavaScript, which is even worse than the cookie-based tracking that is most commonly used. Browser fingerprinting is generally difficult to prevent on the browser side, but it is well enough known that the WHATWG HTML spec mentions it and marks the parts of the spec where there is a risk. For example, the list of browser plugins (navigator.plugins in JavaScript) could be used at one point (in Firefox the list used not to be sorted, so its order could be unique to each user, which made fingerprinting even easier), but fortunately plug-ins are dying off anyway because of other problems. The EFF created Panopticlick, which illustrated some of the fingerprinting that was possible; other examples that became famous included Evercookie by Samy Kamkar. To make things worse, many plugins like Flash had their own cookies as well (though browsers have been getting better at clearing them, and Flash is becoming obsolete). It is also worth noting that the current tracking ads are not the only kind of web advertising. There are so-called “first-party” and “third-party” ads and cookies. Examples of first-party ads include Twitter and Reddit ads; examples of third-party ads include DoubleClick and Taboola ads. First-party ads don't have the issues described here, though they may still have other issues.

Recently, Google's ad blocking and “better ads” efforts (including the so-called Coalition for Better Ads) target annoying ads, but don't fix the fundamental issues described here. Apple's tracking prevention targets retargeting by limiting the lifetime of cookies, for example (making them less effective for tracking), but does not change the display of ads or make ads less annoying (autoplay video ads are pretty infamous as well, especially with Flash).

Now, fixing the problems might be difficult. Obviously it would affect not only shareholders but pretty much everyone else if Google completely got rid of tracking ads. This includes sites depending on Google ads for revenue, as well as Google itself. One example is Sun, which was losing money (net losses) in 2008-2009, partly because of its open source efforts. Ultimately it had to be bought out by Oracle, which made Solaris closed source soon afterwards, for example. A current example is WeWork, which is increasing revenue but not fast enough to make up for increasing expenses: https://www.zerohedge.com/markets/wework-bailout-could-be-imminent-cash-runs-out. In general, larger companies have more expenses than smaller ones, and not just employees but buildings etc. For example, the more employees Google has, the more housing has to be created in Mountain View, or else prices go up. Similarly, rising salaries lead to higher housing prices. This is not limited to Google, of course.

Google in 2015 hired Ruth Porat as CFO to bring financial discipline to the company. This included cutting unprofitable projects, especially “Google X” research projects and failed products like Google Glass. According to https://www.bloomberg.com/news/features/2016-12-08/google-makes-so-much-money-it-never-had-to-worry-about-financial-discipline, one of the things she did was “to force the Other Bets to begin paying for the shared Google services they used”. It is probably reasonable to suspect that the increase in ad revenue due to DoubleClick etc. is part of why Google was able to start so many of these projects in the first place. One recent example is the change in Google Maps pricing, mentioned in https://www.inderapotheke.de/blog/farewell-google-maps

For Mozilla, a good example to illustrate the problems with funding browser development is the Opera browser. Opera was founded in 1995 in Norway, and its first browser was released in 1996. It IPOed in 2004. The browser used its own engine and had a lot of unique features, like relatively good CSS support early on (unlike Netscape 4 at the time, which famously had relatively poor support and was a problem for web developers for years). At first it was officially a paid browser with a trial version (like Netscape was before 1998), but later Opera showed ads to non-paying customers (with a choice of banner ads or text-based Google ads). They eventually signed a search deal with Google which removed the ads and instead just made Google the default search engine (like Mozilla's deal). Of course, there wasn't much profit margin in a web browser, and so they had to cut costs to keep the stock and quarterly earnings going up (which made planning for the future hard, for example). Opera was strong in the mobile world before WebKit became dominant there (before things like the iPhone and Android, when things like WML were common) and may still be strong in some embedded applications, with products like Opera Mini, which was basically remote rendering of web pages (useful when devices had less processing power). Opera never had much market share (though it had plenty of fans back in the day), and in the end Opera had to switch the desktop browser to Chromium (with the Blink engine) instead of its own engine and codebase (though they did release last updates for the old one that included, for example, TLS enhancements). Opera was eventually sold to a Chinese consortium, which later renamed the company Otello. The founders eventually started the Vivaldi browser, which is also based on Chromium/Blink but has many differences.
In contrast, the Mozilla Foundation was created as a non-profit organization around 2003, as the old Netscape was dying off with AOL's help (AOL had bought Netscape in 1998, BTW). It owns the for-profit Mozilla Corporation for tax reasons (non-profits are not subject to some taxes that for-profits are in the US). I think the corporation holds the search deals with Yahoo and Google, for example. You can still donate to the Mozilla Foundation today. Mozilla Firefox 1.0 was released in 2004, after the Foundation was created (and after the branded Netscape 6/7 releases), and quickly took market share from the dominant IE6, which was stagnating the web (by being virtually unchanged for a long time without any real development) and was also well known for security problems like the Download.Ject attacks. MS was forced to respond with IE6 in Windows XP SP2, which in addition to security enhancements added a few features like pop-up blocking, and then IE7, which finally brought real enhancements to the core engine that helped web developers (especially in areas like CSS). The old Netscape search deal with Google dates back to 1999 (Netscape.com was obviously Netscape's home page at the time), and the success of that deal probably inspired the later Google search deal that Mozilla did.

One alternative to the current tracking ads is called Basic Attention Token (BAT). Basic Attention Token is based on the Ethereum cryptocurrency and blockchain (which is like Bitcoin but GPU-minable using a different algorithm, and is one of the most popular GPU-minable coins). It was created by the Brave browser, which supports it directly. It is intended to “directly measure” attention. “Attention” is measured on the client side (based on local browser history) and tokens are awarded for it (via so-called “basic attention metrics”), eliminating the privacy issues; this design is often described in terms of “zero-knowledge proofs”. There are also other benefits, like reducing the so-called “click fraud” that is a common problem with current ads and hurts advertisers, and removing the need for tracking intermediaries like DoubleClick and Taboola (so advertisers also get more of the money, since they don't have to pay them). Many other kinds of tokens and “smart contracts” have been created on Ethereum, and so-called initial coin offerings (ICOs) have been the most common use of Ethereum (helping the price to rise). Of course, there is little to no regulation for them at the moment, which results in many scam ICOs too (they tend to raise money very quickly, partly since it is so easy to send coins to them).

There are also systems for paying authors directly, like Patreon, though it is also trivial to use PayPal or cryptocurrencies for this purpose (though that makes donating harder). Patreon allows money to be “pledged” to specific authors. There are also many kinds of “paywalls” implemented on websites, many of which have their own problems, like relying on cookies to track how many times a user has visited a site (to limit the number of free visits before the user has to pay, of course) or making it awkward to post links on Slashdot, Reddit, and Hacker News, whose users often dislike paywalls for obvious reasons (though some paywalls are better than others).

Of course, the problems described in the essay, as well as other problems with ads (including annoyance and the performance cost of ads), led to more use of ad blockers, which also have their own history. Banner ad blindness has been known for years now, and Google's ads tended to be simple text-based ads, at least initially. One of the first types of blocking was popup blockers; Google was also taking a stand against popups in the early days (they were well known to be annoying). Popup blockers became common in browsers by the mid-2000s (even IE6 in XP SP2 had one). At one point circa 2002, AOL/Netscape was disabling the popup blocker in Netscape-branded Mozilla releases (at one time there were the Mozilla source code/binaries and the official Netscape-branded builds based on the Mozilla source). Of course, after user backlash they backed off from doing so. This was long before Google bought DoubleClick, for example. Later, more sophisticated ad and cookie blockers like AdBlock Plus and uBlock Origin came out as add-ons to browsers like Firefox, and one is built into Brave of course (along with BAT as a replacement for the lost ad revenue). Many other browsers, including Firefox and IE, also have similar tracking protection, but they disable it by default and may require that ad blocking lists (such as EasyList) be loaded manually. Of course, some sites have been attempting to detect ad blockers and ask users to turn them off (even Ars Technica did it at one point, though it only lasted one day), which is ineffective and not a good idea for obvious reasons (including the fact that it reflects badly on the sites doing it). Lawsuits against ad blockers were also tried in some countries, mostly unsuccessfully (like the publishers' lawsuit against AdBlock Plus in Germany).

Thursday, October 31, 2019

486SX history

The original 486SX-20 was introduced in early 1991, and according to Alex U. Witkowski was a 486 on the "P648" process with the "Disable Floating Point" bonding option used to disable the FPU. The "487SX" was merely a 486DX with an extra key pin to prevent incorrect installation and another pin that disabled the 486SX. Robert Collins says that "The market is supposed to perceive the '487SX as a coprocessor. A coprocessor can obtain a much higher profit margin than a CPU.", and described it as "a brainchild of marketing people. Even many of the engineers at Intel think it is a stupid idea, and deplore the deceptive marketing technique. The same holds true for many of the field representatives, they think it is a sleazy marketing practice." Red Hill described it as "any 386SX-33 was faster, and even a good 386SX-25 would have run it close."

But the idea was not bad. From https://groups.google.com/d/msg/comp.sys.ibm.pc.hardware/mNTwkNNdKpo/AVlseoZ0UQ0J, by August 1991 the socket was "supposed to be called "the performance upgrade socket."" From http://www.os2museum.com/wp/486-overdrive/: "In early 1992, Intel ran a teaser ad campaign promising wonders to come in the shape of upgrade processors pluggable into the upgrade socket of 486SX systems." The speed of the 486SX was increased to 25 MHz and later 33 MHz. With the "P650" shrink and the removal of the FPU from the die completely, the 486SX could also be packaged in a surface-mount PQFP package (which was cheaper than the PGA package previously used). In mid-1992 Intel sold the new surface-mount 486SX to OEMs at a relatively cheap price ($120 per 1,000 units for the 486SX-25, I think), allowing it to become mainstream. It was very popular in pre-built systems from the likes of Dell, Compaq, and Packard Bell. The 487SX socket was renamed the "OverDrive" socket, and "clock doubling" allowed 486DX2 (and later 486DX4) OverDrive processors to be sold by Intel. According to the OS/2 Museum, "486 OverDrives, although never cheap, brought a sizable performance gain in a timely manner with 100% compatibility and minimum hassle."

Monday, August 5, 2019

RC4/RC2 exercises

These are similar to the Cryptopals exercises, but are specific to RC2 and RC4.

On RC4:
These exercises will use a 256-byte (2048-bit) key RC4 variant, with "128-bit" RC4 being the same 16 bytes repeated.
Exercise 1: Implement RC4 with 256-byte keys. Start by testing with the same 16 bytes repeated.
Exercise 2: Read https://www.rc4nomore.com/ and implement the attack. Compare the results for a random 256-byte key with those for the same 16 bytes repeated.
Exercise 3: Keying RC4 with the same 16 bytes repeated, fix a few key bytes to a value like 0xFF and make the other bytes random. See which bytes of the keystream are affected. Do the same with a full 256-byte key and compare.
Exercise 4: Make the first three key bytes a counter, as with the WEP IV, and see which bytes of the keystream are affected.
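As a starting point for Exercise 1, here is a minimal, unvetted RC4 sketch in Python (for experimentation only, not for production use). Because the key schedule indexes key[i % len(key)], a 16-byte key and the same 16 bytes repeated to fill 256 bytes produce identical keystreams, which is exactly the "128-bit" variant described above. The last lines also give a taste of Exercise 2 by counting the well-known bias of the second keystream byte toward zero over many random keys.

```python
import os

def rc4_ksa(key):
    """Key-scheduling algorithm. Note key[i % len(key)]: this is why a
    16-byte key and the same 16 bytes repeated behave identically."""
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) & 0xFF
        S[i], S[j] = S[j], S[i]
    return S

def rc4_stream(key, n):
    """Generate n keystream bytes (the PRGA)."""
    S = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) & 0xFF
        j = (j + S[i]) & 0xFF
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) & 0xFF])
    return bytes(out)

# "128-bit" RC4 as the 256-byte-key variant with the 16 bytes repeated:
k16 = bytes(range(16))
assert rc4_stream(k16, 64) == rc4_stream(k16 * 16, 64)

# Taste of Exercise 2: the second keystream byte is 0 with probability
# about 2/256 (the Mantin-Shamir bias), twice the uniform 1/256 rate.
zeros = sum(rc4_stream(os.urandom(16), 2)[1] == 0 for _ in range(10000))
```

With 10000 random keys, `zeros` should land near 78 rather than the ~39 a uniform generator would give, which is already enough to distinguish RC4 keystream from random.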

Now, on RC2:
Exercise 1: Implement RC2 in counter mode.
Exercise 2: Implement the attack in https://www.schneier.com/academic/paperfiles/paper-relatedkey.pdf against the keystream.
Exercise 3: According to https://www.schneier.com/academic/smime/download.html : "However, if you shift the whole table over a byte, and then change a couple of bytes, you now have the table for a different key"
See how this affects the mixing/mashing rounds of RC2, and read the paper https://www.cryptrec.go.jp/exreport/cryptrec-ex-1042-2001.pdf . Create a distinguisher for the keystream based on this information.
Exercise 4: Implement the Schneier attack against 64-bit keys instead of 40-bit keys using something like a GPU; 64-bit RC2 keys were used in, for example, Lotus Notes.

Feel free to send answers via email to yuhongbao_386 at hotmail dot com.

Wednesday, October 24, 2018

Windows 2000 SP servicing history

March 2003: MS03-007 released with only the ntdll.dll file, and there was a problem on Windows 2000 SP2 with certain versions of ntoskrnl.exe.
April 2003: MS03-013 was released with additional files to solve these problems, which also became standard for all the hotfixes.
June 2003: Windows 2000 SP4 released toward the end of June, and Windows 2000 SP2 support was supposed to end at the same time.
July 2003: MS03-026 was released just after Windows 2000 SP4.
August 2003: The Blaster worm hits, and people found out that MS03-026 could still be installed on Windows 2000 SP2. (This was not true of all the patches released during this period.)
September 2003: MS offers "Custom Support" for Windows 2000 SP2 (originally until the end of the year), and also released MS03-039 that also works on Windows 2000 SP2 as an "exception". (https://groups.google.com/forum/#!search/custom$20support$20for$20Windows$202000$20sp2/microsoft.public.win2000.advanced_server/r-BWxTnT8Zk/Y6n5AzomkVoJ)
October 2003: MS officially extends support for Windows 2000 SP2 to June 2004 and introduced "Patch Tuesday". Trivia: https://docs.microsoft.com/en-us/security-updates/securitybulletins/2003/ms03-045 has a file date of August 2003.
April 2004: MS04-011 released with a long list of files, including even NetMeeting! (though at least RPC was a separate MS04-012) Trivia: The bug used by Sasser that was patched was reported by eEye in October 2003 (http://web.archive.org/web/20050307234702/http://www.eeye.com/html/Research/Advisories/AD20040413C.html), but not many patches for Windows 2000 were released in the meantime.
June 2004: Windows 2000 SP2 support was ended.
October 2004: MS04-032 was released with a much shorter list. For example, because of the end of support of Windows 2000 SP2, there was no ntdll.dll file.
June 2005: Windows 2000 SP3 support was ended, with six-month "Custom Support" available. From then, each patch/hotfix only patched a few files (at least the non Custom Support versions).

Monday, September 10, 2018

Google DoubleClick Mozilla overview (second draft)

Note: Notice the malware part has been removed.

There are many problems with web advertising in general, including annoying features like autoplay video ads and pop-ups, and also problems like “click fraud” which matter to advertisers. This essay will however focus on the privacy issues with some of the kinds of ads that Google produces and the history behind them, and why Larry/Sergey didn't consider them when buying DoubleClick, for example. Also discussed is Mozilla and how it is involved (as in the Google/Mozilla search deal), including Brendan Eich, the creator of JavaScript, who eventually left Mozilla to found Brave. The difficulty of solving these issues will also be discussed. Of course, advertising is not limited to the web, and there are often many benefits and risks (like deceptive advertising) to advertising in general, most of which will not be discussed here.

The history of Google and its advertising will be discussed first. Google was founded in 1998 by Larry Page and Sergey Brin while they were at Stanford, and took VC funding from Kleiner Perkins and other partners. Google was founded with the search engine (with the PageRank algorithm) as its first product, but later added products like Gmail. Eric Schmidt was brought in as CEO in 2001 and recently left that role but is still on the board. Google IPOed in 2004, using dual-class stock, for example.

The first kind of ads Google sold was AdWords, dating back to 2000. AdWords was based on search keywords, and the text ads were displayed at the top of the search results (labelled as ads) and were relatively simple. Typically the highest bidder was shown, and the advertiser paid Google when the user clicked on the ad. AdWords involved relatively little tracking, at least initially, and will not be mentioned much here. At this time Google was also taking a stand against popup ads.

AdSense, introduced in 2003, served ads on webpages themselves via JavaScript. At least initially, AdSense was based on keywords found on the webpages themselves (which Google fetched from its cache, for example), which advertisers could bid on. As with AdWords, Google and the websites get paid when users click on the ads. It also involved little tracking, at least initially.

Google bought DoubleClick in 2008. DoubleClick was founded in 1995. It made more sophisticated ad tracking via cookies and the like famous (often called “retargeting”), and those problems will be described here. DoubleClick themselves called their product “Dynamic Advertising Reporting and Targeting” at one point, for example. Initially DoubleClick served mostly banner ads, and many users developed so-called banner blindness from them. Cookies themselves were invented at Netscape in 1994, and the IETF group that developed RFCs 2109 and 2965 already knew that tracking with “third-party cookies” was a problem (it was mentioned in those RFCs). Those attempts at IETF cookie standards ultimately failed, partly because they were incompatible with the browsers of the time, and led to RFC 6265, which is closer to how cookies are implemented in browsers today. They also led to W3C P3P, famously implemented in IE6, which was an attempt to get the tracking under control but of course also failed (partly because it was too complex) and was removed in Windows 10.

Google bought Urchin in 2005, turning it into Google Analytics. Urchin was founded in 1998. Initially its product analyzed web server log files, with JavaScript tags being added in Urchin 4 (called the “Urchin Traffic Monitor”). The hosted version based entirely on JavaScript was created later; it was initially called “Urchin on Demand” and was introduced in 2004. Of course, the original software that was sold received little attention once Google bought it and it became Google Analytics, and it was discontinued in 2012.

One problem with the ads is tracking. The current economy is a debt-based economy built on consumption. The more money advertisers can extract from consumers, the more they are willing to spend on ads. This results in tracking getting creepier and creepier, and encourages consolidation of data, for example. Most of the ad tracking is called “retargeting”; it is often based on cookies and JavaScript, and DoubleClick was one of the first to do it. All ads encourage consumption by definition, but tracking ads are particularly bad for these reasons.

For example, DoubleClick introduced cross-device retargeting in 2015. Of course, at least initially it was limited to tracking logged-in users via their user accounts (which any website can do), but it illustrated the trend. Google changed its privacy policy to allow Google accounts to be used for such logged-in user tracking in 2016. Recently Google signed an agreement with MasterCard to obtain credit card sales data. Of course, credit cards directly tie an increase in debt to consumer spending, which in turn can go to Google as ad dollars.

According to http://adage.com/article/digital/google-turns-behavioral-targeting-beef-display-ads/135152/, “In December 2008 Google added DoubleClick cookies to AdSense ads”, tying the DoubleClick cookie-based tracking (which dates to long before Google bought it) to AdSense. I assume that AdSense tracking probably did not exist before Google bought DoubleClick. Google Analytics added AdWords and AdSense support in 2009. In 2012, Google changed its privacy policy to allow data to be consolidated, which was also very controversial. In 2014, Google Analytics integrated with DoubleClick, allowing things like remarketing lists to be shared, according to https://analytics.googleblog.com/2014/05/google-analytics-summit-2014-whats-next.html. Remarketing lists are basically lists of website visitors that can be uniquely identified by things like cookies, and they are one of the ways of targeting ads to users. It can probably be assumed that sharing remarketing lists basically ties the tracking together. Sharing of Google Analytics remarketing lists with AdWords was introduced in 2015, along with linking of Google Analytics and AdWords “manager” accounts, according to https://adwords.googleblog.com/2015/11/share-google-analytics-data-and.html. “Google Analytics 360” came in 2016, according to https://analytics.googleblog.com/2016/03/introducing-google-analytics-360-suite.html. Remarketing lists for search ads were introduced in 2012 and were tied to Google Analytics in 2015 (though not all data from Google Analytics can be used). They allowed different search ads to be targeted to different visitors based on cookie-based tracking on websites (with sites using special tags for this purpose). For example, you can show different search ads to visitors who visit the site every day.

Of course, users often have little control over, and little benefit from, the storage of user data and ad retargeting by trackers, especially when many parties are involved. This was mentioned during the Google/DoubleClick acquisition, for example. Of course, some provide more control than others, such as AdChoices. AdChoices was an attempt at self-regulation by ad publishers, and used an icon to indicate that data was being collected. You can click the icon to display the privacy policy for the ads or to opt out of ad targeting. It was not the same as blocking ads completely, though, and did not solve all of the problems of ads either. There was also an attempt at a Do-Not-Track HTTP header, which was probably too simple (and thus also very vague in its meaning), and there was obviously no guarantee that a site would comply either, since it was just an HTTP header (IE10 enabling it by default was also controversial, and Windows 10 no longer does so).

Some of the problems with the opt-out methods are similar to the problems of the national “do not email” registry proposed in the US CAN-SPAM Act of 2003 for spam messages; such lists for “opting out” of spam are widely considered unacceptable in general. Even “opt-out” or “unsubscribe” links in spam are widely considered untrustworthy for obvious reasons, though legitimate mailing lists also have them. The idea came from the similar “do not call” registry for telephone marketing (to stop annoying marketing phone calls, which were considered more annoying than spam, of course), but email and internet advertising ended up being very different from telephone calls, making these laws difficult to enforce. It is far easier to send an email than to call someone, for example, and email is also more difficult to trace to its origin, especially given that the Internet is global. The FTC has a report at https://www.ftc.gov/reports/can-spam-act-2003-national-do-not-email-registy-federal-trade-commission-report-congress describing these problems (it was a report to Congress required by CAN-SPAM), including the possibility that such a list could be abused by spammers. “Closed-loop opt-in” using confirmation emails for mailing lists, on the other hand, is widely accepted, but it is not mentioned in CAN-SPAM. Another example is the tracking of “opt-out” choices using cookies in things like AdChoices, which themselves can obviously be used for other purposes.

There are some reasons why these problems were not apparent (for example to Larry/Sergey) when Google bought DoubleClick, or when remarketing lists were shared, or for that matter when Urchin became Google Analytics and its data was merged with ad data.

The difficulty of researching things like the tying together of remarketing lists during the writing of this essay shows some of the problems. It seems that no one cared about the privacy implications when remarketing lists in AdSense and DoubleClick were shared, for example. In many cases, advertisers managed “remarketing” lists of “anonymous” visitors tracked by cookies from a central console without thinking of the privacy problems, treating visitors almost as numbers. This ties in with the idea of treating people as “consumers” to be extracted from, which is also fundamentally flawed. Another example of this is AOL, which famously made it difficult to cancel at one point, partly because measuring “customer loyalty” as numbers to be extracted from consumers was part of their culture. To make it worse, they once charged consumers by the time spent on AOL, so the longer users stayed, the more revenue AOL made.

The Google-DoubleClick acquisition was also controversial, with EPIC, CDD, and US PIRG for example filing complaints with the FTC in April 2007, a “first supplement” to the complaint in June 2007, and a “second supplement” in September 2007. There was also a Senate hearing on Sept 27, 2007 with testimony from a variety of sources on the issue. One of the concerns back then was the aggregation of tracking data and the lack of control by users, though other issues unrelated to ads, like the storage of IP addresses by search engines, were also mentioned. Ultimately it took the FTC until the end of 2007 to approve the deal, after a “second request”.

Before the Google-DoubleClick acquisition, DoubleClick once planned to merge its data with that of Abacus Direct. Abacus Direct was a market research company tracking consumer buying behavior, so Abacus had a lot of personal info about consumers, and there were concerns that this data could be merged with DoubleClick tracking data and used to deanonymize users. FTC scrutiny of the privacy problems meant that the data merge never happened.

In 2012, Jonathan Mayer discovered that Google used some tricks in JavaScript to enable tracking in Safari. Google was able to bypass Safari's cookie-blocking policy by using an invisible form to fool Safari into allowing cookies. The FTC fined Google $22.5 million over this behaviour, and more recently there have been lawsuits about it in the UK. There has also been a class-action lawsuit about it in the US. Google argued at the time that the tracking was unintentional and that it was related to Google+ “+1” buttons on DoubleClick ads (for logged-in users, I believe). It is probably worth mentioning here that a lot of these kinds of buttons (like Facebook's Like buttons, to name another example) do their own tracking too (they generally worked by using IFRAMEs pointing to the site involved), and this has been well known for years. For example, according to https://www.technologyreview.com/s/541351/facebooks-like-buttons-will-soon-track-your-web-browsing-to-target-ads/, Facebook started using the tracking Like buttons to target ads in 2015. I think the Facebook-WhatsApp acquisition story is also famous by now, BTW, including how they eventually allowed data sharing between the two (presumably after years of losses). It is worth mentioning that even the WhatsApp founders now recommend deleting Facebook (especially after the Cambridge Analytica debacle).

Now, let’s discuss Mozilla. Brendan Eich created JavaScript at Netscape in 1995 and was the CTO of the Mozilla Corporation from 2005 to 2014. After he stepped down from Mozilla in 2014 (just after becoming CEO, amid bad publicity stemming from his political donations regarding things like gay marriage), he was one of the founders of Brave, with its Basic Attention Token etc. Andreas Gal joined Mozilla in 2008 and was CTO from 2014 until he left Mozilla in 2015.

Mozilla signed the Google search deal in 2004, before Google had even IPOed (let alone things like the DoubleClick acquisition). Mozilla switched to a Yahoo search deal in late 2014 (by then the Yahoo search engine was based on MS's Bing, I think), which was part of Marissa Mayer's attempt to fix Yahoo before it was sold to Verizon. Recently Mozilla switched back to Google as the default search engine.

Brendan Eich mentioned in https://twitter.com/BrendanEich/status/932747825833680897 that “It's not a simple Newtonian-physics (or fake economics based on same) problem.” This was about the history of the Google search deal with Mozilla and the fact that it was signed before Google's IPO, when Google was still funded by VCs. It is worth mentioning here that Google was founded in 1998, when the now-famous dot-com bubble was near its peak and VC funding was plentiful (allowing many startups to grow fast, which was considered more important than profits). Many other dot-com startups of the time ran into trouble and ended up failing when the bubble collapsed around 2001. It is also worth mentioning that the DoubleClick acquisition dates to 2007, just before the housing bubble famously collapsed and led to another recession; that bubble probably started just after the dot-com bubble ended.

Brendan Eich mentioned in https://twitter.com/BrendanEich/status/932473969625595904 that “A friend said in 2003 that Sergey declared G would not acquire display ads & arb. Search vs. Display as that would be “evil”.”, which would have been before Google even IPOed (in 2004). Unfortunately, no other source was given.

It was mentioned on Twitter that Firefox OS enabled tracking protection by default, unlike desktop Firefox. Andreas Gal said in https://twitter.com/andreasgal/status/932757853504339968, “Yup. I was able to sneak that past management”. I then asked, “I wonder if you ever talked to Larry/Sergey.”, and Brendan answered that Andreas of course didn’t. I wonder what would have happened if they had.

https://pagefair.com/blog/2017/gdpr_risk_to_the_duopoly/ has some information on the effect of the EU’s GDPR on Google ads. Notice that AdWords complies only if all “personalization” features, such as “remarketing”, are removed. I suspect AdWords did not have these features when it was first created in 2000. Other features like “remarketing lists for search ads” are also listed as non-compliant; these too were probably added later. Earlier there was the infamous cookie law, which required notification before placing cookies; it was not very effective, but it was a major step in this direction given that most ad tracking (including DoubleClick’s) was based on cookies. Google’s implementation of GDPR caused some concern among publishers (http://adage.com/article/digital/tensions-flare-google-publishers-gdpr-looms/313592/), and some publishers blocked EU IP addresses in response to GDPR.

Data breaches are also a problem. The AOL search data breach from 2006 is pretty famous: the data was “anonymized”, but the search terms were often enough to deanonymize users. Ad tracking data, including browsing history and the like, is likely similar. Anonymizing data is a useful technique for avoiding accidental abuse, but some kinds of data are hard to anonymize in a way that prevents all abuse. For example, various techniques for anonymizing IP addresses and MAC addresses have been developed, including hashing and truncation. Of course, the more data that is collected and consolidated, the higher the risk and impact of a breach.
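To make the two techniques mentioned above concrete, here is a minimal sketch (my own illustration, not any particular product’s implementation): truncation zeroes out the host part of an IP address, and keyed hashing maps a MAC address to a stable pseudonym without storing the address itself.

```python
import hashlib
import hmac
import ipaddress

def truncate_ip(ip: str, keep_bits: int = 24) -> str:
    """Zero out the host portion of an IPv4 address (e.g. keep only the /24),
    so individual machines on a network become indistinguishable."""
    net = ipaddress.ip_network(f"{ip}/{keep_bits}", strict=False)
    return str(net.network_address)

def pseudonymize_mac(mac: str, key: bytes) -> str:
    """Map a MAC address to a keyed hash (HMAC) so the same device always
    yields the same token without revealing the address. A plain unkeyed
    hash would be weak here: the MAC address space is small enough to
    reverse by brute force."""
    digest = hmac.new(key, mac.lower().encode(), hashlib.sha256).hexdigest()
    return digest[:16]

print(truncate_ip("203.0.113.77"))  # -> 203.0.113.0
```

Note that truncation destroys information irreversibly, while keyed hashing preserves linkability (useful for counting unique devices) and therefore remains personal data under rules like GDPR if the key is retained.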

Of course, it is worth noting that Google/DoubleClick isn’t the only player in the ad bubble (though DoubleClick was one of the first to do ad tracking, I think). Taboola, for example, is often considered even worse than Google. The same fundamental problems with tracking, however, tend to apply to all of the ad networks. Some of the worse ones use browser fingerprinting via things like JavaScript, which is even worse than the cookie-based tracking that is most common. Browser fingerprinting is generally difficult to prevent on the browser side, but it is well known enough that the WHATWG HTML spec mentions it and marks the parts of the spec that carry a fingerprinting risk. For example, the list of browser plugins (navigator.plugins in JavaScript) could be used at one point (in Firefox the list used not to be sorted, so its order was distinctive per user, which made fingerprinting even easier), but fortunately plug-ins are dying off anyway because of other problems. The EFF created Panopticlick, which illustrated some of the fingerprinting that was possible; another famous example is Evercookie by Samy Kamkar. To make things worse, many plugins like Flash had their own cookies as well (though browsers have been getting better at clearing them).

It is also worth noting that the current tracking ads are not the only kind of web advertising. There are so-called “first-party” and “third-party” ads and cookies. Examples of first-party ads include Twitter and Reddit ads; examples of third-party ads include DoubleClick and Taboola ads. First-party ads don’t have the issues described here.
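To illustrate how fingerprinting combines many weak signals into one strong identifier, here is a minimal sketch (the attribute values are hypothetical; real fingerprinting runs as JavaScript in the browser, reading properties like navigator.plugins):

```python
import hashlib

def fingerprint(attributes: dict) -> str:
    """Combine browser-exposed attributes into one stable hash. Sorting the
    keys makes attribute *order* irrelevant -- Firefox's old unsorted
    navigator.plugins list made fingerprints even more distinctive,
    because the ordering itself varied between profiles."""
    canonical = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(canonical.encode()).hexdigest()

# Hypothetical attributes of the kind Panopticlick demonstrated.
browser = {
    "userAgent": "Mozilla/5.0 (X11; Linux x86_64) ...",
    "screen": "1920x1080x24",
    "timezone": "UTC+1",
    "plugins": "Flash 11.2;Java 1.7;QuickTime 7",
}
print(fingerprint(browser))
```

Each attribute alone identifies almost no one, but together they are often unique, and unlike a cookie the result cannot simply be cleared by the user.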

Recently, Google’s ad blocking and “better ads” push (via the Coalition for Better Ads) has targeted annoying ad formats, but it doesn’t fix the fundamental issues described here. Apple’s ad blocking targets retargeting by limiting the lifetime of cookies, for example (making them less effective for tracking), but does not change the display of ads or make ads less annoying (autoplay video ads, for instance, are pretty infamous as well, especially with Flash).

Now, fixing the problems might be difficult. If Google completely got rid of tracking ads, it would obviously affect not only shareholders but pretty much everyone else, including sites that depend on Google ads for revenue as well as Google itself. A comparable revenue transition happened with Client Access Licenses (CALs), which both Microsoft and Novell used. CALs (called node licenses by Novell, I think) are per-user or per-computer licenses common in server software like NetWare and Windows Server. When Novell moved to Linux (by buying SUSE), it was selling open source software without CALs (like Red Hat, the company was paid only for support), so Novell could not expect the same level of revenue as in the NetWare days. The story of Sun’s open source projects under Jonathan Schwartz (the former “ponytail” CEO), and how Sun eventually had to sell to Oracle, is probably pretty famous as well (examples of open source projects from that period include OpenSolaris, OpenOffice, and OpenJDK).

The ad bubble will probably not last forever, though. Bubbles like this one are part of the problem with the current debt-based economy (the main problem being that it allows almost unlimited amounts of “debt” in US dollars since the US left the gold standard in 1971, government debt most commonly of all), especially since it encourages extracting as much money as possible from so-called “consumers” (another example is Adobe’s Creative Cloud subscriptions and how Adobe’s stock price rose after they were introduced).

Google in 2015 hired Ruth Porat as CFO to bring financial discipline to the company. This included cutting unprofitable projects, especially “Google X” research projects and failures like Google Glass. According to https://www.bloomberg.com/news/features/2016-12-08/google-makes-so-much-money-it-never-had-to-worry-about-financial-discipline, one of the things they did was “to force the Other Bets to begin paying for the shared Google services they used”. It is reasonable to suspect that the increase in ad revenue due to DoubleClick etc. is part of why they were able to start so many of these projects in the first place. One recent example is the change in Google Maps pricing, mentioned in https://www.inderapotheke.de/blog/farewell-google-maps

For Mozilla, a good example illustrating the problems of funding browser development is the Opera browser. Opera was founded in Norway in 1995, released its first browser in 1996, and IPOed in 2004. The browser used its own engine and had a lot of unique features, like relatively good CSS support early on (unlike Netscape 4 at the time, which famously had poor CSS support and was a problem for web developers for years). At first it was officially a paid browser with a trial version (as Netscape was before 1998); later, non-paying users got ads instead (with a choice between banner ads and text-based Google ads). Opera eventually signed a search deal with Google that removed the ads and simply made Google the default search engine (like Mozilla’s deal). Of course, there wasn’t much profit margin in a web browser, so they had to cut costs to keep the stock price and quarterly earnings going up (which made planning for the future difficult, for example). Opera was strong in the mobile world before WebKit became dominant there (before the iPhone and Android, when things like WML were common) and may still be strong in some embedded applications, with products like Opera Mini, which was basically remote rendering of web pages (useful when devices had little processing power). Opera never had much market share (though it had plenty of fans back in the day), and in the end it had to switch its desktop browser to Chromium (with the Blink engine) instead of its own engine and codebase (though they did release final updates for the old one, including for example TLS enhancements). Opera was eventually sold to a Chinese consortium, which later renamed the company Otello. The founders went on to start the Vivaldi browser, which is also based on Chromium/Blink but differs in many ways.
In contrast, the Mozilla Foundation was created as a non-profit organization around 2003 as the old Netscape was dying off with AOL’s help (AOL had bought Netscape in 1998, BTW). It owns the for-profit Mozilla Corporation for tax reasons (non-profits are exempt from taxes that for-profits pay in the US); I think the corporation holds the search deals with Yahoo and Google, for example. You can still donate to the Mozilla Foundation today. Mozilla Firefox 1.0 was released in 2004, after the Foundation was created (and after the branded Netscape 6/7 releases), and quickly took market share from the dominant IE6, which was stagnating the web (remaining virtually unchanged for a long time without any real development) and was also well known for security problems like the Download.Ject attacks. MS was forced to respond, first with the IE6 update in Windows XP SP2, which in addition to security enhancements added a few features like pop-up blocking, and then with IE7, which finally brought real enhancements to the core engine for web developers (especially in areas like CSS). The old Netscape search deal with Google dates back to 1999 (Netscape.com was obviously Netscape’s home page at the time), and the success of that deal probably inspired Mozilla’s later Google search deal.

One alternative to the current tracking ads is the Basic Attention Token (BAT). BAT is built on the Ethereum cryptocurrency and blockchain (Ethereum is like Bitcoin but uses a different algorithm that is GPU-minable, making it one of the most popular GPU-minable coins). It was created by the makers of the Brave browser, which supports it directly. It is intended to “directly measure” attention: “attention” is measured on the client side (based on local browser history, as “basic attention metrics”) and tokens are rewarded accordingly, eliminating the privacy issues; this scheme has been described as a “zero-knowledge proof”. There are other benefits too, like reducing the so-called “click fraud” that hurts advertisers and is a common problem with current ads, and removing the need for tracking intermediaries like DoubleClick and Taboola (so advertisers also get more of the money, since they don’t have to pay those intermediaries). Many other kinds of tokens and “smart contracts” have been created on Ethereum, and so-called initial coin offerings (ICOs) have been the most common use of Ethereum (helping the price rise). Of course, there is little to no regulation of them at the moment, which results in many scam ICOs too (they tend to raise money very quickly, partly because it is so easy to send coins to them).

There are also systems for paying authors directly, like Patreon, though one can also simply use PayPal or cryptocurrencies for this purpose (even if that makes donating a bit harder). Patreon allows money to be “pledged” to specific authors. There are also many kinds of “paywalls” implemented on websites, many of which have their own problems, like relying on cookies to track how many times someone has visited a site (to limit the number of free visits before the user has to pay, of course) or making it hard to post links on Slashdot, Reddit, and Hacker News, which often dislike paywalls for obvious reasons (though some paywalls are better than others).
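The cookie-based metering mentioned above can be sketched in a few lines (a generic illustration, not any particular site’s implementation); the weakness is visible immediately, since clearing cookies resets the count:

```python
from typing import Optional, Tuple

def should_paywall(cookie_value: Optional[str], free_limit: int = 5) -> Tuple[bool, str]:
    """Metered paywall: count article views in a cookie and block once
    the free quota is exceeded. Returns (blocked, new_cookie_value).
    Clearing cookies resets the count -- the weakness noted above."""
    count = int(cookie_value) if cookie_value and cookie_value.isdigit() else 0
    count += 1
    return count > free_limit, str(count)

# Simulate six visits by the same browser.
cookie = None
for visit in range(6):
    blocked, cookie = should_paywall(cookie, free_limit=5)
print(blocked)  # -> True: the sixth visit exceeds the 5-article quota
```

This is why some paywalled sites let traffic from aggregators through while blocking direct repeat visitors, which in turn explains the awkward relationship with link-sharing sites.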

Of course, the problems described in this essay, along with other problems with ads (including their annoyance and performance cost), led to more use of ad blockers, which have their own history. Banner ad blindness has been known for years now, and Google’s own ads tended to be simple text-based ads, at least initially. One of the first kinds of blocking was the popup blocker; Google took a stand against popups in its early days (they were well known to be annoying), and popup blockers became common in browsers by the mid-2000s (even IE6 in XP SP2 had one). At one point circa 2002, AOL/Netscape disabled the popup blocker in the Netscape-branded Mozilla releases (at the time there were the Mozilla source code/binaries and the official Netscape-branded builds based on that source), though after user backlash they backed off. This was long before Google bought DoubleClick, for example. Later, more sophisticated ad and cookie blockers like AdBlock Plus and uBlock Origin appeared as add-ons to browsers like Firefox, and one is built into Brave of course (along with BAT as a replacement for the lost ad revenue). Many other browsers, including Firefox and IE, have similar tracking protection, but they ship it disabled by default and may require that ad blocking lists (such as EasyList) be loaded manually. Of course, some sites have tried to detect ad blockers and ask users to turn them off (even Ars Technica did it at one point, though it only lasted a day), which is ineffective and not a good idea for obvious reasons (including that it reflects badly on the sites doing it). Lawsuits against ad blockers were also tried in some countries, mostly unsuccessfully (like the lawsuit against AdBlock Plus in Germany by publishers there).