Performing Data on the Internet

For our Performing the Internet final, Dom Chang and I worked together to build an interactive piece that can be performed live. It expands on his conceptual piece that asks the audience a series of yes/no questions, developing it into a much more complete experience.

We worked together to outline the experience, using the data to show colors and play sounds on decentralized devices. I focused mostly on creating the interface for the questionnaire, while Dom focused more on the server side, which also overlapped with his Live Web class - so we both had good overlap with other classes. I had a delight in learning HTML and CSS more completely, as well as using jQuery listeners to modify the page. Check out the complete code here (it says PTI final but they're the same project)!


Dead Languages

For this project, my partners Keerthana and Hadar and I chose to look at extinct languages. Wikipedia has a great list here. The assignment was for mapping, and since these languages come from all across the globe, we figured it would be an interesting and appropriate task.

After we cleaned the data - reducing and converting dates such as "Dec 6 2016" and "mid-1900's" to a single year - we had a nice list from which to build some visuals, which would help us understand what we were really working with. This is what we came up with:

This shows clearly that the US leads the race - at least from the data we have at our disposal. There are a number of ways to go from here: represent each death with an icon, add lines to make the data more legible, put it on a map, put it on a map that changes over time, and so forth. Source code here
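The date normalization mentioned above can be sketched in Python. This is a simplified reconstruction, not our actual cleaning script, and the sample strings are just illustrative:

```python
import re

def extract_year(date_text):
    """Pull a single four-digit year out of a messy date string.
    Returns None when no year is present."""
    match = re.search(r'\b(1\d{3}|20\d{2})\b', date_text)
    return int(match.group(1)) if match else None

# Messy values like the ones we had to normalize:
print(extract_year("Dec 6 2016"))   # 2016
print(extract_year("mid-1900's"))   # 1900
```

A "mid-1900's" style entry still needs a judgment call (we just took the year that appears), which is part of why cleaning took so long.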

We had a great time working on this, but boy does working with data take some time! We definitely want to develop this concept further, especially by completing, or at least verifying, the data set itself. That was one eye-opening aspect of this project: the research involved. Data sets cannot be trusted in and of themselves; they require intense verification. We did not do that here, as we were concerned with getting to square one - which, again, took quite a while, but was fun! And we learned a lot, which is always the most important thing, especially while in school.

Archival Project

For this project I had a hard time finding usable data for climate - what I really want to work on - so instead I compared text transcripts of speeches by Barack Obama and Donald Trump.

This was initially because I had downloaded their speeches as text files* from The Miller Center for my Pop-Up Windows initial concepts.

* Barack Obama Donald Trump

I was then inspired by a Making Media Making Devices class to use Python to handle my initial parsing of the content. I had wanted to use Python for some time, and it turned out to be remarkably simple!

This code opens the whole text, splits it into lines and then into words, strips whitespace, checks whether each word meets a certain length, pushes it to a new array of words, capitalizes them for consistency, converts the array into a set of unique entries, and finally finds the overlap between the two sets and removes it from each.

rawDataDT = []

longestWord = 11  # minimum word length to keep

# Collect every sufficiently long word from the Trump transcript
with open('dt2.txt', 'r') as f:
    for line in f:
        for word in line.split():
            word = word.strip()  # split() already drops whitespace, but be safe
            if len(word) >= longestWord:
                rawDataDT.append(word)

# Capitalize for consistency, then reduce to unique entries
cap_wordsDT = [word.upper() for word in rawDataDT]
wordsCondensedDT = set(cap_wordsDT)

# Same pass for the Obama transcript
rawDataBO = []

with open('bo.txt', 'r') as f:
    for line in f:
        for word in line.split():
            word = word.strip()
            if len(word) >= longestWord:
                rawDataBO.append(word)

cap_wordsBO = [word.upper() for word in rawDataBO]
wordsCondensedBO = set(cap_wordsBO)

# Remove the shared vocabulary, leaving each speaker's unique words
overlap = wordsCondensedBO.intersection(wordsCondensedDT)
uniquesDT = str(wordsCondensedDT.difference(overlap))  # str() of a set is what adds the quotes
uniquesBO = str(wordsCondensedBO.difference(overlap))

print('///////////////////////////////////\
////////////////////////////////////')

with open('dt-uniques.txt', 'w') as f:
    f.write(uniquesDT)
with open('bo-uniques.txt', 'w') as f2:
    f2.write(uniquesBO)

This left me with a truly unique set of terms for each president, gathered from a random sampling of around 50 thousand words. From here we can see what kind of people they really are.


After printing a bunch of examples to the console and making fun art sketches, I was struck by the fact that Python had put quotes around each entry and hadn't removed punctuation. This was something I needed to resolve, and I figured Google Sheets would be easiest for this kind of manual manipulation.

First of all - quite obviously, Mr. Donald had a bunch of nonsensical entries that I had to omit. I can't count entries such as 'Bureaucrats-and' or 'Trump-there' as unique words. Obama had the occasional one as well, but poor Donald's run-on sentences hurt his total uniqueness here.

Then, after removing extraneous characters, I had to filter for uniques again, which reduced each person's count further. In the end I was left with 285 entries for the Don, and 435 for Barry.
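That cleanup could also have been scripted instead of done by hand in Sheets. A rough sketch (the helper name is mine, and dropping every hyphenated entry is cruder than the by-hand pass, which kept an eye out for legitimate hyphenated words):

```python
import string

def clean_uniques(words):
    """Strip surrounding punctuation and quotes, drop run-on
    artifacts like 'BUREAUCRATS-AND', and reduce to unique
    uppercase words."""
    cleaned = set()
    for word in words:
        word = word.strip(string.punctuation).upper()
        # Skip hyphenated mash-ups left over from run-on sentences
        if word and '-' not in word:
            cleaned.add(word)
    return cleaned

print(sorted(clean_uniques(["'Bureaucrats-and'", "eloquence,", "ELOQUENCE", "Trump-there"])))
# ['ELOQUENCE']
```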

I didn’t remove pluralization or variations, since I wasn’t sure where to draw the line. This might be an interesting next step.
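One naive way to start would be folding a plural into its singular only when both forms appear in the set. This sketch handles just a trailing 'S' - far from a real stemmer, so it's only a starting point:

```python
def collapse_plurals(words):
    """Drop 'XS' from the set whenever 'X' is also present
    (naive plural folding; ignores -ES, -IES, irregulars)."""
    words = set(words)
    return {w for w in words if not (w.endswith('S') and w[:-1] in words)}

print(sorted(collapse_plurals({'BUREAUCRAT', 'BUREAUCRATS', 'CONGRESS'})))
# ['BUREAUCRAT', 'CONGRESS']
```

Note that 'CONGRESS' survives because 'CONGRES' isn't in the set - exactly the line-drawing problem that made me punt on this.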

I then output this data to TSV and CSV files (it took me a second to figure out how to output properly so that I could read the files in p5).
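For anyone hitting the same snag: Python's csv module handles the delimiters and quoting for you. A minimal sketch with hypothetical file and column names (the word lists are stand-ins for the real 285- and 435-entry sets):

```python
import csv

trump_words = ['BUREAUCRATS', 'TREMENDOUSLY']                  # stand-in data
obama_words = ['ELOQUENCE', 'PERSEVERANCE', 'CITIZENSHIP']     # stand-in data

with open('tnb.csv', 'w', newline='') as f:
    writer = csv.writer(f)  # use csv.writer(f, delimiter='\t') for TSV
    writer.writerow(['TRUMP', 'OBAMA'])
    # Pad the shorter column so every row has both fields
    for i in range(max(len(trump_words), len(obama_words))):
        writer.writerow([
            trump_words[i] if i < len(trump_words) else '',
            obama_words[i] if i < len(obama_words) else '',
        ])
```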

Next, I used some p5.js code to measure word count by starting letter for each person. It turns out, as I expected, that Obama has the larger vocabulary. Interestingly enough, though, their starting-character word use follows a similar pattern.

I probably should have used an object, but I was having trouble figuring out the proper syntax.
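The object I was reaching for is easy to show in Python (the language from the parsing step above): a single counter keyed by first letter replaces all 52 variables and both if-chains in the sketch below.

```python
from collections import Counter

def first_letter_counts(words):
    """Tally words by their first character in one pass."""
    return Counter(word[0].upper() for word in words if word)

counts = first_letter_counts(['ELOQUENCE', 'EARNEST', 'CITIZENSHIP'])
print(counts['E'], counts['C'], counts['Z'])  # 2 1 0
```

A Counter returns 0 for missing letters, so no per-letter initialization is needed. The same idea in JavaScript would be a plain object with `counts[letter] = (counts[letter] || 0) + 1`.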

var tnbData;
var tWords = [];
var bWords = [];
var tVals = [];
var bVals = [];

function preload() {
  tnbData = loadJSON("tnb.json")
}

function setup() {
  createCanvas(500,500);
  background(20);
  textSize(7);
  textAlign(CENTER);
  noStroke();
  makeDicts();
  firstChars();
  pushChars();
  TrumpLines();
  ObamaLines();
  showWords();
  count()
}


function showWords() {
    setInterval(function(){
    var bIdx = int(random(bWords.length))  // no +1: that could index past the end of the array
    var tIdx = int(random(tWords.length))

    var bWord = bWords[bIdx]
    var tWord = tWords[tIdx]
      push()
      fill(20)
      rect(0,0,500,100)
      textSize(40)
      textAlign(LEFT)
      fill(80,130,255,120)
      text(bWord,10,50)
      fill(255,50,150,120)
      text(tWord,10,90)
      pop();

  }, 2000);
}



function htmlElements(){

    let obamaTitle = createElement('h1', "OBAMA Words")

    for(var i = 0; i< bWords.length; i++) {
      let p = createElement('p', bWords[i])
    }

    let trumpTitle = createElement('h1', "TRUMP Words")

    for(var i = 0; i< tWords.length; i++) {
      let p = createElement('p', tWords[i])
    }

}

var baseline = 320;
var xOff = 25;

function ObamaLines(){

  var heightPrev;
  var xposPrev;

  for(var i = 0; i< bVals.length; i++) {
    fill(80,130,255,40)
    var xpos = (i * 15) + xOff
    var height = bVals[i]*(-3)
    rect(xpos, baseline, 10, height)
    fill(80,130,255,180)
    text(bVals[i],xpos+5,baseline+10)

    if(i>0){
      stroke(80,130,255,180)
      strokeWeight(4)
      line(xpos+5,height+baseline,xposPrev+5,heightPrev+baseline)
      noStroke()
    }
    heightPrev = height
    xposPrev = xpos
    }
  }

function TrumpLines(){
  var heightPrev;
  var xposPrev;
    for(var i = 0; i< tVals.length; i++) {
      fill(255,50,100,40)
      var xpos = (i * 15) + xOff
      var height = tVals[i]*(-3)
      rect(xpos, baseline, 10, height)
      fill(255,50,100,180)
      text(tVals[i],xpos+5, baseline+20)
        if(i>0){
          stroke(255,50,100,180)
          strokeWeight(4)
          line(xpos+5,height+baseline,xposPrev+5,heightPrev+baseline)
          noStroke()
        }
      heightPrev = height
      xposPrev = xpos
    }
}

function makeDicts(){
  for (var i = 0; i < 435; i++) {
    if (i < 285) {
      tWords.push(tnbData[i].TRUMP)
    }
    bWords.push(tnbData[i].OBAMA)

  }

  var dict = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']

  dict.forEach(function (letter,i){
    var xPos = (i * 15) + xOff+5
    fill(200)
    text(letter,xPos, baseline+30)
  })
}

function firstChars(){
    tWords.forEach(function(word) {
      var string = word.toString();
      var firstchar = string.charAt(0);
      tIfs(firstchar)
    })

    bWords.forEach(function(word) {
      var string = word.toString();
      var firstchar = string.charAt(0);
      bIfs(firstchar);
    })
}

var bA=0
var bB=0
var bC=0
var bD=0
var bE=0
var bF=0
var bG=0
var bH=0
var bI=0
var bJ=0
var bK=0
var bL=0
var bM=0
var bN=0
var bO=0
var bP=0
var bQ=0
var bR=0
var bS=0
var bT=0
var bU=0
var bV=0
var bW=0
var bX=0
var bY=0
var bZ=0

var tA = 0
var tB = 0
var tC = 0
var tD = 0
var tE = 0
var tF = 0
var tG = 0
var tH = 0
var tI = 0
var tJ = 0
var tK = 0
var tL = 0
var tM = 0
var tN = 0
var tO = 0
var tP = 0
var tQ = 0
var tR = 0
var tS = 0
var tT = 0
var tU = 0
var tV = 0
var tW = 0
var tX = 0
var tY = 0
var tZ = 0

function tIfs (firstchar){
  if (firstchar == 'A'){
      tA +=1
   }
   if (firstchar == 'B'){
     tB +=1
  }
  if (firstchar == 'C'){
    tC +=1
  }
  if (firstchar == 'D'){
    tD +=1
  }
  if (firstchar == 'E'){
    tE +=1
  }
  if (firstchar == 'F'){
    tF +=1
  }
  if (firstchar == 'G'){
    tG +=1
  }
  if (firstchar == 'H'){
    tH +=1
  }
  if (firstchar == 'I'){
    tI +=1
  }
  if (firstchar == 'J'){
    tJ +=1
  }
  if (firstchar == 'K'){
    tK +=1
  }
  if (firstchar == 'L'){
    tL +=1
  }
  if (firstchar == 'M'){
    tM +=1
  }
  if (firstchar == 'N'){
    tN +=1
  }
  if (firstchar == 'O'){
    tO +=1
  }
  if (firstchar == 'P'){
    tP +=1
  }
  if (firstchar == 'Q'){
    tQ +=1
  }
  if (firstchar == 'R'){
    tR +=1
  }
  if (firstchar == 'S'){
    tS +=1
  }
  if (firstchar == 'T'){
    tT +=1
  }
  if (firstchar == 'U'){
    tU +=1
  }
  if (firstchar == 'V'){
    tV +=1
  }
  if (firstchar == 'W'){
    tW +=1
  }
  if (firstchar == 'X'){
    tX +=1
  }
  if (firstchar == 'Y'){
    tY +=1
  }
  if (firstchar == 'Z'){
    tZ +=1
  }
}

function bIfs(firstchar){
  if (firstchar == 'A'){
      bA +=1
   }
   if (firstchar == 'B'){
     bB +=1
  }
  if (firstchar == 'C'){
    bC +=1
  }
  if (firstchar == 'D'){
    bD +=1
  }
  if (firstchar == 'E'){
    bE +=1
  }
  if (firstchar == 'F'){
    bF +=1
  }
  if (firstchar == 'G'){
    bG +=1
  }
  if (firstchar == 'H'){
    bH +=1
  }
  if (firstchar == 'I'){
    bI +=1
  }
  if (firstchar == 'J'){
    bJ +=1
  }
  if (firstchar == 'K'){
    bK +=1
  }
  if (firstchar == 'L'){
    bL +=1
  }
  if (firstchar == 'M'){
    bM +=1
  }
  if (firstchar == 'N'){
    bN +=1
  }
  if (firstchar == 'O'){
    bO +=1
  }
  if (firstchar == 'P'){
    bP +=1
  }
  if (firstchar == 'Q'){
    bQ +=1
  }
  if (firstchar == 'R'){
    bR +=1
  }
  if (firstchar == 'S'){
    bS +=1
  }
  if (firstchar == 'T'){
    bT +=1
  }
  if (firstchar == 'U'){
    bU +=1
  }
  if (firstchar == 'V'){
    bV +=1
  }
  if (firstchar == 'W'){
    bW +=1
  }
  if (firstchar == 'X'){
    bX +=1
  }
  if (firstchar == 'Y'){
    bY +=1
  }
  if (firstchar == 'Z'){
    bZ +=1
  }
}

function pushChars(){
  tVals.push(tA)
  tVals.push(tB)
  tVals.push(tC)
  tVals.push(tD)
  tVals.push(tE)
  tVals.push(tF)
  tVals.push(tG)
  tVals.push(tH)
  tVals.push(tI)
  tVals.push(tJ)
  tVals.push(tK)
  tVals.push(tL)
  tVals.push(tM)
  tVals.push(tN)
  tVals.push(tO)
  tVals.push(tP)
  tVals.push(tQ)
  tVals.push(tR)
  tVals.push(tS)
  tVals.push(tT)
  tVals.push(tU)
  tVals.push(tV)
  tVals.push(tW)
  tVals.push(tX)
  tVals.push(tY)
  tVals.push(tZ)

  bVals.push(bA)
  bVals.push(bB)
  bVals.push(bC)
  bVals.push(bD)
  bVals.push(bE)
  bVals.push(bF)
  bVals.push(bG)
  bVals.push(bH)
  bVals.push(bI)
  bVals.push(bJ)
  bVals.push(bK)
  bVals.push(bL)
  bVals.push(bM)
  bVals.push(bN)
  bVals.push(bO)
  bVals.push(bP)
  bVals.push(bQ)
  bVals.push(bR)
  bVals.push(bS)
  bVals.push(bT)
  bVals.push(bU)
  bVals.push(bV)
  bVals.push(bW)
  bVals.push(bX)
  bVals.push(bY)
  bVals.push(bZ)
}

function count() {
  var  tAmount = tWords.length
  var  bAmount = bWords.length
  var  tVal = tAmount.toString()
  var  bVal = bAmount.toString()
  fill(255,50,150,120)
  textSize(24)
  text (tAmount,450,baseline + 20)
  fill(80,130,255,120)
  text(bAmount,450,baseline)
}
Simple bar graph to visualize the data. This was a little confusing as far as the curve is concerned, though.

So I connected the peaks with lines, counted each entry, and added letters for clarity.

As you can see, the curves are similar: early letters, particularly C, are the highest for both, as is the later grouping of P, R, S, & T. I'd have thought Trump's G words would be more unique, but apparently he just uses common words a lot.

It also struck me that after T, Trump has a total of 1 unique entry (for V), while Obama has 27 across U, V, W, and Y. To me this is yet another example of the eloquence we've lost, and the marketing we've gained.

Immigrants to the U.S.

I really wanted to create a moving visual for this data set, since it relates to literal movement, and to 'individualize' the numbers, since it relates to individuals. It is also temporal data, even though no time appears in the data set itself, so I sought to create temporal meaning. Not everyone moves simultaneously; they move over the course of a year, or years, in a continuous, ever-changing stream.

My first data visualizations showed the numbers as static, heavy pieces, which does help you quickly see which places are most prominent. I think I should incorporate some elements from those into later designs, but not all of them. Movement is much nicer than heaviness - in this circumstance.
