Part I
Passing Arguments in C++
- Parameters in C++ are passed by value by default. You can pass reference if you'd like.
- You don't change the object by passing its value.
- Use pass-by-const-reference for objects you don’t intend to change.
Thinking recursively
if (simple case) { // base cases solve it directly; return; } else { // recursive cases Split the problem into one or more smaller problems with the same structure as the original; Solve each of those smaller problems; Combine the results to get the overall solution; Return the overall solution; }
the following shows a recursive find-max-value-in-vector solution.
int maxOf(const Vector<int>& elems) {
if (elems.size() == 1) { // only has 1 element?
return elems[0]; // return the only element
} else { // has more than 1 element?
int first = elems[0]; // save the first element
Vector<int> rest = elems.subList(1); // recursion
return max(first, maxOf(rest)); // compare the first elem and the max of the rest
}
}
Part II : Stacks and Queues
Stack
One interesting use of stacks: balancing parentheses. Take the "maxOf()" as an example:
int maxOf(const Vector<int>& elems) {
...
}
what the compiler sees (a stack):
push "("
push "<"
pop "<" // sees ">"
pop "(" // sees ")"
push "{"
pop "{" // sees "}"
Part III : Lexicon, Set and Map
Lexicon : a container that stores a collection of words.
- designed to solve "Given a word, is it contained in the Lexicon?" efficiently.
- does not support access by index.
- does support “does this word exist?” or “do any words have this as a prefix?”
Set : an unordered collection of distinct elements.
- element duplicates aren’t allowed, so adding an existing item has no effect.
Map : a set of key->value pairs.
- each key is associated with a value.
The following code collects words in the given txt file by their first letters.
Lexicon english("EnglishWords.txt");
Map<char, Lexicon> wordsByFirstLetter;
for (string word: english) {
wordsByFirstLetter[word[0]] += word; // words[0] being the first letter
// usage: map[key] = value;
}
Assignment 2
Rosetta Stones
#include "RosettaStone.h"
#include "GUI/SimpleTest.h"
#include "priorityqueue.h"
using namespace std;
Map<string, double> kGramsIn(const string& str, int kGramLength) {
if (kGramLength <=0 ) {
error("invalid gram length!");
}
if (str.length() < kGramLength) {
return {};
}
Map<string,double> kGrams;
for (size_t i=0;i<=str.length()-kGramLength;i++) {
string gram = str.substr(i,kGramLength);
kGrams[gram] += 1;
}
return kGrams;
}
Map<string, double> normalize(const Map<string, double>& input) {
double freqSum=0;
Vector<double> freqs = input.values();
for (size_t i=0; i<input.size();i++) {
freqSum += pow(freqs[i],2);
}
if (freqSum == 0) {
error("values are all zeros in input?");
}
freqSum = sqrt(freqSum);
Map<string, double> result;
for (string key: input) {
result[key] = input[key]/freqSum;
}
return result;
}
Map<string, double> topKGramsIn(const Map<string, double>& source, int numToKeep) {
// if (numToKeep >= source.size()) {
// return source;
// }
PriorityQueue<string> pq;
for (string key: source) {
pq.enqueue(key,source.get(key));
}
Map<string, double> result;
for (int i=0;i<source.size()-numToKeep;i++) {
/* ^ im using int here
* numToKeep might exceed source.size()
* causing the UNSIGNED size_t to overflow.
*/
pq.dequeue();
}
for (size_t i=pq.size(); i>0;i--) {
auto dqKey = pq.dequeue();
result[dqKey] = source.get(dqKey);
}
return result;
}
double cosineSimilarityOf(const Map<string, double>& lhs, const Map<string, double>& rhs) {
const auto& shorterHs = lhs.size()>=rhs.size()?rhs:lhs;
const auto& longerHs = lhs.size()<rhs.size()?rhs:lhs;
double similarity = 0;
for (string key: shorterHs) {
if (longerHs.containsKey(key)){
similarity += shorterHs.get(key)*longerHs.get(key);
}
}
return similarity;
}
string guessLanguageOf(const Map<string, double>& textProfile,
const Set<Corpus>& corpora) {
PriorityQueue<string> pq;
for (Corpus c: corpora) {
pq.enqueue(c.name, cosineSimilarityOf(textProfile, c.profile));
}
for (int i=pq.size()-1; i>0; i--){
pq.dequeue();
}
return pq.peek();
}