CS106B Lec04 Collections

26 年 1 月 21 日 星期三
563 字
3 分钟

Part I

Passing Arguments in C++

  • Parameters in C++ are passed by value by default. You can pass reference if you'd like.
    • You don't change the object by passing its value.
  • Use pass-by-const-reference for objects you don’t intend to change.

Thinking recursively

if (simple case) { // base cases solve it directly; return; } else { // recursive cases Split the problem into one or more smaller problems with the same structure as the original; Solve each of those smaller problems; Combine the results to get the overall solution; Return the overall solution; }

the following shows a recursive find-max-value-in-vector solution.

cpp
int maxOf(const Vector<int>& elems) {
	if (elems.size() == 1) {  // only has 1 element?
		return elems[0];  // return the only element
	} else {  // has more than 1 element?
		int first = elems[0]; // save the first element
		Vector<int> rest = elems.subList(1);  // recursion
		return max(first, maxOf(rest));  // compare the first elem and the max of the rest
	}
}

Part II : Stacks and Queues

Stack

One interesting use of stacks: balancing parentheses. Take the "maxOf()" as an example:

cpp
 int maxOf(const Vector<int>& elems) {
	 ...
 }

what the compiler sees (a stack):

text
push "("
push "<"
pop "<"  // sees ">"
pop "("  // sees ")"
push "{"
pop "{"  // sees "}"

Part III : Lexicon, Set and Map

Lexicon : a container that stores a collection of words.

  • designed to solve "Given a word, is it contained in the Lexicon?" efficiently.
  • does not support access by index.
  • does support “does this word exist?” or “do any words have this as a prefix?”

Set : an unordered collection of distinct elements.

  • element duplicates aren’t allowed, so adding an existing item has no effect.

Map : a set of key->value pairs.

  • each key is associated with a value.

The following code collects words in the given txt file by their first letters.

cpp
Lexicon english("EnglishWords.txt");
Map<char, Lexicon> wordsByFirstLetter;
for (string word: english) {
	wordsByFirstLetter[word[0]] += word;  // words[0] being the first letter
	// usage: map[key] = value;
}

Assignment 2

Rosetta Stones

cpp
#include "RosettaStone.h"
#include "GUI/SimpleTest.h"
#include "priorityqueue.h"
using namespace std;

Map<string, double> kGramsIn(const string& str, int kGramLength) {
    if (kGramLength <=0 ) {
        error("invalid gram length!");
    }
    if (str.length() < kGramLength) {
        return {};
    }

    Map<string,double> kGrams;

    for (size_t i=0;i<=str.length()-kGramLength;i++) {
        string gram = str.substr(i,kGramLength);
        kGrams[gram] += 1;
    }

    return kGrams;
}

Map<string, double> normalize(const Map<string, double>& input) {
    double freqSum=0;
    Vector<double> freqs = input.values();

    for (size_t i=0; i<input.size();i++) {
        freqSum += pow(freqs[i],2);
    }
    if (freqSum == 0) {
        error("values are all zeros in input?");
    }
    freqSum = sqrt(freqSum);

    Map<string, double> result;
    for (string key: input) {
        result[key] = input[key]/freqSum;
    }
    return result;
}

Map<string, double> topKGramsIn(const Map<string, double>& source, int numToKeep) {
    // if (numToKeep >= source.size()) {
    //     return source;
    // }
    PriorityQueue<string> pq;
    for (string key: source) {
        pq.enqueue(key,source.get(key));
    }
    Map<string, double> result;
    for (int i=0;i<source.size()-numToKeep;i++) {
    /*    ^ im using int here
     * numToKeep might exceed source.size()
     * causing the UNSIGNED size_t to overflow.
     */
        pq.dequeue();
    }
    for (size_t i=pq.size(); i>0;i--) {
        auto dqKey = pq.dequeue();
        result[dqKey] = source.get(dqKey);
    }

    return result;
}

double cosineSimilarityOf(const Map<string, double>& lhs, const Map<string, double>& rhs) {
    const auto& shorterHs = lhs.size()>=rhs.size()?rhs:lhs;
    const auto& longerHs = lhs.size()<rhs.size()?rhs:lhs;
    double similarity = 0;
    for (string key: shorterHs) {
        if (longerHs.containsKey(key)){
            similarity += shorterHs.get(key)*longerHs.get(key);
        }
    }
    return similarity;
}

string guessLanguageOf(const Map<string, double>& textProfile,
                       const Set<Corpus>& corpora) {
    PriorityQueue<string> pq;
    for (Corpus c: corpora) {
        pq.enqueue(c.name, cosineSimilarityOf(textProfile, c.profile));
    }
    for (int i=pq.size()-1; i>0; i--){
        pq.dequeue();
    }
    return pq.peek();
}

文章标题:CS106B Lec04 Collections

文章作者:xyla

文章链接:https://xylaalyx.github.io/posts/cs106b_lec04[复制]

最后修改时间:


Layer 1

商业转载请联系站长获得授权,非商业转载请注明本文出处及文章链接,您可以自由地在任何媒体以任何形式复制和分发作品,也可以修改和创作,但是分发衍生作品时必须采用相同的许可协议。
本文采用CC BY-NC-SA 4.0进行许可。